30.3. Syntax

The assembly syntax is intended to be upward compatible with that described in Sections 1.3 and 1.4 of The Art of Computer Programming, Volume 1. Draft versions of those chapters, as well as other MMIX information, are located at http://www-cs-faculty.stanford.edu/~knuth/mmix-news.html. Most code examples from the mmixal package located there should work unmodified when assembled and linked as single files, with a few noteworthy exceptions (Section 30.4 Differences to mmixal).

Before an instruction is emitted, the current location is aligned to the next four-byte boundary. If a label is defined at the beginning of the line, its value will be the aligned value.
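The alignment is simple round-up arithmetic. The following Python sketch illustrates only that arithmetic and is not code taken from as. The helper name align_up is hypothetical.

```python
def align_up(location: int, boundary: int) -> int:
    """Round location up to the next multiple of boundary (a power of two)."""
    return (location + boundary - 1) & ~(boundary - 1)


# An instruction at location 0x1003 is emitted at 0x1004, and a label at
# the beginning of that line gets the aligned value 0x1004.
print(hex(align_up(0x1003, 4)))  # 0x1004
print(hex(align_up(0x1004, 4)))  # 0x1004 (already aligned, unchanged)
```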

In addition to the traditional hex-prefix 0x, a hexadecimal number can also be specified by the prefix character #.

After all operands to an MMIX instruction or directive have been specified, the rest of the line is ignored, treated as a comment.

30.3.1. Special Characters

The characters * and # are line comment characters; each starts a comment, but only at the beginning of a line. Elsewhere on a line, # prefixes a hexadecimal number.

Two other characters, % and !, each start a comment anywhere on the line. Consequently, the modulus and not operators normally associated with these two characters cannot be used in expressions.

A ; is a line separator, treated as a new-line, so separate instructions can be specified on a single line.
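The three rules above can be combined into a simplified Python model of how one source line is split into statements. This is a sketch under stated assumptions, not the parser in as: it assumes "beginning of a line" means the very first character, and it ignores quoted strings, so a % or ! inside a string literal would be mishandled. The function name split_statements is hypothetical.

```python
def split_statements(line: str) -> list[str]:
    """Simplified model of MMIX comment and statement-separator handling.

    Assumptions: * or # counts only in the first column, and quoted
    strings are not recognized.
    """
    # * or # at the beginning of a line turns the whole line into a comment.
    if line[:1] in ("*", "#"):
        return []
    # % and ! start a comment anywhere; cut at the earliest of the two.
    for ch in "%!":
        pos = line.find(ch)
        if pos != -1:
            line = line[:pos]
    # ; separates statements as if it were a new-line. Leading whitespace
    # is kept because it decides whether the first field is a label.
    return [stmt for stmt in line.split(";") if stmt.strip()]


print(split_statements(" SET $0,#ff ; ADD $1,$0,1 % note"))
```

Note that the # in #ff survives, since only a # in the first column starts a comment.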

30.3.2. Symbols

The character : is permitted in identifiers. It is treated like any other symbol character, with two exceptions. First, if a symbol begins with :, the symbol is in the global namespace and the current prefix is not prepended to it; the leading : is then not considered part of the symbol. Second, for a symbol in the label position (first on a line), a trailing : is silently stripped off. A label is thus permitted, but not required, to be followed by a :, as in many other assembly formats.
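A minimal Python sketch of the two : rules follows, assuming the current prefix is passed in as a plain string. Local symbols, which are never prefixed, are not modeled, and the function name resolve_symbol is hypothetical.

```python
def resolve_symbol(name: str, prefix: str, in_label_position: bool) -> str:
    """Sketch of how a leading or trailing : affects a symbol name."""
    # A trailing : on a symbol in the label position is silently stripped.
    if in_label_position and name.endswith(":"):
        name = name[:-1]
    # A leading : selects the global namespace: no prefix, : dropped.
    if name.startswith(":"):
        return name[1:]
    # Any other : is an ordinary symbol character; the prefix applies.
    return prefix + name


print(resolve_symbol("loop:", "mod:", True))  # mod:loop
print(resolve_symbol(":loop", "mod:", False))  # loop
```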

In an expression, the character @ is a synonym for ., the current location.

In addition to the common forward and backward local symbol formats (Section 6.3 Symbol Names), local symbols can be specified with upper-case B and F, as in 8B and 9F. A local label defined for the current position is written with an H appended to the number:
3H LDB $0,$1,2
This and traditional local-label formats cannot be mixed: a label must be defined and referred to using the same format.
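How an nB or nF reference picks its nH definition can be sketched in Python with a sorted list of definition lines per digit. This is a hypothetical model, not the as implementation; in particular, it assumes B searches strictly before the referencing line and F strictly after it, and it rejects lower-case b/f because the two formats cannot be mixed.

```python
from bisect import bisect_left, bisect_right


def resolve_local(ref: str, line_no: int, definitions: dict[int, list[int]]) -> int:
    """Return the line on which the nH label referenced by nB or nF is defined.

    definitions maps each digit n to the sorted line numbers of its nH labels.
    Assumption: B looks strictly before line_no, F strictly after it.
    """
    digit, direction = int(ref[:-1]), ref[-1]
    lines = definitions.get(digit, [])
    if direction == "B":
        i = bisect_left(lines, line_no)  # number of definitions before line_no
        if i == 0:
            raise ValueError(f"no earlier {digit}H for {ref}")
        return lines[i - 1]
    if direction == "F":
        i = bisect_right(lines, line_no)  # index of first definition after line_no
        if i == len(lines):
            raise ValueError(f"no later {digit}H for {ref}")
        return lines[i]
    raise ValueError(f"not an MMIX-style local reference: {ref}")


# 3H is defined on lines 2 and 10; references are made from line 5.
defs = {3: [2, 10]}
print(resolve_local("3B", 5, defs))  # 2
print(resolve_local("3F", 5, defs))  # 10
```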

There is a minor caveat: just as for the ordinary local symbols, these local symbols are translated into ordinary symbols, using control characters to hide the ordinal number of the symbol. Unfortunately, the symbols are not translated back in error messages, so error messages may be confusing when local symbols are used. Control characters \003 (control-C) and \004 (control-D) are used for the MMIX-specific local-symbol syntax.

The symbol Main is handled specially; it is always global.

By defining the symbols __.MMIX.start..text and __.MMIX.start..data, the start addresses of the .text and .data segments, respectively, of the final program can be set. When more than one object file is linked, however, the code or data in the object file containing the symbol is not guaranteed to start at that address; only the corresponding segment of the final executable is.

30.3.3. Register names

Local and global registers are specified as $0 to $255. The recognized special register names are rJ, rA, rB, rC, rD, rE, rF, rG, rH, rI, rK, rL, rM, rN, rO, rP, rQ, rR, rS, rT, rU, rV, rW, rX, rY, rZ, rBB, rTT, rWW, rXX, rYY and rZZ. A leading : is optional for special register names.
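The following Python sketch classifies a register operand using only the names listed above. Symbols equated to registers are not modeled, and the function name parse_register is hypothetical.

```python
SPECIAL_REGISTERS = frozenset(
    "rJ rA rB rC rD rE rF rG rH rI rK rL rM rN rO rP rQ rR rS rT "
    "rU rV rW rX rY rZ rBB rTT rWW rXX rYY rZZ".split()
)


def parse_register(operand: str) -> tuple[str, object]:
    """Return ("general", number) for $0..$255 or ("special", name)."""
    if operand.startswith("$"):
        digits = operand[1:]
        if not digits.isdigit() or int(digits) > 255:
            raise ValueError(f"bad register: {operand}")
        return ("general", int(digits))
    # A leading : is optional for special register names.
    name = operand[1:] if operand.startswith(":") else operand
    if name in SPECIAL_REGISTERS:
        return ("special", name)
    raise ValueError(f"not a register name: {operand}")


print(parse_register("$42"))  # ('general', 42)
print(parse_register(":rJ"))  # ('special', 'rJ')
```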

Local and global symbols can be equated to register names and used in place of ordinary registers.

Similarly, local and global symbols can be used for special registers. Symbols equated from numbers and constant expressions are also allowed in place of a special register, except when either of the options -no-predefined-syms and -fixed-special-register-names is specified. In that case, only the special register names listed above are allowed for the instructions that take a special register operand, GET and PUT.

30.3.4. Assembler Directives

LOC

The LOC directive sets the current location to the value of the operand field, which may include changing sections. If the operand is a constant, the section is set to either .data if the value is 0x2000000000000000 or larger, else it is set to .text. Within a section, the current location may only be changed to monotonically higher addresses. A LOC expression must be a previously defined symbol or a "pure" constant.

The following example sets the label prev to the current location and advances the current location by eight bytes:
prev LOC @+8

When a LOC has a constant as its operand, a symbol __.MMIX.start..text or __.MMIX.start..data is defined depending on the address as mentioned above. Each such symbol is interpreted as special by the linker, locating the section at that address. Note that if multiple files are linked, the first object file with that section will be mapped to that address (not necessarily the file with the LOC definition).
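To make the address threshold concrete, here is a small Python sketch of which section and start symbol a constant LOC operand selects. It covers only the constant-operand case and does not model the monotonic-address check or the linker's placement; the function names are hypothetical.

```python
DATA_SEGMENT_START = 0x2000000000000000


def section_for_loc(address: int) -> str:
    """Section selected by a LOC with a constant operand."""
    return ".data" if address >= DATA_SEGMENT_START else ".text"


def start_symbol_for_loc(address: int) -> str:
    """Start symbol defined for that section, e.g. __.MMIX.start..text."""
    return "__.MMIX.start." + section_for_loc(address)


print(section_for_loc(0x100))                    # .text
print(start_symbol_for_loc(0x2000000000000000))  # __.MMIX.start..data
```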

LOCAL

Example:
 LOCAL external_symbol
 LOCAL 42
 .local asymbol

This directive generates a link-time assertion that the operand does not correspond to a global register. The operand is an expression that at link time resolves to a register symbol or a number; a number is treated as the register with that number. There is one restriction on the use of this directive: it must be placed in a section with contents, that is, code or data.
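As a sketch of what the assertion checks, assume standard MMIX semantics, in which the special register rG holds the number of the lowest global register, so $rG through $255 are global. The link-time test then amounts to a comparison. The function name is hypothetical, and rG is supplied by the caller rather than computed from GREG allocations.

```python
def local_assertion_holds(register: int, rG: int) -> bool:
    """True if register lies below rG, i.e. is not a global register."""
    if not 0 <= register <= 255:
        raise ValueError(f"register number out of range: {register}")
    return register < rG


# Hypothetical layout with 32 global registers ($224..$255, so rG = 224):
print(local_assertion_holds(42, rG=224))   # True: LOCAL 42 is satisfied
print(local_assertion_holds(230, rG=224))  # False: $230 is global
```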

IS

The IS directive:
asymbol IS an_expression
sets the symbol asymbol to an_expression. A symbol may not be set more than once using this directive. Local labels may be set using this directive, for example:
5H IS @+4

GREG

This directive reserves a global register, gives it an initial value and optionally gives it a symbolic name. Some examples:

areg GREG
breg GREG data_value
     GREG data_buffer
     .greg creg, another_data_value

The symbolic register name can be used in place of a (non-special) register. If a value isn't provided, it defaults to zero. Unless the option -no-merge-gregs is specified, non-zero registers allocated with this directive may be eliminated by as, with another register holding the same value used in their place. Any of the instructions CSWAP, GO, LDA, LDBU, LDB, LDHT, LDOU, LDO, LDSF, LDTU, LDT, LDUNC, LDVTS, LDWU, LDW, PREGO, PRELD, PREST, PUSHGO, STBU, STB, STCO, STHT, STOU, STSF, STTU, STT, STUNC, SYNCD and SYNCID can take, in place of its second and third operands, a value near the initial value of such an allocated register. Here, "near" means within the range 0…255 from that initial value, that is, at an offset of 0 to 255 above it.

buffer1 BYTE 0,0,0,0,0
buffer2 BYTE 0,0,0,0,0
 …
 GREG buffer1
 LDOU $42,buffer2
In the example above, the Y field of the LDOUI instruction (LDOU with a constant Z) will be replaced with the global register allocated for buffer1, and the Z field will have the value 5, the offset from buffer1 to buffer2. The result is equivalent to this code:
buffer1 BYTE 0,0,0,0,0
buffer2 BYTE 0,0,0,0,0
 …
tmpreg GREG buffer1
 LDOU $42,tmpreg,(buffer2-buffer1)
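The substitution in this example can be modeled as a search for a base register. The following Python sketch is hypothetical and not the as implementation. It looks for a GREG whose initial value lies 0 to 255 below the target and returns the register and offset for the Y and Z fields. Preferring the smallest offset is an assumption of this sketch, and the address of buffer1 and the register number $254 are made up for illustration.

```python
from typing import Dict, Optional, Tuple


def find_base_register(target: int, gregs: Dict[int, int]) -> Optional[Tuple[int, int]]:
    """Find a GREG whose initial value lies within 0..255 below target.

    gregs maps register number -> initial value. Returns (register, offset)
    or None if no register is near enough.
    """
    best = None
    for register, value in gregs.items():
        offset = target - value
        if 0 <= offset <= 255 and (best is None or offset < best[1]):
            best = (register, offset)
    return best


buffer1 = 0x2000000000000000  # made-up address
buffer2 = buffer1 + 5         # five BYTEs after buffer1
print(find_base_register(buffer2, {254: buffer1}))  # (254, 5)
```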

Within a file, global registers allocated with this directive are allocated in order from higher to lower register numbers. Other than that, the exact order of register allocation and elimination is undefined; for example, it is undefined when more than one file containing such directives is linked. With the options -x and -linker-allocated-gregs, GREG directives for two-operand cases like the one above can be omitted; the linker then allocates sufficient global registers.

BYTE

The BYTE directive takes a series of operands separated by commas. If an operand is a string (Section 4.6.1.1 Strings), each character of that string is emitted as a byte. Other operands must be constant expressions without forward references, in the range 0…255. If you need operands having expressions with forward references, use .byte (Section 8.7 .byte expressions). An operand can be omitted, defaulting to a zero value.
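The encoding rules for BYTE operands can be sketched in Python as follows. The representation is hypothetical: a str is a string operand, an int is an already-evaluated constant expression, and None stands in for an omitted operand.

```python
def encode_byte_operands(operands) -> bytes:
    """Sketch of the bytes emitted for one BYTE directive."""
    out = bytearray()
    for operand in operands:
        if operand is None:
            out.append(0)  # an omitted operand defaults to zero
        elif isinstance(operand, str):
            for ch in operand:
                # One byte per character; bytearray.append raises
                # ValueError for a code point above 255.
                out.append(ord(ch))
        else:
            if not 0 <= operand <= 255:
                raise ValueError(f"BYTE operand out of range: {operand}")
            out.append(operand)
    return bytes(out)


print(encode_byte_operands(["ab", 10, None]))  # b'ab\n\x00'
```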

WYDE, TETRA, OCTA

The directives WYDE, TETRA and OCTA emit constants of two, four and eight bytes size respectively. Before anything else happens for the directive, the current location is aligned to the respective constant-size boundary. If a label is defined at the beginning of the line, its value will be that after the alignment. A single operand can be omitted, defaulting to a zero value emitted for the directive. Operands can be expressed as strings (Section 4.6.1.1 Strings), in which case each character in the string is emitted as a separate constant of the size indicated by the directive.
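Alignment, the omitted-operand default and string expansion are combined in the following Python sketch, which relies on MMIX being big-endian. It is a hypothetical model, not code from as. An empty operand list stands for the omitted operand, and negative values are not handled, since int.to_bytes rejects them here.

```python
SIZES = {"WYDE": 2, "TETRA": 4, "OCTA": 8}


def emit_constants(directive: str, location: int, operands: list) -> tuple[int, bytes]:
    """Return (aligned location, emitted bytes) for WYDE, TETRA or OCTA."""
    size = SIZES[directive]
    # Align first; a label on the same line gets this aligned value.
    aligned = (location + size - 1) & ~(size - 1)
    values = []
    for operand in operands or [0]:  # a single omitted operand means zero
        if isinstance(operand, str):
            values.extend(ord(ch) for ch in operand)  # one constant per character
        else:
            values.append(operand)
    # Big-endian, most significant byte first; OverflowError if a value
    # does not fit in `size` bytes.
    data = b"".join(v.to_bytes(size, "big") for v in values)
    return aligned, data


print(emit_constants("WYDE", 0x1001, ["ab"]))  # (4098, b'\x00a\x00b')
```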

PREFIX

The PREFIX directive sets a symbol name prefix to be prepended to all symbols not prefixed with :, except local symbols (Section 30.3.2 Symbols), until the next PREFIX directive. Such prefixes accumulate. For example,
 PREFIX a
 PREFIX b
c IS 0
defines a symbol abc with the value 0.
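The accumulation in this example can be modeled with a small Python class. It is a hypothetical sketch: it treats only the MMIX-specific nH form as a local symbol, and it does not model any way of resetting the prefix.

```python
import re


class PrefixState:
    """Sketch of PREFIX accumulation (not the as implementation)."""

    LOCAL_LABEL = re.compile(r"[0-9]H")

    def __init__(self) -> None:
        self.prefix = ""

    def prefix_directive(self, operand: str) -> None:
        self.prefix += operand  # prefixes accumulate

    def qualify(self, name: str) -> str:
        if name.startswith(":"):
            return name[1:]  # explicit global name: no prefix
        if self.LOCAL_LABEL.fullmatch(name):
            return name      # local labels are not prefixed
        return self.prefix + name


state = PrefixState()
state.prefix_directive("a")
state.prefix_directive("b")
print(state.qualify("c"))  # abc
```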

BSPEC, ESPEC

A pair of BSPEC and ESPEC directives delimit a section of special contents (without specified semantics). Example:
 BSPEC 42
 TETRA 1,2,3
 ESPEC
The single operand to BSPEC must be a number in the range 0…255. The BSPEC number 80 is used by the GNU binutils implementation.