# VM

This is an outline of the VM that drives this language.

# Primitives

* Numbers may be stored big endian (BE) or little endian (LE) at the byte level. This guide uses LE.
* Addresses point to single bytes.
* Signed numbers use two's complement.

| Type | Size (bits) |
| - | - |
| Address | 64 |
| Word | 64 |
| Halfword | 32 |
| Byte | 8 |

# Registers

CPU registers are addressed by a value between 0 and 63 (6 bits). All registers are 64 bits wide.

* IP - Instruction pointer
* SP - Stack pointer
* FP - Frame pointer
* FLAGS - CPU flags
* (9 unused registers)
* STATUS - Generic status code
* R0-R49

## CPU Flags

CPU flags are addressed by bit index, counting from the least significant (rightmost) bit, which is bit 0.

* `00` - Halt flag
* `01` - Compare flag

### Flag ideas

* "Trace" flag - for debugging; halts the CPU when certain conditions are met that may be causing undesired behavior:
  * Overwriting a register without its value being used
  * Mixing arithmetic with bit twiddling on the same target

## Register ideas

* NULL - a register that always reads as zero; writes to it are discarded.
  * Other possible names: Z, NIL

# Instructions

## Arithmetic

Arithmetic instructions store their result in the first register specified. Overflow wraps around: results are taken modulo 2^64, so adding 1 to 0xFFFF_FFFF_FFFF_FFFF yields 0.
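The wrap-around rule can be modeled by masking every result to 64 bits. This is a minimal Python sketch, where integers are unbounded and must be masked explicitly; the helper names (`wrap`, `add`, `mul`, `ineg`, `as_signed`) are hypothetical and only illustrate the semantics, not an existing implementation.

```python
# Sketch of 64-bit wrapping register arithmetic (hypothetical helper names).
# Python ints never overflow, so each result is masked to emulate modulo 2**64.

MASK64 = (1 << 64) - 1


def wrap(value: int) -> int:
    """Reduce an arbitrary Python int to an unsigned 64-bit register value."""
    return value & MASK64


def add(a: int, b: int) -> int:
    # Add: unsigned addition, wrapping on overflow.
    return wrap(a + b)


def mul(a: int, b: int) -> int:
    # Mul: unsigned multiplication, keeping only the low 64 bits.
    return wrap(a * b)


def ineg(a: int) -> int:
    # INeg: REG1 = REG1 * -1 as two's complement negation, wrapped to 64 bits.
    return wrap(-a)


def as_signed(a: int) -> int:
    """Interpret an unsigned 64-bit register value as a two's complement integer."""
    return a - (1 << 64) if a & (1 << 63) else a
```

For example, `add(MASK64, 1)` is `0`, and `ineg(5)` is `0xFFFF_FFFF_FFFF_FFFB`, which `as_signed` reads back as `-5`. Keeping registers unsigned and reinterpreting them only when a signed view is needed mirrors how two's complement lets one adder serve both signed and unsigned values.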
* Add
  * **Params**: REG1, REG2
  * `REG1 = REG1 + REG2`
  * Unsigned addition
* Mul
  * **Params**: REG1, REG2
  * `REG1 = REG1 * REG2`
  * Unsigned multiplication
* Div
  * **Params**: REG1, REG2
  * `REG1 = REG1 / REG2`
  * Unsigned division
* Mod
  * **Params**: REG1, REG2
  * `REG1 = REG1 % REG2` (exact semantics TBD)
* INeg
  * **Params**: REG1
  * `REG1 = REG1 * -1`
  * Signed negation (two's complement)
* And
  * **Params**: REG1, REG2
  * `REG1 = REG1 & REG2`
* Or
  * **Params**: REG1, REG2
  * `REG1 = REG1 | REG2`
* Inv
  * **Params**: REG1
  * `REG1 = ~REG1`
* Not
  * **Params**: REG1
  * ```
    if REG1 == 0 {
        REG1 = 1;
    } else {
        REG1 = 0;
    }
    ```
  * Boolean NOT; equivalent of C's `!` unary operator
* Xor
  * **Params**: REG1, REG2
  * `REG1 = REG1 ^ REG2`
* Shl
  * **Params**: REG1, REG2
  * `REG1 = REG1 << REG2`
* Shr
  * **Params**: REG1, REG2
  * `REG1 = REG1 >> REG2`
  * Logical shift; does not sign-extend

### TODO

* Add signed instructions (iadd, imul, etc.)
* Sign-extending SHR
* Overflow flag?

## Control flow

* CmpEq
  * **Params**: REG1, REG2
  * ```
    if REG1 == REG2 {
        FLAGS[1] = 1;
    } else {
        FLAGS[1] = 0;
    }
    ```
  * Sets the COMPARE flag to 1 if REG1 == REG2, otherwise to 0
* CmpLt
  * **Params**: REG1, REG2
  * ```
    if REG1 < REG2 {
        FLAGS[1] = 1;
    } else {
        FLAGS[1] = 0;
    }
    ```
  * Sets the COMPARE flag to 1 if REG1 < REG2, otherwise to 0
* Jz
  * **Params**: REG1
  * ```
    if FLAGS[1] == 0 {
        IP = REG1;
    }
    ```
  * Jumps to the address in REG1 if the COMPARE flag is 0.
* Jnz
  * **Params**: REG1
  * ```
    if FLAGS[1] != 0 {
        IP = REG1;
    }
    ```
  * Jumps to the address in REG1 if the COMPARE flag is 1.

## Data movement

* Load
  * **Params**: REG1, REG2
  * ```
    REG1 = MEM[REG2];
    ```
  * Sets REG1 to the value at the memory address in REG2.
* Store
  * **Params**: REG1, REG2
  * ```
    MEM[REG2] = REG1;
    ```
  * Sets the value at the memory address in REG2 to the value in REG1.
* StoreImm32
  * **Params**: REG1, IMM_32
  * `REG1 = IMM_32`
  * Sets REG1 to the specified 32-bit number.
* MemCopy
  * **Params**: REG1, REG2
  * `MEM[REG1] = MEM[REG2]`
  * Copies the value at the memory address in REG2 to the memory address in REG1.
* RegCopy
  * **Params**: REG1, REG2
  * `REG1 = REG2`
  * Copies the value in REG2 into REG1.

## Miscellaneous

* Halt
  * **Params**: (none)
  * `FLAGS[0] = 1`
  * Halts the machine
* Nop
  * **Params**: (none)
  * Does nothing

## Other instructions TODO

* Call
  * Takes an address and the number of bytes on the stack that are used for arguments(?)
  * Updates SP, FP, and IP, storing the previous values starting at the new FP
* Ret
  * Uses FP to determine the previous SP, FP, and IP and restores them
* Push
* Pop
* More immediate stores?

# Binary format

The binary format is composed of a header followed by sections that make up the content of the blob.

## Header

The header is composed of:

* 64 bits - A magic number (0xDEAD_BEA7_BA5E_BA11)
* 16 bits - Version of the file
* 16 bits - The number of sections in the file
* 32 bits - Unused

Total length: 128 bits. The header is followed by the sections described below.

## Sections

The rest of the content is a list of sections. Each section consists of a section header followed by the section contents.

### Section header

* 8 bits - Section kind
  * 0x00 - Data
  * 0x10 - Code
  * 0xFF - Meta
* 24 bits - Unused
* 32 bits - Checksum of the section
* 64 bits - Length of the section

Total length: 128 bits

### Data section

The data section contains static data that is initialized to some known value.

* 64 bits - load location - where in memory the contents of this section are put.

The remaining length of the section is the data itself.

### Code section

The code section contains executable code.

* 64 bits - load location - where in memory the contents of this section are put.

The remaining length of the section is the code itself.

### Meta section

The meta section holds a table of metadata about the binary in a key-value format, where every key is a string. All strings are UTF-8 encoded.

* 64 bits - the number of key-value entries

The remainder of the section holds the key-value pairs. Each pair is laid out as the key, followed immediately by the value. The key is always a string, and the value may be any type of data.
A key starts with the length of the string, followed by the key string itself. A value starts with the length of the data, followed by the value data itself.

The meta section should hold data that is read by the VM but not used by the executing program. Data in the meta section is not copied into program memory.

A VM must support the following meta-values:

* `entry` - a 64-bit address where the VM should begin executing code.

# General TODO

* Interrupts
* MMIO regions
* Execution pipeline
  * Helps to define when certain side effects happen (e.g. when the IP increments)
* Paging?
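As a concrete check of the Binary format layouts, the 128-bit file header and the 128-bit section header can each be decoded with a single little-endian `struct` format. This is a minimal Python sketch: the names `parse_file_header`, `parse_section_header`, and `SectionHeader` are hypothetical. It does not verify checksums or walk from one section to the next, because the checksum algorithm, and whether a section's length includes its own header, are not specified above.

```python
import struct
from typing import NamedTuple

# Layouts follow the Binary format section, read as little endian ("<", no padding).
MAGIC = 0xDEAD_BEA7_BA5E_BA11
FILE_HEADER = struct.Struct("<QHHI")      # magic, version, section count, 32 unused bits
SECTION_HEADER = struct.Struct("<B3xIQ")  # kind, 24 unused bits, checksum, length
SECTION_KINDS = {0x00: "data", 0x10: "code", 0xFF: "meta"}


class SectionHeader(NamedTuple):
    kind: str
    checksum: int
    length: int


def parse_file_header(blob: bytes) -> tuple[int, int]:
    """Return (version, section_count), rejecting blobs with a bad magic number."""
    magic, version, count, _unused = FILE_HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError(f"bad magic number: {magic:#x}")
    return version, count


def parse_section_header(blob: bytes, offset: int) -> SectionHeader:
    """Decode the 16-byte section header that starts at `offset`."""
    kind, checksum, length = SECTION_HEADER.unpack_from(blob, offset)
    if kind not in SECTION_KINDS:
        raise ValueError(f"unknown section kind: {kind:#04x}")
    return SectionHeader(SECTION_KINDS[kind], checksum, length)
```

Both formats are exactly 16 bytes, matching the stated 128-bit totals, so the first section header begins at byte offset 16. For example, `parse_section_header(FILE_HEADER.pack(MAGIC, 1, 1, 0) + SECTION_HEADER.pack(0x10, 0, 8), 16)` returns `SectionHeader(kind='code', checksum=0, length=8)`.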