VM
This is an outline of the VM that drives this language.
Primitives
- Numbers may be big endian (BE) or little endian (LE) at the byte level. This guide will use LE.
- Addresses point to single bytes.
- Signed numbers use two's complement.
| Type | Size (bits) |
|---|---|
| Address | 64 |
| Word | 64 |
| Halfword | 32 |
| Byte | 8 |
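As a small illustration of the conventions above, the sketch below encodes a Word in little-endian byte order and reads bytes back as either an unsigned or a two's-complement signed number. It relies only on Python's built-in `int.to_bytes` and `int.from_bytes`; the helper names are hypothetical and not part of this design.

```python
WORD_BYTES = 8  # a Word is 64 bits


def encode_word(value: int) -> bytes:
    """Encode a value as 8 little-endian bytes, wrapping it into 64 bits."""
    return (value % 2**64).to_bytes(WORD_BYTES, "little")


def decode_unsigned(raw: bytes) -> int:
    """Interpret little-endian bytes as an unsigned number."""
    return int.from_bytes(raw, "little", signed=False)


def decode_signed(raw: bytes) -> int:
    """Interpret little-endian bytes as a two's-complement signed number."""
    return int.from_bytes(raw, "little", signed=True)
```

The same eight bytes can therefore be read as either `2**64 - 1` or `-1`, depending on whether the consumer treats them as signed.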
Registers
CPU registers are addressed by a value from 0 to 63 (6 bits). All registers are 64 bits wide.
- IP - Instruction pointer
- SP - Stack pointer
- FP - Frame pointer
- FLAGS - CPU flags
- (9 unused registers)
- STATUS - Generic status code
- R0-R49
CPU Flags
CPU flags are addressed by bit index, counting from right to left (bit 0 is the least significant bit).
- 00 - Halt flag
- 01 - Compare flag
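A minimal sketch of reading and updating individual flags, assuming the FLAGS register is modelled as a plain Python integer. The constant and function names are illustrative only.

```python
HALT_BIT = 0     # bit 0: Halt flag
COMPARE_BIT = 1  # bit 1: Compare flag


def get_flag(flags: int, bit: int) -> int:
    """Return the flag at the given bit index (0 is the rightmost bit)."""
    return (flags >> bit) & 1


def set_flag(flags: int, bit: int, value: bool) -> int:
    """Return a copy of flags with the given bit set or cleared."""
    mask = 1 << bit
    return (flags | mask) if value else (flags & ~mask)
```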
Flag ideas
- "Trace" flag - halts the CPU when certain conditions are met that may be causing undesired behavior - for debugging
  - Overwriting a register without its value being used
  - Mixing arithmetic with bit twiddling on the same target
Register ideas
- NULL - a register that always reads as zero and ignores writes.
  - Other possible names: Z, NIL
Instructions
Instructions attempt to be as small as possible while conforming to 8-bit, 16-bit, 32-bit, or 64-bit alignment. All instructions have 16-bit opcodes.
Arithmetic
Arithmetic instructions store their result in the first register specified. On overflow, the result wraps around past 0 (it is truncated to the 64-bit register width).
- Add
  - Opcode: 0x0000
  - Params: REG1, REG2
  - `REG1 = REG1 + REG2`
  - Unsigned addition
- Mul
  - Opcode: 0x0001
  - Params: REG1, REG2
  - `REG1 = REG1 * REG2`
  - Unsigned multiplication
- Div
  - Opcode: 0x0002
  - Params: REG1, REG2
  - `REG1 = REG1 / REG2`
  - Unsigned division
- Mod
  - Opcode: 0x0003
  - Params: REG1, REG2
  - `REG1 = REG1 % REG2` (exact semantics TBD)
- INeg
  - Opcode: 0x0004
  - Params: REG1
  - `REG1 = REG1 * -1`
  - Signed negative
- And
  - Opcode: 0x0005
  - Params: REG1, REG2
  - `REG1 = REG1 & REG2`
- Or
  - Opcode: 0x0006
  - Params: REG1, REG2
  - `REG1 = REG1 | REG2`
- Inv
  - Opcode: 0x0007
  - Params: REG1
  - `REG1 = ~REG1`
- Not
  - Opcode: 0x0008
  - Params: REG1
  - `REG1 = !REG1`
  - Boolean NOT; equivalent of C's `!` unary operator

  ```
  32                 16       10           0
    opcode             reg1     unused
   /                  /        /
  +----------------------------------------+
  | 0000000000001000 | ...... | XXXXXXXXXX |
  +----------------------------------------+
  ```
- Xor
  - Opcode: 0x0009
  - Params: REG1, REG2
  - `REG1 = REG1 ^ REG2`
- Shl
  - Opcode: 0x000A
  - Params: REG1, REG2
  - `REG1 = REG1 << REG2`
- Shr
  - Opcode: 0x000B
  - Params: REG1, REG2
  - `REG1 = REG1 >> REG2`
  - Does not sign extend
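The wrapping behaviour of these instructions can be modelled in a few lines. This is only a sketch: it assumes "wrapping" means results are reduced modulo 2^64, and it leaves out cases this outline does not define (division by zero, `Mod` semantics, shift amounts of 64 or more). Python integers are unbounded, so every result is masked explicitly; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # all registers are 64 bits wide


def add(a: int, b: int) -> int:
    return (a + b) & MASK64      # unsigned addition, wraps on overflow


def mul(a: int, b: int) -> int:
    return (a * b) & MASK64      # unsigned multiplication, wraps on overflow


def ineg(a: int) -> int:
    return (-a) & MASK64         # two's-complement negation


def inv(a: int) -> int:
    return ~a & MASK64           # bitwise inversion


def boolean_not(a: int) -> int:
    return 1 if a == 0 else 0    # like C's `!`


def shr(a: int, n: int) -> int:
    return (a & MASK64) >> n     # logical shift: no sign extension
```

For example, adding 1 to the largest 64-bit value yields 0, and negating 1 yields the all-ones bit pattern.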
TODO
- Add signed instructions (iadd, imul, etc)
- Sign-extending SHR
- Overflow flag?
Control flow
- CmpEq
  - Opcode: 0x1000
  - Params: REG1, REG2
  - `if REG1 == REG2 { FLAGS[1] = 1; } else { FLAGS[1] = 0; }`
  - Sets the COMPARE flag to 1 if REG1 == REG2

  ```
  32                 16       10       4      0
    opcode             reg1     reg2     unused
   /                  /        /        /
  +-------------------------------------------+
  | 0001000000000000 | ...... | ...... | XXXX |
  +-------------------------------------------+
  ```
- CmpLt
  - Opcode: 0x1001
  - Params: REG1, REG2
  - `if REG1 < REG2 { FLAGS[1] = 1; } else { FLAGS[1] = 0; }`
  - Sets the COMPARE flag to 1 if REG1 < REG2

  ```
  32                 16       10       4      0
    opcode             reg1     reg2     unused
   /                  /        /        /
  +-------------------------------------------+
  | 0001000000000001 | ...... | ...... | XXXX |
  +-------------------------------------------+
  ```
- Jmp
  - Opcode: 0x1100
  - Params: REG1
  - `IP = REG1;`
  - Jumps to the address in REG1 unconditionally.

  ```
  32                 16       10           0
    opcode             reg1     unused
   /                  /        /
  +----------------------------------------+
  | 0001000100000000 | ...... | XXXXXXXXXX |
  +----------------------------------------+
  ```
- Jz
  - Opcode: 0x1101
  - Params: REG1
  - `if FLAGS[1] == 0 { IP = REG1; }`
  - Jumps to the address in REG1 if COMPARE flag is 0.

  ```
  32                 16       10           0
    opcode             reg1     unused
   /                  /        /
  +----------------------------------------+
  | 0001000100000001 | ...... | XXXXXXXXXX |
  +----------------------------------------+
  ```
- Jnz
  - Opcode: 0x1102
  - Params: REG1
  - `if FLAGS[1] != 0 { IP = REG1; }`
  - Jumps to the address in REG1 if COMPARE flag is 1.

  ```
  32                 16       10           0
    opcode             reg1     unused
   /                  /        /
  +----------------------------------------+
  | 0001000100000010 | ...... | XXXXXXXXXX |
  +----------------------------------------+
  ```
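The bit layouts in the diagrams can be expressed as simple shifts and masks. The sketch below packs a 16-bit opcode and one or two 6-bit register indexes into a 32-bit value and unpacks it again. The function names are hypothetical, and how the 32-bit value is laid out as bytes in memory is not covered here.

```python
def _check(opcode: int, *regs: int) -> None:
    if not 0 <= opcode <= 0xFFFF:
        raise ValueError("opcode must fit in 16 bits")
    for reg in regs:
        if not 0 <= reg <= 63:
            raise ValueError("register index must fit in 6 bits")


def encode_one_reg(opcode: int, reg1: int) -> int:
    """opcode in bits 31-16, reg1 in bits 15-10, bits 9-0 unused."""
    _check(opcode, reg1)
    return (opcode << 16) | (reg1 << 10)


def encode_two_reg(opcode: int, reg1: int, reg2: int) -> int:
    """opcode in bits 31-16, reg1 in 15-10, reg2 in 9-4, bits 3-0 unused."""
    _check(opcode, reg1, reg2)
    return (opcode << 16) | (reg1 << 10) | (reg2 << 4)


def decode(word: int) -> tuple:
    """Split a 32-bit instruction into (opcode, reg1, reg2)."""
    return (word >> 16) & 0xFFFF, (word >> 10) & 0x3F, (word >> 4) & 0x3F
```

For a one-register instruction, the `reg2` value returned by `decode` comes from unused bits and should be ignored.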
Data movement
- Load
  - Opcode: 0x2000
  - Params: REG1, REG2
  - `REG1 = MEM[REG2]`
  - Sets REG1 to the value at the memory address in REG2.

  ```
  32                 16       10       4      0
    opcode             reg1     reg2     unused
   /                  /        /        /
  +-------------------------------------------+
  | 0010000000000000 | ...... | ...... | XXXX |
  +-------------------------------------------+
  ```
- RegCopy
  - Opcode: 0x2001
  - Params: REG1, REG2
  - `REG1 = REG2`
  - Copies the value in REG2 into REG1.
- StoreImm64
  - Opcode: 0x2100
  - Params: REG1, IMM_64
  - `REG1 = IMM_64`
  - Sets REG1 to the specified 64-bit number.
- StoreImm32
  - Opcode: 0x2101
  - Params: REG1, IMM_32
  - `REG1 = IMM_32`
  - Sets REG1 to the specified 32-bit number.
- MemCopy
  - Opcode: 0x2200
  - Params: REG1, REG2
  - `MEM[REG1] = MEM[REG2]`
  - Copies the value at the memory address in REG2 to the memory address in REG1.
- Store
  - Opcode: 0x2201
  - Params: REG1, REG2
  - `MEM[REG2] = REG1`
  - Sets the value at the memory address in REG2 to the value in REG1.

  ```
  32                 16       10       4      0
    opcode             reg1     reg2     unused
   /                  /        /        /
  +-------------------------------------------+
  | 0010001000000001 | REG1.. | REG2.. | XXXX |
  +-------------------------------------------+
  ```
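A rough model of `Load`, `Store`, and `MemCopy` over byte-addressed memory. This outline does not state how many bytes these instructions transfer, so the sketch assumes one 64-bit word in little-endian order; that width, the register list, and all function names are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1
WORD_BYTES = 8  # assumed transfer width; not specified in this outline


def read_word(mem: bytearray, addr: int) -> int:
    if addr < 0 or addr + WORD_BYTES > len(mem):
        raise IndexError("address out of range")
    return int.from_bytes(mem[addr:addr + WORD_BYTES], "little")


def write_word(mem: bytearray, addr: int, value: int) -> None:
    if addr < 0 or addr + WORD_BYTES > len(mem):
        raise IndexError("address out of range")
    mem[addr:addr + WORD_BYTES] = (value & MASK64).to_bytes(WORD_BYTES, "little")


def load(regs: list, mem: bytearray, reg1: int, reg2: int) -> None:
    regs[reg1] = read_word(mem, regs[reg2])              # REG1 = MEM[REG2]


def store(regs: list, mem: bytearray, reg1: int, reg2: int) -> None:
    write_word(mem, regs[reg2], regs[reg1])              # MEM[REG2] = REG1


def mem_copy(regs: list, mem: bytearray, reg1: int, reg2: int) -> None:
    write_word(mem, regs[reg1], read_word(mem, regs[reg2]))  # MEM[REG1] = MEM[REG2]
```

Note the operand order: `Load` and `MemCopy` write through REG1, while `Store` writes through REG2.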
Miscellaneous
- Halt
  - Opcode: 0xF000
  - Params: (none)
  - `FLAGS[0] = 1`
  - Halts the machine
- Nop
  - Opcode: 0xF001
  - Params: (none)
  - Does nothing
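To show how the Halt flag, the Compare flag, and the jump instructions interact, here is a toy execution loop. It is a sketch under several assumptions that this outline does not settle: instructions are already decoded into `(name, operands...)` tuples, IP counts instructions rather than bytes, IP advances before an instruction executes, and registers are looked up by name in a dictionary.

```python
MASK64 = (1 << 64) - 1
HALT, COMPARE = 0, 1  # flag bit indexes


def run(program: list, regs: dict) -> dict:
    """Execute decoded instructions until the Halt flag is set."""
    ip = 0
    flags = 0
    while not (flags >> HALT) & 1:
        name, *args = program[ip]
        ip += 1  # assumed: IP advances before the instruction executes
        if name == "add":
            regs[args[0]] = (regs[args[0]] + regs[args[1]]) & MASK64
        elif name == "storeimm64":
            regs[args[0]] = args[1] & MASK64
        elif name == "cmpeq":
            equal = regs[args[0]] == regs[args[1]]
            flags = (flags | 1 << COMPARE) if equal else (flags & ~(1 << COMPARE))
        elif name == "jz":
            if not (flags >> COMPARE) & 1:
                ip = regs[args[0]]
        elif name == "jmp":
            ip = regs[args[0]]
        elif name == "halt":
            flags |= 1 << HALT
        elif name == "nop":
            pass
        else:
            raise ValueError(f"unknown instruction: {name}")
    return regs


# Count R0 up from 0 until it equals R2, then halt.
program = [
    ("storeimm64", "R0", 0),
    ("storeimm64", "R1", 1),
    ("storeimm64", "R2", 3),
    ("storeimm64", "R3", 4),   # index of the loop body below
    ("add", "R0", "R1"),
    ("cmpeq", "R0", "R2"),
    ("jz", "R3"),              # Compare flag is 0 while R0 != R2
    ("halt",),
]
result = run(program, {"R0": 0, "R1": 0, "R2": 0, "R3": 0})
```

Because jumps take their target from a register, the loop address has to be placed in a register with an immediate store before it can be used.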
Other instructions TODO
- Call
  - Takes address and number of bytes on the stack that are for args(?)
  - Updates SP, FP, IP, storing previous values starting at the new FP
- Ret
  - Uses FP to determine previous SP, FP, and IP and restores them
- Push
- Pop
- More immediate stores?
  - Idea: Store42 (or whatever number of bits) that maximizes the usage of a 64-bit instruction
Binary object format
The binary object format is composed of a header followed by sections that make up the content of the object.
Header
The header is composed of:
- 64 bits - A magic number (0xDEAD_BEA7_BA5E_BA11).
- 16 bits - Version of the file
- 16 bits - The number of sections in the file
- 32 bits - Unused
- section descriptions detailed below
Total length: 128 bits
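A minimal sketch of writing and reading this header with Python's `struct` module. It assumes the fields are stored in the order listed and in little-endian byte order, matching the convention this guide uses for numbers; the function names are hypothetical.

```python
import struct

MAGIC = 0xDEAD_BEA7_BA5E_BA11

# "<" = little endian, no padding; Q = 64 bits, H = 16 bits, I = 32 bits
HEADER = struct.Struct("<QHHI")


def pack_header(version: int, section_count: int) -> bytes:
    """Build the 16-byte object header; the unused field is zeroed."""
    return HEADER.pack(MAGIC, version, section_count, 0)


def parse_header(raw: bytes) -> tuple:
    """Return (version, section_count), rejecting a wrong magic number."""
    magic, version, section_count, _unused = HEADER.unpack(raw[:HEADER.size])
    if magic != MAGIC:
        raise ValueError("bad magic number")
    return version, section_count
```

`HEADER.size` is 16 bytes, which matches the 128-bit total above.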
Sections
The rest of the object is a list of sections. A section's layout is a section header, followed by the section contents.
Section header
- 8 bits - Section kind
  - 0x00 - Data
  - 0x10 - Code
  - 0xFF - Meta
- 24 bits - Unused
- 32 bits - Checksum of the section
- 64 bits - Length of the section
Total length: 128 bits
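The section header can be read the same way as the object header. This sketch again assumes little-endian fields in the listed order. It does not verify the checksum, because the checksum algorithm is not defined here, and it does not decide whether the length field counts the header itself. The names are illustrative.

```python
import struct

# B = 8-bit kind, 3x = 24 unused bits, I = 32-bit checksum, Q = 64-bit length
SECTION_HEADER = struct.Struct("<B3xIQ")

SECTION_KINDS = {0x00: "data", 0x10: "code", 0xFF: "meta"}


def parse_section_header(raw: bytes) -> tuple:
    """Return (kind_name, checksum, length) from a 16-byte section header."""
    kind, checksum, length = SECTION_HEADER.unpack(raw[:SECTION_HEADER.size])
    if kind not in SECTION_KINDS:
        raise ValueError(f"unknown section kind: {kind:#04x}")
    return SECTION_KINDS[kind], checksum, length
```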
Data section
The data section contains static data that is initialized to some known value.
- 64 bits - load location - where in memory the contents of this section are put.
Code section
The code section contains executable code.
- 64 bits - load location - where in memory the contents of this section are put.
The remaining length of the section is the code itself.
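Loading such a section amounts to reading the load location and copying the rest of the section into memory at that address. The sketch below assumes the load location is little-endian and that a data section is laid out the same way as a code section; `load_section` is a hypothetical helper.

```python
def load_section(memory: bytearray, body: bytes) -> int:
    """Copy a data or code section body into memory.

    `body` is the section contents after the section header: a 64-bit
    load location followed by the bytes to place there.
    Returns the load location.
    """
    if len(body) < 8:
        raise ValueError("section too short to hold a load location")
    load_location = int.from_bytes(body[:8], "little")
    contents = body[8:]
    end = load_location + len(contents)
    if end > len(memory):
        raise ValueError("section does not fit in memory")
    memory[load_location:end] = contents
    return load_location
```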
Meta section
The meta section holds a table of metadata about the binary as key-value pairs. Keys are strings, and all strings are UTF-8 encoded.
- 64 bits - the number of key-value entries
The remainder of the section consists of the key-value pairs.
The layout for a key-value pair is the key, followed immediately by the value. The key is always a string, and the value may be any type of data. A key starts with the length of the string, followed by the key string itself. A value starts with the length of the data, followed by the value data itself.
The meta section should be used to place data that's readable by the VM, but is not used by the executing program. Data in the meta section is not copied to the program memory.
A VM must provide support for the following meta-values:
- `entry` - a 64-bit address for where the VM should begin executing code.
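A sketch of reading the meta table and extracting `entry`. The width of the length prefixes is not stated in this outline, so the code assumes each one is a 64-bit little-endian byte count, and it assumes the `entry` value is stored as 8 little-endian bytes. Both are assumptions, and the function names are hypothetical.

```python
import struct


def _read_blob(body: bytes, offset: int) -> tuple:
    """Read one length-prefixed blob; return (data, offset after it)."""
    (length,) = struct.unpack_from("<Q", body, offset)  # assumed 64-bit length
    start = offset + 8
    end = start + length
    if end > len(body):
        raise ValueError("truncated meta entry")
    return body[start:end], end


def parse_meta(body: bytes) -> dict:
    """Parse a meta section body into {key: raw value bytes}."""
    (count,) = struct.unpack_from("<Q", body, 0)  # number of key-value entries
    offset = 8
    entries = {}
    for _ in range(count):
        key, offset = _read_blob(body, offset)
        value, offset = _read_blob(body, offset)
        entries[key.decode("utf-8")] = value
    return entries


def entry_address(entries: dict) -> int:
    """Decode the required `entry` meta-value as a 64-bit address."""
    return int.from_bytes(entries["entry"], "little")
```

Values are kept as raw bytes because a value may be any type of data; only keys are decoded as UTF-8.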
General TODO
- Interrupts
- MMIO regions
- Execution pipeline
  - Helps to define when certain side effects happen (e.g. when the IP increments)
- Paging?