Major changes include:
* Bit the bullet: instructions now have their length hard-coded
* Moved from_utf8 object parsing into the objects themselves (instead of a Parser god object)
* A list of AST sections is assembled into an Object using the new vm::obj::assemble module.
* Changed the object layout somewhat in the spec, and adjusted the code to match.

Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
VM
This is an outline of the VM that drives this language.
Primitives
- Numbers may be big endian (BE) or little endian (LE) at the byte level. This guide will use LE.
- Addresses point to single bytes.
- Signed numbers use two's complement.
| Type | Size (bits) |
|---|---|
| Address | 64 |
| Word | 64 |
| Halfword | 32 |
| Byte | 8 |
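The primitive types map directly onto Rust's fixed-width integers. The sketch below, with hypothetical alias and function names, shows the little-endian byte order this guide assumes and how a 64-bit word is reinterpreted as a two's complement signed value.

```rust
/// Hypothetical aliases for the primitive types in the table above.
pub type Address = u64; // 64 bits
pub type Word = u64; // 64 bits
pub type Halfword = u32; // 32 bits
pub type Byte = u8; // 8 bits

/// Encodes a word as little-endian bytes (least significant byte first).
pub fn word_to_le(w: Word) -> [Byte; 8] {
    w.to_le_bytes()
}

/// Decodes a word from little-endian bytes.
pub fn word_from_le(bytes: [Byte; 8]) -> Word {
    Word::from_le_bytes(bytes)
}

/// Reinterprets a word as a signed, two's complement value.
/// Casting between integers of the same width keeps the bit pattern.
pub fn word_as_signed(w: Word) -> i64 {
    w as i64
}
```

For instance, `word_to_le(0x0102_0304_0506_0708)` yields the bytes `08 07 06 05 04 03 02 01`, and `word_as_signed(u64::MAX)` is `-1`.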
Registers
CPU registers are addressed by a value between 0 and 63 (6 bits). All registers are 64 bits wide.
- IP - Instruction pointer
- SP - Stack pointer
- FP - Frame pointer
- FLAGS - CPU flags
- NULL - Always reads as zero; writes to it are discarded.
- (8 unused registers)
- STATUS - Generic status code
- R0-R49
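A register file following this list might look like the Rust sketch below. The numbering is an assumption: it supposes indices are assigned in the order the registers are listed (IP = 0 through STATUS = 13, then R0 = 14 through R49 = 63), which accounts for all 64 slots. All names are hypothetical.

```rust
/// ASSUMPTION: indices follow the listed order:
/// IP, SP, FP, FLAGS, NULL, 8 unused, STATUS, R0-R49.
pub const IP: u8 = 0;
pub const SP: u8 = 1;
pub const FP: u8 = 2;
pub const FLAGS: u8 = 3;
pub const NULL: u8 = 4;
// Indices 5..=12 are the 8 unused registers.
pub const STATUS: u8 = 13;
pub const R0: u8 = 14;

/// Index of general-purpose register Rn (0 <= n <= 49).
pub fn r(n: u8) -> u8 {
    assert!(n <= 49, "only R0-R49 exist");
    R0 + n
}

/// 64 registers, each 64 bits wide. Indices are 6 bits, so
/// anything above 63 panics on the array access.
pub struct Registers {
    regs: [u64; 64],
}

impl Registers {
    pub fn new() -> Self {
        Registers { regs: [0; 64] }
    }

    /// Reading NULL always yields zero.
    pub fn get(&self, index: u8) -> u64 {
        if index == NULL { 0 } else { self.regs[index as usize] }
    }

    /// Writes to NULL are silently discarded.
    pub fn set(&mut self, index: u8, value: u64) {
        if index != NULL {
            self.regs[index as usize] = value;
        }
    }
}
```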
CPU Flags
CPU flags are addressed by bit index, counting from the least significant (rightmost) bit.
- 00 - Halt flag
- 01 - Compare flag
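Because FLAGS is an ordinary 64-bit register, each flag index corresponds to a bit mask. A minimal Rust sketch with hypothetical constant and helper names:

```rust
/// Bit 0: halt flag. Bit 1: compare flag.
pub const HALT: u64 = 1 << 0;
pub const COMPARE: u64 = 1 << 1;

/// Returns true if every bit in `mask` is set in `flags`.
pub fn is_set(flags: u64, mask: u64) -> bool {
    (flags & mask) == mask
}

/// Returns `flags` with the bits in `mask` set (`on`) or cleared.
pub fn with_flag(flags: u64, mask: u64, on: bool) -> u64 {
    if on { flags | mask } else { flags & !mask }
}
```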
Flag ideas
- "Trace" flag - halts the CPU when certain conditions are met that may indicate undesired behavior; intended for debugging. Example conditions:
  - Overwriting a register without its value being used
  - Mixing arithmetic with bit twiddling on the same target
Register ideas
- Other possible names for the NULL register: Z, NIL
Instructions
Instructions are kept as small as possible while conforming to 8-bit, 16-bit, 32-bit, or 64-bit alignment. All instructions have 16-bit opcodes.
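The bit diagrams given for individual instructions below share one 32-bit layout: the opcode in bits 31-16, REG1 in bits 15-10, REG2 (when present) in bits 9-4, and the low bits unused. The Rust sketch below encodes and decodes that layout. The type and function names are hypothetical, and it does not cover the in-memory byte order of the word or the wider immediate forms such as StoreImm64.

```rust
/// Fields of a 32-bit register-form instruction. Single-register
/// instructions leave `reg2` as 0 (those bits are unused).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct RegInstr {
    pub opcode: u16,
    pub reg1: u8,
    pub reg2: u8,
}

const REG_MASK: u32 = 0x3F; // register indices are 6 bits

pub fn encode(i: RegInstr) -> u32 {
    assert!(i.reg1 < 64 && i.reg2 < 64, "register indices are 6 bits");
    ((i.opcode as u32) << 16) // bits 31-16
        | ((i.reg1 as u32) << 10) // bits 15-10
        | ((i.reg2 as u32) << 4) // bits 9-4; bits 3-0 stay zero
}

pub fn decode(word: u32) -> RegInstr {
    RegInstr {
        opcode: (word >> 16) as u16,
        reg1: ((word >> 10) & REG_MASK) as u8,
        reg2: ((word >> 4) & REG_MASK) as u8,
    }
}
```

For example, CmpLt (0x1001) with REG1 = 14 and REG2 = 15 encodes to `0x1001_38F0`.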
Arithmetic
Arithmetic instructions store their result in the first register specified. Overflow wraps around modulo 2^64; for example, adding 1 to 0xFFFF_FFFF_FFFF_FFFF yields 0.
- Add
  - Opcode: 0x0000
  - Params: REG1, REG2
  - `REG1 = REG1 + REG2`
  - Unsigned addition
- Mul
  - Opcode: 0x0001
  - Params: REG1, REG2
  - `REG1 = REG1 * REG2`
  - Unsigned multiplication
- Div
  - Opcode: 0x0002
  - Params: REG1, REG2
  - `REG1 = REG1 / REG2`
  - Unsigned division
- Mod
  - Opcode: 0x0003
  - Params: REG1, REG2
  - `REG1 = REG1 % REG2` (exact semantics TBD)
- INeg
  - Opcode: 0x0004
  - Params: REG1
  - `REG1 = REG1 * -1`
  - Signed negation
- And
  - Opcode: 0x0005
  - Params: REG1, REG2
  - `REG1 = REG1 & REG2`
- Or
  - Opcode: 0x0006
  - Params: REG1, REG2
  - `REG1 = REG1 | REG2`
- Inv
  - Opcode: 0x0007
  - Params: REG1
  - `REG1 = ~REG1`
- Not
  - Opcode: 0x0008
  - Params: REG1
  - Boolean NOT; equivalent of C's `!` unary operator
  - ```
    32                 16       10           0
     opcode             reg1     unused
    /                  /        /
    +----------------------------------------+
    | 0000000000001000 | ...... | XXXXXXXXXX |
    +----------------------------------------+
    ```
- Xor
  - Opcode: 0x0009
  - Params: REG1, REG2
  - `REG1 = REG1 ^ REG2`
- Shl
  - Opcode: 0x000A
  - Params: REG1, REG2
  - `REG1 = REG1 << REG2`
- Shr
  - Opcode: 0x000B
  - Params: REG1, REG2
  - `REG1 = REG1 >> REG2`
  - Does not sign-extend
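The arithmetic and bitwise instructions above reduce to a few lines of Rust, with Rust's `wrapping_*` methods giving the wrap-around overflow behavior. This is a hypothetical sketch: the spec does not define division by zero, the semantics of Mod, or shift amounts of 64 or more, so those cases are marked as assumptions in the code.

```rust
/// Executes a two-register instruction and returns the new value of REG1,
/// or None for an unknown opcode or a faulting operation.
pub fn exec_binary(opcode: u16, reg1: u64, reg2: u64) -> Option<u64> {
    let result = match opcode {
        0x0000 => reg1.wrapping_add(reg2), // Add: wraps modulo 2^64
        0x0001 => reg1.wrapping_mul(reg2), // Mul: wraps modulo 2^64
        // Div: ASSUMPTION: dividing by zero faults (None).
        0x0002 => reg1.checked_div(reg2)?,
        // Mod: ASSUMPTION: unsigned remainder; the spec marks it TBD.
        0x0003 => reg1.checked_rem(reg2)?,
        0x0005 => reg1 & reg2, // And
        0x0006 => reg1 | reg2, // Or
        0x0009 => reg1 ^ reg2, // Xor
        // Shl/Shr: ASSUMPTION: shifting by 64 or more yields 0.
        0x000A => if reg2 >= 64 { 0 } else { reg1 << reg2 },
        0x000B => if reg2 >= 64 { 0 } else { reg1 >> reg2 }, // logical shift
        _ => return None,
    };
    Some(result)
}

/// Executes a single-register instruction and returns the new value of REG1.
pub fn exec_unary(opcode: u16, reg1: u64) -> Option<u64> {
    let result = match opcode {
        0x0004 => reg1.wrapping_neg(), // INeg: same bits as REG1 * -1
        0x0007 => !reg1,               // Inv: bitwise NOT (C's `~`)
        0x0008 => (reg1 == 0) as u64,  // Not: boolean NOT (C's `!`)
        _ => return None,
    };
    Some(result)
}
```

`exec_binary(0x0000, u64::MAX, 1)` returns `Some(0)`, showing the wrap-around, and `exec_unary(0x0008, 5)` returns `Some(0)` because any non-zero value counts as true.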
TODO
- Add signed instructions (iadd, imul, etc)
- Sign-extending SHR
- Overflow flag?
Control flow
- CmpEq
  - Opcode: 0x1000
  - Params: REG1, REG2
  - `if REG1 == REG2 { FLAGS[1] = 1; } else { FLAGS[1] = 0; }`
  - Sets the COMPARE flag to 1 if REG1 == REG2
  - ```
    32                 16       10       4      0
     opcode             reg1     reg2     unused
    /                  /        /        /
    +-------------------------------------------+
    | 0001000000000000 | ...... | ...... | XXXX |
    +-------------------------------------------+
    ```
- CmpLt
  - Opcode: 0x1001
  - Params: REG1, REG2
  - `if REG1 < REG2 { FLAGS[1] = 1; } else { FLAGS[1] = 0; }`
  - Sets the COMPARE flag to 1 if REG1 < REG2
  - ```
    32                 16       10       4      0
     opcode             reg1     reg2     unused
    /                  /        /        /
    +-------------------------------------------+
    | 0001000000000001 | ...... | ...... | XXXX |
    +-------------------------------------------+
    ```
- Jmp
  - Opcode: 0x1100
  - Params: REG1
  - `IP = REG1;`
  - Jumps to the address in REG1 unconditionally.
  - ```
    32                 16       10           0
     opcode             reg1     unused
    /                  /        /
    +----------------------------------------+
    | 0001000100000000 | ...... | XXXXXXXXXX |
    +----------------------------------------+
    ```
- Jz
  - Opcode: 0x1101
  - Params: REG1
  - `if FLAGS[1] == 0 { IP = REG1; }`
  - Jumps to the address in REG1 if the COMPARE flag is 0.
  - ```
    32                 16       10           0
     opcode             reg1     unused
    /                  /        /
    +----------------------------------------+
    | 0001000100000001 | ...... | XXXXXXXXXX |
    +----------------------------------------+
    ```
- Jnz
  - Opcode: 0x1102
  - Params: REG1
  - `if FLAGS[1] != 0 { IP = REG1; }`
  - Jumps to the address in REG1 if the COMPARE flag is 1.
  - ```
    32                 16       10           0
     opcode             reg1     unused
    /                  /        /
    +----------------------------------------+
    | 0001000100000010 | ...... | XXXXXXXXXX |
    +----------------------------------------+
    ```
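The comparison and jump instructions communicate only through the COMPARE flag (bit 1 of FLAGS). A hypothetical Rust sketch of their effect on IP and FLAGS, with operands passed in as already-read register values; treating CmpLt as an unsigned comparison is an assumption, since the spec does not say.

```rust
/// COMPARE is bit 1 of FLAGS.
const COMPARE: u64 = 1 << 1;

/// Minimal machine state needed for comparisons and jumps.
pub struct Cpu {
    pub ip: u64,
    pub flags: u64,
}

impl Cpu {
    fn set_compare(&mut self, cond: bool) {
        if cond {
            self.flags |= COMPARE;
        } else {
            self.flags &= !COMPARE;
        }
    }

    /// CmpEq: COMPARE = (REG1 == REG2)
    pub fn cmp_eq(&mut self, reg1: u64, reg2: u64) {
        self.set_compare(reg1 == reg2);
    }

    /// CmpLt: COMPARE = (REG1 < REG2). ASSUMPTION: unsigned comparison.
    pub fn cmp_lt(&mut self, reg1: u64, reg2: u64) {
        self.set_compare(reg1 < reg2);
    }

    /// Jmp: IP = REG1
    pub fn jmp(&mut self, target: u64) {
        self.ip = target;
    }

    /// Jz: jump if COMPARE is 0.
    pub fn jz(&mut self, target: u64) {
        if (self.flags & COMPARE) == 0 {
            self.ip = target;
        }
    }

    /// Jnz: jump if COMPARE is 1.
    pub fn jnz(&mut self, target: u64) {
        if (self.flags & COMPARE) != 0 {
            self.ip = target;
        }
    }
}
```

Note that Jz and Jnz test the COMPARE flag rather than a zero result: after CmpEq on equal operands COMPARE is 1, so it is Jnz, not Jz, that takes the branch.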
Data movement
- Load
  - Opcode: 0x2000
  - Params: REG1, REG2
  - Sets REG1 to the value at the memory address in REG2.
  - ```
    32                 16       10       4      0
     opcode             reg1     reg2     unused
    /                  /        /        /
    +-------------------------------------------+
    | 0010000000000000 | ...... | ...... | XXXX |
    +-------------------------------------------+
    ```
- RegCopy
  - Opcode: 0x2001
  - Params: REG1, REG2
  - `REG1 = REG2`
  - Copies the value in REG2 into REG1.
- StoreImm64
  - Opcode: 0x2100
  - Params: REG1, IMM_64
  - `REG1 = IMM_64`
  - Sets REG1 to the specified 64-bit number.
- StoreImm32
  - Opcode: 0x2101
  - Params: REG1, IMM_32
  - `REG1 = IMM_32`
  - Sets REG1 to the specified 32-bit number.
- MemCopy
  - Opcode: 0x2200
  - Params: REG1, REG2
  - `MEM[REG1] = MEM[REG2]`
  - Copies the value at the memory address in REG2 to the memory address in REG1.
- Store
  - Opcode: 0x2201
  - Params: REG1, REG2
  - Sets the value at the memory address in REG2 to the value in REG1.
  - ```
    32                 16       10       4      0
     opcode             reg1     reg2     unused
    /                  /        /        /
    +-------------------------------------------+
    | 0010001000000001 | ...... | ...... | XXXX |
    +-------------------------------------------+
    ```
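Load, Store, and MemCopy all address byte-granular memory. The spec does not say how many bytes they move; the Rust sketch below assumes a full 64-bit word stored little-endian and reports out-of-bounds accesses as `None`. The `Memory` type and its methods are hypothetical.

```rust
use std::convert::TryFrom;

/// Hypothetical byte-addressed memory; words are stored little-endian.
pub struct Memory {
    bytes: Vec<u8>,
}

impl Memory {
    pub fn new(size: usize) -> Self {
        Memory { bytes: vec![0; size] }
    }

    /// Byte range covered by the 8-byte word at `addr`, if it is in bounds.
    fn word_range(&self, addr: u64) -> Option<std::ops::Range<usize>> {
        let start = usize::try_from(addr).ok()?;
        let end = start.checked_add(8)?;
        if end <= self.bytes.len() { Some(start..end) } else { None }
    }

    /// Load: REG1 = MEM[REG2] (returns the value for REG1).
    pub fn load(&self, addr: u64) -> Option<u64> {
        let range = self.word_range(addr)?;
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&self.bytes[range]);
        Some(u64::from_le_bytes(buf))
    }

    /// Store: MEM[REG2] = REG1
    pub fn store(&mut self, addr: u64, value: u64) -> Option<()> {
        let range = self.word_range(addr)?;
        self.bytes[range].copy_from_slice(&value.to_le_bytes());
        Some(())
    }

    /// MemCopy: MEM[REG1] = MEM[REG2]
    pub fn mem_copy(&mut self, dst: u64, src: u64) -> Option<()> {
        let value = self.load(src)?;
        self.store(dst, value)
    }
}
```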
Miscellaneous
- Halt
  - Opcode: 0xF000
  - Params: (none)
  - `FLAGS[0] = 1`
  - Halts the machine
- Nop
  - Opcode: 0xF001
  - Params: (none)
  - Does nothing
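Halt does not stop execution by itself; it sets the halt flag, which suggests a fetch-execute loop that checks FLAGS[0] before each instruction. The Rust sketch below is a simplification: the hypothetical `run` function takes a list of opcodes only, and its instruction pointer counts instructions rather than bytes.

```rust
/// Bit 0 of FLAGS is the halt flag.
const HALT: u64 = 1 << 0;

/// Runs opcodes until Halt sets the halt flag. Returns the number of
/// instructions executed, including the Halt itself.
/// SIMPLIFICATION: operands and encodings are ignored.
pub fn run(program: &[u16]) -> Result<usize, String> {
    let mut flags: u64 = 0;
    let mut ip: usize = 0;
    let mut executed = 0;
    while (flags & HALT) == 0 {
        let opcode = *program
            .get(ip)
            .ok_or_else(|| format!("ran off the end of the program at {}", ip))?;
        ip += 1;
        executed += 1;
        match opcode {
            0xF000 => flags |= HALT, // Halt: FLAGS[0] = 1
            0xF001 => {}             // Nop: does nothing
            other => return Err(format!("opcode {:#06x} not handled here", other)),
        }
    }
    Ok(executed)
}
```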
Other instructions TODO
- Call
  - Takes an address and the number of bytes on the stack that hold arguments(?)
  - Updates SP, FP, and IP, storing the previous values starting at the new FP
- Ret
  - Uses FP to determine the previous SP, FP, and IP, and restores them
- Push
- Pop
- More immediate stores?
  - Idea: Store42 (or whatever number of bits) that maximizes the usage of a 64-bit instruction
Binary object format
The binary object format is composed of a header followed by sections that make up the content of the object.
Header
The header is composed of:
- 64 bits - A magic number (0xDEAD_BEA7_BA5E_BA11).
- 32 bits - Version of the file
- 32 bits - The number of sections in the file
- section descriptions detailed below
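Reading the header is a matter of checking the magic number and pulling out two 32-bit fields. The Rust sketch below assumes the fields are little-endian, as elsewhere in this guide; `parse_header` and `ObjectHeader` are hypothetical names, not the VM's actual API.

```rust
/// Magic number at the start of every object.
pub const MAGIC: u64 = 0xDEAD_BEA7_BA5E_BA11;

#[derive(Debug, PartialEq, Eq)]
pub struct ObjectHeader {
    pub version: u32,
    pub section_count: u32,
}

/// Parses the 16-byte header (64-bit magic, 32-bit version, 32-bit section
/// count) and returns it along with the remaining bytes (the sections).
/// ASSUMPTION: all fields are little-endian.
pub fn parse_header(bytes: &[u8]) -> Result<(ObjectHeader, &[u8]), String> {
    if bytes.len() < 16 {
        return Err("object too short for header".to_string());
    }
    let (magic, rest) = bytes.split_at(8);
    let mut m = [0u8; 8];
    m.copy_from_slice(magic);
    if u64::from_le_bytes(m) != MAGIC {
        return Err("bad magic number".to_string());
    }
    let mut v = [0u8; 4];
    v.copy_from_slice(&rest[0..4]);
    let mut c = [0u8; 4];
    c.copy_from_slice(&rest[4..8]);
    let header = ObjectHeader {
        version: u32::from_le_bytes(v),
        section_count: u32::from_le_bytes(c),
    };
    Ok((header, &rest[8..]))
}
```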
Sections
The rest of the object is a list of sections. A section's layout is a section header, followed by the section contents.
Section header
- 8 bits - Section kind
  - 0x00 - Data
  - 0x10 - Code
  - 0xFF - Meta
- 64 bits - Length of the section
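A section header is 9 bytes: one kind byte followed by a 64-bit length. The Rust sketch below splits one section off the front of a byte slice. It assumes the length is little-endian and counts only the bytes after the section header; the spec does not say whether the header itself is included. All names are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum SectionKind {
    Data,
    Code,
    Meta,
}

impl SectionKind {
    pub fn from_byte(b: u8) -> Option<Self> {
        match b {
            0x00 => Some(SectionKind::Data),
            0x10 => Some(SectionKind::Code),
            0xFF => Some(SectionKind::Meta),
            _ => None,
        }
    }
}

/// Returns the section kind, the section body, and the bytes after it.
/// ASSUMPTIONS: little-endian length that excludes the 9-byte header.
pub fn next_section(bytes: &[u8]) -> Result<(SectionKind, &[u8], &[u8]), String> {
    if bytes.len() < 9 {
        return Err("truncated section header".to_string());
    }
    let kind = SectionKind::from_byte(bytes[0])
        .ok_or_else(|| format!("unknown section kind {:#04x}", bytes[0]))?;
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(&bytes[1..9]);
    let len = u64::from_le_bytes(len_bytes);
    let body_and_rest = &bytes[9..];
    if len > body_and_rest.len() as u64 {
        return Err("section length exceeds object size".to_string());
    }
    let (body, rest) = body_and_rest.split_at(len as usize);
    Ok((kind, body, rest))
}
```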
Data section
The data section contains static data that is initialized to some known value.
- 64 bits - section load start - where in memory the content of this section begins
- 64 bits - section load end - where in memory the content of this section ends

The remaining length of the section is the data itself.
Code section
The code section contains executable code.
- 64 bits - section load start - where in memory the content of this section begins
- 64 bits - section load end - where in memory the content of this section ends
The remaining length of the section is the code itself.
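Data and Code sections share the same body layout, so one hypothetical loader can place either into memory. The sketch assumes little-endian addresses and an exclusive load end, so the contents must be exactly `end - start` bytes long; the spec does not state either detail.

```rust
/// Copies a Data or Code section body (load start, load end, contents)
/// into `memory`. ASSUMPTIONS: little-endian fields, exclusive `end`.
pub fn load_section(body: &[u8], memory: &mut [u8]) -> Result<(), String> {
    if body.len() < 16 {
        return Err("section body too short".to_string());
    }
    let read_u64 = |b: &[u8]| {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(b);
        u64::from_le_bytes(buf)
    };
    let start = read_u64(&body[0..8]);
    let end = read_u64(&body[8..16]);
    let contents = &body[16..];
    if end < start || end - start != contents.len() as u64 {
        return Err("load range does not match section contents".to_string());
    }
    if end > memory.len() as u64 {
        return Err("section does not fit in memory".to_string());
    }
    memory[start as usize..end as usize].copy_from_slice(contents);
    Ok(())
}
```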
Meta section
The meta section holds a table of metadata about the binary in a key-value format, mapping strings to 64-bit values. All strings are UTF-8 encoded.
- 64 bits - the number of key-value entries
The remainder of the section consists of the key-value pairs.
Each key-value pair is laid out as the key followed immediately by the value. The key is a string and the value is a 64-bit number. A key starts with the length of the string, followed by the key string itself. A value is just the 8 bytes of the number.
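Parsing the meta table in Rust could look like the sketch below. The width of a key's length prefix is not given in the spec, so this assumes a 64-bit little-endian byte count; `parse_meta` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Reads a little-endian u64 at `*pos` and advances `pos` past it.
fn read_u64(bytes: &[u8], pos: &mut usize) -> Result<u64, String> {
    let end = pos.checked_add(8).ok_or("offset overflow")?;
    let slice = bytes.get(*pos..end).ok_or("truncated meta section")?;
    let mut buf = [0u8; 8];
    buf.copy_from_slice(slice);
    *pos = end;
    Ok(u64::from_le_bytes(buf))
}

/// Parses a Meta section body: a 64-bit entry count, then that many
/// (length-prefixed UTF-8 key, 64-bit value) pairs.
/// ASSUMPTION: the key length prefix is a 64-bit little-endian byte count.
pub fn parse_meta(body: &[u8]) -> Result<HashMap<String, u64>, String> {
    let mut pos = 0;
    let count = read_u64(body, &mut pos)?;
    let mut table = HashMap::new();
    for _ in 0..count {
        let key_len = read_u64(body, &mut pos)? as usize;
        let end = pos.checked_add(key_len).ok_or("key length overflow")?;
        let key_bytes = body.get(pos..end).ok_or("truncated key")?;
        let key = String::from_utf8(key_bytes.to_vec()).map_err(|e| e.to_string())?;
        pos = end;
        let value = read_u64(body, &mut pos)?;
        table.insert(key, value);
    }
    Ok(table)
}
```

A VM could then read its starting address with `meta.get("entry")`.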
The meta section holds data that is read by the VM but not used by the executing program. Data in the meta section is not copied into program memory.
A VM must provide support for the following meta-values:
- `entry` - a 64-bit address at which the VM should begin executing code.
General TODO
- Interrupts
- MMIO regions
- Paging?