# VM

This is an outline of the VM that drives this language.

# Primitives

* Numbers may be big endian (BE) or little endian (LE) at the byte level. This guide will use LE.
* Addresses point to single bytes.
* Signed numbers use two's complement.

| Type | Size (bits) |
| - | - |
| Address | 64 |
| Word | 64 |
| Halfword | 32 |
| Byte | 8 |

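The conventions above can be illustrated with a short Python sketch. It relies only on the 64-bit word size, the LE byte order this guide uses, and two's complement; the helper names `encode_word` and `as_signed` are hypothetical and not part of the VM.

```python
WORD_BITS = 64  # size of a Word, from the table above
WORD_MASK = (1 << WORD_BITS) - 1


def encode_word(value: int) -> bytes:
    """Encode a 64-bit word as 8 little-endian bytes (hypothetical helper)."""
    return (value & WORD_MASK).to_bytes(WORD_BITS // 8, "little")


def as_signed(value: int) -> int:
    """Reinterpret an unsigned 64-bit word as a two's complement number."""
    value &= WORD_MASK
    if value >> (WORD_BITS - 1):  # top bit set means negative
        return value - (1 << WORD_BITS)
    return value
```

For example, `encode_word(0x0102030405060708)` places the byte `0x08` at the lowest address, and `as_signed(0xFFFF_FFFF_FFFF_FFFF)` reads the all-ones word as -1.
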
# Registers

CPU registers are addressed by a value between 0 and 63 (6 bits). All registers are 64 bits wide.

* IP - Instruction pointer
* SP - Stack pointer
* FP - Frame pointer
* FLAGS - CPU flags
* (9 unused registers)
* STATUS - Generic status code
* R0-R49

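One possible way to model the register file is sketched below. It assumes that the order of the list above is also the numeric index order, which the document does not state explicitly; the constant names and the `general_register` helper are hypothetical.

```python
# ASSUMPTION: the list order above is also the index order. The document
# gives the register count (64) but not the numeric indices.
IP, SP, FP, FLAGS = 0, 1, 2, 3
FIRST_UNUSED = 4      # 9 unused registers: indices 4 to 12
STATUS = 13
FIRST_GENERAL = 14    # R0 to R49: indices 14 to 63
REGISTER_COUNT = 64   # addressable with 6 bits


def general_register(n: int) -> int:
    """Return the register index of Rn (hypothetical helper)."""
    if not 0 <= n <= 49:
        raise ValueError(f"no such register: R{n}")
    return FIRST_GENERAL + n


# All registers are 64 bits wide, so a plain list of ints can model the file.
registers = [0] * REGISTER_COUNT
```

Under this assumption the 4 special registers, 9 unused slots, STATUS, and 50 general registers account for exactly 64 indices.
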
## CPU Flags

CPU flags are addressed by bit index, going from right to left: bit 0 is the rightmost (least
significant) bit of the FLAGS register.

* `00` - Halt flag
* `01` - Compare flag

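A minimal sketch of reading and writing individual flags, reading "right to left" as bit 0 being the least significant bit of FLAGS. The function names are hypothetical.

```python
HALT_BIT = 0     # `00` - Halt flag
COMPARE_BIT = 1  # `01` - Compare flag


def get_flag(flags: int, bit: int) -> int:
    """Return the flag at the given bit index as 0 or 1."""
    return (flags >> bit) & 1


def set_flag(flags: int, bit: int, value: bool) -> int:
    """Return a copy of FLAGS with one bit set or cleared."""
    if value:
        return flags | (1 << bit)
    return flags & ~(1 << bit)
```

Setting only the compare flag on an empty FLAGS register yields the value `0b10`.
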
### Flag ideas

* "Trace" flag - halts the CPU when certain conditions are met that may be causing undesired
  behavior - for debugging
  * Overwriting a register without its value being used
  * Mixing arithmetic with bit twiddling on the same target

## Register ideas

* NULL - a register that always reads as zero and ignores writes.
  * Other possible names: Z, NIL

# Instructions

## Arithmetic

Arithmetic instructions store their result in the first register specified. Overflow wraps around:
results are truncated to 64 bits (modulo 2^64).

* Add
  * **Params**: REG1, REG2
  * `REG1 = REG1 + REG2`
  * Unsigned addition
* Mul
  * **Params**: REG1, REG2
  * `REG1 = REG1 * REG2`
  * Unsigned multiplication
* Div
  * **Params**: REG1, REG2
  * `REG1 = REG1 / REG2`
  * Unsigned division
* Mod
  * **Params**: REG1, REG2
  * `REG1 = REG1 % REG2` (exact semantics TBD)
* INeg
  * **Params**: REG1
  * `REG1 = REG1 * -1`
  * Signed negation (two's complement)
* And
  * **Params**: REG1, REG2
  * `REG1 = REG1 & REG2`
* Or
  * **Params**: REG1, REG2
  * `REG1 = REG1 | REG2`
* Inv
  * **Params**: REG1
  * `REG1 = ~REG1`
* Not
  * **Params**: REG1
  * ```
    if REG1 == 0 {
        REG1 = 1;
    } else {
        REG1 = 0;
    }
    ```
  * Boolean NOT; equivalent of C's `!` unary operator
* Xor
  * **Params**: REG1, REG2
  * `REG1 = REG1 ^ REG2`
* Shl
  * **Params**: REG1, REG2
  * `REG1 = REG1 << REG2`
* Shr
  * **Params**: REG1, REG2
  * `REG1 = REG1 >> REG2`
  * Does not sign extend

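The wrap-around behavior can be sketched in Python by masking every result to 64 bits. This is an illustration of a few of the instructions above, not the VM's implementation; the function names are hypothetical, and cases the document leaves open (division by zero, `Mod` semantics, shift amounts of 64 or more) are deliberately left out.

```python
MASK = (1 << 64) - 1  # registers are 64 bits wide


def add(a: int, b: int) -> int:
    """Unsigned addition; overflow wraps around."""
    return (a + b) & MASK


def mul(a: int, b: int) -> int:
    """Unsigned multiplication; overflow wraps around."""
    return (a * b) & MASK


def ineg(a: int) -> int:
    """Two's complement negation of a 64-bit value."""
    return (-a) & MASK


def inv(a: int) -> int:
    """Bitwise inversion of all 64 bits."""
    return ~a & MASK


def shr(a: int, b: int) -> int:
    """Right shift that fills with zeros (no sign extension)."""
    return (a & MASK) >> b
```

With these definitions, adding 1 to the largest 64-bit value gives 0, and negating 1 gives the all-ones word, which is -1 in two's complement.
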
### TODO

* Add signed instructions (iadd, imul, etc.)
* Sign-extending SHR
* Overflow flag?

## Control flow

* CmpEq
  * **Params**: REG1, REG2
  * ```
    if REG1 == REG2 {
        FLAGS[1] = 1;
    } else {
        FLAGS[1] = 0;
    }
    ```
  * Sets the COMPARE flag to 1 if REG1 == REG2, and to 0 otherwise
* CmpLt
  * **Params**: REG1, REG2
  * ```
    if REG1 < REG2 {
        FLAGS[1] = 1;
    } else {
        FLAGS[1] = 0;
    }
    ```
  * Sets the COMPARE flag to 1 if REG1 < REG2, and to 0 otherwise
* Jz
  * **Params**: REG1
  * ```
    if FLAGS[1] == 0 {
        IP = REG1;
    }
    ```
  * Jumps to the address in REG1 if COMPARE flag is 0.
* Jnz
  * **Params**: REG1
  * ```
    if FLAGS[1] != 0 {
        IP = REG1;
    }
    ```
  * Jumps to the address in REG1 if COMPARE flag is 1.

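The compare-then-jump pattern can be sketched as follows. Two things here are assumptions rather than part of the document: `cmp_lt` compares the values as unsigned numbers, and a jump that is not taken leaves IP unchanged (when IP otherwise advances is still an open question in this outline). All function names are hypothetical.

```python
COMPARE_BIT = 1  # FLAGS[1]


def _with_compare(flags: int, result: bool) -> int:
    """Return FLAGS with the compare bit replaced and other bits preserved."""
    cleared = flags & ~(1 << COMPARE_BIT)
    return cleared | (int(result) << COMPARE_BIT)


def cmp_eq(flags: int, a: int, b: int) -> int:
    return _with_compare(flags, a == b)


def cmp_lt(flags: int, a: int, b: int) -> int:
    # ASSUMPTION: unsigned comparison; the document does not say.
    return _with_compare(flags, a < b)


def jz(flags: int, ip: int, target: int) -> int:
    """Return the new IP: the target if the compare flag is 0."""
    return target if (flags >> COMPARE_BIT) & 1 == 0 else ip


def jnz(flags: int, ip: int, target: int) -> int:
    """Return the new IP: the target if the compare flag is 1."""
    return target if (flags >> COMPARE_BIT) & 1 != 0 else ip
```

A loop would typically run `CmpLt` on a counter and a limit, then use `Jnz` to branch back while the comparison holds.
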
## Data movement

* Load
  * **Params**: REG1, REG2
  * ```
    REG1 = MEM[REG2];
    ```
  * Sets REG1 to the value at the memory address in REG2.
* Store
  * **Params**: REG1, REG2
  * ```
    MEM[REG2] = REG1;
    ```
  * Sets the value at the memory address in REG2 to the value in REG1.
* StoreImm32
  * **Params**: REG1, IMM_32
  * `REG1 = IMM_32`
  * Sets REG1 to the specified 32-bit number.
* MemCopy
  * **Params**: REG1, REG2
  * `MEM[REG1] = MEM[REG2]`
  * Copies the value at the memory address in REG2 to the memory address in REG1.
* RegCopy
  * **Params**: REG1, REG2
  * `REG1 = REG2`
  * Copies the value in REG2 into REG1.

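A rough model of the memory instructions, using a `bytearray` as memory. The document does not say how many bytes `Load`, `Store`, and `MemCopy` move; this sketch assumes a full 64-bit word in LE byte order, and the function names are hypothetical.

```python
WORD_BYTES = 8        # ASSUMPTION: memory accesses move one 64-bit word
MASK = (1 << 64) - 1


def load(mem: bytearray, addr: int) -> int:
    """REG1 = MEM[REG2]: read a little-endian word starting at addr."""
    return int.from_bytes(mem[addr:addr + WORD_BYTES], "little")


def store(mem: bytearray, addr: int, value: int) -> None:
    """MEM[REG2] = REG1: write a little-endian word starting at addr."""
    mem[addr:addr + WORD_BYTES] = (value & MASK).to_bytes(WORD_BYTES, "little")


def mem_copy(mem: bytearray, dst: int, src: int) -> None:
    """MEM[REG1] = MEM[REG2]: copy one word from src to dst."""
    store(mem, dst, load(mem, src))
```

The sketch performs no bounds checking; a real VM would need to decide what happens when an access runs past the end of memory.
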
## Miscellaneous

* Halt
  * **Params**: (none)
  * `FLAGS[0] = 1`
  * Halts the machine
* Nop
  * **Params**: (none)
  * Does nothing

## Other instructions TODO

* Call
  * Takes address and number of bytes on the stack that are for args(?)
  * Updates SP, FP, IP, storing previous values starting at the new FP
* Ret
  * Uses FP to determine previous SP, FP, and IP and restores them
* Push
* Pop
* More immediate stores?

# Binary format

The binary format is composed of a header followed by sections that make up the content of the blob.

## Header

The header is composed of:

* 64 bits - A magic number (0xDEAD_BEA7_BA5E_BA11).
* 16 bits - Version of the file
* 16 bits - The number of sections in the file
* 32 bits - Unused
* section descriptions detailed below

Total length: 128 bits (the four fixed-size fields above)

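The fixed part of the header can be sketched with Python's `struct` module. This assumes the fields are stored in the order listed and in LE byte order (the convention this guide uses); `pack_header` and `parse_header` are hypothetical helper names.

```python
import struct

MAGIC = 0xDEAD_BEA7_BA5E_BA11

# ASSUMPTION: little-endian, fields in listed order.
# Q = 64-bit magic, H = 16-bit version, H = 16-bit section count, I = 32 unused bits
HEADER = struct.Struct("<QHHI")


def pack_header(version: int, section_count: int) -> bytes:
    """Build the 128-bit header, zero-filling the unused field."""
    return HEADER.pack(MAGIC, version, section_count, 0)


def parse_header(blob: bytes) -> tuple[int, int]:
    """Return (version, section_count), rejecting a wrong magic number."""
    magic, version, section_count, _unused = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("bad magic number")
    return version, section_count
```

`HEADER.size` is 16 bytes, matching the 128-bit total stated above.
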
## Sections

The rest of the content is a list of sections. A section's layout is a section header, followed by
the section contents.

### Section header

* 8 bits - Section kind
  * 0x00 - Data
  * 0x10 - Code
  * 0xFF - Meta
* 24 bits - Unused
* 32 bits - Checksum of the section
* 64 bits - Length of the section

Total length: 128 bits

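A section header could be decoded along these lines, again assuming LE byte order and fields stored in the order listed. The checksum algorithm is not specified in this outline, so the sketch only extracts the stored value without verifying it. `parse_section_header` is a hypothetical name.

```python
import struct

# ASSUMPTION: little-endian, fields in listed order.
# B = 8-bit kind, 3x = 24 unused bits, I = 32-bit checksum, Q = 64-bit length
SECTION_HEADER = struct.Struct("<B3xIQ")

SECTION_KINDS = {0x00: "data", 0x10: "code", 0xFF: "meta"}


def parse_section_header(blob: bytes, offset: int = 0) -> tuple[str, int, int]:
    """Return (kind name, checksum, length) for the header at offset."""
    kind, checksum, length = SECTION_HEADER.unpack_from(blob, offset)
    if kind not in SECTION_KINDS:
        raise ValueError(f"unknown section kind: {kind:#04x}")
    return SECTION_KINDS[kind], checksum, length
```

A reader would call this once per section, using the section count from the file header to know when to stop.
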
### Data section

The data section contains static data that is initialized to some known value.

* 64 bits - load location - where in memory the contents of this section are put.

### Code section

The code section contains executable code.

* 64 bits - load location - where in memory the contents of this section are put.

The remaining length of the section is the code itself.

### Meta section

The meta section holds a table of metadata about the binary in a key-value format of strings mapping
to other strings. All strings are UTF-8 encoded.

* 64 bits - the number of key-value entries

The remaining length of the section is the key-value pairs.

The layout for a key-value pair is the key, followed immediately by the value. The key is always a
string, and the value may be any type of data. A key starts with the length of the string, followed
by the key string itself. A value starts with the length of the data, followed by the value data
itself.

The meta section should be used to place data that's readable by the VM, but is not used by the
executing program. Data in the meta section is not copied to the program memory.

A VM must provide support for the following meta-values:

* `entry` - a 64-bit address for where the VM should begin executing code.

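The key-value layout described above might be read as in the sketch below. The document does not give the width of the length prefixes; this example assumes 64-bit LE lengths counted in bytes, which is purely an assumption for illustration. `parse_meta` and `_read_blob` are hypothetical names.

```python
import struct


def _read_blob(body: bytes, offset: int) -> tuple[bytes, int]:
    """Read one length-prefixed blob; return it and the offset after it."""
    # ASSUMPTION: the length prefix is a 64-bit little-endian byte count.
    (length,) = struct.unpack_from("<Q", body, offset)
    start = offset + 8
    return body[start:start + length], start + length


def parse_meta(body: bytes) -> dict[str, bytes]:
    """Parse meta section contents into a key -> raw value mapping."""
    (count,) = struct.unpack_from("<Q", body, 0)  # number of entries
    offset = 8
    entries: dict[str, bytes] = {}
    for _ in range(count):
        key, offset = _read_blob(body, offset)
        value, offset = _read_blob(body, offset)
        entries[key.decode("utf-8")] = value  # keys are UTF-8 strings
    return entries
```

Under the same assumption, the `entry` address would be recovered with `int.from_bytes(entries["entry"], "little")`.
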
# General TODO

* Interrupts
* MMIO regions
* Execution pipeline
  * Helps to define when certain side effects happen (e.g. when the IP increments)
* Paging?