# VM
This is an outline of the VM that drives this language.
# Primitives
* Numbers may be big endian (BE) or little endian (LE) at the byte level. This guide will use LE.
* Addresses point to single bytes.
* Signed numbers use two's complement.

| Type | Size (bits) |
| - | - |
| Address | 64 |
| Word | 64 |
| Halfword | 32 |
| Byte | 8 |
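
The following minimal Python sketch makes the byte-order and sign conventions concrete. It is an illustration, not part of the VM. It stores a 64-bit word as little-endian bytes and reinterprets an unsigned value as two's complement. The helper names are hypothetical.

```python
import struct

WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def word_to_le_bytes(value: int) -> bytes:
    """Encode an unsigned 64-bit word as 8 little-endian bytes."""
    return struct.pack("<Q", value & WORD_MASK)


def word_from_le_bytes(data: bytes) -> int:
    """Decode 8 little-endian bytes into an unsigned 64-bit word."""
    (value,) = struct.unpack("<Q", data)
    return value


def to_signed(value: int, bits: int = WORD_BITS) -> int:
    """Reinterpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value
```

For example, `word_to_le_bytes(0x0102030405060708)` yields the bytes `08 07 06 05 04 03 02 01`, and `to_signed(0xFFFF_FFFF_FFFF_FFFF)` is `-1`.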
# Registers
CPU registers are addressed by a 6-bit index from 0 to 63. All registers are 64 bits wide.
* IP - Instruction pointer
* SP - Stack pointer
* FP - Frame pointer
* FLAGS - CPU flags
* NULL - Always reads as zero; writes to it are discarded.
* (8 unused registers)
* STATUS - Generic status code
* R0-R49
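
The list above accounts for exactly 64 registers: 5 named ones, 8 unused slots, STATUS, and R0-R49. That fills the 6-bit index space. The sketch below assumes the indices follow the order listed (IP = 0, ..., STATUS = 13, R0 = 14, R49 = 63). The document does not state this numbering explicitly, and the `UNUSEDn` names are hypothetical placeholders. The sketch also models NULL as always reading zero and discarding writes.

```python
WORD_MASK = (1 << 64) - 1


def build_register_table() -> dict[str, int]:
    """Map register names to 6-bit indices, assuming they are numbered in listed order."""
    names = ["IP", "SP", "FP", "FLAGS", "NULL"]
    names += [f"UNUSED{i}" for i in range(8)]  # hypothetical names for the 8 unused slots
    names += ["STATUS"]
    names += [f"R{i}" for i in range(50)]      # general-purpose R0-R49
    assert len(names) == 64, "the registers should fill the 6-bit index space exactly"
    return {name: index for index, name in enumerate(names)}


REGISTERS = build_register_table()
NULL_INDEX = REGISTERS["NULL"]


class RegisterFile:
    """64 registers of 64 bits each; NULL reads as zero and ignores writes."""

    def __init__(self) -> None:
        self._values = [0] * 64

    def read(self, index: int) -> int:
        return 0 if index == NULL_INDEX else self._values[index]

    def write(self, index: int, value: int) -> None:
        if index != NULL_INDEX:
            self._values[index] = value & WORD_MASK  # keep values within 64 bits
```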
## CPU Flags
CPU flags are addressed by bit index, counting from the least significant (rightmost) bit, which is bit 0.
* `00` - Halt flag
* `01` - Compare flag
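
Flags are indexed from the least significant bit, so reading or writing a flag is a shift and mask on the FLAGS register. Here is a minimal sketch with hypothetical helper names:

```python
HALT_FLAG = 0     # bit 00
COMPARE_FLAG = 1  # bit 01


def get_flag(flags: int, bit: int) -> int:
    """Return bit `bit` of FLAGS (bit 0 is the least significant bit)."""
    return (flags >> bit) & 1


def set_flag(flags: int, bit: int, on: bool) -> int:
    """Return FLAGS with bit `bit` set when `on` is true, cleared otherwise."""
    mask = 1 << bit
    return (flags | mask) if on else (flags & ~mask)
```

For example, `set_flag(0, COMPARE_FLAG, True)` gives `0b10`.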
### Flag ideas
* "Trace" flag - for debugging, halts the CPU when certain conditions are met that may be causing
  undesired behavior, such as:
  * Overwriting a register without its value being used
  * Mixing arithmetic with bit twiddling on the same target
## Register ideas
* Other possible names for the NULL register: Z, NIL
# Instructions
Instructions are kept as small as possible while conforming to 8-bit, 16-bit, 32-bit, or 64-bit
alignment. All instructions have 16-bit opcodes.
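
The layout diagrams in this section all place the 16-bit opcode in the most significant bits, followed by 6-bit register fields. The sketch below packs and unpacks the common two-register 32-bit form: opcode in bits 31-16, REG1 in bits 15-10, REG2 in bits 9-4, and bits 3-0 unused. Single-register instructions can leave `reg2` at 0, so those bits fall into the unused field. The sketch also assumes, based on the Primitives section, that an instruction word is written to memory little-endian. The document does not say this for instructions explicitly. The function names are hypothetical.

```python
import struct


def encode_rr(opcode: int, reg1: int, reg2: int = 0) -> int:
    """Pack a 32-bit instruction: opcode in bits 31-16, REG1 in 15-10, REG2 in 9-4."""
    if not (0 <= opcode < 1 << 16 and 0 <= reg1 < 64 and 0 <= reg2 < 64):
        raise ValueError("field out of range")
    return (opcode << 16) | (reg1 << 10) | (reg2 << 4)  # bits 3-0 stay zero (unused)


def decode_rr(word: int) -> tuple[int, int, int]:
    """Unpack (opcode, reg1, reg2); the unused low 4 bits are ignored."""
    return (word >> 16) & 0xFFFF, (word >> 10) & 0x3F, (word >> 4) & 0x3F


def to_bytes(word: int) -> bytes:
    """Serialize a 32-bit instruction word in little-endian byte order (assumed)."""
    return struct.pack("<I", word)
```

For instance, `encode_rr(0x0000, 14, 15)` is `0x38F0`, and `to_bytes(encode_rr(0x1101, 5))` is the byte sequence `00 14 01 11`.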
## Arithmetic
Arithmetic instructions store their result in the first register specified. Overflow wraps around:
results are truncated to the low 64 bits (that is, computed modulo 2^64).
* Add
  * Opcode: 0x0000
  * **Params**: REG1, REG2
  * `REG1 = REG1 + REG2`
  * Unsigned addition
  * ```
    32                 16       10       4      0
       opcode             reg1     reg2     unused
      /                  /        /        /
    +-------------------------------------------+
    | 0000000000000000 | ...... | ...... | XXXX |
    +-------------------------------------------+
    ```
* Mul
  * Opcode: 0x0001
  * **Params**: REG1, REG2
  * `REG1 = REG1 * REG2`
  * Unsigned multiplication
  * ```
    32                 16       10       4      0
       opcode             reg1     reg2     unused
      /                  /        /        /
    +-------------------------------------------+
    | 0000000000000001 | ...... | ...... | XXXX |
    +-------------------------------------------+
    ```
* Div
  * Opcode: 0x0002
  * **Params**: REG1, REG2
  * `REG1 = REG1 / REG2`
  * Unsigned division
  * ```
    32                 16       10       4      0
       opcode             reg1     reg2     unused
      /                  /        /        /
    +-------------------------------------------+
    | 0000000000000010 | ...... | ...... | XXXX |
    +-------------------------------------------+
    ```
* Mod
  * Opcode: 0x0003
  * **Params**: REG1, REG2
  * `REG1 = REG1 % REG2` (exact semantics TBD)
  * ```
    32                 16       10       4      0
       opcode             reg1     reg2     unused
      /                  /        /        /
    +-------------------------------------------+
    | 0000000000000011 | ...... | ...... | XXXX |
    +-------------------------------------------+
    ```
* INeg
  * Opcode: 0x0004
  * **Params**: REG1
  * `REG1 = REG1 * -1`
  * Signed negation (two's complement)
  * ```
    32                 16       10           0
       opcode             reg1     unused
      /                  /        /
    +----------------------------------------+
    | 0000000000000100 | ...... | XXXXXXXXXX |
    +----------------------------------------+
    ```
* And
  * Opcode: 0x0005
  * **Params**: REG1, REG2
  * `REG1 = REG1 & REG2`
  * ```
    32                 16       10       4      0
       opcode             reg1     reg2     unused
      /                  /        /        /
    +-------------------------------------------+
    | 0000000000000101 | ...... | ...... | XXXX |
    +-------------------------------------------+
    ```
* Or
  * Opcode: 0x0006
  * **Params**: REG1, REG2
  * `REG1 = REG1 | REG2`
  * ```
    32                 16       10       4      0
       opcode             reg1     reg2     unused
      /                  /        /        /
    +-------------------------------------------+
    | 0000000000000110 | ...... | ...... | XXXX |
    +-------------------------------------------+
    ```
* Inv
  * Opcode: 0x0007
  * **Params**: REG1
  * `REG1 = ~REG1`
  * Bitwise NOT
  * ```
    32                 16       10           0
       opcode             reg1     unused
      /                  /        /
    +----------------------------------------+
    | 0000000000000111 | ...... | XXXXXXXXXX |
    +----------------------------------------+
    ```
* Not
  * Opcode: 0x0008
  * **Params**: REG1
  * ```
    if REG1 == 0 {
        REG1 = 1;
    } else {
        REG1 = 0;
    }
    ```
  * Boolean NOT; equivalent of C's `!` unary operator
  * ```
    32                 16       10           0
       opcode             reg1     unused
      /                  /        /
    +----------------------------------------+
    | 0000000000001000 | ...... | XXXXXXXXXX |
    +----------------------------------------+
    ```
* Xor
  * Opcode: 0x0009
  * **Params**: REG1, REG2
  * `REG1 = REG1 ^ REG2`
  * ```
    32                 16       10       4      0
       opcode             reg1     reg2     unused
      /                  /        /        /
    +-------------------------------------------+
    | 0000000000001001 | ...... | ...... | XXXX |
    +-------------------------------------------+
    ```
* Shl
  * Opcode: 0x000A
  * **Params**: REG1, REG2
  * `REG1 = REG1 << REG2`
  * ```
    32                 16       10       4      0
       opcode             reg1     reg2     unused
      /                  /        /        /
    +-------------------------------------------+
    | 0000000000001010 | ...... | ...... | XXXX |
    +-------------------------------------------+
    ```
* Shr
  * Opcode: 0x000B
  * **Params**: REG1, REG2
  * `REG1 = REG1 >> REG2`
  * Does not sign extend
  * ```
    32                 16       10       4      0
       opcode             reg1     reg2     unused
      /                  /        /        /
    +-------------------------------------------+
    | 0000000000001011 | ...... | ...... | XXXX |
    +-------------------------------------------+
    ```
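
The semantics above can be collected into a single function that computes the new REG1 value. This is a minimal sketch, not a reference implementation. It treats both operands as unsigned 64-bit values and truncates every result to 64 bits, which produces the wrap-around behavior. This document does not specify division or modulo by zero, the exact Mod semantics, or shift amounts of 64 or more. The choices below are therefore assumptions: raise on a zero divisor, use the unsigned remainder, and let shifts of 64 or more yield 0.

```python
MASK64 = (1 << 64) - 1


def alu(opcode: int, a: int, b: int = 0) -> int:
    """Return the new REG1 given REG1 (a) and REG2 (b), both unsigned 64-bit values."""
    if opcode == 0x0000:    # Add
        result = a + b
    elif opcode == 0x0001:  # Mul
        result = a * b
    elif opcode == 0x0002:  # Div (raising on zero is an assumption)
        if b == 0:
            raise ZeroDivisionError("Div by zero is unspecified")
        result = a // b
    elif opcode == 0x0003:  # Mod (assumed unsigned remainder; semantics TBD)
        if b == 0:
            raise ZeroDivisionError("Mod by zero is unspecified")
        result = a % b
    elif opcode == 0x0004:  # INeg: two's complement negation
        result = -a
    elif opcode == 0x0005:  # And
        result = a & b
    elif opcode == 0x0006:  # Or
        result = a | b
    elif opcode == 0x0007:  # Inv: bitwise NOT
        result = ~a
    elif opcode == 0x0008:  # Not: boolean NOT, like C's `!`
        result = 1 if a == 0 else 0
    elif opcode == 0x0009:  # Xor
        result = a ^ b
    elif opcode == 0x000A:  # Shl (shifts of 64 or more yield 0 here; assumption)
        result = a << b if b < 64 else 0
    elif opcode == 0x000B:  # Shr: logical shift, no sign extension
        result = a >> b
    else:
        raise ValueError(f"not an arithmetic opcode: {opcode:#06x}")
    return result & MASK64  # wrap around modulo 2^64
```

For example, `alu(0x0000, MASK64, 1)` wraps to `0`, and `alu(0x0004, 1)` is `0xFFFF_FFFF_FFFF_FFFF`.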
### TODO
* Add signed instructions (iadd, imul, etc.)
* Sign-extending SHR
* Overflow flag?
## Control flow
* CmpEq
  * Opcode: 0x1000
  * **Params**: REG1, REG2
  * ```
    if REG1 == REG2 {
        FLAGS[1] = 1;
    } else {
        FLAGS[1] = 0;
    }
    ```
  * Sets the COMPARE flag to 1 if REG1 == REG2
  * ```
    32                 16       10       4      0
       opcode             reg1     reg2     unused
      /                  /        /        /
    +-------------------------------------------+
    | 0001000000000000 | ...... | ...... | XXXX |
    +-------------------------------------------+
    ```
* CmpLt
  * Opcode: 0x1001
  * **Params**: REG1, REG2
  * ```
    if REG1 < REG2 {
        FLAGS[1] = 1;
    } else {
        FLAGS[1] = 0;
    }
    ```
  * Sets the COMPARE flag to 1 if REG1 < REG2
  * ```
    32                 16       10       4      0
       opcode             reg1     reg2     unused
      /                  /        /        /
    +-------------------------------------------+
    | 0001000000000001 | ...... | ...... | XXXX |
    +-------------------------------------------+
    ```
* Jmp
  * Opcode: 0x1100
  * **Params**: REG1
  * `IP = REG1;`
  * Jumps to the address in REG1 unconditionally.
  * ```
    32                 16       10           0
       opcode             reg1     unused
      /                  /        /
    +----------------------------------------+
    | 0001000100000000 | ...... | XXXXXXXXXX |
    +----------------------------------------+
    ```
* Jz
  * Opcode: 0x1101
  * **Params**: REG1
  * ```
    if FLAGS[1] == 0 {
        IP = REG1;
    }
    ```
  * Jumps to the address in REG1 if the COMPARE flag is 0.
  * ```
    32                 16       10           0
       opcode             reg1     unused
      /                  /        /
    +----------------------------------------+
    | 0001000100000001 | ...... | XXXXXXXXXX |
    +----------------------------------------+
    ```
* Jnz
  * Opcode: 0x1102
  * **Params**: REG1
  * ```
    if FLAGS[1] != 0 {
        IP = REG1;
    }
    ```
  * Jumps to the address in REG1 if the COMPARE flag is 1.
  * ```
    32                 16       10           0
       opcode             reg1     unused
      /                  /        /
    +----------------------------------------+
    | 0001000100000010 | ...... | XXXXXXXXXX |
    +----------------------------------------+
    ```
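
Here is a minimal sketch of the compare-and-branch model. The compare instructions only write the COMPARE flag (FLAGS bit 1), and the jumps only read it. For illustration, instructions are dispatched by mnemonic rather than opcode, and registers are a name-to-value dict. Treating CmpLt as an unsigned comparison is an assumption, since signed instructions are still a TODO. The sketch also does not model how IP advances when a jump is not taken. `CpuState` and `execute_control` are hypothetical names.

```python
from dataclasses import dataclass, field

COMPARE_BIT = 1  # FLAGS[1] is the COMPARE flag


@dataclass
class CpuState:
    ip: int = 0
    flags: int = 0
    regs: dict = field(default_factory=dict)  # register name -> unsigned 64-bit value

    def compare_flag(self) -> int:
        return (self.flags >> COMPARE_BIT) & 1

    def set_compare(self, condition: bool) -> None:
        if condition:
            self.flags |= 1 << COMPARE_BIT
        else:
            self.flags &= ~(1 << COMPARE_BIT)


def execute_control(cpu: CpuState, op: str, reg1: str, reg2: str = "") -> None:
    """Apply one control-flow instruction, identified by its mnemonic."""
    a = cpu.regs.get(reg1, 0)
    if op == "CmpEq":
        cpu.set_compare(a == cpu.regs.get(reg2, 0))
    elif op == "CmpLt":
        # Unsigned comparison is an assumption; signedness is not specified.
        cpu.set_compare(a < cpu.regs.get(reg2, 0))
    elif op == "Jmp":
        cpu.ip = a
    elif op == "Jz":
        if cpu.compare_flag() == 0:
            cpu.ip = a
    elif op == "Jnz":
        if cpu.compare_flag() != 0:
            cpu.ip = a
    else:
        raise ValueError(f"unknown control-flow op: {op}")
```

A typical conditional branch is a compare followed by Jz or Jnz, with the jump target held in a register.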
## Data movement
* Load
  * Opcode: 0x2000
  * **Params**: REG1, REG2
  * ```
    REG1 = MEM[REG2];
    ```
  * Sets REG1 to the value at the memory address in REG2.
  * ```
    32                 16       10       4      0
       opcode             reg1     reg2     unused
      /                  /        /        /
    +-------------------------------------------+
    | 0010000000000000 | ...... | ...... | XXXX |
    +-------------------------------------------+
    ```
* RegCopy
  * Opcode: 0x2001
  * **Params**: REG1, REG2
  * `REG1 = REG2`
  * Copies the value in REG2 into REG1.
  * ```
    32                 16       10       4      0
       opcode             reg1     reg2     unused
      /                  /        /        /
    +-------------------------------------------+
    | 0010000000000001 | REG1.. | REG2.. | XXXX |
    +-------------------------------------------+
    ```
* StoreImm64
  * Opcode: 0x2100
  * **Params**: REG1, IMM_64
  * `REG1 = IMM_64`
  * Sets REG1 to the specified 64-bit number.
* StoreImm32
  * Opcode: 0x2101
  * **Params**: REG1, IMM_32
  * `REG1 = IMM_32`
  * Sets REG1 to the specified 32-bit number.
  * ```
    64                 48       42           32                                 0
       opcode             reg1     unused       immediate 32-bit value
      /                  /        /            /
    +---------------------------------------------------------------------------+
    | 0010000100000001 | REG1.. | XXXXXXXXXX | IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII |
    +---------------------------------------------------------------------------+
    ```
* MemCopy
  * Opcode: 0x2200
  * **Params**: REG1, REG2
  * `MEM[REG1] = MEM[REG2]`
  * Copies the value at the memory address in REG2 to the memory address in REG1.
  * ```
    32                 16       10       4      0
       opcode             reg1     reg2     unused
      /                  /        /        /
    +-------------------------------------------+
    | 0010001000000000 | REG1.. | REG2.. | XXXX |
    +-------------------------------------------+
    ```
* Store
  * Opcode: 0x2201
  * **Params**: REG1, REG2
  * ```
    MEM[REG2] = REG1;
    ```
  * Sets the value at the memory address in REG2 to the value in REG1.
  * ```
    32                 16       10       4      0
       opcode             reg1     reg2     unused
      /                  /        /        /
    +-------------------------------------------+
    | 0010001000000001 | REG1.. | REG2.. | XXXX |
    +-------------------------------------------+
    ```
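
The data-movement instructions can be sketched against a byte-addressable memory using a hypothetical `Machine` class. The width of a memory access is not specified above, so the sketch assumes every memory operand is one 64-bit little-endian word. It also assumes StoreImm32 zero-extends its immediate. Registers are plain indices, and the NULL register is not modeled.

```python
import struct


class Machine:
    """Tiny model of 64 registers plus byte-addressable memory (hypothetical helper)."""

    def __init__(self, memory_size: int = 1024) -> None:
        self.memory = bytearray(memory_size)
        self.regs = [0] * 64

    def _read_word(self, address: int) -> int:
        # Assumption: memory operands are 64-bit little-endian words.
        return struct.unpack_from("<Q", self.memory, address)[0]

    def _write_word(self, address: int, value: int) -> None:
        struct.pack_into("<Q", self.memory, address, value)

    def load(self, reg1: int, reg2: int) -> None:       # REG1 = MEM[REG2]
        self.regs[reg1] = self._read_word(self.regs[reg2])

    def reg_copy(self, reg1: int, reg2: int) -> None:   # REG1 = REG2
        self.regs[reg1] = self.regs[reg2]

    def store_imm32(self, reg1: int, imm: int) -> None:  # REG1 = IMM_32
        # Assumption: the 32-bit immediate is zero-extended to 64 bits.
        self.regs[reg1] = imm & 0xFFFF_FFFF

    def mem_copy(self, reg1: int, reg2: int) -> None:   # MEM[REG1] = MEM[REG2]
        self._write_word(self.regs[reg1], self._read_word(self.regs[reg2]))

    def store(self, reg1: int, reg2: int) -> None:      # MEM[REG2] = REG1
        self._write_word(self.regs[reg2], self.regs[reg1])
```

Note the operand order: Store writes REG1 to the address in REG2, while MemCopy treats REG1 as the destination address.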
## Miscellaneous
* Halt
  * Opcode: 0xF000
  * **Params**: (none)
  * `FLAGS[0] = 1`
  * Halts the machine
  * ```
    16                 0
       opcode
      /
    +------------------+
    | 1111000000000000 |
    +------------------+
    ```
* Nop
  * Opcode: 0xF001
  * **Params**: (none)
  * Does nothing
  * ```
    16                 0
       opcode
      /
    +------------------+
    | 1111000000000001 |
    +------------------+
    ```
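
Halt is specified as setting the halt flag (FLAGS bit 0), so one way to model it is an execution loop that runs until that flag is set. This is a minimal sketch under several assumptions. The program is already decoded into a list of mnemonics, so IP indexes that list rather than bytes. A step limit acts as a safety net. Only Halt and Nop are handled. The function name `run` is hypothetical.

```python
HALT_BIT = 0  # FLAGS[0] is the halt flag


def run(program: list[str], max_steps: int = 1000) -> int:
    """Execute pre-decoded mnemonics until FLAGS[0] is set; return the number of steps."""
    ip = 0
    flags = 0
    steps = 0
    while not (flags >> HALT_BIT) & 1:
        if steps >= max_steps:
            raise RuntimeError("step limit reached without executing Halt")
        op = program[ip]
        if op == "Halt":
            flags |= 1 << HALT_BIT  # FLAGS[0] = 1
        elif op == "Nop":
            pass  # does nothing
        else:
            raise ValueError(f"unsupported instruction in this sketch: {op}")
        ip += 1  # assumption: IP counts decoded instructions, not bytes
        steps += 1
    return steps
```

For example, `run(["Nop", "Nop", "Halt", "Nop"])` stops after the Halt and returns `3`.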
## Other instructions TODO
* Call
  * Takes an address and the number of bytes on the stack that are for args(?)
  * Updates SP, FP, and IP, storing the previous values starting at the new FP
* Ret
  * Uses FP to determine the previous SP, FP, and IP and restores them
* Push
* Pop
* More immediate stores?
  * Idea: a Store42 (or whatever width fits) that uses all remaining bits of a 64-bit instruction
    for the immediate: 64 - 16 (opcode) - 6 (register) = 42 bits
# Binary object format
The binary object format is composed of a header followed by sections that make up the content of
the object.
## Header
The header is composed of:
* 64 bits - A magic number (0xDEAD_BEA7_BA5E_BA11).
* 32 bits - Version of the file
* 32 bits - The number of sections in the file

The header is followed immediately by the sections, described below.
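
Here is a minimal header-parsing sketch. The document does not state the byte order of the object format, so it assumes little-endian, matching the Primitives section. The helper names are hypothetical.

```python
import struct

MAGIC = 0xDEAD_BEA7_BA5E_BA11
# magic (64 bits), version (32 bits), section count (32 bits); little-endian assumed
HEADER = struct.Struct("<QII")


def parse_header(data: bytes) -> tuple[int, int]:
    """Validate the magic number and return (version, section_count)."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    magic, version, section_count = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError(f"bad magic number: {magic:#018x}")
    return version, section_count
```

For example, `parse_header(struct.pack("<QII", MAGIC, 1, 3))` returns `(1, 3)`. The header occupies 16 bytes, so the first section header starts at offset 16.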
## Sections
The rest of the object is a list of sections. A section's layout is a section header, followed by
the section contents.
### Section header
* 8 bits - Section kind
  * 0x00 - Data
  * 0x10 - Code
  * 0xFF - Meta
* 64 bits - Length of the section
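
The sketch below reads one section header. The document specifies neither of two details, so both are assumptions here. First, the fields are little-endian and packed with no padding (9 bytes total). Second, the 64-bit length counts only the section contents after this header. The helper names are hypothetical.

```python
import struct

SECTION_KINDS = {0x00: "Data", 0x10: "Code", 0xFF: "Meta"}
# kind (8 bits) + length (64 bits), packed with no padding; little-endian assumed
SECTION_HEADER = struct.Struct("<BQ")


def parse_section_header(data: bytes, offset: int) -> tuple[str, int, int]:
    """Return (kind name, section length, offset where the section contents begin)."""
    kind, length = SECTION_HEADER.unpack_from(data, offset)
    if kind not in SECTION_KINDS:
        raise ValueError(f"unknown section kind: {kind:#04x}")
    return SECTION_KINDS[kind], length, offset + SECTION_HEADER.size
```

Under these assumptions, the next section header would begin at the returned contents offset plus the section length.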
### Data section
The data section contains static data that is initialized to some known value.
* 64 bits - section load start - where in memory the content of this section begins
* 64 bits - section load end - where in memory the content of this section ends

The remaining length of the section is the data itself.
### Code section
The code section contains executable code.
* 64 bits - section load start - where in memory the content of this section begins
* 64 bits - section load end - where in memory the content of this section ends

The remaining length of the section is the code itself.
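
Both the Data and Code sections begin with a load start and a load end. The sketch below copies a section's payload into a `bytearray` memory. It assumes the bytes after those two fields are the content to place in memory, as the document states for the Code section. It also assumes little-endian fields and treats load end as exclusive, so `end - start` equals the payload length. The document does not say whether load end is inclusive. `load_section` is a hypothetical name.

```python
import struct

LOAD_RANGE = struct.Struct("<QQ")  # load start, load end; little-endian assumed


def load_section(contents: bytes, memory: bytearray) -> None:
    """Copy a Data or Code section's payload into memory at its load range."""
    start, end = LOAD_RANGE.unpack_from(contents, 0)
    payload = contents[LOAD_RANGE.size:]
    # Assumption: `end` is exclusive, so the range spans end - start bytes.
    if end - start != len(payload):
        raise ValueError("load range does not match payload length")
    memory[start:end] = payload
```

A mismatch between the load range and the payload length is treated as a malformed section rather than silently truncated.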
### Meta section
The meta section holds a table of metadata about the binary in a key-value format, mapping string
keys to 64-bit values. All strings are UTF-8 encoded.
* 64 bits - the number of key-value entries

The rest of the section holds the key-value pairs.
The layout for a key-value pair is the key, followed immediately by the value. The key is a string,
and the value is a 64-bit value. A key starts with the length of the string, followed by the key
string itself. A value is just the 8 bytes of the number.
The meta section is meant for data that the VM reads but the executing program does not use. Data in
the meta section is not copied into program memory.
A VM must provide support for the following meta-values:
* `entry` - a 64-bit address for where the VM should begin executing code.
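
The sketch below parses the Meta section's key-value table and looks up the required `entry` key. The width and unit of each key's length prefix are not given above. It therefore assumes a 64-bit little-endian byte count, and it reads values as 64-bit little-endian integers. The function names are hypothetical.

```python
import struct


def parse_meta(contents: bytes) -> dict[str, int]:
    """Parse a Meta section's contents into {key: 64-bit value}."""
    (count,) = struct.unpack_from("<Q", contents, 0)
    offset = 8
    table = {}
    for _ in range(count):
        # Assumption: each key is prefixed by its length in bytes as a 64-bit LE integer.
        (key_len,) = struct.unpack_from("<Q", contents, offset)
        offset += 8
        key = contents[offset:offset + key_len].decode("utf-8")
        offset += key_len
        (value,) = struct.unpack_from("<Q", contents, offset)  # 8-byte value
        offset += 8
        table[key] = value
    return table


def entry_point(meta: dict[str, int]) -> int:
    """Return the address where execution should begin."""
    if "entry" not in meta:
        raise ValueError("meta section is missing the required `entry` key")
    return meta["entry"]
```

Because every VM must support `entry`, a loader would typically call `entry_point` right after parsing and set IP to the result.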
# General TODO
* Interrupts
* MMIO regions
* Paging?