rasp/vm.md
Alek Ratzloff 7a6c2d80ab Add call/ret/push/pop instructions
* Call/ret/push/pop are implemented and appear to be working
* Call/ret/push/pop is specified in the 0x2000 block, replacing the
  mov instruction
* Mov instruction is now specified in the 0x3000 block

Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
2020-02-26 10:14:48 -05:00


# VM
This is an outline of the VM that drives this language.
# Primitives
* Numbers are little endian (LE) at the byte level.
* Addresses point to single bytes.
* Signed numbers use two's complement.
| Type | Size (bits) |
| - | - |
| Address | 64 |
| Word | 64 |
| Halfword | 32 |
| Byte | 8 |
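
As a rough illustration of the rules above, the following sketch uses Python's `struct` module to lay out a word and a halfword in little-endian order and to read a word back as a two's complement signed number. Python is used only for illustration, and the helper names are hypothetical; they are not part of the VM.

```python
import struct

WORD_MASK = 0xFFFF_FFFF_FFFF_FFFF
HALFWORD_MASK = 0xFFFF_FFFF


def encode_word(value: int) -> bytes:
    """Encode a 64-bit word as 8 little-endian bytes (value is wrapped to 64 bits)."""
    return struct.pack("<Q", value & WORD_MASK)


def encode_halfword(value: int) -> bytes:
    """Encode a 32-bit halfword as 4 little-endian bytes."""
    return struct.pack("<I", value & HALFWORD_MASK)


def decode_signed_word(raw: bytes) -> int:
    """Interpret 8 little-endian bytes as a two's complement signed word."""
    return struct.unpack("<q", raw)[0]
```

The least significant byte comes first, so `encode_halfword(0x12345678)` starts with the byte `0x78`, and a word of all `0xFF` bytes reads back as `-1` when treated as signed.
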
# Registers
CPU registers are addressed by a 6-bit index in the range 0 to 63. All registers are 64 bits wide.
* IP - Instruction pointer
* SP - Stack pointer
* FP - Frame pointer
* FLAGS - CPU flags
* STATUS - Generic status code
* NIL - Always reads as zero; writes to it are discarded.
* R0-R31
* (26 unused registers)
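
A minimal sketch of a register file with these properties might look like the following. Registers are keyed by name here because the numeric index of each register is not listed in this section; the class and method names are hypothetical.

```python
MASK64 = (1 << 64) - 1

# 6 special registers plus R0-R31; the remaining 26 of the 64 indices are unused.
REGISTER_NAMES = ["IP", "SP", "FP", "FLAGS", "STATUS", "NIL"] + [
    f"R{i}" for i in range(32)
]


class RegisterFile:
    def __init__(self):
        self._values = {name: 0 for name in REGISTER_NAMES}

    def read(self, name: str) -> int:
        # NIL always reads as zero.
        if name == "NIL":
            return 0
        return self._values[name]

    def write(self, name: str, value: int) -> None:
        # Writes to NIL are silently dropped.
        if name == "NIL":
            return
        # Every register is 64 bits wide, so wider values are truncated.
        self._values[name] = value & MASK64
```
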
## CPU Flags
CPU flags are addressed by bit index, counting from right to left (index 0 is the least significant bit).
* `00` - Halt flag
* `01` - Compare flag
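
Since each flag is a single bit of the 64-bit `FLAGS` register, setting, clearing, and testing a flag comes down to shifts and masks. The sketch below shows one way to do this; the constant and function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

HALT_BIT = 0      # bit index 00
COMPARE_BIT = 1   # bit index 01


def set_flag(flags: int, bit: int) -> int:
    """Return flags with the given bit set."""
    return (flags | (1 << bit)) & MASK64


def clear_flag(flags: int, bit: int) -> int:
    """Return flags with the given bit cleared."""
    return flags & ~(1 << bit) & MASK64


def flag_is_set(flags: int, bit: int) -> bool:
    """Check whether the given bit is set."""
    return bool((flags >> bit) & 1)
```
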
### Flag ideas
* "Trace" flag - for debugging; halts the CPU when it detects conditions that may be causing
  undesired behavior, such as:
    * Overwriting a register without its value being used
    * Mixing arithmetic with bit twiddling on the same target
# Instructions
All instructions have 16-bit opcodes. There are five types of instructions:
* Those whose operations require a source and a destination.
* Those whose operations require two sources.
    * The destination of these instructions is implied by the instruction itself; e.g. the `CMPEQ`
      instruction implicitly sets a bit in the `FLAGS` register.
* Those whose operations require a source, but no destination.
* Those whose operations require a destination, but no source (e.g. `POP`).
* Those whose operations require neither a source nor a destination.
Destinations may be one of:
* A 64-bit address pointing at a 64-, 32-, 16-, or 8-bit value
* A 6-bit register
Sources may be one of:
* A 64-bit address pointing at a 64-, 32-, 16-, or 8-bit value
* A 6-bit register
* An immediate value of 64, 32, 16, or 8 bits
Counting all source and destination value sizes as their own configuration, there are:
* 5 possible destination types
* 9 possible source types
Instructions have different layouts depending on whether their operation takes a source and/or
destination. For example, the `ADD` instruction takes a source and a destination, the `JMP`
instruction takes a source, and the `NOP` instruction takes neither a source nor a destination.
Instructions that take neither a source nor a destination are simply 16 bits long. In all other
instructions, the opcode is followed by a byte that determines the kind of source and/or
destination.
An instruction that has a source and destination looks like this:
```
| XXXXXXXX | XXXXXXXX | DDDDSSSS | ...source and destination |
```
An instruction that has either a source or a destination (but not both) looks like this:
```
| XXXXXXXX | XXXXXXXX | YYYY0000 | ...source or destination |
```
An instruction that has neither a source nor a destination looks like this:
```
| XXXXXXXX | XXXXXXXX |
```
## Source/destination flags
| Bits | Source/destination |
| - | - |
| 0b0000 | Address (64 bit value) |
| 0b0001 | Address (32 bit value) |
| 0b0010 | Address (16 bit value) |
| 0b0011 | Address (8 bit value) |
| 0b0100 | 6-bit register |
| 0b0101 | Immediate (64 bits, source only) |
| 0b0110 | Immediate (32 bits, source only) |
| 0b0111 | Immediate (16 bits, source only) |
| 0b1000 | Immediate (8 bits, source only) |
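
The byte that follows the opcode can be packed and unpacked with a couple of shifts. The sketch below follows the `DDDDSSSS` and `YYYY0000` layouts shown earlier and the flag values from the table; the constant and function names are hypothetical, and rejecting immediates as destinations is an assumption based on the "source only" notes.

```python
ADDR64, ADDR32, ADDR16, ADDR8 = 0b0000, 0b0001, 0b0010, 0b0011
REGISTER = 0b0100
IMM64, IMM32, IMM16, IMM8 = 0b0101, 0b0110, 0b0111, 0b1000

SOURCE_ONLY = {IMM64, IMM32, IMM16, IMM8}
VALID = {ADDR64, ADDR32, ADDR16, ADDR8, REGISTER} | SOURCE_ONLY


def pack_operand_byte(dest: int, source: int) -> int:
    """Build the DDDDSSSS byte for a two-operand instruction."""
    if dest not in VALID or source not in VALID:
        raise ValueError("unknown source/destination flag")
    if dest in SOURCE_ONLY:
        raise ValueError("immediates cannot be destinations")
    return (dest << 4) | source


def pack_single_operand_byte(kind: int) -> int:
    """Build the YYYY0000 byte for a one-operand instruction."""
    if kind not in VALID:
        raise ValueError("unknown source/destination flag")
    return kind << 4


def unpack_operand_byte(byte: int):
    """Split an operand byte into its (high nibble, low nibble) flags."""
    return (byte >> 4) & 0xF, byte & 0xF
```

For example, an instruction with a register destination and an 8-bit immediate source would carry the operand byte `0b0100_1000`.
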
## Arithmetic
Arithmetic instructions store their result in the destination, which is the first operand. Overflow
wraps around: bits that do not fit in the result are discarded.
* Add
    * Opcode: 0x0000
    * Params: Destination, source
* Sub
    * Opcode: 0x0001
    * Params: Destination, source
* Mul
    * Opcode: 0x0002
    * Params: Destination, source
* Div
    * Opcode: 0x0003
    * Params: Destination, source
* Mod
    * Opcode: 0x0004
    * Params: Destination, source
* And
    * Opcode: 0x0005
    * Params: Destination, source
* Or
    * Opcode: 0x0006
    * Params: Destination, source
* Xor
    * Opcode: 0x0007
    * Params: Destination, source
* Shl
    * Opcode: 0x0008
    * Params: Destination, source
* Shr
    * Opcode: 0x0009
    * Params: Destination, source
* INeg
    * Opcode: 0x000a
    * Params: Destination, source
* Inv
    * Opcode: 0x000b
    * Params: Destination, source
* Not
    * Opcode: 0x000c
    * Params: Destination, source
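
The wrapping behavior on overflow can be sketched by masking every result to 64 bits. This assumes that wrapping means arithmetic modulo 2^64 on full words; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def wrapping_add(a: int, b: int) -> int:
    """Add two words, discarding any carry out of bit 63."""
    return (a + b) & MASK64


def wrapping_sub(a: int, b: int) -> int:
    """Subtract b from a, wrapping below zero to the top of the range."""
    return (a - b) & MASK64


def wrapping_mul(a: int, b: int) -> int:
    """Multiply two words, keeping only the low 64 bits."""
    return (a * b) & MASK64
```

Under this assumption, adding 1 to the largest word yields 0, and subtracting 1 from 0 yields the largest word.
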
### TODO
* Add signed instructions (iadd, imul, etc)
* Sign-extending SHR
* Overflow flag?
## Control flow
* CmpEq
    * Opcode: 0x1000
    * Params: Source, source
* CmpLt
    * Opcode: 0x1001
    * Params: Source, source
* Jmp
    * Opcode: 0x1002
    * Params: Source
* Jz
    * Opcode: 0x1003
    * Params: Source
* Jnz
    * Opcode: 0x1004
    * Params: Source
## Functions
* Call
    * Opcode: 0x2000
    * Params: Source
    * When this instruction is executed, these actions occur:
        * Push the current frame pointer.
        * Push the IP of the next instruction.
        * Update the IP (i.e., jump) to the value at the given source.
* Ret
    * Opcode: 0x2001
    * When this instruction is executed, these actions occur:
        * Update the stack pointer to the current frame pointer + 16.
        * Pop the IP of the next instruction.
        * Pop the old frame pointer.
        * The last three values are restored in an undefined order.
* Push
    * Opcode: 0x2002
    * Params: Source
* Pop
    * Opcode: 0x2003
    * Params: Dest
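
The `Call` and `Ret` steps above can be sketched as a small simulation. Several details here are assumptions rather than part of this specification: the stack is taken to grow upward (a push writes at `SP` and then adds 8), and `Call` is taken to set `FP` to the value `SP` had before the two pushes, which is what makes "frame pointer + 16" in `Ret` point just past the saved values. The `Machine` class and its methods are hypothetical.

```python
WORD = 8  # bytes in a 64-bit word
MASK64 = (1 << 64) - 1


class Machine:
    def __init__(self, mem_size: int = 256):
        self.mem = bytearray(mem_size)
        self.ip = 0
        self.sp = 0
        self.fp = 0

    def push(self, value: int) -> None:
        # Assumption: the stack grows toward higher addresses.
        self.mem[self.sp:self.sp + WORD] = (value & MASK64).to_bytes(WORD, "little")
        self.sp += WORD

    def pop(self) -> int:
        self.sp -= WORD
        return int.from_bytes(self.mem[self.sp:self.sp + WORD], "little")

    def call(self, target: int, next_ip: int) -> None:
        frame_base = self.sp      # assumption: the new frame starts here
        self.push(self.fp)        # push the current frame pointer
        self.push(next_ip)        # push the IP of the next instruction
        self.fp = frame_base
        self.ip = target          # jump

    def ret(self) -> None:
        self.sp = self.fp + 2 * WORD  # frame pointer + 16
        self.ip = self.pop()          # IP of the next instruction
        self.fp = self.pop()          # old frame pointer
```

With these assumptions, anything the callee pushed is discarded by `ret`, and `IP`, `FP`, and `SP` all return to their values from just before the call.
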
## Data movement
* Mov
    * Opcode: 0x3000
## Miscellaneous
* Halt
    * Opcode: 0xF000
* Nop
    * Opcode: 0xF001
* Dump
    * Opcode: 0xF002
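
Collecting the opcodes listed so far into one table shows that the top four bits of each opcode identify its group. The sketch below records that observation; the upper-case mnemonics, the `GROUPS` mapping, and the `opcode_group` helper are hypothetical conveniences, not part of the specification.

```python
OPCODES = {
    # Arithmetic
    "ADD": 0x0000, "SUB": 0x0001, "MUL": 0x0002, "DIV": 0x0003, "MOD": 0x0004,
    "AND": 0x0005, "OR": 0x0006, "XOR": 0x0007, "SHL": 0x0008, "SHR": 0x0009,
    "INEG": 0x000A, "INV": 0x000B, "NOT": 0x000C,
    # Control flow
    "CMPEQ": 0x1000, "CMPLT": 0x1001, "JMP": 0x1002, "JZ": 0x1003, "JNZ": 0x1004,
    # Functions
    "CALL": 0x2000, "RET": 0x2001, "PUSH": 0x2002, "POP": 0x2003,
    # Data movement
    "MOV": 0x3000,
    # Miscellaneous
    "HALT": 0xF000, "NOP": 0xF001, "DUMP": 0xF002,
}

GROUPS = {
    0x0: "arithmetic",
    0x1: "control flow",
    0x2: "functions",
    0x3: "data movement",
    0xF: "miscellaneous",
}


def opcode_group(opcode: int) -> str:
    """Look up an opcode's group from its top four bits."""
    return GROUPS[opcode >> 12]
```
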
## Other instructions TODO
* Call
    * Takes address and number of bytes on the stack that are for args(?)
    * Updates SP, FP, IP, storing previous values starting at the new FP
* Ret
    * Uses FP to determine previous SP, FP, and IP and restores them
* Push
* Pop
# Binary object format
The binary object format is composed of a header followed by sections that make up the content of
the object.
## Header
The header is composed of:
* 64 bits - A magic number (0xDEAD\_BEA7\_BA5E\_BA11).
* 32 bits - Version of the file
* 32 bits - The number of sections in the file
* The sections themselves, as described below
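
Reading the fixed part of the header could look roughly like this. The sketch assumes the file stores its fields in little-endian order, matching the VM's number format; the `parse_header` function is hypothetical.

```python
import struct

MAGIC = 0xDEAD_BEA7_BA5E_BA11

# magic (64 bits), version (32 bits), section count (32 bits);
# "<" selects little-endian with no padding, which is an assumption here.
HEADER = struct.Struct("<QII")


def parse_header(data: bytes):
    """Return (version, section_count, offset of the first section)."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    magic, version, section_count = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("bad magic number")
    return version, section_count, HEADER.size
```

The fixed header is 16 bytes long, so the first section would begin at offset 16.
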
## Sections
The rest of the object is a list of sections. A section's layout is a section header, followed by
the section contents.
### Section header
* 8 bits - Section kind
    * 0x00 - Data
    * 0xFF - Meta
* 64 bits - Length of the section
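
A section header is 9 bytes: one byte for the kind and eight for the length. The sketch below decodes it, assuming little-endian order and no padding between the two fields; the `parse_section_header` function is hypothetical, and the unit of the length field is not interpreted.

```python
import struct

SECTION_HEADER = struct.Struct("<BQ")  # kind (8 bits), length (64 bits)

SECTION_KINDS = {0x00: "data", 0xFF: "meta"}


def parse_section_header(data: bytes, offset: int = 0):
    """Return (kind name, length, offset just past the section header)."""
    kind, length = SECTION_HEADER.unpack_from(data, offset)
    if kind not in SECTION_KINDS:
        raise ValueError(f"unknown section kind: {kind:#04x}")
    return SECTION_KINDS[kind], length, offset + SECTION_HEADER.size
```
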
### Data section
The data section contains static data that is initialized to some known value.
* 16 bits - length of section name
* N bits - section name
* 64 bits - section load start - where in memory the content of this section begins
* 64 bits - section length - how long the memory content is
### Meta section
The meta section holds a table of metadata about the binary in a key-value format of strings mapping
to 64-bit values. All strings are UTF-8 encoded.
* 64 bits - the number of key-value entries
The remainder of the section consists of the key-value pairs.
The layout for a key-value pair is the key, followed immediately by the value. The key is a string,
and the value is a 64-bit number. A key starts with the length of the string, followed by the key
string itself. A value is just the 8 bytes of the number.
The meta section should be used to place data that's readable by the VM, but is not used by the
executing program. Data in the meta section is not copied to the program memory.
A VM must provide support for the following meta-values:
* `entry` - a 64-bit address for where the VM should begin executing code.
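
Decoding the body of a meta section might look like the sketch below. The width of a key's length prefix is not stated here, so the code assumes a 16-bit little-endian byte count, mirroring the section name length in the data section; that choice and the `parse_meta` function are assumptions made for illustration only.

```python
import struct


def parse_meta(body: bytes) -> dict:
    """Decode a meta section body into a {key: 64-bit value} dictionary."""
    (count,) = struct.unpack_from("<Q", body, 0)
    offset = 8
    entries = {}
    for _ in range(count):
        # Assumption: key length is a 16-bit byte count.
        (key_len,) = struct.unpack_from("<H", body, offset)
        offset += 2
        key = body[offset:offset + key_len].decode("utf-8")
        offset += key_len
        # The value is the 8 bytes immediately after the key.
        (value,) = struct.unpack_from("<Q", body, offset)
        offset += 8
        entries[key] = value
    return entries
```

A loader could then look up `entry` in the result to find the address at which execution should begin.
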
# General TODO
* Interrupts
* MMIO regions
* Paging?
* Decide how address sizes are determined
    * source size <= dest size - zero extend source and copy
        * `mov %r0, (label)u32`
    * source size > dest size - truncate to dest size
        * `mov (label)u32, %r0`
    * source size with unknown dest size - use dest size == source size
        * `mov %r0, (label)`
    * unknown source size with dest size - use source size == dest size
        * `mov (label), %r0`
    * unknown source size with unknown dest size - 64 bits
        * `mov (label), (%r0)`
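
The size rules proposed in the last item can be sketched as two small functions, with `None` standing for an unknown size. This only illustrates the rules as listed; nothing here is settled, and the function names are hypothetical.

```python
def resolve_sizes(source_bits, dest_bits):
    """Fill in unknown (None) sizes; returns (source_bits, dest_bits)."""
    if source_bits is None and dest_bits is None:
        return 64, 64                    # both unknown: 64 bits
    if source_bits is None:
        return dest_bits, dest_bits      # unknown source: use dest size
    if dest_bits is None:
        return source_bits, source_bits  # unknown dest: use source size
    return source_bits, dest_bits


def convert(value: int, source_bits=None, dest_bits=None) -> int:
    """Apply the zero-extend / truncate rule when copying a value."""
    source_bits, dest_bits = resolve_sizes(source_bits, dest_bits)
    value &= (1 << source_bits) - 1      # the value as stored at the source
    # Zero extension leaves the value unchanged; truncation drops high bits.
    return value & ((1 << dest_bits) - 1)
```
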