# VM

This is an outline of the VM that drives this language.

# Primitives

* Numbers are little endian (LE) at the byte level.
* Addresses point to single bytes.
* Signed numbers use two's complement.

| Type     | Size (bits) |
| -------- | ----------- |
| Address  | 64          |
| Word     | 64          |
| Halfword | 32          |
| Byte     | 8           |

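The following Python sketch shows the byte order and signed representation in practice, using the standard `struct` module. The helper names `encode_word` and `decode_signed_word` are hypothetical and are not part of the VM.

```python
import struct


def encode_word(value: int) -> bytes:
    """Encode a 64-bit word as 8 little-endian bytes (two's complement if negative)."""
    if value < 0:
        return struct.pack("<q", value)  # signed 64-bit
    return struct.pack("<Q", value)      # unsigned 64-bit


def decode_signed_word(data: bytes) -> int:
    """Interpret 8 little-endian bytes as a signed (two's complement) 64-bit word."""
    return struct.unpack("<q", data)[0]


# The least significant byte is stored first.
print(encode_word(0x0102030405060708).hex())                # 0807060504030201
# -1 in two's complement is all ones.
print(encode_word(-1).hex())                                # ffffffffffffffff
print(decode_signed_word(bytes.fromhex("feffffffffffffff")))  # -2
```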
# Registers

CPU registers are addressed by a 6-bit value between 0 and 63. All registers are 64 bits wide.

* IP - Instruction pointer
* SP - Stack pointer
* FP - Frame pointer
* FLAGS - CPU flags
* STATUS - Generic status code
* NIL - Always reads as zero; writes to it are discarded.
* IVT - Interrupt vector table pointer
* R0-R31
* (25 reserved registers)

The seven named registers, R0-R31, and the 25 reserved registers together occupy all 64 register addresses.

The following registers are caller-save (i.e., their values may change across a function call):

* FLAGS
* STATUS
* IVT

All other registers are callee-save.

## CPU Flags

CPU flags are addressed by bit index, counting from the least significant (rightmost) bit.

* `00` - Halt flag
* `01` - Compare flag

### Flag ideas

* "Trace" flag - halts the CPU when certain conditions are met that may be causing undesired behavior - for debugging
  * Overwriting a register without its value being used
  * Mixing arithmetic with bit twiddling on the same target

# Instructions

All instructions have 16-bit opcodes. Instructions fall into five categories, based on the operands their operation requires:

* Those whose operations require a source and a destination.
* Those whose operations require two sources.
  * The destination of these instructions is implied by the instruction itself; e.g. the `CMPEQ` instruction implicitly sets a bit in the `FLAGS` register.
* Those whose operations require a source, but no destination.
* Those whose operations require a destination, but no source.
  * e.g. the `POP` instruction.
* Those whose operations require neither a source nor a destination.

Destinations may be:

* A 64-bit address pointing at a 64-, 32-, 16-, or 8-bit value
* A 6-bit register

Sources may be one of:

* A 64-bit address pointing at a 64-, 32-, 16-, or 8-bit value
* A 6-bit register
* A 64-, 32-, 16-, or 8-bit immediate value

Counting each value size as its own configuration, there are:

* 5 possible destination types
* 9 possible source types

Instructions have different layouts depending on whether their operation takes a source and/or a destination. For example, the `ADD` instruction takes a source and a destination, the `JMP` instruction takes a source, and the `NOP` instruction takes neither a source nor a destination.

Instructions that take neither a source nor a destination are simply 16 bits long. In all other instructions, the opcode is followed by a byte that encodes the kinds of the source and/or destination operands.

An instruction that has a source and destination looks like this:

```
| XXXXXXXX | XXXXXXXX | DDDDSSSS | ...source and destination |
```

Here, the `X` bits are the 16-bit opcode, and `DDDD` and `SSSS` are the destination and source flags listed under "Source/destination flags".

An instruction that has either a source or a destination (but not both) looks like this:

```
| XXXXXXXX | XXXXXXXX | YYYY0000 | ...source or destination |
```

`YYYY` holds the flags of the single operand, and the low four bits are zero.

An instruction that has neither a source nor a destination looks like this:

```
| XXXXXXXX | XXXXXXXX |
```

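The Python sketch below builds the fixed part of an instruction: the opcode plus the optional operand-flags byte. It assumes the 16-bit opcode is stored little-endian like other numbers, which the layouts do not state explicitly, and it leaves out the operand payloads that follow. `encode_header`, `REGISTER`, and `IMMEDIATE_64` are hypothetical names.

```python
import struct
from typing import Optional

REGISTER = 0b0100      # 6-bit register operand
IMMEDIATE_64 = 0b0101  # 64-bit immediate (source only)


def encode_header(opcode: int, first: Optional[int] = None,
                  second: Optional[int] = None) -> bytes:
    """Encode the 16-bit opcode and, if the instruction has operands, the flags byte.

    `first` is the 4-bit flags code placed in the high nibble (the destination,
    or the only operand); `second` is placed in the low nibble (the source).
    """
    # ASSUMPTION: the opcode is stored little-endian, like other numbers.
    opcode_bytes = struct.pack("<H", opcode)
    if first is None:
        if second is not None:
            raise ValueError("a second operand requires a first operand")
        return opcode_bytes  # | XXXXXXXX | XXXXXXXX |
    low = 0 if second is None else second & 0xF  # YYYY0000 for one operand
    return opcode_bytes + bytes([((first & 0xF) << 4) | low])  # DDDDSSSS


print(encode_header(0xF001).hex())                          # 01f0
print(encode_header(0x1002, IMMEDIATE_64).hex())            # 021050
print(encode_header(0x0000, REGISTER, IMMEDIATE_64).hex())  # 000045
```

In the last line, `ADD` gets a register destination and a 64-bit immediate source, so its flags byte is `0x45`; the operand bytes would come next. For two-source instructions such as `CmpEq`, the outline does not say which source goes in which nibble, so the sketch just puts its `first` argument in the high nibble.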
## Source/destination flags

Every address operand is itself 64 bits wide; the size in parentheses is the width of the value it points at.

| Bits     | Source/destination               |
| -------- | -------------------------------- |
| `0b0000` | Address (of a 64-bit value)      |
| `0b0001` | Address (of a 32-bit value)      |
| `0b0010` | Address (of a 16-bit value)      |
| `0b0011` | Address (of an 8-bit value)      |
| `0b0100` | 6-bit register                   |
| `0b0101` | Immediate (64 bits, source only) |
| `0b0110` | Immediate (32 bits, source only) |
| `0b0111` | Immediate (16 bits, source only) |
| `0b1000` | Immediate (8 bits, source only)  |

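Below is a hypothetical decoder table for these flags, written in Python. The outline does not specify how many bytes each operand takes in the instruction stream. This sketch assumes an address operand takes 8 bytes, a register operand takes 1 byte holding its 6-bit index, and an immediate takes its own width. Codes `0b1001` through `0b1111` are treated as unassigned.

```python
from typing import Tuple

# Hypothetical table: flags code -> (operand kind, payload size in bytes).
# ASSUMPTION: addresses take 8 bytes, a register takes 1 byte (its 6-bit index),
# and an immediate takes its own width.
OPERAND_KINDS = {
    0b0000: ("address", 8),    # points at a 64-bit value
    0b0001: ("address", 8),    # points at a 32-bit value
    0b0010: ("address", 8),    # points at a 16-bit value
    0b0011: ("address", 8),    # points at an 8-bit value
    0b0100: ("register", 1),
    0b0101: ("immediate", 8),
    0b0110: ("immediate", 4),
    0b0111: ("immediate", 2),
    0b1000: ("immediate", 1),
}


def split_flags(flags_byte: int) -> Tuple[int, int]:
    """Split a DDDDSSSS byte into its (high, low) nibbles."""
    return (flags_byte >> 4) & 0xF, flags_byte & 0xF


def operand_payload_size(flags: int, is_destination: bool) -> int:
    """Return how many bytes an operand with these flags occupies."""
    if flags not in OPERAND_KINDS:
        raise ValueError(f"unassigned operand flags: {flags:#06b}")
    kind, size = OPERAND_KINDS[flags]
    if is_destination and kind == "immediate":
        raise ValueError("immediates may only be used as sources")
    return size


dest, src = split_flags(0x45)  # register destination, 64-bit immediate source
print(dest, src)                                                            # 4 5
print(operand_payload_size(dest, True) + operand_payload_size(src, False))  # 9
```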
## Arithmetic

Arithmetic instructions store their result in the destination, which is the first operand. Overflow wraps around: results are reduced modulo 2^64 for 64-bit operands, so adding 1 to the maximum value yields 0.

* Add
  * Opcode: 0x0000
  * Params: Destination, source
* Sub
  * Opcode: 0x0001
  * Params: Destination, source
* Mul
  * Opcode: 0x0002
  * Params: Destination, source
* Div
  * Opcode: 0x0003
  * Params: Destination, source
* Mod
  * Opcode: 0x0004
  * Params: Destination, source
* And
  * Opcode: 0x0005
  * Params: Destination, source
* Or
  * Opcode: 0x0006
  * Params: Destination, source
* Xor
  * Opcode: 0x0007
  * Params: Destination, source
* Shl
  * Opcode: 0x0008
  * Params: Destination, source
* Shr
  * Opcode: 0x0009
  * Params: Destination, source
* INeg
  * Opcode: 0x000a
  * Params: Destination, source
* Inv
  * Opcode: 0x000b
  * Params: Destination, source
* Not
  * Opcode: 0x000c
  * Params: Destination, source

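The wrapping behavior can be modeled in Python by masking every result to 64 bits. This sketch covers only the operations whose semantics are clear from their names. It treats operands as unsigned (signed variants are still listed under TODO) and raises a hypothetical `VMException` carrying exception vector 0 on division or modulo by zero, matching the "Divide by zero" exception.

```python
MASK64 = (1 << 64) - 1
DIVIDE_BY_ZERO_VECTOR = 0  # from the exceptions list


class VMException(Exception):
    """Hypothetical signal that the CPU should raise the given exception vector."""

    def __init__(self, vector: int):
        super().__init__(f"exception vector {vector}")
        self.vector = vector


def alu(op: str, dest: int, src: int) -> int:
    """Compute `dest <op> src` on unsigned 64-bit values, wrapping on overflow."""
    if op in ("div", "mod") and src == 0:
        raise VMException(DIVIDE_BY_ZERO_VECTOR)
    ops = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
        "div": lambda a, b: a // b,  # unsigned division
        "mod": lambda a, b: a % b,
        "and": lambda a, b: a & b,
        "or": lambda a, b: a | b,
        "xor": lambda a, b: a ^ b,
    }
    return ops[op](dest, src) & MASK64  # wrap into 64 bits


print(hex(alu("add", MASK64, 1)))  # 0x0
print(hex(alu("sub", 0, 1)))       # 0xffffffffffffffff
print(alu("mul", 1 << 63, 2))      # 0
```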
### TODO

* Add signed instructions (iadd, imul, etc.)
* Sign-extending SHR
* Overflow flag?

## Control flow

* CmpEq
  * Opcode: 0x1000
  * Params: Source, source
* CmpLt
  * Opcode: 0x1001
  * Params: Source, source
* Jmp
  * Opcode: 0x1002
  * Params: Source
* Jz
  * Opcode: 0x1003
  * Params: Source
* Jnz
  * Opcode: 0x1004
  * Params: Source

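The outline does not yet say exactly which flag `Jz` and `Jnz` test, or whether `CmpLt` is signed. The Python sketch below therefore makes three assumptions. Both compares write their result into the compare flag (bit `01` of FLAGS). `CmpLt` compares unsigned values. `Jz` branches when the compare flag is clear, and `Jnz` branches when it is set. All function names are hypothetical.

```python
COMPARE_BIT = 1 << 1  # compare flag is bit 01 of FLAGS


def cmp_eq(flags: int, a: int, b: int) -> int:
    """Set the compare flag if a == b, clear it otherwise."""
    return flags | COMPARE_BIT if a == b else flags & ~COMPARE_BIT


def cmp_lt(flags: int, a: int, b: int) -> int:
    """Set the compare flag if a < b (ASSUMPTION: unsigned comparison)."""
    return flags | COMPARE_BIT if a < b else flags & ~COMPARE_BIT


def next_ip(opcode: int, flags: int, target: int, fallthrough: int) -> int:
    """Pick the next IP for Jmp/Jz/Jnz (ASSUMPTION: Jz/Jnz test the compare flag)."""
    taken = {
        0x1002: True,                       # Jmp: always
        0x1003: not (flags & COMPARE_BIT),  # Jz: compare flag clear
        0x1004: bool(flags & COMPARE_BIT),  # Jnz: compare flag set
    }[opcode]
    return target if taken else fallthrough


flags = cmp_eq(0, 7, 7)
print(next_ip(0x1004, flags, target=0x100, fallthrough=0x20))  # 256
print(next_ip(0x1003, flags, target=0x100, fallthrough=0x20))  # 32
```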
## Functions

The stack grows toward higher addresses: `Push` writes at the stack pointer and then increments it.

* Call
  * Opcode: 0x2000
  * Params: Source
  * When this instruction is executed, these actions occur:
    * Push the current frame pointer.
    * Push the IP of the next instruction.
    * Update the IP (i.e., jump) to the value at the given source.
    * Update the frame pointer to the current stack pointer - 16, so that it points at the saved frame pointer.
* Ret
  * Opcode: 0x2001
  * When this instruction is executed, these actions occur:
    * Update the stack pointer to the current frame pointer + 16.
    * Pop the IP of the next instruction.
    * Pop the old frame pointer.
    * Restore the last three values in an undefined order.
* Push
  * Opcode: 0x2002
  * Params: Source
  * When this instruction is executed, these actions occur:
    * Set the value in memory at the current stack pointer to the source value.
    * Increment the stack pointer by the size of the source value.
* Pop
  * Opcode: 0x2003
  * Params: Dest
  * When this instruction is executed, these actions occur:
    * Decrement the stack pointer by the size of the destination value.
    * Copy the value at the stack pointer into the destination.
* Int
  * Opcode: 0x2004
  * Params: Source, Source
  * When this instruction is executed, these actions occur:
    * Push the current frame pointer
    * Push the IP of the next instruction to be executed
    * Push the FLAGS register
    * Push the STATUS register
    * Push the R0 register
    * Push the R1 register
    * Update the IP (i.e., jump) to the address of the given interrupt vector in the IVT
    * Update the R0 register to the value in the first parameter
    * Update the R1 register to the value in the second parameter
    * Update the frame pointer to the current stack pointer - 48
* IRet
  * Opcode: 0x2005
  * When this instruction is executed, these actions occur:
    * Update the stack pointer to the current frame pointer + 48
    * Pop the old R1 value
    * Pop the old R0 value
    * Pop the old STATUS value
    * Pop the old FLAGS value
    * Pop the IP of the next instruction
    * Pop the old frame pointer
    * Restore the last 6 values in an undefined order

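Here is a minimal Python model of `Call`, `Ret`, `Push`, and `Pop`, assuming every pushed value is 64 bits wide. The `Machine` class and its parameters are hypothetical. In a real VM, `next_ip` would be derived from the length of the `Call` instruction.

```python
import struct


class Machine:
    """Hypothetical minimal VM state: byte-addressed memory plus IP, SP, FP."""

    def __init__(self, memory_size: int = 1024, stack_base: int = 0x100):
        self.memory = bytearray(memory_size)
        self.ip = 0
        self.sp = stack_base  # the stack grows toward higher addresses
        self.fp = stack_base

    def push(self, value: int) -> None:
        # Write the 64-bit value at SP, then increment SP by its size.
        struct.pack_into("<Q", self.memory, self.sp, value)
        self.sp += 8

    def pop(self) -> int:
        # Decrement SP by the value's size, then read the value at SP.
        self.sp -= 8
        return struct.unpack_from("<Q", self.memory, self.sp)[0]

    def call(self, target: int, next_ip: int) -> None:
        self.push(self.fp)      # save the caller's frame pointer
        self.push(next_ip)      # save the return address
        self.ip = target        # jump
        self.fp = self.sp - 16  # FP now points at the saved frame pointer

    def ret(self) -> None:
        self.sp = self.fp + 16  # discard anything the callee left on the stack
        self.ip = self.pop()    # restore the return address
        self.fp = self.pop()    # restore the caller's frame pointer


m = Machine()
m.call(target=0x40, next_ip=0x10)
m.push(1234)  # the callee pushes a local
m.ret()
print(hex(m.ip), hex(m.sp), hex(m.fp))  # 0x10 0x100 0x100
```

Because `Ret` resets SP from FP, any locals the callee pushed are discarded without explicit pops.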
## Data movement

* Mov
  * Opcode: 0x3000
  * Params: Source, Dest

## Miscellaneous

* Halt
  * Opcode: 0xF000
* Nop
  * Opcode: 0xF001
* Dump
  * Opcode: 0xF002

# Interrupts

Interrupts are signaled either explicitly by software or by hardware signaling the CPU. When an interrupt signal is set, the CPU finishes the instruction it is currently executing and then begins handling the interrupt whose signal was set. Software interrupts may be invoked using the `int` instruction, supplying the index of the interrupt to invoke. Hardware interrupts are invoked directly by a hardware event, e.g. a keypress. Hardware and software interrupts are treated equally by the CPU, and as such, they are all maskable.

## Interrupt vector table

Interrupts are defined by the table that the IVT register points to. The address stored in the IVT register must be a multiple of 64. The IVT always has 512 entries of 8 bytes each, so the entire table is 512 * 8 = 4096 bytes, or one page.

## Interrupt table entries

Interrupt table entries make up the interrupt vector table; each entry is 64 bits (8 bytes) long.

* 1 bit - Enabled
* 4 bits - Reserved, set to 0
* 59 bits - Interrupt address, multiplied by 64 for the start address

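Below is a sketch of an IVT lookup in Python. The outline does not specify the order of the fields within an entry. This sketch assumes they are packed starting from the least significant bit (enabled flag in bit 0, reserved bits 1-4, address field in bits 5-63) and that entries are stored little-endian. `lookup_handler` is a hypothetical helper.

```python
import struct

IVT_ENTRIES = 512
ENTRY_SIZE = 8


def lookup_handler(memory: bytes, ivt: int, index: int):
    """Return the handler start address for `index`, or None if the entry is disabled.

    ASSUMPTION: fields are packed from the least significant bit upward:
    bit 0 = enabled, bits 1-4 = reserved, bits 5-63 = address / 64.
    """
    if ivt % 64 != 0:
        raise ValueError("IVT address must be a multiple of 64")
    if not 0 <= index < IVT_ENTRIES:
        raise IndexError("interrupt index out of range")
    (entry,) = struct.unpack_from("<Q", memory, ivt + index * ENTRY_SIZE)
    if entry & 1 == 0:
        return None              # entry disabled
    return (entry >> 5) * 64     # 59-bit field scaled to a byte address


# Build a table at 0x1000 whose entry 3 points at a handler at 0x2000.
memory = bytearray(0x3000)
ivt = 0x1000
entry = ((0x2000 // 64) << 5) | 1  # address field | enabled bit
struct.pack_into("<Q", memory, ivt + 3 * ENTRY_SIZE, entry)
print(hex(lookup_handler(memory, ivt, 3)))  # 0x2000
print(lookup_handler(memory, ivt, 4))       # None
```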
## Interrupt handling

After an interrupt is signaled, the CPU looks up the index of the interrupt in the IVT, calculates the handler's address, sets up the stack for the interrupt handler, and jumps to the interrupt handler's address.

The interrupt stack is structured similarly to a normal call stack, but since interrupts may be invoked at any time, it saves additional state. Interrupt handlers have two explicit arguments: the interrupt index itself, and an auxiliary 64-bit value or pointer specific to that interrupt. The index is stored in the R0 register, and the auxiliary value is stored in the R1 register. The previous contents of these registers, along with the FP, IP, FLAGS, and STATUS registers, are saved on the stack before an interrupt handler is called.

Before an interrupt handler is called, these actions occur:

* Push the current frame pointer
* Push the IP of the next instruction to be executed
* Push the FLAGS register
* Push the STATUS register
* Push the R0 register
* Push the R1 register

Interrupt handlers must be exited using the `iret` instruction.

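The following hypothetical Python model shows the 48-byte frame that is built before a handler runs and that `IRet` unwinds. It assumes every saved value is 64 bits and takes the handler address as already looked up in the IVT. The class and method names are illustrative only.

```python
import struct


class InterruptMachine:
    """Hypothetical VM state for illustrating the interrupt stack frame."""

    def __init__(self, stack_base: int = 0x100):
        self.memory = bytearray(0x400)
        self.regs = {"IP": 0, "SP": stack_base, "FP": stack_base,
                     "FLAGS": 0, "STATUS": 0, "R0": 0, "R1": 0}

    def push(self, value: int) -> None:
        struct.pack_into("<Q", self.memory, self.regs["SP"], value)
        self.regs["SP"] += 8

    def pop(self) -> int:
        self.regs["SP"] -= 8
        return struct.unpack_from("<Q", self.memory, self.regs["SP"])[0]

    def interrupt(self, handler: int, index: int, aux: int, next_ip: int) -> None:
        # Save six 64-bit values (48 bytes) in the order the spec lists them.
        for value in (self.regs["FP"], next_ip, self.regs["FLAGS"],
                      self.regs["STATUS"], self.regs["R0"], self.regs["R1"]):
            self.push(value)
        self.regs["IP"] = handler  # address looked up in the IVT
        self.regs["R0"] = index    # first handler argument
        self.regs["R1"] = aux      # second handler argument
        self.regs["FP"] = self.regs["SP"] - 48

    def iret(self) -> None:
        self.regs["SP"] = self.regs["FP"] + 48
        # Pop in reverse order of the pushes.
        for name in ("R1", "R0", "STATUS", "FLAGS", "IP", "FP"):
            self.regs[name] = self.pop()


m = InterruptMachine()
m.regs.update(R0=11, R1=22, FLAGS=0b10)
m.interrupt(handler=0x200, index=3, aux=0xABC, next_ip=0x18)
print(m.regs["R0"], hex(m.regs["R1"]))  # 3 0xabc
m.iret()
print(m.regs["R0"], m.regs["R1"], hex(m.regs["IP"]), hex(m.regs["SP"]))  # 11 22 0x18 0x100
```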
## Exceptions

The first 256 interrupt vectors are reserved for CPU and hardware-sourced events; these are known as exceptions. Exceptions may occur for a number of reasons:

* Illegal operation attempted, e.g. divide by zero or accessing protected memory
* Illegal operation attempted while handling an interrupt (double fault)
* A hardware event occurred, e.g. a timer tick

The following list defines all exceptions that the CPU may invoke. All other vectors in 0-255 that are not defined in this list are reserved and may be used in the future.

* Divide by zero
  * Interrupt vector: 0
  * Auxiliary: N/A
  * Invoked upon a divide-by-zero
* Invalid opcode
  * Interrupt vector: 1
  * Auxiliary: N/A
  * Attempted to invoke an illegal opcode
* Illegal memory address
  * Interrupt vector: 2
  * Auxiliary: Memory address causing the interrupt
  * Attempted to access a memory address in an illegal way; either it's out of bounds or it is protected in some way.
* Hardware event
  * Interrupt vector: 3
  * Auxiliary: Pointer to the hardware event structure
  * A hardware device has an event that needs attention.
* Interrupt vectors 4-255: Reserved for future use

# Binary object format

The binary object format is composed of a header followed by sections that make up the content of the object.

## Header

The header is composed of:

* 64 bits - A magic number (`0xDEAD_BEA7_BA5E_BA11`)
* 32 bits - Version of the file
* 32 bits - The number of sections in the file
* Section descriptions, detailed below

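Here is a sketch of header parsing in Python with the standard `struct` module. It assumes all header fields are stored little-endian and unsigned, like other numbers in the VM. `parse_header` is a hypothetical name.

```python
import struct

MAGIC = 0xDEAD_BEA7_BA5E_BA11
HEADER = struct.Struct("<QII")  # magic (u64), version (u32), section count (u32)


def parse_header(data: bytes):
    """Return (version, section_count, offset_of_first_section)."""
    magic, version, section_count = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError(f"bad magic number: {magic:#x}")
    return version, section_count, HEADER.size


blob = HEADER.pack(MAGIC, 1, 2)
print(parse_header(blob))  # (1, 2, 16)
```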
## Sections

The rest of the object is a list of sections. A section's layout is a section header, followed by the section contents.

### Section header

* 8 bits - Section kind
  * `0x00` - Data
  * `0xFF` - Meta
* 64 bits - Length of the section

### Data section

The data section contains static data that is initialized to some known value.

* 16 bits - Length of the section name (N)
* N bytes - Section name
* 64 bits - Section load start - where in memory the content of this section begins
* 64 bits - Section length - how long the memory content is

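Below is a Python sketch of reading a section header and the fixed fields of a data section. It assumes little-endian fields, a name length measured in bytes, and a UTF-8 name. For the example blob only, it also assumes the section length counts every byte after the section header. The function names are hypothetical.

```python
import struct

SECTION_HEADER = struct.Struct("<BQ")  # kind (u8), section length (u64)
KIND_DATA, KIND_META = 0x00, 0xFF


def parse_section_header(data: bytes, offset: int):
    """Return (kind, section_length, offset_after_header)."""
    kind, length = SECTION_HEADER.unpack_from(data, offset)
    return kind, length, offset + SECTION_HEADER.size


def parse_data_section_fields(data: bytes, offset: int):
    """Return (name, load_start, content_length, offset_after_fields).

    ASSUMPTION: the name length counts bytes and the name is UTF-8.
    """
    (name_len,) = struct.unpack_from("<H", data, offset)
    offset += 2
    name = data[offset:offset + name_len].decode("utf-8")
    offset += name_len
    load_start, content_length = struct.unpack_from("<QQ", data, offset)
    return name, load_start, content_length, offset + 16


fields = struct.pack("<H", 5) + b".data" + struct.pack("<QQ", 0x4000, 3) + b"\x01\x02\x03"
blob = SECTION_HEADER.pack(KIND_DATA, len(fields)) + fields
kind, length, off = parse_section_header(blob, 0)
print(kind, length, off)                     # 0 26 9
print(parse_data_section_fields(blob, off))  # ('.data', 16384, 3, 32)
```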
### Meta section

The meta section holds a table of metadata about the binary in a key-value format, mapping string keys to 64-bit values. All strings are UTF-8 encoded.

* 64 bits - The number of key-value entries

The rest of the section consists of the key-value pairs.

Each key-value pair is laid out as the key, followed immediately by the value. The key is a string, and the value is a 64-bit value. A key starts with the length of the string, followed by the key string itself. A value is just the 8 bytes of the number.

The meta section should be used for data that's readable by the VM but is not used by the executing program. Data in the meta section is not copied to the program memory.

A VM must provide support for the following meta-values:

* `entry` - a 64-bit address where the VM should begin executing code.

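Here is a Python sketch of reading and writing the meta table. The outline does not specify the width of a key's length prefix. This sketch assumes 16 bits, matching the data section's name length, and little-endian integers throughout. `parse_meta` and `build_meta` are hypothetical names.

```python
import struct


def parse_meta(data: bytes, offset: int = 0) -> dict:
    """Parse meta-section contents into a {key: 64-bit value} dict.

    ASSUMPTION: each key's length prefix is 16 bits, like the data-section name.
    """
    (count,) = struct.unpack_from("<Q", data, offset)
    offset += 8
    table = {}
    for _ in range(count):
        (key_len,) = struct.unpack_from("<H", data, offset)
        offset += 2
        key = data[offset:offset + key_len].decode("utf-8")
        offset += key_len
        (value,) = struct.unpack_from("<Q", data, offset)
        offset += 8
        table[key] = value
    return table


def build_meta(entries: dict) -> bytes:
    """Inverse of parse_meta, under the same assumptions."""
    out = struct.pack("<Q", len(entries))
    for key, value in entries.items():
        encoded = key.encode("utf-8")
        out += struct.pack("<H", len(encoded)) + encoded + struct.pack("<Q", value)
    return out


meta = parse_meta(build_meta({"entry": 0x4000}))
print(hex(meta["entry"]))  # 0x4000
```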
# General TODO

* Interrupts
* MMIO regions
* Paging?
* Decide how address sizes are determined
  * source size <= dest size - zero extend source and copy
    * `mov %r0, (label)u32`
  * source size > dest size - truncate to dest size
    * `mov (label)u32, %r0`
  * source size with unknown dest size - use dest size == source size
    * `mov %r0, (label)`
  * unknown source size with dest size - use dest size == source size
    * `mov (label), %r0`
  * unknown source size with unknown dest size - 64 bits
    * `mov (label), (%r0)`