rasp/vm.md
Alek Ratzloff bff9220fb1 Extend how interrupts are reported to the main execution loop
Some things that were previously hard VM-level errors are now handled by
interrupts. While this is relatively easy to handle, I was wanting a
little more structure for the error types - so, errors that should
invoke an interrupt are passed along in their own structure in a
VmError::Interrupt variant. If an error is raised during the tick()
phase of execution that would cause an interrupt, that interrupt is
intercepted and the VM continues.

The State::interrupt() function also will catch double faults and triple
faults, with a triple fault being its own variant in the VmError
structure (so it cannot be intercepted as an interrupt by accident).

Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
2020-03-12 16:56:20 -04:00


# VM
This is an outline of the VM that drives this language.
# Primitives
* Numbers are little endian (LE) at the byte level.
* Addresses point to single bytes.
* Signed numbers use two's complement.
| Type | Size (bits) |
| - | - |
| Address | 64 |
| Word | 64 |
| Halfword | 32 |
| Byte | 8 |
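As an illustration of the rules above, here is a minimal Rust sketch of reading a little-endian word from byte-addressed memory and reinterpreting it as a two's complement number. The function names `read_word` and `as_signed` are hypothetical and are not taken from the VM's source.

```rust
/// Reads a 64-bit word stored little-endian at byte address `addr`.
/// Returns `None` if the eight bytes are not all inside `mem`.
/// (Hypothetical helper, not part of the actual VM.)
fn read_word(mem: &[u8], addr: usize) -> Option<u64> {
    let end = addr.checked_add(8)?;
    let mut buf = [0u8; 8];
    buf.copy_from_slice(mem.get(addr..end)?);
    Some(u64::from_le_bytes(buf))
}

/// Reinterprets the bits of a word as a signed two's complement number.
fn as_signed(word: u64) -> i64 {
    word as i64
}
```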
# Registers
CPU registers are addressed by a value between 0-63 (6 bits). All registers are 64 bits wide.
* IP - Instruction pointer
* SP - Stack pointer
* FP - Frame pointer
* FLAGS - CPU flags
* STATUS - Generic status code
* NIL - Always reads as zero; writes to it have no effect.
* IVT - Interrupt vector table pointer
* R0-R31
* (25 reserved registers)
The following registers are caller-save (i.e., their value may change after a function call):
* FLAGS
* STATUS
* IVT
The rest are callee-save.
## CPU Flags
CPU flags are addressed by bit index, counting from the least significant (rightmost) bit.
* `00` - Halt flag
* `01` - Compare flag
* `02` - Enable interrupts
### Flag ideas
* "Trace" flag - halts the CPU when certain conditions are met that may be causing undesired
  behavior - for debugging
    * Overwriting a register without its value being used
    * Mixing arithmetic with bit twiddling on the same target
# Instructions
All instructions have 16-bit opcodes. There are three types of instructions:
* Those whose operations require a source and a destination.
    * Those whose operations require two sources
        * The destination of these instructions is implied by the instruction itself; e.g. the
          `CMPEQ` instruction implicitly sets a bit in the `FLAGS` register.
* Those whose operations require a source, but no destination.
    * Those whose operations require a destination, but no source.
        * There aren't any of these instructions yet
* Those whose operations require neither a source nor a destination.
Destinations may be:
* A 64-bit address pointing at a 64-bit or 8-bit value
* A 6-bit register
Sources may be one of:
* A 64-bit address pointing at a 64-bit or 8-bit value
* A 6-bit register
* A 64-bit immediate value
Counting all source and destination value sizes as their own configuration, there are:
* 3 possible destination types
* 4 possible source types
Instructions have different layouts depending on whether their operation takes a source and/or
destination. For example, the `ADD` instruction takes a source and a destination, the `JMP`
instruction takes a source, and the `NOP` instruction takes neither a source nor a destination.
Instructions that take neither a source nor a destination are simply 16 bits long. All other
instructions are followed by a byte that specifies the kind of their source and/or destination.
An instruction that has a source and destination looks like this:
```
| XXXXXXXX | XXXXXXXX | DDDDSSSS | ...source and destination |
```
An instruction that has either a source or a destination (but not both) looks like this:
```
| XXXXXXXX | XXXXXXXX | YYYY0000 | ...source or destination |
```
An instruction that has neither a source nor a destination looks like this:
```
| XXXXXXXX | XXXXXXXX |
```
## Source/destination flags
| Bits | Source/destination |
| - | - |
| 0b0000 | Address (64 bit value) |
| 0b0001 | Address (32 bit value) |
| 0b0010 | Address (16 bit value) |
| 0b0011 | Address (8 bit value) |
| 0b0100 | 6-bit register |
| 0b0101 | Immediate (64 bits, source only) |
| 0b0110 | Immediate (32 bits, source only) |
| 0b0111 | Immediate (16 bits, source only) |
| 0b1000 | Immediate (8 bits, source only) |
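The table can be turned into a small decoder. The following is a sketch in Rust, assuming the `DDDDSSSS` byte is drawn most-significant bit first, so that the destination occupies the high nibble and the source the low nibble. The `Operand` type and both function names are hypothetical.

```rust
/// Kind of operand selected by a 4-bit flag (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Operand {
    /// A 64-bit address pointing at a value of the given width.
    Address { value_bits: u8 },
    /// A 6-bit register index.
    Register,
    /// An immediate of the given width (valid as a source only).
    Immediate { value_bits: u8 },
}

/// Maps a 4-bit flag to an operand kind; unlisted patterns yield `None`.
fn decode_operand(nibble: u8) -> Option<Operand> {
    match nibble {
        0b0000 => Some(Operand::Address { value_bits: 64 }),
        0b0001 => Some(Operand::Address { value_bits: 32 }),
        0b0010 => Some(Operand::Address { value_bits: 16 }),
        0b0011 => Some(Operand::Address { value_bits: 8 }),
        0b0100 => Some(Operand::Register),
        0b0101 => Some(Operand::Immediate { value_bits: 64 }),
        0b0110 => Some(Operand::Immediate { value_bits: 32 }),
        0b0111 => Some(Operand::Immediate { value_bits: 16 }),
        0b1000 => Some(Operand::Immediate { value_bits: 8 }),
        _ => None,
    }
}

/// Splits a `DDDDSSSS` byte into (destination, source).
/// Assumption: destination is the high nibble, source the low nibble.
fn decode_spec_byte(byte: u8) -> Option<(Operand, Operand)> {
    let dest = decode_operand(byte >> 4)?;
    let source = decode_operand(byte & 0x0f)?;
    // Immediates are source-only, so reject them as destinations.
    if matches!(dest, Operand::Immediate { .. }) {
        return None;
    }
    Some((dest, source))
}
```

In a real VM, a `None` here would presumably map to the illegal instruction exception rather than being returned to the caller.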
## Arithmetic
Arithmetic instructions store their result in the destination, which is the first operand. Overflow
wraps around; for example, adding 1 to the largest 64-bit value gives 0.
* Add
    * Opcode: 0x1000
    * Params: Destination, source
* Sub
    * Opcode: 0x1001
    * Params: Destination, source
* Mul
    * Opcode: 0x1002
    * Params: Destination, source
* Div
    * Opcode: 0x1003
    * Params: Destination, source
* Mod
    * Opcode: 0x1004
    * Params: Destination, source
* And
    * Opcode: 0x1005
    * Params: Destination, source
* Or
    * Opcode: 0x1006
    * Params: Destination, source
* Xor
    * Opcode: 0x1007
    * Params: Destination, source
* Shl
    * Opcode: 0x1008
    * Params: Destination, source
* Shr
    * Opcode: 0x1009
    * Params: Destination, source
* INeg
    * Opcode: 0x100a
    * Params: Destination, source
* Inv
    * Opcode: 0x100b
    * Params: Destination, source
* Not
    * Opcode: 0x100c
    * Params: Destination, source
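A minimal Rust sketch of the wrapping behaviour, assuming that "wrapping" means arithmetic modulo 2^64 on unsigned words. The function names are hypothetical. Division returns `None` on a zero divisor, which a VM would presumably turn into the divide-by-zero exception.

```rust
/// Add with wrap-around on overflow (hypothetical helper).
fn add(dest: u64, source: u64) -> u64 {
    dest.wrapping_add(source)
}

/// Subtract with wrap-around on underflow.
fn sub(dest: u64, source: u64) -> u64 {
    dest.wrapping_sub(source)
}

/// Multiply, keeping only the low 64 bits of the product.
fn mul(dest: u64, source: u64) -> u64 {
    dest.wrapping_mul(source)
}

/// Divide; `None` signals a zero divisor instead of panicking.
fn div(dest: u64, source: u64) -> Option<u64> {
    dest.checked_div(source)
}
```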
### TODO
* Add signed instructions (iadd, imul, etc)
* Sign-extending SHR
* Overflow flag?
## Control flow
* CmpEq
    * Opcode: 0x2000
    * Params: Source, source
* CmpLt
    * Opcode: 0x2001
    * Params: Source, source
* Jmp
    * Opcode: 0x2002
    * Params: Source
* Jz
    * Opcode: 0x2003
    * Params: Source
* Jnz
    * Opcode: 0x2004
    * Params: Source
## Functions
* Call
    * Opcode: 0x3000
    * Params: Source
    * When this instruction is executed, these actions occur:
        * Push the current stack frame pointer
        * Push the IP of the next instruction
        * Update the IP to the value at the given source.
        * Update the frame pointer to the current stack pointer - 16
* Ret
    * Opcode: 0x3001
    * When this instruction is executed, these actions occur:
        * Update the stack pointer to the current frame pointer + 16.
        * Pop the IP of the next instruction.
        * Pop the old stack frame.
        * Restore the last three values in an undefined order
* Push
    * Opcode: 0x3002
    * Params: Source
    * When this instruction is executed, these actions occur:
        * Set the value in memory at the current stack pointer to the source value.
        * Increment the stack pointer by the size of value at the source.
* Pop
    * Opcode: 0x3003
    * Params: Dest
    * When this instruction is executed, these actions occur:
        * Decrement the stack pointer by the size of value at the destination.
        * Copy the value at the stack pointer into the destination.
* Int
    * Opcode: 0x3004
    * Params: Source, Source
    * When this instruction is executed, these actions occur:
        * Push the current stack frame pointer
        * Push the IP of the next instruction to be called
        * Push the FLAGS register
        * Push the STATUS register
        * Push the R0-R31 registers
        * Update the IP to the address of the given interrupt vector in the IVT
        * Update the R0 register to the value in the first parameter
        * Update the R1 register to the value in the second parameter
        * Update the frame pointer to the current stack pointer - 288
* IRet
    * Opcode: 0x3005
    * When this instruction is executed, these actions occur:
        * Update the stack pointer to the current frame pointer + 288
        * Pop the old R31-R00 values
        * Pop the old STATUS value
        * Pop the old FLAGS value
        * Pop the IP of the next instruction
        * Pop the old stack frame
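The `Call`, `Ret`, `Push`, and `Pop` steps above can be sketched in Rust as follows. This is an illustration under assumptions, not the VM's actual code: the `Cpu` struct and its methods are hypothetical, only 64-bit values are pushed, and out-of-bounds accesses simply panic where a real VM would raise an illegal memory address exception. Note that the stack grows upward, because `Push` increments the stack pointer.

```rust
/// Hypothetical, pared-down CPU state for illustrating call frames.
struct Cpu {
    ip: u64,
    sp: u64,
    fp: u64,
    mem: Vec<u8>,
}

impl Cpu {
    /// Push: store the value at SP, then increment SP by its size.
    fn push(&mut self, value: u64) {
        let at = self.sp as usize;
        self.mem[at..at + 8].copy_from_slice(&value.to_le_bytes());
        self.sp += 8;
    }

    /// Pop: decrement SP by the value size, then read the value at SP.
    fn pop(&mut self) -> u64 {
        self.sp -= 8;
        let at = self.sp as usize;
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&self.mem[at..at + 8]);
        u64::from_le_bytes(buf)
    }

    /// Call: save FP and the return address, jump, and set up the new frame.
    fn call(&mut self, target: u64, next_ip: u64) {
        let old_fp = self.fp;
        self.push(old_fp);
        self.push(next_ip);
        self.ip = target;
        // Two 8-byte values were pushed, so FP points at the saved FP.
        self.fp = self.sp - 16;
    }

    /// Ret: discard the callee's stack, then restore IP and FP.
    fn ret(&mut self) {
        self.sp = self.fp + 16;
        self.ip = self.pop();
        self.fp = self.pop();
    }
}
```

After a `call` followed by a `ret`, the stack pointer, frame pointer, and instruction pointer are back to the values the caller expects.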
## Data movement
* Mov
    * Opcode: 0x4000
    * Params: Source, Dest
## Miscellaneous
* Halt
    * Opcode: 0xF000
* Nop
    * Opcode: 0xF001
* Dump
    * Opcode: 0xF002
# Interrupts
Interrupts are signaled either explicitly by software or by hardware. When an interrupt signal is
set, the CPU finishes the instruction it is executing and then begins handling the interrupt whose
signal was set. Software interrupts may be invoked using the `int` instruction, supplying the index
of the interrupt to invoke. Hardware interrupts are invoked directly by a hardware event, e.g. a
keypress. Hardware and software interrupts are treated equally in the CPU, and as such, they are all
maskable.
An interrupt may be masked in two ways: either through its entry in the IVT, or through the "enable
interrupts" CPU flag. If the "enabled" bit in the IVT is not set, that interrupt will not be handled
when it is invoked. If the "enable interrupts" CPU flag is not set, *no* interrupts will be handled.
## Interrupt vector table
The interrupt vector table is located through the IVT register. The address stored in the IVT
register must be a multiple of 64. The IVT always has 512 entries, with 8 bytes for each entry.
Thus, the entire table is 512 * 8 = 4096 bytes, or one page.
## Interrupt table entries
Interrupt table entries make up the interrupt vector table, each entry being 64 bits (8 bytes) long.
* 1 bit - Enabled
* 4 bits - Reserved, set to 0
* 59 bits - Interrupt address, multiplied by 64 for the start address
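The following Rust sketch shows one way to locate and decode an entry. The document does not say which end of the 64-bit entry the fields start from, so the bit positions here are an assumption: the enabled bit is taken to be bit 63, the reserved bits 62-59, and the address field bits 58-0. The constant and function names are hypothetical.

```rust
const IVT_ENTRIES: u64 = 512;
const IVT_ENTRY_SIZE: u64 = 8;
/// Bit 2 of FLAGS is the "enable interrupts" flag.
const ENABLE_INTERRUPTS: u64 = 1 << 2;

/// Address of entry `index` in a table starting at `ivt`.
/// Returns `None` for a misaligned table or an out-of-range index.
fn entry_address(ivt: u64, index: u64) -> Option<u64> {
    if ivt % 64 != 0 || index >= IVT_ENTRIES {
        return None;
    }
    ivt.checked_add(index * IVT_ENTRY_SIZE)
}

/// Handler start address for `entry`, or `None` if the interrupt is masked
/// by either the entry's enabled bit or the FLAGS register.
/// Assumed layout: bit 63 enabled, bits 62-59 reserved, bits 58-0 address.
fn handler_address(flags: u64, entry: u64) -> Option<u64> {
    let globally_enabled = (flags & ENABLE_INTERRUPTS) != 0;
    let entry_enabled = (entry >> 63) == 1;
    if !globally_enabled || !entry_enabled {
        return None;
    }
    let address_field = entry & ((1u64 << 59) - 1);
    // Multiply by 64; bits shifted past bit 63 are silently discarded.
    Some(address_field << 6)
}
```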
## Interrupt handling
After an interrupt is signaled, the CPU looks up the index of the interrupt in the IVT, calculates
its address, sets up the stack for the interrupt handler, and jumps to the interrupt handler's
address.
The interrupt stack is structured similarly to a normal call stack, but since interrupts may be
invoked at any time, it saves additional state. Interrupt handlers have two explicit arguments: the
interrupt index itself, and an auxiliary 64-bit value or pointer specific to that interrupt. The
index is stored in the R0 register, and the auxiliary value is stored in the R1 register. These
registers, along with the FP, IP, FLAGS, and STATUS registers are saved on the stack before calling
an interrupt handler.
Before an interrupt handler is called, these actions occur:
* Push the current stack frame pointer
* Push the IP of the next instruction to be called
* Push the FLAGS register
* Push the STATUS register
* Push the R0-R31 registers
* Update the IP to the address of the given interrupt vector in the IVT
* Update the R0 register to the value in the first parameter
* Update the R1 register to the value in the second parameter
* Update the frame pointer to the current stack pointer - 288
Interrupt handlers must be exited using the `iret` instruction. When an interrupt call is exited,
the above actions occur in reverse:
* Update the stack pointer to the current frame pointer + 288
* Pop the old R31-R00 values
* Pop the old STATUS value
* Pop the old FLAGS value
* Pop the IP of the next instruction
* Pop the old stack frame
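The entry and exit sequences above can be sketched in Rust as below. The frame holds FP, IP, FLAGS, STATUS, and R0-R31, i.e. 36 words of 8 bytes, which is where the 288 comes from. The `Cpu` struct and method names are hypothetical, the handler address is assumed to have already been looked up in the IVT, and out-of-bounds accesses panic instead of raising an exception.

```rust
/// FP, IP, FLAGS, STATUS plus R0-R31, 8 bytes each.
const FRAME_BYTES: u64 = (4 + 32) * 8;

/// Hypothetical, pared-down CPU state for illustrating interrupt frames.
struct Cpu {
    ip: u64,
    sp: u64,
    fp: u64,
    flags: u64,
    status: u64,
    r: [u64; 32],
    mem: Vec<u8>,
}

impl Cpu {
    fn push(&mut self, value: u64) {
        let at = self.sp as usize;
        self.mem[at..at + 8].copy_from_slice(&value.to_le_bytes());
        self.sp += 8;
    }

    fn pop(&mut self) -> u64 {
        self.sp -= 8;
        let at = self.sp as usize;
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&self.mem[at..at + 8]);
        u64::from_le_bytes(buf)
    }

    /// Saves state and enters the handler with its two arguments in R0 and R1.
    fn enter_interrupt(&mut self, handler: u64, next_ip: u64, index: u64, aux: u64) {
        let saved = [self.fp, next_ip, self.flags, self.status];
        for value in saved.iter() {
            self.push(*value);
        }
        let regs = self.r; // copy, so R0-R31 are pushed with their old values
        for value in regs.iter() {
            self.push(*value);
        }
        self.ip = handler;
        self.r[0] = index;
        self.r[1] = aux;
        self.fp = self.sp - FRAME_BYTES;
    }

    /// Restores everything saved by `enter_interrupt`, in reverse order.
    fn iret(&mut self) {
        self.sp = self.fp + FRAME_BYTES;
        for i in (0..32).rev() {
            let value = self.pop();
            self.r[i] = value;
        }
        self.status = self.pop();
        self.flags = self.pop();
        self.ip = self.pop();
        self.fp = self.pop();
    }
}
```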
## Exceptions
The first 256 interrupt vectors are reserved for CPU and I/O-sourced events - these are known as
exceptions. Of these, the first 128 (vectors 0x00-0x7f) are error state exceptions, and the
remaining 128 (vectors 0x80-0xff) are used for general exceptions.
### Error state exceptions
Error state exceptions occur when an instruction attempts to perform an illegal operation,
such as attempting to read an out-of-bounds memory address or attempting to execute an invalid
opcode. Error state exceptions may be caught and handled, just like any other interrupt.
#### Double fault and triple fault
If a second error state exception is raised while an error state exception is already being handled,
a double fault is invoked. You may handle a double fault like any exception and attempt to repair
the situation. If yet another exception is raised, the CPU will invoke a triple fault. A triple
fault will unconditionally halt the machine.
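One way to model this escalation is sketched below in Rust. It assumes the VM tracks how many error state exceptions are currently being handled (the `depth` argument), and that only error state vectors (below 0x80) escalate; neither detail is spelled out in this document. The `Fault` type and `escalate` function are hypothetical.

```rust
/// What the CPU should do with a newly raised interrupt (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Fault {
    /// Deliver this vector to its handler as usual.
    Deliver(u64),
    /// Deliver the double fault vector (0x00); `aux` is the vector that faulted.
    DoubleFault { aux: u64 },
    /// Halt the machine unconditionally.
    TripleFault,
}

/// Error state exceptions occupy the vectors below this value.
const ERROR_STATE_END: u64 = 0x80;

/// Decides how to handle `vector`, given how many error state exceptions
/// are already being handled (`depth`).
fn escalate(depth: u32, vector: u64) -> Fault {
    let is_error_state = vector < ERROR_STATE_END;
    match (is_error_state, depth) {
        // Not an error, or no error handler is active: normal delivery.
        (false, _) | (true, 0) => Fault::Deliver(vector),
        // An error inside an error handler: double fault.
        (true, 1) => Fault::DoubleFault { aux: vector },
        // An error inside the double fault handler: triple fault.
        (true, _) => Fault::TripleFault,
    }
}
```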
### Error state exceptions (vectors 0x00-0x7f)
* Double fault
    * Interrupt vector: 0x00
    * Auxiliary: The interrupt vector that was being invoked that caused the fault.
    * An error state interrupt occurred while already handling an error state interrupt
* Illegal instruction
    * Interrupt vector: 0x01
    * Auxiliary: Memory address where illegal instruction is located
    * Attempted to execute a malformed instruction
* Illegal memory address
    * Interrupt vector: 0x02
    * Auxiliary: Memory address causing the interrupt
    * Attempted to access a memory address in an illegal way - either it's out of bounds or is
      protected in some way.
* Divide by zero
    * Interrupt vector: 0x03
    * Auxiliary: N/A
    * Invoked upon a divide-by-zero
* Remaining error states below 0x80 are reserved for future use.
### General exceptions (vectors 0x80-0xff)
* I/O event
    * Interrupt vector: 0x80
    * Auxiliary: Pointer to the I/O event structure.
    * An I/O device has an event that needs attention.
    * NOTE: This will probably be removed.
# Binary object format
The binary object format is composed of a header followed by sections that make up the content of
the object.
## Header
The header is composed of:
* 64 bits - A magic number (0xDEAD\_BEA7\_BA5E\_BA11).
* 32 bits - Version of the file
* 32 bits - The number of sections in the file
* section descriptions detailed below
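A minimal Rust sketch of reading the fixed part of the header. It assumes the fields are stored little-endian, like the VM's numbers, and packed with no padding; the document does not state either explicitly. `Header` and `parse_header` are hypothetical names.

```rust
const MAGIC: u64 = 0xDEAD_BEA7_BA5E_BA11;

/// Fixed-size part of the object header (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
struct Header {
    version: u32,
    section_count: u32,
}

/// Parses the first 16 bytes of an object file.
/// Returns `None` if the input is too short or the magic number is wrong.
fn parse_header(bytes: &[u8]) -> Option<Header> {
    if bytes.len() < 16 {
        return None;
    }
    let mut magic = [0u8; 8];
    magic.copy_from_slice(&bytes[0..8]);
    if u64::from_le_bytes(magic) != MAGIC {
        return None;
    }
    let mut word = [0u8; 4];
    word.copy_from_slice(&bytes[8..12]);
    let version = u32::from_le_bytes(word);
    word.copy_from_slice(&bytes[12..16]);
    let section_count = u32::from_le_bytes(word);
    Some(Header { version, section_count })
}
```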
## Sections
The rest of the object is a list of sections. A section's layout is a section header, followed by
the section contents.
### Section header
* 8 bits - Section kind
    * 0x00 - Data
    * 0xFF - Meta
* 64 bits - Length of the section
### Data section
The data section contains static data that is initialized to some known value.
* 16 bits - length of section name
* N bytes - section name, where N is the length given above
* 64 bits - section load start - where in memory the content of this section begins
* 64 bits - section length - how long the memory content is
### Meta section
The meta section holds a table of metadata about the binary in a key-value format, with string keys
mapping to 64-bit values. All strings are UTF-8 encoded.
* 64 bits - the number of key-value entries
The remainder of the section consists of the key-value pairs.
The layout for a key-value pair is the key, followed immediately by the value. The key is a string,
and the value is a 64-bit value. A key starts with the length of the string, followed by the key
string itself. A value is just the 8 bytes of the number.
The meta section should be used to place data that's readable by the VM, but is not used by the
executing program. Data in the meta section is not copied to the program memory.
A VM must provide support for the following meta-values:
* `ip` - the initial value for the instruction pointer (the entry point)
* `fp` - the initial value for the stack frame pointer
    * If not set, its default value is the value of the stack pointer.
* `sp` - the initial value for the stack pointer
* `flags` - the initial CPU flags
* `status` - the initial value for the status register
* `ivt` - the initial value for the pointer to the IVT
* `rXX` - the initial value for register XX (0-31)
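As a sketch of how a VM might apply these meta-values once the section has been parsed into a map, consider the Rust below. Only the `fp` default is specified above; treating every other missing key as 0, and spelling register keys without zero padding (`r0` through `r31`), are assumptions made for illustration. `InitialState` and `initial_state` are hypothetical names.

```rust
use std::collections::HashMap;

/// Register values a VM would start with (hypothetical type).
#[derive(Debug, Default, PartialEq, Eq)]
struct InitialState {
    ip: u64,
    sp: u64,
    fp: u64,
    flags: u64,
    status: u64,
    ivt: u64,
    r: [u64; 32],
}

/// Builds the initial register state from parsed meta key-value pairs.
fn initial_state(meta: &HashMap<String, u64>) -> InitialState {
    // Assumption: any key that is absent defaults to 0.
    let get = |key: &str| meta.get(key).copied().unwrap_or(0);
    let mut state = InitialState::default();
    state.ip = get("ip");
    state.sp = get("sp");
    // Specified above: `fp` defaults to the value of the stack pointer.
    state.fp = meta.get("fp").copied().unwrap_or(state.sp);
    state.flags = get("flags");
    state.status = get("status");
    state.ivt = get("ivt");
    for (i, reg) in state.r.iter_mut().enumerate() {
        // Assumption: register keys are "r0".."r31", without zero padding.
        *reg = get(&format!("r{}", i));
    }
    state
}
```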
# General TODO
* Memory permissions
* MMIO regions
* Paging
* Determine how address sizes are determined
    * source size <= dest size - zero extend source and copy
        * mov %r0, (label)u32
    * source size > dest size - truncate to dest size
        * mov (label)u32, %r0
    * source size with unknown dest size - use dest size == source size
        * mov %r0, (label)
    * unknown source size with dest size - use dest size == source size
        * mov (label), %r0
    * unknown source size with unknown dest size - 64 bits
        * mov (label), (%r0)