# VM

This is an outline of the VM that drives this language.

# Primitives

* Numbers are little endian (LE) at the byte level.
* Addresses point to single bytes.
* Signed numbers use two's complement.

| Type | Size (bits) |
| - | - |
| Address | 64 |
| Word | 64 |
| Halfword | 32 |
| Byte | 8 |

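As a rough illustration of the byte-level rules above, the following Python sketch encodes and decodes a 64-bit word. The helper names `encode_word` and `decode_word` are made up for this example and are not part of the VM.

```python
WORD_BITS = 64  # size of a Word per the table above
WORD_MASK = 2**WORD_BITS - 1


def encode_word(value: int) -> bytes:
    """Encode a signed or unsigned integer as a 64-bit little-endian word."""
    # Masking maps negative numbers onto their two's-complement bit pattern.
    return (value & WORD_MASK).to_bytes(WORD_BITS // 8, "little")


def decode_word(data: bytes, signed: bool = False) -> int:
    """Decode 8 little-endian bytes, optionally as a two's-complement value."""
    return int.from_bytes(data[:8], "little", signed=signed)
```

The same bytes decode to different numbers depending on whether they are read as signed or unsigned: `encode_word(-1)` is eight `0xFF` bytes, which read back as `-1` when signed and as `2**64 - 1` when unsigned.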
# Registers

CPU registers are addressed by a value from 0 to 63 (6 bits). All registers are 64 bits wide.

* IP - Instruction pointer
* SP - Stack pointer
* FP - Frame pointer
* FLAGS - CPU flags
* STATUS - Generic status code
* NIL - Always reads as zero; writes to it are discarded.
* R0-R31
* (26 unused registers)

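A minimal sketch of a register file with the `NIL` behavior described above might look like the following. The register numbering in `Reg` is purely hypothetical (the outline names the registers but does not assign indices), and `RegisterFile` is an illustrative name, not something the VM defines.

```python
from enum import IntEnum


class Reg(IntEnum):
    # Hypothetical numbering: the outline does not say which index each register gets.
    IP = 0
    SP = 1
    FP = 2
    FLAGS = 3
    STATUS = 4
    NIL = 5
    R0 = 6  # R0-R31 are assumed to occupy indices 6..37


MASK64 = 2**64 - 1


class RegisterFile:
    def __init__(self) -> None:
        self._regs = [0] * 64  # 6-bit index space, 64-bit values

    def read(self, index: int) -> int:
        return self._regs[index & 0x3F]

    def write(self, index: int, value: int) -> None:
        index &= 0x3F
        if index == Reg.NIL:
            return  # writes to NIL are dropped, so it always reads as zero
        self._regs[index] = value & MASK64
```

Dropping the write, rather than storing and masking on read, keeps `read` free of special cases.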
## CPU Flags

CPU flags are addressed by bit index, counting from right to left (bit 0 is the least significant
bit).

* `00` - Halt flag
* `01` - Compare flag

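The bit indexing above could be handled with helpers along these lines. This is only a sketch; `set_flag` and `get_flag` are names invented for the example.

```python
HALT_BIT = 0     # `00` - Halt flag
COMPARE_BIT = 1  # `01` - Compare flag


def set_flag(flags: int, bit: int, on: bool) -> int:
    """Return a copy of the FLAGS value with the given bit set or cleared."""
    mask = 1 << bit
    return (flags | mask) if on else (flags & ~mask)


def get_flag(flags: int, bit: int) -> bool:
    """Test a single bit of the FLAGS value."""
    return bool((flags >> bit) & 1)
```

For example, `set_flag(0, COMPARE_BIT, True)` yields `0b10`, leaving the halt flag clear.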
### Flag ideas

* "Trace" flag - halts the CPU when certain conditions are met that may be causing undesired
  behavior - for debugging
    * Overwriting a register without its value being used
    * Mixing arithmetic with bit twiddling on the same target

# Instructions

All instructions have 16-bit opcodes. Instructions fall into the following types:

* Those whose operations require a source and a destination.
* Those whose operations require two sources.
    * The destination of these instructions is implied by the instruction itself; e.g. the `CMPEQ`
      instruction implicitly sets a bit in the `FLAGS` register.
* Those whose operations require a source, but no destination.
* Those whose operations require a destination, but no source.
    * There aren't any of these instructions yet.
* Those whose operations require neither a source nor a destination.

Destinations may be one of:

* A 64-bit address pointing at a 64-, 32-, 16-, or 8-bit value
* A 6-bit register

Sources may be one of:

* A 64-bit address pointing at a 64-, 32-, 16-, or 8-bit value
* A 6-bit register
* A 64-, 32-, 16-, or 8-bit immediate value

Counting all source and destination value sizes as their own configuration, there are:

* 5 possible destination types
* 9 possible source types

Instructions have different layouts depending on whether their operation takes a source and/or
destination. For example, the `ADD` instruction takes a source and a destination, the `JMP`
instruction takes a source, and the `NOP` instruction takes neither a source nor a destination.

Instructions that take neither a source nor a destination are simply 16 bits long. In all other
instructions, the opcode is followed by a byte that determines the source and/or destination.

An instruction that has a source and destination looks like this:

```
| XXXXXXXX | XXXXXXXX | DDDDSSSS | ...source and destination |
```

An instruction that has either a source or a destination (but not both) looks like this:

```
| XXXXXXXX | XXXXXXXX | YYYY0000 | ...source or destination |
```

An instruction that has neither a source nor a destination looks like this:

```
| XXXXXXXX | XXXXXXXX |
```

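The three layouts above can be sketched as a small encoder for the leading bytes of an instruction. Two things here are assumptions rather than part of the outline: the 16-bit opcode is written little-endian (following the general rule for numbers), and the function name `encode_head` is invented for the example. The `dst` and `src` arguments are the 4-bit source/destination flag values; the operand bytes that follow are not covered.

```python
import struct
from typing import Optional


def encode_head(opcode: int, dst: Optional[int] = None, src: Optional[int] = None) -> bytes:
    """Encode the opcode plus, if there are operands, the flags byte."""
    for kind in (dst, src):
        if kind is not None and not 0 <= kind <= 0xF:
            raise ValueError("source/destination flags must fit in 4 bits")
    head = struct.pack("<H", opcode)  # assumption: opcode is stored little-endian
    if dst is None and src is None:
        return head  # neither source nor destination: 16 bits only
    if dst is not None and src is not None:
        spec = (dst << 4) | src  # DDDDSSSS
    else:
        only = dst if dst is not None else src
        spec = only << 4  # YYYY0000
    return head + bytes([spec])
```

For instance, `encode_head(0xF001)` produces just the two opcode bytes, while `encode_head(0x1002, src=0b0101)` appends the byte `0x50`.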
## Source/destination flags

| Bits | Source/destination |
| - | - |
| 0b0000 | Address (64 bit value) |
| 0b0001 | Address (32 bit value) |
| 0b0010 | Address (16 bit value) |
| 0b0011 | Address (8 bit value) |
| 0b0100 | 6-bit register |
| 0b0101 | Immediate (64 bits, source only) |
| 0b0110 | Immediate (32 bits, source only) |
| 0b0111 | Immediate (16 bits, source only) |
| 0b1000 | Immediate (8 bits, source only) |

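One way to model the table above in code is an enum plus a few lookups, as in this sketch. The names `Kind`, `split_spec`, `is_valid_destination`, and `value_bits` are illustrative only. Treating a register operand as a 64-bit value follows from all registers being 64 bits wide.

```python
from enum import IntEnum
from typing import Tuple


class Kind(IntEnum):
    ADDR64 = 0b0000
    ADDR32 = 0b0001
    ADDR16 = 0b0010
    ADDR8 = 0b0011
    REG = 0b0100
    IMM64 = 0b0101
    IMM32 = 0b0110
    IMM16 = 0b0111
    IMM8 = 0b1000


# Width in bits of the value that each kind reads or writes.
_VALUE_BITS = {
    Kind.ADDR64: 64, Kind.ADDR32: 32, Kind.ADDR16: 16, Kind.ADDR8: 8,
    Kind.REG: 64,
    Kind.IMM64: 64, Kind.IMM32: 32, Kind.IMM16: 16, Kind.IMM8: 8,
}


def split_spec(spec: int) -> Tuple[Kind, Kind]:
    """Split a DDDDSSSS byte into (destination kind, source kind)."""
    return Kind(spec >> 4), Kind(spec & 0xF)


def is_valid_destination(kind: Kind) -> bool:
    """Immediates are source-only, so only addresses and registers qualify."""
    return kind <= Kind.REG


def value_bits(kind: Kind) -> int:
    return _VALUE_BITS[kind]
```

Because `Kind(...)` raises `ValueError` for unassigned bit patterns such as `0b1111`, `split_spec` doubles as a validity check on the flags byte.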
## Arithmetic

Arithmetic instructions store their result in the first operand specified (the destination).
Overflow is handled by wrapping around.

* Add
    * Opcode: 0x0000
    * Params: Destination, source
* Sub
    * Opcode: 0x0001
    * Params: Destination, source
* Mul
    * Opcode: 0x0002
    * Params: Destination, source
* Div
    * Opcode: 0x0003
    * Params: Destination, source
* Mod
    * Opcode: 0x0004
    * Params: Destination, source
* And
    * Opcode: 0x0005
    * Params: Destination, source
* Or
    * Opcode: 0x0006
    * Params: Destination, source
* Xor
    * Opcode: 0x0007
    * Params: Destination, source
* Shl
    * Opcode: 0x0008
    * Params: Destination, source
* Shr
    * Opcode: 0x0009
    * Params: Destination, source
* INeg
    * Opcode: 0x000a
    * Params: Destination, source
* Inv
    * Opcode: 0x000b
    * Params: Destination, source
* Not
    * Opcode: 0x000c
    * Params: Destination, source

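The wrap-around behavior for `Add`, `Sub`, and `Mul` could be sketched as below. This assumes a 64-bit destination such as a register; how narrower memory destinations wrap is not spelled out in the outline, and the function names are made up for the example.

```python
MASK64 = 2**64 - 1  # assumption: the destination is 64 bits wide


def wrapping_add(a: int, b: int) -> int:
    return (a + b) & MASK64


def wrapping_sub(a: int, b: int) -> int:
    return (a - b) & MASK64


def wrapping_mul(a: int, b: int) -> int:
    return (a * b) & MASK64
```

Adding 1 to the largest 64-bit value gives 0, and subtracting 1 from 0 gives the largest 64-bit value, which is also the two's-complement encoding of -1.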
### TODO

* Add signed instructions (iadd, imul, etc)
* Sign-extending SHR
* Overflow flag?

## Control flow

* CmpEq
    * Opcode: 0x1000
    * Params: Source, source
* CmpLt
    * Opcode: 0x1001
    * Params: Source, source
* Jmp
    * Opcode: 0x1002
    * Params: Source
* Jz
    * Opcode: 0x1003
    * Params: Source
* Jnz
    * Opcode: 0x1004
    * Params: Source

## Data movement

* Mov
    * Opcode: 0x2000

## Miscellaneous

* Halt
    * Opcode: 0xF000
* Nop
    * Opcode: 0xF001
* Dump
    * Opcode: 0xF002

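The opcodes listed so far can be collected into a lookup table, which a disassembler or decoder might use. The sketch below is illustrative: `OPCODES` and `describe` are invented names, and `None` marks instructions whose parameters this outline does not list (`Mov`, `Halt`, `Dump`).

```python
from typing import Dict, Optional, Tuple

Params = Optional[Tuple[str, ...]]

DS: Params = ("destination", "source")
SS: Params = ("source", "source")
S: Params = ("source",)

# Arithmetic opcodes are consecutive, starting at 0x0000.
ARITHMETIC = ["Add", "Sub", "Mul", "Div", "Mod", "And", "Or", "Xor",
              "Shl", "Shr", "INeg", "Inv", "Not"]

OPCODES: Dict[int, Tuple[str, Params]] = {
    0x0000 + i: (name, DS) for i, name in enumerate(ARITHMETIC)
}
OPCODES.update({
    0x1000: ("CmpEq", SS),
    0x1001: ("CmpLt", SS),
    0x1002: ("Jmp", S),
    0x1003: ("Jz", S),
    0x1004: ("Jnz", S),
    0x2000: ("Mov", None),   # params not listed in the outline
    0xF000: ("Halt", None),  # params not listed in the outline
    0xF001: ("Nop", ()),     # stated to take neither source nor destination
    0xF002: ("Dump", None),  # params not listed in the outline
})


def describe(opcode: int) -> str:
    """Render an opcode as a mnemonic followed by its listed parameters."""
    name, params = OPCODES[opcode]
    if not params:
        return name
    return f"{name} {', '.join(params)}"
```

With this table, `describe(0x1000)` gives `"CmpEq source, source"` and `describe(0xF001)` gives `"Nop"`.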
## Other instructions TODO

* Call
    * Takes address and number of bytes on the stack that are for args(?)
    * Updates SP, FP, IP, storing previous values starting at the new FP
* Ret
    * Uses FP to determine previous SP, FP, and IP and restores them
* Push
* Pop

# Binary object format

The binary object format is composed of a header followed by sections that make up the content of
the object.

## Header

The header is composed of:

* 64 bits - A magic number (0xDEAD\_BEA7\_BA5E\_BA11).
* 32 bits - Version of the file
* 32 bits - The number of sections in the file
* The sections themselves, detailed below

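Reading the fixed-size part of the header could look roughly like this. The sketch assumes the header fields are little-endian, in line with the rule for numbers in the Primitives section; `parse_header` is a name chosen for the example.

```python
import struct

MAGIC = 0xDEAD_BEA7_BA5E_BA11
HEADER = struct.Struct("<QII")  # magic (64 bits), version (32), section count (32)


def parse_header(data: bytes):
    """Return (version, section_count, offset of the first section)."""
    magic, version, section_count = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError(f"bad magic number: {magic:#x}")
    return version, section_count, HEADER.size
```

The returned offset is 16, the size of the three fixed fields, which is where the first section header would start.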
## Sections

The rest of the object is a list of sections. A section's layout is a section header, followed by
the section contents.

### Section header

* 8 bits - Section kind
    * 0x00 - Data
    * 0x10 - Code
    * 0xFF - Meta
* 64 bits - Length of the section

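A section header as laid out above is 9 bytes with no padding. The following sketch parses one, again assuming little-endian fields; `SectionKind` and `parse_section_header` are illustrative names.

```python
import struct
from enum import IntEnum


class SectionKind(IntEnum):
    DATA = 0x00
    CODE = 0x10
    META = 0xFF


SECTION_HEADER = struct.Struct("<BQ")  # kind (8 bits), length (64 bits), unpadded


def parse_section_header(data: bytes, offset: int = 0):
    """Return (kind, length, offset of the section contents)."""
    kind, length = SECTION_HEADER.unpack_from(data, offset)
    return SectionKind(kind), length, offset + SECTION_HEADER.size
```

An unknown kind byte makes `SectionKind(kind)` raise `ValueError`, which is a convenient place to reject malformed objects.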
### Data section

The data section contains static data that is initialized to some known value.

* 64 bits - section load start - where in memory the content of this section begins
* 64 bits - section length - how long the memory content is

### Code section

The code section contains executable code.

* 64 bits - section load start - where in memory the content of this section begins
* 64 bits - section load end - where in memory the content of this section ends

The remainder of the section is the code itself.

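Splitting a code section's contents into its two address fields and the code bytes might be done as follows. This is a sketch under the assumption that the fields are little-endian and that `contents` holds the bytes after the section header; `parse_code_section` and `load_code` are invented names, and whether the load end is inclusive is left open here because the outline does not say.

```python
import struct


def parse_code_section(contents: bytes):
    """Return (load_start, load_end, code) from a code section's contents."""
    load_start, load_end = struct.unpack_from("<QQ", contents, 0)
    code = contents[16:]  # everything after the two 64-bit fields
    return load_start, load_end, code


def load_code(memory: bytearray, contents: bytes) -> None:
    """Copy the code bytes into memory starting at the section's load start."""
    load_start, _load_end, code = parse_code_section(contents)
    memory[load_start:load_start + len(code)] = code
```

Slice assignment with an equal-length replacement leaves the size of `memory` unchanged, so the rest of the address space is untouched.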
### Meta section

The meta section holds a table of metadata about the binary in a key-value format, mapping string
keys to 64-bit values. All strings are UTF-8 encoded.

* 64 bits - the number of key-value entries

The remainder of the section holds the key-value pairs.

The layout for a key-value pair is the key, followed immediately by the value. The key is a string,
and the value is a 64-bit value. A key starts with the length of the string, followed by the key
string itself. A value is just the 8 bytes of the number.

The meta section should be used to place data that's readable by the VM, but is not used by the
executing program. Data in the meta section is not copied to the program memory.

A VM must provide support for the following meta-values:

* `entry` - a 64-bit address for where the VM should begin executing code.

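A reader for the key-value layout above could be sketched like this. The width of the key-length field is not given in this outline, so the example assumes a 64-bit little-endian byte count; that choice, and the name `parse_meta_section`, are assumptions made for illustration.

```python
import struct
from typing import Dict


def parse_meta_section(contents: bytes) -> Dict[str, int]:
    """Parse a meta section's contents into a {key: 64-bit value} table."""
    (count,) = struct.unpack_from("<Q", contents, 0)
    offset = 8
    table: Dict[str, int] = {}
    for _ in range(count):
        # Assumption: the key length is a 64-bit count of UTF-8 bytes.
        (key_len,) = struct.unpack_from("<Q", contents, offset)
        offset += 8
        key = contents[offset:offset + key_len].decode("utf-8")
        offset += key_len
        (value,) = struct.unpack_from("<Q", contents, offset)
        offset += 8
        table[key] = value
    return table
```

A VM could then read its start address with `parse_meta_section(contents)["entry"]`.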
# General TODO

* Interrupts
* MMIO regions
* Paging?