Use lrpar for parsing, big 'ol syntax overhaul

Signed-off-by: Alek Ratzloff <alekratz@gmail.com>
2020-02-17 16:15:06 -05:00
parent cf9ba376aa
commit 2c4b56e362
23 changed files with 1394 additions and 1494 deletions

443
vm.md

@@ -4,7 +4,7 @@ This is an outline of the VM that drives this language.
# Primitives
* Numbers may be big endian (BE) or little endian (LE) at the byte level. This guide will use LE.
* Numbers are little endian (LE) at the byte level.
* Addresses point to single bytes.
* Signed numbers use two's complement.
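As a minimal sketch of what these primitives imply for a 64-bit value, the helpers below encode and decode a signed number as little-endian bytes. The names `encode_i64` and `decode_i64` are illustrative only and do not come from the VM source.

```rust
/// Encode a signed 64-bit value as 8 little-endian bytes.
/// Rust's `i64` is two's complement, matching the rule above.
/// (Hypothetical helper name, used only for illustration.)
fn encode_i64(value: i64) -> [u8; 8] {
    value.to_le_bytes()
}

/// Decode 8 little-endian bytes back into a signed 64-bit value.
/// (Hypothetical helper name, used only for illustration.)
fn decode_i64(bytes: [u8; 8]) -> i64 {
    i64::from_le_bytes(bytes)
}
```

For example, `-2` is `0xFFFF_FFFF_FFFF_FFFE` in two's complement, so its least significant byte `0xFE` is stored first.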
@@ -23,10 +23,10 @@ CPU registers are addressed by a value between 0-63 (6 bits). All registers are
* SP - Stack pointer
* FP - Frame pointer
* FLAGS - CPU flags
* NULL - Always zero for reading and will never change after writing.
* (8 unused registers)
* STATUS - Generic status code
* R0-R49
* NIL - Always zero for reading and will never change after writing.
* R0-R31
* (26 unused registers)
## CPU Flags
@@ -42,14 +42,75 @@ CPU flags are addressed by bit index, going from right to left.
* Overwriting a register without its value being used
* Mixing arithmetic with bit twiddling on the same target
## Register ideas
* Other possible names: Z, NIL
# Instructions
Instructions attempt to be as small as possible while conforming to 8-bit, 16-bit, 32-bit, or 64-bit
alignment. All instructions have 16-bit opcodes.
All instructions have 16-bit opcodes. There are five types of instructions:
* Those whose operations require a source and a destination.
* Those whose operations require two sources.
* The destination of these instructions is implied by the instruction itself; e.g. the `CMPEQ`
instruction implicitly sets a bit in the `FLAGS` register.
* Those whose operations require a source, but no destination.
* Those whose operations require a destination, but no source.
* There aren't any of these instructions yet.
* Those whose operations require neither a source nor a destination.
Destinations may be:
* A 64-bit address pointing at a 64-bit or 8-bit value
* A 6-bit register
Sources may be one of:
* A 64-bit address pointing at a 64-bit or 8-bit value
* A 6-bit register
* A 64-bit immediate value
Counting all source and destination value sizes as their own configuration, there are:
* 3 possible destination types
* 4 possible source types
Instructions have different layouts depending on whether their operations take a source and/or
destination. For example, the `ADD` instruction takes a source and a destination, the `JMP`
instruction takes a source, and the `NOP` instruction takes neither a source nor a destination.
Instructions that take neither a source nor a destination are exactly 16 bits long: just the
opcode. In all other instructions, the opcode is followed by a byte that determines the source
and/or destination.
An instruction that has a source and destination looks like this:
```
| XXXXXXXX | XXXXXXXX | DDDDSSSS | ...source and destination |
```
An instruction that has either a source or a destination (but not both) looks like this:
```
| XXXXXXXX | XXXXXXXX | YYYY0000 | ...source or destination |
```
An instruction that has neither a source nor a destination looks like this:
```
| XXXXXXXX | XXXXXXXX |
```
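The three layouts above could be assembled roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and the opcode is written little endian because the Primitives section says numbers are LE, although the byte order of the opcode itself is not stated here.

```rust
/// Pack a destination kind and a source kind into the `DDDDSSSS` byte.
/// For single-operand instructions (`YYYY0000`), pass 0 as `src`.
/// (Hypothetical helper, not taken from the VM source.)
fn spec_byte(dest: u8, src: u8) -> u8 {
    ((dest & 0x0F) << 4) | (src & 0x0F)
}

/// Emit the fixed part of an instruction: the 16-bit opcode, followed by
/// the source/destination byte when the instruction has operands.
/// Assumption: the opcode is stored little endian.
fn encode_header(opcode: u16, spec: Option<u8>) -> Vec<u8> {
    let mut out = opcode.to_le_bytes().to_vec();
    if let Some(byte) = spec {
        out.push(byte);
    }
    out
}
```

An instruction with no operands would call `encode_header(opcode, None)` and produce exactly two bytes; the operand payload itself would follow the bytes returned here.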
## Source/destination flags
| Bits | Source/destination |
| - | - |
| 0b0000 | Address (64 bit value) |
| 0b0001 | Address (32 bit value) |
| 0b0010 | Address (16 bit value) |
| 0b0011 | Address (8 bit value) |
| 0b0100 | 6-bit register |
| 0b0101 | Immediate (64 bits, source only) |
| 0b0110 | Immediate (32 bits, source only) |
| 0b0111 | Immediate (16 bits, source only) |
| 0b1000 | Immediate (8 bits, source only) |
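
One way to read the table above is as a decoder from a 4-bit field to an operand kind. The sketch below is illustrative: the `Operand` type and both function names are assumptions, and the bit patterns not listed in the table are treated as invalid.

```rust
/// Operand kinds from the source/destination flags table.
/// (Hypothetical type, used only for illustration.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Operand {
    /// A 64-bit address pointing at a value of the given width in bits.
    Address { value_bits: u8 },
    /// A 6-bit register index.
    Register,
    /// An immediate of the given width in bits (source only).
    Immediate { bits: u8 },
}

/// Decode one 4-bit flag field; unlisted patterns yield `None`.
fn decode_operand(nibble: u8) -> Option<Operand> {
    match nibble {
        0b0000 => Some(Operand::Address { value_bits: 64 }),
        0b0001 => Some(Operand::Address { value_bits: 32 }),
        0b0010 => Some(Operand::Address { value_bits: 16 }),
        0b0011 => Some(Operand::Address { value_bits: 8 }),
        0b0100 => Some(Operand::Register),
        0b0101 => Some(Operand::Immediate { bits: 64 }),
        0b0110 => Some(Operand::Immediate { bits: 32 }),
        0b0111 => Some(Operand::Immediate { bits: 16 }),
        0b1000 => Some(Operand::Immediate { bits: 8 }),
        _ => None,
    }
}

/// Immediates are marked "source only", so they cannot be destinations.
fn valid_destination(operand: Operand) -> bool {
    !matches!(operand, Operand::Immediate { .. })
}
```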
## Arithmetic
@@ -58,160 +119,43 @@ wrapping around to 0.
* Add
* Opcode: 0x0000
* **Params**: REG1, REG2
* `REG1 = REG1 + REG2`
* Unsigned addition
* ```
32 16 10 4 0
opcode reg1 reg2 unused
/ / / /
+-------------------------------------------+
| 0000000000000000 | ...... | ...... | XXXX |
+-------------------------------------------+
```
* Mul
* Params: Destination, source
* Sub
* Opcode: 0x0001
* **Params**: REG1, REG2
* `REG1 = REG1 * REG2`
* Unsigned multiplication
* ```
32 16 10 4 0
opcode reg1 reg2 unused
/ / / /
+-------------------------------------------+
| 0000000000000001 | ...... | ...... | XXXX |
+-------------------------------------------+
```
* Div
* Params: Destination, source
* Mul
* Opcode: 0x0002
* **Params**: REG1, REG2
* `REG1 = REG1 / REG2`
* Unsigned division
* ```
32 16 10 4 0
opcode reg1 reg2 unused
/ / / /
+-------------------------------------------+
| 0000000000000010 | ...... | ...... | XXXX |
+-------------------------------------------+
```
* Mod
* Params: Destination, source
* Div
* Opcode: 0x0003
* **Params**: REG1, REG2
* `REG1 = REG1 % REG2` (exact semantics TBD)
* ```
32 16 10 4 0
opcode reg1 reg2 unused
/ / / /
+-------------------------------------------+
| 0000000000000011 | ...... | ...... | XXXX |
+-------------------------------------------+
```
* INeg
* Params: Destination, source
* Mod
* Opcode: 0x0004
* **Params**: REG1
* `REG1 = REG1 * -1`
* Signed negative
* ```
32 16 10 0
opcode reg1 unused
/ / /
+----------------------------------------+
| 0000000000000100 | ...... | XXXXXXXXXX |
+----------------------------------------+
```
* Params: Destination, source
* And
* Opcode: 0x0005
* **Params**: REG1, REG2
* `REG1 = REG1 & REG2`
* ```
32 16 10 4 0
opcode reg1 reg2 unused
/ / / /
+-------------------------------------------+
| 0000000000000101 | ...... | ...... | XXXX |
+-------------------------------------------+
```
* Params: Destination, source
* Or
* Opcode: 0x0006
* **Params**: REG1, REG2
* `REG1 = REG1 | REG2`
* ```
32 16 10 4 0
opcode reg1 reg2 unused
/ / / /
+-------------------------------------------+
| 0000000000000110 | ...... | ...... | XXXX |
+-------------------------------------------+
```
* Inv
* Opcode: 0x0007
* **Params**: REG1
* `REG1 = ~REG1`
* ```
32 16 10 0
opcode reg1 unused
/ / /
+----------------------------------------+
| 0000000000000111 | ...... | XXXXXXXXXX |
+----------------------------------------+
```
* Not
* Opcode: 0x0008
* **Params**: REG1
* ```
if REG1 == 0 {
REG1 = 1;
} else {
REG1 = 0;
}
```
* Boolean NOT; equivalent of C's `!` unary operator
* ```
32 16 10 0
opcode reg1 unused
/ / /
+----------------------------------------+
| 0000000000001000 | ...... | XXXXXXXXXX |
+----------------------------------------+
```
* Params: Destination, source
* Xor
* Opcode: 0x0009
* **Params**: REG1, REG2
* `REG1 = REG1 ^ REG2`
* ```
32 16 10 4 0
opcode reg1 reg2 unused
/ / / /
+-------------------------------------------+
| 0000000000001001 | ...... | ...... | XXXX |
+-------------------------------------------+
```
* Opcode: 0x0007
* Params: Destination, source
* Shl
* Opcode: 0x000A
* **Params**: REG1, REG2
* `REG1 = REG1 << REG2`
* ```
32 16 10 4 0
opcode reg1 reg2 unused
/ / / /
+-------------------------------------------+
| 0000000000001010 | ...... | ...... | XXXX |
+-------------------------------------------+
```
* Opcode: 0x0008
* Params: Destination, source
* Shr
* Opcode: 0x000B
* **Params**: REG1, REG2
* `REG1 = REG1 >> REG2`
* Does not sign extend
* ```
32 16 10 4 0
opcode reg1 reg2 unused
/ / / /
+-------------------------------------------+
| 0000000000001011 | ...... | ...... | XXXX |
+-------------------------------------------+
```
* Opcode: 0x0009
* Params: Destination, source
* INeg
* Opcode: 0x000a
* Params: Destination, source
* Inv
* Opcode: 0x000b
* Params: Destination, source
* Not
* Opcode: 0x000c
* Params: Destination, source
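A few of the operations above, sketched on 64-bit values. The function names are hypothetical, and two behaviors are assumptions rather than something this outline states: shifting by 64 or more yields 0, and `Not` produces exactly 0 or 1.

```rust
/// Add: unsigned addition that wraps around to 0 on overflow.
fn add(dest: u64, src: u64) -> u64 {
    dest.wrapping_add(src)
}

/// INeg: signed (two's complement) negation of a 64-bit value.
fn ineg(src: u64) -> u64 {
    (src as i64).wrapping_neg() as u64
}

/// Inv: bitwise inversion.
fn inv(src: u64) -> u64 {
    !src
}

/// Not: boolean NOT, like C's `!` operator.
fn not(src: u64) -> u64 {
    if src == 0 { 1 } else { 0 }
}

/// Shr: logical right shift with no sign extension.
/// Assumption: a shift amount of 64 or more gives 0.
fn shr(dest: u64, amount: u64) -> u64 {
    if amount >= 64 { 0 } else { dest >> amount }
}
```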
### TODO
@@ -223,196 +167,33 @@ wrapping around to 0.
* CmpEq
* Opcode: 0x1000
* **Params**: REG1, REG2
* ```
if REG1 == REG2 {
FLAGS[1] = 1;
} else {
FLAGS[1] = 0;
}
```
* Sets the COMPARE flag to 1 if REG1 == REG2
* ```
32 16 10 4 0
opcode reg1 reg2 unused
/ / / /
+-------------------------------------------+
| 0001000000000000 | ...... | ...... | XXXX |
+-------------------------------------------+
```
* Params: Source, source
* CmpLt
* Opcode: 0x1001
* **Params**: REG1, REG2
* ```
if REG1 < REG2 {
FLAGS[1] = 1;
} else {
FLAGS[1] = 0;
}
```
* Sets the COMPARE flag to 1 if REG1 < REG2
* ```
32 16 10 4 0
opcode reg1 reg2 unused
/ / / /
+-------------------------------------------+
| 0001000000000001 | ...... | ...... | XXXX |
+-------------------------------------------+
```
* Params: Source, source
* Jmp
* Opcode: 0x1100
* **Params**: REG1
* `IP = REG1;`
* Jumps to the address in REG1 unconditionally.
* ```
32 16 10 0
opcode reg1 unused
/ / /
+----------------------------------------+
| 0001000100000000 | ...... | XXXXXXXXXX |
+----------------------------------------+
```
* Jz
* Opcode: 0x1101
* **Params**: REG1
* ```
if FLAGS[1] == 0 {
IP = REG1;
}
```
* Jumps to the address in REG1 if COMPARE flag is 0.
* ```
32 16 10 0
opcode reg1 unused
/ / /
+----------------------------------------+
| 0001000100000001 | ...... | XXXXXXXXXX |
+----------------------------------------+
```
* Jnz
* Opcode: 0x1002
* **Params**: REG1
* ```
if FLAGS[1] != 0 {
IP = REG1;
}
```
* Jumps to the address in REG1 if COMPARE flag is 1.
* ```
32 16 10 0
opcode reg1 unused
/ / /
+----------------------------------------+
| 0001000100000010 | ...... | XXXXXXXXXX |
+----------------------------------------+
```
* Params: Source
* Jz
* Opcode: 0x1003
* Params: Source
* Jnz
* Opcode: 0x1004
* Params: Source
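The comparison and jump instructions interact through the COMPARE flag. The sketch below assumes the flag is bit 1 of `FLAGS` (as in the `FLAGS[1]` pseudocode) and that `CmpLt` compares unsigned values; the `Cpu` struct and its method names are hypothetical.

```rust
/// Assumption: COMPARE is bit 1 of FLAGS, counting from the right.
const COMPARE_BIT: u64 = 1 << 1;

/// A tiny stand-in for the CPU state touched by these instructions.
struct Cpu {
    ip: u64,
    flags: u64,
}

impl Cpu {
    fn set_compare(&mut self, value: bool) {
        if value {
            self.flags |= COMPARE_BIT;
        } else {
            self.flags &= !COMPARE_BIT;
        }
    }

    /// CmpEq: set COMPARE when both sources are equal.
    fn cmp_eq(&mut self, a: u64, b: u64) {
        self.set_compare(a == b);
    }

    /// CmpLt: set COMPARE when `a < b` (assumed unsigned).
    fn cmp_lt(&mut self, a: u64, b: u64) {
        self.set_compare(a < b);
    }

    /// Jz: jump when COMPARE is 0.
    fn jz(&mut self, target: u64) {
        if (self.flags & COMPARE_BIT) == 0 {
            self.ip = target;
        }
    }

    /// Jnz: jump when COMPARE is 1.
    fn jnz(&mut self, target: u64) {
        if (self.flags & COMPARE_BIT) != 0 {
            self.ip = target;
        }
    }
}
```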
## Data movement
* Load
* Mov
* Opcode: 0x2000
* **Params**: REG1, REG2
* ```
REG1 = MEM[REG2];
```
* Sets REG1 to the value at the memory address in REG2.
* ```
32 16 10 4 0
opcode reg1 reg2 unused
/ / / /
+-------------------------------------------+
| 0010000000000000 | ...... | ...... | XXXX |
+-------------------------------------------+
```
* RegCopy
* Opcode: 0x2001
* **Params**: REG1, REG2
* `REG1 = REG2`
* Copies the value in REG2 into REG1.
* ```
32 16 10 4 0
opcode reg1 reg2 unused
/ / / /
+-------------------------------------------+
| 0010000000000001 | REG1.. | REG2.. | XXXX |
+-------------------------------------------+
```
* StoreImm64
* Opcode: 0x2100
* **Params**: REG1, IMM_64
* `REG1 = IMM_64`
* Sets REG1 to the specified 64-bit number.
* StoreImm32
* Opcode: 0x2101
* **Params**: REG1, IMM_32
* `REG1 = IMM_32`
* Sets REG1 to the specified 32-bit number.
* ```
64 48 42 36 32 0
opcode reg1 reg2 unused
/ / / / immediate 32 bit value
/ / / / /
+------------------------------------------------------------------------------+
| 0010000100000001 | REG1.. | REG2.. | XXXX | IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII |
+------------------------------------------------------------------------------+
```
* MemCopy
* Opcode: 0x2200
* **Params**: REG1, REG2
* `MEM[REG1] = MEM[REG2]`
* Copies the value at the memory address in REG2 to the memory address in REG1.
* ```
32 16 10 4 0
opcode reg1 reg2 unused
/ / / /
+-------------------------------------------+
| 0010001000000000 | REG1.. | REG2.. | XXXX |
+-------------------------------------------+
```
* Store
* Opcode: 0x2201
* **Params**: REG1, REG2
* ```
MEM[REG2] = REG1;
```
* Sets the value at the memory address in REG2 to the value in REG1.
* ```
32 16 10 4 0
opcode reg1 reg2 unused
/ / / /
+-------------------------------------------+
| 0010001000000001 | REG1.. | REG2.. | XXXX |
+-------------------------------------------+
```
## Miscellaneous
* Halt
* Opcode: 0xF000
* **Params**: (none)
* `FLAGS[0] = 1`
* Halts the machine
* ```
16
opcode
/
+------------------+
| 1111000000000000 |
+------------------+
```
* Nop
* Opcode: 0xF001
* **Params**: (none)
* Does nothing
* ```
16
opcode
/
+------------------+
| 1111000000000001 |
+------------------+
```
* Dump
* Opcode: 0xF002
## Other instructions TODO
@@ -423,8 +204,6 @@ wrapping around to 0.
* Uses FP to determine previous SP, FP, and IP and restores them
* Push
* Pop
* More immediate stores?
* Idea: Store42 (or whatever number of bits) that maximizes the usage of a 64-bit instruction
# Binary object format
@@ -435,7 +214,7 @@ the object.
The header is composed of:
* 64 bits - A magic number (0xDEAD_BEA7_BA5E_BA11).
* 64 bits - A magic number (0xDEAD\_BEA7\_BA5E\_BA11).
* 32 bits - Version of the file
* 32 bits - The number of sections in the file
* section descriptions detailed below
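Reading the fixed part of this header might look like the sketch below. It assumes the header fields are stored little endian, like other numbers in this guide; the `Header` struct and `parse_header` function are hypothetical names, and the section descriptions that follow the header are not parsed.

```rust
use std::convert::TryInto;

/// Magic number from the header description.
const MAGIC: u64 = 0xDEAD_BEA7_BA5E_BA11;

/// Fixed-size header fields. (Hypothetical type, for illustration.)
#[derive(Debug, PartialEq, Eq)]
struct Header {
    version: u32,
    section_count: u32,
}

/// Parse the 16-byte fixed header: 64-bit magic, 32-bit version,
/// 32-bit section count. Assumption: all fields are little endian.
fn parse_header(bytes: &[u8]) -> Option<Header> {
    if bytes.len() < 16 {
        return None;
    }
    let magic = u64::from_le_bytes(bytes[0..8].try_into().ok()?);
    if magic != MAGIC {
        return None;
    }
    let version = u32::from_le_bytes(bytes[8..12].try_into().ok()?);
    let section_count = u32::from_le_bytes(bytes[12..16].try_into().ok()?);
    Some(Header { version, section_count })
}
```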
@@ -458,7 +237,7 @@ the section contents.
The data section contains static data that is initialized to some known value.
* 64 bits - section load start - where in memory the content of this section begins
* 64 bits - section load end - where in memory the content of this section ends
* 64 bits - section length - how long the memory content is
### Code section