Appendix 4 Instruction Set Architecture
ISA Overview
The SD-8516 has core, extended, and CISC instructions.
- CORE operates like a RISC instruction set. It is designed to be completely sufficient while remaining small. This is what you should target first.
- EXTENDED are quality of life instructions. A great example is ADD register, immediate. You do not need this instruction; you can load values into registers and add them. But we provide ADD register, immediate for quality of life. Technically, MUL is a quality of life instruction as well, since you can multiply by looping with ADD.
- CISC instructions are special instructions, usually VAX-isms, designed to be quality of life for assembly language programmers.
Core instruction set
36 RISC-style instructions. A small core.
| 00 | LD_IMM | LDA $5 | Load register with immediate value | |
| 01 | LD_MEM | LDA [$5] | Load register from memory location | |
| 02 | LD_REG | LDA [X] | Load register from memory location using register as pointer | |
| 05 | ST_MEM | STA [$10] | Store register in memory location | |
| 06 | ST_REG | STA [X] | Store A in memory location using register as pointer | |
| 09 | MOV | MOV Y, A | Copy A up on Y | |
| 0B | PUSH | PUSH A | Push A onto stack – ye scurvy dog! | |
| 0C | POP | POP Y | Pull stack value an' hand it over to register | |
| 0F | PUSHF | PUSHF | Push flags | |
| 10 | POPF | POPF | Pop flags back | |
| 15 | INC | INC X | Increment register by 1 (any size: byte/word/etc.) | |
| 16 | DEC | DEC Y | Decrement register by 1 (any size) | |
| 1E | ADD | ADD X, Y | Add X = X + Y | |
| 1F | SUB | SUB X, Y | Subtract X = X - Y | |
| 32 | AND | AND dst, src | Bitwise AND (compare two integers bit by bit) | |
| 33 | OR | OR dst, src | Bitwise OR (same) | |
| 34 | XOR | XOR dst, src | Bitwise XOR | |
| 35 | NOT | NOT reg | Invert all bits in an integer word | |
| 46 | SHL | SHL A | Shift left | Z, N, C |
| 47 | SHR | SHR A | Shift right | Z, N, C |
| 5A | CMP | CMP A, B | Compare (subtract, discard result) | Z, N, C, V |
| 5B | CMP_IMM | CMP A, 0x0001 | Compare (subtract, discard result) | Z, N, C, V |
| 64 | JMP | JMP @label | Unconditional jump | None |
| 65 | JZ | JZ @label | Jump if zero | None |
| 66 | JNZ | JNZ @label | Jump if not zero | None |
| 67 | JC | JC @label | Jump if carry set | None |
| 68 | JNC | JNC @label | Jump if carry clear | None |
| 84 | CALL | CALL @label | Call subroutine (push IP, jump) | |
| 85 | RET | RET | Return from subroutine (pop IP) | |
| 86 | INT | INT 0x10 | Software interrupt | |
| 87 | RTI | RTI | Return from Interrupt | |
| B6 | SETF | SETF 0x80 | Set bits in flags | * |
| B7 | CLRF | CLRF 0x80 | Clear bits in flags | * |
| B8 | TESTF | TESTF 0x80 | Non-destructive AND | Z C |
| FE | NOP | NOP | No operation | |
| FF | HALT | HALT | Halt CPU (sets HALT flag) | H |
Core-2 instructions
29 instructions. Some instructions which are not strictly RISC but still considered core.
| 0A | XCHG | XCHG X, Y | Swap dem two – X an Y trade place, quick quick | |
| 20 | MUL | MUL X, Y | Multiply X = X * Y | |
| 21 | DIV | DIV X, Y | Divide X = X / Y | |
| 22 | MOD | MOD X, Y | Modulo X = X % Y | |
| 23 | ADD_REG_IMM | ADD X, $1234 | Add immediate word value; X = X + immediate | |
| 24 | SUB_REG_IMM | SUB X, $ABCD | Subtract immediate; X = X - immediate | |
| 25 | MUL_REG_IMM | MUL X, $100 | Multiply by immediate; X:Y = X * immediate | |
| 26 | DIV_REG_IMM | DIV X, $10 | Divide by immediate; X = quotient, Y = remainder | |
| 27 | MOD_REG_IMM | MOD X, $FF | Modulo by immediate; X = X % immediate | |
| 28 | ADDC | ADDC X, Y | Add with carry; X = X + Y + carry flag | |
| 29 | SUBC | SUBC X, Y | Subtract with borrow; X = X - Y - borrow | |
| 30 | ADDC_REG_IMM | ADDC X, $5 | Add imm w/carry; X = X + imm + carry | |
| 31 | SUBC_REG_IMM | SUBC X, $1 | Subtract w/carry X = X - imm - borrow | |
| 37 | AND_IMM | AND dst, imm | Bitwise AND with immediate | |
| 38 | OR_IMM | OR dst, imm | Bitwise OR with immediate | |
export const JN:u8 = 103; // JN addr
export const JNN:u8 = 104; // JNN addr
export const JO:u8 = 107; // JO addr
export const JNO:u8 = 108; // JNO addr
Extended instruction set
50 instructions. Some of these are more extended than others. For example, PUSHA and POPA are very useful because they are very fast. If you have to push or pop more than 4 registers at once you can consider PUSHA.
| 03 | LD_REG24 | LDA [X:Y] | Load register from memory location using [low_byte:word] | |
| 04 | LD_IMM24 | LDA [$1:$C000] | Load byte from memory location [bank:addr] | |
| 07 | ST_REG24 | STA [X:Y] | Store A in memory using [low_byte:word] registers. | |
| 08 | ST_IMM24 | STA [$0:$A0] | Store register in memory location [bank:addr] | |
| 0D | PUSHA | PUSHA | Save all registers in ye treasure chest | |
| 0E | POPA | POPA | Get the registers back | |
PUSH2 = 17 // push 2
POP2 = 18 // pop 2
PUSH3 = 19 // push 3
POP3 = 20 // pop 3
PUSH4 = 21 // push 4
POP4 = 22 // pop 4
ST_PD 25 // pre-dec store. STA [-, BLX]
LD_FS 26 // load+forward step. LDA [BLX, +]
ST_FS 27 // store+forward step. STA [BLX, +]
| 36 | TEST | TEST dst, src | Non-destructive AND | |
| 48 | SHLC | SHLC A | Shift left | Z, N, C |
| 49 | SHRC | SHRC A | Shift right | Z, N, C |
| 4C | ROLC | ROLC A | Rotate left | Z, N, C |
| 4D | RORC | RORC A | Rotate right | Z, N, C |
JMPR = 109 // JMPR reg
JZR = 110 // JZR reg
JNZR = 111 // JNZR reg
JNR = 112 // JNR reg
JNNR = 113 // JNNR reg
JCR = 114 // JCR reg
JNCR = 115 // JNCR reg
JOR = 116 // JOR reg
JNOR = 117 // JNOR reg
| A4 | SEE | SEE | Set Extra Flag (User flag) | E |
| A5 | SEF | SEF | Set Free Flag (User flag) | F |
| A6 | SEB | SEB | Set Bonus Flag (User flag) | B |
| A7 | SEU | SEU | Set User Flag (User flag) | U |
| A8 | SED | SED | Set Debug Flag | D |
| A9 | SEI | SEI | Set Enable Interrupt | I |
| AA | SSI | SSI | Enable Sound System Interrupts | S |
| AF | CLE | CLE | Clear E | E |
| B0 | CLF | CLF | Clear F Flag (user flag) | F |
| B1 | CLB | CLB | Clear B Flag (user flag) | B |
| B2 | CLU | CLU | Clear U Flag (user flag) | U |
| B3 | CLD | CLD | Clear Debug Flag | D |
| B4 | CLI | CLI | Clear Interrupt Flag | I |
| B5 | CSI | CSI | Clear Sound Interrupt | S |
LD_IDXI 210 // indexed immediate: LDreg, [ptr + imm] signed byte immediate, -128 to +127
LD_IDXR 211 // indexed register: LDreg, [ptr + reg] register offset
ST_IDXI 212 // ST [ptr + imm], reg
ST_IDXR 213 // ST [ptr + reg], reg
| FB | CNOP | CNOP | Instruction Count / Timer | |
| FC | YIELD | YIELD | Yield thread priority | Y |
| FD | BREAK | BREAK | Reserved | |
CISC instruction set
24 instructions. CISC style, usually inspired by VAX, 680x0, or other CISC-leaning processors.
| 8C | MEMCOPY | MEMCOPY src, dst, n | Copy memory from ptr to ptr. | |
| 8D | SCAN | SCAN src, reg | Scan ptr src for needle reg | |
| 8E | CMPC3 | CMPC3 ELM, FLD, C | Compare Characters | Z C |
| 8F | SKPC | SKPC ELM, AL | Skip characters | |
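| 90 | SKPC_IMM | SKPC ELM, $20 | Skip characters (immediate) | |
| 91 | INSQUE | INSQUE REG, REG | Insert entry into queue | |
| 92 | INSQUE_PTR | INSQUE REG, [REG] | Insert entry into queue (pointer form) | |
| 93 | REMQUE | REMQUE | Remove entry from queue | Z |
| 94 | SCANQUE | SCANQUE H, N, O | Scan queue for needle at field offset (offset in register) | |
| 95 | SCANQUE_IMM | SCANQUE H, N, O | Scan queue for needle at field offset (immediate byte offset) | |
| 98 | PAB | PAB | Pack low 4 bits of AL and low 4 bits of BL into AL | |
| 99 | UAB | UAB | Unpack AL into low 4 bits of AL and low 4 bits of BL | |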
| C8 | CASE | CASE AL | Jumptable instruction. Will CALL the address in IP+(reg*3) | |
| C9 | CASE3 | CASE3 ELM, AL, AH | VAX-style case. Takes base, selector, limit. | |
| CA | CASEB | CASEB ELM, AL, AH | Case-on-byte. Takes base, selector, numrec. | |
| CB | CVTAN | CVTAN AL | Zone converter (ASCII 0-Z to number) | |
| CC | CVTNA | CVTNA AL | Zone converter (0-35 to ASCII '0' to 'Z') | |
The PPU extended commands are included at the CISC level:
| #220 | PPIXEL | PPIXEL | PPU draw_pixel | |
| #221 | PLINE | PLINE | PPU draw_line | |
| #222 | PRECT | PRECT | PPU draw_rect | |
| #223 | PRECTF | PRECTF | PPU draw_rect_filled | |
| #224 | PCIRC | PCIRC | PPU draw_circle | |
| #225 | PCIRCF | PCIRCF | PPU draw_circle_filled | |
| #226 | PCLEAR | PCLEAR | PPU clear_screen | |
| #227 | PBLIT | PBLIT | PPU draw tile | |
| #229 | PTIMER | PTIMER | PPU timer function | |
As are the Forth acceleration opcodes:
| F0 | LSTEPM | LSTEPM | Loop Step in-memory (see Dictionary for details) | Z |
| F1 | TTOS | TTOS | Test TOS. CD=AB, AB=[ELY], ELY+=4, ZF=(CD==0) | Z |
| F2 | LIT | LIT | push AB to stack, load next 4 bytes into AB. | |
| F3 | LSTEP | LSTEP A, @label | Loop Step direct (see Dictionary for details) | Z |
Dictionary
Here you will find information about each opcode.
$A0 SEZ
$A1 SEN
$A2 SEC
$A3 SEV
These set the main CPU flags: zero, negative, carry and overflow. You can use them for your own signalling if you are absolutely sure no intervening operation changes them. For example, carry is unaffected by POP and MOV, so some interrupts and routines use it to return an error (set) or success (clear) status, e.g. JC @error.
$A4 SEE
Sets Exception flag. The exception flag is sometimes used by interrupts or the system to indicate an error, but in the absence of anything weird you can use it yourself. Consider it an 'Extra' flag.
$A5 SEF
Sets F flag. The “Flag” flag. It's flag. The 'free' flag? Used for 'First-statement' in BASIC. Considered safe for machine language programmer use.
$A6 SEB
Sets B flag. The alternate flag. 'bonus' flag? Used for BREAK in BASIC. Considered safe for machine language programmer use.
$A7 SEU
Sets USER flag. User-facing flag, not used by the system. Considered reserved for programmer use.
$A8 SED
Sets debug. Prints trace messages when on. Slows system down a lot. Trace messages may be removed in some versions.
$A9 SEI
Enables interrupts (CLI turns them off). If off, INT won't fire. Probably useless.
$AA SSI
Sets the Sound Interrupt flag. Currently managed by the KERNAL, not really important for users. Probably useless and might be removed later. Considered 'semi-reserved'.
#140 $8C MEMCOPY src, dst, reg
Copies the number of bytes given in reg from src to dst. Handles over-writes. Very fast MMU operation. Useful for copying strings if you know the length; int12_strcpy is a tight LDA/STA/JNZ loop, but this is just one opcode.
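A minimal Python sketch of the copy semantics, assuming "handles over-writes" means that overlapping source and destination ranges are copied safely (as with C's memmove). The name memcopy and the flat bytearray memory are illustrative, not the emulator's actual code.

```python
def memcopy(mem: bytearray, src: int, dst: int, n: int) -> None:
    """Copy n bytes from src to dst; safe when the two ranges overlap."""
    # Snapshot the source first so an overlapping destination cannot
    # clobber bytes that have not been copied yet.
    chunk = bytes(mem[src:src + n])
    mem[dst:dst + n] = chunk


mem = bytearray(b"HELLO...")
memcopy(mem, 0, 2, 5)  # overlapping: dst starts inside the source range
```

With a naive forward byte loop the overlapping case would smear the first bytes across the destination; the snapshot avoids that.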
#141 $8D SCAN ELM, reg
Scan the bytes starting at ELM for the bytes in register reg. If register is 8-bit it will search for a byte. If it's a word, it will search for the word. Can be used to find targets for further keyword matching, as it can very quickly find a sequence of 1, 2, 3 or 4 bytes in memory. Can be used to calculate string length; scan for zero and compare pointer to string start.
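The string-length trick can be sketched in Python as below. The helper names scan and strlen are hypothetical, memory is modelled as a bytes object, and the source does not say what SCAN does when the needle is absent, so this sketch simply raises an error in that case.

```python
def scan(mem: bytes, start: int, needle: bytes) -> int:
    """Return the address of the first occurrence of needle at or after start."""
    pos = mem.find(needle, start)
    if pos < 0:
        # Not specified by the source; raising is this sketch's choice.
        raise ValueError("needle not found")
    return pos


def strlen(mem: bytes, start: int) -> int:
    """String length = (address of the terminating zero) - (string start)."""
    return scan(mem, start, b"\x00") - start


mem = b"\xAAHELLO\x00WORLD\x00"
hello_len = strlen(mem, 1)      # scan for the zero byte after "HELLO"
lo_addr = scan(mem, 0, b"LO")   # a word-sized (2-byte) needle
```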
#142 $8E CMPC3 ELM, FLD, C
Non-zero byte compare. Useful for comparing strings. Scans up to C characters. The total number of characters scanned is returned in C, so C will either index the first non-matching character or contain the string length. Sets ZERO on a match. If zero is not set, the strings don't match and Carry indicates whether the STRCMP result is -1 (not set) or +1 (set).
Note: CMPC3 allows early termination on byte_a == 0: This is the “begins with” / C-string semantics. If both strings reach a null terminator at the same position with all preceding bytes matching, the loop exits with matched=true (Z=1, C=1). Only byte_a is checked because by the time you reach this line, you've already proven byte_a == byte_b, so byte_a == 0 implies byte_b == 0. This makes CMPC3 do double duty; fixed-length compare and null-terminated strcmp in one instruction.
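The compare loop described above might look like the following Python model. This is a sketch under assumptions, not the emulator source: the function name cmpc3 is hypothetical, "+1" is taken to mean the first string's byte is larger, and a fixed-length match is assumed to report the same flags (Z=1, C=1) as a null-terminated match.

```python
def cmpc3(a: bytes, b: bytes, count: int):
    """Return (scanned, Z, C) for a compare of up to `count` bytes."""
    for i in range(count):
        byte_a, byte_b = a[i], b[i]
        if byte_a != byte_b:
            # Mismatch: Z clear; C set means a > b (+1), clear means a < b (-1).
            return i, False, byte_a > byte_b
        if byte_a == 0:
            # Both strings end here (byte_b == 0 is implied by equality).
            return i, True, True
    # All `count` bytes matched without reaching a terminator.
    return count, True, True
```

The single byte_a == 0 test is what lets one routine serve as both a fixed-length compare and a C-string compare.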
#143 $8F SKPC ELM, reg
Anti-scan. Scans until the character/word/etc. is not found. If a word width or wider register is used, will skip by needle width. Ends with ELM pointing to the first non-matching character. Most of the time this is used to skip spaces: SKPC ELM, $20 ; skip spaces, ELM points after last space.
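As a rough Python model of the skip behaviour (the name skpc and the bytes-based memory are assumptions of this sketch), the pointer advances in needle-width steps for as long as memory still matches:

```python
def skpc(mem: bytes, start: int, needle: bytes) -> int:
    """Advance from start in needle-width steps while memory equals needle."""
    width = len(needle)
    pos = start
    while mem[pos:pos + width] == needle:
        pos += width
    return pos  # first position that does not match


line = b"   PRINT"
after_spaces = skpc(line, 0, b" ")   # byte needle, like SKPC ELM, $20
```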
#147 $93 REMQUE
REMQUE's Z-flag semantic is worth a kernal-side note. The classic VAX semantic for REMQUE used the V flag for “queue was already empty before the call.” SD-8516's REMQUE uses Z for “queue is now empty after the call”, a different question entirely. Both are useful signals, but they're not equivalent.
- VAX V=1 “you tried to remove from an empty queue” (error condition)
- SD-8516 Z=1 “you successfully removed the last item; the queue is now empty” (state condition)
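The difference between the two signals can be illustrated with a small Python model. Both function names are hypothetical, the queue is modelled as a deque with removal from the head, and the SD-8516 behaviour on an already-empty queue is not described in the source, so the first function is only meaningful for a non-empty queue.

```python
from collections import deque


def remque_sd8516(queue: deque):
    """Remove the head entry; Z reports whether the queue is empty afterwards."""
    entry = queue.popleft()
    z_flag = len(queue) == 0
    return entry, z_flag


def remque_vax(queue: deque):
    """VAX-style: V reports that the queue was already empty before the call."""
    if not queue:
        return None, True
    return queue.popleft(), False


q = deque(["job1", "job2"])
first = remque_sd8516(q)    # Z clear: one entry still queued
second = remque_sd8516(q)   # Z set: that was the last entry
```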
#148 $94 SCANQUE H, N, O
Both SCANQUE forms take three operands; head, needle and offset. SCANQUE_IMM treats offset as an IMM (byte). SCANQUE takes the offset in any register (allowing for larger offsets than +255).
SCANQUE. One-upping the VAX. SCANQUE combines queue traversal with a structured-field match in one fetch-execute cycle, which is the inner loop of every lookup operation in a hash chain, free list, or PCB table.
The split between SCANQUE (register offset) and SCANQUE_IMM (immediate offset) is the pragmatic call: most of the time you know the offset (you are writing CRUD for a struct). However, having the register form means you can write generic kernel-level queue helpers that take “field offset” as a parameter without recompiling. It's “one of those” tradeoffs. 99% of the time you use IMM, but the general form is the REG version.
#152 $98 PAB
Pack the low 4 bits of AL and BL into AL. This is useful for BCD and 4bpp video mode.
Start: [ AL ][ BL ]
[....llll][....hhhh]
End: [ AL ][ BL ]
[hhhhllll][....hhhh]
#153 $99 UAB
Unpacks the 8 bits in AL into AL and BL, zeroing the four high bits of both. This is useful for BCD and 4bpp video mode.
Start: [ AL ][ BL ]
[hhhhllll][........]
End: [ AL ][ BL ]
[0000llll][0000hhhh]
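The two diagrams translate directly into bit operations. The Python functions below (names chosen for this sketch) model AL and BL as plain 8-bit integers and return the new register values as a tuple.

```python
def pab(al: int, bl: int):
    """Pack: AL's low nibble stays low, BL's low nibble becomes AL's high nibble."""
    packed = ((bl & 0x0F) << 4) | (al & 0x0F)
    return packed, bl  # BL is left unchanged, as in the diagram


def uab(al: int):
    """Unpack: low nibble stays in AL, high nibble moves to BL; high bits zeroed."""
    return al & 0x0F, (al >> 4) & 0x0F


packed_al, _ = pab(0x07, 0x03)   # BCD digits 7 (low) and 3 (high)
low, high = uab(packed_al)       # and back again
```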
#200 $C8 CASE selector, #limit
switch-case. Index an address from selector and CALL to it. Limit is an immediate value (0-255) that represents the length of the table. If selector > limit it will silently fall-through (not CALL). If you need to detect whether or not the CALL occurred it is suggested that the handlers produce an error code (the 'default' of which is that no handler was called). Table format: [addr][addr][addr]…
CASE is unusual because it reads from the instruction stream as data. The inline jump table isn't fetched and decoded, it's just structured bytes. This is the same pattern as LD_IMM (where the operand is fetched), but pushed further: an opcode-controlled variable-size payload. The bookkeeping all comes down to two derived addresses:
- IP + 3 * sel: the table slot containing the jump target. Read with read_24bit.
- IP + 3 * lim + 3: the first byte after the table. Used as the return point (and as the IP destination on OOB).
Both are masked to 24 bits because IP arithmetic in a 24-bit address space wraps.
OOB sets two flags and falls through (doesn't trap). VAX-style CASE would raise an exception. SD-8516's CASE sets V and ER but lets execution continue at the post-table point. That's a different policy: the error is recoverable, and software decides whether to handle the case. If the caller wants trap-on-OOB behavior, it's a follow-up JV some_handler after the CASE.
Flags otherwise untouched. CASE only writes V and ER. Z, N, C, etc. survive across the instruction, which is convenient if the selector was computed by an arithmetic op whose flags you still want.
Note: A single entry CASE can be used as a CALLZ. CALL if register is zero, otherwise not.
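The address bookkeeping for CASE can be sketched in Python as follows. This is a model under assumptions: ip is taken to be the address of the first table slot, read_24bit is passed in as a callable, table entries are assumed to be little-endian, and the returned flag dictionary lists only the flags the instruction writes.

```python
MASK24 = 0xFFFFFF


def case_dispatch(ip: int, selector: int, limit: int, read_24bit):
    """Return (call_target, return_ip, flags_written) for one CASE."""
    return_ip = (ip + 3 * limit + 3) & MASK24   # first byte after the table
    if selector > limit:
        # Out of bounds: no CALL, set V and ER, continue after the table.
        return None, return_ip, {"V": True, "ER": True}
    slot = (ip + 3 * selector) & MASK24         # table slot for this selector
    return read_24bit(slot), return_ip, {}


# Example memory with a three-entry table at $100 (little-endian assumed).
mem = bytearray(0x200)
for i, target in enumerate((0x001000, 0x002000, 0x003000)):
    mem[0x100 + 3 * i:0x103 + 3 * i] = target.to_bytes(3, "little")


def reader(addr: int) -> int:
    return int.from_bytes(mem[addr:addr + 3], "little")


hit = case_dispatch(0x100, 1, 2, reader)
miss = case_dispatch(0x100, 3, 2, reader)
```

Both outcomes resume at the same post-table address, which is why a follow-up JV is enough to detect the out-of-bounds case.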
#201 $C9 CASE3 base, selector, limit
switch-case. Index an address from the table at base and CALL to it. Otherwise works like CASE, falling through whether or not the CALL occurred. Table format: [addr][addr][addr]…
#202 $CA CASEB base, selector, limit
switch-case-on-byte. Index an address from the table at base and jump to it. Works like CASE3 except that limit is not checked and the records are scanned instead of computed. This means you do not need to have a complete set of indexes to the table. Table format: [selector][addr][selector][addr][selector][addr]…
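A scanned table lookup of this kind might be modelled in Python as below. The record layout is an assumption of this sketch: one selector byte followed by a 3-byte little-endian address, numrec records in total. The name caseb and the None return for "no record matched" are illustrative.

```python
def caseb(mem: bytes, base: int, selector: int, numrec: int):
    """Scan numrec 4-byte records at base; return the matching address or None."""
    for i in range(numrec):
        rec = base + 4 * i
        if mem[rec] == selector:
            # Assumption: the three address bytes are little-endian.
            return int.from_bytes(mem[rec + 1:rec + 4], "little")
    return None  # no record matched: fall through


# Sparse table: only 'A' and 'Q' have handlers.
table = (bytes([ord("A")]) + (0x1000).to_bytes(3, "little")
         + bytes([ord("Q")]) + (0x2000).to_bytes(3, "little"))
q_handler = caseb(table, 0, ord("Q"), 2)
```

Because the selectors are stored in the records, the table only needs entries for the values that actually have handlers.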
#203 $CB CVTAN reg8
Convert ASCII to number. Will convert ASCII characters in the range '0-Z' into the numbers 0-35. Overflow is set if it is not a digit (0-9), and carry will be set if it is not a hexadecimal digit (0-15). This is also a fast way to test if something is a number; MOV reg8, AL and CVTAN reg8 can test for digits with JO/JNO (JV/JNV). This instruction is designed to work with zoned decimal but can also work with zoned hexadecimal (this essentially means numbers in strings).
#204 $CC CVTNA reg8
Convert number to ASCII. Will convert the number 0 to 35 in a register to the ASCII characters '0' to 'Z'. Overflow is set if it is not a digit (0-9) and Carry is set if it is not a hexadecimal digit (0-F). This is the inverse of CVTAN. This instruction is designed to work with zoned decimal but can also work with zoned hexadecimal (this essentially means numbers in strings).
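The conversion pair and its flag rules can be modelled in a few lines of Python. The function names and the (value, V, C) return shape are choices of this sketch; inputs outside '0'-'9' and 'A'-'Z' (or outside 0-35) raise an error here because the source does not describe what the instructions do with them.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def cvtan(ch: str):
    """ASCII '0'-'Z' -> (value 0-35, V, C)."""
    if len(ch) != 1 or ch not in DIGITS:
        raise ValueError("not in '0'-'9' or 'A'-'Z'")
    value = DIGITS.index(ch)
    # V: not a decimal digit. C: not a hexadecimal digit.
    return value, value > 9, value > 15


def cvtna(value: int):
    """Number 0-35 -> (ASCII character, V, C)."""
    if not 0 <= value <= 35:
        raise ValueError("value out of range 0-35")
    return DIGITS[value], value > 9, value > 15
```

A digit test then comes down to checking V after cvtan, and a hex-digit test to checking C.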
#220 $DC PPIXEL
Draw a pixel at X, Y with the color in C.
$F0 LSTEPM
Loop Step in-memory. A Forth acceleration opcode. First, it increments the 4-byte loop counter at [FLX]. Then it compares the value [FLX] to the 4-byte loop termination value at [FLX+4]. If the values match, it sets the Z flag. This provides the ability to create extremely tight loops, so long as you are willing to use an 8 byte loop counter and limit. For example;
LDC #0
LDD #1000
loop:
<do something>
INC C
CMP C, D
JNZ @loop
This is a tight loop that runs 1,000 times. Many people might try to remove the CMP and use of D by doing this:
LDC #1000
loop:
<do something>
DEC C
JNZ @loop
This is a tight loop, but this is even tighter:
LDFLX $F000 ; loop start
LDA #1
STA [FLX]
LDA #1000
STA [FLX+4]
loop:
<do something>
LSTEPM
JNZ @loop
The benefit of this is you can count from X to Y. It is not a DEC loop. And, it's done at [FLX] and [FLX+4]. This is mainly a kludge for Forth and has been generalized in LSTEP. If I can get Forth to use LSTEP I will remove LSTEPM.
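To make the counting behaviour concrete, here is a small Python model of LSTEPM as described above: increment the 4-byte counter at [FLX], compare it with the 4-byte limit at [FLX+4], and set Z on a match. The little-endian layout, the 32-bit wrap-around and the helper names are assumptions of this sketch.

```python
def lstepm(mem: bytearray, flx: int) -> bool:
    """Increment the 4-byte counter at [FLX]; return Z (counter == limit)."""
    # Assumption: 32-bit little-endian values that wrap on overflow.
    counter = (int.from_bytes(mem[flx:flx + 4], "little") + 1) & 0xFFFFFFFF
    mem[flx:flx + 4] = counter.to_bytes(4, "little")
    limit = int.from_bytes(mem[flx + 4:flx + 8], "little")
    return counter == limit


def count_iterations(start: int, limit: int) -> int:
    """Run `body; LSTEPM; JNZ loop` and count how often the body executes."""
    mem = bytearray(8)
    mem[0:4] = start.to_bytes(4, "little")
    mem[4:8] = limit.to_bytes(4, "little")
    iterations = 0
    while True:
        iterations += 1          # <do something>
        if lstepm(mem, 0):       # Z set: JNZ falls through, loop ends
            return iterations
```

Under this reading the body runs limit minus start times: a counter seeded with 1 and a limit of 1000 gives 999 passes, while seeding with 0 gives 1000.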
$F3 LSTEP reg, @addr
Loop Step. A generalized version of the Forth acceleration opcode. This opcode lets you create very tight loops:
LDC #1000 ; loop range
loop:
<do something>
LSTEP C, @loop
This is a decrement-and-JNZ in one opcode. It is heavily inspired by the VAX instruction SOB (subtract one and branch) and the 68000's DBcc.
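The decrement-and-branch behaviour can be checked with a short Python model. The 16-bit register width and the function names are assumptions of this sketch; the source only states that LSTEP is a decrement-and-JNZ in one opcode.

```python
def lstep(reg: int):
    """Decrement reg (16-bit wrap assumed); return (new_reg, branch_taken)."""
    reg = (reg - 1) & 0xFFFF
    return reg, reg != 0


def run_loop(count: int) -> int:
    """Run `body; LSTEP C, @loop` and return how many times the body ran."""
    c = count
    passes = 0
    while True:
        passes += 1               # <do something>
        c, taken = lstep(c)
        if not taken:             # C reached zero: fall out of the loop
            return passes
```

As with any decrement-then-test loop, starting with the register at zero would wrap around rather than skip the body, so the count should be at least 1.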
