SD-8516 PPU
- Introducing the XY-2000 PPU.
Introduction: PPIXEL (plot_pixel)
| Method | Pixels per Second |
|---|---|
| BASIC PIXEL command | 2,100 |
| INT 18h plot pixel | 26,000 |
| INT 18h plot pixel (no bounds check) | 30,000 |
| PPU via INT 0x18 | 90,000 |
| PPU via INT 0x03 (direct) | 220,000 |
| PPIXEL (PPU via opcode) | 460,000 |
Each figure is the number of pixels plotted per second while clearing the screen in a tight loop. This is essentially the fastest practical use of plot pixel: read X, Y and C data in a loop and plot it. If we unroll the loop 20 times, the loop overhead disappears, but the OP.PPIXEL version remains about 2x faster than the INT 3 version. This makes sense, since every call to INT 0x03 requires a LDA $0101 to access plot_pixel; the PPIXEL opcode gives this for free. Therefore PPIXEL is always at least twice as fast as INT 0x03 plot_pixel. It is actually a bit faster than that, since it does not cross the host-guest bridge inside the opcode call.
- However, if all you are doing is loading XYC data and calling INT $03, then theoretically their performance will be within 5%-10% of each other. INT touches memory more than PPIXEL does, but this amounts to an edge case (31 vs 30 active sprites) and is not worth the engineering headache.
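As a quick check on the "about 2x" claim, the ratios between the benchmark figures can be computed directly. The sketch below is only arithmetic over the numbers in the table; the `RATES` dictionary and the `speedup` helper are names made up for this illustration and are not part of the SD-8516 toolchain.

```python
# Throughput figures copied from the benchmark table (pixels per second).
RATES = {
    "BASIC PIXEL command": 2_100,
    "INT 18h plot pixel": 26_000,
    "INT 18h plot pixel (no bounds check)": 30_000,
    "PPU via INT 0x18": 90_000,
    "PPU via INT 0x03 (direct)": 220_000,
    "PPIXEL (PPU via opcode)": 460_000,
}


def speedup(fast: str, slow: str) -> float:
    """Return how many times faster method `fast` is than method `slow`."""
    return RATES[fast] / RATES[slow]


if __name__ == "__main__":
    # Print every method relative to the slowest one (BASIC).
    for name in RATES:
        ratio = speedup(name, "BASIC PIXEL command")
        print(f"{name}: {ratio:.1f}x the BASIC PIXEL command")
```

With these figures the opcode comes out a little over twice as fast as the direct INT 0x03 call, which matches the measured tight-loop result.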
460,000 pixels/second is about 7,667 pixels per frame at 60fps. That is almost exactly 30 16×16 sprites (7,680 pixels). The suggested use is to MEMCOPY a pre-drawn background into the frame-buffer and then draw sprites with PPIXEL (if you want to use PPIXEL for sprites). This way you don't have to draw the whole screen, and you don't have to draw the sprite background. It is hard to say how many sprites you will actually be able to draw in a frame, because the amount of empty background space varies from sprite to sprite.
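The per-frame budget above can be reproduced with a few lines of arithmetic. This is a rough sketch, not a measurement: the function names and the `fill` parameter (the fraction of a sprite's bounding box that is actually plotted) are assumptions introduced here to show how transparent background pixels stretch the budget.

```python
def pixels_per_frame(pixels_per_second: int, fps: int = 60) -> int:
    """Whole pixels that can be plotted in one frame at the given rate."""
    return pixels_per_second // fps


def sprites_per_frame(pixels_per_second: int, sprite_w: int = 16,
                      sprite_h: int = 16, fps: int = 60,
                      fill: float = 1.0) -> float:
    """Estimate how many sprites fit in one frame's pixel budget.

    `fill` is a hypothetical knob: 1.0 means every pixel of the sprite's
    bounding box is plotted, 0.5 means half of it is skipped as empty
    background.
    """
    plotted = sprite_w * sprite_h * fill
    return pixels_per_frame(pixels_per_second, fps) / plotted


if __name__ == "__main__":
    print(pixels_per_frame(460_000))              # per-frame pixel budget
    print(sprites_per_frame(460_000))             # fully opaque 16x16 sprites
    print(sprites_per_frame(460_000, fill=0.5))   # half-empty sprites
```

Fully opaque sprites land just under 30 per frame; sprites that are half empty roughly double that, which is why the real count is hard to state in advance.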
This initial test proves that a PPU construct has value, that it can serve as a drop-in boost to the code in INT 18h, that it should initially be implemented as a subcall of INT $03 (AH=$01), and that the practical value of moving to a dedicated opcode is 95% code density and 5% improved speed.
Code Replacement
PPIXEL can be called via INT 0x03:
LDA $0100 ; AH = 1 (PPU dispatch), AL = 00 (plot_pixel)
INT $03   ; plot_pixel(X, Y, C)
It replaces the old plot pixel function in the graphics library (AH $01 INT $18):
; ============================================================================
; AH=01h - Plot MODE 3 Pixel (4bpp packed nibbles)
; ============================================================================
; Input:  X = x coordinate (mode dependent)
;         Y = y coordinate (mode dependent)
;         C = color (0-15)
; Output: CF = 0 on success, 1 if out of bounds
; ============================================================================
int18_plot_pixel_4bpp:
    PUSHA
    ; Bounds check using SCREEN_WIDTH / SCREEN_HEIGHT
    LDA [@SCREEN_WIDTH]
    CMP X, A
    JC @plot4_error
    LDA [@SCREEN_HEIGHT]
    CMP Y, A
    JC @plot4_error
    PUSH X ; save x
    PUSH C ; save color
    LDA [@SCREEN_WIDTH]
    SHR A ; A = stride (width / 2)
    MOV B, A ; B = stride
    MOV A, Y ; A = y
    MUL A, B ; A = y * stride
    POP C ; restore color
    POP B ; restore x into B
    PUSH B ; save x for nibble check
    MOV T, B
    SHR T
    ADD A, T ; A = byte offset
    MOV J, A
    LDI #2
    POP A ; A = x
    LDTL $01
    AND AL, TL
    CMP AL, $00
    JNZ @plot4_odd
plot4_even:
    LDBL [I:J]
    MOV AL, BL
    UAB ; AL = odd pixel, BL = even pixel
    MOV BL, CL ; BL = new color
    PAB ; AL = (new color << 4) | odd pixel
    MOV BL, AL
    STBL [I:J]
    POPA
    CLC
    RET
plot4_odd:
    LDBL [I:J]
    MOV AL, BL
    UAB ; AL = odd pixel, BL = even pixel
    MOV AL, CL ; AL = new color
    PAB ; AL = (even pixel << 4) | new color
    MOV BL, AL
    STBL [I:J]
    POPA
    CLC
    RET
plot4_error:
    POPA
    SEC
    RET
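For readers who do not know the SD-8516 instruction set, the addressing and nibble packing in the routine above can be modelled in a few lines of Python. This is a sketch based only on the comments in the listing (stride = width / 2, even x in the high nibble, odd x in the low nibble). The function name, the flat `bytearray` framebuffer, the boolean return value standing in for the carry flag, and the masking of the color to 4 bits are all assumptions made for this illustration.

```python
def plot_pixel_4bpp(fb: bytearray, width: int, height: int,
                    x: int, y: int, color: int) -> bool:
    """Model of the MODE 3 plot: two 4-bit pixels packed per byte.

    Returns True on success (CF = 0 in the original) and False when the
    coordinate is out of bounds (CF = 1 in the original).
    """
    # Bounds check against the screen size, as in the assembly routine.
    if not (0 <= x < width and 0 <= y < height):
        return False

    stride = width >> 1                 # bytes per row (width / 2)
    offset = y * stride + (x >> 1)      # byte holding this pixel
    color &= 0x0F                       # assumption: keep only 4 bits

    if x & 1 == 0:
        # Even x: replace the high nibble, keep the odd (low) pixel.
        fb[offset] = (color << 4) | (fb[offset] & 0x0F)
    else:
        # Odd x: replace the low nibble, keep the even (high) pixel.
        fb[offset] = (fb[offset] & 0xF0) | color
    return True


if __name__ == "__main__":
    width, height = 4, 2
    fb = bytearray(width * height // 2)   # 4 bytes for a 4x2 screen
    plot_pixel_4bpp(fb, width, height, 0, 0, 0xA)
    plot_pixel_4bpp(fb, width, height, 1, 0, 0x5)
    print(fb.hex())
```

The model makes the read-modify-write explicit: each plot has to load the existing byte so the neighbouring pixel survives, which is the work the UAB/PAB pair does in the assembly version.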
