sd:sd-8516_ppu
Differences
| sd:sd-8516_ppu [2026/05/02 22:46] – appledog | sd:sd-8516_ppu [2026/05/18 16:08] (current) – appledog | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| = SD-8516 PPU | = SD-8516 PPU | ||
| - | This is a short reference to the SD-8516 PPU, known as the Akira Engine. The PPU is part of an arcade board called | + | This is a short reference to the graphics capabilities |
| - | The Akira Engine | + | The PPU is a proprietary 32-bit central processing unit (CPU) developed for the XY-85 arcade board. It was designed to deliver high-performance framebuffer access and blit hundreds of sprites |
| == Introduction | == Introduction | ||
| Line 9: | Line 9: | ||
| == Primitives | == Primitives | ||
| === PPIXEL (plot_pixel) | === PPIXEL (plot_pixel) | ||
| - | ^ Method ^ Pixels per Second | + | ^ Method ^ Pixels per second (WASM) |
| | BASIC PIXEL command | 2,100 | | | BASIC PIXEL command | 2,100 | | ||
| | INT 18h plot pixel | 26,000 | | | INT 18h plot pixel | 26,000 | | ||
| | INT 18h plot pixel (no bounds check) | 30,000 | | | INT 18h plot pixel (no bounds check) | 30,000 | | ||
| | PPU via INT 0x18 | 90,000 | | | PPU via INT 0x18 | 90,000 | | ||
| - | | PPU via INT 0x03 (direct) | 220,000 | | + | | PPU via INT 0x03 (direct) | 660,000 | |
| - | | PPIXEL (PPU via opcode) | 460,000 | | + | | PPIXEL (unrolled 1,000 times) | 1,300,000 | |
| - | The number shows the pixels per second | + | The numbers above show that the plot rate is bound by the CPU. At one instruction per cycle, 1.28 MIPS gets us 1.28 million pixels per second. However, once you start adding instructions that load data, process color and so on, the number can drop. A more realistic figure is closer to 300,000 pixels per second at 1 MIPS. Good enough for a bitmapped font, good enough for a couple of sprites, good enough |
| - | 460,000 pixels/ | + | How useful |
| - | This initial test proves that a PPU construct has value, that it can serve as a drop-in boost to the code in INT 18h, that it should be initially implemented as a subcall of AH=$01, INT $03, and that the move to a dedicated opcode' | + | Notably, the Apple ][ and VIC-20 had no PPU and relied on the 6502 for all graphics. Therefore, to approximate |
| - | ==== Code Replacement | + | === PLINE |
| - | PPIXEL can be called via INT 0x03: | + | The case for PLINE is the case for the Atari Digital Vector Generator chip used in Asteroids (1979) and many other games. |
| - | LDA $0100 ; AH = 1 (PPU dispatch), AL = 00 (plot_pixel) | ||
| - | INT $03 ; plot_pixel(X, | ||
| - | |||
| - | PPU plot_pixel replaces the old plot pixel function in the graphics library (function #1 in INT 18h). This is some of the oldest code in the entire system, likely carried over from the SD-8510/ | ||
| - | |||
| - | * [I:J] addressing mode | ||
| - | * Manual bank 2 register load (LDI #2) | ||
| - | * Register pressure PUSH X, PUSH C | ||
| - | * Overuse of LDA at start, then shows LDB before end | ||
| - | * Only uses A, | ||
| - | |||
| - | This was one of the first routines ever written for the original VC-2, as a test to draw characters to a terminal screen. I wrote it while I was constructing the CPU emulator itself; while writing this function I added the LDB, PAB and UAB opcodes. That's a good reason to keep this code around: it serves as the design document for the framebuffer and renderer itself. In other words, "it works." | ||
| - | |||
| - | <codify armasm> | ||
| - | ; ============================================================================ | ||
| - | ; AH=01h - Plot MODE 3 Pixel (4bpp packed nibbles) | ||
| - | ; ============================================================================ | ||
| - | ; Input: | ||
| - | ; Y = y coordinate (mode-dependent) | ||
| - | ; C = color (0-15) | ||
| - | ; Output: CF = 0 on success, 1 if out of bounds | ||
| - | ; ============================================================================ | ||
| - | int18_plot_pixel_4bpp: | ||
| - | PUSHA | ||
| - | |||
| - | ; Bounds check using SCREEN_WIDTH / SCREEN_HEIGHT | ||
| - | LDA [@SCREEN_WIDTH] | ||
| - | CMP X, A | ||
| - | JC @plot4_error | ||
| - | LDA [@SCREEN_HEIGHT] | ||
| - | CMP Y, A | ||
| - | JC @plot4_error | ||
| - | |||
| - | PUSH X ; save x | ||
| - | PUSH C ; save color | ||
| - | |||
| - | LDA [@SCREEN_WIDTH] | ||
| - | SHR A ; A = stride (width / 2) | ||
| - | MOV B, A ; B = stride | ||
| - | MOV A, Y ; A = y | ||
| - | MUL A, B ; A = y * stride | ||
| - | |||
| - | POP C ; restore color | ||
| - | POP B ; restore x into B | ||
| - | PUSH B ; save x for nibble check | ||
| - | MOV T, B | ||
| - | SHR T | ||
| - | ADD A, T ; A = byte offset | ||
| - | |||
| - | MOV J, A | ||
| - | LDI #2 | ||
| - | |||
| - | POP A ; A = x | ||
| - | LDTL $01 | ||
| - | AND AL, TL | ||
| - | CMP AL, $00 | ||
| - | JNZ @plot4_odd | ||
| - | |||
| - | plot4_even: | ||
| - | LDBL [I:J] | ||
| - | MOV AL, BL | ||
| - | UAB ; AL = odd pixel, BL = even pixel | ||
| - | MOV BL, CL ; BL = new color | ||
| - | PAB ; AL = (new color << 4) | odd pixel | ||
| - | MOV BL, AL | ||
| - | STBL [I:J] | ||
| - | POPA | ||
| - | CLC | ||
| - | RET | ||
| - | |||
| - | plot4_odd: | ||
| - | LDBL [I:J] | ||
| - | MOV AL, BL | ||
| - | UAB ; AL = odd pixel, BL = even pixel | ||
| - | MOV AL, CL ; AL = new color | ||
| - | PAB ; AL = (even pixel << 4) | new color | ||
| - | MOV BL, AL | ||
| - | STBL [I:J] | ||
| - | POPA | ||
| - | CLC | ||
| - | RET | ||
| - | |||
| - | plot4_error: | ||
| - | POPA | ||
| - | SEC | ||
| - | RET | ||
| - | </ | ||
| - | |||
| - | === PLINE | ||
| These figures are for screen clears using LINE in a Y=0 to 199 loop. | These figures are for screen clears using LINE in a Y=0 to 199 loop. | ||
| - | ^ Method ^ Pixels | + | ^ Method ^ Lines per Second ^ |
| - | | BASIC LINE command | 22,000 | | + | | BASIC LINE command | 1,000 | |
| - | | INT 18h draw line | 26,000 | | + | | INT 18h draw line | 2,500 | |
| - | | BASIC using PPU via INT 0x18 | 156,000 | | + | | BASIC using PPU via INT 0x18 | 15,000 | |
| - | | BASIC using PPU direct call | 165,000 | | + | | BASIC using PPU direct call | 16,000 | |
| - | | PPU via INT 0x18 | 15,000,000 | | + | | PPU via INT 0x18 | 150,000 | |
| - | | PPU via INT 0x03 (direct) | 55,000,000 | | + | | PPU via INT 0x03 (direct) | 500,000 | |
| - | | PLINE (PPU via opcode) | 77,000,000 | | + | | PLINE (PPU via opcode) | 700,000 | |
| - | As it turns out, the line drawing algorithm, even though it is in assembly, is the limiting factor. The BASIC interpreter might execute 50 to 100 lines of assembly to get to INT 18h, but INT 18h executes over 13,000 lines of assembly to draw a single horizontal line from 0,0 to 319,0. Thus replacing INT $18 with a PLINE opcode increased speed 7.5x, to a fill rate of 165,000 pixels per second. This implies 5-10 16x16 sprites with a smart draw algorithm. Of course, | + | As it turns out, the line drawing algorithm is also 1-cycle bound. Meaning, |
| - | The big win comes from the direct call in assembly. The speedup is equivalent | + | An example would be writing an ELITE clone. Clocking down to 0.35 MIPS and using the INT 0x03 interface, you would still be comfortably north of 30,000 lines per second |
| - | + | ||
| - | ==== Fast tiles and sprites in BASIC with PLINE | + | |
| - | If you have a smart drawing routine | + | |
| === PRECT and PFRECT | === PRECT and PFRECT | ||
| '' | '' | ||
| + | |||
| + | I will not comment much on these or on the next commands (PCIRCLE), except to say that they are used much less frequently but, as draw primitives go, they are lightning fast. | ||
| === PCIRCLE and PFCIRCLE | === PCIRCLE and PFCIRCLE | ||
| Line 144: | Line 54: | ||
| | large | 4500/sec | Art | | | large | 4500/sec | Art | | ||
| - | For any meaningful use case, this is probably good enough. At 100,000+ circles | + | Again, not much to say; there is a use for this in some games, but I expect |
| === PCLEAR | === PCLEAR | ||
| Line 150: | Line 60: | ||
| == Sprites | == Sprites | ||
| - | Sprites are arguably the reason for a PPU, and why it's called a PPU and not a 2D accelerator. | + | Sprites are arguably the reason for a PPU, and why it's called a PPU and not a 2D accelerator. Early consoles all had sprites, in contrast to microcomputers which didn' |
| - | * INT 18h LOAD and DRAW for 4bpp and 8bpp: Tested and working | + | * Atari 2600: 5 sprites (everything is a scanline) |
| + | * Intellivision: 8 sprites (no scanline limits) | ||
| + | * ColecoVision: 32 sprites (4 per scanline) | ||
| + | * NES: 64 sprites (8 per scanline) | ||
| + | * SNES: 128 sprites (32 per scanline) | ||
| - | The INT 18h DRAW function | + | Our INT 18h sprite functions are mode-aware (4bpp and 8bpp) but slow: INT 18h DRAW takes 1.3 ms to draw a single 16x16 sprite. |
| < | < | ||
| Line 182: | Line 96: | ||
| JMP @spritebench_loop | JMP @spritebench_loop | ||
| </ | </ | ||
| - | |||
| - | If you take out the spin-wait loop, the clear screen and the VSTEP instruction, | ||
| - | |||
| - | The conclusion: it works, but 1.3 ms per draw is essentially unusable. Even a ColecoVision or an NES had more sprite power than the SD-8516 -- //but not without a PPU!// The PPU provides hardware-accelerated sprites, and that's just what we need here. | ||
| === Wiring up the PPU | === Wiring up the PPU | ||
| Line 259: | Line 169: | ||
| </ | </ | ||
| - | This test attempts to draw a sprite to every valid screen location, from 0,0 to 303,183 -- a total of 55,936 times. Can you guess how fast it was? The routine completed in 0.45. That's right, calibre 0.45 -- 0.45 microseconds | + | This test attempts to draw a sprite to every valid screen location, from 0,0 to 303,183 -- a total of 55,936 times. Can you guess how fast it was? The routine completed in 0.45 microseconds. This is very fast, and very good! At this speed, we could draw 3600 sprites per frame. But this doesn' |
| === Syncing the Frames | === Syncing the Frames | ||
| Line 273: | Line 183: | ||
| * Up to 5600 sprites //per frame// at 30fps | * Up to 5600 sprites //per frame// at 30fps | ||
| - | Now, if you reserve 50% of your game for logic, that's 1024 sprites per frame at 60fps. The SNES could do 128 (but only 32 per scanline). This result places us firmly in super-high-end early 90s arcade board territory; the Sega Y Board (1988) was the first board to crack the 2000 barrier, while even later boards like the SNK Neo Geo (1990) were hardware-limited to 381 sprites. It was boards like this that enabled the superscalar arcade games of the 90s. A microcomputer with this kind of graphics power would have been the dream of every microcomputer aficionado in the 80s/90s. | + | //Note: The C version trebles these numbers, reaching over 7,000 sprites per frame without trying to optimize the benchmark loop.// |
| + | |||
| + | Now, if you reserve 50% of your game for logic, that' | ||
| //And don't knock 30fps; there is value in 30fps, too. Games like Teenage Mutant Ninja Turtles (NES, 1989), Ghosts 'n Goblins (NES, 1986) and Contra Force (NES, 1992) ran internally at 30fps. Even SNES games like Return of Double Dragon, and N64/PS1 games (e.g. Super Mario 64), ran at 30fps. Other famous games that ran at 30fps or lower include Ocarina of Time (N64, 1998), Soul Reaver (PS1, 1999), Resident Evil, Tomb Raider and GoldenEye 007. These were great, legendary games; sometimes 30fps is OK. 45fps is definitely OK.// | //And don't knock 30fps; there is value in 30fps, too. Games like Teenage Mutant Ninja Turtles (NES, 1989), Ghosts 'n Goblins (NES, 1986) and Contra Force (NES, 1992) ran internally at 30fps. Even SNES games like Return of Double Dragon, and N64/PS1 games (e.g. Super Mario 64), ran at 30fps. Other famous games that ran at 30fps or lower include Ocarina of Time (N64, 1998), Soul Reaver (PS1, 1999), Resident Evil, Tomb Raider and GoldenEye 007. These were great, legendary games; sometimes 30fps is OK. 45fps is definitely OK.// | ||
| Line 287: | Line 199: | ||
| == Conclusion | == Conclusion | ||
| A decent tile and sprite engine is the foundation of an 80s/90s console PPU and of a high-end arcade board. Extensive testing must be done before I can come back and add anything to this. But there are a few little things that may prove useful, e.g. flip transforms. As it stands, the PPU is considered ready for testing in the main system. | A decent tile and sprite engine is the foundation of an 80s/90s console PPU and of a high-end arcade board. Extensive testing must be done before I can come back and add anything to this. But there are a few little things that may prove useful, e.g. flip transforms. As it stands, the PPU is considered ready for testing in the main system. | ||
| + | |||
| + | ^ VC-4 ^ 2026 ^ 50,000+ ^ | ||
| + | | Sega Saturn (2D) | 1994 | ~5, | ||
| + | | PSX (2D mode) | 1994 | ~4,000 | | ||
| + | | Sega Y Board (Galaxy Force II) | 1988 | ~2,000 | | ||
| + | | 3DO | 1993 | ~1, | ||
| + | | Sega System 32 | 1990 | ~1,000 | | ||
| + | | Neo Geo MVS | 1990 | 381 | | ||
| + | | Capcom CPS-1 (Street Fighter II) | 1988 | 256 | | ||
| + | | Sega System 16 (Outrun) | 1986 | ~128 | | ||
| + | | NES | 1983 | 64 | | ||
| + | |||
| + | Nothing touches the VC-4. | ||
| + | |||
| + | |||
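
The 4bpp plot routine in the earlier revision (`int18_plot_pixel_4bpp`) can be modelled in C for readers who do not have the SD-8516 assembler at hand. This is a sketch under stated assumptions, not the system's code. The name `plot_pixel_4bpp` and its parameters are hypothetical. The framebuffer is assumed to be a plain byte array with a stride of width / 2. The nibble order (even x in the high nibble, odd x in the low nibble) is read off the PAB/UAB comments in the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of int18_plot_pixel_4bpp (not the SD-8516 source).
 * Two pixels per byte: even x in the high nibble, odd x in the low nibble,
 * as implied by the PAB/UAB comments in the assembly.
 * Returns 0 on success and 1 if (x, y) is off screen, mirroring CF. */
int plot_pixel_4bpp(uint8_t *fb, unsigned width, unsigned height,
                    unsigned x, unsigned y, uint8_t color)
{
    assert(fb != NULL);
    if (x >= width || y >= height)
        return 1;                                   /* CF = 1: out of bounds */

    unsigned stride = width / 2;                    /* bytes per row */
    uint8_t *p = &fb[(size_t)y * stride + x / 2];   /* byte holding the pixel */
    color &= 0x0F;                                  /* colors 0-15 only */

    if ((x & 1u) == 0)
        *p = (uint8_t)((color << 4) | (*p & 0x0F)); /* even x: replace high nibble */
    else
        *p = (uint8_t)((*p & 0xF0) | color);        /* odd x: replace low nibble */
    return 0;                                       /* CF = 0: success */
}
```

The even/odd branch corresponds to `plot4_even` and `plot4_odd`. The read-modify-write of a shared byte is why a packed 4bpp plot costs more than a single store.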
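
The PPIXEL discussion turns a throughput figure into a per-frame budget. The helpers below do that arithmetic. They are illustrative only and assume that every pixel costs the same and that nothing else runs during the frame. The function names are hypothetical.

```c
#include <assert.h>

/* Back-of-envelope frame budget (illustrative only).
 * How many operations of a measured per-second rate fit in one frame;
 * integer division rounds down. */
long per_frame(long per_second, int fps)
{
    assert(fps > 0);
    return per_second / fps;
}

/* How many w x h sprites one frame's pixel budget could cover, assuming
 * every pixel costs the same and nothing else runs in that frame. */
long sprites_per_frame(long pixels_per_second, int fps, int w, int h)
{
    assert(w > 0 && h > 0);
    return per_frame(pixels_per_second, fps) / ((long)w * h);
}
```

Here is how the figures from the text work out:

* 1,280,000 pixels per second at 60fps gives 21,333 pixels per frame.
* The more realistic 300,000 pixels per second gives 5,000 pixels per frame. That is enough for about 19 fully opaque 16x16 sprites (256 pixels each).
* 30,000 lines per second at 30fps is 1,000 lines per frame.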
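
The LINE benchmark clears the screen by drawing one 320-pixel horizontal line per row (Y = 0 to 199). A dedicated span fill can write two 4bpp pixels per byte store instead of plotting pixels one at a time. That is one reason a line primitive beats repeated plot calls. This is a sketch, not the PPU's implementation. `hline_4bpp` is a hypothetical name, and the sketch assumes the same packed layout as the plot routine: even x in the high nibble and an even screen width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical horizontal span fill on a 4bpp packed framebuffer
 * (even x = high nibble, width assumed even). Whole bytes are filled with
 * memset; only a partial byte at either end is read-modify-written.
 * The span is clipped to the screen. */
void hline_4bpp(uint8_t *fb, int width, int height,
                int x0, int x1, int y, uint8_t color)
{
    assert(fb != NULL);
    if (y < 0 || y >= height) return;              /* row off screen */
    if (x0 > x1) { int t = x0; x0 = x1; x1 = t; }
    if (x0 < 0) x0 = 0;
    if (x1 > width - 1) x1 = width - 1;
    if (x0 > x1) return;                           /* span entirely off screen */

    color &= 0x0F;
    uint8_t *row = fb + (size_t)y * (size_t)(width / 2);

    if (x0 & 1) {                                  /* odd start: low nibble only */
        row[x0 / 2] = (uint8_t)((row[x0 / 2] & 0xF0) | color);
        x0++;
    }
    if ((x1 & 1) == 0) {                           /* even end: high nibble only */
        row[x1 / 2] = (uint8_t)((color << 4) | (row[x1 / 2] & 0x0F));
        x1--;
    }
    if (x0 <= x1)                                  /* x0 even, x1 odd: whole bytes */
        memset(row + x0 / 2, (color << 4) | color, (size_t)(x1 - x0 + 1) / 2);
}
```

A full-screen clear in the style of the benchmark is simply `for (int y = 0; y < 200; y++) hline_4bpp(fb, 320, 200, 0, 319, y, c);`. Each 320-pixel row becomes a single 160-byte `memset`.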
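
PLINE also has to handle arbitrary endpoints, as in the ELITE example. The text does not say which algorithm the PPU uses. The sketch below substitutes the textbook integer Bresenham line, drawing into a hypothetical 8bpp buffer with one byte per pixel. `line_8bpp` and its parameters are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Integer Bresenham line (all octants) into a hypothetical 8bpp linear
 * framebuffer, one byte per pixel. Off-screen points are skipped, so
 * lines may start or end off screen. Returns the number of pixels written. */
int line_8bpp(uint8_t *fb, int width, int height,
              int x0, int y0, int x1, int y1, uint8_t color)
{
    assert(fb != NULL);
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;                 /* combined x/y error term */
    int written = 0;

    for (;;) {
        if (x0 >= 0 && x0 < width && y0 >= 0 && y0 < height) {
            fb[(size_t)y0 * (size_t)width + (size_t)x0] = color;
            written++;
        }
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }   /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }   /* step in y */
    }
    return written;
}
```

In software, the cost grows with line length because the loop runs once per pixel. That per-pixel loop is the kind of work the earlier revision describes INT 18h doing in assembly: over 13,000 lines of assembly for a single line from 0,0 to 319,0.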