sd:history

Differences

This shows you the differences between two versions of the page.

sd:history [2026/03/20 12:16] – created appledog
sd:history [2026/03/20 12:33] (current) appledog
Line 1: Line 1:
 = History
  
-<blockquote>In 1981, the C64 was not trying to be a C64, it was just trying to be a computer. Today, the C64 is trying to be a C64; it's a simulation -- an emulator. The C64 Ultimate is a simulator running on an AMD Xilinx Artix-7 FPGA. It's an official Commodore product, and it does a //very// good job of emulating a C64. If no one said anything, no one would know the difference.
+<blockquote>In 1981, the C64 was not trying to be a C64, it was just trying to be a computer. Today, the C64 is trying to be a C64; it's a simulation -- an emulator. The C64 Ultimate is a simulator running on an AMD Xilinx Artix-7 FPGA. It's an official Commodore product, and it does a //very good// job of emulating a C64. If no one said anything, no one would know the difference.
  
 The SD-8516 is also an emulator. An emulator for a computer that never existed. It's not trying to be a 6502, a Z80 or a 68000. It's just trying to be a computer.
Line 62: Line 62:
  
 == The Secret Sauce: A Fast MUL
-The SD-8516's legendary sub-eight cycle MUL is not based on the latest technology, but technology from yesteryear. It uses a lookup table to perform an 8bit argument to 16 bit result MUL in a single CPU cycle.
+The SD-8516's legendary //single-cycle MUL// is not based on the latest technology, but technology from yesteryear. It uses a lookup table to perform an 8bit argument to 16 bit result MUL in a single CPU cycle.
  
-The //Quarter-square lookup// uses the formula xy = (x^2 + y^2 - (x-y)^2) / in a squaring table of just 512 bytes. This allows it to do an 8x8 multiply with three lookups. One such implementation ran in 79 to 83 cycles on a 6502 -- much faster than shift-and-add. On a chip with an internal squaring ROM and a dedicated subtractor, this can be done in 3 to 4 cycles.
+The //Quarter-square lookup// uses the formula xy = ((x+y)^2 - (x-y)^2) / 4 in a squaring table of just 512 bytes. This allows it to do an 8x8 multiply with three lookups. One such implementation ran in 79 to 83 cycles on a 6502 -- much faster than shift-and-add. On a chip with an internal squaring ROM and a dedicated subtractor, this can be done in 3 to 4 cycles.
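
The quarter-square identity can be sketched in a few lines of Python. This is only an illustration, not the SD-8516's or any 6502 routine's actual layout: the names ''QUARTER_SQUARES'' and ''quarter_square_mul'' are made up here, and the table holds 511 precomputed values of floor(n^2 / 4) rather than the 512-byte table described in the text, so it needs two lookups and one subtraction.

```python
# Hypothetical table: floor(n^2 / 4) for n = 0..510, the largest possible
# value of x + y when both operands are unsigned 8-bit numbers.
QUARTER_SQUARES = [(n * n) // 4 for n in range(511)]


def quarter_square_mul(x: int, y: int) -> int:
    """Multiply two unsigned 8-bit values with table lookups and a subtraction.

    x + y and x - y always have the same parity, so the fractions dropped by
    the two floor divisions cancel and the result is exactly x * y.
    """
    if not (0 <= x <= 255 and 0 <= y <= 255):
        raise ValueError("operands must be unsigned 8-bit values")
    return QUARTER_SQUARES[x + y] - QUARTER_SQUARES[abs(x - y)]


if __name__ == "__main__":
    print(quarter_square_mul(7, 9))
    print(quarter_square_mul(255, 255))
```

Storing the quarter squares directly avoids a final divide step; a real implementation would split the table into low and high bytes to suit an 8-bit bus.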
  
-But 3 to 4 cycles is nothing compared to one. //According to the lore,// Stellar Dynamics bet the bank and licensed a 128 KB mask ROM die from an unknown Japanese semiconductor partner. Hitachi, Toshiba, Sharp and many others were all doing custom ROM for game cartridges in that era. This ROM, bonded directly into the SD-8516 package, contains a complete 8 bit to 16 bit multiplication table. The MUL microcode concatenates the two 8-bit operands as a 16-bit ROM address and reads the 16-bit result in a single bus cycle. For partial products it then sums them with the internal ALU. A total of no more than 4 ROM lookups and 3 additions. Pipelined, an 8x8 mul into 16 bits could be done in //one CPU cycle,// although 16 bit and above could take up to cycles. And thus, the SD-8516's legendary "single-cycle MUL" was born.
+But 3 to 4 cycles is nothing compared to one. //According to the lore,// Stellar Dynamics bet the bank and licensed a 128 KB mask ROM die from an unknown Japanese semiconductor partner. Hitachi, Toshiba, Sharp and many others were all doing custom ROM for game cartridges in that era. This ROM, bonded directly into the SD-8516 package, contains a complete 8 bit to 16 bit multiplication table. The MUL microcode concatenates the two 8-bit operands as a 16-bit ROM address and reads the 16-bit result in a single bus cycle. For partial products it then sums them with the internal ALU. A total of no more than 4 ROM lookups and 3 additions. Pipelined, an 8x8 mul into 16 bits could be done in //one CPU cycle.// 
 + 
 +<blockquote>In truth, it's not really a single-cycle MUL -- that's just a marketing term. It actually can take up to five CPU cycles to multiply a 16 bit number, as it has to shift and add for each additional byte. Still, pound for pound, nothing else could do an 8 bit MUL that fast. 
 + 
 +And thus, the SD-8516's legendary "single-cycle MUL" was born. 
 +</blockquote>
  
 Table lookup MUL was not new, but it was typically impractical due to cost and die area. Stellar Dynamics' innovation was integrating the lookup ROM on-die at a time when most competitors considered 128 KB of ROM too expensive for a general-purpose processor. The gamble paid off. The SD-8516's legendary "single-cycle MUL" became a signature feature and the reason it found its niche in scientific instruments, embedded systems, arcade hardware, and high-end personal computers where multiply-intensive workloads (3D graphics, signal processing, game physics) justified the premium price.
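
The ROM-based scheme described in this section can be modelled roughly as follows. This is a sketch under stated assumptions, not the SD-8516 microcode: ''MUL_ROM'', ''rom_mul8'' and ''rom_mul16'' are hypothetical names, and the address layout (first operand in the high byte, second in the low byte) is a guess at what "concatenates the two 8-bit operands" means. The table has 65,536 entries of 16 bits each, which matches the 128 KB figure.

```python
# Hypothetical model of the 128 KB multiplication ROM:
# 65,536 addresses, each holding the 16-bit product of the address's two bytes.
MUL_ROM = [(addr >> 8) * (addr & 0xFF) for addr in range(1 << 16)]


def rom_mul8(a: int, b: int) -> int:
    """8x8 -> 16-bit multiply: one ROM read at the address formed from a and b."""
    return MUL_ROM[((a & 0xFF) << 8) | (b & 0xFF)]


def rom_mul16(a: int, b: int) -> int:
    """16x16 -> 32-bit multiply from four ROM lookups and three additions."""
    a_hi, a_lo = (a >> 8) & 0xFF, a & 0xFF
    b_hi, b_lo = (b >> 8) & 0xFF, b & 0xFF

    # Four partial products, one ROM lookup each.
    lo_lo = rom_mul8(a_lo, b_lo)
    lo_hi = rom_mul8(a_lo, b_hi)
    hi_lo = rom_mul8(a_hi, b_lo)
    hi_hi = rom_mul8(a_hi, b_hi)

    # Three additions; the shifts align each partial product to its byte position.
    return lo_lo + (lo_hi << 8) + (hi_lo << 8) + (hi_hi << 16)


if __name__ == "__main__":
    print(rom_mul8(12, 34))
    print(hex(rom_mul16(0x1234, 0x5678)))
```

The 8x8 case is a single table read, which is where the "single-cycle" claim comes from; the 16-bit case shows why wider operands need extra lookups and additions.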
sd/history.1774008971.txt.gz · Last modified: by appledog
