sd:emulation_benchmarks
Differences
This shows you the differences between two versions of the page.
| Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
| sd:emulation_benchmarks [2026/05/15 23:41] – appledog | sd:emulation_benchmarks [2026/05/16 00:24] (current) – appledog | ||
|---|---|---|---|
| Line 4: | Line 4: | ||
| Legend: | Legend: | ||
| * Green: The SD-8516 is capable of emulating this level of performance on an i7-12700 (Geekbench 6 baseline). | * Green: The SD-8516 is capable of emulating this level of performance on an i7-12700 (Geekbench 6 baseline). | ||
| - | * Light Gray: Although the SD-8516 cannot | + | * Light Green: Capable, but only with extensive use of the PPU and APU. |
| - | * Dark Gray: The SD-8516 cannot emulate this system. This is almost always because it is slower than 50% of the required speed. For example, it is unlikely to achieve Dreamcast-level performance | + | * Light Gray: The SD-8516 cannot |
| - | * Red: The system is unlikely to emulate this level of performance | + | * Dark Gray: The SD-8516 cannot emulate this system. This is almost always because it is designed as a single-threaded system with no 3D acceleration. Modern computers can often emulate other systems at near-native speeds. But even if we pass through |
| ^ Year ^ System ^ CPU ^ Width ^ Approx MIPS ^ RAM ^ Graphics ^ Audio ^ Notes ^ | ^ Year ^ System ^ CPU ^ Width ^ Approx MIPS ^ RAM ^ Graphics ^ Audio ^ Notes ^ | ||
| Line 27: | Line 27: | ||
| | @lightgreen: | | @lightgreen: | ||
| | @lightgreen: | | @lightgreen: | ||
| - | | @red:2000 | PS2 | MIPS R5900 | 128-bit SIMD | 6000 + 40 | 32 MB | 640×448 | SPU2 | Emotion Engine 6k MIPS; +40 MIPS for PS1 compat.; VUs dominate | | + | | @darkgray:2000 | PS2 | MIPS R5900 | 128-bit SIMD | 6000 + 40 | 32 MB | 640×448 | SPU2 | Emotion Engine 6k MIPS; +40 MIPS for PS1 compat.; VUs dominate | |
| - | | @red:2001 | GameCube | Gekko @ 485 MHz | 32-bit | 1125 | 24 MB | 640×480 | Flipper DSP | Dolphin emulator gold standard; clean PowerPC arch | | + | | @lightgray:2001 | GameCube | Gekko @ 485 MHz | 32-bit | 1125 | 24 MB | 640×480 | Flipper DSP | Dolphin emulator gold standard; clean PowerPC arch | |
| - | | @red:2017 | Switch | ARM Cortex-A57 | 64-bit | 12,000 | 4 GB | 720p–1080p | | GPU & OS dominate emulation cost | | + | | @darkgray:2017 | Switch | ARM Cortex-A57 | 64-bit | 12,000 | 4 GB | 720p–1080p | | GPU & OS dominate emulation cost | |
| Line 45: | Line 45: | ||
| * Branch prediction: predicted branches (e.g., bottom-of-loop) increased throughput. | * Branch prediction: predicted branches (e.g., bottom-of-loop) increased throughput. | ||
| - | These Pentium-specific traits were exploited via Abrash' | + | These Pentium-specific traits were exploited via Abrash' |
| == Profiling Experiments | == Profiling Experiments | ||
| - | Here are the results of tight-loop experiments featuring benchmarks of one instruction. They are intended as relative results only. Taken on an i7-12700K. | + | On an i7-12700K, a basic loop example executes at 55 MIPS in the WASM version and at 550 MIPS in the C version. However, there' |
| - | + | ||
| - | The basic " | + | |
| === MIPS isn't useful | === MIPS isn't useful | ||
| - | The following program illustrates | + | The following program illustrates |
| <codify armasm> | <codify armasm> | ||
| Line 68: | Line 66: | ||
| * WASM version 55 MIPS. | * WASM version 55 MIPS. | ||
| - | * C version was 560 MIPS. | + | * C version was 550 MIPS. |
| - | + | ||
| - | But here's the problem with MIPS. LSTEP, a command that does DEC CD and JNZ in one step, performs at 360 MIPS (in the C version). This shows that MIPS is somewhat deceptive as a measurement. The LSTEP command performs the work of both DEC and JNZ in less time than the two separate instructions take together; but since it is a relatively slow command in and of itself, it lowers the MIPS of the system as a whole. In reality, if it were counted as two instructions it would show over 700 MIPS. If we use a dual-stage LSTEP (on two 16-bit registers) it runs at 470 MIPS (940 MIPS equivalent). | + | |
| - | === CISC vs RISC | + | The issue occurs when we try to replace |
| - | This shows that time spent on the hot path is slow, while time spent in the hot path is fast. That is, just like the WASM version, | + | |
| - | //Using a RISC-like ISA is only a requirement if you are emulating a particular architecture. It is not a good idea for a fantasy computer in general. A fantasy computer does better with CISC instructions.// | + | Another example: I had benchmarked kernal 0.7.2 at 750 MIPS, then I switched kernals from 0.7.2 to 0.8.3. This had the effect of putting CASETAB into the hot path. So instead of performing hundreds of JZ and CMP instructions |
| - | A final example: I had benchmarked kernal 0.7.2 at 750 MIPS, then I switched kernals from 0.7.2 to 0.8.3. This had the effect of putting CASETAB into the hot path. This meant that instead of performing hundreds of JZ and CMP instructions | + | === Conclusion: CISC vs RISC |
| + | Time spent on the hot path is slow, while time spent in the hot path is fast. That is, just like the WASM version, the C version does best with CISC instructions. MIPS itself is not as important as it seems. What matters most is the quality | ||
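
The point about MIPS being a deceptive measure can be checked with a little arithmetic. The following is a minimal sketch, not part of the SD-8516 code base: the function names `equivalent_mips` and `count_dispatches` are hypothetical, and the only measured numbers used are the LSTEP figures quoted in the earlier revision (360 MIPS, or 470 MIPS for the dual-stage form, both in the C version). It assumes a fused instruction such as LSTEP should be credited with the two simple instructions (DEC and JNZ) it replaces, and it counts interpreter dispatches for a countdown loop written both ways.

```python
def equivalent_mips(measured_mips, simple_ops_per_instruction):
    """Scale a measured MIPS figure by the number of simple
    instructions each executed instruction stands in for."""
    return measured_mips * simple_ops_per_instruction


def count_dispatches(iterations, fused):
    """Count how many instructions an interpreter would dispatch for a
    countdown loop of `iterations` passes (iterations must be >= 1).

    fused=False models DEC followed by JNZ (two dispatches per pass).
    fused=True models an LSTEP-style instruction (one dispatch per pass).
    """
    counter = iterations
    dispatches = 0
    while True:
        if fused:
            dispatches += 1   # one dispatch: decrement and branch together
            counter -= 1
        else:
            dispatches += 1   # DEC
            counter -= 1
            dispatches += 1   # JNZ
        if counter == 0:
            break
    return dispatches


if __name__ == "__main__":
    # Figures quoted in the earlier revision of this page (C version).
    print(equivalent_mips(360, 2))   # single LSTEP counted as DEC + JNZ
    print(equivalent_mips(470, 2))   # dual-stage LSTEP
    print(count_dispatches(1000, fused=False))
    print(count_dispatches(1000, fused=True))
```

Under these assumptions the fused loop dispatches half as many instructions for the same work, which is why a lower raw MIPS figure can still mean a faster program.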
