StrmnNrmn Shares Dynamic Recompilation Updates
StrmnNrmn is dead set on letting people know that he is doing his best to improve his N64 emulator, Daedalus. Just last week he let us in on how he’s trying to incorporate a dynamic recompiler into the emulator so that it runs closer to the speed of the original N64. Today he posts more of the dynarec progress on his blog.
StrmnNrmn has now succeeded in assembling the fragment buffers into native x86 code and executing it dynamically. At first, he debated whether to target MIPS or Intel. He ended up debugging the code generation on the PC first rather than on the PSP, which turned out to be a wise decision.
“I’m glad I started with the PC as it allowed me to fix a number of hairy problems without going down the torturous path of debugging self modifying code on the PSP with just a few printf() statements to help track down any problems,” StrmnNrmn says.
StrmnNrmn’s first step in x86 code generation was to convert the fragment simulator loop directly into assembly. For each instruction in the fragment, the following code was generated:
set current pc
set branch delay flag
get op code in ECX
call handler for specified op code (from R4300.cpp)
if ( exception set ) exit to exception handler
if ( branch instruction and branch taken ) exit to branch handler
This approach produced a lot of assembly: 200KB of N64 instructions, for example, produced 4MB of x86 assembly, an expansion of around 2000%. This didn’t bother StrmnNrmn. All he was concerned with at that point was preserving the behavior of the fragment simulator as closely as possible.
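The six steps listed above can be read as a loop over the instructions of a fragment. The following is a minimal C++ sketch of that loop, written as ordinary code rather than emitted x86. It is not taken from Daedalus: `CpuState`, `TracedOp`, `Handler`, `ExitReason` and `RunFragmentUnoptimised` are hypothetical names chosen for illustration, and the sketch follows the step list literally without modelling real MIPS delay-slot timing.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical CPU state; the field names are illustrative, not Daedalus's own.
struct CpuState {
    uint32_t pc = 0;
    bool branch_delay = false;
    bool exception_pending = false;
    bool branch_taken = false;
};

// Stand-in for the per-opcode handlers the article says live in R4300.cpp.
using Handler = void (*)(CpuState&, uint32_t op_code);

// One instruction recorded in a fragment (assumed layout).
struct TracedOp {
    uint32_t address;
    uint32_t op_code;
    bool in_delay_slot;
    bool is_branch;
    Handler handler;
};

enum class ExitReason { EndOfFragment, Exception, BranchTaken };

// Unoptimised form: every step is performed for every instruction.
ExitReason RunFragmentUnoptimised(CpuState& cpu, const std::vector<TracedOp>& fragment) {
    for (const TracedOp& op : fragment) {
        cpu.pc = op.address;                  // set current pc
        cpu.branch_delay = op.in_delay_slot;  // set branch delay flag
        op.handler(cpu, op.op_code);          // get op code, call its handler
        if (cpu.exception_pending) {
            return ExitReason::Exception;     // exit to exception handler
        }
        if (op.is_branch && cpu.branch_taken) {
            return ExitReason::BranchTaken;   // exit to branch handler
        }
    }
    return ExitReason::EndOfFragment;
}
```

Because every instruction pays for all of these steps, the emitted code is far larger than the N64 code it replaces, which is in line with the expansion reported above.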
After a few more hours of debugging, things were generally working well. But with the dynarec on, the emulator ran at the same speed as with the dynarec off. So StrmnNrmn set about optimising the generated assembly.
First, he stopped setting the program counter before every instruction. Next, he established the precondition that the branch delay flag is always clear, and set or cleared it explicitly only where the state needed to change. Finally, he dropped the exception check from every instruction he knew could not raise an exception.
After all the changes, StrmnNrmn’s instruction block looked like this:
if ( pc needed ) set current pc
if ( branch delay instruction ) set branch delay flag
get op code in ECX
call handler for specified op code (from R4300.cpp)
if ( can throw exception and exception set ) exit to exception handler
if ( branch instruction and branch taken ) exit to branch handler
With these changes, 200KB of N64 code now generates only 2MB of x86 assembly, an expansion of around 1000%. The PC version now runs 60% faster with the dynarec enabled than with it disabled.
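One way to picture the optimised block is as a decision made per instruction, at code generation time, about which steps to emit. The C++ sketch below illustrates that idea under stated assumptions. `OpTraits`, `Step`, `PlanUnoptimised` and `PlanOptimised` are hypothetical names, not Daedalus code, and the sketch assumes the branch check is only ever emitted for branch instructions in both versions.

```cpp
#include <vector>

// Hypothetical facts about one instruction, known when code is generated.
struct OpTraits {
    bool needs_pc;       // handler reads the program counter
    bool in_delay_slot;  // instruction sits in a branch delay slot
    bool can_throw;      // handler may raise an exception
    bool is_branch;      // instruction is a branch
};

enum class Step {
    SetPc, SetBranchDelay, LoadOpCode, CallHandler, CheckException, CheckBranch
};

// Unoptimised: pc, delay flag and exception check are emitted unconditionally.
std::vector<Step> PlanUnoptimised(const OpTraits& t) {
    std::vector<Step> steps = {Step::SetPc, Step::SetBranchDelay, Step::LoadOpCode,
                               Step::CallHandler, Step::CheckException};
    if (t.is_branch) steps.push_back(Step::CheckBranch);
    return steps;
}

// Optimised: each optional step is emitted only when the instruction needs it.
std::vector<Step> PlanOptimised(const OpTraits& t) {
    std::vector<Step> steps;
    if (t.needs_pc) steps.push_back(Step::SetPc);
    if (t.in_delay_slot) steps.push_back(Step::SetBranchDelay);
    steps.push_back(Step::LoadOpCode);   // always: get op code
    steps.push_back(Step::CallHandler);  // always: call the handler
    if (t.can_throw) steps.push_back(Step::CheckException);
    if (t.is_branch) steps.push_back(Step::CheckBranch);
    return steps;
}
```

In this sketch, an instruction that needs none of the optional steps drops from five emitted steps to two. Fewer emitted steps per instruction is the kind of saving that would account for the generated code shrinking from 4MB to 2MB.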
With the dynarec working on the PC, StrmnNrmn moved on to getting the PSP code generation working. On the PSP he has now reached the same stage as the PC version: “…the code generation is running fine (and executing on the PSP without crashing more importantly)”. Right now, Daedalus is running 10% faster with the dynarec enabled.
However, since StrmnNrmn has only just started optimising things, he is still wary of speculating about how much performance will improve. He does promise to keep us updated on the dynarec progress. QJ will keep its eyes and ears open to deliver those updates to you.
Via StrmnNrmn