I noticed that Jak 3's compilation was spending a lot of time accessing
the `unordered_map`s we use to store constants and symbol types.
I repurposed the `EnvironmentMap` originally made for GOOS for this. It
turns out that we were also copying the entire constant map whenever we
encountered a `deftype`, so I fixed that too.
This speeds up Jak 3 compiles from ~16 to 11 seconds for me.
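For illustration, here is a minimal sketch of how an accidental whole-map copy can hide in a lookup path. This is not the actual goalc code: `CountedMap`, `g_map_copies`, `has_constant_by_value` and `has_constant_by_ref` are hypothetical names, and the real copy on `deftype` may have come from a different cause.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the compiler's constant table.
using ConstantMap = std::unordered_map<std::string, int>;

// Global copy counter so the cost of each call style is observable.
int g_map_copies = 0;

struct CountedMap {
  ConstantMap data;
  CountedMap() = default;
  // Copying duplicates every entry; count each time it happens.
  CountedMap(const CountedMap& other) : data(other.data) { ++g_map_copies; }
};

// Taking the map by value copies the whole table on every call.
bool has_constant_by_value(CountedMap constants, const std::string& name) {
  return constants.data.count(name) != 0;
}

// Taking it by const reference does the same lookup with no copy.
bool has_constant_by_ref(const CountedMap& constants, const std::string& name) {
  return constants.data.count(name) != 0;
}
```

Both functions return the same answer; only the by-value version increments `g_map_copies`.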
Reverse engineer the skinning matrix calculation and port to GOAL. This
is about 3x faster than the MIPS2c version.
As usual, there is a `*use-new-bones*` flag to go back to the old
version.
Fix for a bug in the compiler's `.div.vf` implementation (only happens
if src/dst are the same), and fix for a typo in the register allocator
that would sometimes cause it not to consider xmm8-xmm15.
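The message above doesn't spell out the exact `.div.vf` bug, so the following is only a sketch of one common way a same-register bug arises: lowering `dst = a / b` as "move `a` into `dst`, then divide `dst` by `b`" clobbers `b` when `dst` and `b` are the same register. The register file and the `div_naive`/`div_safe` helpers are hypothetical and are not the compiler's code.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using RegFile = std::array<Vec4, 4>;  // hypothetical tiny register file

// Naive lowering: "mov dst, a; div dst, b".
// If dst == b, the mov overwrites b first and the result becomes a / a.
void div_naive(RegFile& regs, int dst, int a, int b) {
  regs[dst] = regs[a];
  for (int i = 0; i < 4; i++) {
    regs[dst][i] /= regs[b][i];
  }
}

// Safer lowering: compute into a temporary, then write dst once.
void div_safe(RegFile& regs, int dst, int a, int b) {
  Vec4 tmp = regs[a];
  for (int i = 0; i < 4; i++) {
    tmp[i] /= regs[b][i];
  }
  regs[dst] = tmp;
}
```

With distinct registers both versions agree; they only diverge when `dst` aliases the divisor.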
Started at 349,880,038 allocations and 42s
- Switched to making `Symbol` in GOOS be a "fixed type", just a wrapper
around a `const char*` pointing to the string in the symbol table. This
is a step toward making a lot of things better, but by itself not a huge
improvement. Some things may be worse due to more temp `std::string`
allocations, but one day all these can be removed. On Linux it reduced
allocations (347,685,429) and saved a second or two (41 s).
- cache `#t` and `#f` in interpreter, better lookup for special
forms/builtins (hashtable of pointers instead of strings, vector for the
small special form list). Dropped time to 38s.
- special-case in quasiquote when splicing is the last thing in a list.
Allocation dropped to 340,603,082
- custom hash table for environment lookups (lexical vars). Dropped to
36s and 314,637,194
- less allocation in `read_list`: 311,613,616. Time about the same.
- `let` and `let*` in Interpreter.cpp: allocations down to 191,988,083,
time down to 28s.
- decompile `subdivide`, `wind-work`, `tie-work`, `bsp`, `focus`
- support `ppacb` in compiler
- don't assert when bitfield stuff fails due to constant propagation
weirdness
- finish up history
- div/mod unsigned assert fix in decompiler
- empty assert fix in decompiler for failed `add` type prop
- make jak 1 performance counters "work" (just measure time)
- fix cast/typos on pcgtb/vftoi15
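As a rough illustration of the GOOS `Symbol` change described further up (a "fixed type" wrapping a `const char*` into the symbol table), here is a minimal interning sketch. The `SymbolTable` class and its `intern` method are hypothetical and simplified; the real GOOS types will differ.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical symbol table: owns exactly one copy of each distinct name.
class SymbolTable {
 public:
  // Returns a pointer to the interned copy of `name`.
  // unordered_set never moves its elements on rehash, so the pointer
  // stays valid for as long as the table is alive.
  const char* intern(const std::string& name) {
    return names_.insert(name).first->c_str();
  }

 private:
  std::unordered_set<std::string> names_;
};

// Pointer-sized symbol: copying it never allocates.
struct Symbol {
  const char* name;

  // Interned names are unique, so pointer equality is name equality.
  bool operator==(const Symbol& other) const { return name == other.name; }
};
```

Comparing two symbols is then a single pointer compare, with no string hashing or allocation involved. Converting back to `std::string` is where the temporary allocations mentioned above would come from.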
* [goalc] macro expansion in integer constants
* working
* didn't break it yet
* support conditional compilation
* fix up some more small bugs
* fix duplicate evaluation of bitfield definitions
* paranoid
* clean up
* before int to float stuff
* before trying to eliminate the separate read and write maps
* partial fix for register issues
* add missing include
* temp: commit what i have so far
decomp: Fix nonempty_intersection impl for MSVC Debugging use-case
docs: Add info on getting ASan builds running on Visual Studio w/o exceptions
* decomp: initial rlet implementation
* decomp: cleanup pass of vector-rewrite stage
* decomp: Commit in-progress vector.gc, shortcomings are TODO commented
* decomp: More cleanup, rename from being `vector` instr specific
Fundamentally, this process can be used for re-writing ANY inline-asm instruction
* decomp: Support 4th arg ACC instructions
* decomp: Final pass of vector.gc before implementing last instructions
* decomp: Better warnings when hitting unimplemented instructions
* compiler: Implement inverse-sqrt and mov.vf
* decomp: Final manual pass over vector.gc, documented gaps
* decomp: Finish decompiling what currently is possible in vector.gc
* decomp: Fix Variable -> RegisterAccess conflict
* decomp: codacy lint
* Address review feedback
* Address feedback part 2
* Resolve build failures
* compiler: Support the majority of the remaining VU VF instructions
- VWAIT
- VMADD variants
- VMSUB variants
- VSQRT
- VDIV
- outer product (VOPMULA + VOPMSUB)
* compiler: Fix some bugs / optimize some instructions
* tests/compiler: Add test coverage for new instructions
* docs: Add documentation for new inline assembly functions
* lint: Formatting / fix failing test
* Remove my comment about ftf/fsf encoding, it's been fixed
* address review feedback
* correct VSQRTPS implementation
* begin work on vf support
* split reg kind into reg hw kind and class, use class for ireg
* try test
* clang format
* add some more ops and some example functions
* better lvf on statics
* add documentation
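For reference, the outer product pair listed above (VOPMULA + VOPMSUB) computes a 3D cross product in two steps on the PS2 VU. The sketch below models the x/y/z lanes in plain C++ as I understand the VU semantics. `Vec3`, `opmula`, `opmsub` and `cross` are hypothetical helper names, the w lane is ignored, and this is not the compiler's x86 implementation.

```cpp
#include <cassert>

struct Vec3 {
  float x, y, z;
};

// VOPMULA: ACC = (fs.y * ft.z, fs.z * ft.x, fs.x * ft.y)
Vec3 opmula(const Vec3& fs, const Vec3& ft) {
  return {fs.y * ft.z, fs.z * ft.x, fs.x * ft.y};
}

// VOPMSUB: fd = ACC - (fs.y * ft.z, fs.z * ft.x, fs.x * ft.y)
Vec3 opmsub(const Vec3& acc, const Vec3& fs, const Vec3& ft) {
  return {acc.x - fs.y * ft.z, acc.y - fs.z * ft.x, acc.z - fs.x * ft.y};
}

// a x b: accumulate with (a, b), then subtract with the operands swapped.
Vec3 cross(const Vec3& a, const Vec3& b) {
  return opmsub(opmula(a, b), b, a);
}
```

Swapping the operands in the second step is what produces the subtracted half of each cross product term.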