Added decompilation guides for new contributors (#703)

This commit is contained in:
LagoLunatic
2025-03-11 22:20:47 -04:00
committed by GitHub
parent 74b917e08c
commit 5a5019589e
24 changed files with 988 additions and 0 deletions
+5
@@ -119,3 +119,8 @@ Now you have Ghidra set up and ready to use.
For an introduction on how to use Ghidra, you can read [this section of the Twilight Princess decompilation's guide](https://zsrtp.link/contribute/decompiler-setup#using-ghidra).
Optionally, you may also want to request "Read" access to the TwilightPrincess server on https://ghidra.decomp.dev and set that Ghidra project up too, even if you are not interested in working on that game. The reason for this is that a significant amount of engine code is shared between The Wind Waker and Twilight Princess, and the debug version of Twilight Princess (called `shield_chn_debug` in the Ghidra project) is easier to work with because inline functions are not inlined in that version. It can be worth checking if the function you're working on is present in that game as well.
Contributing
=======
If you've got all the requirements set up and want to learn how to contribute to the decompilation effort, see [this guide](/docs/decompiling.md) for details.
+121
@@ -0,0 +1,121 @@
# Coding guidelines
This page contains some tips on how code should be written in this decompilation project. We don't have strict style rules for most things, but you should keep your code readable, and try to stick to the names and style used by the original programmers whenever possible.
Naming variables properly isn't required to help with the decompilation. You can submit a PR with code you decompiled even if many of the variable names are just placeholders (e.g. `field_0x290`, `temp`, `r29`, `sp10`, etc) - these names can always be cleaned up later in a documentation pass of the actor. Placeholder names are preferable to coming up with names that are incorrect if you aren't sure exactly what the variables are.
## Table of Contents
1. [Offsets and padding](#offsets-and-padding)
2. [Includes](#includes)
3. [Naming style](#naming-style)
4. [Use the official names where possible](#use-the-official-names-where-possible)
5. [Look at the actor's model](#look-at-the-actors-model)
## Offsets and padding
Member variables of classes and structs should all have comments to their left with the hexadecimal data offset of that member:
```cpp
struct anm_prm {
/* 0x00 */ s8 anmTblIdx;
/* 0x01 */ u8 armAnmTblIdx;
/* 0x02 */ u8 btpAnmTblIdx;
/* 0x04 */ int loopMode;
/* 0x08 */ f32 morf;
/* 0x0C */ f32 speed;
};
```
Furthermore, padding data should not be written in the class/struct body. In the example above, note that no field is located at offset 0x03, and `int loopMode;` starts at offset 0x04. This is because `int` must be aligned to 4 bytes: the compiler can't place it at offset 0x03, so it inserts one unused byte of padding before `loopMode`.
If a translation unit isn't fully decompiled yet, then there's no way to know if a particular offset is padding or if it actually has a field in there that is used by code that hasn't been decompiled yet. So you should wait until the TU is 100% decompiled before removing fields that look like padding.
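One way to double-check offset comments like the ones above is to let a compiler verify them in a scratch file. The following is a minimal sketch that uses the standard `offsetof` and `static_assert` rather than any helper from this project. The `s8`/`u8`/`f32` typedefs are local stand-ins so that it compiles on its own, and it assumes a host platform where `int` and `float` are 4 bytes with 4-byte alignment.

```cpp
#include <cstddef>  // offsetof
#include <cstdint>  // std::int8_t, std::uint8_t

// Local stand-ins for the project's fixed-width typedefs (assumption).
typedef std::int8_t s8;
typedef std::uint8_t u8;
typedef float f32;

struct anm_prm {
    /* 0x00 */ s8 anmTblIdx;
    /* 0x01 */ u8 armAnmTblIdx;
    /* 0x02 */ u8 btpAnmTblIdx;
    // 0x03 is compiler-inserted padding, so it is not written out.
    /* 0x04 */ int loopMode;
    /* 0x08 */ f32 morf;
    /* 0x0C */ f32 speed;
};

// Each check fails to compile if the comment and the real layout disagree.
static_assert(offsetof(anm_prm, btpAnmTblIdx) == 0x02, "last byte-sized field");
static_assert(offsetof(anm_prm, loopMode) == 0x04, "int is aligned past the padding byte");
static_assert(offsetof(anm_prm, speed) == 0x0C, "last field");
static_assert(sizeof(anm_prm) == 0x10, "total size including padding");
```

This kind of check is meant for experimenting locally; the game's original compiler predates `static_assert`, so treat it as a scratch-file aid and not as code to submit.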
## Includes
Avoid unnecessary includes, especially in header files. clangd will give you a warning saying "Included header is not used directly (fix available)" if you aren't using a header at all.
Forward declaring types where possible will reduce compile times. So instead of putting all the includes in an actor's header file, like so:
```cpp
#include "d/d_path.h"
#include "d/actor/d_a_obj_search.h"
dPath* ppd;
daObj_Search::Act_c* mpSearchLight;
```
You could move those includes into the actor's .cpp file where they are actually needed, and add forward declarations to the actor's header like so:
```cpp
class dPath;
namespace daObj_Search { class Act_c; };
dPath* ppd;
daObj_Search::Act_c* mpSearchLight;
```
## Naming style
We try to stick to the same naming style that the original developers used. They didn't have a completely consistent naming style, but they tended to use certain prefixes and styles depending on the type of variable.
Function parameters should be prefixed with `i_` (or `o_` if it's an output parameter) and use lowerCamelCase:
* `fopAc_ac_c* i_this`
* `int i_itemNo`
* `GXTlutObj* o_tlutObj`
In-function local variables have no prefix and use lower_snake_case:
* `int zoff_blend_cnt = 0;`
* `int phase_state = ...`
Member variables of classes are generally prefixed with `m` (or `mp` for pointers) and use UpperCamelCase:
* `fpc_ProcID mTimerID;`
* `J3DModel* mpModel;`
Member variables of structs (plain old data) have no prefix and use lower_snake_case:
* `csXyz shape_angle;`
* `int id;`
Static variables are prefixed with `l_`, while global variables are prefixed with `g_` (the official names for these are all known from the symbol maps):
* `static cXy l_texCoord[] = ...`
* `dComIfG_inf_c g_dComIfG_gameInfo;`
## Use the official names where possible
If a class has a getter function (whether it's an inline or not) that returns a member variable, you should generally name that member variable as indicated by the getter's name.
For example: `getChainCnt()` would return `mChainCnt`.
Another place that official variable names of all kinds can be revealed is in debug assertion strings. In these cases, you should always use the exact name from the assert, even if it doesn't follow a consistent style. For example, the following are both official names for similar in-function local variables:
```cpp
J3DModelData* modelData = ...
JUT_ASSERT(98, modelData != NULL);
```
```cpp
J3DModelData* model_data = ...
JUT_ASSERT(382, model_data != NULL);
```
## Look at the actor's model
If a variable's name doesn't appear in a function name or assertion string, we'll have to come up with a name for it ourselves. To do this, you usually need to know what the decompiled actor you're looking at actually is in-game before you can start coming up with names. But it's often pretty hard to tell what an actor is just by reading its code.
The official TU names of actors don't tell you much, not only because they're frequently in Japanese, but also because they're aggressively abbreviated. For example, `d_a_nh` is short for "mori **n**o **h**otaru", which is Japanese for "forest firefly", but it would be pretty much impossible to guess that without context, even if you know Japanese.
If the actor has a 3D model, you can determine what the actor is by simply viewing the model in a model viewer. First, find the .arc file for this actor. Look in the `createHeap` or `useHeapInit` function for this actor. You should see something like:
```cpp
(J3DModelData*)dComIfG_getObjectRes("Bk", BK_BDL_BK)
```
This means the actor's .arc in this example is named "Bk". You can find it in your copy of TWW's files at `files/res/Object/Bk.arc`.
Next go to https://noclip.website/ and drag-and-drop the .arc file onto the website. It should display all of the 3D models in that archive on top of each other. You can open up the "Layers" menu on the left hand side and toggle off specific models if it's too confusing with them all overlapping.
Alternatively, you can also download [GCFT](https://github.com/LagoLunatic/GCFT) (version 2.0.0 or higher) to view models if you prefer an offline program to a website. GCFT allows you to load not just models but also their animations, which may be helpful in determining the exact difference between states if the actor has multiple animations.
Drag-and-drop the .arc file onto GCFT to open it, then right click on one of the BDL models and select "Open J3D" to view the model. You can also go back to the RARC tab and right click a BCK animation and select "Load J3D Animation" to view that animation on the model you have loaded.
+474
@@ -0,0 +1,474 @@
# Decompiling
This document describes the basics of how to start decompiling code and contributing to this decompilation project, and explains some common pitfalls. Feel free to ask for help in the ZeldaRET Discord server if you get stuck or need assistance.
If you haven't already, you should first follow the instructions in the [readme](../README.md) to get the decomp set up, as well as the tools you will be using to work on it: objdiff and Ghidra.
## Table of Contents
1. [Choosing an object to decompile](#choosing-an-object-to-decompile)
2. [Setting up classes/structs](#setting-up-classesstructs)
3. [Decompiling functions](#decompiling-functions)
4. [Inline functions and how to read the debug maps](#inline-functions-and-how-to-read-the-debug-maps)
5. [Fixing minor nonmatching issues](#fixing-minor-nonmatching-issues)
6. [Linking a 100% matching object](#linking-a-100-matching-object)
7. [Documentation and naming](#documentation-and-naming)
## Choosing an object to decompile
Once you have everything set up, you should pick which object (also called a translation unit, TU) you want to work on.
It's recommended to begin with a small and simple actor to learn the basics of decompilation. We have a list of small actors that haven't been decompiled yet [here on GitHub](https://github.com/zeldaret/tww/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22easy%20object%22), so you can pick one of those. You can leave a comment on the issue saying that you're working on it to let others know that they shouldn't pick the same one.
Now that you've decided on an object, open up objdiff and type the object's name (e.g. `d_a_wall`) into the filter bar on the left, then click on it to open it. You should see a list of data and functions in this TU.
You'll also want to open the source file for this TU in VSCode. You can do this by pressing VSCode's `Ctrl+P` shortcut and typing the name of the TU with the .cpp extension (e.g. `d_a_wall.cpp`).
## Setting up classes/structs
Once you've chosen which object you want to decompile, you'll usually want to set up the actor's class/struct in Ghidra before you start decompiling any code.
> [!NOTE]
> Some actors that aren't decompiled may have already had their struct defined in our Ghidra server by someone else in the past, in which case you may be able to skip this step. But this is not the case for most actors.
In objdiff, pick one of the actor's functions (one with "create" in the name would be good to start with). Then open the `main` program in Ghidra, press `G` and type the function name (e.g. `daWall_c::CreateInit`) to go to that function in Ghidra. If the struct hasn't been properly defined for Ghidra, the function may look something like this at first:
![Ghidra function before defining the struct](images/ghidra_createinit_1.png)
It's not very readable at the moment, so let's improve that. Right click the first parameter (e.g. `daWall_c *this`) and choose "Edit Data Type" to open Ghidra's structure editor:
![Ghidra Structure Editor](images/ghidra_struct_1.png)
The placeholder struct defaults to empty, which is why Ghidra isn't doing a great job of decompiling the function. Let's give it the correct size. Luckily, all actors have a profile that tells us how large each instance should be.
Press `Ctrl+P` in VSCode and type the name of the TU with the .s extension (e.g. `d_a_wall.s`) to open the automatically generated assembly file. Then search for the text `g_profile` in this file to find the actor's profile near the bottom:
```asm
# .data:0xE4 | 0xE4 | size: 0x30
.obj g_profile_WALL, global
.4byte 0xFFFFFFFD
.4byte 0x0007FFFD
.4byte 0x01B10000
.4byte g_fpcLf_Method
.4byte 0x000005E4
.4byte 0x00000000
.4byte 0x00000000
.4byte g_fopAc_Method
.4byte 0x01980000
.4byte daWallMethodTable
.4byte 0x00040100
.4byte 0x000E0000
.endobj g_profile_WALL
```
The fifth `.4byte` entry of the profile is the size of the actor, so 0x5E4 bytes in this example. Copy and paste that number into the Size field of the struct editor you have open in Ghidra.
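To make the arithmetic concrete, here is a small sketch that copies the words from the `g_profile_WALL` listing above into an array and checks the numbers against each other. The names `kProfileWall` and `kWallSize` are made up for this example, and the three symbol references are written as 0 because only the numeric words matter here.

```cpp
#include <cstddef>
#include <cstdint>

// The twelve 4-byte words of g_profile_WALL, in the order they are listed.
constexpr std::uint32_t kProfileWall[] = {
    0xFFFFFFFD, 0x0007FFFD, 0x01B10000, 0 /* g_fpcLf_Method */,
    0x000005E4, 0x00000000, 0x00000000, 0 /* g_fopAc_Method */,
    0x01980000, 0 /* daWallMethodTable */, 0x00040100, 0x000E0000,
};

// Twelve words of 4 bytes each agree with the "size: 0x30" comment.
static_assert(sizeof(kProfileWall) == 0x30, "profile is 0x30 bytes");

// The fifth word (index 4) is the size of one actor instance.
constexpr std::size_t kWallSize = kProfileWall[4];
static_assert(kWallSize == 0x5E4, "daWall_c instances are 0x5E4 bytes");

// A 4-byte field at offset 0x5DC ends at 0x5E0, which is inside the struct.
static_assert(0x5DC + sizeof(std::uint32_t) <= kWallSize, "field fits");
```

If a field offset you see in Ghidra is larger than the size you entered, that is a hint that the size was read from the wrong entry of the profile.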
> [!NOTE]
> If the .s file doesn't exist, then you may need to run `ninja` to build the decomp for the first time. The decomp repo doesn't come with any assembly; it is generated from your own copy of TWW.
Next, you want to set the parent class of this actor to the actor base class. Change the data type of the first field from `undefined` to `fopAc_ac_c` and change its name from being blank to `parent`. It should look like this now:
![Ghidra Structure Editor](images/ghidra_struct_2.png)
> [!NOTE]
> Some actors inherit from a different base class besides `fopAc_ac_c`. The most common are `dBgS_MoveBgActor` (for things like moving platforms), `fopNpc_npc_c` (for NPCs), or `fopEn_enemy_c` (for enemies). If you see one of those names show up in objdiff, your actor might inherit from that class.
Save the struct. If you go back to the function in Ghidra you were looking at before, it should be somewhat more readable now:
![Ghidra function after defining the struct's size and parent](images/ghidra_createinit_2.png)
But we can still improve it further by defining this actor's own fields too. You see the part where it says `*(uint *)&this->field_0x5dc`? That pointer cast before a field name (`*(Type *)&`) is Ghidra trying to tell you that the field at offset 0x5dc hasn't had its type correctly defined.
Right click on the `field_0x5dc` part, choose "Retype Field", and replace `undefined` with `uint` (or whatever the type is in your case). If you did it properly, it should now show as just `this->field845_0x5dc` without the `*(uint *)&` part.
Repeat this process for the other fields that are referenced in this function. For example, `*(dBgW **)&this->field_0x578` means that `field_0x578` should be retyped as `dBgW *`.
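If the `*(Type *)&` pattern is unfamiliar, the sketch below shows roughly what it expresses in ordinary C++. Both structs are hypothetical miniatures (0x10 bytes instead of a real actor), and `read_u32` is an illustrative helper, not something from Ghidra or this project.

```cpp
#include <cstddef>  // offsetof, std::size_t
#include <cstdint>  // std::uint8_t, std::uint32_t
#include <cstring>  // std::memcpy

// A miniature actor as Ghidra sees it before retyping: just raw bytes.
struct untyped_actor {
    std::uint8_t bytes[0x10];
};

// The same 0x10 bytes after the field at offset 0xC has been retyped.
struct typed_actor {
    /* 0x00 */ std::uint8_t field_0x0[0xC];
    /* 0x0C */ std::uint32_t field_0xc;
};

// Roughly what `*(uint *)&this->field_0xc` means: take the address of the
// byte at that offset and read 4 bytes from it as an unsigned integer.
inline std::uint32_t read_u32(const untyped_actor& i_actor, std::size_t i_offset) {
    std::uint32_t value;
    std::memcpy(&value, i_actor.bytes + i_offset, sizeof(value));
    return value;
}

// Retyping moves nothing; it only tells the decompiler what the bytes are.
static_assert(sizeof(typed_actor) == sizeof(untyped_actor), "same size");
static_assert(offsetof(typed_actor, field_0xc) == 0xC, "same offset");
```

Once the field has a real type, the cast becomes unnecessary, which is why it disappears from Ghidra's output after retyping.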
Once you've properly typed everything in this function, it should look a lot cleaner than it did originally:
![Ghidra function after fully defining the struct](images/ghidra_createinit_3.png)
(In the above screenshot, the fields have also been renamed, but you don't have to do that if you're not sure what they are. No names are less confusing than incorrect names, and they can always be named in a documentation pass later on.)
Once you're done with one function, go through all of this actor's other functions, and continue retyping all of this actor's fields. Starting with constructors and functions that have "create" or "init" or "heap" in their name will make it easier.
After all of the actor's fields have proper types, it's almost time to actually start decompiling. The only thing left to do is get all these fields you defined into the decomp itself.
Open up the header file for the actor you're working on (e.g. `d_a_wall.h`). You should see a placeholder that says `/* Place member variables here */` inside the actor's class definition.
You could start manually typing out all of the fields in there, but that would be a waste of time if you already defined them in Ghidra. Instead, you can use a Ghidra script we have to automate the process.
In Ghidra, select Window -> Script Manager -> Create New Script -> Python. Name the script `tww_class_to_cpp.py`, and copy paste the contents of [this file](tww_class_to_cpp.py) into the new script. You can optionally assign a keyboard shortcut if you wish (e.g. `Alt+Shift+S`).
When you run the script, you will be prompted to type the name of the struct you want to export. After clicking Okay, all of the struct's members will be automatically copied onto your clipboard.
Simply replace the `/* Place member variables here */` line in the header by pasting over it.
Great, now the actor's class is fully defined in both Ghidra and the decomp! Now you can start actually decompiling some functions.
## Decompiling functions
With your TU open in objdiff, you should select a small function to start with. Here's what a small function will look like when you click it in objdiff:
![Function in objdiff](images/objdiff_function.png)
In VSCode, find the placeholder for the function you're going to be working on, which should currently be empty:
```cpp
/* 00000FE4-00001044 .text _draw__8daWall_cFv */
void daWall_c::_draw() {
/* Nonmatching */
}
```
Navigate to this function in Ghidra. You might see something like this:
![Function in Ghidra](images/ghidra_draw.png)
Ghidra's pseudocode isn't accurate enough to be directly copy-pasted into this decompilation project, but it's still useful for quickly understanding what most functions are doing.
In this example, the function would look like this when fully decompiled (don't remove the "Nonmatching" comment until it shows 100% matching in objdiff!):
```cpp
/* 00000FE4-00001044 .text _draw__8daWall_cFv */
bool daWall_c::_draw() {
g_env_light.settingTevStruct(TEV_TYPE_BG0, &current.pos, &tevStr);
g_env_light.setLightTevColorType(mpModel, &tevStr);
mDoExt_modelUpdateDL(mpModel);
return true;
}
```
There are several minor differences between what Ghidra showed us and how the function should actually be written. Things like passing objects as the first argument to their functions, unnecessarily using `this->`, or writing out `(Type *)0x0` instead of `NULL` are Ghidra-isms that you'll start to pick up on over time.
The easiest way to learn about these differences is to look for similar code in already-decompiled actors. In VSCode, you can press `Ctrl+Shift+F` and type the name of a function to search for everywhere in the decomp that function was used, which should help you understand how it will be called.
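As a rough illustration of those Ghidra-isms, the sketch below puts Ghidra-style pseudocode in a comment next to the way the same method would normally be written. `model_c` and `daExample_c` are hypothetical types defined only so the example compiles on its own; they are not classes from the game.

```cpp
#include <cassert>
#include <cstddef>  // NULL

// Hypothetical stand-in for a model class.
struct model_c {
    int mUpdateCount;
    void update() { mUpdateCount++; }
};

class daExample_c {
public:
    daExample_c() : mpModel(NULL) {}

    // Ghidra-style pseudocode for this method would look roughly like:
    //   if (this->mpModel != (model_c *)0x0) {
    //       model_c::update(this->mpModel);
    //       return true;
    //   }
    //   return false;
    bool draw() {
        if (mpModel != NULL) {  // NULL instead of (model_c *)0x0, and no this->
            mpModel->update();  // object->method() instead of Class::method(object)
            return true;
        }
        return false;
    }

    model_c* mpModel;
};

// Usage: drawing only succeeds once a model has been attached.
inline void example_usage() {
    daExample_c actor;
    assert(!actor.draw());
    model_c model = {0};
    actor.mpModel = &model;
    assert(actor.draw() && model.mUpdateCount == 1);
}
```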
But despite the minor syntax differences, the above example looks pretty similar in both Ghidra and the decomp. Not all functions will look this similar.
For example, if you were to look at the actor's create function, you should see something like this at the top of the function in Ghidra:
![Create function in Ghidra](images/ghidra_setup_actor_macro.png)
This code is constructing the actor when it's first created. You shouldn't write it out by hand - instead, use the `fopAcM_SetupActor` macro, like so:
```cpp
fopAcM_SetupActor(this, daWall_c);
```
That should expand out into the proper code when compiled. If something in there is missing even after using the macro, then you might not have set up all of the actor's member variables properly in the previous step, so add any missing fields now.
There are other macros to watch out for too. A common pattern you'll likely see at some point is a debug assertion, which looks like this in Ghidra:
![Debug assertion in Ghidra](images/ghidra_jut_assert_macro.png)
The macro to use in this case is `JUT_ASSERT`, which handles checking a condition and showing that condition as a string:
```cpp
JUT_ASSERT(0x181, modelData != NULL);
```
Note that any variables used in a debug assertion must have their names match the assertion string exactly, like the `modelData` local variable in this case. This can sometimes even give you the official name of a member variable. Defines like `NULL` or `FALSE` work a bit differently and show up as their value (zero) in the assertion strings, instead of appearing the way the programmer actually wrote them.
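The difference between variable names and defines in an assertion string can be reproduced with plain preprocessor stringification. The sketch below uses hypothetical macros (`MY_FALSE`, `STR_AS_WRITTEN`, `STR_EXPANDED`). It is not the definition of `JUT_ASSERT`, and whether the original macro was built this way is an assumption; it only shows one standard mechanism that produces the same effect.

```cpp
// Hypothetical define standing in for something like FALSE.
#define MY_FALSE 0

// One-level stringification: the argument becomes a string exactly as written.
#define STR_AS_WRITTEN(cond) #cond

// Two-level stringification: the argument is macro-expanded first, so
// MY_FALSE has already become 0 by the time it is turned into a string.
#define STR_EXPANDED(cond) STR_AS_WRITTEN(cond)

// `flag` is not a macro, so it appears unchanged in both strings.
static_assert(sizeof(STR_AS_WRITTEN(flag != MY_FALSE)) == sizeof("flag != MY_FALSE"),
              "define kept as written");
static_assert(sizeof(STR_EXPANDED(flag != MY_FALSE)) == sizeof("flag != 0"),
              "define replaced by its value");
```

This is why a variable name in the string is reliable evidence of what the programmer typed, while a literal `0` could have been written as `0`, `NULL`, or `FALSE`.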
Other than those two macros, there's another common case that can cause code to look very different in Ghidra compared to how it was originally written: **inline functions**. These are used all over the place in TWW's codebase, and they're important to get right for several reasons, but as there are thousands of them we can't go over all of them individually in this guide. Instead, let's go over what the workflow for finding them on your own will look like.
## Inline functions and how to read the debug maps
Inline functions, or inlines for short, are functions that don't show up in Ghidra's decompiled code or objdiff's disassembly. This is because, while the original programmers wrote a function call, the compiler replaced that call with the *contents* of the inline function as an optimization. Inlines are generally pretty small functions, most often only a single line long.
When decompiling, you should try to use the same inlines the original devs used whenever possible, not only because this makes the code much more readable, but also because inline usage affects how the compiler generates code in many non-obvious ways.
If you've fully decompiled a function and are sure you didn't make any mistakes, but the function doesn't match in objdiff due to some small issue in the assembly, it's possible that you need to use the same inlines that the original developers used in order to get the compiler to generate the same assembly.
Some examples of small issues in the assembly that may be caused by incorrect inline usage:
* Two or more registers being swapped around (regswap/regalloc)
* Instructions being slightly out of order
* Instructions being unnecessarily duplicated (on either the left or right hand side)
But how can you know which inlines to use if they're not in the assembly? Inlines do appear in debug builds, but unlike with TP, we don't have access to a debug binary of TWW.
But luckily, we do have access to debug *symbol maps* for a Japanese prerelease kiosk demo of TWW. This demo is from very late in TWW's development, so the debug maps have the names of almost every single inline the final retail game uses.
Without the accompanying debug binary, there is some guesswork involved in figuring out exactly where each inline is used, but we'll cover some examples of how to read these maps and determine what inlines to use where.
First of all, download all the debug maps. You can find them pinned in the [tww-decomp](https://discord.com/channels/688807550715560050/1150077060098822226) channel of the ZeldaRET Discord server.
Second, open up the debug map for the actor you're working on. For example, if your object is called `d_a_wall`, you would open up `d_a_wallD.map`. Then consult the [Reading REL debug maps](#reading-rel-debug-maps) section below.
If the actor you're working on *doesn't* have its own `D.map` file, then it was probably merged in with `frameworkD.map`, which makes it harder to read. In this case, consult the [Reading frameworkD.map](#reading-frameworkdmap) section below.
### Reading REL debug maps
Let's take a look at another unmatched function in our TU:
```cpp
/* 00000F74-00000FE4 .text set_se__8daWall_cFv */
void daWall_c::set_se() {
/* Nonmatching */
}
```
In Ghidra, the function looks like this:
![Example of how an inline function appears in Ghidra](images/ghidra_inline.png)
You might be tempted to clean up Ghidra's output and decompile the function like this:
```cpp
/* 00000F74-00000FE4 .text set_se__8daWall_cFv */
void daWall_c::set_se() {
JAIZelBasic::zel_basic->seStart(0x696C, &eyePos, 0, dComIfGp_getReverb(current.roomNo), 1.0f, 1.0f, -1.0f, -1.0f, 0);
}
```
That does match in this case (it won't always), but we can improve it by checking this function in the debug map for this actor. Copy paste the function's *mangled* name (the last part of the comment after .text, e.g. `set_se__8daWall_cFv`) and Ctrl+F for it in the `D.map` for your actor.
You should see something along these lines:
```
8] set_se__8daWall_cFv (func,global) found in d_a_wall.o
9] fopAcM_seStart__FP10fopAc_ac_cUlUl (func,weak) found in d_a_wall.o
>>> SYMBOL NOT FOUND: dComIfGp_getReverb__Fi
10] mDoAud_seStart__FUlP3VecUlSc (func,weak) found in d_a_wall.o
11] getInterface__11JAIZelBasicFv (func,weak) found in d_a_wall.o
>>> SYMBOL NOT FOUND: zel_basic__11JAIZelBasic
>>> SYMBOL NOT FOUND: seStart__11JAIZelBasicFUlP3VecUlScffffUc
8] dComIfG_Ccsp__Fv (func,weak) found in d_a_wall.o
```
This is part of the *linker tree*, which shows which functions call other functions. It also tells us which functions are inlines - the ones with `(func,weak)` after their name.
The number on the left hand side indicates the indentation/depth in the tree. So `set_se` is at depth 8, and `fopAcM_seStart` is at depth 9. That means `set_se` calls `fopAcM_seStart`, which has `(func,weak)` so it's an inline. `mDoAud_seStart` is also an inline, but it's at depth 10, meaning it's called by `fopAcM_seStart`, not by `set_se` directly.
There are no other functions below `set_se` in the tree at depth 9, so it only calls that one inline.
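The depth rule above can also be written down as a short program. This is only a sketch of the reading procedure, assuming each tree line has the form `N] name (func,...) found in ...` as in the excerpt; `parse_tree_line` and `direct_inlines` are hypothetical helpers, not tools that ship with the decomp.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

// One parsed line of the linker tree.
struct tree_entry {
    int depth;
    std::string name;
    bool weak;
};

// Returns false for lines that are not tree entries, such as
// ">>> SYMBOL NOT FOUND: ...".
bool parse_tree_line(const std::string& i_line, tree_entry* o_entry) {
    std::size_t start = i_line.find_first_not_of(" \t");
    std::size_t bracket = i_line.find(']');
    if (start == std::string::npos || bracket == std::string::npos || bracket <= start) {
        return false;
    }
    for (std::size_t i = start; i < bracket; i++) {
        if (!std::isdigit(static_cast<unsigned char>(i_line[i]))) {
            return false;
        }
    }
    std::size_t name_start = i_line.find_first_not_of(' ', bracket + 1);
    if (name_start == std::string::npos) {
        return false;
    }
    std::size_t name_end = i_line.find(' ', name_start);
    o_entry->depth = std::atoi(i_line.substr(start, bracket - start).c_str());
    o_entry->name = i_line.substr(name_start, name_end - name_start);
    o_entry->weak = i_line.find("(func,weak)") != std::string::npos;
    return true;
}

// Collects the weak functions exactly one level below i_func, stopping at
// the first entry that is back at i_func's depth or shallower.
std::vector<std::string> direct_inlines(const std::vector<std::string>& i_lines,
                                        const std::string& i_func) {
    assert(!i_func.empty());
    std::vector<std::string> result;
    int parent_depth = -1;  // stays -1 until i_func has been found
    for (std::size_t i = 0; i < i_lines.size(); i++) {
        tree_entry entry;
        if (!parse_tree_line(i_lines[i], &entry)) {
            continue;
        }
        if (parent_depth < 0) {
            if (entry.name == i_func) {
                parent_depth = entry.depth;
            }
        } else if (entry.depth <= parent_depth) {
            break;  // left the subtree of i_func
        } else if (entry.depth == parent_depth + 1 && entry.weak) {
            result.push_back(entry.name);
        }
    }
    return result;
}
```

Feeding it the excerpt above with `set_se__8daWall_cFv` as the function name yields only `fopAcM_seStart__FP10fopAc_ac_cUlUl`. An empty result does not prove that a function uses no inlines, because an inline is only listed at the first place it appears in the map.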
> [!NOTE]
> The symbol names in the debug maps are mangled, like `fopAcM_seStart__FP10fopAc_ac_cUlUl`. If you need to read one of these more clearly, you can use objdiff's Tools -> Demangle... to get the demangled form of the symbol, such as `fopAcM_seStart(fopAc_ac_c*, unsigned long, unsigned long)`.
Let's try decompiling this function again, but this time using the `fopAcM_seStart` inline:
```cpp
/* 00000F74-00000FE4 .text set_se__8daWall_cFv */
void daWall_c::set_se() {
fopAcM_seStart(this, JA_SE_OBJ_BOMB_WALL_BRK, 0);
}
```
This is much closer to how it would have looked when written by the original devs.
(The `JA_SE` value there is part of an enum of sound effects - if you search through the decomp for decompiled actors that call `fopAcM_seStart`, you will see this enum being used when that inline is called.)
However, there's an important caveat to keep in mind when reading the debug maps: Each inline only appears **once per map**, even if it was called multiple times.
We got lucky in this example, because `fopAcM_seStart` was only called once in the entire file. But if it had been called multiple times and had already appeared higher up in the tree, nothing would have appeared underneath `set_se` when we looked at it.
To avoid this, you could start at the top of the linker tree (search for this text: `1] g_profile_`) and decompile functions in the order they appear there, which is a different order from how they appear in the .cpp file.
But with experience you'll start to recognize more inlines even in cases where the debug maps don't help you for a particular function.
### Reading frameworkD.map
Sometimes, the actor you're working on doesn't have its own debug map. In these cases, the actor's symbols - and any inlines it uses - were merged into frameworkD.map with many other TUs instead. (This is more advanced, so if this doesn't apply to the object you're working on, you can skip this section.)
Inlines in frameworkD.map are harder to spot and understand because, unlike all the other maps, this map doesn't have the linker tree described above. It has a flat list of symbols instead, and while this list does include inlines, the order they're placed in is more confusing. Worse, the rule of inlines appearing only "once per map" mentioned above still applies here, but it's much more disruptive in this case due to this map having hundreds of TUs in it instead of just one.
To find the object you're working on, search for the TU name with the .o extension. For example, for the `d_a_player` TU:
```
001e5028 0004d0 801ea768 1 .text d_a_player.o
001e5028 000078 801ea768 4 changePlayer__9daPy_py_cFP10fopAc_ac_c d_a_player.o
001e50a0 0001b0 801ea7e0 4 objWindHitCheck__9daPy_py_cFP8dCcD_Cyl d_a_player.o
001e5250 000038 801ea990 4 execute__25daPy_mtxFollowEcallBack_cFP14JPABaseEmitter d_a_player.o
001e5288 000058 801ea9c8 4 end__25daPy_mtxFollowEcallBack_cFv d_a_player.o
001e52e0 000080 801eaa20 4 makeEmitter__25daPy_mtxFollowEcallBack_cFUsPA4_fPC4cXyzPC4cXyz d_a_player.o
001e5360 000084 801eaaa0 4 makeEmitterColor__25daPy_mtxFollowEcallBack_cFUsPA4_fPC4cXyzPC8_GXColorPC8_GXColor d_a_player.o
001e53e4 000094 801eab24 4 setDoButtonQuake__9daPy_py_cFv d_a_player.o
001e5478 000080 801eabb8 4 stopDoButtonQuake__9daPy_py_cFi d_a_player.o
001e54f8 0001c8 801eac38 1 .text d_a_player.o
001e54f8 000024 801eac38 4 dComIfGp_att_ChangeOwner__Fv d_a_player.o
001e551c 00003c 801eac5c 4 dComIfGp_setPlayer__FiP10fopAc_ac_c d_a_player.o
001e5558 000010 801eac98 4 setPlayer__14dComIfG_play_cFiP10fopAc_ac_c d_a_player.o
001e5568 000040 801eaca8 1 .text d_a_player.o
001e5568 00002c 801eaca8 4 changeOwner__12dAttention_cFv d_a_player.o
001e5594 000008 801eacd4 1 .text d_a_player.o
001e5594 000008 801eacd4 4 Owner__9dCamera_cFP10fopAc_ac_c d_a_player.o
```
This TU has multiple .text sections. When this happens, generally the first .text section will have the non-weak functions (the ones in `d_a_player.cpp`), while the rest of the .text sections will have inlines. Let's look at the inlines in the second .text section:
```
001e54f8 0001c8 801eac38 1 .text d_a_player.o
001e54f8 000024 801eac38 4 dComIfGp_att_ChangeOwner__Fv d_a_player.o
001e551c 00003c 801eac5c 4 dComIfGp_setPlayer__FiP10fopAc_ac_c d_a_player.o
001e5558 000010 801eac98 4 setPlayer__14dComIfG_play_cFiP10fopAc_ac_c d_a_player.o
```
This tells us that the inlines `dComIfGp_att_ChangeOwner()`, `dComIfGp_setPlayer(int, fopAc_ac_c*)`, and `dComIfG_play_c::setPlayer(int, fopAc_ac_c*)` are used in the `d_a_player` TU.
It also tells us that these three inlines are not used by any of the other TUs that appear above this point in frameworkD.map.
It doesn't tell us whether or not they're used by other TUs below this point, or vice versa. The `dComIfGp_getCamera` inline is used in this TU, and should appear in this section, but doesn't due to the "once per map" rule as it already appeared higher up in frameworkD.map for a different TU.
Unfortunately, it also doesn't tell us which specific function(s) in `d_a_player` call these inlines, because they're in a different section from d_a_player's own functions. However, it does give us a hint as to the *order* these functions are called in the TU.
Specifically, inlines at the same depth/indentation as each other in the linker tree will appear in **reverse order** in the list of symbols. But inlines that are nested deeper will still appear below the inline that called them. Based on the names, `setPlayer` is *probably* a deeper inline that is called by `dComIfGp_setPlayer`. So if we were to take a guess and try to recreate the linker tree in this case, it might look something like this:
```
1] dComIfGp_setPlayer__FiP10fopAc_ac_c
2] setPlayer__14dComIfG_play_cFiP10fopAc_ac_c
1] dComIfGp_att_ChangeOwner__Fv
```
This doesn't tell us as much as the real linker trees, and is based on guesswork, but going through this process can sometimes help you to determine what inlines you should be using where.
## Fixing minor nonmatching issues
Once you've gone through and decompiled every function in your chosen TU, you might have run into a few functions that you could only get *mostly* matching, falling short of showing a 100% match in objdiff.
It's not possible for this guide to cover every possible issue you might face, but we'll go over some common cases, as well as how to ask for help if you're still stuck.
### Swapped if/else blocks
Ghidra has a habit of showing if/else blocks in the wrong order. So even if Ghidra shows you this:
```cpp
if (!condition) {
var = 2;
} else {
var = 1;
}
```
You might actually need to write it like this sometimes:
```cpp
if (condition) {
var = 1;
} else {
var = 2;
}
```
You can tell when this is necessary by looking at this part of the code in objdiff, as the assembly will show you the correct order. You can fix it by simply swapping the blocks as well as the condition. If there are multiple conditions being checked you may also need to switch the logical operator (e.g. `||` -> `&&`).
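For the multiple-condition case, here is a small sketch of what the swap looks like. The function names and conditions are hypothetical; the point is that negating the whole condition turns `!a || !b` into `a && b` (De Morgan's law), so both versions behave identically and differ only in block order.

```cpp
#include <cassert>

// The order Ghidra might show, with negated conditions joined by ||.
int ghidra_order(bool i_a, bool i_b) {
    int var;
    if (!i_a || !i_b) {
        var = 2;
    } else {
        var = 1;
    }
    return var;
}

// The same logic with the blocks swapped: each condition is negated and
// the || becomes &&.
int swapped_order(bool i_a, bool i_b) {
    int var;
    if (i_a && i_b) {
        var = 1;
    } else {
        var = 2;
    }
    return var;
}

// Both versions agree on every combination of inputs.
inline void check_all_inputs() {
    for (int a = 0; a < 2; a++) {
        for (int b = 0; b < 2; b++) {
            assert(ghidra_order(a != 0, b != 0) == swapped_order(a != 0, b != 0));
        }
    }
}
```

Because the two forms are logically equivalent, checking which one matches is purely a matter of comparing the resulting assembly in objdiff.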
### if/else vs ternary conditional branch differences
The compiler optimizes if/else statements differently from the ternary conditional operator.
If you write something like this with a ternary:
```cpp
return condition ? TRUE : FALSE;
```
And that compiles to the wrong optimized code like this:
![Incorrect branch optimization in objdiff](images/objdiff_ternary.png)
Try writing it with if/else instead, and it may match:
```cpp
if (condition) {
return TRUE;
} else {
return FALSE;
}
```
The same applies in reverse. You'll likely have to swap if/else for a ternary at some point.
### Swapped registers
Sometimes, all of the instructions in a function will match, but which variable got put in which processor register by the compiler is all swapped around:
![A regswap](images/regswap_actor_base.png)
This issue is called a **regswap**, and it's so common, and has so many different possible causes, that it gets its [own entire guide](regalloc.md).
### Asking for help with a function by sharing a decomp.me scratch
If you're still stuck on some annoying minor issue, it can be worth having a second pair of eyes look to see if they can spot the issue. objdiff has a built-in way to easily share a particular function with others just by giving them a link on a site called decomp.me.
To use this feature, first open up the function you're stuck on in objdiff, and then click the `📲 decomp.me` button in the upper left corner:
![objdiff's decomp.me button](images/decomp_me_button.png)
Your web browser will be opened automatically, and you should see a blank page that says "Move related code from Context tab to here".
Switch from the "Source code" tab to the "Context" tab. Search through this tab for the specific function you had opened up. Cut (don't copy) this entire function out of the Context tab and paste it into the Source code tab. You also might need to go back to the Context tab and delete all the code that comes *after* the function you just cut in order for it to compile properly (don't touch the context that comes before it though).
If done correctly, the scratch should compile and show the same issue as you were seeing in objdiff. Save (Ctrl+S) the scratch. Now you can share this scratch's URL in the [tww-decomp-help](https://discord.com/channels/688807550715560050/1150077114347966545) channel of the ZeldaRET Discord server and ask for help.
Note that scratches only show functions, not data. So if all the functions match 100% but some data doesn't, you'll have to figure that out locally using objdiff.
### Missing weak data
Many actor TUs in TWW have unused data included in them, usually in the .bss or .data sections. This data won't be referenced by any of the functions, but it's still necessary to include it in order for the TU to match.
You can tell if this is the case for the TU you're working on by looking at the symbol list in objdiff. If one of the data sections has a bunch of symbols on the left side but not on the right side, and they have names like `@1036`, they may be missing weak data.
The exact cause of these symbols isn't fully understood yet, but we have headers you can include that should match them. Copy either the .bss include or the .data include below, or both, depending on which section(s) in your TU the missing symbols are in:
```cpp
#include "weak_bss_936_to_1036.h" // IWYU pragma: keep
#include "weak_data_1811.h" // IWYU pragma: keep
```
### Diffing data values with objdiff
Sometimes, even if you've 100% matched all functions, some of the data symbols will show less than 100% in objdiff:
![A data symbol with a name that doesn't match in objdiff](images/objdiff_data_named_symbol.png)
If the symbol in question has a name, like `eye_co_sph_src` in the above screenshot, you can find this variable by simply searching for its name in the .cpp and fixing whatever shows as different in objdiff's data diff view.
But what if the symbol doesn't have a real name, and it's just a bunch of numbers like `@1440`?
![A data symbol without a name that doesn't match in objdiff](images/objdiff_data_unnamed_symbol.png)
You won't find the text `@1440` anywhere in the .cpp file, because it's a compiler-generated name. The compiler automatically assigns these unique names to literal values that appear inside functions - most often float literals like `0.0f`. If one of these doesn't match, it means you got one of the literals in a function wrong.
objdiff has a feature that allows you to easily find exactly where this wrong literal appears. Go to Diff Options -> Function relocation diffs, and change this option from "Name or address (default)" to "Name or address, data value". Then scroll down through the list of functions that you had 100% matched, and you should now see that one of them shows less than 100%. That's the one that uses the incorrect literal.
If you open that function up in objdiff, you'll now see the literal with the wrong value is highlighted as a diff. You can hover over it on the left side to see what its value should really be:
![Hovering over a data relocation in the function diff view in objdiff](images/objdiff_function_data_hover.png)
As objdiff shows you both the line number that the literal appears on (e.g. 108) and the value it should be changed to (e.g. 90.0f) it should be very easy to fix this.
Note that while unnamed data symbols are often float literals, this isn't always the case. Sometimes they'll be PTMFs (Pointer to Member Functions) or switch statement jump tables. In these cases, objdiff's function diff view itself may not show you exactly what's wrong, and instead you'll have to look in the data diff view. You can hover over the highlighted hex bytes in this view to see relocations that don't match:
![Hovering over a data relocation in the data diff view in objdiff](images/objdiff_data_hover.png)
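When the data diff view only gives you raw hex bytes for a float literal, it can help to convert them by hand. The following is a minimal sketch in Python, assuming the four bytes are a single-precision float stored big-endian (the GameCube's PowerPC CPU is big-endian). The helper names are made up for this example and are not part of objdiff or this repository.

```python
import struct


def be_bytes_to_float(hex_bytes: str) -> float:
    """Interpret 4 hex-encoded bytes as a big-endian IEEE 754 single."""
    raw = bytes.fromhex(hex_bytes)  # spaces between bytes are allowed
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes, got %d" % len(raw))
    return struct.unpack(">f", raw)[0]


def float_to_be_bytes(value: float) -> str:
    """Show the bytes a given float literal would be stored as."""
    return struct.pack(">f", value).hex().upper()


print(be_bytes_to_float("42 B4 00 00"))  # 90.0
print(float_to_be_bytes(1.0))            # 3F800000
```

Going in the other direction with `float_to_be_bytes` is useful for checking whether a literal you wrote in the .cpp would produce the bytes shown on the left side.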
## Linking a 100% matching object
Once you've fully decompiled all functions and data so that every section shows as 100% matching in objdiff, it's time to check that the TU as a whole also matches. To do this, find the TU's name in [configure.py](../configure.py) and change it from `NonMatching` to `Matching` to tell the build system this TU should be linked, then run `ninja` (or `Ctrl+Shift+B` in VSCode) to build.
If you see `416 files OK` followed by a report of the project's total progress, that means your TU matches. Great, you're done! You can go ahead and submit a pull request on GitHub now.
But if you see something like this, where it says your chosen TU failed:
```
FAILED: build/GZLE01/ok
build/tools/dtk shasum -q -c config/GZLE01/build.sha1 -o build/GZLE01/ok
build/GZLE01/d_a_wall/d_a_wall.rel: FAILED
415 files OK
WARNING: 1 computed checksum(s) did NOT match
```
Then something in your TU doesn't actually match exactly, and you should figure out what it is. We'll go over a few methods of finding the issue. Keep in mind that you can still submit a pull request even if you don't manage to figure it out: mention that in the PR description and revert the TU to `NonMatching` in [configure.py](../configure.py).
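To see what this checksum step is comparing, you can hash a built file yourself. This is a rough sketch, not how DTK does it internally. It assumes each line of `build.sha1` has the common `<sha1 hex digest>  <path>` layout used by `shasum`-style tools, which you should confirm by opening the file; all function names here are hypothetical.

```python
import hashlib
import io


def sha1_of_stream(stream) -> str:
    """Return the SHA-1 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha1()
    for chunk in iter(lambda: stream.read(65536), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha1_of_file(path: str) -> str:
    with open(path, "rb") as f:
        return sha1_of_stream(f)


def parse_checksum_line(line: str):
    """Split an assumed '<hex digest>  <path>' line into (digest, path)."""
    digest, path = line.strip().split(None, 1)
    return digest.lower(), path.lstrip("*")


def file_matches(line: str) -> bool:
    """Check one checksum line against the file it names."""
    expected, path = parse_checksum_line(line)
    return sha1_of_file(path) == expected


# Self-check against the well-known SHA-1 test vector for "abc".
print(sha1_of_stream(io.BytesIO(b"abc")))  # a9993e364706816aba3e25717850c26c9cd0d89d
```

A mismatch here only confirms that the file differs; it does not tell you where, which is what the methods below are for.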
### ninja diff (only for main.dol)
If the TU you're working on is in main.dol, you can run the `ninja diff` command and DTK will print out an explanation of exactly where the issue lies:
```
$ ninja diff
[4/4] DIFF build/GZLE01/framework.elf
FAILED: dol_diff
build/tools/dtk -L error dol diff config/GZLE01/config.yml build/GZLE01/framework.elf
ERROR Expected to find symbol getZoneNo__20dStage_roomControl_cFi (type Function, size 0x1C) at 0x8005DCD0
ERROR At 0x8005DCD0, found: offSwitch__10dSv_info_cFii (type Function, size 0x1AC)
ERROR Instead, found getZoneNo__20dStage_roomControl_cFi (type Function, size 0x1C) at 0x8005EF6C
ninja: build stopped: subcommand failed.
```
Unfortunately, this command currently only supports detecting differences in main.dol, while most actors are in RELs, so it won't print anything useful most of the time:
```
$ ninja diff
[3/3] DIFF build/GZLE01/framework.elf
```
If you're working on a REL, you'll have to locate the difference manually.
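One crude way to start is to compare the built file against the original byte by byte and note where they first diverge. The sketch below is a generic Python helper, not a project tool. Which two files to pass in is up to you, and the offset it reports is a raw file offset that you still have to map back to a section or symbol yourself.

```python
def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One file is a prefix of the other; they diverge where it ends.
        return min(len(a), len(b))
    return None


def compare_files(original_path: str, built_path: str) -> str:
    """Describe the first difference between two files on disk."""
    with open(original_path, "rb") as f:
        original = f.read()
    with open(built_path, "rb") as f:
        built = f.read()
    offset = first_difference(original, built)
    if offset is None:
        return "files are identical"
    return "first difference at file offset 0x%X" % offset


print(first_difference(b"\x00\x01\x02\x03", b"\x00\x01\xFF\x03"))  # 2
```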
### Weak function ordering
Even if all functions match 100%, it's possible for the TU to not match if the compiler put some of the functions from included headers in the wrong order. You can tell if this is the case by looking at the list of functions in objdiff and slowly moving your mouse down across all the function names on the left hand side - if the cursor on the right hand side jumps back and forth at times, then the functions aren't in the same order.
![objdiff showing weak functions that are out of order](images/objdiff_weak_func_order.png)
This issue is called **weak function ordering**, and it's so common, and has so many different possible causes, that it gets its [own entire guide](weak_func_order.md).
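The "cursor jumps backwards" check described above can also be expressed as a small script if you have both function lists as text. This is a sketch under that assumption; objdiff does not export these lists in any particular format that this code relies on, and the symbol names in the example are placeholders.

```python
def out_of_order(target, current):
    """Return names in `current` that appear after a function they should precede.

    `target` is the expected order (left side), `current` is the order
    your build produced (right side). A name is reported when its
    position in `target` is lower than one already passed, i.e. where
    the cursor on the other side would jump backwards.
    """
    position = {name: i for i, name in enumerate(target)}
    jumps = []
    highest = -1
    for name in current:
        if name not in position:
            continue  # symbol only exists on one side
        if position[name] < highest:
            jumps.append(name)
        else:
            highest = position[name]
    return jumps


# Placeholder names, not real symbols from the game.
expected = ["func_a", "func_b", "func_c", "func_d"]
built = ["func_a", "func_c", "func_b", "func_d"]
print(out_of_order(expected, built))  # ['func_b']
```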
## Documentation and naming
Once an actor is fully decompiled, you can start naming some of its member variables if you want to. This is completely optional - it's normal to submit a PR without documenting most fields. Leaving them unnamed (e.g. `field_0x290`) is preferable to coming up with wrong names if you aren't sure.
But if you do decide to start naming things, you should check out the [coding guidelines page](coding_guidelines.md).
+135
View File
@@ -0,0 +1,135 @@
# Register swaps / register allocation
Sometimes, all of the instructions in a function will match, but which variable got put in which processor register by the compiler (register allocation) is all swapped around. This is known as a register swap or regswap.
Some regswaps you encounter will have unique solutions, but most are caused by just a handful of recurring patterns. The purpose of this document is to act as a cheatsheet showing examples of those common patterns and how they can be fixed.
## Double check that the function is equivalent
Before trying different things that affect how the compiler performs register allocation, you should first double check that the way you've decompiled the function so far is even functionally equivalent.
Rarely, you will make a mistake and swap two different variables of the same type, e.g. by passing the `this` pointer to a function when you should have passed a local variable that points to a different actor entirely.
One thing you can do to help spot these mistakes is click on one of the swapped registers on the left hand side in objdiff, then click on the swapped register that's in the same location but on the right hand side. Even though the register numbers are different, you should see that the highlighted locations where they are used are the same on both sides if the function is equivalent. If the highlighted locations are different on each side, this might mean you used the wrong variable in that spot. (However, for very large functions, this trick might not always work. The highlighted locations might differ due to complex regalloc even if you didn't make any mistakes.)
If you did make a mistake that causes the function to be non-equivalent, then none of the patterns the rest of this guide goes over will help you, and you'll just be wasting your time trying them.
## Shuffling local variable declaration order
The order in which local variables are declared is not always the same as the order in which they are first assigned, and this can impact regalloc.
You can try moving all of the local variable declarations to the top of the function like so:
```cpp
/* 8024F410-8024FA90 .text cM3d_Cross_CylLin__FPC8cM3dGCylPC8cM3dGLinP3VecP3Vec */
int cM3d_Cross_CylLin(const cM3dGCyl* cyl, const cM3dGLin* line, Vec* param_2, Vec* param_3) {
    f32 ratio;
    f32 f2;
    f32 fVar5;
    f32 fVar2;
    f32 fVar1;
    f32 fVar6;
    f32 fVar4;
    BOOL bVar4;
    BOOL bVar3;
    BOOL bVar6;
    BOOL bVar5;
    u32 uVar11;
    f32 sp28;
    f32 r_sq;
    int count;
    ratio = 0.0f;
    ...
```
And then try moving them around relative to each other to see if you can change regalloc that way.
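If only a few declarations are involved, you can enumerate the possible orders instead of shuffling them by hand. The helper below is a hypothetical sketch in Python and not a tool that ships with this repository. It only generates the candidate text, so you still need to paste each one in and rebuild. The number of orderings grows factorially, so this is only practical for a small subset of the declarations.

```python
from itertools import islice, permutations


def declaration_orders(declarations, limit=None):
    """Yield each ordering of the declaration lines as one text block."""
    orders = permutations(declarations)
    if limit is not None:
        orders = islice(orders, limit)
    for order in orders:
        yield "\n".join(order)


# Three of the declarations from the example above.
decls = ["f32 ratio;", "f32 r_sq;", "int count;"]
candidates = list(declaration_orders(decls))
print(len(candidates))  # 6
print(candidates[1])
```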
## C-style actor base pointer as a separate variable
When working on C-style actors (actor classes named like `xyz_class` that don't have member functions), you will often encounter a regswap where one of the incorrect registers will contain the `xyz_class* i_this` parameter. For example:
```cpp
/* 00000FF4-00001344 .text Line_check__FP9am2_class4cXyz */
static BOOL Line_check(am2_class* i_this, cXyz destPos) {
    dBgS_LinChk linChk;
    cXyz centerPos = i_this->current.pos;
    centerPos.y += 100.0f + REG12_F(19);
    i_this->mLinChkCenter = centerPos;
    i_this->mLinChkDest = destPos;
    linChk.Set(&centerPos, &destPos, i_this);
    if (!dComIfG_Bgsp()->LineCross(&linChk)) {
        return TRUE;
    }
    return FALSE;
}
```
In these cases, try making a local variable called `actor` to hold a pointer to the actor's base class at the start of the function, and then use that variable instead of `i_this` whenever the base class is needed:
```diff
 /* 00000FF4-00001344 .text Line_check__FP9am2_class4cXyz */
 static BOOL Line_check(am2_class* i_this, cXyz destPos) {
+    fopAc_ac_c* actor = i_this;
     dBgS_LinChk linChk;
-    cXyz centerPos = i_this->current.pos;
+    cXyz centerPos = actor->current.pos;
     centerPos.y += 100.0f + REG12_F(19);
     i_this->mLinChkCenter = centerPos;
     i_this->mLinChkDest = destPos;
-    linChk.Set(&centerPos, &destPos, i_this);
+    linChk.Set(&centerPos, &destPos, actor);
     if (!dComIfG_Bgsp()->LineCross(&linChk)) {
         return TRUE;
     }
     return FALSE;
 }
```
## Casting
Explicitly casting from one type to another can affect regalloc. This applies to both primitive types and pointer types. Sometimes, you may have to add a cast that serves no practical purpose just to fix regalloc.
Even the type of casting operator you use affects it in some cases. For example, this C-style cast:
```cpp
J3DModelData* modelData = (J3DModelData*)dComIfG_getObjectRes(m_arcname, VBAKH_BDL_VBAKH);
```
Is functionally equivalent to this C++ static_cast:
```cpp
J3DModelData* modelData = static_cast<J3DModelData*>(dComIfG_getObjectRes(m_arcname, VBAKH_BDL_VBAKH));
```
But the two of them produce different regalloc.
## Temp variables
Sometimes, instead of writing a single line that does multiple things at once, you may need to split the intermediate values it calculates out into temp variables across multiple lines.
## Inlines
Inlines can affect regalloc, so be sure that you're using the exact inlines mentioned in the debug maps. Also try using inlines used in other functions from the same object, or inlines used in similar functions from a different object.
If you're sure that you're using the right inline, but there are still regswaps happening in the area of the function where the inline is used, the cause can sometimes be that the inline itself is implemented wrong. You may have to try modifying the inline and write the code inside it differently in order to fix the regalloc in the functions that use it. But when doing this, be careful that you don't break any already-matched functions that use the same inline you're modifying.
## Const
Whether a variable is `const` or not can affect regalloc (as well as instruction ordering). This is especially true for inline function parameters.
Even though we know the function signatures of all functions and inlines from the symbol maps, const is not included in mangled symbol names for primitive parameters - only for pointer parameters.
For example, the mangled name `__ct<f>__Q29JGeometry8TVec3<f>Ffff` from the symbol maps would demangle to this signature:
```cpp
TVec3(f32 x, f32 y, f32 z)
```
However, `f32` is a primitive type. So the following is another possibility for this inline's signature:
```cpp
TVec3(const f32 x, const f32 y, const f32 z)
```
You may need to try adding or removing const from inlines like this, but be careful that you don't break any already-matched functions that use the same inline you're modifying.
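To make the point about mangled names concrete, here is a toy decoder for the parameter list that follows `F` in a symbol name. It is a sketch that only understands the few codes visible in this guide's own examples (`f`, `i`, `P`, `C`, and length-prefixed class names), inferred from those examples rather than from a mangling specification, so treat it as an illustration and not a real demangler.

```python
PRIMITIVES = {"f": "float", "i": "int"}


def _decode_type(s: str, i: int):
    """Decode one type starting at index i; return (text, next_index)."""
    if i >= len(s):
        raise ValueError("unexpected end of parameter list")
    c = s[i]
    if c == "P":  # pointer to whatever follows
        inner, i = _decode_type(s, i + 1)
        return inner + "*", i
    if c == "C":  # const applied to whatever follows
        inner, i = _decode_type(s, i + 1)
        if inner.endswith("*"):
            return inner + " const", i
        return "const " + inner, i
    if c in PRIMITIVES:
        return PRIMITIVES[c], i + 1
    if c.isdigit():  # length-prefixed class name, e.g. 4cXyz
        j = i
        while j < len(s) and s[j].isdigit():
            j += 1
        length = int(s[i:j])
        return s[j:j + length], j + length
    raise ValueError("unsupported type code %r at index %d" % (c, i))


def decode_params(encoded: str):
    """Decode the parameter codes that follow `F` in a mangled name."""
    params = []
    i = 0
    while i < len(encoded):
        text, i = _decode_type(encoded, i)
        params.append(text)
    return params


print(decode_params("fff"))
print(decode_params("PC8cM3dGCylPC8cM3dGLinP3VecP3Vec"))
```

The first call yields three plain `float` entries: nothing in `fff` records whether the original parameters were `const`. The second shows `const` surviving for the pointed-to types, which is why it can be read back from the symbol maps in that case.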
+135
View File
@@ -0,0 +1,135 @@
# Dumps structure members to C format with comments and names
#@Pheenoh / Taka / LagoLunatic
#@category #Decomp
#@keybinding
#@menupath
#@toolbar
import re
dtm = currentProgram.getDataTypeManager();
type_name = askString("Generate Struct Members", "Enter a data type name: ")
struct_path = type_name.replace("::", "/")
struct = dtm.getDataType("/" + struct_path)
if struct is None:
    struct = dtm.getDataType("/Demangler/" + struct_path)
if struct is None:
    raise Exception("Could not find struct with name: %s" % type_name)
size = struct.getLength()
size_str = str("{:X}".format(size))
offset_pad_size = len(size_str)
check = False
start_address = ""
member_name = ""
undefined_member_name = ""
datatype_remaps = {
    'byte': 'u8',
    'uchar': 'u8',
    'sbyte': 's8',
    'short': 's16',
    'ushort': 'u16',
    'undefined1': 'u8',
    'undefined2': 'u16',
    'undefined4': 'u32',
    'undefined1 *': 'u8*',
    'undefined2 *': 'u16*',
    'undefined4 *': 'u32*',
    #'int': 's32', # Breaks matches sometimes
    'long': 's32',
    'unsigned int': 'uint',
    'ulong': 'u32',
    'float': 'f32',
    'pointer': 'void*',
    'MTX34': 'Mtx',
    'MTX34 *': 'MtxP',
    #'PTMF':
    'TVec3<float>': 'JGeometry::TVec3<f32>',
    '_GXColor': 'GXColor',
    '_GXColorS10': 'GXColorS10',
}
undefined_member_name_prefix = "field_0x"
# undefined_member_name_prefix = "m"
out_lines = []
for i in range (struct.numComponents):
    data_type = str(struct.getComponent(i).getDataType().getName())
    offset = struct.getComponent(i).getOffset()
    hex_offset_string = str("%0*X" % (offset_pad_size, offset))
    if struct.getComponent(i).getFieldName() is not None:
        member_name = str(struct.getComponent(i).getFieldName())
    else:
        member_name = undefined_member_name_prefix+hex_offset_string
    if member_name in ["parent", "base"] and data_type in ['fopAc_ac_c', 'dBgS_MoveBgActor', 'fopNpc_npc_c', 'fopEn_enemy_c', 'daPy_py_c', 'daPy_npc_c']:
        # Not a member, inheritance
        continue
    # if undefined member
    if data_type == 'undefined' or check == True:
        if data_type == 'undefined':
            if check == False:
                check = True
                start_address = hex_offset_string
            continue
        check = False
        undefined_member_name = undefined_member_name_prefix+start_address+"[0x"+hex_offset_string+" - 0x"+start_address+"]"
    if undefined_member_name != "":
        undefined_member_name = " /* 0x" +start_address+" */ u8 "+undefined_member_name+";"
        out_lines.append(undefined_member_name)
        undefined_member_name = ""
    print(data_type)
    data_type = str(data_type)
    if data_type in datatype_remaps:
        data_type = datatype_remaps[data_type]
    if "[" in data_type:
        # move array to member name
        array_start_idx = data_type.find("[")
        array = data_type[array_start_idx:]
        data_type = data_type[0:array_start_idx]
        if data_type in datatype_remaps:
            data_type = datatype_remaps[data_type]
        member_name = member_name+array
    elif data_type == "char":
        data_type = "s8"
    if data_type == "PTMF": # Pointer to member function
        match = re.search(r"^mCurr?(\S+Func)$", member_name)
        if match:
            data_type = match.group(1)
        else:
            # Just guess at the return type and parameters
            data_type = "int"
            member_name = "("+type_name+"::*"+member_name+")()"
    member_string = " /* 0x" +hex_offset_string+" */ "+str(data_type).replace(" ","")+" "+member_name+";"
    out_lines.append(member_string)
hex_end_offset_string = str("%0*X" % (offset_pad_size, struct.getLength()))
if check:
    undefined_member_name = undefined_member_name_prefix+start_address+"[0x"+hex_end_offset_string+" - 0x"+start_address+"]"
    member_string = " /* 0x" +start_address+" */ "+"u8"+" "+undefined_member_name+";"
    out_lines.append(member_string)
out_lines.append("}; // Size: 0x%s" % hex_end_offset_string)
out_str = "\n".join(out_lines)
print(out_str)
# Copy to clipboard
from docking.dnd import GClipboard
from java.awt.datatransfer import Clipboard, StringSelection
clipboard = GClipboard.getSystemClipboard()
data = StringSelection(out_str)
clipboard.setContents(data, None)
+118
View File
@@ -0,0 +1,118 @@
# Fixing weak function ordering
If every symbol in a TU is 100% matched, but the order of weak functions (ones with `[gw]` in objdiff) is different on the left and the right, then the TU is functionally equivalent, but it won't actually match when linked.
## Table of Contents
1. [Compiler flags](#compiler-flags)
2. [Factors affecting weak function order within a .text section](#factors-affecting-weak-function-order-within-a-text-section)
3. [Factors affecting the number of .text sections](#factors-affecting-the-number-of-text-sections)
## Compiler flags
The most common cause of weak functions being ordered incorrectly is simply the compiler flags. The following compiler flags are currently known to affect it:
* `-sym on` (the default)
* `-sym off`
* `-pragma "nosyminline on"`
If the weak function ordering is incorrect with the default (`-sym on`), you should try modifying [configure.py](../configure.py) to add different flags for the TU you're working on, like so:
```py
ActorRel(Matching, "d_a_am", extra_cflags=['-pragma "nosyminline on"']),
```
First try adding `-pragma "nosyminline on"`, as that fixes the weak function ordering for many actors. Try running `ninja` again to check if it matches this time. If it still doesn't match and the order is still wrong in objdiff, try using `-sym off` instead and checking again.
If neither of those fixes it, I recommend marking the TU as `Equivalent` in [configure.py](../configure.py) and adding a comment about the weak function order, e.g.:
```py
ActorRel(Equivalent, "d_a_pirate_flag"), # weak func order
```
Then you can submit a pull request as-is instead of worrying about it any more. The build system won't be able to automatically verify that the TU is accurately decompiled, but it will still count towards the project's overall percent completion. It will also be useful for anyone interested in understanding or modding the actor you just decompiled, as weak function order has no effect on the functionality of the code.
**You can stop reading here if you're new to decompilation and working on learning the basics.** The rest of this document will go into more advanced details about weak function ordering, but all of the exact specifics are not fully understood by anyone yet.
### Explanation of compiler flags
`-sym on` is a flag that enables debugging information, such as line numbers (you can see these line numbers in objdiff, to the left of the assembly instructions).
Due to a strange quirk of the compiler, this flag has the side effect of causing functions to be split up into multiple .text sections, one for each unique filename that a function is defined in. So functions defined in the .cpp file would go in the first .text section, functions defined in one header file would get their own separate .text section, functions defined in a different header file would go in a third .text section, etc.
`-sym off` disables that debugging information, removing line numbers.
This also disables the multiple .text sections side effect - all functions will go in a single section instead.
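The difference between the two flags can be pictured with a toy model. The Python sketch below simply groups functions by the file they are defined in, which is how the paragraphs above describe the `-sym on` side effect. It is a simplification for illustration only: the file and function names are invented, and the assumption that sections appear in order of first use is mine, not something confirmed about the compiler.

```python
def text_sections(functions, sym_on=True):
    """Group (function, defining_file) pairs into .text sections.

    sym_on=True models one section per unique defining file, ordered by
    first appearance. sym_on=False models everything in one section.
    """
    if not sym_on:
        return [[name for name, _ in functions]]
    sections = {}
    for name, filename in functions:
        sections.setdefault(filename, []).append(name)
    return list(sections.values())


# Invented example data: two functions from the .cpp, two from headers.
funcs = [
    ("create", "actor.cpp"),
    ("getPos", "base.h"),
    ("draw", "actor.cpp"),
    ("setScale", "model.h"),
]
print(text_sections(funcs, sym_on=True))
print(text_sections(funcs, sym_on=False))
```

With `sym_on=True` the model produces three groups, and with `sym_on=False` a single one.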
When `-sym on` is enabled, using the `-pragma "nosyminline on"` flag too will cause inline functions to have their debugging information disabled, while normal functions will still have debugging information.
This has an unpredictable effect on the number of .text sections in the file. Some of the weak functions defined in headers will be merged into the main .text section for the .cpp file, while others will be merged into a different .text section that's for a different header, others will not be merged at all, etc. It's not currently understood how this is determined.
## Factors affecting weak function order within a .text section
If one of those three compiler flags results in the .text sections being split up correctly, but there are still weak functions out of order compared to other functions in the same section, there are a number of known factors that can affect this.
### Defining template virtual weak functions inside vs outside a class body
For template classes, sometimes defining virtual weak functions within the class body like this:
```cpp
template<typename T>
class JPACallBackBase {
public:
    JPACallBackBase() {}
    virtual ~JPACallBackBase() {}
    virtual void init(T) {}
    virtual void execute(T) {}
    virtual void executeAfter(T) {}
    virtual void draw(T) {}
}; // Size: 0x04
```
Will result in those functions being put out of order. These can be fixed by moving the definitions to after the class body (but still in the header), and marking the declarations as `inline` within the class body:
```cpp
template<typename T>
class JPACallBackBase {
public:
    JPACallBackBase() {}
    virtual ~JPACallBackBase() {}
    inline virtual void init(T);
    inline virtual void execute(T);
    inline virtual void executeAfter(T);
    inline virtual void draw(T);
}; // Size: 0x04
template<>
void JPACallBackBase<JPABaseEmitter*>::init(JPABaseEmitter*) {}
template<>
void JPACallBackBase<JPABaseEmitter*>::execute(JPABaseEmitter*) {}
template<>
void JPACallBackBase<JPABaseEmitter*>::executeAfter(JPABaseEmitter*) {}
template<>
void JPACallBackBase<JPABaseEmitter*>::draw(JPABaseEmitter*) {}
```
### TODO HeartPiece's list
* [ ] pure virtual base class
* [ ] pure virtual declarations in classes inheriting from a pure virtual base class (i.e. re-declaring something pure virtual)
* [ ] implicit vs explicit definitions of special virtual functions (such as dtors)
* [ ] ordering of virtual weak function declaration in inheriting classes after their order is defined in a base class
* [ ] ordering of virtual function declaration in "higher" (more base) classes when the virtuals are weak in an inheriting class
* [x] definition within or external to a class within a given header for template virtuals (and template weak functions in general, do not have to be virtual)
* [ ] order of declaration of weak AND virtual weak functions in template class definitions (interweaving non-virtual weak function definitions between virtual function definitions affects ordering)
* [ ] calling of inlines within weak virtual template function definitions (calling inlines, even if they don't get generated as actual functions, affects where stuff spits out, such as "invisible" getters and setters)
* [ ] stripped functions calling virtual functions within files
* [ ] ordering of NON-weak functions within virtual tables (previously ordered by declaration in base classes) causing vtable spawn ordering adjustments (i.e. hitting the "key" method for a vtable before or after a "key" method for another table)
* [ ] calling a virtual function from an inheriting class vs a base class on the same object
## Factors affecting the number of .text sections
If none of the three sym compiler flags mentioned above result in the .text sections being split up correctly, then there are a few known factors that can influence this splitting, but for the most part this is still an unsolved problem.
We have a list of decompiled actors that cannot be linked due to this issue [here on GitHub](https://github.com/zeldaret/tww/issues?q=is%3Aissue%20state%3Aopen%20label%3Aweakfunc-order).
### TODO
* constructor defined in header vs cpp
* explicit vs implicit definition of virtual weak destructor in child class (e.g. `virtual ~dCcD_Cyl() {}`) (only has an effect WITHOUT nosyminline) (also applies to non-virtual destructors if the class has other virtual functions?)