From ce2c78a3081a2c108603794bc3c6df7413e76b36 Mon Sep 17 00:00:00 2001 From: Aetias Date: Sun, 13 Oct 2024 11:18:14 +0200 Subject: [PATCH] Update docs --- CONTRIBUTING.md | 108 ++++-------------------------- INSTALL.md | 43 ++++-------- README.md | 4 +- docs/build_system.md | 152 ++++++++----------------------------------- docs/decompiling.md | 11 ++-- extract/README.md | 2 + 6 files changed, 64 insertions(+), 256 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 883982ae..7e671659 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -5,111 +5,31 @@ - [Creating new `.c`/`.cpp` files](#creating-new-ccpp-files) ## Project structure -- `asm/`: Non-decompiled assembly code - - `ovXX/`: Code for overlay `XX` - - `*.s`: Source file in assembly - - `*.inc`: External symbols imported by respective source file - `build/`: Build output - - `arm9_linker_script.lcf`: Linker command file for ARM9 program, specifies the order to put code and data into the ROM - - `arm9_objects.txt`: List of object files to pass to the linker - - `eur/`: Compiled/linked files - - `asm/`: Built assembly code - - `src/`: Built C/C++ code - - `overlays/`: Contains `.bin` and `.lz` files for each overlay - - `*.bin`: Linked code/data to compress or put in the ROM - - `*.lz`: Compressed code to put in the ROM - - `main.bin.xMAP`: Map file listing RAM addresses for all symbols + - `eur|usa/`: Target version + - `build/`: Linked ROM objects + - `delinks/`: Objects delinked from the base ROM + - `libs|src/`: Built C/C++ code + - `arm9.o`: Linked ELF object + - `arm9.o.xMAP`: Map file listing memory addresses for all symbols +- `config/`: [`dsd`](https://github.com/AetiasHax/ds-decomp) configuration files - `docs/`: Documentation about the game +- `extract/`: Game assets, extracted from your own supplied ROM + - `eur|usa/`: [`ds-rom`](https://github.com/AetiasHax/ds-rom) extract directories - `include/`: Include files -- `ph_eur/`: Game assets, extracted from your own supplied ROM - - 
`assets/`: Unmodified assets - - `banner/`: Banner logo and text that shows on the DS home menu - - `arm7.bin`: Extracted ARM7 program - - `arm9_ovdata.bin`: Data about ARM9 overlays - `src/`: Source C/C++ files - `tools/`: Tools for this project - - `compress/`: Compresses code before it is put in the ROM - - `include/`: Common C code for multiple tools - `mwccarm/`: Compiler toolchain - - `rom/`: Extracts and builds ROMs - - `gen_externs.py`: Generates `.inc` files, use `make gen_externs` to run it - - `lcf.py`: Generates `arm9_linker_script.lcf` - - `m2ctx.py`: Generates context for decomp.me - - `patch_mwcc.py`: Patches bugs in the toolchain - - `progress.py`: Computes decompilation progress + - `configure.py`: Generates `build.ninja` + - `m2ctx.py`: Generates context for [decomp.me](https://decomp.me/) + - `mangle.py`: Shows mangled symbol names in a given C/C++ file - `requirements.txt`: Python libraries - `setup.py`: Sets up the project -- `assets.txt`: The order of asset directories to put in the ROM - `*.sha1`: SHA-1 digests of different versions of the game ## Decompiling See [/docs/decompiling.md](/docs/decompiling.md). -## Creating new `.c`/`.cpp` files -New source files must be added to the LCF (Linker Command File). This is done via `lcf.py`, which generates the LCF when -building. - -In `lcf.py`, you will see a list of overlays near the top. Each overlay then has a list of source files ending in `.s`, `.c` or -`.cpp`. Those source files, when compiled, are appended to the ROM in the order that they appear in the list. - -So, to create a new source file, you put the path to the source file in the correct overlay so that it appears in the correct -order in relation to other source files. - ## Code style -The code style is not strict, but please try to mimic the existing style as much as possible. - -If it's impossible to match a function while following the code style, then it's OK to not follow it. 
But do let us know when -this happens so we may amend the code style. - -Below is an example of the code style in this project. If something is unclear, look at existing code. If the existing code is -insufficient, then you may decide the code style in that situation. -```cpp -// Space before pointer asterisk * and reference ampersand & -s32 MyClass::MyMethod(MyStruct *myStruct, s32 &anInteger) { - // Opening brace { on the same line - // Space after `if`, `while`, `for` and `switch` - if (myStruct->isCool) { - // Class member fields are prefixed with "m" - mInteger = anInteger; - } - // No space before asterisk * in pointer casts - // Space after cast operator - mPointer = (u32*) &anInteger; - - // Prefer pre-increment ++i - // Use s32, s16, s8, etc. instead of int, short, char - for (s32 i = 0; i < 10; ++i) { - // Use `char` instead of s8 to indicate actual characters - char ch = 'A' + i * 2; - mString[i] = ch; - } - - // Put long conditions on new line - if ( - // Add clarifying parentheses for bool operators - (mInteger > 10 && mPointer != NULL) || - (mInteger < 5) - ) { - // Add clarifying parentheses for bitwise operators - mBool = ((mInteger >> 5) & 1) != 0; - } - - do { - // Call member functions using `this` - this->DoStuff(); - // In do-while loops, `while` on same line as closing brace } - } while (this->CanDoStuff()); - - switch (mInteger) { - // Indent `case` - // If possible, put braces after `case` - case 8: { - return *mPointer; - // If possible, put `break` after closing brace } - } break; - } - - // No parentheses around return value - return mInteger; -} -``` +This project has a `.clang-format` file and all C/C++ files in this project should follow it. We recommend using an editor +compatible with `clang-format` to format the code as you save. diff --git a/INSTALL.md b/INSTALL.md index 7e15557d..8562a45d 100644 --- a/INSTALL.md +++ b/INSTALL.md @@ -10,14 +10,13 @@ Contents: ## Prerequisites 1. 
Use one of these platforms: - - Windows (MSYS) - - Linux via WSL + - Windows (recommended) - Linux 2. Install the following: - Python 3.11+ and pip - GCC 9+ - - Make - - **On Linux/WSL**: Wine/Wibo + - Ninja + - **On Linux**: Wine/Wibo 3. Install the Python dependencies: ```shell python -m pip install -r tools/requirements.txt @@ -26,38 +25,22 @@ python -m pip install -r tools/requirements.txt ```shell python tools/setup.py ``` +5. Run the Ninja configure script: +```shell +python tools/configure.py +``` + +> [!IMPORTANT] +> Rerun `configure.py` often to ensure that all C/C++ code gets compiled. > [!NOTE] -> For Linux users: If you plan to use Wibo instead of Wine, run make with `make WINE= ...`. - -## Build the ROM - -This repository does not include any of the game's assets, and you will need an original decrypted base ROM. -Put the base ROM in the root directory of this repository. Please verify that your dumped ROM matches one of the versions -below: - -| Version | File name | SHA1 | -| ------- | ----------------- | ------------------------------------------ | -| EUR | `baserom_eur.nds` | `02be55db55cf254bd064d2b3eb368b92a5b4156d` | -| USA | `baserom_usa.nds` | `4c8f52dd719918bbcd46e73a8bae8628139c1b85` | - -Run `make extract` to extract from all the base ROMs you've provided. You only need to do this once. - -Once you have extracted the base ROM, simply run `make eur` or `make usa` to rebuild it. +> For Linux users: If you plan to use Wibo instead of Wine, run `configure.py` with `-w `. +6. Put one or more base ROMs in the [`/extract/`](/extract/README.md) directory of this repository. ### Matching the base ROM **This is optional!** You only need to follow these steps if you want a matching ROM. -> [!NOTE] -> For interested readers: -> Retail games are usually "encrypted," which means that the first 0x800 bytes of the secure area is encrypted using a -4168-byte key found in the ARM7 BIOS. 
The secure area is 0x4000 bytes long and lives at the start of the ARM9 program at -address 0x2000000. -> This encryption is optional, and games will run just fine without it. In fact, this project doesn't even produce an -encrypted ROM. However, the ROM header includes a checksum of the secure area **after** encryption, so we must calculate it -somehow. - First, [extract the ARM7 BIOS from your DS device](https://wiki.ds-homebrew.com/ds-index/ds-bios-firmware-dump). Put the ARM7 BIOS in the root directory of this repository, and verify that your dumped BIOS matches the one below: @@ -65,4 +48,4 @@ ARM7 BIOS in the root directory of this repository, and verify that your dumped | --------------- | ------------------------------------------ | | `arm7_bios.bin` | `6ee830c7f552c5bf194c20a2c13d5bb44bdb5c03` | -Now, `make` should automatically detect the ARM7 BIOS and will build a matching ROM. +Now, rerun `configure.py` so it can update `build.ninja` to build a matching ROM. diff --git a/README.md b/README.md index c992b7b4..98934c3f 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,8 @@ # The Legend of Zelda: Phantom Hourglass **Work in progress!** This project aims to recreate source code for ***The Legend of Zelda: Phantom Hourglass*** by decompiling -assembly code by hand. **The repository only contains code.** To build the ROM, you must own an existing copy of the game to -extract assets from. +assembly code by hand. **The repository does not contain assets or assembly code.** To build the ROM, you must own an existing +copy of the game to extract assets from. **Note:** The project targets the European and American versions, and other versions might be supported later. diff --git a/docs/build_system.md b/docs/build_system.md index a81d6e4e..8b40d3ab 100644 --- a/docs/build_system.md +++ b/docs/build_system.md @@ -2,30 +2,34 @@ This document describes the build system used for this decompilation project, for those interested to learn about how we build the ROM. 
- [Extracting assets](#extracting-assets) -- [Assembling code](#assembling-code) +- [Delinking code](#delinking-code) - [Compiling code](#compiling-code) -- [Postprocessing ELF files](#postprocessing-elf-files) - [Generating a linker command file](#generating-a-linker-command-file) - [Linking modules](#linking-modules) -- [Compressing modules](#compressing-modules) - [Building the ROM](#building-the-rom) ## Extracting assets -We implemented a tool called [`extractrom`](/tools/rom/extract.c) that extracts assets from a base ROM that you -provide yourself. It extracts the following data: +We use [`ds-rom`](https://github.com/AetiasHax/ds-rom) to extract code and assets from a base ROM that you provide yourself. It +extracts the following data: - ARM7 program - - Code for the DS coprocessor CPU, aka ARM7 + - Code for the DS coprocessor CPU, the ARM7TDMI aka ARM7 - The program is likely similar to other retail games, so it is not decompiled in this project +- ARM9 program + - The main program that runs on game launch + - Also contains the Instruction TCM (ITCM) and Data TCM (DTCM) modules +- ARM9 overlays + - Dynamically loaded modules that overlap each other in memory - Banner - Logo and text that is displayed on the DS home menu -- Assets +- Files/assets - Models, textures, maps, etc. -- Overlay data - - We need the file ID for each overlay, since there is currently no other way to determine the file IDs correctly -## Assembling code -Files in the `/asm/` directory with the `.s` extension is assembly code. These files are grouped into modules, which consists -of overlays, a main module, an Instruction TCM (ITCM) module and a Data TCM (DTCM) module. +## Delinking code +We use [`dsd`](https://github.com/AetiasHax/ds-decomp) as a toolkit for DS decompilation. This includes taking the extracted +code and splitting (delinking) it into smaller files. By editing a `delinks.txt` file, we can tell `dsd` to add more delinked +files to the project.
+ +Each `delinks.txt` file belongs to one module, such as the ARM9 program, the ITCM, the DTCM or an overlay. > [!NOTE] > For interested readers: @@ -43,7 +47,7 @@ of overlays, a main module, an Instruction TCM (ITCM) module and a Data TCM (DTC > memory and has predictable access time unlike typical RAM. However, they are fully static, which means no heap or stack will > live there. So, they are mostly reserved for hot code and data. -The assembly files themselves consist of multiple sections: +Each module and delinked file consists of multiple sections: - `.text`: Functions - `.init`: Static initializers - `.ctor`: List of static initializers @@ -51,7 +55,8 @@ The assembly files themselves consist of multiple sections: - `.data`: Global variables - `.bss`/`.sbss`: Global uninitialized variables -When the code is linked, all code of the same section will be written adjacent to each other. More on this in [Linking modules](#linking-modules) below. +When the code is linked, all code of the same section will be written adjacent to each other. More on this in +[Linking modules](#linking-modules) below. ## Compiling code This game was written in C++, so most of the code we decompile will be in this programming language. In C++, we typically don't @@ -70,8 +75,7 @@ void MyClass::MemberFunction() {} - To our knowledge, there is at most one static initializer per source file. This means that multiple variables can be initialized in one static initializer, if they are in the same source file. - See the example below. Since `foo` is initialized by a constructor and not as plain data, this constructor has to be - called at some point before `foo` can be used. In the case of an overlay, this happens as soon as the overlay has been - loaded. + called at some point before `foo` can be used. For overlays, this happens as soon as the overlay has been loaded.
```cpp class Foo { int myValue; @@ -83,7 +87,7 @@ Foo foo = Foo(42); ``` - `.ctor` - List of static initializers - - Generated automatically as soon as you make a static initializer + - Generated automatically when you create a static initializer - `.rodata` - Global or static constants - Example: @@ -134,22 +138,10 @@ int thisWillBeSbss; #pragma section sbss end ``` -## Postprocessing ELF files -The result of compiling and assembling is an ELF (Executable and Linkable Format) file. We do some postprocessing on these -files to ensure that we can get a matching ROM: -- Killing implicit functions - - Writing a constructor/destructor often generates multiple functions used for different purposes. The game does not always - use each type of ctor/dtor, so some functions must be killed before building the ROM. This is done by writing - `KILL(FunctionToKill)` in any C/C++ file, which is postprocessed by [`elfkill`](/tools/elf/elfkill.cpp) which puts such - functions in a section called `.dead`, instead of `.text`. - ## Generating a linker command file The linker command file (LCF), also known as linker script, tells the linker in which order it should link the compiled or -assembled files. It is generated by [`lcf.py`](/tools/lcf.py), which is also the file where we define our source files. - -In `lcf.py` we can see how the source/assembly files are grouped into modules. These groups are then used to generate the LCF. -You can see the generated LCF in `/build/arm9_linker_script.lcf` after you've built the ROM. +assembled files. It is generated by `dsd` which calculates a correct file order according to the `delinks.txt`. The LCF also decides in what order the sections are linked in each module. In the main module, the order is: @@ -163,106 +155,18 @@ For overlays, `.init` comes after `.rodata`: ---------|-----------|---------|---------|--------|--------|---------
-The ITCM module contains mostly `.text`, but has an unused `.bss` section at the end to pad out the ITCM to exactly 32 kB, -which is exactly the size of the ITCM. +The ITCM only contains `.text` and the DTCM only contains `.data` and `.bss`. -The DTCM module contains only `.data` and `.bss` and is exactly 16 kB, i.e. the size of the DTCM. - -The LCF also decides the file names where each module is written to. Overlays have one file each (`ov00.bin`, `ov01.bin`, etc), -while the main module, ITCM and DTCM are linked to the same file (`arm9.bin`). - -Lastly, the LCF creates extra files that do not come from code: -- `arm9_footer.bin` - - To be appended to the ROM after `arm9.bin`. - - This file contains an offset to some build information in the main module. This information then points to the ITCM and - DTCM modules inside `arm9.bin`. Technically, the TCMs are placed in the main module's `.bss` section, and will be moved - over to the actual ITCM and DTCM when the game boots up. -- `arm9_metadata.bin` - - Contains some data which will be inserted into the main module build information mentioned above. Some of this data is - also needed during the [ROM building step](#building-the-rom), which is why they are placed in this metadata file. -- `arm9_ovt.bin` - - ARM9 overlay table - - This is a segment in the ROM which declares the address space for each overlay. Some data is missing in this table, and - will be completed during the [ROM building step](#building-the-rom). +The LCF generates ROM images for each module into the `/build//build/` directories. These are then passed back into +`ds-rom` to rebuild the ROM. ## Linking modules The LCF and list of compiled/assembled files will be passed to the linker, which generates the files mentioned in the previous section. -## Compressing modules -All ARM9 code is compressed, to save space on the ROM. 
The compression algorithm is a variant of [LZ77](https://en.wikipedia.org/wiki/LZ77_and_LZ78#LZ77) -but compressed backwards, starting from the end of the file and working its way to the start. - -In short, LZ77 works as follows. The file is read back to front, byte for byte. Anytime a new byte is read, the algorithm -searches forward through the file for any sequence of bytes that match the bytes being read. - -If such a sequence exists, and is 3 bytes or longer, the algorithm emits a **length-distance pair**. A length-distance pair -encodes this sequence as 4 bits of length, and 12 bits of distance. The length ranges between 3 and 18, and the distance can be -up to 4095 bytes ahead. - -If no such sequence exists within this 4095 byte window, the algorithm instead emits a **literal**, which is simply one -uncompressed byte. - -Length-distance pairs and literals are collectively called **tokens**. For every 8 tokens, the algorithm emits a flag byte. -In this byte, each of the 8 bits determines if an upcoming token is a literal or a length-distance pair. - -This project implements [`compress`](/tools/compress/main.c), which manages to match this algorithm, including several edge -case improvements to the compressed file. - -For instance, as you approach the start of the file, you may lose a few bytes due to lack of length-distance pairs. In that -case, it's actually better not to compress the start of the file, as it would waste both ROM space and CPU time when -decompressing. - -The code that decompresses the modules is located in the main module. This means that the first 16 kB of the main module is not -compressed. This segment is called the secure area, and includes the entrypoint function and decompression algorithm, among -others. +The linker eliminates some dead code such as unused constructor and destructor variants. 
## Building the ROM -At this stage, we have obtained the following resources to put in the final ROM: -- Extracted: - - ARM7 program - - Banner - - Assets - - Overlay data (file IDs) -- Built: - - ARM9 main module (compressed), including ITCM and DTCM - - ARM9 main footer - - ARM9 metadata - - ARM9 overlay modules (compressed) - - ARM9 overlay table -- Other: - - Assets listing [`assets.txt`](/assets.txt) - - ARM7 BIOS (dumped from your own DS device) - -We implement the [`buildrom`](/tools/rom/build.c) tool which combines these files in order to build a ROM, in such a way that -it can match the original base ROM. - -The procedure is quite long, but here's a summary of the content in the ROM, listed in order of appearance: - - Section | Description -----------------------|------------- -Header | Game ID, region, offsets to other sections, CRC checksums, ARM9/ARM7 entrypoint addresses -ARM9 main module | The full contents of `arm9.lz` -ARM9 main footer | The full contents of `arm9_footer.bin` -ARM9 overlay table | The full contents of `arm9_ovt.bin`, plus file IDs from `extractrom` and overlay file sizes after compression -ARM9 overlay modules | The full contents of `ov00.lz`, `ov01.lz`, etc -ARM7 program | Taken directly from `extractrom` -File name table | Assets file hierarchy, directory/file names, file IDs for each asset file -File allocation table | Maps file ID to an offset within the ROM where the asset file is located -Assets | Taken directly from `extractrom`, prioritized by `assets.txt` - -> [!NOTE] -> For interested readers: -> The ROM file format has been documented online for a very long time, but there are some details that are necessary for -> building a matching ROM that there was no documentation for, until now: -> -> The file name table (FNT) is sorted with special priority rules: -> 1. Directories before files -> 2. Alphabetic, case-insensitive ordering -> 3. 
Shortest name first -> -> The order that assets are written to the ROM is sorted in a different way: -> 1. Traverse directories listed in `assets.txt` from top to bottom -> 2. ASCII ordering, i.e. case-sensitive -> 3. Shortest name first +At this stage, we should have all the resources needed to rebuild the ROM. We use `ds-rom` to build everything according to the +specifications of the base ROM, but using the ROM images that the linker created instead. diff --git a/docs/decompiling.md b/docs/decompiling.md index caf723de..c01ba1cf 100644 --- a/docs/decompiling.md +++ b/docs/decompiling.md @@ -10,21 +10,21 @@ stuck or need assistance. ## Pick a source file See the `decomp` tag in the [issue tracker](https://github.com/AetiasHax/ph/issues?q=is%3Aopen+is%3Aissue+label%3Adecomp) for a list of delinked source files that are ready to be decompiled. This list grows as more source files are delinked from the -rest of the Assembly code. +rest of the base ROM. You can claim a source file by leaving a comment on its issue, so that GitHub allows us to assign you to it. This indicates that you are currently decompiling that source file. If you want to unclaim the file, leave another comment so we can be certain that the source file is available to be claimed -again. Remember to make a pull request of any notable progress you made on the source file, which can include -[non-matching functions](/CONTRIBUTING.md#non-matching-functions). +again. Remember to make a pull request of any progress you made on the source file, whether it is just header files or +partially decompiled code. ## Decompiling a source file We use the object diffing tool [`objdiff`](https://github.com/encounter/objdiff) to track differences between C++ and assembly code. 1. [Download the latest release.](https://github.com/encounter/objdiff/releases/latest) -1. Run `python tools/objdiff.py ` to generate `objdiff.json` in the project root. -1. In `objdiff`, set the project directory to the root of this project.
This will load `objdiff.json`. +1. Run `configure.py` and `ninja` to generate `objdiff.json` in the `/config//arm9/` directories. +1. In `objdiff`, set the project directory to one of the mentioned `arm9/` directories. 1. Select your source file in the left sidebar: ![List of objects in objdiff](images/objdiff_objects.png) 5. See the list of functions and data to decompile: @@ -68,7 +68,6 @@ following: 1. Once you're sent to `decomp.me`, go to "Options" and change the preset to "Phantom Hourglass". 1. Paste your code into the "Source code" tab. 1. Share the link with us! -- In the worst case, add the function as a [non-matching function](/CONTRIBUTING.md#non-matching-functions). ## Decompiling `.init` functions > [!NOTE] diff --git a/extract/README.md b/extract/README.md index f2179138..cb6f93a5 100644 --- a/extract/README.md +++ b/extract/README.md @@ -1,3 +1,5 @@ +This repository does not include any of the game's assets, and you will need an original decrypted base ROM. + Put the base ROM(s) in this directory. Please verify that your dumped ROM matches one of the versions below: | Version | File name | SHA1 |