More notes

Spencer Tipping 2017-03-12 22:17:48 -06:00
parent bcce85af27
commit a01f60a266
1 changed files with 31 additions and 1 deletions

@ -107,6 +107,17 @@ $
![image](http://storage2.static.itmages.com/i/17/0308/h_1488996910_5153802_e6927d8be0.jpeg)

### Performance analysis
**In the real world, JIT is absolutely the wrong move for this problem.**

Array languages like APL and Matlab, and to a large extent Perl, Python, etc.,
manage to achieve reasonable performance by having interpreter operations that
each apply to a large number of data elements at a time. We've got exactly that
situation here: in the real world it's a lot more practical to vectorize the
operations so that each one applies to a screen's worth of data at once -- then
we'd have nice options like offloading stuff to a GPU, etc.

However, since the point here is to compile stuff, on we go.
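
As an aside before moving on, here is a rough sketch of what the vectorized style mentioned above could look like in C. Everything in it is hypothetical and not taken from the interpreter in this document: the `vec_*` helpers stand in for "interpreter operations", and `mandelbrot_step` applies one `z <- z^2 + c` update to `n` points at once, so any per-operation dispatch cost would be paid once per array rather than once per pixel.

```c
#include <stddef.h>

/* Hypothetical whole-array "interpreter operations": each call handles n
   elements, so per-operation overhead is amortized across the array. */
static void vec_mul(double *dst, const double *a, const double *b, size_t n) {
  for (size_t i = 0; i < n; ++i) dst[i] = a[i] * b[i];
}

static void vec_add(double *dst, const double *a, const double *b, size_t n) {
  for (size_t i = 0; i < n; ++i) dst[i] = a[i] + b[i];
}

static void vec_sub(double *dst, const double *a, const double *b, size_t n) {
  for (size_t i = 0; i < n; ++i) dst[i] = a[i] - b[i];
}

/* One Mandelbrot iteration z <- z^2 + c for n points at a time.
   zr/zi hold the real/imaginary parts of z, cr/ci those of c;
   t1 and t2 are caller-provided scratch arrays of length n. */
void mandelbrot_step(double *zr, double *zi,
                     const double *cr, const double *ci,
                     double *t1, double *t2, size_t n) {
  vec_mul(t1, zr, zr, n);   /* t1 = zr^2             */
  vec_mul(t2, zi, zi, n);   /* t2 = zi^2             */
  vec_sub(t1, t1, t2, n);   /* t1 = zr^2 - zi^2      */
  vec_mul(t2, zr, zi, n);   /* t2 = zr * zi          */
  vec_add(t2, t2, t2, n);   /* t2 = 2 * zr * zi      */
  vec_add(zr, t1, cr, n);   /* zr = zr^2 - zi^2 + cr */
  vec_add(zi, t2, ci, n);   /* zi = 2 * zr * zi + ci */
}
```

Each helper is a plain loop over contiguous doubles, which is the kind of code a compiler can auto-vectorize or that could be moved to a GPU; that is the practical appeal of this design over per-pixel dispatch.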

JIT can basically eliminate the interpreter overhead, which we can easily model
here by replacing `interpret()` with a hard-coded Mandelbrot calculation. This
will provide an upper bound on realistic JIT performance, since we're unlikely
@ -164,7 +175,7 @@ sys 0m0.000s
$
```
### JIT design and the x86-64 calling convention
The basic strategy is to replace `interpret(registers, code)` with a function
`compile(code)` that returns a pointer to a function whose signature is this:
`void compiled(registers*)`. The memory for the function needs to be allocated
@ -198,7 +209,26 @@ interpret:
  movq %rdi, -40(%rbp)          // registers arg -> local var
  movq %rsi, -48(%rbp)          // code arg -> local var
  jmp for_loop_condition        // commence loopage
```
Before getting to the rest, I wanted to call out the `%rsi` and `%rdi` stuff
and explain a bit about how calls work on x86-64. `%rsi` and `%rdi` seem
arbitrary, which they are to some extent -- C obeys a platform-specific calling
convention that specifies how arguments get passed in. On x86-64 Linux (the
System V ABI), the first six integer or pointer arguments come in as registers
(`%rdi`, `%rsi`, `%rdx`, `%rcx`, `%r8`, `%r9`, in that order); after that they
get pushed onto the stack. If you're returning an integer or pointer value, it
goes into `%rax`.

The return address is automatically pushed onto the stack by `call`
instructions like `e8 <32-bit relative>`. So internally, `call` is the same as
`push ADDRESS; jmp <target>; ADDRESS: ...`, where `<target>` is the function
being called. `ret` is the same as `pop %rip`, except that you can't pop into
`%rip`. This means that when a function starts running, the return address is
the value at the top of the stack, at `(%rsp)`.
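
To make the `call`/`ret` equivalence above concrete, here is a toy model in C. It is only a sketch, not real hardware behavior: `toy_cpu` and the `toy_*` functions are made-up names, and the "stack" is a small array indexed by a fake `rsp` that grows downward.

```c
#include <stdint.h>

/* Toy machine state: an instruction pointer, a stack pointer, and a stack.
   rsp is an index into stack[]; like x86, the stack grows downward. */
typedef struct {
  uint64_t rip;
  uint64_t rsp;
  uint64_t stack[16];
} toy_cpu;

void toy_init(toy_cpu *cpu, uint64_t entry) {
  cpu->rip = entry;
  cpu->rsp = 16;                      /* empty stack: one past the end */
}

static void toy_push(toy_cpu *cpu, uint64_t value) {
  cpu->stack[--cpu->rsp] = value;
}

static uint64_t toy_pop(toy_cpu *cpu) {
  return cpu->stack[cpu->rsp++];
}

/* call <target>: push the address of the instruction after the call,
   then jump. insn_len is the encoded length of the call instruction
   (5 bytes for e8 + a 32-bit relative offset). */
void toy_call(toy_cpu *cpu, uint64_t target, uint64_t insn_len) {
  toy_push(cpu, cpu->rip + insn_len);
  cpu->rip = target;
}

/* ret: pop the return address straight into the instruction pointer. */
void toy_ret(toy_cpu *cpu) {
  cpu->rip = toy_pop(cpu);
}
```

Right after `toy_call`, `stack[rsp]` holds the return address, which mirrors the point that the return address sits at the top of the stack when the callee starts running.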
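
Part of the calling convention also requires callees to preserve a handful of
registers (`%rbx`, `%rbp`, and `%r12` through `%r15`), and compiled C
conventionally sets `%rbp` to a copy of `%rsp` in the function prologue, but
our JIT can mostly ignore this stuff, as long as it leaves those registers
alone, because it doesn't call back into C.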
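
Putting the convention together, here is a minimal sketch of a hand-assembled function with the `void compiled(registers*)` shape. It rests on several assumptions that are not from this document: `long *` stands in for the real `registers` type, `make_increment` is a hypothetical name, the target is x86-64 Linux, and the memory is obtained with `mmap` and made executable with `mprotect`. The generated code reads its argument from `%rdi`, touches no callee-saved registers, and returns with a bare `ret`.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#if defined(__x86_64__) && defined(__linux__)
#include <sys/mman.h>
#endif

/* Stand-in for the document's compiled(registers*) signature. */
typedef void (*compiled_fn)(long *);

/* Builds a tiny function that increments the long its argument points to.
   Returns NULL on unsupported platforms or if allocation fails. */
compiled_fn make_increment(void) {
#if defined(__x86_64__) && defined(__linux__)
  static const unsigned char code[] = {
    0x48, 0x83, 0x07, 0x01,   /* addq $1, (%rdi)  -- first arg is in %rdi */
    0xc3                      /* ret              -- pops the return address */
  };
  const size_t size = 4096;

  /* Allocate a writable page, copy the code in, then flip it to
     read+execute so the page is never writable and executable at once. */
  void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) return NULL;
  memcpy(mem, code, sizeof code);
  if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
    munmap(mem, size);
    return NULL;
  }
  return (compiled_fn)mem;    /* the page is intentionally never freed here */
#else
  return NULL;
#endif
}
```

A caller would do something like `compiled_fn f = make_increment(); if (f) f(regs);`. Because the generated function is a leaf that only uses `%rdi` and memory, it needs no prologue, no frame pointer, and no register saves.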
```s
for_loop_body:
  // (a bunch of stuff to set up *src and *dst)