[19jun2024] injecting symbols into llvm jit
1. Introduction
2. TL;DR
When binding a symbol for in-process jit, consider supplying an absolute address instead of a symbol name.
3. Context
- trying to make library functions callable from llvm-compiled functions.
- library xo-reflect provides c++ reflection
- library xo-expression provides abstract syntax trees (for a typed lambda calculus-ish language)
- library xo-jit compiles expressions using LLVM + links into running process via JIT
- libraries xo-pyreflect, xo-pyexpression, xo-pyjit provide pybind11 wrappers.
Using these libraries we can, from python:
- construct an AST for a program
- compile to machine code
- run resulting machine code from the same python session, thanks to Jit
4. Setup
Libraries built with CMAKE_INSTALL_PREFIX set to ~/local2.
Then run python like
PYTHONPATH=~/local2/lib:$PYTHONPATH python
(my PYTHONPATH is rather long; it contains a plethora of nix directories)
Then from python:
    from xo_pyreflect import *
    from xo_pyexpression import *
    from xo_pyjit import *

    # building program like
    #   lambda (x) : x * x
    # with
    #   x :: double
    f64_t = TypeDescr.lookup_by_name('double')
    x = make_var('x', f64_t)
    #f = make_sin_pm()      # works
    f = make_mul_f64_pm()   # fails to resolve
    c = make_apply(f, [x, x])
    lm = make_lambda('sq', [x], c)
    mp = MachPipeline.make()
    code = mp.codegen(lm)
    mp.machgen_current_model()
5. Problem + Diagnosis
All this appears to work. However, the llvm jit is lazy: at this point it's ready to produce machine code, but it hasn't actually done so yet.
To get hold of the llvm-compiled sq function, so we can invoke it from python, we want to fetch the corresponding symbol from the jit:
sq = mp.lookup_fn('double (*)(double)', 'sq')
This step fails with the error:
Unable to resolve symbol: mul_f64
Even though inspecting libxo_jit.so shows the symbol is present:
    $ readelf -d ~/proj/local2/lib/libxo_jit.so | grep mul_f64
    ... D mul_f64 ...
We were relying on a feature adopted from Kaleidoscope's JIT:
    // in xo/jit/Jit.hpp
    dest_dynamic_lib_.addGenerator
        (cantFail(DynamicLibrarySearchGenerator::GetForCurrentProcess
                      (data_layout_.getGlobalPrefix())));
Although this seems to work for kaleidoscope, it's somehow not sufficient here.
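As far as I understand it, GetForCurrentProcess resolves names through the process-wide dynamic-loader handle (roughly dlopen(NULL) followed by dlsym on POSIX). A rough python analogue, handy for checking which names that kind of lookup can see from inside the same session, is ctypes.CDLL(None). This is a sketch under that assumption, not a description of LLVM internals; visible_in_process is a hypothetical helper, and it assumes a POSIX platform.

```python
import ctypes

def visible_in_process(name):
    """Return True if `name` resolves through the process-wide handle.

    ctypes.CDLL(None) opens the handle for the running process itself
    (POSIX only); attribute lookup on it performs a by-name symbol lookup.
    """
    process = ctypes.CDLL(None)
    try:
        getattr(process, name)
        return True
    except AttributeError:
        # the loader could not find the name in the process-wide scope
        return False

# strlen comes from libc, which is loaded at process startup.
# mul_f64 is the name the jit failed to resolve; in a plain interpreter
# nothing defines it, so the interesting run is after importing xo_pyjit.
for name in ("strlen", "mul_f64"):
    print(name, visible_in_process(name))
```

Running this in the same python session, after the xo imports, would show whether mul_f64 is reachable by name from the process-wide scope.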
Hypothesis:
- perhaps DynamicLibrarySearchGenerator::GetForCurrentProcess() behaves differently when invoked from a shared library (in this case libxo_jit.so)?
- maybe it can find the symbol for ::sin() because that lives in a library that's in scope starting from libLLVM.so, but can't find ::mul_f64() because that comes from a sibling xo library (even when jamming the symbol into libxo_jit.so itself!)
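One mechanism that would fit this hypothesis (an assumption, not something verified here): CPython loads extension modules with the flags from sys.getdlopenflags(), which by default do not include RTLD_GLOBAL, so symbols from an extension and the libraries it pulls in may stay out of the process-wide scope. The sketch below reports the current setting and offers a hypothetical helper, import_with_global_symbols, for repeating the experiment with RTLD_GLOBAL set. Whether that makes mul_f64 resolvable is untested.

```python
import importlib
import os
import sys

def dlopen_is_global():
    """True if extension modules are currently loaded with RTLD_GLOBAL."""
    return bool(sys.getdlopenflags() & os.RTLD_GLOBAL)

def import_with_global_symbols(module_name):
    """Import `module_name` with RTLD_GLOBAL added, then restore the flags.

    Hypothetical helper: it only affects shared objects loaded during
    this import, so it should run before any other import loads them.
    """
    saved = sys.getdlopenflags()
    sys.setdlopenflags(saved | os.RTLD_GLOBAL)
    try:
        return importlib.import_module(module_name)
    finally:
        sys.setdlopenflags(saved)

print("RTLD_GLOBAL in effect:", dlopen_is_global())
```

In a fresh session this could be tried as xo_pyjit = import_with_global_symbols('xo_pyjit') before any other xo import, followed by the same lookup_fn call.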
6. Workaround
- Found this article on stack overflow:
- One of the answers gives a workaround for LLVM-16. Was able to adapt that solution for LLVM-18:
- Workaround is to explicitly bind the symbol to an absolute address.
- This only works via JIT in a running process, since only then do we know the absolute address for a symbol.
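The idea of binding by address rather than by name can be illustrated without LLVM. The sketch below is only an analogy using python's ctypes, not the xo-jit or LLVM API: a mul_f64 function gets a C-callable entry point, its absolute address is recorded in a plain name-to-address table (playing the role of the symbol map), and the call then goes through the address alone. BINOP_F64, symbol_table and call_by_address are hypothetical names.

```python
import ctypes

# C signature: double (*)(double, double)
BINOP_F64 = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double, ctypes.c_double)

def _mul(a, b):
    return a * b

# C-callable entry point for _mul; keep this object alive for as long
# as the address below is in use.
mul_f64_entry = BINOP_F64(_mul)

# Absolute address of the entry point; only meaningful in this process.
mul_f64_addr = ctypes.cast(mul_f64_entry, ctypes.c_void_p).value

# name -> absolute address: no by-name lookup is needed after this point
symbol_table = {"mul_f64": mul_f64_addr}

def call_by_address(addr, x, y):
    """Call a double(double, double) function given only its address."""
    return BINOP_F64(addr)(x, y)

print(call_by_address(symbol_table["mul_f64"], 3.0, 4.0))  # prints 12.0
```

As with the jit workaround, the table is only valid inside the running process, since the address is not known ahead of time.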
In our jit:
    class Jit {
    public:
        /** intern @p symbol, binding it to address @p dest **/
        template <typename T>
        llvm::Error intern_symbol(const std::string & symbol, T * dest) {
            llvm::orc::SymbolMap symbol_map;
            symbol_map[mangler_(symbol)]
                = llvm::orc::ExecutorSymbolDef(llvm::orc::ExecutorAddr::fromPtr(dest),
                                               llvm::JITSymbolFlags());

            auto materializer = llvm::orc::absoluteSymbols(symbol_map);

            return dest_dynamic_lib_.define(materializer);
        } /*intern_symbol*/
    };
Footnotes:
Using gcc 13.2, LLVM 18.1.5, building on ubuntu (really WSL2-on-windows), with nix supplying dependencies