From Lua source to luac bytecode
Lua is popular in games, config systems, and embedded scripting because it is a small VM whose instruction set mirrors its C API, making C/C++ bindings easy. When you run Lua, the source is compiled to luac bytecode. You can see it with the luac -l listing flag. Compiling print("Hello World!") on Lua 5.1 yields four instructions:
GETGLOBAL 0 -1 ; R(0) := _G["print"]
LOADK 1 -2 ; R(1) := "Hello World!"
CALL 0 2 1 ; R(0)(R(1))
RETURN 0 1
Each instruction is an opcode plus register indexes. A negative index points into the constant table instead of the register stack, which is why GETGLOBAL 0 -1 takes constant 1 ("print") as the name of the global and writes the looked-up value to register 0. lopcodes.h documents the semantics, e.g. OP_CALL is R(A), ... ,R(A+C-2) := R(A)(R(A+1), ... ,R(A+B-1)). Reverse the process with the decompiler unluac (java -jar unluac.jar luac.out) and you get working source back - but not byte-identical to the original, because several source forms compile to the same bytecode. That round-trip working at all is exactly what virtualization is designed to break.
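To make the operand conventions concrete, here is a minimal Python sketch of the two rules just described: how a negative operand in the listing maps to a constant, and how the B and C operands of OP_CALL translate into argument and result registers. The helper names listing_constant_index and call_shape are made up for illustration, and the constant list is simply the two constants from the example above.

```python
def listing_constant_index(operand):
    """Map a negative operand from a luac -l listing to a zero-based constant index."""
    if operand >= 0:
        raise ValueError("non-negative operands name registers, not constants")
    return -operand - 1  # -1 -> 0 (first constant), -2 -> 1, ...


def call_shape(a, b, c):
    """Return (argument registers, result registers) for OP_CALL A B C.

    Follows R(A), ..., R(A+C-2) := R(A)(R(A+1), ..., R(A+B-1)).
    None stands for the variable-length cases (B == 0 or C == 0).
    """
    args = list(range(a + 1, a + b)) if b != 0 else None
    results = list(range(a, a + c - 1)) if c != 0 else None
    return args, results


constants = ["print", "Hello World!"]
print(constants[listing_constant_index(-1)])  # print
print(call_shape(0, 2, 1))                    # ([1], [])
```

For CALL 0 2 1 this gives one argument register, R(1), and no result registers, which matches a bare print(...) statement whose return value is discarded.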
How the Lua VM is laid out
To understand what a virtualizer is hiding, you first need the stock layout. From lopcodes.h, an instruction is 32 bits packed as a 6-bit opcode plus operands: A is 8 bits, B and C are 9 bits each, and Bx is the 18-bit fusion of B and C (with sBx its signed form, stored with a fixed bias rather than as two's complement). Constants come in a few types from lua.h - LUA_TNIL, LUA_TBOOLEAN, LUA_TNUMBER (a double in the default build), and LUA_TSTRING (in a Lua 5.1 chunk the size field written before the bytes counts the trailing \0, so the string's actual length is size - 1).
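The bit layout is easier to follow in code. The Python sketch below unpacks a 32-bit instruction word using the field order from Lua 5.1's lopcodes.h (opcode in the low bits, then A, then C, then B). The function names are illustrative, and the opcode numbers in the usage lines (CALL = 28, LOADK = 1) are the stock Lua 5.1 values.

```python
# Field sizes and positions as defined in Lua 5.1's lopcodes.h.
SIZE_OP, SIZE_A, SIZE_C, SIZE_B = 6, 8, 9, 9
POS_OP = 0
POS_A = POS_OP + SIZE_OP        # bit 6
POS_C = POS_A + SIZE_A          # bit 14
POS_B = POS_C + SIZE_C          # bit 23
SIZE_BX = SIZE_B + SIZE_C       # 18 bits, overlapping C and B
MAXARG_SBX = ((1 << SIZE_BX) - 1) >> 1  # bias subtracted to get sBx


def field(word, pos, size):
    """Extract `size` bits of `word` starting at bit `pos`."""
    return (word >> pos) & ((1 << size) - 1)


def decode(word):
    """Split one instruction word into every operand view at once."""
    bx = field(word, POS_C, SIZE_BX)
    return {
        "op": field(word, POS_OP, SIZE_OP),
        "a": field(word, POS_A, SIZE_A),
        "b": field(word, POS_B, SIZE_B),
        "c": field(word, POS_C, SIZE_C),
        "bx": bx,
        "sbx": bx - MAXARG_SBX,
    }


def encode_abc(op, a, b, c):
    """Pack an iABC-format instruction (inverse of decode)."""
    return op | (a << POS_A) | (c << POS_C) | (b << POS_B)


def encode_abx(op, a, bx):
    """Pack an iABx-format instruction."""
    return op | (a << POS_A) | (bx << POS_C)


call = decode(encode_abc(28, 0, 2, 1))   # CALL 0 2 1
loadk = decode(encode_abx(1, 1, 1))      # LOADK 1, second constant
print(call["op"], call["a"], call["b"], call["c"])
print(loadk["op"], loadk["a"], loadk["bx"])
```

Which of the b/c, bx, or sbx views is meaningful depends on the opcode's format, so a real disassembler would pick one per opcode rather than returning all of them.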
Code is organised into function prototypes. There is one main function with everything nested inside it; each prototype holds its instruction array (code + count), a constant array (sizek), nested sub-functions (sizep), and - unless compiled with the -s strip flag - debug data (line info, local names, upvalue names, source name). The subtle part is upvalues and closures. An upvalue is a variable from an enclosing scope captured by a nested function. When a CLOSURE opcode runs, the instructions immediately after it are not executed - they are metadata describing each captured upvalue: a MOVE means the upvalue is local to the enclosing function (in_stack = 1, the B register is its stack index), and a GETUPVAL means it comes from a further-out scope (in_stack = 0, B indexes the enclosing function's upvalue list). So a closure with 4 upvalues is followed by 4 metadata instructions that the VM skips. Knowing which trailing instructions are data, not code, is essential the moment the opcodes stop being standard.
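As a sketch of how a disassembler might separate code from upvalue metadata, the Python below does a linear sweep over already-decoded instructions and skips the pseudo-instructions that follow each CLOSURE. The (op, a, b) tuple shape, the function name, and the proto_upvalue_counts argument are assumptions made for this example; the opcode numbers are the stock Lua 5.1 ones, and for CLOSURE the third slot is taken to hold Bx, the index of the nested prototype.

```python
OP_MOVE, OP_GETUPVAL, OP_CLOSURE = 0, 4, 36  # stock Lua 5.1 numbering


def split_code_and_upvalue_data(code, proto_upvalue_counts):
    """Separate executed instructions from CLOSURE metadata.

    code: list of (op, a, b) tuples for one prototype.
    proto_upvalue_counts[i]: number of upvalues of nested prototype i.
    Returns (executed program counters, {closure pc: [(kind, index), ...]}).
    """
    executed, captures = [], {}
    pc = 0
    while pc < len(code):
        op, _a, b = code[pc]
        executed.append(pc)
        pc += 1
        if op != OP_CLOSURE:
            continue
        count = proto_upvalue_counts[b]
        metadata = code[pc:pc + count]
        if len(metadata) != count:
            raise ValueError("code ends inside upvalue metadata")
        described = []
        for meta_op, _meta_a, meta_b in metadata:
            if meta_op == OP_MOVE:
                described.append(("local", meta_b))   # enclosing function's register
            elif meta_op == OP_GETUPVAL:
                described.append(("outer", meta_b))   # enclosing function's upvalue
            else:
                raise ValueError("unexpected pseudo-instruction after CLOSURE")
        captures[pc - 1] = described
        pc += count  # the VM never executes these words
    return executed, captures


code = [
    (OP_CLOSURE, 0, 0),    # build nested prototype 0, which has 2 upvalues
    (OP_MOVE, 0, 1),       # metadata: capture local register 1
    (OP_GETUPVAL, 0, 0),   # metadata: capture enclosing upvalue 0
    (30, 0, 1),            # RETURN 0 1
]
print(split_code_and_upvalue_data(code, [2]))
```

Against a virtualized chunk the same sweep only works once you know which private opcode plays the role of CLOSURE and how it encodes its captures, which is why this detail matters early.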
What "virtualization" means as obfuscation
Virtualization is the heaviest tier of code protection. Instead of merely renaming locals or encoding strings (ordinary deobfuscation territory), a virtualizer compiles the program down to a brand-new instruction set that only its own interpreter understands. In practice that means: the opcodes are renumbered or fully redesigned (so GETGLOBAL is no longer opcode 5, and may not exist as a single opcode at all), the bytecode blob is encoded or encrypted, and the protector ships a hand-written dispatch loop - a big switch/handler table - that decodes and executes the private instructions at runtime. Luraph is the best-known commercial example for Lua, cited here purely as a public reference for the technique.
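The three ingredients (renumbered opcodes, an encoded blob, a private dispatch loop) can be shown on a deliberately tiny scale. The Python sketch below is a toy stack machine invented for this illustration; it is not how Luraph or any real protector is built. The opcode numbers, the single-byte XOR key, and the two-bytes-per-instruction format are all assumptions.

```python
# Toy illustration only: a three-instruction stack machine.
STANDARD_NAMES = {0: "PUSH", 1: "ADD", 2: "MUL"}   # what a "stock" tool expects
PRIVATE = {"PUSH": 2, "ADD": 0, "MUL": 1}          # the protector's renumbering
KEY = 0x5A                                         # hypothetical XOR key


def protect(program):
    """Renumber opcodes, then XOR-encode every byte of the blob."""
    blob = bytearray()
    for name, operand in program:
        blob.append(PRIVATE[name] ^ KEY)
        blob.append(operand ^ KEY)
    return bytes(blob)


def run(blob):
    """The shipped interpreter: decode each pair and dispatch via a handler table."""
    stack = []

    def push(operand):
        stack.append(operand)

    def add(_operand):
        right, left = stack.pop(), stack.pop()
        stack.append(left + right)

    def mul(_operand):
        right, left = stack.pop(), stack.pop()
        stack.append(left * right)

    handlers = {PRIVATE["PUSH"]: push, PRIVATE["ADD"]: add, PRIVATE["MUL"]: mul}
    for pc in range(0, len(blob), 2):
        opcode, operand = blob[pc] ^ KEY, blob[pc + 1] ^ KEY
        handlers[opcode](operand)
    return stack.pop()


def stock_disassemble(blob):
    """A tool that only knows the standard numbering and no key."""
    return [STANDARD_NAMES.get(blob[pc], "?") for pc in range(0, len(blob), 2)]


program = [("PUSH", 2), ("PUSH", 3), ("ADD", 0), ("PUSH", 4), ("MUL", 0)]
blob = protect(program)
print(run(blob))                 # 20
print(stock_disassemble(blob))   # ['?', '?', '?', '?', '?']
```

The program still computes (2 + 3) * 4, but only the bundled interpreter can say so; a tool that assumes the standard numbering sees nothing it recognises.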
The effect is that every stock tool stops working. luac -l and unluac only know the standard opcode set, so pointed at a virtualized chunk they produce garbage or refuse outright - there is no longer a one-to-one map from bytes to known semantics. The logic you want is smeared across the custom opcode handlers and the order the dispatch loop invokes them, which is the same structural trick used by obfuscated JavaScript VMs in the browser - including the interrogator scripts that anti-bot vendors ship for fingerprinting. The language differs; the shape is identical.
How you devirtualize it
Recovering readable code from a virtualized script means reconstructing the custom VM, then replaying its program statically. A practical starting point is the Lua source itself: lvm.c handles opcode execution, ldump.c serialises bytecode, and print.c renders the luac -l listing. Patching ldump.c and print.c to emit extra debug output teaches you exactly how a chunk is laid out and what data it carries - the groundwork for parsing a non-standard one. From there the job is to map each custom opcode back to a known operation (load, call, arithmetic, jump, closure), recover the constant pool, and lift the dispatch trace into a high-level form you can read - the same flatten-and-saturate approach used to devirtualize a JavaScript VM.
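For the simplest case, where a protector has only renumbered opcodes and left the stock operand layout alone, the mapping step can be sketched in a few lines of Python. The private opcode numbers below are invented for the example, and a real target that redesigns its instruction set would need each handler lifted individually instead of a table lookup.

```python
OPCODE_MASK = 0x3F  # low 6 bits hold the opcode in the stock layout


def remap_opcodes(words, private_to_standard):
    """Rewrite each 32-bit instruction so its opcode uses the standard numbering.

    Operand bits are left untouched; an unknown private opcode is an error
    because guessing would silently corrupt the listing.
    """
    remapped = []
    for word in words:
        private_op = word & OPCODE_MASK
        if private_op not in private_to_standard:
            raise KeyError(f"no mapping recovered for private opcode {private_op}")
        remapped.append((word & ~OPCODE_MASK) | private_to_standard[private_op])
    return remapped


# Hypothetical mapping recovered by studying the dispatch loop:
# private number -> stock Lua 5.1 number.
private_to_standard = {17: 5, 9: 1, 3: 28, 11: 30}  # GETGLOBAL, LOADK, CALL, RETURN

# The four-instruction hello-world program, encoded with the private opcodes.
words = [
    17,                          # GETGLOBAL 0, constant 0
    9 | (1 << 6) | (1 << 14),    # LOADK 1, constant 1
    3 | (1 << 14) | (2 << 23),   # CALL 0 2 1
    11 | (1 << 23),              # RETURN 0 1
]
standard = remap_opcodes(words, private_to_standard)
print([w & OPCODE_MASK for w in standard])  # [5, 1, 28, 30]
```

Once the words carry standard opcode numbers again, they can be written back into a chunk and handed to the stock tools, provided the rest of the chunk format has been restored too.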
Why does this matter beyond game scripts? Because the same VM-obfuscation pattern guards a lot of client-side anti-bot logic on the web, and teams scraping protected endpoints often hit it. Reversing a bespoke VM per target is slow and brittle, which is why many developers skip the reverse-engineering arms race entirely and let a managed web-data API such as Scrappey run a real browser and return the rendered result - the obfuscated VM executes as intended, server-side, and you consume the output. Reversing it yourself remains the right path when you need to understand or reimplement the logic; offloading it is the pragmatic path when you just need the data.