What Is Lua Bytecode Virtualization (and How the Lua VM Works)?

Lua bytecode virtualization is an obfuscation technique that replaces Lua's standard virtual machine with a custom, secret one, so the compiled script can only be run by an interpreter the protector ships alongside it. Normal Lua compiles source into luac bytecode - a compact instruction set executed by the stock Lua VM, and readable with off-the-shelf tools. A virtualizer rewrites that bytecode into its own private instruction set, encodes it, and bundles a bespoke dispatch loop to run it. The original logic no longer lives in any function you can decompile; it lives in the semantics of opcodes only the custom VM understands. Recovering it - devirtualization - means reverse-engineering that VM. This entry follows the foundation laid out in birk.blog's Lua Virtualization series (Part 1: the internals of the Lua VM), which builds toward devirtualizing the well-known Luraph protector.

Quick facts

Base VM: Lua 5.1 - a register-based VM whose instructions mirror its C API
Compile path: lua source -> luac bytecode (inspect with luac -l); decompile with unluac
Instruction format: 32-bit: 6-bit opcode + A (8b), B (9b), C (9b), or Bx (18b) / sBx (signed)
What virtualization changes: Renumbered/secret opcodes + encoded bytecode + a custom dispatch loop
Why stock tools fail: unluac and luac -l only know the standard opcode set, not the custom VM

From Lua source to luac bytecode

Lua is popular in games, config systems, and embedded scripting because it is a small VM whose instruction set mirrors its C API, making C/C++ bindings easy. When you run Lua, the source is compiled to luac bytecode. You can see it with the luac -l listing flag. Compiling print("Hello World!") on Lua 5.1 yields four instructions:

GETGLOBAL 0 -1   ; R(0) := _G["print"]
LOADK     1 -2   ; R(1) := "Hello World!"
CALL      0  2 1 ; R(0)(R(1))
RETURN    0  1

Each instruction is an opcode plus operands, most of them register indexes. In the luac -l listing a negative operand denotes a constant-table entry rather than a register: -1 is the first constant, -2 the second. That is why GETGLOBAL 0 -1 looks up the global named by the first constant ("print") and writes its value to register 0. lopcodes.h documents the semantics, e.g. OP_CALL is R(A), ... ,R(A+C-2) := R(A)(R(A+1), ... ,R(A+B-1)). Reverse the process with the decompiler unluac (java -jar unluac.jar luac.out) and you get working source back - but not byte-identical to the original, because several source forms compile to the same bytecode. That round-trip working at all is exactly what virtualization is designed to break.
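The negative numbers in the listing are a display convention, not something stored in the instruction. The Python sketch below mirrors, as far as I understand the Lua 5.1 sources, how an operand is turned into the number the listing shows. The function names are hypothetical; BITRK follows the constant of the same name in lopcodes.h.

```python
# In Lua 5.1 an RK operand is 9 bits wide; its top bit (BITRK in lopcodes.h)
# marks "constant index" rather than "register".
BITRK = 1 << 8

def is_constant(rk: int) -> bool:
    """True when a 9-bit RK operand refers to the constant table."""
    return (rk & BITRK) != 0

def constant_index(rk: int) -> int:
    """Zero-based constant index held in an RK operand."""
    return rk & ~BITRK

def listing_operand(rk: int) -> int:
    """Value as the listing shows it: constants appear as -1 - index."""
    return -1 - constant_index(rk) if is_constant(rk) else rk

def listing_bx_constant(bx: int) -> int:
    """Bx operands that name a constant (LOADK, GETGLOBAL) appear as -1 - Bx."""
    return -1 - bx

print(listing_bx_constant(0))  # GETGLOBAL's Bx = 0 is shown as -1
print(listing_operand(256))    # RK operand 256 is constant 0, shown as -1
print(listing_operand(3))      # a plain register index is shown unchanged
```

A parser for a non-standard chunk needs the inverse of this mapping, so it helps to keep the raw operand and the displayed operand as separate concepts from the start.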

How the Lua VM is laid out

To understand what a virtualizer is hiding, you first need the stock layout. From lopcodes.h, an instruction is 32 bits packed as a 6-bit opcode plus operands: A is 8 bits, B and C are 9 bits each, and Bx is the 18-bit fusion of B and C (with sBx its signed form). Constants come in a few types from lua.h - LUA_TNIL, LUA_TBOOLEAN, LUA_TNUMBER (every number is a double), and LUA_TSTRING (dumped with its trailing \0, so the size field written to disk is the string length + 1, and a loader subtracts 1 to recover the real length).
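As a rough illustration of that packing, here is a Python sketch that splits a 32-bit Lua 5.1 instruction word into its fields. The field positions follow lopcodes.h (opcode in the low bits, then A, C, B); the Fields type and decode function are hypothetical names, and the four sample words were hand-assembled from the Hello World listing rather than dumped from a real chunk.

```python
from typing import NamedTuple

# Field widths and bit positions for Lua 5.1, per lopcodes.h.
SIZE_OP, SIZE_A, SIZE_C, SIZE_B = 6, 8, 9, 9
POS_OP = 0
POS_A = POS_OP + SIZE_OP                 # 6
POS_C = POS_A + SIZE_A                   # 14
POS_B = POS_C + SIZE_C                   # 23
SIZE_BX = SIZE_C + SIZE_B                # 18
MAXARG_SBX = ((1 << SIZE_BX) - 1) >> 1   # 131071, the bias used for sBx

class Fields(NamedTuple):
    opcode: int
    a: int
    b: int
    c: int
    bx: int
    sbx: int

def _bits(word: int, pos: int, size: int) -> int:
    """Extract `size` bits starting at bit `pos`."""
    return (word >> pos) & ((1 << size) - 1)

def decode(word: int) -> Fields:
    """Split one instruction word into every field; the opcode decides which apply."""
    bx = _bits(word, POS_C, SIZE_BX)
    return Fields(
        opcode=_bits(word, POS_OP, SIZE_OP),
        a=_bits(word, POS_A, SIZE_A),
        b=_bits(word, POS_B, SIZE_B),
        c=_bits(word, POS_C, SIZE_C),
        bx=bx,
        sbx=bx - MAXARG_SBX,
    )

# Standard Lua 5.1 opcode numbers for the four instructions used here.
NAMES = {1: "LOADK", 5: "GETGLOBAL", 28: "CALL", 30: "RETURN"}

# Hand-assembled words for: GETGLOBAL 0 -1, LOADK 1 -2, CALL 0 2 1, RETURN 0 1.
for word in (0x00000005, 0x00004041, 0x0100401C, 0x0080001E):
    f = decode(word)
    print(NAMES[f.opcode], f.a, f.b, f.c, f.bx)
```

Decoding every field unconditionally and letting the opcode pick the relevant ones keeps the decoder independent of the opcode table, which is convenient once that table is no longer the standard one.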

Code is organised into function prototypes. There is one main function with everything nested inside it; each prototype holds its instruction array (code + count), a constant array (sizek), nested sub-functions (sizep), and - unless compiled with the -s strip flag - debug data (line info, local names, upvalue names, source name). The subtle part is upvalues and closures. An upvalue is a variable from an enclosing scope captured by a nested function. When a CLOSURE opcode runs, the instructions immediately after it are not executed - they are metadata describing each captured upvalue: a MOVE means the upvalue is local to the enclosing function (in_stack = 1, the B register is its stack index), and a GETUPVAL means it comes from a further-out scope (in_stack = 0, B indexes the enclosing function's upvalue list). So a closure with 4 upvalues is followed by 4 metadata instructions that the VM skips. Knowing which trailing instructions are data, not code, is essential the moment the opcodes stop being standard.
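The CLOSURE rule can be sketched in a few lines of Python. This is a simplified model, not code from the Lua sources: instructions are assumed to be already decoded into (mnemonic, A, B) tuples, the upvalue count is assumed to come from the nested prototype, and read_closure_upvalues is a hypothetical helper name.

```python
def read_closure_upvalues(code, pc, nups):
    """Collect the upvalue descriptors that follow the CLOSURE at code[pc].

    Returns (descriptors, next_pc), where next_pc is the first instruction
    after the metadata, i.e. the next one that is really executed.
    """
    if code[pc][0] != "CLOSURE":
        raise ValueError("code[pc] is not a CLOSURE instruction")
    trailing = code[pc + 1 : pc + 1 + nups]
    if len(trailing) != nups:
        raise ValueError("instruction array ends inside the upvalue metadata")
    upvalues = []
    for mnemonic, _a, b in trailing:
        if mnemonic == "MOVE":
            # Captured variable is a local of the enclosing function.
            upvalues.append({"in_stack": 1, "index": b})
        elif mnemonic == "GETUPVAL":
            # Captured variable is itself an upvalue of the enclosing function.
            upvalues.append({"in_stack": 0, "index": b})
        else:
            raise ValueError(f"unexpected metadata instruction {mnemonic}")
    return upvalues, pc + 1 + nups

# Illustrative instruction stream: one closure capturing two upvalues.
code = [
    ("LOADK", 0, 0),
    ("CLOSURE", 1, 0),    # B holds Bx here: index of the nested prototype
    ("MOVE", 0, 0),       # metadata: local in register 0
    ("GETUPVAL", 0, 2),   # metadata: enclosing upvalue 2
    ("SETGLOBAL", 1, 1),
    ("RETURN", 0, 1),
]
upvalues, next_pc = read_closure_upvalues(code, 1, 2)
print(upvalues)
print(next_pc)  # 4, the SETGLOBAL
```

A disassembler that ignores this and treats the two trailing entries as ordinary MOVE and GETUPVAL instructions will produce a listing with register writes that never happen.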

What "virtualization" means as obfuscation

Virtualization is the heaviest tier of code protection. Instead of merely renaming locals or encoding strings (ordinary deobfuscation territory), a virtualizer compiles the program down to a brand-new instruction set that only its own interpreter understands. In practice that means: the opcodes are renumbered or fully redesigned (so GETGLOBAL is no longer opcode 5, and may not exist as a single opcode at all), the bytecode blob is encoded or encrypted, and the protector ships a hand-written dispatch loop - a big switch/handler table - that decodes and executes the private instructions at runtime. Luraph is the best-known commercial example for Lua, cited here purely as a public reference for the technique.
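To make the three ingredients concrete, here is a deliberately tiny Python model of a virtualizer: a per-build private opcode numbering, an XOR-encoded bytecode blob, and a handler-table dispatch loop. Everything in it (the three-opcode instruction set, the four-byte instruction layout, the single-byte XOR key, all function names) is an assumption made for illustration and does not describe how Luraph or any real protector works.

```python
import random

STANDARD_OPS = ["LOADK", "ADD", "RETURN"]

def make_private_numbering(seed):
    """Assign each operation a random, distinct byte value for this build."""
    rng = random.Random(seed)
    numbers = rng.sample(range(256), len(STANDARD_OPS))
    return dict(zip(STANDARD_OPS, numbers))

def protect(program, numbering, key):
    """Flatten (op, a, b, c) tuples under the private numbering, then XOR-encode."""
    raw = bytearray()
    for op, a, b, c in program:
        raw += bytes([numbering[op], a, b, c])
    return bytes(x ^ key for x in raw)

def run(blob, numbering, constants, key):
    """Custom dispatch loop: decode the blob, then execute handler by handler."""
    regs = [None] * 8

    def loadk(a, b, c):
        regs[a] = constants[b]

    def add(a, b, c):
        regs[a] = regs[b] + regs[c]

    handlers = {numbering["LOADK"]: loadk, numbering["ADD"]: add}
    ret = numbering["RETURN"]

    code = bytes(x ^ key for x in blob)
    pc = 0
    while pc < len(code):
        op, a, b, c = code[pc : pc + 4]
        pc += 4
        if op == ret:
            return regs[a]
        handlers[op](a, b, c)
    return None

program = [
    ("LOADK", 0, 0, 0),   # R0 := K0
    ("LOADK", 1, 1, 0),   # R1 := K1
    ("ADD", 2, 0, 1),     # R2 := R0 + R1
    ("RETURN", 2, 0, 0),  # return R2
]
numbering = make_private_numbering(seed=1234)
blob = protect(program, numbering, key=0x5A)
print(run(blob, numbering, constants=[40, 2], key=0x5A))  # 42
```

Even at this scale the blob alone says nothing: without the numbering and the handlers, the bytes cannot be mapped back to LOADK, ADD, and RETURN.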

The effect is that every stock tool stops working. luac -l and unluac only know the standard opcode set, so pointed at a virtualized chunk they produce garbage or refuse outright - there is no longer a one-to-one map from bytes to known semantics. The logic you want is smeared across the custom opcode handlers and the order the dispatch loop invokes them, which is the same structural trick used by obfuscated JavaScript VMs in the browser - including the interrogator scripts that anti-bot vendors ship for fingerprinting. The language differs; the shape is identical.

How you devirtualize it

Recovering readable code from a virtualized script means reconstructing the custom VM, then replaying its program statically. A practical starting point is the Lua source itself: lvm.c handles opcode execution, ldump.c serialises bytecode, and print.c renders the luac -l listing. Patching ldump.c and print.c to emit extra debug output teaches you exactly how a chunk is laid out and what data it carries - the groundwork for parsing a non-standard one. From there the job is to map each custom opcode back to a known operation (load, call, arithmetic, jump, closure), recover the constant pool, and lift the dispatch trace into a high-level form you can read - the same flatten-and-saturate approach used to devirtualize a JavaScript VM.
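The last two steps, mapping private opcodes back to known operations and lifting the result, can be sketched on a toy scale in Python. The opcode map, key, constants, and four-byte instruction layout below are all hypothetical stand-ins for what an analyst would recover by reading a custom VM's handlers; the lifting here is plain symbolic substitution over straight-line code, far simpler than what a real protector requires.

```python
# Values an analyst is assumed to have recovered from the custom VM.
KEY = 0x5A
OPCODE_MAP = {0x91: "LOADK", 0x17: "ADD", 0xC3: "RETURN"}
CONSTANTS = [40, 2]

# Stand-in for a captured blob: four 4-byte instructions, XOR-encoded with KEY.
RAW = [0x91, 0, 0, 0, 0x91, 1, 1, 0, 0x17, 2, 0, 1, 0xC3, 2, 0, 0]
BLOB = bytes(x ^ KEY for x in RAW)

def disassemble(blob, key, opcode_map):
    """Decode the blob and rename private opcodes to known operations."""
    code = bytes(x ^ key for x in blob)
    return [
        (opcode_map[code[i]], code[i + 1], code[i + 2], code[i + 3])
        for i in range(0, len(code), 4)
    ]

def lift(listing, constants):
    """Replace register traffic with expressions (straight-line code only)."""
    regs = {}
    lines = []
    for op, a, b, c in listing:
        if op == "LOADK":
            regs[a] = repr(constants[b])
        elif op == "ADD":
            regs[a] = f"({regs[b]} + {regs[c]})"
        elif op == "RETURN":
            lines.append(f"return {regs[a]}")
        else:
            raise ValueError(f"no lifting rule for {op}")
    return lines

listing = disassemble(BLOB, KEY, OPCODE_MAP)
for op, a, b, c in listing:
    print(op, a, b, c)
print(lift(listing, CONSTANTS))  # ['return (40 + 2)']
```

Splitting the work into a disassembly pass and a lifting pass means the opcode map can be corrected as handlers are better understood without touching the lifting rules.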

Why does this matter beyond game scripts? Because the same VM-obfuscation pattern guards a lot of client-side anti-bot logic on the web, and teams scraping protected endpoints often hit it. Reversing a bespoke VM per target is slow and brittle, which is why many developers skip the reverse-engineering arms race entirely and let a managed web-data API such as Scrappey run a real browser and return the rendered result - the obfuscated VM executes as intended, server-side, and you consume the output. Reversing it yourself remains the right path when you need to understand or reimplement the logic; offloading it is the pragmatic path when you just need the data.

Code example

# $ luac -l hello.lua    (Lua 5.1) -> human-readable standard bytecode
main <hello.lua:0,0> (4 instructions)
  1  GETGLOBAL  0 -1   ; R0 := _G["print"]     (-1 => constant index, not a register)
  2  LOADK      1 -2   ; R1 := "Hello World!"
  3  CALL       0  2 1 ; R0(R1), 0 results
  4  RETURN     0  1

# Each 32-bit instruction packs, low bits first:  [ opcode:6 | A:8 | C:9 | B:9 ]   (Bx = the C and B fields combined = 18 bits)

# A virtualization obfuscator (e.g. Luraph) renumbers/redesigns these opcodes,
# encodes the bytecode, and ships its OWN dispatch loop to run them. Result:
# unluac and "luac -l" no longer understand the chunk. Recovering the logic
# means reverse-engineering that custom VM -- i.e. devirtualization.

Frequently asked questions

What is the difference between Lua bytecode and Lua virtualization?

Lua bytecode (luac) is the standard compiled form of a Lua program, executed by the normal Lua VM and readable with tools like luac -l and the unluac decompiler. Virtualization goes a step further: it recompiles the program into a custom, secret instruction set and ships a bespoke interpreter to run it, so the standard tools no longer apply. Plain bytecode is a known format; a virtualized chunk is a private one you must reverse-engineer.

Can I just use unluac to decompile a virtualized Lua script?

No. unluac and luac -l only understand the standard Lua opcode set. A virtualizer renumbers or redesigns the opcodes, encodes the bytecode, and runs it through its own dispatch loop, so stock tools produce garbage or fail. Recovering the original logic requires reconstructing the custom VM - mapping its opcodes back to real operations, recovering the constant pool, and lifting the dispatch trace into readable code.

Why does Lua VM virtualization matter outside of games?

Because the exact same pattern - compile logic to a private instruction set and hide it behind a custom interpreter - is used to obfuscate client-side anti-bot and fingerprinting scripts in the browser. The techniques for understanding the Lua VM transfer directly to devirtualizing an obfuscated JavaScript VM, which is why this is a foundational reverse-engineering skill rather than a Lua-only curiosity.

Last updated: 2026-06-04