The Deserialization VM: The Pre-VM Stage of a Lua Virtualizer

By the Scrappey Research Team

A deserialization VM is the outer virtual machine in a stacked virtualizer: it turns an encrypted data blob into the instruction stream the real VM executes, and enforces anti-tamper checks along the way. Before any real logic runs, this pre-VM stage normalizes and decompresses the blob (in Luraph, the long string starting with LPH}), parses constants and function prototypes, and emits a structured table of instructions, register operands, and constants. Reversing it is easier than it looks, because the table it emits has the same structure as the output of a known plaintext deserialize function, which gives you the exact shape to aim for. This page follows birk.blog's Lua Virtualization Part 4, which uses the devirtualization of Luraph as its public case study.

Job	Decode/decrypt the blob (e.g. LPH}...) into the real VM's instruction stream
Pipeline	Normalize + decompress -> parse constants -> parse prototypes -> clear globals -> entrypoint
Output table	Insts, REG_A/B/C, constants (encrypted), decrypted_constants, function_prototypes, stk_size
Anti-tamper	Hijacks string __tostring; print() crashes unless you neutralize it
Scale	~20,000 deserialization instructions run before ~5,000 real-VM instructions

The pre-VM pipeline and its output

Execution splits into three stages: a pre-VM stage that deserializes the raw blob, the real VM that runs the program, and a post-VM stage for error handling and return values. The pre-VM stage normalizes and decompresses the blob, parses the constants, parses the function prototypes (each prototype's instructions are extracted by a "get next function instructions" routine), clears the globals it used as a scratch interface, and finally reads the entrypoint index. Its output is a single table whose fields you can identify by matching them against the plaintext deserialize routine: Insts, the register tables REG_A/REG_B/REG_C (all indexed by the virtual IP), function_prototypes, stk_size, and two constant tables. The raw constants table is still encrypted when it reaches the real VM; the decrypted_constants table holds the same values after runtime decryption, which makes it a concrete anchor for locating the decryption routine.
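To make the stage order concrete, here is a minimal sketch in Lua. The blob format (a comma-separated token string), the per-prototype field layout, and the helper names (deserialize, read_prototype, next_token, next_number) are all invented for illustration; the real blob is compressed and encrypted, and the sketch skips decompression, constant encryption, and the global-clearing step. Only the order of the stages and the general shape of the output table follow the description above.

```lua
-- Toy pre-VM stage. Hypothetical blob layout:
--   <nconst>, c1..cn, <nproto>, { <stk_size>, <ninst>, (op, a, b, c)* }*, <entry>
local function deserialize(blob)
  -- Stage 1: normalize. A real sample would also decompress here.
  local normalized = blob:gsub("%s", "")
  local tokens = {}
  for tok in normalized:gmatch("[^,]+") do
    tokens[#tokens + 1] = tok
  end

  local pos = 0
  local function next_token()
    pos = pos + 1
    return tokens[pos]
  end
  local function next_number()
    return assert(tonumber(next_token()), "malformed blob")
  end

  -- Stage 2: parse constants.
  local constants = {}
  for i = 1, next_number() do
    constants[i] = next_token()
  end

  -- Stage 3: parse prototypes. Operands live in parallel tables
  -- that are all indexed by the virtual instruction pointer (VIP).
  local function read_prototype()
    local proto = { stk_size = next_number(),
                    Insts = {}, REG_A = {}, REG_B = {}, REG_C = {} }
    for vip = 1, next_number() do
      proto.Insts[vip] = next_token()
      proto.REG_A[vip] = next_number()
      proto.REG_B[vip] = next_number()
      proto.REG_C[vip] = next_number()
    end
    return proto
  end

  local function_prototypes = {}
  for i = 1, next_number() do
    function_prototypes[i] = read_prototype()
  end

  -- Stage 4: read the entrypoint index.
  return {
    constants = constants,
    function_prototypes = function_prototypes,
    entrypoint = next_number(),
  }
end

-- Two constants, one prototype (stack size 2, two instructions), entrypoint 1.
local data = deserialize("2,print,hello, 1, 2,2, LOADK,0,1,0, RETURN,0,1,0, 1")
```

When you reverse a real sample, the useful part of this picture is the parallel-table layout: once you find one table indexed by the VIP, the others tend to sit next to it in the same output structure.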

The __tostring anti-tamper trap

A subtle protection nearly blocks analysis: simply calling print crashes, regardless of arguments. print internally calls tostring, and in Lua all strings share a metatable, so the VM installs a __tostring handler on it (via debug.setmetatable("", ...) or getmetatable("").__tostring) to detect and punish inspection. Two workarounds combine. First, use io.write, which never calls tostring and accepts only strings and numbers (closer to C's fputs than to printf), together with a custom recursive tostring replacement for tables. Second, and this is the real fix, patch the deserializer so that during constant parsing any constant equal to "__tostring" is overwritten with an empty string, at the earliest point where constants are plaintext. That defuses the trap before the deserialization VM can arm it.
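The first workaround can be sketched as follows. safe_repr is a hypothetical helper name, not something from the original write-up; the idea is simply to render values without ever going through tostring, print, or "%s" formatting, any of which may reach a hijacked __tostring depending on the Lua version. The sketch arms a stand-in trap on the string metatable itself to show that the helper still works, then restores the previous state. Exact trap behavior may differ between Lua implementations.

```lua
-- Render a value using only type(), concatenation, and numeric formatting.
local function safe_repr(value, seen)
  local kind = type(value)
  if kind == "string" then
    return '"' .. value .. '"'
  elseif kind == "number" then
    return string.format("%.14g", value)
  elseif kind == "boolean" then
    return value and "true" or "false"
  elseif kind == "nil" then
    return "nil"
  elseif kind ~= "table" then
    return "<" .. kind .. ">"        -- functions, userdata, threads
  end

  seen = seen or {}
  if seen[value] then return "<cycle>" end
  seen[value] = true
  local parts = {}
  -- Iterate with next directly so a __pairs metamethod cannot interfere.
  for k, v in next, value do
    parts[#parts + 1] = "[" .. safe_repr(k, seen) .. "]=" .. safe_repr(v, seen)
  end
  table.sort(parts)                  -- stable, readable ordering
  seen[value] = nil
  return "{" .. table.concat(parts, ",") .. "}"
end

-- Stand-in for the obfuscator's trap (assumption: it raises an error).
local string_mt = getmetatable("")
local saved = string_mt.__tostring
string_mt.__tostring = function() error("tamper detected") end

local rendered = safe_repr({ 10, "x" })
io.write(rendered, "\n")             -- io.write does not call tostring

string_mt.__tostring = saved         -- restore the original state
```

Function and userdata values are rendered only by type name here, because obtaining their address would require tostring. That is usually enough for dumping constants and operand tables.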

One VM at a time: return as dispatcher

Logging every opcode shows ~20,000 deserialization instructions execute before the real VM, which itself runs ~5,000. A defining design choice appears in how closures are handled: what looks like OP_RETURN (return true, REG_C[VIP], 0) is actually a function dispatcher. Rather than nesting a child VM inside the parent (the IronBrew model, where both stay alive), the VM runs inside a pcall and returns metadata signalling "continue in this closure". The enclosing VM is stopped before the next is dispatched, so only one VM instance exists at any moment - significantly cutting memory use. A second, hidden path invokes closures through a metatable on the prototypes table. Understanding this stage is the groundwork for intercepting the real instruction stream rather than reversing the deserializer in full.
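The "return as dispatcher" idea resembles a trampoline, which can be sketched in a few lines of Lua. Everything here is a simplified assumption: the names run_vm and drive, the (is_dispatch, target, argument) return shape, and the use of plain Lua functions as stand-ins for interpreted prototypes. The sketch only shows control transfer in one direction; how the real VM resumes a caller after a closure finishes is not covered.

```lua
-- Counters to show how many VM instances are alive at once.
local live, max_live = 0, 0

-- Stand-ins for interpreted prototypes. Returning (true, index, arg)
-- means "continue in that closure"; (false, value) is a genuine return.
local prototypes = {
  function(x) return true, 2, x + 1 end,
  function(x) return true, 3, x * 2 end,
  function(x) return false, x end,
}

-- One VM instance: runs a single prototype, then is torn down.
local function run_vm(proto_index, arg)
  live = live + 1
  if live > max_live then max_live = live end
  local is_dispatch, a, b = prototypes[proto_index](arg)
  live = live - 1                    -- stopped before the next VM starts
  return is_dispatch, a, b
end

-- Driver loop: each VM runs inside pcall; its return values decide
-- whether to dispatch another closure or hand back the final result.
local function drive(entry, arg)
  local target = entry
  while true do
    local ok, is_dispatch, a, b = pcall(run_vm, target, arg)
    if not ok then
      error(is_dispatch, 0)          -- post-VM error handling would go here
    end
    if not is_dispatch then
      return a
    end
    target, arg = a, b
  end
end

local result = drive(1, 5)           -- (5 + 1) * 2
```

In a nested design, prototype 1 would call the VM for prototype 2 directly and both frames would stay alive. Here max_live never exceeds 1, which is the memory-saving property described above.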

Code example

```lua
-- The deserialized output table. The indices are per-sample "magic" numbers,
-- so re-derive them for each sample rather than reusing these.
-- EXECUTION_DATA is the output table described above.
local Insts                 = EXECUTION_DATA[3]   -- instruction stream
local REG_A                 = EXECUTION_DATA[5]   -- A operands, indexed by VIP
local REG_B                 = EXECUTION_DATA[4]   -- B operands, indexed by VIP
local REG_C                 = EXECUTION_DATA[10]  -- C operands, indexed by VIP
local constants             = EXECUTION_DATA[9]   -- still ENCRYPTED for the real VM
local decrypted_constants   = EXECUTION_DATA[8]   -- plaintext after runtime decrypt
local function_prototypes   = EXECUTION_DATA[7]

-- Anti-tamper: print() crashes because the string metatable's __tostring is hijacked.
-- Earliest fix: inside the deserializer's constant parser, blank the constant
-- before it is used (data is the constant that was just read):
if data == "__tostring" then data = "" end
```

Frequently asked questions

What does the deserialization VM actually do?

It is the outer stage that turns the encrypted data blob shipped with the script into the instruction stream the real VM runs. It normalizes and decompresses the blob, parses constants and function prototypes, and emits a structured table of instructions, register operands, and constants - some still encrypted for the inner VM to decrypt at runtime.

Why does calling print crash a virtualized Lua script?

Because print calls tostring, and the obfuscator hijacks the string type's __tostring metamethod as an anti-tamper trap. Use io.write plus a custom tostring instead, and patch the deserializer so any constant equal to "__tostring" is blanked out at parse time, before the trap can be set.

Does the obfuscator run both VMs at the same time?

Not in this design. What looks like a return opcode is a dispatcher that returns metadata telling the runtime to continue in the next closure, and the enclosing VM is stopped before the next one starts. So only a single VM instance is alive at any moment, which reduces memory use compared with the nested-VM (IronBrew) approach.

Last updated: 2026-01-27