Why code gets obfuscated
Obfuscation is used to protect intellectual property, hide license checks, slow down tampering, and make anti-analysis scripts harder to study. The trade-off is always the same: obfuscated code is bigger and slower, and because it still has to run, every secret it contains is recoverable. This page is framed around JavaScript — where most client-side obfuscation lives — but the same ideas apply to any language.
The golden rule: semantics are preserved
The single most important idea in deobfuscation is that an obfuscator may only apply semantics-preserving transformations. If f(2) returned "WordArray" in the original, it still returns "WordArray" after obfuscation.
That means you are never guessing. You can always recover the original behaviour by evaluating the obfuscated constructs, because they are guaranteed to produce the same values. Most deobfuscation is just "evaluate the parts that are constant, and simplify."
Layer 1 — String array encoding
The most common layer. String literals, including the property names used in member accesses, are pulled out into one big array, and every use is replaced by a call to a decoder function.
// After obfuscation
const _0x2f9f = ["u7d1", "WordArray", "update", "secret", /* ...hundreds more */];
function _0x3b01(i) { return _0x2f9f[i - 336]; }
const key = _0x3b01(337);
_0x3b01(429)(_0x3b01(412));
Often the array is rotated at load time by a self-invoking function that shifts elements until a checksum matches, which defeats anyone who reads the array in source order:
(function (arr, target) {
  while (true) {
    const sum = parseInt(decode(347)) / 1 + parseInt(decode(541)) / 2 /* ... */;
    if (sum === target) break; // correct rotation found
    arr.push(arr.shift()); // rotate by one and retry
  }
})(_0x2f9f, 458958);
How to reverse it: extract the array literal and decoder function as text, run the rotation loop yourself (it is deterministic), then replace every decoder call with its result. This is constant folding: _0x3b01(429) is a pure function of a literal, so evaluate and substitute:
output = source.replace(/_0x3b01\((\d+)\)/g, (_, n) => JSON.stringify(decode(+n)));
After this pass _0x3b01(429)(_0x3b01(412)) becomes "update"("secret"), which is already far more readable.
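The three steps above (extract, rotate, fold) can be sketched end to end on a toy input. Everything concrete here is an assumption made for illustration: the five-entry array, the offset of 336, the checksum formula, and the target value 26 are invented. On a real bundle the rotation loop should be copied out of the file rather than rewritten by hand.

```javascript
// Toy stand-in for an obfuscated bundle; array contents, checksum and target are invented.
const OFFSET = 336;
const strings = ["update", "secret", "12ab", "7xy", "WordArray"]; // order as shipped
const decode = (i) => strings[i - OFFSET];

// Re-run the rotation loop. It is bounded by the array length so that a
// wrong target reports failure instead of spinning forever.
function rotateUntil(arr, target) {
  for (let tries = 0; tries < arr.length; tries++) {
    // parseInt yields NaN for non-numeric entries, so wrong rotations never match.
    const sum = parseInt(decode(336), 10) + parseInt(decode(337), 10) * 2;
    if (sum === target) return true;
    arr.push(arr.shift()); // rotate by one and retry
  }
  return false;
}

// Constant-fold every decoder call whose argument is a numeric literal.
function foldDecoderCalls(source) {
  return source.replace(/\b_0x3b01\((\d+)\)/g, (_, n) =>
    JSON.stringify(decode(Number(n))));
}

const rotated = rotateUntil(strings, 26);
const folded = foldDecoderCalls('hash[_0x3b01(339)](_0x3b01(340));');
console.log(rotated, folded); // true hash["update"]("secret");
```

Bounding the loop is a deliberate departure from the shipped while (true): when the extracted array or target is wrong, a failed rotation is easier to debug than a hang.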
Layer 2 — Decoder aliasing
Obfuscators frequently re-bind the decoder to a local variable so a naive find-and-replace misses it:
function handler() {
  const d = _0x3b01; // alias
  return d(512) + d(530); // won't match a /_0x3b01\(\d+\)/ regex
}
Reverse it by scanning for assignment patterns like x = _0x3b01, collecting the alias names, then resolving x(NNN) calls within their scope using the same decoder. Restrict to short, single-purpose identifiers to avoid false positives.
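A minimal regex-based sketch of that alias pass follows. The helper names (escapeRe, resolveAliases), the stub decoder table, and the three-character limit on alias names are assumptions for illustration. The sketch also treats the whole input as a single scope, so in practice it would be applied to one function body at a time, or replaced by an AST pass.

```javascript
// Escape a string so it can be embedded in a RegExp verbatim.
const escapeRe = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Find `const x = <decoder>;` bindings, then fold x(NNN) calls.
function resolveAliases(source, decoderName, decode) {
  const aliasDecl = new RegExp(
    `\\b(?:const|let|var)\\s+([A-Za-z_$][\\w$]{0,2})\\s*=\\s*${escapeRe(decoderName)}\\s*;`,
    'g');
  const aliases = [...source.matchAll(aliasDecl)].map((m) => m[1]);

  let out = source;
  for (const alias of aliases) {
    // The lookbehind rejects member calls (obj.d(1)) and longer names (add(1)).
    const call = new RegExp(`(?<![\\w$.])${escapeRe(alias)}\\((\\d+)\\)`, 'g');
    out = out.replace(call, (_, n) => JSON.stringify(decode(Number(n))));
  }
  return { aliases, out };
}

// Hypothetical decoder stub standing in for the real _0x3b01.
const table = { 512: 'Word', 530: 'Array' };
const src = [
  'function handler() {',
  '  const d = _0x3b01;',
  '  return d(512) + d(530);',
  '}',
].join('\n');

const result = resolveAliases(src, '_0x3b01', (n) => table[n]);
console.log(result.aliases, result.out);
```

The alias declaration itself is left in place; it becomes unreferenced and can be removed later along with the rest of the obfuscator scaffolding.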
Layer 3 — Member-access and literal disguising
With strings restored, you can undo the cosmetic disguises:
- Bracket → dot notation: obj["update"] becomes obj.update (skip reserved words and non-identifier keys).
- Numeric obfuscation: 0x1a4, 1e3, 0b1010, and arithmetic like 0x1 << 0x4 are constant expressions; evaluate them to 420, 1000, 10, 16.
- String concatenation: "up" + "da" + "te" folds to "update".
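The numeric and concatenation items in the list above could be folded with two small text passes like the ones below. The function names foldNumbers and foldConcats are hypothetical. This is a sketch for simple inputs only: a regex cannot reliably tell where string literals start and end, and it ignores the precedence of neighbouring operators, which is why operator expressions such as 0x1 << 0x4 are left for an AST pass.

```javascript
// Rewrite hex, binary and exponent literals as plain decimals.
// The \b anchors keep the pattern from firing inside names like _0x2f9f.
function foldNumbers(src) {
  return src.replace(/\b(0x[0-9a-f]+|0b[01]+|\d+e\d+)\b/gi,
    (literal) => String(Number(literal)));
}

// Merge adjacent double-quoted string literals joined by +,
// repeating until nothing changes so that chains collapse fully.
function foldConcats(src) {
  const pair = /"((?:[^"\\]|\\.)*)"\s*\+\s*"((?:[^"\\]|\\.)*)"/;
  let previous;
  do {
    previous = src;
    src = src.replace(pair, '"$1$2"');
  } while (src !== previous);
  return src;
}

console.log(foldNumbers('[0x1a4, 1e3, 0b1010, _0x2f9f]')); // [420, 1000, 10, _0x2f9f]
console.log(foldConcats('obj["up" + "da" + "te"]'));       // obj["update"]
```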
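output = output
  // bracket -> dot; the lookbehind skips array literals such as ["update"]
  .replace(/(?<=[\w$)\]])\["([a-zA-Z_$][\w$]*)"\]/g, '.$1')
  // hex -> decimal; \b stops the pattern from matching inside names like _0x3b01
  .replace(/\b0x([0-9a-fA-F]+)\b/g, (_, h) => String(parseInt(h, 16)));
Layer 4 — Control-flow flattening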
This is the layer that most resists regex. Straight-line code is rewritten into a while loop driven by a state variable and a switch, so the order of execution no longer matches the source order:
let state = 0;
while (true) {
  switch (state) {
    case 0: a = init(); state = 2; continue;
    case 1: return a + b; // exit
    case 2: b = step(a); state = 1; continue;
  }
}
The real flow is 0 → 2 → 1, but the cases are written in the order 0, 1, 2. To undo it you build a small control-flow graph: each case is a basic block, and the state = N assignments are edges. Follow the edges from the initial state, re-emit the blocks in execution order, and the original linear code falls out. This is where an AST becomes essential: text replacement cannot track state cleanly.
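The re-threading step can be sketched on a pre-parsed form of the switch above. The blocks table is a hypothetical intermediate representation (in practice an AST visitor would build it from the case bodies), and the sketch only handles a straight chain of states. Conditional state assignments and loops need a real control-flow graph.

```javascript
// Hypothetical intermediate form: each case body plus the state it assigns next.
const blocks = {
  0: { code: 'a = init();', next: 2 },
  1: { code: 'return a + b;', next: null }, // exit block: no successor
  2: { code: 'b = step(a);', next: 1 },
};

// Follow state edges from the entry block and emit bodies in execution order.
function unflatten(table, entry) {
  const order = [];
  const seen = new Set();
  for (let state = entry; state !== null; state = table[state].next) {
    if (!(state in table)) throw new Error(`unknown state ${state}`);
    if (seen.has(state)) throw new Error(`cycle at state ${state}: not a linear chain`);
    seen.add(state);
    order.push(state);
  }
  return { order, code: order.map((s) => table[s].code).join('\n') };
}

const linear = unflatten(blocks, 0);
console.log(linear.order); // [ 0, 2, 1 ]
console.log(linear.code);
```

Throwing on a revisited state is a deliberate choice: silently emitting a loop body once would produce code that looks clean but behaves differently from the original.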
Layer 5 — Dead code and opaque predicates
Obfuscators inject branches that look conditional but always resolve the same way ("opaque predicates"), plus unreachable junk to pad the file:
if ((function () { return !![]; })()) { realWork(); } else { garbage(); }
An array is an object and therefore truthy, so ![] is false and !![] is always true; the else branch is dead. Once you evaluate the constant predicate you can delete the dead branch entirely. Removing decoder definitions, rotation IIFEs, and unreferenced helpers shrinks the file dramatically on the final pass.
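As a rough illustration of predicate pruning, the sketch below works on plain ESTree-shaped objects (the node shapes that parsers such as acorn or esprima emit) without depending on any library. The helpers truthiness and pruneIf are hypothetical names, and only literals, empty array literals, and ! are understood. The IIFE wrapper from the example above would have to be inlined by an earlier pass.

```javascript
// Returns true or false when a node's truthiness is provable, undefined otherwise.
function truthiness(node) {
  switch (node.type) {
    case 'Literal':
      return Boolean(node.value);
    case 'ArrayExpression':
      // Every array is truthy, but only an empty literal is free of side effects.
      return node.elements.length === 0 ? true : undefined;
    case 'UnaryExpression': {
      if (node.operator !== '!') return undefined;
      const inner = truthiness(node.argument);
      return inner === undefined ? undefined : !inner;
    }
    default:
      return undefined; // not provably constant: leave the code alone
  }
}

// Replace an if-statement with its live branch when the test is constant.
function pruneIf(node) {
  if (node.type !== 'IfStatement') return node;
  const verdict = truthiness(node.test);
  if (verdict === undefined) return node;
  if (verdict) return node.consequent;
  return node.alternate || { type: 'EmptyStatement' };
}

// Build `if (!![]) realWork(); else garbage();` by hand.
const not = (argument) => ({ type: 'UnaryExpression', operator: '!', prefix: true, argument });
const callStmt = (name) => ({
  type: 'ExpressionStatement',
  expression: { type: 'CallExpression', callee: { type: 'Identifier', name }, arguments: [] },
});
const opaque = {
  type: 'IfStatement',
  test: not(not({ type: 'ArrayExpression', elements: [] })),
  consequent: callStmt('realWork'),
  alternate: callStmt('garbage'),
};

const pruned = pruneIf(opaque);
console.log(pruned.expression.callee.name); // realWork
```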
From regex to ASTs
Text-based replacement gets you surprisingly far on string arrays, but it is fragile: it cannot respect scope, track variable values, or safely reorder code. Production deobfuscation works on the Abstract Syntax Tree instead. The workflow with a toolchain like Babel is: parse source into an AST, traverse and transform nodes (fold constants, inline the decoder, evaluate opaque predicates, rebuild control flow), then regenerate clean source. Babel's path.evaluate() even tells you when a node is statically constant.
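import * as parser from '@babel/parser';
import traverse from '@babel/traverse'; // under native Node ESM the function may be exposed as traverse.default
import generate from '@babel/generator'; // same caveat as traverse
import * as t from '@babel/types';

const ast = parser.parse(source);
traverse(ast, {
  CallExpression(path) {
    // evaluate() reports confident: true only when Babel can compute the value statically
    const { confident, value } = path.evaluate();
    // valueToNode builds the literal node directly instead of re-parsing a source string
    if (confident) path.replaceWith(t.valueToNode(value));
  },
});
const clean = generate(ast).code;
Because the AST understands scope and structure, a visitor can do things regex never could, such as "replace every call to this function with its constant return value, but only within the scope where it is bound."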
The hardest layer — bytecode VMs
The strongest obfuscators do not just hide the code; they replace it with a custom virtual machine. The original logic is compiled to a private bytecode (often shipped as a base64 blob), and the shipped script is an interpreter that executes that bytecode one instruction at a time, reading and writing virtual registers as in the disassembly below. None of the original logic is left as JavaScript to pretty-print.
; decoded bytecode, disassembled
LOAD_STRING r3, "update"
PROPACCESS r4 = r2[r3]
LOAD_STRING r5, "secret"
FUNC_CALL r6 = r4.call(r2, [r5])
JUMP_COND_NEG if(!r6) goto @1487
Reversing a VM is a different discipline: (1) recover the bytecode blob the interpreter consumes; (2) map the opcodes by reading the interpreter's dispatch loop (which byte means LOAD_STRING, FUNC_CALL, JUMP…) and how operands are laid out; (3) write a two-pass disassembler, where pass one collects jump targets and pass two emits labelled, human-readable instructions; (4) optionally lift the disassembly back into equivalent source. It is labour-intensive, but tractable: the interpreter is the spec, and it is right there in the file.
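Step (3) can be sketched for an invented instruction set. The opcode numbers, mnemonics, and one-byte operand layout in the OPCODES table are assumptions made up for this example; for a real target they come from step (2), reading the interpreter's dispatch loop.

```javascript
// Hypothetical opcode table: byte value -> mnemonic, operand count, and
// which operand (if any) holds an absolute jump offset.
const OPCODES = {
  0x01: { name: 'LOAD_CONST', operands: 2 },                    // reg, constant index
  0x02: { name: 'ADD', operands: 3 },                           // dst, lhs, rhs
  0x03: { name: 'JUMP_IF_FALSE', operands: 2, jumpOperand: 1 }, // reg, target offset
  0x04: { name: 'RETURN', operands: 1 },                        // reg
};

// Split the byte stream into instructions, recording each one's offset.
function decodeAll(bytes) {
  const instructions = [];
  for (let pc = 0; pc < bytes.length; ) {
    const spec = OPCODES[bytes[pc]];
    if (!spec) throw new Error(`unknown opcode 0x${bytes[pc].toString(16)} at offset ${pc}`);
    if (pc + 1 + spec.operands > bytes.length) throw new Error(`truncated operands at offset ${pc}`);
    const args = Array.from(bytes.slice(pc + 1, pc + 1 + spec.operands));
    instructions.push({ pc, spec, args });
    pc += 1 + spec.operands;
  }
  return instructions;
}

function disassemble(bytes) {
  const instructions = decodeAll(bytes);

  // Pass one: collect every offset that some jump refers to.
  const targets = new Set();
  for (const { spec, args } of instructions) {
    if (spec.jumpOperand !== undefined) targets.add(args[spec.jumpOperand]);
  }

  // Pass two: emit text, placing a label before each jump target.
  const lines = [];
  for (const { pc, spec, args } of instructions) {
    if (targets.has(pc)) lines.push(`@${pc}:`);
    const shown = args.map((a, i) => (i === spec.jumpOperand ? `@${a}` : String(a)));
    lines.push(`  ${spec.name} ${shown.join(', ')}`);
  }
  return lines;
}

const program = [0x01, 0, 5, 0x03, 0, 10, 0x02, 1, 0, 0, 0x04, 1];
const listing = disassemble(program);
console.log(listing.join('\n'));
```

Two passes are needed because a forward jump names an offset that has not been printed yet; collecting targets first lets the second pass label them as it goes.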
A practical order of operations
When you sit down with an obfuscated file, peel layers outermost-first. Each pass makes the next easier, because every layer you remove exposes more constants for the following pass to fold:
- Beautify: run it through a formatter so you can see structure.
- Decode strings: extract the array, run the rotation, fold every decoder call and its aliases.
- Simplify literals: bracket→dot, hex→decimal, concatenations.
- Restore control flow: un-flatten switch/state loops via an AST.
- Prune: evaluate opaque predicates, delete dead branches and obfuscator scaffolding.
- Rename: give _0x3b01-style identifiers meaningful names based on what they now obviously do.
- If a VM remains: recover bytecode, map opcodes, disassemble.
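The point that each pass exposes work for the next can be seen in a toy pass runner. All three passes here are simplified stand-ins written for this example (the decoder is called dec and its table has two entries), and the runner simply repeats the pass list until the text stops changing.

```javascript
// Toy passes, invented for illustration; real ones would be AST-based.
const table = { 1: 'up', 2: 'date' };
const decodeStrings = (src) =>
  src.replace(/\bdec\((\d+)\)/g, (_, n) => JSON.stringify(table[n]));
const foldConcats = (src) =>
  src.replace(/"([^"\\]*)"\s*\+\s*"([^"\\]*)"/g, '"$1$2"');
const bracketToDot = (src) =>
  src.replace(/(?<=[\w$)\]])\["([A-Za-z_$][\w$]*)"\]/g, '.$1');

// Apply the passes in order, and repeat until a full round changes nothing.
function runPasses(source, passes, maxRounds = 10) {
  let current = source;
  for (let round = 0; round < maxRounds; round++) {
    const next = passes.reduce((src, pass) => pass(src), current);
    if (next === current) break;
    current = next;
  }
  return current;
}

const cleaned = runPasses('obj[dec(1) + dec(2)]()', [decodeStrings, foldConcats, bracketToDot]);
console.log(cleaned); // obj.update()
```

Neither the concatenation fold nor the bracket-to-dot rewrite can do anything until the decoder calls have been replaced, which is why the order in the list matters.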
Tooling cheat sheet
- Beautifiers: Prettier, js-beautify. Always step one.
- AST toolkits: Babel (@babel/parser, @babel/traverse, @babel/generator), acorn, esprima, recast (preserves formatting).
- Purpose-built: webcrack, synchrony, and the REStringer family handle common obfuscator output (notably obfuscator.io) out of the box.
- Analysis: AST Explorer (astexplorer.net) for prototyping visitors; a debugger for stepping a VM interpreter live.
