Why code gets obfuscated
People obfuscate code to protect intellectual property, hide license checks, slow down tampering, and make anti-analysis scripts harder to study. The trade-off is always the same: obfuscated code is bigger and slower, and because it still has to run, every secret it contains is recoverable. This page focuses on JavaScript — where most client-side obfuscation lives — but the same ideas apply to any language.
The golden rule: semantics are preserved
The single most important idea in deobfuscation is that an obfuscator may only apply semantics-preserving transformations — changes that keep the behaviour identical. If f(2) returned "WordArray" in the original, it still returns "WordArray" after obfuscation.
That means you are never guessing. You can always recover the original behaviour by running the obfuscated pieces, because they are guaranteed to produce the same values. Most deobfuscation is just "run the parts that are constant, and simplify."
Layer 1 — String array encoding
The most common layer. Every string and name in the code is pulled out into one big array, and each place that used it is replaced by a call to a decoder function that fetches it back by index.
// After obfuscation
const _0x2f9f = ["u7d1", "WordArray", "update", "secret", /* ...hundreds more */];
function _0x3b01(i) { return _0x2f9f[i - 336]; }
const key = _0x3b01(337);
_0x3b01(429)(_0x3b01(412));
Often the array is rotated when the script loads: a self-running function shifts the elements around until a checksum matches. This means reading the array in source order shows the wrong values:
(function (arr, target) {
  while (true) {
    // decode stands for the decoder function (_0x3b01 in the example above)
    const sum = parseInt(decode(347)) / 1 + parseInt(decode(541)) / 2 /* ... */;
    if (sum === target) break; // correct rotation found
    arr.push(arr.shift()); // rotate by one and retry
  }
})(_0x2f9f, 458958);
How to reverse it: grab the array and the decoder function as text, run the rotation loop yourself (it always produces the same result), then replace every decoder call with the value it returns. This is constant folding: a call such as _0x3b01(429) always returns the same string for a given number, so evaluate it and substitute the answer:
output = source.replace(/_0x3b01\((\d+)\)/g, (_, n) => JSON.stringify(decode(+n)));
After this pass _0x3b01(429)(_0x3b01(412)) becomes "update"("secret"), which is already far more readable.
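As a rough end-to-end sketch of this layer, the snippet below rebuilds a miniature string array, replays the rotation loop, and folds the decoder calls in a one-line sample. The five-element array, the checksum target 27, the `hasher` variable, and the helper names `unrotate` and `foldDecoderCalls` are invented for illustration; only the offset 336 and the `_0x3b01` name come from the example above.

```javascript
// Hypothetical miniature of the layer: a rotated string array plus its decoder.
const strings = ["WordArray", "secret", "12", "update", "30"]; // order as shipped
const OFFSET = 336; // same offset as the _0x3b01 example
const decode = (i) => strings[i - OFFSET];

// Replay the obfuscator's own loop: rotate left until the checksum matches.
// The turn limit is an added safeguard so a wrong target cannot spin forever.
function unrotate(arr, target) {
  for (let turns = 0; turns < arr.length; turns++) {
    const sum = parseInt(decode(336)) / 1 + parseInt(decode(338)) / 2;
    if (sum === target) return turns; // correct rotation found
    arr.push(arr.shift()); // rotate by one and retry
  }
  throw new Error("no rotation satisfies the checksum");
}

// Constant folding: swap each decoder call for the literal it returns.
function foldDecoderCalls(source) {
  return source.replace(/_0x3b01\((\d+)\)/g, (_, n) => JSON.stringify(decode(+n)));
}

const turns = unrotate(strings, 27); // 12 / 1 + 30 / 2 === 27
const folded = foldDecoderCalls("hasher[_0x3b01(337)](_0x3b01(340));");
console.log(turns, folded);
```

The order matters: folding before the rotation has been replayed would substitute the wrong strings, because the decoder reads whatever order the array is currently in.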
Layer 2 — Decoder aliasing
Obfuscators often copy the decoder into a local variable, so a naive find-and-replace looking for the original name misses it:
function handler() {
  const d = _0x3b01; // alias
  return d(512) + d(530); // won't match a /_0x3b01\(\d+\)/ regex
}
Reverse it by scanning for assignments like x = _0x3b01, collecting those alias names, then resolving x(NNN) calls within their scope using the same decoder. Limit this to short, single-purpose names to avoid false matches.
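A minimal regex-level sketch of that alias pass follows. It deliberately ignores scope, so it is only safe when alias names are not reused for anything else in the file; a scope-aware version needs an AST. The `table` lookup stands in for the real decoder, and `collectAliases` and `foldCalls` are hypothetical helper names.

```javascript
// Stand-in decoder for the sketch; a real one would index the restored string array.
const table = { 512: "up", 530: "date" };
const decode = (n) => table[n];

const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Collect names bound by `const|let|var NAME = <decoder>;`.
function collectAliases(source, decoderName) {
  const re = new RegExp(
    "\\b(?:const|let|var)\\s+([A-Za-z_$][\\w$]*)\\s*=\\s*" + escapeRegExp(decoderName) + "\\s*;",
    "g"
  );
  return [...source.matchAll(re)].map((m) => m[1]);
}

// Fold NAME(123) for the decoder and each alias. The lookbehind skips longer
// identifiers and member calls such as obj.d(1).
function foldCalls(source, names, decodeFn) {
  return names.reduce((src, name) => {
    const call = new RegExp("(?<![\\w$.])" + escapeRegExp(name) + "\\((\\d+)\\)", "g");
    return src.replace(call, (_, n) => JSON.stringify(decodeFn(Number(n))));
  }, source);
}

const sample = "function handler() {\n  const d = _0x3b01;\n  return d(512) + d(530);\n}";
const aliases = collectAliases(sample, "_0x3b01");
const resolved = foldCalls(sample, ["_0x3b01", ...aliases], decode);
console.log(aliases, resolved);
```

The alias declaration itself is left in place here; it becomes dead code that a later pruning pass can remove.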
Layer 3 — Member-access and literal disguising
With the strings restored, you can undo the cosmetic disguises:
- Bracket → dot notation: obj["update"] becomes obj.update (skip reserved words and keys that aren't valid names).
- Numeric obfuscation: 0x1a4, 1e3, 0b1010, and arithmetic like 0x1 << 0x4 are all constants; work them out to 420, 1000, 10, 16.
- String concatenation: "up" + "da" + "te" folds to "update".
output = output
  .replace(/\["([a-zA-Z_$][\w$]*)"\]/g, '.$1') // bracket -> dot
  .replace(/\b0x([0-9a-fA-F]+)\b/g, (_, h) => String(parseInt(h, 16))); // hex -> decimal; \b leaves names like _0x3b01 alone
Layer 4 — Control-flow flattening
This is the layer that most resists regex. Normal top-to-bottom code is rewritten into a while loop driven by a state variable and a switch, so the order things run no longer matches the order they appear in the file:
let state = 0;
while (true) {
  switch (state) {
    case 0: a = init(); state = 2; continue;
    case 1: return a + b; // exit
    case 2: b = step(a); state = 1; continue;
  }
}
The real flow is 0 → 2 → 1, but it is written 0, 1, 2. To undo it you build a small control-flow graph (a map of which block leads to which): each case is a block of code, and the state = N assignments are the arrows between them. Re-thread the blocks into execution order and the original linear code falls out. This is where an AST (Abstract Syntax Tree, a structured tree of the code) becomes essential, because plain text replacement cannot track state cleanly.
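The re-threading step can be sketched without a parser if the blocks are already known. In the snippet below the `blocks` map is transcribed by hand from the switch above (an AST pass would extract it automatically), and `unflatten` is a hypothetical helper that handles only straight-line chains, not branches or genuine loops.

```javascript
// Blocks transcribed by hand from the switch above: each case body plus the
// state it assigns next (null marks the exit).
const blocks = new Map([
  [0, { body: "a = init();", next: 2 }],
  [1, { body: "return a + b;", next: null }],
  [2, { body: "b = step(a);", next: 1 }],
]);

// Follow the state transitions from the initial state and emit the bodies in
// execution order.
function unflatten(blocks, start) {
  const order = [];
  const lines = [];
  for (let state = start; state !== null; state = blocks.get(state).next) {
    if (order.includes(state)) throw new Error(`state ${state} repeats: not a linear chain`);
    if (!blocks.has(state)) throw new Error(`no case for state ${state}`);
    order.push(state);
    lines.push(blocks.get(state).body);
  }
  return { order, code: lines.join("\n") };
}

const { order, code } = unflatten(blocks, 0);
console.log(order.join(" -> "));
console.log(code);
```

Conditional transitions (a state assignment that depends on a test) turn the chain into a real graph, at which point the blocks have to be reassembled into if/else and loop constructs rather than a flat list.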
Layer 5 — Dead code and opaque predicates
Obfuscators add branches that look conditional but always go the same way ("opaque predicates"), plus unreachable junk to bulk up the file:
if ((function () { return !![]; })()) { realWork(); } else { garbage(); }
!![] is always true, so the else branch can never run. Once you work out that the condition is constant, you can delete the dead branch entirely. Removing decoder definitions, rotation functions, and unused helpers shrinks the file dramatically on this final pass.
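One cautious way to automate this, sketched under narrow assumptions: only predicates written purely with `!`, `[`, `]` and whitespace are evaluated, and only the exact if/else shape shown above with brace-free branch bodies is matched. The names `evaluatePredicate` and `pruneOpaqueIfs` are hypothetical, and anything the sketch cannot prove constant is left untouched.

```javascript
// Only evaluate predicates built purely from "!", "[", "]" and whitespace.
// Without parentheses or "+", such an expression cannot call a function.
const CONSTANT_ALPHABET = /^[\s!\[\]]+$/;

function evaluatePredicate(expr) {
  if (!CONSTANT_ALPHABET.test(expr)) return { confident: false };
  try {
    return { confident: true, value: Boolean(new Function(`return (${expr});`)()) };
  } catch {
    return { confident: false }; // not a valid expression after all
  }
}

// Matches: if ((function () { return EXPR; })()) { A } else { B }
// with brace-free branch bodies, which is all this sketch supports.
const OPAQUE_IF =
  /if\s*\(\(function\s*\(\)\s*\{\s*return\s+([^;]+);\s*\}\)\(\)\)\s*\{([^{}]*)\}\s*else\s*\{([^{}]*)\}/g;

function pruneOpaqueIfs(source) {
  return source.replace(OPAQUE_IF, (whole, expr, thenBody, elseBody) => {
    const result = evaluatePredicate(expr.trim());
    if (!result.confident) return whole; // leave anything we cannot prove
    return (result.value ? thenBody : elseBody).trim();
  });
}

const pruned = pruneOpaqueIfs(
  "if ((function () { return !![]; })()) { realWork(); } else { garbage(); }"
);
const untouched = pruneOpaqueIfs(
  "if ((function () { return check(); })()) { a(); } else { b(); }"
);
console.log(pruned);
console.log(untouched);
```

The whitelist is the important design choice: evaluating arbitrary predicate text from an untrusted script would run that script's code, so the sketch refuses anything outside a tiny constant-only alphabet.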
From regex to ASTs
Text-based replacement gets you surprisingly far on string arrays, but it is fragile: it cannot respect scope (which variable means what, and where), track variable values, or safely reorder code. Serious deobfuscation works on the Abstract Syntax Tree (AST) instead — a structured tree representing the code's grammar. The workflow with a toolchain like Babel is: parse the source into an AST, traverse and transform its nodes (fold constants, inline the decoder, evaluate fake conditions, rebuild control flow), then regenerate clean source. Babel's path.evaluate() even tells you when a piece of code is a fixed constant.
import * as parser from '@babel/parser';
import traverse from '@babel/traverse';
import generate from '@babel/generator';
import * as t from '@babel/types';
const ast = parser.parse(source);
traverse(ast, {
  CallExpression(path) {
    // confident is true only when Babel could compute the value statically
    const { confident, value } = path.evaluate();
    // valueToNode builds a literal node directly, with no string round-trip
    if (confident) path.replaceWith(t.valueToNode(value));
  },
});
const clean = generate(ast).code;
Because the AST understands scope and structure, a visitor (a function that runs on each matching node) can do things regex never could, like "replace every call to this function with its constant return value, but only within the scope where it is bound."
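To show why a tree helps with constant folding in particular, here is a dependency-free sketch that folds hand-built, ESTree-shaped nodes. It is not Babel's implementation; the `lit`, `id`, `bin`, and `fold` helpers are hypothetical, and only four binary operators are covered.

```javascript
// Tiny ESTree-shaped nodes, written by hand so the sketch needs no parser.
const lit = (value) => ({ type: "Literal", value });
const id = (name) => ({ type: "Identifier", name });
const bin = (operator, left, right) => ({ type: "BinaryExpression", operator, left, right });

const BINARY = {
  "+": (a, b) => a + b,
  "-": (a, b) => a - b,
  "*": (a, b) => a * b,
  "<<": (a, b) => a << b,
};

// Post-order fold: simplify the children first, then collapse this node if
// both operands turned out to be literals.
function fold(node) {
  if (node.type !== "BinaryExpression") return node;
  const left = fold(node.left);
  const right = fold(node.right);
  const apply = BINARY[node.operator];
  if (apply && left.type === "Literal" && right.type === "Literal") {
    return lit(apply(left.value, right.value));
  }
  return { ...node, left, right };
}

// 2 + 0x1 << 0x4 parses as (2 + 1) << 4, and the tree keeps that grouping.
const allConstant = fold(bin("<<", bin("+", lit(2), lit(0x1)), lit(0x4)));
// key + ("up" + "date") folds only the constant half.
const partial = fold(bin("+", id("key"), bin("+", lit("up"), lit("date"))));
console.log(allConstant, partial);
```

A text-level rewrite that folded `0x1 << 0x4` to 16 inside `2 + 0x1 << 0x4` would change the result, because addition binds tighter than the shift; the tree makes that grouping explicit, so the fold cannot get it wrong.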
The hardest layer — bytecode VMs
The strongest obfuscators do not just hide the code — they replace it with a custom virtual machine (a mini interpreter built into the script). The original logic is compiled down to a private bytecode (a stream of low-level instructions, often shipped as a base64 blob), and what you actually see is an interpreter that walks that bytecode step by step. There is no JavaScript left to tidy up.
; decoded bytecode, disassembled
LOAD_STRING r3, "update"
PROPACCESS r4 = r2[r3]
LOAD_STRING r5, "secret"
FUNC_CALL r6 = r4.call(r2, [r5])
JUMP_COND_NEG if(!r6) goto @1487
Reversing a VM is a different discipline: (1) recover the bytecode blob the interpreter reads; (2) figure out what each opcode (instruction byte) means by reading the interpreter's dispatch loop (which byte is LOAD_STRING, FUNC_CALL, JUMP… and how the arguments are arranged); (3) write a two-pass disassembler, where pass one finds all the jump targets and pass two prints labelled, human-readable instructions; (4) optionally translate the disassembly back into equivalent source. It is labour-intensive, but doable: the interpreter is the spec, and it is sitting right there in the file.
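Step (3) can be sketched for a toy instruction set. Everything specific here is an assumption made for illustration: the opcode byte values, the one-byte operands, the three-instruction table, and the sample program do not correspond to any real obfuscator.

```javascript
// Hypothetical opcode table, as if recovered from an interpreter's dispatch
// loop. The byte values and one-byte operands are assumptions for this sketch.
const OPCODES = {
  0x01: { name: "LOAD_STRING", operands: ["reg", "str"] },
  0x02: { name: "JUMP_COND_NEG", operands: ["reg", "addr"] },
  0x03: { name: "RETURN", operands: ["reg"] },
};

// Linear sweep: split the byte stream into instructions.
function decodeInstructions(code) {
  const instructions = [];
  for (let pc = 0; pc < code.length; ) {
    const spec = OPCODES[code[pc]];
    if (!spec) throw new Error(`unknown opcode ${code[pc]} at ${pc}`);
    const args = code.slice(pc + 1, pc + 1 + spec.operands.length);
    instructions.push({ addr: pc, spec, args });
    pc += 1 + spec.operands.length;
  }
  return instructions;
}

function disassemble(code, strings) {
  const instructions = decodeInstructions(code);
  // Pass one: collect every address used as a jump target.
  const targets = new Set();
  for (const { spec, args } of instructions) {
    spec.operands.forEach((kind, i) => kind === "addr" && targets.add(args[i]));
  }
  // Pass two: print instructions, emitting a label before each target.
  const lines = [];
  for (const { addr, spec, args } of instructions) {
    if (targets.has(addr)) lines.push(`L${addr}:`);
    const shown = spec.operands.map((kind, i) =>
      kind === "reg" ? `r${args[i]}` : kind === "str" ? JSON.stringify(strings[args[i]]) : `L${args[i]}`
    );
    lines.push(`  ${spec.name} ${shown.join(", ")}`);
  }
  return lines;
}

const program = [0x01, 3, 0, 0x02, 3, 9, 0x01, 5, 1, 0x03, 5];
const listing = disassemble(program, ["update", "secret"]);
console.log(listing.join("\n"));
```

Two passes are needed because a label has to be printed before its instruction, yet the jump that refers to it may appear later in the stream (a backward jump), so all targets must be known before any line is emitted.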
A practical order of operations
When you sit down with an obfuscated file, peel the layers from the outside in. Each pass makes the next one easier, because every layer you remove exposes more constants for the following pass to simplify:
- Beautify — run it through a formatter so you can see the structure.
- Decode strings — extract the array, run the rotation, fold every decoder call and its aliases.
- Simplify literals — bracket→dot, hex→decimal, concatenations.
- Restore control flow — un-flatten the switch/state loops via an AST.
- Prune — evaluate the fake conditions, delete dead branches and obfuscator scaffolding.
- Rename — give _0x3b01-style names meaningful ones based on what they now obviously do.
- If a VM remains — recover the bytecode, map the opcodes, disassemble.
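The point that each pass exposes work for the next can be sketched as a small driver that reruns a list of passes until the source stops changing. The three toy passes, the `d` decoder name, and its two-entry table are invented for the demo, and the passes are listed in a deliberately unhelpful order to show the loop converging anyway.

```javascript
// Rerun every pass until a full round changes nothing (or the round limit hits).
function runPasses(source, passes, maxRounds = 10) {
  let current = source;
  for (let round = 0; round < maxRounds; round++) {
    const before = current;
    for (const pass of passes) current = pass(current);
    if (current === before) break; // fixed point reached
  }
  return current;
}

// Toy passes; real ones would be the AST transforms described earlier.
const table = { 1: "up", 2: "date" }; // hypothetical decoder contents
const decodeStrings = (s) => s.replace(/\bd\((\d+)\)/g, (_, n) => JSON.stringify(table[n]));
const foldConcats = (s) => s.replace(/"(\w*)"\s*\+\s*"(\w*)"/g, '"$1$2"');
const bracketToDot = (s) => s.replace(/\["([a-zA-Z_$][\w$]*)"\]/g, ".$1");

const cleaned = runPasses("obj[d(1) + d(2)]()", [bracketToDot, foldConcats, decodeStrings]);
console.log(cleaned);
```

Running the passes in the order listed above reaches the same result in fewer rounds; the fixed-point loop simply makes the pipeline tolerant of ordering mistakes.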
Tooling cheat sheet
- Beautifiers: Prettier, js-beautify (always step one).
- AST toolkits: Babel (@babel/parser / traverse / generator), acorn, esprima, recast (preserves formatting).
- Purpose-built: webcrack, synchrony, and the REStringer family handle common obfuscator output (notably obfuscator.io) out of the box.
- Analysis: AST Explorer (astexplorer.net) for prototyping visitors; a debugger for stepping through a VM interpreter live.
