How Deobfuscation Works: Reversing Obfuscated Code

Deobfuscation is the process of turning deliberately unreadable code back into something a human can read and reason about. Obfuscators do not change what a program does, only how it looks, so each transformation they apply can be undone or simplified away. The exception is information that was discarded outright, such as original identifier names and comments, which has to be reconstructed by hand. Deobfuscation is the disciplined application of those inverse transformations until the original structure re-emerges. Because the code still has to run, every secret it contains is, in principle, recoverable: the engine must undo the obfuscation to execute it, and so can you.

Quick facts

Goal: Recover readable, original-equivalent source from scrambled code
Key principle: Obfuscation is semantics-preserving, therefore reversible
Common layers: String arrays, control-flow flattening, dead code, bytecode VMs
Primary technique: Constant folding + AST rewriting
Core tools: Babel/acorn ASTs, beautifiers, webcrack, custom disassemblers

Why code gets obfuscated

Obfuscation is used to protect intellectual property, hide license checks, slow down tampering, and make anti-analysis scripts harder to study. The trade-off is always the same: obfuscated code is bigger and slower, and because it still has to run, every secret it contains is recoverable. This page is framed around JavaScript — where most client-side obfuscation lives — but the same ideas apply to any language.

The golden rule: semantics are preserved

The single most important idea in deobfuscation is that an obfuscator may only apply semantics-preserving transformations. If f(2) returned "WordArray" in the original, it still returns "WordArray" after obfuscation.

That means you are never guessing. You can always recover the original behaviour by evaluating the obfuscated constructs, because they are guaranteed to produce the same values. Most deobfuscation is just "evaluate the parts that are constant, and simplify."

Layer 1 — String array encoding

The most common layer. All string and identifier literals are pulled out into one big array, and every use is replaced by a call to a decoder function.

// After obfuscation
const _0x2f9f = ["u7d1", "WordArray", "update", "secret", /* ...hundreds more */];
function _0x3b01(i) { return _0x2f9f[i - 336]; }
const key = _0x3b01(337);
hasher[_0x3b01(429)](_0x3b01(412));   // hasher is a placeholder object

Often the array is rotated at load time by a self-invoking function that shifts elements until a checksum matches — defeating anyone who reads the array in source order:

(function (arr, target) {
  while (true) {
    const sum = parseInt(_0x3b01(347)) / 1 + parseInt(_0x3b01(541)) / 2 /* ... */;
    if (sum === target) break;     // correct rotation found
    arr.push(arr.shift());         // rotate by one and retry
  }
})(_0x2f9f, 458958);

How to reverse it: extract the array literal and decoder function as text, run the rotation loop yourself (it is deterministic), then replace every decoder call with its result. This is constant folding — _0x3b01(429) is a pure function of a literal, so evaluate and substitute:

output = source.replace(/_0x3b01\((\d+)\)/g, (_, n) => JSON.stringify(decode(+n)));

After this pass, a call such as hasher[_0x3b01(429)](_0x3b01(412)), where hasher is a placeholder object, becomes hasher["update"]("secret"), which is already far more readable.

Layer 2 — Decoder aliasing

Obfuscators frequently re-bind the decoder to a local variable so a naive find-and-replace misses it:

function handler() {
  const d = _0x3b01;        // alias
  return d(512) + d(530);   // won't match a /_0x3b01\(\d+\)/ regex
}

Reverse it by scanning for assignment patterns like x = _0x3b01, collecting the alias names, then resolving x(NNN) calls within their scope using the same decoder. Restrict to short, single-purpose identifiers to avoid false positives.
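
The alias pass can be sketched with plain text matching. This is a minimal illustration, not a scope-aware implementation: the sample array, the helper names foldAliases and escapeRe, and the indexes in the usage note are all assumptions made for the example, and a name that is reused for something else in another scope would be folded incorrectly.

```javascript
// Sample data standing in for the extracted string array and decoder.
const OFFSET = 336;
const stringArray = ["u7d1", "WordArray", "update", "secret"];
const decode = (i) => stringArray[i - OFFSET];

// Escape a name so it can be embedded in a regular expression.
const escapeRe = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

function foldAliases(source, decoderName = "_0x3b01") {
  // 1. Collect every name bound directly to the decoder: `const d = _0x3b01;`
  const target = escapeRe(decoderName);
  const aliasPattern = new RegExp(
    String.raw`\b(?:const|let|var)\s+([A-Za-z_$][\w$]*)\s*=\s*${target}\s*;`, "g");
  const names = [decoderName];
  for (const match of source.matchAll(aliasPattern)) names.push(match[1]);

  // 2. Fold calls made through the decoder or any alias.
  //    The lookbehind stops `d(` from matching inside longer names or after a dot.
  const callPattern = new RegExp(
    String.raw`(?<![\w$.])(?:${names.map(escapeRe).join("|")})\((\d+)\)`, "g");
  return source.replace(callPattern, (whole, n) => {
    const value = decode(Number(n));
    // Leave the call untouched when the index is outside the array.
    return value === undefined ? whole : JSON.stringify(value);
  });
}
```

Calling foldAliases on the handler above, with indexes that exist in the sample array, rewrites d(337) + d(338) to "WordArray" + "update". Limiting scope properly is one of the reasons later passes move to an AST.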

Layer 3 — Member-access and literal disguising

With strings restored, you can undo the cosmetic disguises:

  • Bracket → dot notation: obj["update"] becomes obj.update (skip reserved words and non-identifier keys).
  • Numeric obfuscation: 0x1a4, 1e3, 0b1010, and arithmetic like 0x1 << 0x4 are constant expressions — evaluate them to 420, 1000, 10, 16.
  • String concatenation: "up" + "da" + "te" folds to "update".
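
output = output
  // bracket -> dot, only after an expression so array literals like ["update"] are left alone
  .replace(/(?<=[\w$\])])\["([a-zA-Z_$][\w$]*)"\]/g, '.$1')
  // hex -> decimal; the lookbehind keeps names such as _0x3b01 intact
  .replace(/(?<![\w$])0x([0-9a-fA-F]+)\b/g, (_, h) => parseInt(h, 16));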
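
The other two disguises, non-decimal number literals and split strings, can be folded in a similar text-level way. The sketch below is one possible approach under stated assumptions: tokenize, foldNumericLiterals and foldStringConcats are hypothetical helper names, the tokenizer only knows about quoted strings (comments, regex literals and template literals would confuse it), and concatenations are assumed not to be the right-hand operand of another arithmetic operator. Arithmetic such as 0x1 << 0x4 is deliberately left alone because folding it correctly needs operator precedence.

```javascript
// Split source into string-literal tokens and the code between them, so the
// folders below never rewrite text that sits inside a string.
function tokenize(source) {
  const literal = /"(?:[^"\\\n]|\\.)*"|'(?:[^'\\\n]|\\.)*'/g;
  const tokens = [];
  let last = 0;
  for (const m of source.matchAll(literal)) {
    if (m.index > last) tokens.push({ kind: "code", text: source.slice(last, m.index) });
    tokens.push({ kind: "string", text: m[0] });
    last = m.index + m[0].length;
  }
  if (last < source.length) tokens.push({ kind: "code", text: source.slice(last) });
  return tokens;
}

// 0x1a4, 0b1010, 0o17 and 1e3 become plain decimal numbers.
function foldNumericLiterals(source) {
  const literal = /(?<![\w$.])(?:0x[0-9a-f]+|0b[01]+|0o[0-7]+|\d+e\d+)(?![\w$.])/gi;
  return tokenize(source)
    .map((tok) => (tok.kind === "code"
      ? tok.text.replace(literal, (lit) => String(Number(lit)))
      : tok.text))
    .join("");
}

// "up" + "da" + "te" becomes "update" (only literals with the same quote style).
function foldStringConcats(source) {
  const out = [];
  for (const tok of tokenize(source)) {
    const plus = out[out.length - 1];
    const left = out[out.length - 2];
    const joinable = tok.kind === "string" && left && left.kind === "string" &&
      plus.kind === "code" && /^\s*\+\s*$/.test(plus.text) &&
      left.text[0] === tok.text[0];
    if (joinable) {
      out.pop(); // drop the " + " between the two literals
      left.text = left.text.slice(0, -1) + tok.text.slice(1);
    } else {
      out.push(tok);
    }
  }
  return out.map((t) => t.text).join("");
}
```

Keeping string contents out of reach of the numeric pass is the main design choice here: a string such as "0x10" is data, not a number literal, and should survive unchanged.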

Layer 4 — Control-flow flattening

This is the layer that most resists regex. Straight-line code is rewritten into a while loop driven by a state variable and a switch, so the order of execution no longer matches the source order:

let state = 0;
while (true) {
  switch (state) {
    case 0: a = init();   state = 2; continue;
    case 1: return a + b;            // exit
    case 2: b = step(a);  state = 1; continue;
  }
}

The real flow is 0 → 2 → 1, but it is written 0, 1, 2. To undo it you build a small control-flow graph: each case is a basic block, and the state = N assignments are edges. Topologically re-thread the blocks in execution order and the original linear code falls out. This is where an AST becomes essential — text replacement cannot track state cleanly.
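
Once each case has been reduced to a block plus its outgoing edge, re-threading is a walk over that table. The sketch below assumes the table has already been extracted (written by hand here from the example above) and only handles straight-line chains; the blocks layout and the unflatten name are illustrative assumptions. Conditional state updates and loops need real control-flow recovery.

```javascript
// One entry per switch case: the statements it runs and the state it sets next.
// `next: null` marks the exit block.
const blocks = new Map([
  [0, { body: "a = init();", next: 2 }],
  [1, { body: "return a + b;", next: null }],
  [2, { body: "b = step(a);", next: 1 }],
]);

function unflatten(blockTable, entryState) {
  const ordered = [];
  const seen = new Set();
  let state = entryState;
  while (state !== null) {
    // A repeated state means a loop, which this linear walk cannot express.
    if (seen.has(state)) throw new Error(`state ${state} revisited; needs full CFG recovery`);
    const block = blockTable.get(state);
    if (!block) throw new Error(`no block for state ${state}`);
    seen.add(state);
    ordered.push(block.body);
    state = block.next; // follow the edge created by `state = N`
  }
  return ordered.join("\n");
}
```

For the table above, unflatten(blocks, 0) emits the three statements in execution order: init, step, return.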

Layer 5 — Dead code and opaque predicates

Obfuscators inject branches that look conditional but always resolve the same way ("opaque predicates"), plus unreachable junk to pad the file:

if ((function () { return !![]; })()) { realWork(); } else { garbage(); }

!![] is always true, so the else branch is dead. Once you evaluate the constant predicate you can delete the dead branch entirely. Removing decoder definitions, rotation IIFEs, and unreferenced helpers shrinks the file dramatically on the final pass.
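
As a rough text-level sketch of that pruning step, the code below recognises a handful of constant idioms, unwraps an IIFE that just returns one, and keeps only the live branch. The idiom list and the names blockEnd and pruneOpaqueIfs are assumptions for illustration. Brace matching ignores braces inside strings, and unwrapping a block changes the scope of any let or const declared in it, so an AST-based pass is the safer choice for real files.

```javascript
// Constant boolean idioms this sketch knows how to evaluate.
const IDIOMS = { "!![]": true, "![]": false, "!0": true, "!1": false };
const IDIOM = String.raw`!!\[\]|!\[\]|!0|!1`;

// Index just past the block whose opening brace is at `open`.
function blockEnd(src, open) {
  let depth = 0;
  for (let i = open; i < src.length; i++) {
    if (src[i] === "{") depth++;
    else if (src[i] === "}" && --depth === 0) return i + 1;
  }
  throw new Error("unbalanced braces");
}

function pruneOpaqueIfs(src) {
  // (function () { return !![]; })()  ->  !![]
  const iife = new RegExp(
    String.raw`\(function\s*\(\)\s*\{\s*return\s+(${IDIOM})\s*;?\s*\}\)\(\)`, "g");
  src = src.replace(iife, "$1");

  const head = new RegExp(String.raw`\bif\s*\(\s*(${IDIOM})\s*\)\s*\{`);
  let match;
  while ((match = head.exec(src)) !== null) {
    const thenOpen = match.index + match[0].length - 1;
    const thenEnd = blockEnd(src, thenOpen);
    const thenBody = src.slice(thenOpen + 1, thenEnd - 1);

    // An optional else block directly after the then block.
    let end = thenEnd;
    let elseBody = "";
    const elseMatch = /^\s*else\s*\{/.exec(src.slice(thenEnd));
    if (elseMatch) {
      const elseOpen = thenEnd + elseMatch[0].length - 1;
      end = blockEnd(src, elseOpen);
      elseBody = src.slice(elseOpen + 1, end - 1);
    }

    // Keep whichever branch the constant predicate selects.
    const kept = IDIOMS[match[1]] ? thenBody : elseBody;
    src = src.slice(0, match.index) + kept.trim() + src.slice(end);
  }
  return src;
}
```

Applied to the one-line example above, this leaves just the realWork(); call.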

From regex to ASTs

Text-based replacement gets you surprisingly far on string arrays, but it is fragile: it cannot respect scope, track variable values, or safely reorder code. Production deobfuscation works on the Abstract Syntax Tree instead. The workflow with a toolchain like Babel is: parse source into an AST, traverse and transform nodes (fold constants, inline the decoder, evaluate opaque predicates, rebuild control flow), then regenerate clean source. Babel's path.evaluate() even tells you when a node is statically constant.

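import * as parser from '@babel/parser';
import traverse from '@babel/traverse';
import generate from '@babel/generator';

// `source` holds the obfuscated script text.
const ast = parser.parse(source);
traverse(ast, {
  CallExpression(path) {
    // evaluate() only folds what Babel can prove constant by itself (literals
    // and a few built-ins). Calls to a user-defined decoder typically need a
    // dedicated visitor that runs the extracted decoder.
    const { confident, value } = path.evaluate();
    // JSON.stringify(undefined) is not source text, so skip that case.
    if (confident && value !== undefined) {
      path.replaceWithSourceString(JSON.stringify(value));
    }
  },
});
const clean = generate(ast).code;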

Because the AST understands scope and structure, a visitor can do things regex never could — like "replace every call to this function with its constant return value, but only within the scope where it is bound."

The hardest layer — bytecode VMs

The strongest obfuscators do not just hide the code; they replace it with a custom virtual machine. The original logic is compiled to a private bytecode (often shipped as a base64 blob), and the script that remains is an interpreter that walks that bytecode instruction by instruction. There is no JavaScript left to pretty-print.

; decoded bytecode, disassembled
LOAD_STRING  r3, "update"
PROPACCESS   r4 = r2[r3]
LOAD_STRING  r5, "secret"
FUNC_CALL    r6 = r4.call(r2, [r5])
JUMP_COND_NEG if(!r6) goto @1487

Reversing a VM is a different discipline: (1) recover the bytecode blob the interpreter consumes; (2) map the opcodes by reading the interpreter's dispatch loop (which byte means LOAD_STRING, FUNC_CALL, JUMP…) and how operands are laid out; (3) write a two-pass disassembler — pass one collects jump targets, pass two emits labelled, human-readable instructions; (4) optionally lift the disassembly back into equivalent source. It is labour-intensive, but tractable: the interpreter is the spec, and it is right there in the file.
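
Step (3) can be sketched as follows. Everything about the instruction set here is invented for the example: the opcode numbers, the one-byte operands, the string pool and the RETURN instruction are assumptions, since every VM defines its own encoding. Only the two-pass shape is the point.

```javascript
// Hypothetical opcode table: byte value -> mnemonic and operand kinds.
// Every operand is assumed to be a single byte.
const OPCODES = {
  0x01: { name: "LOAD_STRING", operands: ["reg", "str"] },
  0x02: { name: "PROPACCESS", operands: ["reg", "reg", "reg"] },
  0x04: { name: "JUMP_COND_NEG", operands: ["reg", "addr"] },
  0x05: { name: "RETURN", operands: ["reg"] },
};

function decodeAt(code, pc) {
  const spec = OPCODES[code[pc]];
  if (!spec) throw new Error(`unknown opcode ${code[pc]} at offset ${pc}`);
  const args = spec.operands.map((kind, i) => ({ kind, value: code[pc + 1 + i] }));
  return { pc, spec, args, size: 1 + spec.operands.length };
}

function disassemble(code, strings) {
  // Pass 1: decode every instruction and record the offsets that are jumped to.
  const instructions = [];
  const targets = new Set();
  for (let pc = 0; pc < code.length; ) {
    const ins = decodeAt(code, pc);
    for (const arg of ins.args) if (arg.kind === "addr") targets.add(arg.value);
    instructions.push(ins);
    pc += ins.size;
  }

  // Pass 2: emit text, placing a label in front of each jump target.
  const lines = [];
  for (const ins of instructions) {
    if (targets.has(ins.pc)) lines.push(`@${ins.pc}:`);
    const text = ins.args.map((arg) =>
      arg.kind === "reg" ? `r${arg.value}`
        : arg.kind === "str" ? JSON.stringify(strings[arg.value])
        : `@${arg.value}`);
    lines.push(`  ${ins.spec.name} ${text.join(", ")}`);
  }
  return lines.join("\n");
}

// A made-up 15-byte program and its string pool.
const program = Uint8Array.from([
  0x01, 3, 0,      // LOAD_STRING r3, strings[0]
  0x02, 4, 2, 3,   // PROPACCESS r4, r2, r3
  0x04, 4, 13,     // JUMP_COND_NEG r4, offset 13
  0x01, 5, 1,      // LOAD_STRING r5, strings[1]
  0x05, 4,         // RETURN r4 (offset 13)
]);
const listing = disassemble(program, ["update", "secret"]);
```

The first pass exists because a label must be printed before its instruction, and forward jumps are only discovered after that instruction has already been passed. Collecting targets up front avoids patching the output afterwards.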

A practical order of operations

When you sit down with an obfuscated file, peel layers outermost-first. Each pass makes the next easier, because every layer you remove exposes more constants for the following pass to fold:

  1. Beautify — run it through a formatter so you can see structure.
  2. Decode strings — extract the array, run the rotation, fold every decoder call and its aliases.
  3. Simplify literals — bracket→dot, hex→decimal, concatenations.
  4. Restore control flow — un-flatten switch/state loops via an AST.
  5. Prune — evaluate opaque predicates, delete dead branches and obfuscator scaffolding.
  6. Rename — give _0x3b01-style identifiers meaningful names based on what they now obviously do.
  7. If a VM remains — recover bytecode, map opcodes, disassemble.
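
Because one pass exposes work for another, a simple way to drive them is to repeat the whole list until the source stops changing. The runner below is a minimal sketch: runPasses, the two toy passes, the sample array and the hasher object are assumptions for illustration, not part of any particular tool.

```javascript
// Sample data standing in for the extracted string array (offset 336).
const stringArray = ["u7d1", "WordArray", "update", "secret"];

// Toy pass: fold decoder calls whose index exists in the sample array.
const decodeStrings = (src) =>
  src.replace(/(?<![\w$])_0x3b01\((\d+)\)/g, (whole, n) => {
    const value = stringArray[Number(n) - 336];
    return value === undefined ? whole : JSON.stringify(value);
  });

// Toy pass: obj["name"] -> obj.name, only directly after an expression.
const bracketToDot = (src) =>
  src.replace(/(?<=[\w$\])])\["([A-Za-z_$][\w$]*)"\]/g, ".$1");

// Run every pass in order, then repeat until a full round changes nothing.
function runPasses(source, passes, maxRounds = 10) {
  for (let round = 0; round < maxRounds; round++) {
    const before = source;
    for (const pass of passes) source = pass(source);
    if (source === before) break; // fixed point reached
  }
  return source;
}

// bracketToDot is listed first on purpose: it finds nothing in round one and
// only becomes useful once decodeStrings has exposed the string keys.
const cleaned = runPasses("hasher[_0x3b01(338)](_0x3b01(339));", [bracketToDot, decodeStrings]);
```

Here cleaned ends up as hasher.update("secret"); after two productive rounds. The maxRounds cap is a guard against passes that keep undoing each other.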

Tooling cheat sheet

  • Beautifiers: Prettier, js-beautify — always step one.
  • AST toolkits: Babel (@babel/parser/traverse/generator), acorn, esprima, recast (preserves formatting).
  • Purpose-built: webcrack, synchrony, and the REStringer family handle common obfuscator output (notably obfuscator.io) out of the box.
  • Analysis: AST Explorer (astexplorer.net) for prototyping visitors; a debugger for stepping a VM interpreter live.

Code example

javascript
// Folding a rotated string-array decoder — the core of most deobfuscation.
// 1. Extract the literal array and the decoder's offset.
const stringArray = ["u7d1", "WordArray", "update", "secret" /* ... */];
const OFFSET = 336;
const decode = (i) => stringArray[i - OFFSET];

// 2. Replay the rotation loop until the checksum matches (deterministic).
function rotateUntilValid(target) {
  for (let i = 0; i < stringArray.length; i++) {
    const sum = parseInt(decode(347)) / 1 + parseInt(decode(541)) / 2; // sample
    if (sum === target) return;
    stringArray.push(stringArray.shift());
  }
}

// 3. Constant-fold every decoder call back into the source text.
function foldDecoderCalls(source) {
  return source.replace(/_0x3b01\((\d+)\)/g, (m, n) => {
    const value = decode(Number(n));
    return value === undefined ? m : JSON.stringify(value);
  });
}

Frequently asked questions

Is deobfuscation always possible?

In principle, yes. Obfuscation only applies semantics-preserving transformations, and the runtime must undo them to execute the code. Anything the engine can resolve, you can resolve too — it is a question of effort, not possibility.

Why use an AST instead of regular expressions?

Regex cannot track scope, evaluate expressions, or reorder code safely. An AST understands program structure, so it can inline a decoder only within the scope where it is bound, fold constants reliably, and rebuild flattened control flow. Text replacement cannot do any of these correctly.

What makes a bytecode VM harder than normal obfuscation?

There is no JavaScript left to pretty-print — the logic lives in a custom bytecode interpreted at runtime. You have to recover the bytecode, reverse-engineer the opcode set from the interpreter, and write a disassembler before you can read the logic at all.

What is the first step on a fresh obfuscated file?

Beautify it. A formatter alone reveals the overall structure — the string array, the decoder, the rotation IIFE, and whether a VM interpreter is present — which tells you which layers you are dealing with before you write any transforms.

Last updated: 2026-05-28