How Deobfuscation Works: Reversing Obfuscated Code

Q: Is deobfuscation always possible?

In principle, yes. Obfuscation only applies semantics-preserving transformations — changes that keep the behaviour identical — and the runtime must undo them to execute the code. Anything the engine can resolve, you can resolve too. It is a question of effort, not possibility.

Q: Why use an AST instead of regular expressions?

Regex cannot track scope, evaluate expressions, or reorder code safely. An AST understands the program's structure, so it can inline a decoder only within the scope where it is defined, work out constants reliably, and rebuild scrambled control flow — things plain text replacement cannot do correctly.

Q: What makes a bytecode VM harder than normal obfuscation?

There is no JavaScript left to tidy up — the logic lives in a custom bytecode that a built-in interpreter runs. You have to recover that bytecode, reverse-engineer what each instruction means from the interpreter, and write a disassembler before you can even read the logic.

Q: What is the first step on a fresh obfuscated file?

Beautify it. A formatter alone reveals the overall structure — the string array, the decoder, the rotation function, and whether a VM interpreter is present — which tells you which layers you are facing before you write any transforms.

By the Scrappey Research Team

Deobfuscation is the process of turning deliberately unreadable code back into something a human can read and reason about. Obfuscators scramble how code looks but never change what it does, so the effect of every scrambling step can be undone. Deobfuscation means applying those reversals, one by one, until the original structure reappears; only details that never affected behaviour, such as comments and the original identifier names, are gone for good. And because the code still has to run, every secret inside it can be recovered: the engine must unscramble it to execute it, so you can too.

Goal	Recover readable, original-equivalent source from scrambled code
Key principle	Obfuscation is semantics-preserving, therefore reversible
Common layers	String arrays, control-flow flattening, dead code, bytecode VMs
Primary technique	Constant folding + AST rewriting
Core tools	Babel/acorn ASTs, beautifiers, webcrack, custom disassemblers

Why code gets obfuscated

People obfuscate code to protect intellectual property, hide license checks, slow down tampering, and make anti-analysis scripts harder to study. The trade-off is always the same: obfuscated code is bigger and slower, and because it still has to run, every secret it contains is recoverable. This page focuses on JavaScript — where most client-side obfuscation lives — but the same ideas apply to any language.

The golden rule: semantics are preserved

The single most important idea in deobfuscation is that an obfuscator may only apply semantics-preserving transformations — changes that keep the behaviour identical. If f(2) returned "WordArray" in the original, it still returns "WordArray" after obfuscation.

That means you are never guessing. You can always recover the original behaviour by running the obfuscated pieces, because they are guaranteed to produce the same values. Most deobfuscation is just "run the parts that are constant, and simplify."

Layer 1 — String array encoding

The most common layer. Every string and name in the code is pulled out into one big array, and each place that used it is replaced by a call to a decoder function that fetches it back by index.

// After obfuscation
const _0x2f9f = ["u7d1", "WordArray", "update", "secret", /* ...hundreds more */];
function _0x3b01(i) { return _0x2f9f[i - 336]; }
const key = _0x3b01(337);
obj[_0x3b01(429)](_0x3b01(412));   // method name and argument both come from the array

Often the array is rotated when the script loads: a self-running function shifts the elements around until a checksum matches. This means reading the array in source order shows the wrong values:

(function (arr, target) {
  while (true) {
    const sum = parseInt(_0x3b01(347)) / 1 + parseInt(_0x3b01(541)) / 2 /* ... */;
    if (sum === target) break;     // correct rotation found
    arr.push(arr.shift());         // rotate by one and retry
  }
})(_0x2f9f, 458958);

How to reverse it: grab the array and the decoder function as text, run the rotation loop yourself (it always produces the same result), then replace every decoder call with the value it returns. This is constant folding — _0x3b01(429) always returns the same thing for a given number, so evaluate it and substitute the answer:

output = source.replace(/_0x3b01\((\d+)\)/g, (_, n) => JSON.stringify(decode(+n)));

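After this pass, a call such as obj[_0x3b01(429)](_0x3b01(412)) becomes obj["update"]("secret"), which is already far more readable. (A decoder call standing alone in call position, as in _0x3b01(429)(...), would fold to a string being called, so in real output the decoded name is almost always a property key.)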

Layer 2 — Decoder aliasing

Obfuscators often copy the decoder into a local variable, so a naive find-and-replace looking for the original name misses it:

function handler() {
  const d = _0x3b01;        // alias
  return d(512) + d(530);   // won't match a /_0x3b01\(\d+\)/ regex
}

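Reverse it by scanning for assignments like x = _0x3b01, collecting those alias names, then resolving x(NNN) calls within the alias's scope using the same decoder. A text-level scan cannot see scope, so only treat a name as an alias when it is assigned directly from the decoder and used for nothing else; otherwise an unrelated d(5) elsewhere in the file gets folded by mistake.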
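A minimal sketch of that alias pass, kept at the text level. The decode function and its two-entry table are hypothetical stand-ins for the real rotated array, and foldAliasedCalls is an illustrative name; the regexes do not understand scope, so this is a heuristic rather than a safe transform.

```javascript
// Hypothetical stand-in for the real decoder: a two-entry lookup table.
const table = { 512: "get", 530: "Item" };
const decode = (n) => table[n];

function foldAliasedCalls(source, decoderName = "_0x3b01") {
  // Pass 1: collect names assigned directly from the decoder (or from an
  // earlier alias), e.g. `const d = _0x3b01;`.
  const aliasRe = /\b(?:const|let|var)\s+([A-Za-z_$][\w$]*)\s*=\s*([A-Za-z_$][\w$]*)\s*;/g;
  const names = new Set([decoderName]);
  for (const [, alias, target] of source.matchAll(aliasRe)) {
    if (names.has(target)) names.add(alias);
  }

  // Pass 2: fold NAME(123) for the decoder and every alias found above.
  // The lookbehind keeps `add(512)` or `obj.d(512)` from matching alias `d`.
  const escaped = [...names].map((n) => n.replace(/\$/g, "\\$&"));
  const callRe = new RegExp(`(?<![\\w$.])(?:${escaped.join("|")})\\((\\d+)\\)`, "g");
  return source.replace(callRe, (match, n) => {
    const value = decode(Number(n));
    return value === undefined ? match : JSON.stringify(value);
  });
}

const src = "function handler() {\n  const d = _0x3b01;\n  return d(512) + d(530);\n}";
console.log(foldAliasedCalls(src));
// The return line becomes:  return "get" + "Item";
```

Unknown indices are left untouched instead of being replaced with "undefined", which makes a wrong offset or an unrotated array easy to spot in the output.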

Layer 3 — Member-access and literal disguising

With the strings restored, you can undo the cosmetic disguises:

Bracket → dot notation: obj["update"] becomes obj.update (skip reserved words and keys that aren't valid names).
Numeric obfuscation: 0x1a4, 1e3, 0b1010, and arithmetic like 0x1 << 0x4 are all constants — work them out to 420, 1000, 10, 16.
String concatenation: "up" + "da" + "te" folds to "update".

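output = output
  // bracket -> dot, but only after an identifier, ")" or "]" so array literals like ["a"] survive
  .replace(/(?<=[\w$)\]])\["([a-zA-Z_$][\w$]*)"\]/g, '.$1')
  // hex -> decimal; the \b anchors stop names such as _0x3b01 from being rewritten
  .replace(/\b0x([0-9a-fA-F]+)\b/g, (_, h) => String(parseInt(h, 16)));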
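The string-concatenation case from the list is not handled by the two replacements above. One way to sketch it at the text level is shown below; foldConcats is an illustrative name, and the code assumes the literals are double-quoted and JSON-compatible (which is what JSON.stringify produced in the earlier pass). Like any regex pass, it can be fooled by quotes nested inside other strings.

```javascript
// A double-quoted string literal, allowing backslash escapes.
const STR = '"(?:[^"\\\\]|\\\\.)*"';
// Two literals joined by "+". The lookahead skips cases such as "a" + "b".length,
// where the member access binds tighter than the "+".
const concatRe = new RegExp(`(${STR})\\s*\\+\\s*(${STR})(?!\\s*[.\\[(])`, "g");

function foldConcats(source) {
  let previous;
  do {
    previous = source;
    // Each round merges one pair per chain; repeat until nothing changes.
    source = source.replace(concatRe, (_, a, b) =>
      JSON.stringify(JSON.parse(a) + JSON.parse(b)));
  } while (source !== previous);
  return source;
}

console.log(foldConcats('x = "up" + "da" + "te";'));
// x = "update";
```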

Layer 4 — Control-flow flattening

This is the layer that most resists regex. Normal top-to-bottom code is rewritten into a while loop driven by a state variable and a switch, so the order things run no longer matches the order they appear in the file:

let state = 0;
while (true) {
  switch (state) {
    case 0: a = init();   state = 2; continue;
    case 1: return a + b;            // exit
    case 2: b = step(a);  state = 1; continue;
  }
}

The real flow is 0 → 2 → 1, but it is written 0, 1, 2. To undo it you build a small control-flow graph (a map of which block leads to which): each case is a block of code, and the state = N assignments are the arrows between them. Re-thread the blocks into execution order and the original linear code falls out. This is where an AST (Abstract Syntax Tree — a structured tree of the code) becomes essential, because plain text replacement cannot track state cleanly.
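As a rough illustration of the re-threading step, the sketch below takes the three cases from the example, already reduced to a map of state number to block, and follows the state transitions from the entry state. The block shape ({ body, next }) and the name linearize are assumptions made for this sketch; a real pass would build the map from the switch cases in the AST, and would need a full graph walk once blocks branch or loop.

```javascript
// Each case becomes a block: its statements plus the state it jumps to next.
// `next: null` marks an exit such as `return`.
const blocks = new Map([
  [0, { body: "a = init();", next: 2 }],
  [1, { body: "return a + b;", next: null }],
  [2, { body: "b = step(a);", next: 1 }],
]);

function linearize(blocks, entry) {
  const lines = [];
  const seen = new Set();
  let state = entry;
  while (state !== null) {
    // A revisited state means a real loop, which this straight-line sketch
    // does not try to reconstruct.
    if (seen.has(state)) throw new Error(`state ${state} revisited: flow is not linear`);
    seen.add(state);
    const block = blocks.get(state);
    if (!block) throw new Error(`unknown state ${state}`);
    lines.push(block.body);
    state = block.next;
  }
  return lines.join("\n");
}

console.log(linearize(blocks, 0));
// a = init();
// b = step(a);
// return a + b;
```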

Layer 5 — Dead code and opaque predicates

Obfuscators add branches that look conditional but always go the same way ("opaque predicates"), plus unreachable junk to bulk up the file:

if ((function () { return !![]; })()) { realWork(); } else { garbage(); }

!![] is always true, so the else branch can never run. Once you work out that the condition is constant, you can delete the dead branch entirely. Removing decoder definitions, rotation functions, and unused helpers shrinks the file dramatically on this final pass.
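A small sketch of the "work out that the condition is constant" step. It only evaluates expressions written entirely with !, +, -, square brackets, digits and whitespace; with no parentheses, letters or quotes available, such an expression cannot call a function or name a variable, so evaluating it is harmless. The names constantTruthiness and pruneIf are illustrative, and the approach is deliberately narrower than what an AST evaluator can prove.

```javascript
// Characters allowed in an expression we are willing to evaluate.
const SAFE = /^[\s!\[\]+\-\d]+$/;

// Returns true or false for a provably constant condition, undefined otherwise.
function constantTruthiness(expr) {
  if (!SAFE.test(expr)) return undefined;       // not provably constant: leave it alone
  try {
    return Boolean(new Function(`return (${expr});`)());
  } catch {
    return undefined;                           // not a valid expression
  }
}

// Keep only the branch that can actually run; leave unknown conditions intact.
function pruneIf(test, consequent, alternate) {
  const truthy = constantTruthiness(test);
  if (truthy === undefined) return `if (${test}) ${consequent} else ${alternate}`;
  return truthy ? consequent : alternate;
}

console.log(pruneIf("!![]", "{ realWork(); }", "{ garbage(); }"));
// { realWork(); }
```

For the wrapped form shown above, the function wrapper has to be peeled off first so that only the returned expression (!![]) reaches the check.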

From regex to ASTs

Text-based replacement gets you surprisingly far on string arrays, but it is fragile: it cannot respect scope (which variable means what, and where), track variable values, or safely reorder code. Serious deobfuscation works on the Abstract Syntax Tree (AST) instead — a structured tree representing the code's grammar. The workflow with a toolchain like Babel is: parse the source into an AST, traverse and transform its nodes (fold constants, inline the decoder, evaluate fake conditions, rebuild control flow), then regenerate clean source. Babel's path.evaluate() even tells you when a piece of code is a fixed constant.

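import * as parser from '@babel/parser';
import traverse from '@babel/traverse';   // under native Node ESM this may need traverse.default
import generate from '@babel/generator';  // same interop caveat as traverse
import * as t from '@babel/types';

const ast = parser.parse(source);
traverse(ast, {
  CallExpression(path) {
    // evaluate() is typically confident only for literal-only expressions and a
    // small set of built-in calls; a user-defined decoder must be inlined separately.
    const { confident, value } = path.evaluate();
    // valueToNode builds the literal node directly instead of re-parsing a string
    if (confident) path.replaceWith(t.valueToNode(value));
  },
});
const clean = generate(ast).code;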

Because the AST understands scope and structure, a visitor (a function that runs on each matching node) can do things regex never could — like "replace every call to this function with its constant return value, but only within the scope where it is bound."

The hardest layer — bytecode VMs

The strongest obfuscators do not just hide the code — they replace it with a custom virtual machine (a mini interpreter built into the script). The original logic is compiled down to a private bytecode (a stream of low-level instructions, often shipped as a base64 blob), and what you actually see is an interpreter that walks that bytecode step by step. There is no JavaScript left to tidy up.

; decoded bytecode, disassembled
LOAD_STRING  r3, "update"
PROPACCESS   r4 = r2[r3]
LOAD_STRING  r5, "secret"
FUNC_CALL    r6 = r4.call(r2, [r5])
JUMP_COND_NEG if(!r6) goto @1487

Reversing a VM is a different discipline: (1) recover the bytecode blob the interpreter reads; (2) figure out what each opcode (instruction byte) means by reading the interpreter's dispatch loop — which byte is LOAD_STRING, FUNC_CALL, JUMP… and how the arguments are arranged; (3) write a two-pass disassembler — pass one finds all the jump targets, pass two prints labelled, human-readable instructions; (4) optionally translate the disassembly back into equivalent source. It is labour-intensive, but doable: the interpreter is the spec, and it is sitting right there in the file.
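Steps (2) and (3) can be sketched as follows. Everything about the encoding here is invented for illustration: the byte values, the operand counts, absolute jump offsets and the string pool are assumptions, and in practice each of them is read off the interpreter's dispatch loop. Only the instruction names are borrowed from the listing above.

```javascript
// Hypothetical opcode table. `jump` is the index of the operand that holds an
// absolute jump target, when the instruction has one.
const OPCODES = {
  0x01: { name: "LOAD_STRING", operands: 2 },            // dst register, string-pool index
  0x02: { name: "PROPACCESS", operands: 3 },             // dst, object register, key register
  0x03: { name: "FUNC_CALL", operands: 3 },              // dst, function register, argument register
  0x04: { name: "JUMP_COND_NEG", operands: 2, jump: 1 }, // condition register, target offset
};

// Split the byte stream into instructions, remembering each one's offset.
function decodeAll(bytes) {
  const instructions = [];
  for (let pc = 0; pc < bytes.length; ) {
    const op = OPCODES[bytes[pc]];
    if (!op) throw new Error(`unknown opcode ${bytes[pc]} at offset ${pc}`);
    instructions.push({ offset: pc, op, args: bytes.slice(pc + 1, pc + 1 + op.operands) });
    pc += 1 + op.operands;
  }
  return instructions;
}

function disassemble(bytes, strings) {
  const instructions = decodeAll(bytes);

  // Pass 1: collect every jump target so those offsets can be labelled.
  const targets = new Set();
  for (const { op, args } of instructions) {
    if (op.jump !== undefined) targets.add(args[op.jump]);
  }

  // Pass 2: print each instruction, emitting a label line before jump targets.
  const lines = [];
  for (const { offset, op, args } of instructions) {
    if (targets.has(offset)) lines.push(`@${offset}:`);
    const text = args.map((a, i) => {
      if (i === op.jump) return `@${a}`;
      if (op.name === "LOAD_STRING" && i === 1) return JSON.stringify(strings[a]);
      return `r${a}`;
    });
    lines.push(`  ${op.name} ${text.join(", ")}`);
  }
  return lines.join("\n");
}

const bytes = [0x01, 3, 0,  0x02, 4, 2, 3,  0x04, 4, 0];
console.log(disassemble(bytes, ["update"]));
// @0:
//   LOAD_STRING r3, "update"
//   PROPACCESS r4, r2, r3
//   JUMP_COND_NEG r4, @0
```

Throwing on an unknown opcode is a deliberate choice: a gap in the opcode table usually means every later instruction is misaligned, so stopping early is more useful than printing garbage.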

A practical order of operations

When you sit down with an obfuscated file, peel the layers from the outside in. Each pass makes the next one easier, because every layer you remove exposes more constants for the following pass to simplify:

Beautify — run it through a formatter so you can see the structure.
Decode strings — extract the array, run the rotation, fold every decoder call and its aliases.
Simplify literals — bracket→dot, hex→decimal, concatenations.
Restore control flow — un-flatten the switch/state loops via an AST.
Prune — evaluate the fake conditions, delete dead branches and obfuscator scaffolding.
Rename — give _0x3b01-style names meaningful ones based on what they now obviously do.
If a VM remains — recover the bytecode, map the opcodes, disassemble.
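Because each pass can expose new work for the others, a simple driver that repeats the whole sequence until the source stops changing is often enough. The sketch below is one possible shape; runPasses, the two toy passes and the one-entry strings table are all hypothetical stand-ins for the real transforms.

```javascript
// Run every pass in order, and repeat the sequence until a full round
// leaves the source unchanged (or maxRounds is reached).
function runPasses(source, passes, maxRounds = 10) {
  for (let round = 0; round < maxRounds; round++) {
    const before = source;
    for (const pass of passes) source = pass(source);
    if (source === before) break;
  }
  return source;
}

// Toy passes standing in for the real ones.
const strings = { 429: "update" };   // hypothetical decoded table
const foldDecoder = (s) =>
  s.replace(/_0x3b01\((\d+)\)/g, (m, n) => (n in strings ? JSON.stringify(strings[n]) : m));
const bracketToDot = (s) =>
  s.replace(/(?<=[\w$)\]])\["([A-Za-z_$][\w$]*)"\]/g, ".$1");

// bracketToDot finds nothing in round one; it only fires after foldDecoder
// has turned the decoder call into a string literal.
console.log(runPasses('obj[_0x3b01(429)]("secret");', [bracketToDot, foldDecoder]));
// obj.update("secret");
```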

Tooling cheat sheet

Beautifiers: Prettier, js-beautify — always step one.
AST toolkits: Babel (@babel/parser/traverse/generator), acorn, esprima, recast (preserves formatting).
Purpose-built: webcrack, synchrony, and the REstringer family handle common obfuscator output (notably obfuscator.io) out of the box.
Analysis: AST Explorer (astexplorer.net) for prototyping visitors; a debugger for stepping through a VM interpreter live.

Code example

// Folding a rotated string-array decoder — the core of most deobfuscation.
// 1. Extract the literal array and the decoder's offset.
const stringArray = ["u7d1", "WordArray", "update", "secret" /* ... */];
const OFFSET = 336;
const decode = (i) => stringArray[i - OFFSET];

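// 2. Replay the rotation loop until the checksum matches (deterministic).
//    The two terms below are a sample; copy the full expression from the target file.
function rotateUntilValid(target) {
  for (let i = 0; i < stringArray.length; i++) {
    const sum = parseInt(decode(347)) / 1 + parseInt(decode(541)) / 2; // sample
    if (sum === target) return;
    stringArray.push(stringArray.shift());
  }
  // A full cycle without a match means the checksum or offset was copied wrongly.
  throw new Error("no rotation matched the checksum");
}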

// 3. Constant-fold every decoder call back into the source text.
function foldDecoderCalls(source) {
  return source.replace(/_0x3b01\((\d+)\)/g, (m, n) => {
    const value = decode(Number(n));
    return value === undefined ? m : JSON.stringify(value);
  });
}
