What Is Stateful Web Scraping?

Stateful web scraping preserves cookies, session tokens, browser fingerprint, and proxy IP across multiple requests so that the target site sees one coherent user for the whole session. Stateless scraping starts fresh on each request, which is fine for public listing pages but breaks anything that requires login, multi-step navigation, or session-derived authorization tokens. Most real scraping projects need state for at least some flows.

Quick facts

Stateful needs: Same cookie jar, same IP, same fingerprint, same JA3 across requests
Required for: Login flows, cart/checkout, multi-page forms, CSRF-gated pages
Session lifetime: Minutes to hours; longer than typical anti-bot session windows
Storage: Cookie jar + session ID returned by API + sticky proxy session
Anti-pattern: Rotating IP/fingerprint mid-session, which looks fake to the target

Why state matters

A real user has a coherent session (same browser, same IP, same cookies) from login to logout. Sites use that coherence as a trust signal: a session that suddenly switches IP or fingerprint mid-flow looks like a bot. Stateless scraping breaks any flow where the second request depends on the first: logged-in pages, cart flows, CSRF-protected forms, and paginated views that carry cursor state in cookies.
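The dependency between requests can be reproduced locally. The sketch below is only an illustration under stated assumptions: `DemoSite`, `fetch_status`, and `run_demo` are hypothetical names, the "target" is a toy server on 127.0.0.1, and only the cookie part of session state is modelled (no IP or fingerprint). It uses the Python standard library only.

```python
import http.cookiejar
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoSite(BaseHTTPRequestHandler):
    """Toy target site: /login sets a cookie, /account requires it."""

    def do_GET(self):
        if self.path == "/login":
            self.send_response(200)
            self.send_header("Set-Cookie", "sid=abc123; Path=/")
            self.end_headers()
            self.wfile.write(b"logged in")
        elif self.path == "/account" and "sid=abc123" in self.headers.get("Cookie", ""):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"account page")
        else:
            self.send_response(403)
            self.end_headers()
            self.wfile.write(b"login required")

    def log_message(self, *args):  # keep the demo output quiet
        pass


def fetch_status(opener, url):
    """Return the HTTP status code, treating 4xx/5xx as values rather than errors."""
    try:
        with opener.open(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), DemoSite)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    no_proxy = urllib.request.ProxyHandler({})  # ignore system proxies for localhost
    try:
        # Stateless: every request uses a fresh opener with no cookie jar.
        fetch_status(urllib.request.build_opener(no_proxy), base + "/login")
        stateless = fetch_status(urllib.request.build_opener(no_proxy), base + "/account")

        # Stateful: one opener with one cookie jar is reused for both requests.
        jar = http.cookiejar.CookieJar()
        opener = urllib.request.build_opener(
            no_proxy, urllib.request.HTTPCookieProcessor(jar)
        )
        fetch_status(opener, base + "/login")
        stateful = fetch_status(opener, base + "/account")
    finally:
        server.shutdown()
        server.server_close()
    return stateless, stateful
```

Calling `run_demo()` returns the status code of the `/account` request for each approach. The stateless request is rejected because the login cookie was never sent back, while the stateful one carries it.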

How stateful APIs implement it

A stateful scraping API exposes a session ID. The first request creates a session, assigning a sticky IP, a consistent fingerprint, and an empty cookie jar. Subsequent requests with the same session ID reuse all three. The session has a TTL (typically 10 minutes to a few hours); after that it expires and you start fresh. Some APIs let you persist a session indefinitely by paying a per-session retainer.
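The lifecycle described above can be sketched as a small in-memory store. This is not how any particular provider implements it. `Session`, `SessionStore`, and `get_or_create` are hypothetical names, the fingerprint is reduced to a random string, and the TTL is assumed to be fixed from creation rather than refreshed on each use.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Session:
    session_id: str
    proxy_ip: str      # sticky exit IP for the whole session
    fingerprint: str   # stands in for a full browser fingerprint profile
    expires_at: float  # clock value after which the session is dead
    cookies: dict = field(default_factory=dict)  # starts as an empty cookie jar


class SessionStore:
    """Maps session IDs to sticky state and replaces expired sessions."""

    def __init__(self, proxy_pool, ttl_seconds=600, clock=time.monotonic):
        self._proxy_pool = list(proxy_pool)
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._sessions = {}
        self._created = 0

    def get_or_create(self, session_id):
        now = self._clock()
        session = self._sessions.get(session_id)
        if session is None or session.expires_at <= now:
            # Unknown or expired ID: assign a new IP, fingerprint, and empty jar.
            proxy_ip = self._proxy_pool[self._created % len(self._proxy_pool)]
            self._created += 1
            session = Session(session_id, proxy_ip, uuid.uuid4().hex, now + self._ttl)
            self._sessions[session_id] = session
        return session
```

A caller would fetch the session before each request, send the request through `session.proxy_ip` with `session.cookies`, and write any new cookies back into the same object. Once the TTL has passed, the same session ID yields a fresh session, which matches the "start fresh" behaviour described above.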

When you do NOT need state

Public listing pages, product detail pages, blog posts, and most SEO crawl targets can be scraped statelessly because each request stands on its own. Stateless requests are cheaper (no session retainer, IP can rotate freely) and parallelize better. Use state only where the flow requires it; default to stateless for everything else.
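One way to make stateless the default is to attach a session only when the caller asks for one. The helper below is a minimal sketch: `build_payload` is a hypothetical name, and the `cmd`, `url`, and `session_id` fields are borrowed from this page's code example, so check them against your provider's documentation.

```python
def build_payload(url, session_id=None):
    """Build a GET payload; attach a session only when the flow needs one."""
    payload = {"cmd": "request.get", "url": url}
    if session_id is not None:
        # Stateful flow: the API is expected to reuse cookies, IP, and fingerprint.
        payload["session_id"] = session_id
    return payload


# Public page: no session, so the request is free to use any IP.
listing = build_payload("https://target.com/products?page=1")

# Logged-in page: pinned to one session.
account = build_payload("https://target.com/account", session_id="login-flow-user-1234")
```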

Code example

python
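import requests

API_URL = 'https://publisher.scrappey.com/api/v1'
HEADERS = {'Authorization': 'YOUR_API_KEY'}

# One session ID per identity; reusing it keeps the same cookies, IP, and fingerprint.
sess = 'login-flow-user-1234'

# Step 1: log in. The cookies set by the target are stored in the session.
login = requests.post(API_URL, json={
    'cmd': 'request.post',
    'url': 'https://target.com/login',
    'postData': 'user=...&pass=...',
    'session_id': sess
}, headers=HEADERS, timeout=60)
login.raise_for_status()  # stop here if the API call itself failed

# Step 2: fetch a logged-in page with the same session ID.
profile = requests.post(API_URL, json={
    'cmd': 'request.get',
    'url': 'https://target.com/account',
    'session_id': sess
}, headers=HEADERS, timeout=60)
profile.raise_for_status()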

Frequently asked questions

How long should a session live?

Match the natural duration of the user flow you are emulating. A login followed by ten page views fits in a session of about five minutes. Long-running monitoring (such as refreshing an account dashboard) can keep a session alive for hours. Beyond a few hours the target site will often invalidate it anyway.

Can I share one session across many parallel requests?

Within reason. A real browser session can fire 5 to 10 parallel requests when loading a page, so a few concurrent calls per session are fine. Many parallel requests per session look unrealistic and tend to get the session flagged.
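A per-session concurrency cap can be enforced on the client side. The sketch below assumes a thread-based client. `SessionLimiter` and `measure_peak` are hypothetical names, and the default cap of 5 is simply taken from the lower end of the range mentioned above.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class SessionLimiter:
    """Caps the number of in-flight calls per session ID."""

    def __init__(self, max_per_session=5):
        self._max = max_per_session
        self._lock = threading.Lock()
        self._semaphores = {}

    def _semaphore_for(self, session_id):
        # Create each session's semaphore once, under a lock to avoid races.
        with self._lock:
            if session_id not in self._semaphores:
                self._semaphores[session_id] = threading.BoundedSemaphore(self._max)
            return self._semaphores[session_id]

    def run(self, session_id, func, *args):
        # Blocks while the session already has max_per_session calls running.
        with self._semaphore_for(session_id):
            return func(*args)


def measure_peak(limiter, session_id, calls=20, workers=10):
    """Fire fake requests through the limiter and report peak concurrency."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def fake_request(i):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)  # stands in for network latency
        with lock:
            state["active"] -= 1
        return i

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(lambda i: limiter.run(session_id, fake_request, i), range(calls))
        )
    return results, state["peak"]
```

In real use, `func` would be the function that sends one API request for that session. Different session IDs get independent semaphores, so unrelated sessions do not block each other.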

Do I need a different session per user?

Yes, if you are scraping logged-in views or per-account data. One session = one identity. Mixing accounts in one session is a security and detection problem.
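One simple way to keep the mapping one-to-one is to derive the session ID from the account itself. This is a sketch under assumptions: `session_id_for` is a hypothetical helper, and hashing the account name is just one option that keeps raw usernames out of session IDs and logs.

```python
import hashlib


def session_id_for(account, flow="login-flow"):
    """Derive a stable session ID so each account always maps to its own session."""
    digest = hashlib.sha256(account.encode("utf-8")).hexdigest()[:12]
    return f"{flow}-{digest}"
```

The same account always yields the same ID, so retries land in the same session, while two different accounts get different IDs and never share a cookie jar.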

Last updated: 2026-05-26