Discover every page on modern websites using powerful regex extraction and intelligent crawling. Map entire domains with automatic link discovery and domain filtering.
Free demo trial • No credit card required • Setup in <2 minutes
Finding all URLs on a website is essential for web scraping, SEO audits, site migrations, and competitive analysis. Whether you're mapping a competitor's site structure, preparing for a redesign, or conducting a comprehensive SEO audit, having a complete URL inventory is crucial.
Our tool uses powerful regex patterns to extract all links from each page, then automatically crawls discovered URLs from the same domain. With concurrent processing and automatic web access handling, you can map entire websites efficiently.
Perfect for SEO professionals, web developers, and data analysts who need comprehensive site mapping without manual work.
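To make the extract-then-filter idea concrete, here is a minimal local sketch in Python. It applies the same href/src pattern to an HTML string with the standard `re` module and keeps only links on the starting page's host. The function name `extract_same_domain_links` and the sample HTML are illustrative assumptions, not part of the Scrappey API.

```python
import re
from urllib.parse import urldefrag, urljoin, urlparse

# Same idea as the href/src pattern used on this page: one alternative for
# double-quoted attribute values, one for single-quoted values.
LINK_PATTERN = re.compile(r"""(?:href|src)="([^"]+)"|(?:href|src)='([^']+)'""")


def extract_same_domain_links(html, page_url):
    """Return absolute, de-duplicated links on the same host as page_url."""
    domain = urlparse(page_url).netloc
    links = []
    for double_quoted, single_quoted in LINK_PATTERN.findall(html):
        raw = double_quoted or single_quoted
        # Resolve relative paths and drop any #fragment before comparing.
        absolute, _fragment = urldefrag(urljoin(page_url, raw))
        if urlparse(absolute).netloc == domain and absolute not in links:
            links.append(absolute)
    return links


sample_html = """
<a href="/page1">One</a>
<a href='/page2#top'>Two</a>
<img src="https://cdn.example.net/logo.png">
<a href="https://example.com/page3">Three</a>
"""
print(extract_same_domain_links(sample_html, "https://example.com/"))
```

A regex is a pragmatic choice for bulk link discovery, but it can miss unquoted attributes or links built by JavaScript; an HTML parser is the stricter alternative when that matters.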
{
"cmd": "request.get",
"url": "https://example.com",
"requestType": "request",
"regex": "(?:href|src)=\"([^\"]+)\"|(?:href|src)='([^']+)'",
"filter": ["regex"]
}
// Response:
{
"solution": {
"regex": [
"/page1",
"/page2",
"https://example.com/page3"
]
}
}
Enter a starting URL and discover all pages automatically
Choose the method that works best for your needs
Use site: operator to find indexed pages. Quick but may miss unindexed pages.
Parse XML sitemaps and robots.txt files to discover all declared URLs.
Use tools like ScreamingFrog or XML-Sitemaps.com for visual crawling.
Build your own crawler with Python, JavaScript, or other languages for full control.
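For the sitemap and robots.txt method, a short Python sketch may help. It reads `Sitemap:` lines from robots.txt text and collects `<loc>` entries from a sitemap document, using only the standard library. The helper names and sample documents are assumptions for illustration; downloading the files is left out so the sketch runs offline.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs declared with 'Sitemap:' lines in robots.txt."""
    found = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own colons survive.
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def urls_from_sitemap(xml_text):
    """Return every <loc> value from a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


SAMPLE_ROBOTS = "User-agent: *\nDisallow: /admin\nSitemap: https://example.com/sitemap.xml\n"
SAMPLE_SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "<url><loc>https://example.com/about</loc></url>"
    "</urlset>"
)

print(sitemaps_from_robots(SAMPLE_ROBOTS))
print(urls_from_sitemap(SAMPLE_SITEMAP))
```

Because a sitemap index also uses `<loc>` for its child sitemaps, the same function can be applied again to each URL it returns from an index file.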
Register for a free account to access all tools with unlimited usage and advanced features.
Fast, reliable, and optimized for comprehensive site mapping
Concurrent crawling with up to 5 simultaneous requests. Map large sites in minutes, not hours.
Intelligently extracts all links from each page and follows same-domain URLs automatically.
Automatic handling of CDN protection, bot management, and other web access challenges. Designed for high reliability.
Automatically filters links to only crawl URLs from the same domain, preventing external crawls.
Real-time progress updates showing discovered URLs, crawled pages, and current status.
Uses advanced regex patterns to extract all href and src attributes from HTML content.
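The concurrency described above can be approximated in your own code with a thread pool. The sketch below is one possible design, not the tool's actual implementation: it crawls level by level and fetches each level with at most five worker threads. `crawl_concurrently`, `fetch_links`, and `FAKE_SITE` are hypothetical names; in practice `fetch_links` would wrap a real page request.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 5  # mirrors the "up to 5 simultaneous requests" figure above


def crawl_concurrently(start_url, fetch_links, max_pages=200):
    """Breadth-first crawl that fetches each level with up to MAX_WORKERS threads.

    fetch_links is any callable that takes a URL and returns the same-domain
    links found on that page.
    """
    visited = set()
    frontier = [start_url]
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        while frontier and len(visited) < max_pages:
            # Never schedule more pages than the remaining budget allows.
            batch = frontier[: max_pages - len(visited)]
            visited.update(batch)
            next_frontier = []
            # pool.map yields results in input order, one list of links per page.
            for links in pool.map(fetch_links, batch):
                for link in links:
                    if link not in visited and link not in next_frontier:
                        next_frontier.append(link)
            frontier = next_frontier
    return visited


# Stand-in for real HTTP calls so the sketch runs offline.
FAKE_SITE = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/c"],
    "https://example.com/b": ["https://example.com/"],
    "https://example.com/c": [],
}

pages = crawl_concurrently("https://example.com/", lambda url: FAKE_SITE.get(url, []))
print(sorted(pages))
```

Threads suit this workload because each worker spends most of its time waiting on the network; the `max_pages` cap keeps a misbehaving site from growing the crawl without bound.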
Extract URLs with just a few lines of code
import requests
from urllib.parse import urljoin, urlparse

API_KEY = "YOUR_API_KEY"
API_URL = "https://publisher.scrappey.com/api/v1"


def extract_domain(url):
    """Extract domain from URL"""
    return urlparse(url).netloc


def normalize_url(url, base_url):
    """Convert relative URLs to absolute"""
    if url.startswith('http'):
        return url
    return urljoin(base_url, url)


def find_urls_on_page(page_url, domain):
    """Use Scrappey to find all URLs on a page"""
    payload = {
        "cmd": "request.get",
        "url": page_url,
        "requestType": "request",
        # Inner double quotes are escaped so the pattern is a valid Python string
        "regex": "(?:href|src)=\"([^\"]+)\"|(?:href|src)='([^']+)'",
        "filter": ["regex"]
    }

    response = requests.post(f"{API_URL}?key={API_KEY}", json=payload)
    data = response.json()

    if data.get('solution', {}).get('regex'):
        urls = data['solution']['regex']
        # Filter to same domain
        same_domain = []
        for url in urls:
            normalized = normalize_url(url, page_url)
            if extract_domain(normalized) == domain:
                same_domain.append(normalized)
        return same_domain
    return []


# Start crawling
start_url = "https://example.com"
domain = extract_domain(start_url)
visited = set()
queue = [start_url]

# Breadth-first crawl, capped at 200 visited pages
while queue and len(visited) < 200:
    current_url = queue.pop(0)
    if current_url in visited:
        continue

    visited.add(current_url)
    print(f"Crawling: {current_url}")

    new_urls = find_urls_on_page(current_url, domain)
    for url in new_urls:
        if url not in visited and url not in queue:
            queue.append(url)

    print(f"Found {len(new_urls)} URLs, Total: {len(visited)}")
Map entire websites to identify orphan pages, broken links, and site structure issues for comprehensive SEO analysis.
Discover all pages on competitor websites to understand their content strategy and site architecture.
Create complete URL inventories before website redesigns or platform migrations to ensure nothing is missed.
Find all content pages, blog posts, and resources on a website for content analysis and research.
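For the SEO audit use case, one common way to spot orphan pages is to compare the URLs a site declares (for example, in its sitemap) with the URLs a link-following crawl actually reached. The Python sketch below assumes you already hold both lists; `find_orphan_pages` and the sample data are hypothetical.

```python
def find_orphan_pages(declared_urls, crawled_urls):
    """Return declared URLs that the crawl never reached by following links."""
    return sorted(set(declared_urls) - set(crawled_urls))


declared = [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/old-landing-page",
]
crawled = ["https://example.com/", "https://example.com/about"]

print(find_orphan_pages(declared, crawled))
```

A URL in the result is only a candidate orphan: it may also have been missed because the crawl hit its page limit, so it is worth checking those pages by hand.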
Our URL finder tool helps you crawl a website for all of its URLs efficiently. Whether you need URLs for SEO analysis, site migration, or competitive research, this URL extractor makes it simple.
Getting URLs from a website is straightforward: enter a starting URL and the tool automatically discovers the links to the site's pages. You can copy all URLs with a single click or download them as a CSV file.
The crawl list feature finds all webpages on a site by following links automatically. The list crawl system filters URLs to the same domain, helping you find the website links that matter most.
Perfect for developers, SEO professionals, and data analysts who need to find all webpages on a site quickly. The tool handles modern website complexity automatically, so you can crawl a website for all URLs without worrying about CAPTCHAs or JavaScript rendering.
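If you collect URLs with your own script rather than the web tool, a CSV export similar to the download option can be sketched with Python's standard `csv` module. The single `url` column and the function name `urls_to_csv` are assumptions; the tool's own CSV layout may differ.

```python
import csv
import io


def urls_to_csv(urls):
    """Return CSV text with a header row and one sorted, unique URL per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["url"])
    for url in sorted(set(urls)):
        writer.writerow([url])
    return buffer.getvalue()


found = ["https://example.com/b", "https://example.com/a", "https://example.com/b"]
print(urls_to_csv(found))
```

To save the result, write the returned text to a file opened with `open("urls.csv", "w", newline="")`.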
Automate workflows visually. Streamline data collection processes.
Pre-built template for modern websites. Simplifies Scrappey integration.
Access via API marketplace. Easy integration with comprehensive docs.
Scalable actor-based automation. Reliable browser rendering.
AI-powered browser automation. Intelligent session management.
Scrape from your terminal. One command, pipeable output, CI-ready.
Portable skill for Claude Code + Codex. Browser-backed data access on demand.
LangChain connector — clean web data for any chain or agent.
LlamaIndex reader — load modern web pages straight into RAG.
Connect with 7,000+ apps. Automate workflows easily.
Visual workflow automation. Connect with 1,000+ apps easily.
Try It For Free. No Subscription Required. No Credit Card Required. Instant Set-Up. Your Free Trial Is Waiting For You!
Scrappey.com is a web scraping API that handles all the complex aspects of web scraping, such as handling dynamic content, rotating proxies, advanced request handling, headless browsers, and verification processing. It offers an all-in-one solution for extracting publicly available data from websites.
Scrappey.com provides a web scraping API that allows you to send requests to extract publicly available data from websites. It handles dynamic content and modern website complexity, including rotating proxies, advanced request handling, and verification processing. You can easily extract publicly available data from websites using their built-in features like headless browsers and AI-powered data extraction.
Yes, with Scrappey.com, you have the option to use Sticky Rotating Proxies for seamless scraping. Alternatively, you can also set your own proxies if desired.
Yes, Scrappey.com offers a free trial where you can try it out without a subscription or credit card. Instant setup is provided, so you can explore the full capabilities of the platform right away.
We only charge for successful requests. Failed requests are not counted towards your usage, so you only pay for what works.
No problem, you can pass any JavaScript snippet that needs to be executed by using our JavaScript scenario parameter. This allows you to interact with dynamic content, scroll pages, click buttons, wait for elements, and perform any custom JavaScript actions before extracting the data.
Scrappey.com offers simple and transparent pricing: €0.20 per 1,000 direct HTTP requests and €1.00 per 1,000 full-browser requests. Residential proxies are included on both tiers — no separate proxy billing, no hidden fees, no complicated pricing tiers. You only pay for successful requests.
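As a rough worked example of these rates, the Python sketch below estimates the cost of a crawl from the number of successful requests. It uses only the two per-1,000 prices quoted above; how usage is rounded or invoiced is not stated here, so treat the result as an estimate. `estimate_cost_eur` is a hypothetical helper.

```python
from decimal import Decimal

# Prices quoted above, in euros per 1,000 successful requests.
PRICE_PER_1000_DIRECT = Decimal("0.20")
PRICE_PER_1000_BROWSER = Decimal("1.00")


def estimate_cost_eur(direct_requests, browser_requests=0):
    """Estimate the cost in euros for a number of successful requests."""
    direct = PRICE_PER_1000_DIRECT * direct_requests / 1000
    browser = PRICE_PER_1000_BROWSER * browser_requests / 1000
    return direct + browser


# A 200-page crawl using direct HTTP requests only.
print(estimate_cost_eur(200))
# 5,000 direct requests plus 1,000 full-browser requests.
print(estimate_cost_eur(5000, 1000))
```

`Decimal` is used instead of floats so that small currency amounts are not distorted by binary rounding.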
Scrappey.com provides scalable access for extracting publicly available data. Whether you need to extract data from a few pages or a large dataset of publicly accessible content, you can do so with flexible usage options. Please note that Scrappey.com only supports scraping publicly available data, and users must comply with applicable laws and website terms of service.
Scrappey.com provides various support channels for assistance. You can refer to their documentation, frequently asked questions section, blog, and uptime status page. Additionally, you can get in touch with them via email or join their Discord community for further support.
We don't create custom scraping scripts; however, we will gladly write code snippets to help you use our most powerful features: AI-powered data extraction and JavaScript scenario. Our documentation includes examples in multiple programming languages to get you started quickly.
Each API call to Scrappey counts as one request. Our pricing is based on successful requests. By default, JavaScript rendering is enabled, which allows you to extract data from modern websites with dynamic content. All features including proxies, challenge handling, and reliable web access handling are included in each request.
Scrappey's API is optimized for fast response times, even on JavaScript-heavy websites and, where access is authorized, browser verification flows. If other tools struggle with sites that use browser verification, Scrappey is designed to handle these workflows efficiently. Reliable web access handling, residential proxies, and intelligent retry logic work together to maximize success rates.