crawler.sh
crawler.sh is a fast web crawler that extracts Markdown content and SEO insights from entire domains in seconds.
About crawler.sh
crawler.sh delivers enterprise-grade web crawling with the speed and simplicity developers need. Whether you're auditing site structure, migrating content, or analyzing SEO metadata, this tool crawls entire domains in seconds while respecting server load through configurable concurrency, depth limits, and polite delays. You maintain complete control over performance tuning, so crawls are fast without being aggressive.
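The crawl behavior described above can be sketched as a breadth-first loop with a depth limit and a polite per-request delay. This is an illustration of the general technique, not crawler.sh's implementation; `fetch_links` and all parameter names are stand-ins.

```python
from collections import deque
import time

def crawl(start_url, fetch_links, max_depth=2, delay=0.0):
    """Breadth-first crawl: visit pages up to max_depth links deep,
    pausing `delay` seconds between fetches to avoid hammering the server.
    fetch_links(url) is a stand-in for real HTTP fetching + link parsing."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: record the page, don't expand it
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
        time.sleep(delay)  # polite pause between requests
    return visited

# Toy link graph standing in for a live site
graph = {"/": ["/a", "/b"], "/a": ["/c"], "/b": [], "/c": []}
print(crawl("/", lambda u: graph.get(u, []), max_depth=1))
```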
Content extraction is intelligent and practical. The tool isolates main article content on each page and converts it to clean Markdown automatically, stripping noise and preserving structure. Each result includes word count, author byline, and excerpt data—ready for downstream pipelines, content management systems, or analysis workflows without manual cleanup.
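A minimal sketch of the per-page metadata described above, computed from already-extracted Markdown. The field names (`word_count`, `excerpt`) and the excerpt length are assumptions for illustration, not crawler.sh's documented schema.

```python
def summarize(markdown_text, excerpt_len=200):
    """Naive metadata for an extracted page: whitespace-split word count
    and a leading excerpt (may cut mid-word; illustrative only)."""
    words = markdown_text.split()
    return {
        "word_count": len(words),
        "excerpt": markdown_text.strip()[:excerpt_len],
    }

print(summarize("Hello world, this is the extracted article body."))
```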
Flexibility in export formats means crawler.sh fits seamlessly into existing workflows. Results stream as NDJSON for real-time pipeline processing, export as JSON arrays for programmatic access, or generate Sitemap XML files that follow the sitemaps.org protocol for SEO tooling. Markdown content archives let you preserve site snapshots in human-readable format.
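Because NDJSON carries one complete JSON record per line, downstream code can process a crawl as a stream without loading the whole result set. A sketch of such a consumer, assuming hypothetical record fields (`url`, `word_count`) rather than the tool's documented output schema:

```python
import json

def pages_over(ndjson_lines, min_words):
    """Yield URLs of crawled pages whose extracted content has at least
    min_words words, reading one NDJSON record per line."""
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the stream
        record = json.loads(line)
        if record.get("word_count", 0) >= min_words:
            yield record["url"]

sample = [
    '{"url": "https://example.com/a", "word_count": 1200}',
    '{"url": "https://example.com/b", "word_count": 80}',
]
print(list(pages_over(sample, 500)))  # → ['https://example.com/a']
```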
Privacy and security are built in by design. All crawling and analysis happen locally on your machine—no cloud uploads, no third-party processing. This approach is essential for sensitive sites, pre-release environments, staging servers, or proprietary content that shouldn't leave your infrastructure. The local-first model also eliminates API rate limits and external dependencies.
Features
- High-speed site crawling: Crawls entire domains in seconds with configurable concurrency, depth limits, and polite delays so users can tune performance without hammering servers.
- Content extraction to Markdown: Automatically isolates main article content on each page and converts it to clean Markdown, with word count, author byline, and excerpt for consistent downstream use.
- Multiple export formats: Streams crawl results as NDJSON for pipelines, or exports as JSON arrays, Sitemap XML that follows the sitemaps.org protocol, and Markdown content archives.
- Local-first, privacy-friendly design: All crawling and analysis happen on the user’s own machine, which is appealing for sensitive sites, pre-release environments, or proprietary content.
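The Sitemap XML format listed above is defined by the sitemaps.org protocol: a `<urlset>` root containing one `<url>`/`<loc>` entry per page. A minimal builder showing the expected shape (illustrative, not the tool's implementation):

```python
from xml.sax.saxutils import escape

def build_sitemap(urls):
    """Render a list of page URLs as Sitemap XML per the sitemaps.org
    protocol: UTF-8 declaration, namespaced <urlset>, one <url>/<loc>
    entry per page, with URL special characters escaped."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in urls:
        lines.append(f"  <url><loc>{escape(url)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines)

print(build_sitemap(["https://example.com/", "https://example.com/docs"]))
```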
crawler.sh Pricing Plans
- CLI Tool: $99 per year
- Desktop Pro: $99 per year