01 Converts HTML to clean Markdown using Mozilla Readability and Turndown with GFM support
02 Preserves links for knowledge graphs, with optional chunking for downstream processing
03 Fast, token-efficient content extraction optimized for AI agents
04 Configurable concurrent fetching and depth-limited crawling for comprehensive data collection
05 Smart caching keyed by SHA-256 hashes of URLs, and polite crawling with robots.txt support
06 1 GitHub star
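
To illustrate the caching item above, here is a minimal sketch of how a cache key could be derived from a URL with SHA-256. It is not taken from the project's source: the helper names `normalizeUrl` and `cacheKey`, and the choice to strip the URL fragment before hashing, are assumptions made for illustration. Only Node's built-in `node:crypto` module and the standard `URL` class are used.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper (assumption): normalize a URL so that spellings which
// fetch the same resource share one cache entry.
function normalizeUrl(rawUrl: string): string {
  const url = new URL(rawUrl);
  // Fragments are never sent to the server, so they should not affect the key.
  url.hash = "";
  return url.toString();
}

// Hypothetical helper (assumption): derive a fixed-length, filesystem-safe
// cache key by hashing the normalized URL with SHA-256.
function cacheKey(rawUrl: string): string {
  return createHash("sha256")
    .update(normalizeUrl(rawUrl), "utf8")
    .digest("hex");
}

const key = cacheKey("https://example.com/docs/page#intro");
console.log(key.length); // 64 (a SHA-256 digest is 32 bytes, i.e. 64 hex characters)
```

Hashing the URL gives every cache entry a name of the same length containing only hexadecimal characters, which avoids problems with long URLs or characters that are not valid in file names. Whether the actual tool normalizes URLs before hashing is not stated in the list above.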