01. 5-10x faster parallel processing for rapid document parsing
02. Y-coordinate based content ordering to preserve natural reading flow for AI models
03. Production-ready reliability with 94%+ test coverage, strict TypeScript, and per-page error resilience
04. Robust extraction of full text, base64-encoded images with metadata, and PDF metadata
05. Flexible path support for absolute and relative file paths (Windows/Unix) and URL sources
06. 310 GitHub stars
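
The Y-coordinate ordering, parallel processing, and per-page error resilience listed above could look roughly like the sketch below. It is illustrative only and is not the project's actual code: the names `PositionedItem`, `PageResult`, `orderByReadingFlow`, `parsePages`, and the `loadPage` callback are all hypothetical, and it assumes PDF user-space coordinates, where the origin is at the bottom-left and a larger `y` is higher on the page.

```typescript
// Hypothetical shapes for illustration; not the library's real types.
interface PositionedItem {
  kind: "text" | "image";
  x: number; // horizontal position in PDF user space
  y: number; // vertical position; larger values are higher on the page
  content: string; // text, or a base64 payload for images
}

type PageResult =
  | { page: number; items: PositionedItem[] }
  | { page: number; error: string };

// Order items top-to-bottom, then left-to-right.
// Items whose y values round into the same bucket are treated as one row,
// so small baseline differences do not reorder words on the same line.
function orderByReadingFlow(
  items: PositionedItem[],
  tolerance = 2,
): PositionedItem[] {
  const row = (item: PositionedItem): number => Math.round(item.y / tolerance);
  return [...items].sort((a, b) => {
    const rowDiff = row(b) - row(a); // higher rows first
    return rowDiff !== 0 ? rowDiff : a.x - b.x; // then left to right
  });
}

// Parse all pages concurrently. A failure on one page is captured as an
// error entry for that page instead of rejecting the whole batch.
async function parsePages(
  pages: number[],
  loadPage: (page: number) => Promise<PositionedItem[]>,
): Promise<PageResult[]> {
  const tasks = pages.map(async (page): Promise<PageResult> => {
    try {
      return { page, items: orderByReadingFlow(await loadPage(page)) };
    } catch (err) {
      return { page, error: err instanceof Error ? err.message : String(err) };
    }
  });
  return Promise.all(tasks);
}
```

Bucketing by rounded `y` keeps the comparator consistent, which a pairwise "within tolerance" check would not guarantee. The cost is that two items very close to a bucket boundary can land in different rows, so the tolerance would need tuning against real documents.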