Multi-format support for PDF, DOCX, HTML, and Markdown extraction
Advanced table extraction using PDFPlumber and AI-powered LlamaParse
Automated document chunking and metadata extraction for RAG pipelines
Support for multiple backends, allowing both local and cloud-based processing
Built-in OCR capabilities for scanned documents and complex layouts
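
To make the chunking and metadata feature concrete, here is a minimal sketch of overlapping character-window chunking. It is not the project's implementation: the `Chunk` class, the `chunk_text` function, the metadata keys, and the default sizes are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """Hypothetical container: a piece of text plus metadata for retrieval."""
    text: str
    metadata: dict


def chunk_text(text: str, source: str, size: int = 200, overlap: int = 50) -> List[Chunk]:
    """Split text into overlapping character windows and tag each with metadata."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # distance between consecutive window starts
    chunks = []
    for index, start in enumerate(range(0, len(text), step)):
        piece = text[start:start + size]
        chunks.append(
            Chunk(piece, {"source": source, "chunk_index": index, "start_char": start})
        )
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

Each chunk records where it came from, so a RAG pipeline could cite the source file and character offset alongside the retrieved text. A real pipeline would more likely split on sentence or section boundaries rather than fixed character counts.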