01. Performs embedding-based semantic chunking to improve Retrieval-Augmented Generation (RAG) performance.
02. 0 GitHub stars
03. Extracts images from PDFs as ImageContent for direct AI analysis.
04. Supports over 36 document formats across 15+ categories, including PDF, DOCX, HTML, and source code.
05. Reduces token usage by approximately 40% with the Token-Oriented Object Notation (TOON) output format.
06. Offers built-in MCP prompts for common document analysis tasks such as summarization, entity extraction, and Q&A.
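The embedding-based semantic chunking in the list above can be illustrated with a minimal sketch. It is not this tool's implementation. It assumes the common approach of embedding each sentence and starting a new chunk wherever the similarity between neighbouring sentences drops below a threshold. The names `toy_embed`, `cosine`, and `semantic_chunks`, the threshold value, and the bag-of-words "embedding" are all hypothetical stand-ins; a real pipeline would plug in a sentence-embedding model.

```python
import math
import re
from collections import Counter


def toy_embed(sentence: str) -> Counter:
    """Stand-in embedding (assumption): a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(text: str, threshold: float = 0.2, embed=toy_embed) -> list[str]:
    """Group consecutive sentences; break where neighbour similarity < threshold."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    prev = embed(sentences[0])
    for sentence in sentences[1:]:
        cur = embed(sentence)
        if cosine(prev, cur) < threshold:
            chunks.append([sentence])    # topic shift: open a new chunk
        else:
            chunks[-1].append(sentence)  # same topic: extend the current chunk
        prev = cur
    return [" ".join(chunk) for chunk in chunks]


if __name__ == "__main__":
    sample = (
        "Cats purr when happy. Cats purr when fed. "
        "Invoices list the total due. Invoices list the tax due."
    )
    for chunk in semantic_chunks(sample):
        print(chunk)
```

Passing a different `embed` callable is the intended extension point. With a real embedding model that returns dense vectors, `cosine` would need to be replaced by a dense-vector equivalent.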
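To show why a TOON-style output can be more compact than JSON, here is a simplified sketch of the tabular form for a uniform list of flat records: the field names are declared once in a header and each record becomes one comma-separated row. The helper `encode_table` is hypothetical and deliberately incomplete (no quoting, escaping, or nesting), so it should not be read as the tool's encoder or as a full implementation of the format. It compares character counts only, not tokens.

```python
import json


def encode_table(name: str, rows: list[dict]) -> str:
    """Encode a uniform list of flat dicts as a header plus one line per row.

    Simplified sketch (assumption): values are written with str() and are
    assumed to contain no commas or newlines.
    """
    if not rows:
        return f"{name}[0]:"
    fields = list(rows[0])
    if any(list(row) != fields for row in rows):
        raise ValueError("all rows must share the same keys in the same order")
    cols = ",".join(fields)
    header = f"{name}[{len(rows)}]{{{cols}}}:"  # e.g. users[2]{id,name}:
    lines = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *lines])


if __name__ == "__main__":
    users = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
    compact = encode_table("users", users)
    as_json = json.dumps({"users": users})
    print(compact)
    print(len(compact), "characters vs", len(as_json), "as JSON")
```

The saving comes from not repeating keys, braces, and quotes for every record, so it grows with the number of rows. The actual token reduction depends on the tokenizer and the shape of the data.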