Operate fully offline using local OCR and embedding models, with optional cloud backends for enhanced OCR or large-scale embedding.
Organize content using named databases, collections, and automatic directory synchronization.
Ingest a wide range of file types: PDFs (text-based and scanned), images, spreadsheets, presentations, HTML, plain text, and source code in 30+ programming languages.
Perform semantic search that matches the meaning of your queries, finding relevant passages even when they share no exact keywords with the query.
Serve as an MCP server, giving LLM clients such as Claude Desktop and Claude Code direct access to your indexed content as context.
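To illustrate the idea behind semantic search, here is a minimal sketch of ranking passages by meaning rather than keywords. It is not this tool's implementation: it assumes passages have already been turned into embedding vectors, uses hand-made 3-dimensional toy vectors in place of real model output, and ranks by cosine similarity (a common choice, but an assumption here). The function names `cosine_similarity` and `semantic_search` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(
    query_vec: list[float],
    passages: list[tuple[str, list[float]]],
    top_k: int = 2,
) -> list[tuple[float, str]]:
    """Score every (text, embedding) pair against the query and return the top_k best."""
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy embeddings (hypothetical): a real system would get these from an embedding model.
passages = [
    ("How to reset a forgotten password", [0.90, 0.10, 0.00]),
    ("Quarterly revenue grew this year", [0.00, 0.20, 0.95]),
    ("Steps to recover access to your account", [0.85, 0.20, 0.05]),
]

# Pretend embedding of the query "I can't log in": no keyword overlap with either
# account-related passage, but its vector points in a similar direction.
query = [0.88, 0.15, 0.02]

for score, text in semantic_search(query, passages):
    print(f"{score:.3f}  {text}")
```

Both account-related passages outrank the revenue passage even though none of them contains the words "log in"; that direction-based matching is what lets semantic search find relevant text without exact keyword overlap.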