01 Content Quality Scoring for demoting boilerplate and low-value pages
02 Hybrid Search combining BM25 keyword matching with vector semantic search
03 600K+ indexed pages via its own PageRank-enabled web crawler
04 Japanese NLP support with SudachiPy for high-quality Japanese search
05 0 GitHub stars
06 Clean Content Extraction using Trafilatura to index only main content
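
The list does not say how the BM25 and vector results are combined for Hybrid Search. One common way to merge two ranked lists is reciprocal rank fusion (RRF). The sketch below shows it under stated assumptions: the function name `reciprocal_rank_fusion`, the constant `k=60`, and the sample page IDs are all illustrative and are not taken from this project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Assumption: each input list is ordered best-first. `k` dampens the
    influence of top ranks; 60 is a commonly used default, not a value
    confirmed by the project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it returns.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword (BM25) index and a vector index.
bm25_hits = ["page_a", "page_b", "page_c"]
vector_hits = ["page_c", "page_a", "page_d"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

RRF uses only rank positions, so it avoids having to normalize BM25 scores and vector similarities onto a common scale. A weighted sum of normalized scores is another option if the two signals need different weights.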