02 Rich metadata schema implementation for enhanced search filtering
03 Web crawling with HTML noise reduction and navigation filtering
04 Context-preserving chunking strategies to minimize data loss during ingestion
05 Structure-aware PDF chunking with automated table extraction
06 Topic-aware paragraph splitting for research notes and internal documentation
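Items 04 and 06 name chunking strategies without showing one. The following is a minimal sketch of one simple form of context-preserving chunking. It assumes plain text with blank lines between paragraphs, groups whole paragraphs into size-bounded chunks, and repeats the last paragraph(s) of each chunk at the start of the next so that context spanning a boundary is not lost. It does not detect topics. The function name `chunk_paragraphs` and its `max_chars` and `overlap` parameters are hypothetical and do not come from the source.

```python
def chunk_paragraphs(text: str, max_chars: int = 500, overlap: int = 1) -> list[str]:
    """Group paragraphs into chunks of roughly max_chars characters.

    Hypothetical helper: splits only at blank-line paragraph boundaries and
    repeats the last `overlap` paragraphs of each chunk at the start of the
    next chunk to preserve context across the boundary.
    """
    # Split on blank lines and drop empty fragments.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []

    for para in paragraphs:
        candidate = current + [para]
        if current and len("\n\n".join(candidate)) > max_chars:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append("\n\n".join(current))
            # Carry trailing paragraphs forward as context.
            # Guard overlap == 0, because current[-0:] would copy the whole list.
            current = (current[-overlap:] if overlap else []) + [para]
        else:
            current = candidate

    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    doc = "Intro paragraph.\n\nDetails about tables.\n\nClosing notes."
    for chunk in chunk_paragraphs(doc, max_chars=40):
        print(repr(chunk))
```

Splitting only at paragraph boundaries keeps sentences intact. The overlap trades some duplicated storage for context at chunk edges. A single paragraph longer than `max_chars` is kept whole as an oversized chunk rather than cut mid-text.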