01 Supports PDF, DOCX, Markdown, and source code extraction
02 0 GitHub stars
03 Generates structured JSON and JSONL datasets for model fine-tuning
04 Identifies and flags ambiguous or low-quality training examples
05 Automatically categorizes feedback into structural, substantive, and stylistic types
06 Extracts source-feedback-revision-context patterns from multiple documents
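
As a rough illustration of how the features above could fit together, the sketch below writes source-feedback-revision-context examples as JSONL, with a feedback category and a low-quality flag on each record. The field names, the `FeedbackExample` class, and the `is_low_quality` heuristic are all assumptions made for this example; the tool's actual schema and flagging rules are not described in this list.

```python
import json
from dataclasses import dataclass, asdict, replace

# Hypothetical record layout: the real schema is not documented here.
@dataclass
class FeedbackExample:
    source: str      # original passage
    feedback: str    # reviewer comment on the passage
    revision: str    # passage after the comment was applied
    context: str     # where the passage sits in the document
    category: str    # "structural", "substantive", or "stylistic"
    flagged: bool = False


def is_low_quality(example: FeedbackExample, min_chars: int = 10) -> bool:
    """Assumed heuristic: very short feedback or an unchanged revision."""
    too_short = len(example.feedback.strip()) < min_chars
    unchanged = example.revision.strip() == example.source.strip()
    return too_short or unchanged


def to_jsonl(examples: list[FeedbackExample]) -> str:
    """Serialize examples as JSONL, one JSON object per line."""
    lines = []
    for example in examples:
        # Copy the record with its flag set instead of mutating the input.
        checked = replace(example, flagged=is_low_quality(example))
        lines.append(json.dumps(asdict(checked), ensure_ascii=False))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        FeedbackExample(
            source="The results was significant.",
            feedback="Fix the subject-verb agreement.",
            revision="The results were significant.",
            context="Results section",
            category="stylistic",
        ),
    ]
    print(to_jsonl(sample))
```

Keeping one JSON object per line lets a fine-tuning pipeline stream the dataset without loading it all at once, and a boolean flag leaves the decision to drop or review a questionable example to a later step.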