Automated image extraction with structural analysis, including detection of shapes and text within diagrams.
Supports multiple document formats, including Word (.docx), PDF, Excel (.xlsx/.xls), RTF, and plain-text files.
Extracts embedded HTTP/HTTPS links from documents and validates them.
Provides fine-grained control over what is read, such as a page range for a PDF or specific worksheets for an Excel file.
Analyzes technical diagrams (e.g., flowcharts, architecture diagrams) using OpenCV.
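The link-extraction feature can be pictured with a minimal sketch. It is not this tool's implementation. It assumes the document's text has already been extracted, and `extract_links` and `is_valid_link` are hypothetical names used for illustration. The validation shown is purely syntactic (an `http`/`https` scheme plus a non-empty host). Any reachability check the tool may perform, such as issuing an HTTP request, is not shown.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern: a candidate HTTP/HTTPS URL runs until whitespace or a
# common delimiter (angle bracket, quote, closing parenthesis or bracket).
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)

# Punctuation that often trails a URL in prose but is not part of it.
TRAILING_PUNCT = ".,;:!?"


def extract_links(text: str) -> list[str]:
    """Hypothetical helper: return HTTP/HTTPS URLs in order of first
    appearance, with trailing punctuation stripped and duplicates removed."""
    urls = (m.group(0).rstrip(TRAILING_PUNCT) for m in URL_PATTERN.finditer(text))
    # dict.fromkeys keeps insertion order, so this deduplicates stably.
    return list(dict.fromkeys(urls))


def is_valid_link(url: str) -> bool:
    """Hypothetical helper: syntactic check only (http/https scheme and a
    non-empty host). It does not contact the server."""
    parsed = urlparse(url)
    return parsed.scheme.lower() in ("http", "https") and bool(parsed.hostname)


if __name__ == "__main__":
    sample = ("See https://example.com/docs. Mirror: http://example.org, "
              "and ftp://files.example.net.")
    for link in extract_links(sample):
        print(link, is_valid_link(link))
```

Splitting extraction from validation keeps the cheap, offline steps separate, so a network-based check could be layered on top of `is_valid_link` without changing how links are found.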