Extracts text and images from PDF files, with OCR support for scanned documents.
An MCP server, built on FastMCP, that extracts text and images from PDF files. It handles standard text extraction, runs OCR on scanned documents, and returns images as Base64-encoded data. A built-in web debugger simplifies testing and integrating these capabilities.
Key Features
01 Uses the MCP protocol for communication
02 Extracts text from PDFs page by page
03 16 GitHub stars
04 Includes a web debugging interface for testing
05 Extracts images from PDF pages as Base64-encoded data
06 Performs OCR to recognize text in scanned PDFs
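Because the server communicates over MCP, a client invokes its tools with JSON-RPC 2.0 `tools/call` requests. The sketch below only shows the general shape of such a request; the tool name `extract_text` and the arguments `pdf_path` and `page` are hypothetical placeholders, not names confirmed by this server, so check the tool list the server actually advertises.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 'tools/call' request in the shape MCP uses."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "extract_text", "pdf_path", and "page" are hypothetical names for illustration.
request = build_tool_call(1, "extract_text", {"pdf_path": "report.pdf", "page": 1})
print(json.dumps(request, indent=2))
```

In practice an MCP client library usually builds and sends this message for you; the sketch is only meant to make the wire format visible.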
Use Cases
01 Converting scanned documents to editable text via OCR
02 Extracting images from PDF files for use in other applications
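To use an extracted image in another application, the Base64 string has to be decoded back into bytes. The following is a minimal sketch using only the Python standard library; the helper name `save_base64_image` is hypothetical, and it assumes you have already pulled the Base64 string out of the server's response, whose exact structure is not described here.

```python
import base64
import tempfile
from pathlib import Path


def save_base64_image(b64_data: str, out_path: Path) -> int:
    """Decode a Base64 string and write the bytes to out_path.

    Returns the number of bytes written. Raises binascii.Error
    (a ValueError subclass) if the input is not valid Base64.
    """
    raw = base64.b64decode(b64_data, validate=True)
    out_path.write_bytes(raw)
    return len(raw)


# Stand-in payload: the 8-byte PNG signature, encoded the way a server might return it.
png_signature = b"\x89PNG\r\n\x1a\n"
sample_b64 = base64.b64encode(png_signature).decode("ascii")

out_file = Path(tempfile.mkdtemp()) / "page1_image1.png"
written = save_base64_image(sample_b64, out_file)
print(f"wrote {written} bytes to {out_file}")
```

Passing `validate=True` makes the decoder reject stray non-Base64 characters instead of silently skipping them, which helps catch truncated or corrupted payloads early.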