This Model Context Protocol (MCP) server uses Playwright to extract the main content of a web page and convert it into clean markdown. It filters out non-content elements such as navigation bars, headers, footers, and advertisements, so the output contains only the page's core information. Because Playwright renders pages in a real browser, the server can also handle dynamic, JavaScript-heavy sites. The output preserves essential formatting, including headings, bold text, code blocks, lists, and tables, which makes the server useful for collecting content and writing documentation from MCP-compatible AI clients.
Key Features
01 Identifies and extracts the main content of a web page
02 Produces clean markdown by filtering out non-content elements
03 Preserves rich formatting, including headings, code blocks, lists, and tables
04 Optionally includes image references and hyperlinks in the output
05 Handles dynamic, JavaScript-heavy websites by rendering them with Playwright
Use Cases
01 Extracting clean markdown from documentation sites for use with AI models and in research.
02 Converting web articles, blogs, and other technical content into structured markdown that is easy to read.
03 Archiving website content by saving the extracted markdown to local files for offline access or content management.