Web Scraping & Data Collection Agent Skills

Discover Agent Skills for web scraping & data collection. Browse 17 skills for Claude, ChatGPT & Codex.

Google Search Research

Enables real-time web research and fact-checking using Google Search grounding within the Claude Code environment.

Blog Profile Analyzer

Analyzes blogs and online publications to extract deep insights into author perspectives, political leanings, and hidden biases.

Video Downloader

Downloads high-quality videos and audio from YouTube and other platforms for offline viewing, archiving, and editing.

Web Scraping & Content Extraction Tools

Automates documentation collection and structured data extraction using Playwright, BeautifulSoup, and Scrapy templates.

Crawl4AI Web Scraper

Converts complex web pages into clean, LLM-friendly markdown for seamless data extraction and processing.

Rip Video

Extracts audio, subtitles, and cover images from MP4 video files using MCP services and ffmpeg.

Video Metadata Parser & Downloader

Automates video metadata extraction and media downloading by processing structured task lists through MCP services.

Document Processor

Processes, analyzes, and transforms various file formats into structured data or new document types using a standardized CLI.

Tavily Web Search

Conducts real-time, AI-optimized web searches and content extraction to provide up-to-date information beyond Claude's knowledge cutoff.

Iterative Fact Verification

Ensures rigorous factual accuracy through systematic, multi-pass evidence validation and source tiering.

Web Crawler & Search Indexer

Crawls entire websites and builds searchable full-text indexes of content converted into Markdown format.

USPTO Patent and Trademark Database

Accesses USPTO APIs to perform comprehensive patent and trademark searches, analyze prosecution history, and track intellectual property assignments.

YouTube & Media Downloader

Downloads videos, audio, and subtitles from YouTube and other online platforms using yt-dlp.

Actress Classifier Development Guide

Streamlines the development of Python-based video classification systems with optimized scraping and incremental database management.

AI Tech Digest

Curates specialized AI technology news and technical insights using targeted search strategies and quality filtering rules.

Industry Research

Conducts comprehensive market analysis and trend forecasting across the consumer, technology, healthcare, and finance sectors.

Zhipu AI Search

Performs intelligent web searches via the Zhipu search engine with automated relative date resolution.

GitHub Harvester

Extracts and processes comprehensive data from GitHub repositories for ingestion into RAG pipelines and LLM knowledge bases.

Plan-Driven Development Workflow

Executes a structured, plan-driven implementation workflow that prioritizes context discovery and systematic validation for the kurly-crawler project.

X/Twitter Content Scraper

Extracts and analyzes posts, threads, profiles, and media from X (formerly Twitter) directly within your Claude workflow.

Exa Semantic Search

Powers Claude Code with semantic search, similar content discovery, and structured research capabilities via the Exa API.

Documentation Loader

Automates the retrieval and conversion of online framework documentation into local Markdown files for enhanced AI context.

Web Search (DuckDuckGo)

Enables instant web search capabilities using DuckDuckGo to retrieve real-time documentation, news, and technical resources without API keys.

DuckDuckGo Web Search

Integrates real-time web search capabilities using the DuckDuckGo engine to find documentation, news, and technical resources without API keys.

Article Extractor

Extracts clean, readable text from web articles and blog posts by removing ads, navigation, and clutter.

Wget URL Reader

Downloads content from any URL using the wget command-line utility.

Threads Scraper

Scrapes and extracts post data from Threads profiles using automated browser navigation and authentication.

Brave Search Integration

Performs web searches and extracts readable markdown content using the Brave Search API, with no browser required.

Blockchain Data Pipeline Validation

Validates blockchain data collection pipelines using a systematic 5-step empirical workflow to ensure data integrity and storage efficiency.

Hot List Data Enrichment

Automates company data enrichment for investment dashboards by fetching employee counts, job postings, and news mentions.
