Web Scraping & Data Collection Agent Skills

Discover Agent Skills for web scraping & data collection. Browse 17 skills for Claude, ChatGPT & Codex.

PDF Processing

Extract data, manipulate documents, and programmatically generate PDF files using specialized libraries and tools.

Google Places Search

Searches the Google Places API for business details, location reviews, and geographic coordinates directly from the command line.

Competitive Ads Extractor

7,895

Extracts and analyzes competitor advertising strategies across platforms to provide actionable messaging and creative insights.

Video Downloader

7,895

Downloads high-quality videos and audio from YouTube and other platforms for offline viewing, editing, or archival.

Comprehensive Research Agent

7,304

Ensures high-fidelity web research through structured source validation, error recovery protocols, and transparent reasoning cycles.

Web Research

7,213

Orchestrates a structured, multi-agent workflow to conduct deep-dive research, synthesize information from multiple sources, and generate comprehensive reports.

Firecrawl Web Scraper

3,382

Scrapes web content and extracts structured data from any URL or search query using the Firecrawl MCP.

URL to Markdown Converter

3,229

Converts any live webpage into clean, structured Markdown format using Chrome CDP for full JavaScript rendering.

X to Markdown Converter

2,275

Converts X (Twitter) tweets, threads, and articles into clean Markdown files with YAML front matter.

Competitive Ads Extractor

2,257

Extracts and analyzes competitor advertisements from major ad libraries to provide actionable insights for messaging and creative strategy.

YouTube Video Downloader

2,257

Downloads YouTube videos and audio files with customizable quality settings and format options directly within Claude Code.

PubMed Database Explorer

2,188

Automates biomedical literature searches and programmatic data extraction from the PubMed database using E-utilities and advanced MeSH queries.
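As a rough illustration of what an E-utilities search with a MeSH query looks like, the sketch below only builds an ESearch request URL and does not send it. It is not taken from the skill itself: the helper name `build_esearch_url` and the example query are hypothetical, while the endpoint and the `db`, `term`, `retmax`, and `retmode` parameters are the public NCBI ESearch ones.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities base URL.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, retmax=20):
    """Build (but do not send) an ESearch URL for the PubMed database.

    `build_esearch_url` is a hypothetical helper for illustration only.
    """
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": term,        # query string, may contain MeSH field tags
        "retmax": retmax,    # maximum number of PMIDs to return
        "retmode": "json",   # ask for a JSON response
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example query restricted to a MeSH heading (illustrative only).
url = build_esearch_url('"neoplasms"[MeSH Terms] AND review[pt]', retmax=5)
print(url)
```

The returned URL can be fetched with any HTTP client. NCBI asks heavy users to supply an API key and respect rate limits, which this sketch leaves out.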

USPTO Patent & Trademark Database

2,188

Accesses USPTO APIs to perform comprehensive patent and trademark searches, retrieve examination histories, and analyze intellectual property data.

ClinicalTrials.gov Database Skill

2,188

Accesses the ClinicalTrials.gov API v2 to search, filter, and export clinical study data for medical research and patient matching.

Perplexity Search

2,066

Performs AI-powered web searches with real-time information and source citations to access data beyond the model's knowledge cutoff.

Reddit Data Retriever

1,439

Fetches Reddit content and research data using the Gemini CLI to bypass web access restrictions and 403 errors.

FireCrawl SDK Patterns

1,206

Implements production-ready design patterns and best practices for FireCrawl SDK integrations in TypeScript and Python.

YouTube Transcript Extractor

1,158

Extracts subtitles and transcripts from YouTube videos into local text files using CLI tools or browser automation.

Deep Research Workflow

1,158

Orchestrates multi-agent parallel workflows to perform comprehensive web research, competitive analysis, and data synthesis into structured reports.

FireCrawl Architecture Architect

1,056

Provides validated architectural blueprints for scaling FireCrawl integrations from MVPs to enterprise-grade microservices.

FireCrawl Reference Architecture

1,031

Implements production-ready project structures and architectural patterns for robust FireCrawl-based web scraping applications.

FireCrawl Cost Tuning

983

Optimizes FireCrawl operational costs through intelligent tier selection, usage monitoring, and budget-aware implementation strategies.

FireCrawl Core Workflow B

983

Executes secondary FireCrawl workflows to complement primary data collection and automated web scraping tasks.

FireCrawl Reliability Patterns

983

Implements robust reliability patterns like circuit breakers, idempotency, and graceful degradation for production-grade FireCrawl integrations.
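For readers unfamiliar with the circuit breaker pattern named above, here is a minimal, generic sketch. It is not the skill's own code and does not call any FireCrawl API: the `CircuitBreaker` class, its parameters, and the injected `clock` are assumptions made for illustration.

```python
import time


class CircuitBreaker:
    """Stops calling a failing function until a cooldown has elapsed.

    Hypothetical illustration; names and defaults are assumptions.
    """

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cooldown in seconds
        self.clock = clock                # injectable for testing
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Still cooling down: fail fast without calling func.
                raise RuntimeError("circuit open: skipping call")
            # Cooldown elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open the circuit
            raise
        self.failures = 0  # any success closes the circuit fully
        return result
```

In use, the scraping request would be wrapped as `breaker.call(fetch_page, url)`, where `fetch_page` stands for whatever function performs the request. Callers can catch the `RuntimeError` to fall back to cached data, which is one form of the graceful degradation mentioned above.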

FireCrawl Performance Tuning

983

Optimizes FireCrawl API performance using advanced caching, request batching, and connection pooling strategies.

FireCrawl Advanced Troubleshooting

983

Resolves complex FireCrawl errors using systematic evidence collection and deep-layer diagnostic techniques.

FireCrawl Installation & Authentication

983

Automates the installation and configuration of FireCrawl SDKs and API authentication for web scraping projects.

Exa Core Secondary Workflow

983

Executes optimized secondary search and data retrieval tasks using the Exa API to complement primary research workflows.

Exa Core Workflow A

983

Executes the primary integration workflow for the Exa search engine to implement core search and data retrieval features.

FireCrawl Core Workflow A

983

Automates the primary web crawling and data extraction process using the FireCrawl API to generate LLM-ready content.
