Web Scraping & Data Collection Agent Skills

Discover Agent Skills for web scraping & data collection. Browse 17 skills for Claude, ChatGPT & Codex.

Deep Research Workflow

Orchestrates multi-agent parallel workflows to perform comprehensive web research, competitive analysis, and data synthesis into structured reports.

1,158

YouTube Transcript Extractor

Extracts subtitles and transcripts from YouTube videos into local text files using CLI tools or browser automation.

1,158

FireCrawl Architecture Architect

Provides validated architectural blueprints for scaling FireCrawl integrations from MVPs to enterprise-grade microservices.

1,056

FireCrawl Reference Architecture

Implements production-ready project structures and architectural patterns for robust FireCrawl-based web scraping applications.

1,031

FireCrawl Advanced Troubleshooting

Resolves complex FireCrawl errors using systematic evidence collection and deep-layer diagnostic techniques.

983

FireCrawl Cost Tuning

Optimizes FireCrawl operational costs through intelligent tier selection, usage monitoring, and budget-aware implementation strategies.

983

Exa Core Secondary Workflow

Executes optimized secondary search and data retrieval tasks using the Exa API to complement primary research workflows.

983

Exa Core Workflow A

Executes the primary integration workflow for the Exa search engine to implement core search and data retrieval features.

983

FireCrawl Performance Tuning

Optimizes FireCrawl API performance using advanced caching, request batching, and connection pooling strategies.

983

FireCrawl Installation & Authentication

Automates the installation and configuration of FireCrawl SDKs and API authentication for web scraping projects.

983

FireCrawl Reliability Patterns

Implements robust reliability patterns like circuit breakers, idempotency, and graceful degradation for production-grade FireCrawl integrations.

983

FireCrawl Core Workflow A

Automates the primary web crawling and data extraction process using the FireCrawl API to generate LLM-ready content.

983

FireCrawl Core Workflow B

Executes secondary FireCrawl workflows to complement primary data collection and automated web scraping tasks.

983

FireCrawl Rate Limit Handler

Implements robust rate limiting, exponential backoff, and idempotency patterns for FireCrawl API integrations.

982

Crypto News Aggregator

Aggregates real-time cryptocurrency news from over 50 authoritative sources with advanced filtering and relevance scoring.

982

YouTube Transcript Extractor

Extracts and saves YouTube video subtitles or transcripts to local text files using command-line tools or automated browser interaction.

925

Z.AI Multimodal CLI

Integrates vision analysis, real-time web search, and GitHub exploration capabilities into Claude Code workflows.

896

ZAI CLI Integration

Enhances Claude with real-time web search, vision-based image analysis, and advanced GitHub repository exploration.

825

Vault Protocol Logo Extractor

Extracts and organizes brand logos for DeFi vault protocols by identifying homepage links and automating asset retrieval.

765

Canonical Event Deduplication

Normalizes and merges duplicate data from multiple sources using reputation scoring and semantic hash-based grouping.

585

Twitter Reader

Fetches Twitter/X post content and metadata into clean Markdown format using the Jina.ai API to bypass JavaScript restrictions.

384

Web to Markdown Converter

Transforms web pages into clean, readable Markdown files optimized for AI ingestion and local documentation.

347

Perplexity AI Search

Performs real-time AI web searches with citations using Perplexity models to provide up-to-date information and scientific literature.

324

Dev Opinions Scanner

Aggregates and synthesizes real-world developer perspectives from Hacker News, Reddit, and major technical communities.

313

Ark Research

Researches technical solutions and gathers cross-platform evidence to inform architecture and implementation decisions.

305

Reverse API Engineer

Transforms browser traffic into production-ready Python API clients through automated HAR analysis and code generation.

284

X Research

Conducts real-time agentic research and sentiment analysis across X/Twitter to gather developer insights and industry trends.

253

Recent Topic Research & Sentiment

Researches and synthesizes real-world community discussions from the last 30 days across Reddit, X, and the web.

253

Google Search & Web Access

Enables Claude to search the live web and fetch content from specific URLs to provide up-to-date information.

250

Tavily Web Search & Extraction

Equips Claude with high-performance web search capabilities and deep content extraction tools powered by the Tavily API.

247
