Discover Agent Skills for web scraping & data collection. Browse 17 skills for Claude, ChatGPT & Codex.
Automates the retrieval and normalization of academic paper metadata from arXiv to support research pipelines and literature reviews.
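arXiv exposes paper metadata through its public Atom API at `export.arxiv.org/api/query`. A minimal sketch of building such a query URL — the `arxiv_query_url` helper name and its defaults are illustrative assumptions, not part of the skill itself:

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

def arxiv_query_url(search_query: str, start: int = 0, max_results: int = 10) -> str:
    """Build a query URL for the public arXiv Atom API.

    Hypothetical helper: the endpoint and the search_query / start /
    max_results parameters are real arXiv API parameters; the defaults
    here are assumptions for illustration.
    """
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"

# Example: the first ten results matching a phrase across all fields
url = arxiv_query_url('all:"large language models"', max_results=10)
```

Fetching that URL returns an Atom XML feed whose entries carry title, authors, abstract, and PDF links, which a pipeline would then normalize.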
Extracts and analyzes Reddit content including posts, comments, subreddits, and user profiles using the public JSON API.
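The public JSON API mentioned here works by appending `.json` to most Reddit listing URLs. A minimal sketch of that URL mapping — the `to_json_endpoint` helper is a hypothetical name, and note that real requests should send a descriptive `User-Agent` header to avoid rate limiting:

```python
from urllib.parse import urlsplit, urlunsplit

def to_json_endpoint(reddit_url: str) -> str:
    """Map a Reddit page URL to its public JSON equivalent by appending
    '.json' to the path, e.g. /r/python/hot/ -> /r/python/hot.json.

    Hypothetical helper illustrating the '.json' convention only; it does
    not perform the request itself.
    """
    parts = urlsplit(reddit_url)
    path = parts.path.rstrip("/")
    if not path.endswith(".json"):
        path += ".json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

The returned JSON contains the same listing data the HTML page renders (posts, comments, scores), which is what makes scraper-free extraction possible.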
Extracts and formats transcripts from YouTube videos using URLs or video IDs.
Converts diverse file formats including PDFs, Office documents, and media into structured, token-efficient Markdown for LLM processing.
Extracts text transcripts and captions from YouTube videos for content analysis, summarization, and documentation.
Empowers Claude with real-time web search, content extraction, and deep research capabilities using the Tavily API.
Enables high-quality web search, content extraction, and deep research capabilities using the Tavily API.
Converts existing applications into serverless Apify Actors for scalable web scraping and data processing workflows.
Monitors brand reputation and sentiment by scraping reviews and mentions across major social and review platforms.
Develops, debugs, and deploys high-performance serverless Actors for web scraping, data processing, and automation on the Apify platform.
Analyzes market trends, competitive pricing, and consumer behavior by extracting real-time data from major social and location-based platforms.
Extracts and analyzes competitor data across social media, maps, and travel platforms using specialized Apify Actors.
Automates data extraction from over 55 major platforms including Instagram, TikTok, and Google Maps using AI-driven Apify Actors.
Monitors and extracts emerging trends from Google, Instagram, Facebook, YouTube, and TikTok to power data-driven content strategies.
Integrates the Tavily Search API to provide real-time, clean, and LLM-optimized web search results for RAG pipelines.
Parses and extracts structured content from complex PDF documents using LlamaParse and agentic OCR capabilities.
Transforms unstructured files like PDFs, Word documents, and presentations into structured Pydantic models using LlamaExtract services.
Converts websites into LLM-ready markdown and structured data using the Firecrawl API.
Performs semantic and neural web searches to find content based on context and meaning rather than simple keywords.
Orchestrates multi-agent parallel workflows to conduct comprehensive research and generate structured, insight-driven reports.
Downloads high-quality videos and HLS streams from platforms like YouTube, Vimeo, and Mux using optimized workflows for yt-dlp and ffmpeg.
Replicates existing websites into production-ready Next.js 16 and Tailwind CSS v4 codebases using Firecrawl MCP.
Scrapes, crawls, and extracts LLM-optimized content from any website using the Firecrawl CLI.
Automates web content extraction using a progressive four-tier strategy to bypass bot detection and CAPTCHAs.
Implements a four-tier progressive fallback strategy to reliably extract web content from any URL, regardless of bot detection or JavaScript requirements.
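The progressive fallback pattern described in these two entries can be sketched as a loop over ordered tiers, escalating only when a cheaper tier fails. The tier names and the `fetch_with_fallback` helper below are illustrative assumptions, not the skills' actual implementation:

```python
from typing import Callable, Optional

def fetch_with_fallback(
    url: str,
    tiers: list[tuple[str, Callable[[str], Optional[str]]]],
) -> tuple[str, str]:
    """Try each (name, fetcher) tier in order; return the first tier that
    yields content. Hypothetical sketch of a progressive fallback strategy:
    a fetcher signals failure by returning None or raising.
    """
    last_error: Optional[Exception] = None
    for name, fetch in tiers:
        try:
            content = fetch(url)
        except Exception as exc:  # timeouts, blocks, CAPTCHA walls
            last_error = exc
            continue
        if content:
            return name, content
    raise RuntimeError(f"all tiers failed for {url}") from last_error
```

In practice the tiers might wrap a plain HTTP GET, a GET with browser-like headers, a headless browser render, and a proxy or scraping service, in increasing order of cost.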
Downloads high-quality video and audio content from YouTube and HLS-based streaming platforms while resolving common authentication and formatting issues.
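The yt-dlp/ffmpeg workflow referenced above typically selects the best separate video and audio streams and merges them into a single container. A sketch of assembling such an invocation — `-f`, `--merge-output-format`, and `-o` are real yt-dlp options, but the defaults and the `ytdlp_args` helper are assumptions for illustration:

```python
def ytdlp_args(url: str, out_template: str = "%(title)s.%(ext)s") -> list[str]:
    """Build a yt-dlp command line (hypothetical helper).

    bestvideo+bestaudio asks for the best separate streams; yt-dlp then
    invokes ffmpeg to merge them into the requested container.
    """
    return [
        "yt-dlp",
        "-f", "bestvideo+bestaudio/best",  # fall back to best combined stream
        "--merge-output-format", "mp4",
        "-o", out_template,
        url,
    ]
```

The resulting list can be passed to `subprocess.run`; ffmpeg must be on the PATH for the merge step to succeed.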
Implements sophisticated ID and content-based deduplication with reputation-aware canonical selection for multi-source data aggregation.
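The technique this entry names can be sketched as two steps: group records by explicit ID (falling back to a content hash when no ID exists), then keep the copy from the most reputable source. The `REPUTATION` scores and record schema below are illustrative assumptions:

```python
import hashlib
from collections import defaultdict

# Hypothetical source-reputation scores; the highest-scoring copy in each
# duplicate group is selected as canonical.
REPUTATION = {"official_api": 3, "scraper": 2, "forum": 1}

def dedupe(records: list[dict]) -> list[dict]:
    """ID- and content-based deduplication with reputation-aware
    canonical selection (sketch under the assumed schema above)."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        # Prefer an explicit ID; otherwise hash the normalized content.
        key = rec.get("id") or hashlib.sha256(
            rec["content"].strip().lower().encode()
        ).hexdigest()
        groups[key].append(rec)
    return [
        max(group, key=lambda r: REPUTATION.get(r["source"], 0))
        for group in groups.values()
    ]
```

Content normalization (here just strip + lowercase) is where real implementations vary most; stronger pipelines use shingling or minhash rather than exact hashes.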
Automates web content extraction and competitive intelligence by intelligently selecting between WebFetch, Tavily, and agent-driven browsers.
Automates intelligent web content extraction and competitive intelligence gathering through a multi-tiered tool selection framework.
Automates web content extraction and competitive monitoring by intelligently selecting the optimal tool for any target URL or research task.