Discover Agent Skills for web scraping & data collection. Browse 17 skills for Claude, ChatGPT & Codex.
Lists and manages archived snapshots from the Wayback Machine to track website history and recover lost content.
Manages local API response caching for Wayback Machine operations to optimize performance and ensure data freshness.
Locates and retrieves the most recent archived version of any URL from the Internet Archive's Wayback Machine.
Retrieves and calculates the full historical archive span for any URL using the Wayback Machine.
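The span calculation itself is simple once the earliest and latest snapshot timestamps are known: Wayback Machine APIs report captures as 14-digit `YYYYMMDDhhmmss` strings, which can be parsed and subtracted. A small sketch, with illustrative function names:

```python
from datetime import datetime

# Wayback Machine timestamps are 14 digits: YYYYMMDDhhmmss.
WAYBACK_FMT = "%Y%m%d%H%M%S"

def parse_wayback_timestamp(ts: str) -> datetime:
    return datetime.strptime(ts, WAYBACK_FMT)

def archive_span_days(first_ts: str, last_ts: str) -> int:
    """Whole days between the earliest and latest snapshots."""
    delta = parse_wayback_timestamp(last_ts) - parse_wayback_timestamp(first_ts)
    return delta.days

print(archive_span_days("20230101000000", "20240101000000"))
```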
Retrieves comprehensive GitHub user and organization profile data including repository counts, follower statistics, and account metadata.
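A sketch of the parsing side, assuming the GitHub REST API's public user endpoint (`GET https://api.github.com/users/{username}`), whose payload includes documented fields such as `public_repos`, `followers`, `type`, and `created_at`. The helper name and the sample values are illustrative.

```python
import json

def summarize_profile(payload: str) -> dict:
    """Pull a few headline fields out of a GitHub user API response."""
    data = json.loads(payload)
    return {
        "login": data["login"],
        "type": data["type"],            # "User" or "Organization"
        "public_repos": data["public_repos"],
        "followers": data["followers"],
        "created_at": data["created_at"],
    }

# Canned payload so the sketch runs offline; values are made up.
sample = json.dumps({
    "login": "octocat",
    "type": "User",
    "public_repos": 8,
    "followers": 1000,
    "created_at": "2011-01-25T18:44:36Z",
})

print(summarize_profile(sample)["login"])
```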
Retrieves the earliest archived snapshot of any URL from the Wayback Machine to identify a website's original version.
Archives URLs to the Internet Archive's Wayback Machine for permanent digital preservation and snapshot tracking.
Retrieves and manages historical visual snapshots of websites using the Internet Archive's Wayback Machine.
Converts batches of images and scanned documents into structured markdown files using local DeepSeek-OCR models via Ollama.
Orchestrates a multi-source image pipeline to download, validate, and normalize fighter photos from Wikimedia, Sherdog, and Bing.
Extracts and analyzes large PDF documents locally with semantic chunking to minimize token usage and maximize context efficiency.
Performs neural, context-aware web searches and deep research tasks to find high-quality information that keyword matching misses.
Converts any webpage into clean, formatted Markdown using Chrome CDP for full JavaScript rendering and metadata extraction.
Enables autonomous web scraping and content extraction using shot-scraper to interact with and retrieve data from websites.
Extracts and analyzes competitor advertisements from ad libraries to uncover winning messaging, pain points, and creative strategies.
Crawls and scrapes websites to extract structured article content using the FireCrawl API.
Crawls websites using the Tavily API to convert web pages into local markdown files for offline analysis and documentation retrieval.
Extracts clean, clutter-free article and blog content from URLs by stripping away ads, navigation, and unnecessary UI elements.
Conducts comprehensive, web-grounded research with automatic citations and structured data output directly from your terminal.
Tracks recent Initial Public Offerings and generates ready-to-import TradingView watchlists with enriched market data.
Converts complex PDF documents into clean, structured Markdown while preserving tables, formatting, and images for AI context.
Enables Claude to perform AI-powered web searches with real-time information and source citations using Perplexity models.
Configures sources, relevance weights, and domain interests for the Pattern Radar discovery tool.
Accesses comprehensive USPTO APIs for patent and trademark searches, examination history, and intellectual property analysis.
Downloads and processes YouTube video transcripts, subtitles, and captions with automatic fallback to AI-powered transcription.
Extracts subtitles and transcripts from YouTube videos and saves them as local text files with timestamps.
Automates resilient web content extraction using a four-tier fallback strategy to bypass bot detection and JavaScript hurdles.
Orchestrates parallel research agents across Perplexity, Claude, and Gemini to deliver synthesized, multi-perspective reports with source attribution.
Downloads YouTube videos and audio with customizable quality and format settings using yt-dlp.
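A sketch of how such quality and format settings can map onto a yt-dlp invocation. The `-f`/`--format` selector syntax and the `-o` output template are real yt-dlp features; the specific selector, template, and helper name here are just example choices.

```python
import shlex

def build_ytdlp_command(url: str, max_height: int = 1080,
                        out_template: str = "%(title)s.%(ext)s") -> list[str]:
    """Assemble a yt-dlp argument list capped at a given video height."""
    return [
        "yt-dlp",
        # Prefer best video up to max_height merged with best audio,
        # falling back to the best single file.
        "-f", f"bestvideo[height<={max_height}]+bestaudio/best",
        "-o", out_template,
        url,
    ]

cmd = build_ytdlp_command("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
print(shlex.join(cmd))
```

The command list would typically be handed to `subprocess.run(cmd)`; building it as a list avoids shell-quoting issues.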
Downloads high-quality videos and audio from YouTube and other platforms for offline viewing, editing, and archival.