Web Scraping & Data Collection Agent Skills

Discover Agent Skills for web scraping & data collection. Browse 17 skills for Claude, ChatGPT & Codex.

Advanced Web Research Agent

Conducts comprehensive web research by orchestrating parallel subagents to gather and synthesize information into structured reports.

PDF Text Extractor

Automates the downloading and text extraction of academic PDFs to provide high-fidelity evidence for research pipelines.

Web Research Agent

Conducts structured, multi-threaded web research by coordinating subagents to gather and synthesize complex information into comprehensive reports.

BioRxiv Database Research Tool

Enables efficient searching and retrieval of life sciences preprints from the bioRxiv server for research and analysis.

Bright Data Progressive Scraper

Retrieves web content through a four-tier progressive fallback strategy to bypass bot detection and access restrictions.

MarkItDown Document Converter

Converts complex file formats including PDF, Office documents, and media into clean Markdown optimized for LLM processing.

Video Downloader

Downloads high-quality videos and audio from YouTube and other platforms for offline access and archival.

Structured Web Research

Automates multi-step information gathering and synthesis using structured planning and parallel subagents.

BioRxiv Database Search

Searches and retrieves life sciences preprints from the bioRxiv database with advanced filtering and PDF download capabilities.

Bright Data Progressive Scraper

Implements a four-tier progressive scraping strategy to bypass bot detection and reliably extract web content.

Bright Data Progressive Scraper

Implements a four-tier progressive escalation strategy to reliably scrape web content and bypass advanced bot detection.

Video Downloader

Downloads high-quality video and audio content from YouTube and other platforms directly through your terminal workspace.

Bright Data Web Scraper

Automates web content retrieval using a progressive four-tier fallback strategy to bypass bot detection and access restrictions.

Web Research Agent

Conducts deep, multi-faceted web research by orchestrating parallel subagents to plan, gather, and synthesize complex information.

Structured Web Research

Conducts deep web investigations by delegating tasks to specialized subagents and synthesizing findings into organized reports.

Matrix Repomix

Packs external GitHub or local repositories into a token-efficient format for deep context analysis within Claude Code.

Advanced Progressive Web Scraper

Automates web content extraction using a four-tier fallback strategy to bypass bot detection and CAPTCHAs.

Structured Web Research

Conducts systematic web research through autonomous subagent delegation and multi-source synthesis.

Working Nomads Job Scraper

Scrapes and organizes remote job listings from workingnomads.com with advanced filtering and multi-format export capabilities.

Claude Community Insights & Feature Research

Analyzes Reddit community discussions to identify feature requests, user pain points, and emerging use cases for Claude AI and Claude Code.

YouTube Transcriber

Extracts subtitles and transcripts from YouTube videos directly into local text files using command-line tools or browser automation.

Browser Content Capture

Captures web content from JavaScript-rendered pages and authenticated sites using the agent-browser CLI.

Browser Content Capture

Captures web content from JavaScript-heavy, login-protected, and multi-page sites using the agent-browser CLI.

Documentation Scraper

Transforms documentation websites into structured, categorized reference files optimized for AI context and offline archives.

Google Places Search & Data

Queries the Google Places API to retrieve detailed location information, reviews, and search results directly within the Claude Code environment.

llms.txt Support

Detects and ingests LLM-optimized documentation via the llms.txt standard to accelerate context gathering for autonomous agents.

Documentation Scraper

Scrapes documentation websites and transforms them into organized, categorized reference files for AI context and offline archives.

Dev Opinions Scan

Aggregates and synthesizes technical opinions and developer reactions from major online communities like Reddit and Hacker News.

Z.AI CLI Multi-Tool

Enhances Claude with advanced vision analysis, real-time web searching, and deep GitHub repository exploration capabilities.

Nia - Intelligent Context & Documentation

Indexes and searches external repositories, documentation, and research papers to provide Claude with high-fidelity context for development tasks.

