Explore our collection of Agent Skills to enhance your AI workflow.
Combines multiple fine-tuned AI models into a single high-performance model without requiring additional training or expensive GPU resources.
Evaluates Large Language Models across 100+ industry-standard benchmarks using NVIDIA's enterprise-grade containerized architecture.
Quantizes Large Language Models to ultra-low bit precision for efficient inference and fine-tuning, without requiring calibration datasets.
Optimizes LLM serving and structured data generation with RadixAttention prefix caching for high-performance agentic workflows.
Implements programmable safety rails and runtime validation for LLM applications using NVIDIA's NeMo Guardrails framework.
Simplifies PyTorch distributed training across multiple GPUs, TPUs, and nodes with minimal code changes and a unified API.
Implements language-independent subword tokenization using BPE and Unigram algorithms for robust NLP model training and inference.
Optimizes Large Language Model inference for maximum throughput and ultra-low latency on NVIDIA GPUs.
Deploys and manages high-performance RLHF training pipelines for large-scale language models using Ray and vLLM acceleration.
Decomposes neural network activations into interpretable, sparse features using SAELens for deep mechanistic interpretability research.
Integrates Weights & Biases into your workflow to track machine learning experiments, visualize training metrics, and manage model artifacts in real time.
Facilitates causal interventions on PyTorch models using a declarative framework for mechanistic interpretability experiments.
Implements PyTorch-native agentic reinforcement learning workflows using Meta's torchforge library for scalable algorithm experimentation.
Facilitates high-performance distributed data processing and streaming for large-scale machine learning workloads.
Implements and optimizes Selective State Space Models (SSMs) for high-performance sequence modeling and long-context AI applications.
Generates high-quality sentence, text, and image embeddings for RAG, semantic search, and clustering using state-of-the-art transformer models.
Serves large language models with high throughput and low latency using PagedAttention and continuous batching.
Monitors, debugs, and evaluates large language model applications with comprehensive tracing and systematic testing tools.
Transcribes and translates audio across 99 languages using OpenAI's robust general-purpose speech recognition models.
Streamlines the fine-tuning of large language models using Axolotl through expert YAML configuration and advanced training pattern guidance.
Builds and optimizes complex AI systems using declarative programming instead of manual prompt engineering.
Automates ML workload deployment across multiple cloud providers with intelligent cost optimization and spot instance management.
Evaluates Large Language Models across 60+ academic benchmarks using standardized prompts and metrics for reproducible research.
Compresses large language models to 4-bit precision to enable high-speed inference and deployment on consumer-grade hardware.
Implements and manages RWKV architectures for efficient, linear-time AI inference and long-context processing.
Optimizes large language model fine-tuning using LoRA, QLoRA, and other parameter-efficient methods to significantly reduce memory and hardware requirements.
Builds sophisticated LLM applications using agents, chains, and Retrieval-Augmented Generation (RAG) with a unified interface.
Optimizes Large Language Models using 4-bit activation-aware weight quantization to achieve 3x faster inference with minimal accuracy loss.
Generates high-quality music and sound effects from text descriptions using Meta's AudioCraft library.
Evaluates AI code generation models using industry-standard benchmarks and pass@k metrics.