Does this skill work with any GPU architecture?

Yes, it includes specific optimization matrices for H100, A100, RTX 4090, and RTX 3090, with a generic fallback for other cards.

What is the benefit of layered configuration?

Layering ensures that 'quick_test' modes only reduce timesteps or epochs while keeping critical hardware settings like n_envs and torch.compile enabled for full speed.

Why is my A100 training speed lower than expected?

Slow training (e.g., <10,000 FPS) is usually caused by low environment counts (n_envs) or disabled torch.compile. This skill forces optimal hardware saturation.

How much faster can my training become?

By correctly saturating the GPU and using torch.compile, many users see a 10x+ increase in FPS compared to generic default configurations.

Does it support FP8 precision?

Yes, it automatically detects compute capability 9.0+ (H100) to enable FP8 precision for additional performance gains.

GPU-Aware ML Training Configuration

Name: GPU-Aware ML Training Configuration
Author: smith6jt-cop

bysmith6jt-cop

0•

Data Science & ML

Optimizes PPO training performance on A100 and H100 GPUs by automatically aligning hyperparameters with hardware capabilities.

This skill provides a specialized framework for configuring Reinforcement Learning (RL) training sessions, specifically PPO, to maximize throughput on high-end NVIDIA GPUs. It prevents common performance bottlenecks—such as under-utilization of A100/H100 cores and disabled compilation—by implementing a layered configuration strategy that detects hardware tiers before applying training modes. By ensuring optimal parallel environment counts, appropriate minibatch sizes, and enabling torch.compile, it helps developers achieve 10x or greater speedups in training frames per second (FPS).

Key Features

01FP8 precision support specifically for Hopper (H100) architecture

02Integrated torch.compile setup for 3-6x speed improvements after warmup

03Automatic GPU tier detection for H100, A100, and RTX 4090/3090

04Layered configuration logic that preserves hardware optimizations across test/prod modes

05Optimized environment scaling (n_envs) to ensure maximum GPU saturation

060 GitHub stars

Use Cases

01Standardizing ML training scripts to scale across different hardware tiers automatically

02Accelerating RL agent training in high-performance environments like Isaac Gym

03Debugging low GPU utilization and FPS drops on cloud-based A100 instances

What are Skills?·How to Install

Install with 🐟 Skill.Fish

npx skillfish add smith6jt-cop/skills_registry gpu-aware-training-config

For use in Claude.ai and ChatGPT

Download Skill