Which NVIDIA architectures are supported by this skill?

The skill provides specialized guidance for Hopper (Compute 9.0), Ada Lovelace (Compute 8.9), Ampere (Compute 8.0-8.6), and Turing (Compute 7.5).

Can it help me write better CUDA code?

Yes, it provides code snippets for feature detection, asynchronous pipelines, and architecture-specific optimizations like TF32 conversion and thread block clustering.

Does it help with backward compatibility?

Yes, it assists in creating fallback strategies for older architectures and checking compute capability at compile time to ensure portability.

How does this skill improve Machine Learning performance?

It advises on the best use of Tensor Cores, specialized data types like FP8 and BF16, and hardware-specific features like the Transformer Engine.

NVIDIA GPU Architecture Advisor

Name: NVIDIA GPU Architecture Advisor
Author: keith-mvs

bykeith-mvs

•

Data Science & ML

Optimizes CUDA code and GPU applications with architecture-specific insights for NVIDIA Hopper, Ada Lovelace, and Ampere hardware.

The GPU Architecture Advisor skill empowers developers to maximize performance on NVIDIA hardware by providing deep technical guidance tailored to specific GPU architectures. From leveraging 4th-gen Tensor Cores and the Transformer Engine in Hopper to implementing Shader Execution Reordering in Ada Lovelace, this skill helps identify the optimal compute capabilities and hardware features for any high-performance computing project. It assists with writing performance-portable CUDA code, determining hardware-specific throughput bottlenecks, and implementing efficient memory management strategies across different GPU generations.

Key Features

01Deep-dive guidance on Tensor Core (WMMA/CUTLASS) and RT Core utilization

02Detailed insights into memory bandwidth and L2 cache configurations

031 GitHub stars

04Compute capability feature detection and runtime fallback strategies

05Automated suggestions for async pipeline and thread block clustering

06Architecture-specific optimization for Hopper, Ada, Ampere, and Turing

Use Cases

01Optimizing deep learning kernels for FP8 support on H100 (Hopper) GPUs

02Implementing specialized ray tracing optimizations for Ada Lovelace hardware

03Refactoring legacy CUDA code to leverage Ampere-era TF32 and BF16 formats

What are Skills?·How to Install

Install with 🐟 Skill.Fish

npx skillfish add keith-mvs/nsight-copilot gpu-architecture-advisor

For use in Claude.ai and ChatGPT

Download Skill