Does this skill provide hardware acceleration?

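Yes. It supports N:M semi-structured pruning such as 2:4 sparsity, where two of every four consecutive weights are set to zero. This is the pattern that NVIDIA sparse tensor cores (Ampere and later) can accelerate; unstructured sparsity alone does not trigger that hardware path.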
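
To make the 2:4 pattern concrete, here is a minimal sketch in plain Python. It assumes plain weight magnitude as the selection criterion and a hypothetical helper name, `prune_2_of_4`; it is not the skill's own code, and Wanda or SparseGPT would rank weights by their own scores instead.

```python
def prune_2_of_4(row):
    """Zero the two smallest-magnitude weights in each group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude weights in this group
        # (magnitude is an assumption made for this illustration).
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
print(prune_2_of_4(row))
# -> [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Every group of four ends up with exactly two zeros, which is the regularity that lets hardware store and multiply only the surviving half.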

How much accuracy is typically lost during pruning?

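With methods like SparseGPT or Wanda, you can typically reach 50% unstructured sparsity with a small accuracy loss on standard LLM benchmarks, often less than 1% for the largest models. Smaller models and stricter patterns such as 2:4 usually lose more, so compare the pruned model against its baseline.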

What is the difference between Wanda and SparseGPT pruning?

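Wanda scores each weight by the product of its magnitude and the norm of the corresponding input activation, then removes the lowest-scoring weights without updating the rest. SparseGPT uses approximate second-order (Hessian) information to choose which weights to remove and to adjust the remaining weights to compensate, which preserves more accuracy at the cost of more computation.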
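
The Wanda criterion can be sketched in a few lines. The example below is an illustration under stated assumptions, not the skill's implementation: `wanda_prune` is a hypothetical name, weights and activations are plain Python lists rather than tensors, and scores are compared within each output row. SparseGPT's Hessian-based weight update is not shown.

```python
import math


def wanda_prune(weights, activations, sparsity=0.5):
    """Prune each output row by the score |w| * ||x||_2.

    weights:     list of rows, shape [out_features][in_features]
    activations: calibration inputs, shape [n_samples][in_features]
    """
    n_in = len(weights[0])
    # L2 norm of each input feature across the calibration samples.
    norms = [math.sqrt(sum(x[j] ** 2 for x in activations)) for j in range(n_in)]
    n_prune = int(n_in * sparsity)
    pruned = []
    for row in weights:
        scores = [abs(w) * norms[j] for j, w in enumerate(row)]
        # The lowest-scoring weights in this output row are removed.
        drop = set(sorted(range(n_in), key=lambda j: scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


weights = [[0.5, -2.0, 0.1, 1.0]]
activations = [[10.0, 0.1, 1.0, 0.5], [10.0, 0.1, 1.0, 0.5]]
print(wanda_prune(weights, activations))
# -> [[0.5, 0.0, 0.0, 1.0]]
```

Note that the largest weight, -2.0, is removed because its input feature is almost inactive, while the smaller weight 0.5 survives because its input is large. Magnitude-only pruning would have made the opposite choice.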

Can I prune a model without retraining it from scratch?

Yes, the skill focuses on one-shot pruning methods that use a small calibration dataset to compress the model without requiring a full training cycle.
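
As a rough picture of what a calibration pass collects, the sketch below accumulates per-feature activation statistics batch by batch, so the calibration data never has to be held in memory at once. The class name `ActivationNormTracker` and its methods are hypothetical and chosen for illustration; in practice these statistics would be gathered from a layer's inputs while the calibration samples run through the model.

```python
import math


class ActivationNormTracker:
    """Running per-feature L2 norms over calibration batches."""

    def __init__(self, n_features):
        self.sq_sums = [0.0] * n_features
        self.n_samples = 0

    def update(self, batch):
        # batch: list of samples, each a list of n_features activations.
        for sample in batch:
            for j, x in enumerate(sample):
                self.sq_sums[j] += x * x
            self.n_samples += 1

    def norms(self):
        return [math.sqrt(s) for s in self.sq_sums]


tracker = ActivationNormTracker(n_features=3)
tracker.update([[3.0, 0.0, 1.0]])
tracker.update([[4.0, 0.0, 1.0], [0.0, 2.0, 1.0]])
print(tracker.n_samples, tracker.norms())
```

Only forward passes over the calibration set are needed to fill these statistics; no gradients or optimizer steps are involved, which is what keeps one-shot pruning cheap compared with retraining.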

Model Pruning & LLM Compression

Author: Orchestra-Research

By Orchestra-Research • 3,983 GitHub stars • Data Science & ML

Reduces Large Language Model size and accelerates inference using advanced pruning techniques like Wanda and SparseGPT.

The Model Pruning skill provides a comprehensive framework for compressing LLMs to enable efficient deployment on constrained hardware without significant loss in accuracy. By implementing state-of-the-art methods like Wanda (weights multiplied by activations) and SparseGPT (second-order pruning), this skill allows developers to achieve up to 50% sparsity in a one-shot manner. It supports unstructured, structured, and hardware-optimized N:M sparsity patterns, making it an essential tool for AI researchers and engineers looking to optimize model performance for edge devices, mobile platforms, and high-throughput production environments.

Key Features

01 One-shot pruning using Wanda and SparseGPT algorithms

02 Calibration workflows for activation-aware weight removal

03 Support for NVIDIA-optimized N:M (2:4) structured sparsity

04 Performance evaluation pipelines for pruned vs. baseline models

05 3,983 GitHub stars

06 Layer-wise and iterative pruning strategies for accuracy recovery

Use Cases

01 Deploying large-scale models like Llama-2 on mobile or edge devices

02 Reducing memory footprints and cloud hosting costs for production LLMs

03 Accelerating inference throughput on hardware with sparse tensor cores

Install with 🐟 Skill.Fish

npx skillfish add orchestra-research/ai-research-skills model-pruning
