Which benchmarks are supported?

The skill supports HarmBench, JailbreakBench, ToxiGen, TruthfulQA, RobustBench, and AdvGLUE, covering safety, truthfulness, and adversarial robustness evaluation.

Does this skill help with OWASP compliance?

Yes, it maps results to OWASP Top 10 for LLM Applications 2025 categories such as Prompt Injection (LLM01), Sensitive Information Disclosure (LLM02), and Data and Model Poisoning (LLM04).
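
As a rough illustration of what such a mapping could look like, the sketch below groups benchmark findings under OWASP category IDs. The category names follow the OWASP 2025 list; the `Finding` class, the `kind` labels, and the `group_by_owasp` function are hypothetical names chosen for this example and are not taken from the skill itself.

```python
from collections import defaultdict
from dataclasses import dataclass

# Official OWASP Top 10 for LLM Applications 2025 names for the three
# categories mentioned in the answer above.
OWASP_LLM_2025 = {
    "LLM01": "Prompt Injection",
    "LLM02": "Sensitive Information Disclosure",
    "LLM04": "Data and Model Poisoning",
}

# Hypothetical finding labels mapped to category IDs (an assumption for
# illustration, not the skill's actual schema).
KIND_TO_CATEGORY = {
    "prompt_injection": "LLM01",
    "data_leak": "LLM02",
    "poisoned_training_data": "LLM04",
}


@dataclass
class Finding:
    finding_id: str
    kind: str  # hypothetical label produced by a benchmark run


def group_by_owasp(findings):
    """Group finding IDs by OWASP category ID; unknown kinds go to 'UNMAPPED'."""
    grouped = defaultdict(list)
    for finding in findings:
        category = KIND_TO_CATEGORY.get(finding.kind, "UNMAPPED")
        grouped[category].append(finding.finding_id)
    return dict(grouped)


if __name__ == "__main__":
    demo = [Finding("jb-001", "prompt_injection"), Finding("pii-002", "data_leak")]
    for category, ids in group_by_owasp(demo).items():
        print(category, OWASP_LLM_2025.get(category, "unmapped"), ids)
```

Keeping unmapped findings in their own bucket, rather than dropping them, makes gaps in the mapping visible during an audit.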

Can I use this for vision models?

Yes, the skill includes specific configurations for LLM, vision, multimodal, and embedding model types.

What metrics does it provide?

It provides detailed metrics such as Attack Success Rate (ASR), Toxicity Scores, Accuracy Disparity, and Robust Accuracy to help compare model performance.
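
For readers unfamiliar with these metrics, here is a minimal sketch of how two of them are commonly computed: Attack Success Rate as the fraction of attack attempts that succeed, and Robust Accuracy as the fraction of adversarially perturbed inputs still classified correctly. The `AttackResult` class and both function names are assumptions made for this example; the skill's own implementation may differ.

```python
from dataclasses import dataclass


@dataclass
class AttackResult:
    prompt_id: str
    attack_succeeded: bool  # hypothetical verdict from a judge or classifier


def attack_success_rate(results):
    """Fraction of attack attempts judged successful (lower is safer)."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(r.attack_succeeded for r in results) / len(results)


def robust_accuracy(labels, adversarial_predictions):
    """Fraction of adversarial inputs the model still labels correctly."""
    if len(labels) != len(adversarial_predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("labels must not be empty")
    correct = sum(y == p for y, p in zip(labels, adversarial_predictions))
    return correct / len(labels)


if __name__ == "__main__":
    runs = [AttackResult("p1", True), AttackResult("p2", False)]
    print("ASR:", attack_success_rate(runs))
    print("Robust accuracy:", robust_accuracy([1, 0, 1, 1], [1, 1, 1, 0]))
```

Both values are ratios in the range 0 to 1, so they can be compared directly across models evaluated on the same dataset.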

What is the benchmark-datasets skill used for?

It is used to run standardized security and safety tests against AI models to identify vulnerabilities like jailbreaks, toxicity, and adversarial weaknesses.

AI Security & Safety Benchmarks

Name: AI Security & Safety Benchmarks
Author: pluginagentmarketplace

Security & Testing

Evaluates AI model security, robustness, and safety using standardized datasets like HarmBench, JailbreakBench, and AdvGLUE.

This skill provides a framework for running industry-standard benchmarks against AI models to assess vulnerabilities, bias, and safety risks. By integrating datasets such as HarmBench for harmful behaviors, JailbreakBench for jailbreak resistance, and RobustBench for adversarial robustness, it lets security researchers and developers quantify an AI system's resistance to attacks and its alignment with safety standards. It connects model development to security auditing by providing structured mappings to the OWASP Top 10 for LLM Applications and the NIST AI Risk Management Framework.

Key Features

01 Adversarial Robustness Assessment (RobustBench, AdvGLUE)

02 Jailbreak & Prompt Injection Testing (JailbreakBench, AdvBench)

03 Standardized Safety Evaluation (HarmBench, ToxiGen, TruthfulQA)

04 Privacy & Data Extraction Audits (Membership Inference, Model Inversion)

05 Mapping to OWASP LLM 2025 and NIST AI RMF standards

Use Cases

01 Red teaming Large Language Models (LLMs) before production deployment

02 Auditing AI models for compliance with safety, bias, and truthfulness requirements

03 Benchmarking the adversarial robustness of vision, multimodal, and embedding models

Install with 🐟 Skill.Fish

npx skillfish add pluginagentmarketplace/custom-plugin-ai-red-teaming benchmark-datasets

A downloadable version of the skill is available for use in Claude.ai and ChatGPT.
