Can I use this skill to implement RAG?

Yes. It provides standard patterns for generating embeddings, querying Cloudflare Vectorize, and passing the retrieved context to an LLM so its responses are grounded in your own data.
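Here is a minimal TypeScript sketch of that RAG flow. It uses narrowed stand-in interfaces (AiLike, VectorIndexLike) instead of the real runtime types, so it runs on its own. The model IDs are only examples. Storing each chunk's source text under metadata.text is a hypothetical convention for this sketch, not something the skill prescribes.

```typescript
// Narrowed stand-ins for the Workers AI and Vectorize bindings (assumption:
// only the members used below are modelled; real Workers use the runtime types).
interface AiLike {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface VectorMatch {
  id: string;
  score: number;
  metadata?: Record<string, unknown>;
}

interface VectorIndexLike {
  query(
    vector: number[],
    options: { topK: number; returnMetadata: "all" },
  ): Promise<{ matches: VectorMatch[] }>;
}

// Example model IDs; substitute the models your project uses.
const EMBEDDING_MODEL = "@cf/baai/bge-base-en-v1.5";
const CHAT_MODEL = "@cf/meta/llama-3.1-8b-instruct";

async function answerWithRag(
  ai: AiLike,
  index: VectorIndexLike,
  question: string,
  topK = 3,
): Promise<string> {
  // 1. Embed the question. BGE models return { shape, data: number[][] }.
  const embedding = (await ai.run(EMBEDDING_MODEL, { text: [question] })) as {
    data: number[][];
  };
  const queryVector = embedding.data[0];

  // 2. Retrieve the closest stored chunks from the vector index.
  const { matches } = await index.query(queryVector, {
    topK,
    returnMetadata: "all",
  });

  // 3. Hypothetical convention: each vector was stored with metadata.text.
  const context = matches
    .map((m) => String(m.metadata?.text ?? ""))
    .filter((t) => t.length > 0)
    .join("\n---\n");

  // 4. Ask the LLM to answer using only the retrieved context.
  const result = (await ai.run(CHAT_MODEL, {
    messages: [
      { role: "system", content: `Answer using only this context:\n${context}` },
      { role: "user", content: question },
    ],
  })) as { response?: string };

  return result.response ?? "";
}
```

Keeping the bindings behind small interfaces also makes the flow easy to unit-test with stub objects in place of env.AI and the Vectorize index.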

How is AI usage billed in 2025?

Cloudflare bills Workers AI usage in Neurons. Each account gets a free allocation of 10,000 Neurons per day, and on a paid plan any usage beyond that allocation is billed per 1,000 Neurons.
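The billing rule comes down to simple arithmetic: you only pay for Neurons above the daily free allocation. This sketch takes the per-1,000-Neuron rate as a parameter because prices change. The 0.01 rate in the usage note is a placeholder, not Cloudflare's published price.

```typescript
// Daily free allocation quoted above.
const FREE_NEURONS_PER_DAY = 10_000;

/**
 * Estimate one day's Workers AI charge from the total Neurons consumed.
 * pricePer1000Neurons is deliberately a parameter: take the current rate
 * from Cloudflare's pricing page instead of hard-coding it.
 */
function estimateDailyCost(
  neuronsUsed: number,
  pricePer1000Neurons: number,
  freeNeurons: number = FREE_NEURONS_PER_DAY,
): number {
  // Only usage above the free allocation is billable; below it the cost is zero.
  const billable = Math.max(0, neuronsUsed - freeNeurons);
  return (billable / 1000) * pricePer1000Neurons;
}
```

For example, estimateDailyCost(25_000, 0.01) bills 15,000 Neurons, which is 15 x 0.01 = 0.15 in the account's billing currency.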

How does this skill handle the 2025 breaking changes?

It includes specific guidance on the new max_tokens default of 256 for text generation and on the BGE pooling parameter, where switching from mean to cls pooling produces embeddings that are not backwards compatible with existing ones.
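In practice, both changes mean being explicit in your request bodies. This sketch reuses a narrowed AiLike stand-in and example model IDs. It passes max_tokens on every generation call. It also records which pooling mode produced each batch of embeddings, so an index built with mean pooling is never queried with cls vectors. assertSamePooling is a hypothetical helper, not part of Workers AI.

```typescript
// Narrowed stand-in for the Workers AI binding (assumption for this sketch).
interface AiLike {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

type Pooling = "mean" | "cls";

// Generation: set max_tokens explicitly so answers are not cut off at the
// 256-token default.
async function generate(ai: AiLike, prompt: string, maxTokens = 1024): Promise<string> {
  const out = (await ai.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [{ role: "user", content: prompt }],
    max_tokens: maxTokens,
  })) as { response?: string };
  return out.response ?? "";
}

// Embeddings: pin the pooling mode and return it alongside the vectors so
// callers can store it with the index.
async function embed(
  ai: AiLike,
  texts: string[],
  pooling: Pooling,
): Promise<{ pooling: Pooling; vectors: number[][] }> {
  const out = (await ai.run("@cf/baai/bge-base-en-v1.5", { text: texts, pooling })) as {
    data: number[][];
  };
  return { pooling, vectors: out.data };
}

// Hypothetical guard: mean and cls vectors are not comparable, so refuse to
// query an index with vectors produced by a different pooling mode.
function assertSamePooling(indexPooling: Pooling, queryPooling: Pooling): void {
  if (indexPooling !== queryPooling) {
    throw new Error(
      `Pooling mismatch: index uses "${indexPooling}", query uses "${queryPooling}". Re-embed the index before switching.`,
    );
  }
}
```

Storing the pooling mode next to the index turns a silent drop in retrieval quality into an explicit error when someone changes the setting.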

Is streaming supported for text generation?

Yes. The skill recommends streaming for text generation and provides examples of it, since streaming avoids Worker timeouts and improves time-to-first-token.
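Here is a minimal Worker that streams a completion back to the client. It assumes a binding named AI and uses an example model ID. With stream: true, the binding returns a ReadableStream of server-sent events. The handler forwards that stream unchanged instead of buffering the full answer.

```typescript
// Narrowed stand-in for the Workers AI binding (assumption for this sketch).
interface AiLike {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface Env {
  AI: AiLike;
}

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { prompt } = (await request.json()) as { prompt: string };

    // With stream: true the result is a ReadableStream of server-sent
    // events rather than a { response } JSON object.
    const stream = (await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      messages: [{ role: "user", content: prompt }],
      stream: true,
    })) as ReadableStream<Uint8Array>;

    // Pass tokens through to the client as they arrive.
    return new Response(stream, {
      headers: { "content-type": "text/event-stream" },
    });
  },
};

export default worker;
```

Because the first bytes leave the Worker as soon as the model emits them, the client sees output immediately and the request is not held open waiting for the whole completion.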

Which AI models are supported by this skill?

The skill supports a wide array of models including Llama 4 Scout, Gemma 3, Mistral 3.1, Flux for images, Deepgram for audio, and BGE for embeddings.

Cloudflare Workers AI

Name: Cloudflare Workers AI
Author: jezweb


Cloud Infrastructure

Integrates Cloudflare's global GPU network into Claude Code to run high-performance LLMs, image generation, and text embeddings.

This skill provides comprehensive guidance for deploying and managing AI models on Cloudflare's serverless platform. It covers the 2025 updates for flagship models like Llama 4, Gemma 3, and Mistral 3.1, while addressing critical implementation details such as breaking changes in token defaults and BGE pooling parameters. Developers can use this skill to implement low-latency streaming, build robust RAG workflows with Vectorize, and leverage AI Gateway for advanced caching, logging, and cost tracking across Cloudflare's distributed infrastructure.
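The Workers AI binding documentation describes an optional third argument to run() that routes a call through AI Gateway, using a gateway object with an id and cache settings. The sketch below assumes that shape. The gateway ID and the one-hour default TTL are hypothetical values. Check the AI Gateway docs for the options your binding version supports.

```typescript
// Assumed shape of the gateway options accepted by the binding's run().
interface GatewayOptions {
  gateway: { id: string; skipCache?: boolean; cacheTtl?: number };
}

// Narrowed stand-in for the Workers AI binding (assumption for this sketch).
interface AiLike {
  run(model: string, inputs: Record<string, unknown>, options?: GatewayOptions): Promise<unknown>;
}

async function runThroughGateway(
  ai: AiLike,
  gatewayId: string,
  prompt: string,
  cacheTtlSeconds = 3600,
): Promise<string> {
  const out = (await ai.run(
    "@cf/meta/llama-3.1-8b-instruct",
    { messages: [{ role: "user", content: prompt }] },
    // Sending the request via the named gateway lets it cache, log and
    // report on the call; identical prompts can be served from cache.
    { gateway: { id: gatewayId, skipCache: false, cacheTtl: cacheTtlSeconds } },
  )) as { response?: string };
  return out.response ?? "";
}
```

A call such as runThroughGateway(env.AI, "my-gateway", "Summarise this page") would then appear in that gateway's logs and analytics. Here "my-gateway" is a hypothetical ID you would create in the dashboard first.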

Key Features

01 High-speed RAG implementation with optimized BGE and Gemma embeddings

02 Production-ready patterns for Flux and Leonardo image generation

03 Advanced audio capabilities with Deepgram Aura 2 and Whisper v3 Turbo

04 AI Gateway integration for caching, analytics, and neuron-based cost tracking

05 117 GitHub stars

06 Support for 2025 LLMs including Llama 4 Scout, GPT-OSS, and Gemma 3
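Here is a sketch of the Flux image-generation pattern. It assumes that @cf/black-forest-labs/flux-1-schnell returns its result as { image: <base64-encoded JPEG> }. Verify that response shape against the model's page before relying on it. The handler decodes the base64 payload and returns it as an image response.

```typescript
// Narrowed stand-in for the Workers AI binding (assumption for this sketch).
interface AiLike {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

// Decode base64 using atob, which exists in both Workers and Node 16+.
function base64ToBytes(b64: string) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

async function generateImage(ai: AiLike, prompt: string): Promise<Response> {
  // Assumed output shape: { image: string } holding base64 JPEG data.
  const out = (await ai.run("@cf/black-forest-labs/flux-1-schnell", { prompt })) as {
    image: string;
  };
  return new Response(base64ToBytes(out.image), {
    headers: { "content-type": "image/jpeg" },
  });
}
```

Returning raw bytes with an image content type lets a browser or <img> tag use the Worker URL directly. Returning the base64 string inside JSON is the alternative when the client wants to embed it as a data URI.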

Use Cases

01 Building serverless AI applications with low-latency streaming and global scaling

02 Implementing multimodal workflows involving vision, audio, and image generation

03 Developing high-performance RAG systems using Vectorize and BGE embeddings


Install with 🐟 Skill.Fish

npx skillfish add jezweb/claude-skills cloudflare-workers-ai

For use in Claude.ai and ChatGPT
