Skip to main content

[Preview] v1.79.3-stable - Built-in Guardrails on AI Gateway

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM

Deploy this versionโ€‹

docker run litellm
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.79.3.rc.1

Key Highlightsโ€‹

  • LiteLLM Custom Guardrail - Built-in guardrail with UI configuration support
  • Performance Improvements - /responses API 19ร— Lower Median Latency
  • Veo3 Video Generation (Vertex AI + Google AI Studio) - Use OpenAI Video API to generate videos with Vertex AI and Google AI Studio Veo3 models

Built-in Guardrails on AI Gatewayโ€‹


This release introduces built-in guardrails for LiteLLM AI Gateway, allowing you to enforce protections without depending on an external guardrail API.

  • Blocking Keywords - Block known sensitive keywords like "litellm", "python", etc.
  • Pattern Detection - Block known sensitive patterns like emails, Social Security Numbers, API keys, etc.
  • Custom Regex Patterns - Define custom regex patterns for your specific use case.

Get started with the built-in guardrails on AI Gateway here.


Performance โ€“ /responses 19ร— Lower Median Latencyโ€‹

This update significantly improves /responses latency by integrating our internal network management for connection handling, eliminating per-request setup overhead.

Resultsโ€‹

MetricBeforeAfterImprovement
Median latency3,600 ms190 msโˆ’95% (~19ร— faster)
p95 latency4,300 ms280 msโˆ’93%
p99 latency4,600 ms590 msโˆ’87%
Average latency3,571 ms208 msโˆ’94%
RPS2311,059+358%

Test Setupโ€‹

CategorySpecification
Load TestingLocust: 1,000 concurrent users, 500 ramp-up
System4 vCPUs, 8 GB RAM, 4 workers, 4 instances
DatabasePostgreSQL (Redis unused)
Configurationconfig.yaml
Load Scriptno_cache_hits.py

New Models / Updated Modelsโ€‹

New Model Supportโ€‹

ProviderModelContext WindowInput ($/1M tokens)Output ($/1M tokens)Features
Azureazure/gpt-5-pro272K$15.00$120.00Responses API, reasoning, vision, PDF input
Azureazure/gpt-image-1-mini---Image generation - per pixel pricing
Azureazure/container---Container API - $0.03/session
OpenAIopenai/container---Container API - $0.03/session
Coherecohere/embed-v4.0128K$0.12-Embeddings with image input support
Geminigemini/gemini-live-2.5-flash-preview-native-audio-09-20251M$0.30$2.00Native audio, vision, web search
Vertex AIvertex_ai/minimaxai/minimax-m2-maas196K$0.30$1.20Function calling, tool choice
NVIDIAnvidia/nemotron-nano-9b-v2---Chat completions

OCR Modelsโ€‹

ProviderModelCost Per PageFeatures
Azure AIazure_ai/doc-intelligence/prebuilt-read$0.0015Document reading
Azure AIazure_ai/doc-intelligence/prebuilt-layout$0.01Layout analysis
Azure AIazure_ai/doc-intelligence/prebuilt-document$0.01Document processing
Vertex AIvertex_ai/mistral-ocr-2505$0.0005OCR processing

Search Modelsโ€‹

ProviderModelPricingFeatures
Firecrawlfirecrawl/searchTiered: $0.00166-$0.0166/query10-100 results per query
SearXNGsearxng/searchFreeOpen-source metasearch

Featuresโ€‹

  • Azure

    • Add Azure GPT-5-Pro Responses API support with reasoning capabilities - PR #16235
    • Add gpt-image-1-mini pricing for Azure with quality tiers (low/medium/high) - PR #16182
    • Add support for returning Azure Content Policy error information when exceptions from Azure OpenAI occur - PR #16231
    • Fix Azure GPT-5 incorrectly routed to O-series config (temperature parameter unsupported) - PR #16246
    • Fix Azure doesn't accept extra body param - PR #16116
    • Fix Azure DALL-E-3 health check content policy violation by using safe default prompt - PR #16329
  • Bedrock

    • Fix empty assistant message handling in AWS Bedrock Converse API to prevent 400 Bad Request errors - PR #15850
    • Fix: Filter AWS authentication params from Bedrock InvokeModel request body - PR #16315
    • Fix Bedrock proxy adding name to file content, breaks when cache_control in use - PR #16275
    • Fix global.anthropic.claude-haiku-4-5-20251001-v1:0 supports_reasoning flag and update pricing - PR #16263
  • Gemini (Google AI Studio + Vertex AI)

    • Add gemini live audio model cost in model map - PR #16183
    • Fix translation problem with Gemini parallel tool calls - PR #16194
    • Fix: Send Gemini API key via x-goog-api-key header with custom api_base - PR #16085
    • Fix image_config.aspect_ratio not working for gemini-2.5-flash-image - PR #15999
    • Fix Gemini minimal reasoning env overrides disabling thoughts - PR #16347
    • Fix cache_read_input_token_cost for gemini-2.5-flash - PR #16354
  • Anthropic

    • Fix Anthropic token counting for VertexAI - PR #16171
    • Fix anthropic-adapter: properly translate Anthropic image format to OpenAI - PR #16202
    • Enable automated prompt caching message format for Claude on Databricks - PR #16200
    • Add support for Anthropic Memory Tool - PR #16115
    • Propagate cache creation/read token costs for model info to fix Anthropic long context cost calculations - PR #16376
  • Vertex AI

    • Add Vertex MiniMAX m2 model support - PR #16373
    • Correctly map 429 Resource Exhausted to RateLimitError - PR #16363
    • Add vertex_credentials support to litellm.rerank() for Vertex AI - PR #16266
  • Databricks

  • Deepgram

    • Return the diarized transcript when it's required in the request - PR #16133
  • Fireworks

    • Update Fireworks audio endpoints to new api.fireworks.ai domains - PR #16346
  • Cohere

    • Add cohere embed-v4.0 model support - PR #16358
  • Watsonx

    • Support reasoning_effort for watsonx chat models - PR #16261
  • OpenAI

    • Remove automatic summary from reasoning_effort transformation - PR #16210
  • XAI

    • Remove Grok 4 Models Reasoning Effort Parameter - PR #16265
  • Hosted VLLM

    • Fix HostedVLLMRerankConfig will not be used - PR #16352

New Provider Supportโ€‹


LLM API Endpointsโ€‹

Featuresโ€‹

Bugsโ€‹

  • General
    • Fix index field not populated in streaming mode with n>1 and tool calls - PR #15962
    • Pass aws_region_name in litellm_params - PR #16321
    • Add retry-after header support for errors 502, 503, 504 - PR #16288

Management Endpoints / UIโ€‹

Featuresโ€‹

  • Virtual Keys

    • UI - Delete Team Member with friction - PR #16167
    • UI - Litellm test key audio support - PR #16251
    • UI - Test Key Page Revert Model To Single Select - PR #16390
  • Models + Endpoints

    • UI - Add Model Existing Credentials Improvement - PR #16166
    • UI - Add Azure AD Token field and Azure API Key optional - PR #16331
    • UI - Fixed Label for vLLM in Model Create Flow - PR #16285
    • UI - Include Model Access Group Models on Team Models Table - PR #16298
    • Fix /model_group/info Returning Entire Model List for SSO Users - PR #16296
    • Litellm non root docker Model Hub Table fix - PR #16282
  • Guardrails

    • UI - Fix regression where Guardrail Entity Could not be selected and entity was not displayed - PR #16165
    • UI - Guardrail Info Page Show PII Config - PR #16164
    • Change guardrail_information to list type - PR #16127
    • UI - LiteLLM Guardrail - ensure you can see UI Friendly name for PII Patterns - PR #16382
    • UI - Guardrails - LiteLLM Content Filter, Allow Viewing/Editing Content Filter Settings - PR #16383
    • UI - Guardrails - allow updating guardrails through UI. Ensure litellm_params actually get updated in memory - PR #16384
  • SSO Settings

    • Support dot notation on ui sso - PR #16135
    • UI - Prevent trailing slash in sso proxy base url input - PR #16244
    • UI - SSO Proxy Base URL input validation and remove normalizing / - PR #16332
    • UI - Surface SSO Create errors on create flow - PR #16369
  • Usage & Analytics

    • UI - Tag Usage Top Model Table View and Label Fix - PR #16249
    • UI - Litellm usage date picker - PR #16264
  • Cache Settings

    • UI - Cache Settings Redis Add Semantic Cache Settings - PR #16398

Bugsโ€‹

  • General
    • UI - Remove encoding_format in request for embedding models - PR #16367
    • UI - Revert Changes for Test Key Multiple Model Select - PR #16372
    • UI - Various Small Issues - PR #16406

AI Integrationsโ€‹

Loggingโ€‹

  • Langfuse

    • Fix langfuse input tokens logic for cached tokens - PR #16203
  • Opik

    • Fix the bug with not incorrect attachment to existing trace & refactor - PR #15529
  • S3

    • S3 logger, add support for ssl_verify when using minio logger - PR #16211
    • Strip base64 in s3 - PR #16157
    • Add allowing Key based prefix to s3 path - PR #16237
    • Add Prometheus metric to track callback logging failures in S3 - PR #16209
  • OpenTelemetry

    • OTEL - Log Cost Breakdown on OTEL Logger - PR #16334
  • DataDog

    • Add DD Agent Host support for datadog callback - PR #16379

Guardrailsโ€‹

Secret Managersโ€‹


Spend Tracking, Budgets and Rate Limitingโ€‹

  • Cost Tracking
    • Fix OpenAI Responses API streaming tests usage field names and cost calculation - PR #16236

MCP Gatewayโ€‹

  • Configuration

Performance / Loadbalancing / Reliability improvementsโ€‹

  • Memory Leak Fixes

    • Resolve memory accumulation caused by Pydantic 2.11+ deprecation warnings - PR #16110
  • Session Management

    • Add shared_session support to responses API - PR #16260
  • Error Handling

    • Gracefully handle connection closed errors during streaming - PR #16294
    • Handle None values in daily spend sort key - PR #16245
  • Configuration

    • Remove minimum validation for cache control injection index - PR #16149
    • Improve clearing logic - only remove unvisited endpoints - PR #16400
  • Redis

    • Handle float redis_version from AWS ElastiCache Valkey - PR #16207
  • Hooks

    • Add parallel execution handling in during_call_hook - PR #16279
  • Infrastructure

    • Install runtime node for prisma - PR #16410

Documentation Updatesโ€‹

  • Provider Documentation

    • Docs - v1.79.1 - PR #16163
    • Fix broken link on model_management.md - PR #16217
    • Fix image generation response format - use 'images' array instead of 'image' object - PR #16378
  • General Documentation

    • Add minimum resource requirement for production - PR #16146
    • Add benchmark comparison with other AI gateways - PR #16248
    • LiteLLM content filter guard documentation - PR #16413
    • Fix typo of the word orginal - PR #16255
  • Security

    • Remove tornado test files (including test.key), fixes Python 3.13 security issues - PR #16342

New Contributorsโ€‹

  • @steve-gore-snapdocs made their first contribution in PR #16149
  • @timbmg made their first contribution in PR #16120
  • @Nivg made their first contribution in PR #16202
  • @pablobgar made their first contribution in PR #16194
  • @AlanPonnachan made their first contribution in PR #16150
  • @Chesars made their first contribution in PR #16236
  • @bowenliang123 made their first contribution in PR #16255
  • @dean-zavad made their first contribution in PR #16199
  • @alexkuzmik made their first contribution in PR #15529
  • @Granine made their first contribution in PR #16281
  • @Oodapow made their first contribution in PR #16279
  • @jgoodyear made their first contribution in PR #16275
  • @Qanpi made their first contribution in PR #16321
  • @ShimonMimoun made their first contribution in PR #16313
  • @andriykislitsyn made their first contribution in PR #16288
  • @reckless-huang made their first contribution in PR #16263
  • @chenmoneygithub made their first contribution in PR #16368
  • @stembe-digitalex made their first contribution in PR #16354
  • @jfcherng made their first contribution in PR #16352
  • @xingyaoww made their first contribution in PR #16246
  • @emerzon made their first contribution in PR #16373
  • @wwwillchen made their first contribution in PR #16376
  • @fabriciojoc made their first contribution in PR #16203
  • @jroberts2600 made their first contribution in PR #16273

Full Changelogโ€‹

View complete changelog on GitHub