The Motivation
The landscape of AI development has fundamentally shifted. Every week, thousands of teams deploy LLM-powered applications to production—chatbots answering customer questions, AI assistants drafting emails, agents booking appointments, medical AI systems triaging patient symptoms. The possibilities seem endless, and the barrier to entry feels lower than ever. You can spin up a GPT-4 integration in an afternoon and have something impressive working by dinner.
But here's the uncomfortable reality that hits about two weeks after launch: you have no idea what your LLM application is actually doing.
Your monthly OpenAI bill jumped from $500 to $5,000, but you can't pinpoint which prompts or users caused the spike. Your response times suddenly doubled, but you don't know if it's your code, the model, or network latency. A customer reports that the AI gave an incorrect answer three days ago, but you have no record of what they asked or what the model returned. You suspect some users are gaming your system with unnecessarily long prompts, but you can't prove it. Worst of all, you're about to present cost projections to your CEO, and your best estimate is "somewhere between three thousand and fifteen thousand dollars per month."
Traditional application performance monitoring (APM) tools like Datadog, New Relic, or Sentry weren't built for this. They can tell you if your API returned a 500 error, but they can't tell you that your average token consumption per request increased by 40% after you tweaked your system prompt. They can track HTTP response times, but they can't measure time-to-first-token for streaming responses. They can log errors, but they can't capture the content quality of responses that technically succeed but are unhelpful.
The questions this article answers are:
- "How do I track LLM costs, latency, and quality without rewriting my entire application?"
- "What's the fastest path to production-grade observability for OpenAI, Claude, Gemini, or any LLM provider?"
- "How can I trace multi-step agent workflows and understand which agents consume the most budget?"
- "What metrics should I actually be tracking for LLM applications?"
This guide provides the complete blueprint for adding enterprise-grade observability to any LLM application in under five minutes. By the end, you'll understand how to instrument your code with a single line change, view every request in a searchable dashboard, track per-user costs, and set up the foundation for advanced features like caching, rate limiting, and prompt management.
Helicone Architecture: How It Works
Quick Start: 60-Second Integration
What Helicone Captures Automatically
- Cost Tracking: input/output tokens × pricing for 300+ models. Accurate to the penny.
- Latency Metrics: total latency plus Time to First Token for streaming responses.
- Token Counts: input tokens, output tokens, total tokens per request.
- Model Details: model name, provider, version, parameters used.
- Status & Errors: HTTP status codes, error messages, retry attempts.
- Full Content: complete request body, response body, system prompts.
- User Analytics: per-user costs, request counts, usage patterns.
- Custom Properties: tag requests with department, environment, version, etc.
The Challenge
The problem is that traditional application monitoring tools fail completely when applied to Large Language Model applications. This is not a minor gap—it's a fundamental architectural mismatch.
Standard APM tools were designed for deterministic software. Your web server processes a request, queries a database, performs some calculations, and returns a response. The cost is essentially constant (server time), the latency is somewhat predictable, and success is binary: either the endpoint returned a 200 status code or it didn't. Traditional monitoring captures HTTP status codes, database query duration, memory usage, and error stack traces. This works beautifully for conventional software.
LLM applications break every single one of these assumptions. Every API call carries variable cost based on tokens consumed—a short response might cost $0.002 while a long one costs $0.04. The latency is unpredictable: model load times, queue depth, and token generation speed all fluctuate. A 200 status code tells you nothing about quality—a technically successful response might still be wrong, unhelpful, or off-topic. Streaming responses add another dimension: time-to-first-token (TTFT) matters more than total latency for user experience, but standard tools don't capture it.
Most critically, LLM applications often involve multi-step workflows where a single user query triggers dozens of API calls. A research agent might call GPT-4 to plan its approach, Claude to search documentation, GPT-4o-mini to summarize findings, and GPT-4 again to synthesize a final answer. Without tracing, you have no way to understand which step consumed your budget or introduced latency.
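To make concrete what per-step visibility buys you, here is a sketch over a hypothetical step log; the step names, models, and costs below are invented for illustration, not real trace data:

```python
# Hypothetical per-step log for one user query handled by a research agent.
# Step names, models, and dollar costs are illustrative only.
steps = [
    {"step": "plan",       "model": "gpt-4o",        "cost": 0.0210},
    {"step": "search",     "model": "claude-sonnet", "cost": 0.0145},
    {"step": "summarize",  "model": "gpt-4o-mini",   "cost": 0.0006},
    {"step": "synthesize", "model": "gpt-4o",        "cost": 0.0330},
]

# With tracing, these two questions become one-liners.
total = sum(s["cost"] for s in steps)
most_expensive = max(steps, key=lambda s: s["cost"])

print(f"Total query cost: ${total:.4f}")
print(f"Budget hog: {most_expensive['step']} ({most_expensive['model']})")
```

Without a trace tying these calls to one user query, the same spend appears in your provider bill as four unrelated line items.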
The consequences are concrete and expensive:
- Cost overruns: A team discovers their bill jumped from $1,200 to $8,500 in a week because a prompt change inadvertently doubled average input tokens
- Silent degradation: A healthcare AI assistant starts returning longer, less focused answers, but no alert fires because HTTP 200 is still returned
- Debugging nightmares: A customer reports an error, but you have no record of their conversation context or the exact prompt that triggered the problem
- Impossible optimization: You can't improve what you can't measure—without visibility into which prompts cost the most or which models perform best, you're flying blind
What's needed is observability purpose-built for LLMs: tracking input/output token counts, cost per request calculated from provider pricing, time-to-first-token for streaming, prompt versions, cache hit rates, per-user consumption metrics, and hierarchical traces for multi-agent workflows. Helicone was designed specifically to solve this observability gap.
Before vs. After Helicone
Without Helicone
- No visibility into costs per request
- Can't trace multi-step workflows
- No record of prompts or responses
- Unknown per-user consumption
- Debugging requires guesswork
- Bill surprises every month
- No quality metrics tracked
With Helicone
- $0.004 cost per request visible
- Hierarchical traces for 47-step workflows
- Full prompt/response history saved
- Per-user costs tracked automatically
- Debug with exact request context
- Predictable budgets with alerts
- TTFT, latency, quality metrics
Lucifying the Problem
Let's lucify this concept with an everyday analogy.
Imagine you're driving a car without a dashboard. No speedometer, no fuel gauge, no check engine light, no odometer. The engine runs, the wheels turn, and you're technically making progress down the road. But you have no idea how fast you're going, how much fuel you have left, whether anything is wrong under the hood, or how far you've traveled. You just drive and hope for the best.
That works fine for a short trip down familiar roads. But what happens on a long journey? You might run out of gas with no warning. You might be driving dangerously fast without realizing it. A small mechanical problem might escalate into catastrophic failure because you never saw the warning signs. You have no way to plan stops or estimate arrival time. And when something eventually goes wrong—and it will—you won't have any data to diagnose the problem.
This is what running LLM applications without observability feels like. Your application "works," in the sense that API calls go out and responses come back. But underneath:
- No speedometer = no latency visibility (you don't know if responses are fast or slow)
- No fuel gauge = no cost tracking (your budget drains invisibly)
- No check engine light = errors and degradation happen silently
- No odometer = no usage metrics (no idea how much work the system is doing)
Now imagine installing a comprehensive dashboard. Suddenly you can see your speed in real-time, watch the fuel gauge, get alerted when something needs attention, and track every mile traveled. You gain the confidence to drive faster because you can see what's happening. You can plan fuel stops. You catch small problems before they become big ones. Your whole relationship with the vehicle changes from reactive panic to proactive control.
That's what Helicone does for LLM applications. It gives you the dashboard that makes invisible operations visible, transforms vague anxiety into concrete metrics, and enables you to confidently operate and optimize production systems.
Limitation of this analogy: Driving is typically a single-person, single-vehicle activity, while LLM applications often involve complex multi-agent workflows with parallel operations. A better extension of the metaphor would be managing a fleet of vehicles—tracking multiple cars simultaneously, understanding which routes cost the most, coordinating between drivers—but the core principle remains: you can't manage what you can't see.
Lucifying the Tech Terms
To solve this, we first need to lucify the key technical terms that underpin LLM observability. Understanding these five concepts will clarify both why traditional monitoring fails and how Helicone's architecture succeeds.
Observability vs. Monitoring
Definition: Observability is the ability to understand the internal state of a system by examining its external outputs (logs, metrics, traces), enabling you to ask arbitrary questions about system behavior. Monitoring is the narrower practice of tracking predefined metrics and alerting when they cross thresholds.
Simple Example: Monitoring tells you "API latency exceeded 2 seconds." Observability lets you investigate why by examining the specific request that was slow, its token counts, the model version used, whether it hit cache, and the full prompt context.
Analogy: Monitoring is like a smoke detector—it tells you there's a fire, but not where it started or what's burning. Observability is like security cameras and sensor systems throughout your building—you can rewind, zoom in, examine the context, and understand exactly what happened and why.
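The distinction also shows up in code. A minimal sketch over hypothetical logged records (the field names are illustrative, not Helicone's actual schema): monitoring answers one predefined question, while observability lets you pose new questions against the same data:

```python
# Hypothetical logged request records -- fields are illustrative.
requests = [
    {"model": "gpt-4o",      "latency_ms": 2400, "input_tokens": 3200, "cached": False},
    {"model": "gpt-4o-mini", "latency_ms": 310,  "input_tokens": 450,  "cached": True},
    {"model": "gpt-4o",      "latency_ms": 2900, "input_tokens": 4100, "cached": False},
]

# Monitoring: one predefined question, a yes/no answer.
alert = any(r["latency_ms"] > 2000 for r in requests)

# Observability: ad-hoc follow-up questions against the same records.
slow = [r for r in requests if r["latency_ms"] > 2000]
avg_tokens_when_slow = sum(r["input_tokens"] for r in slow) / len(slow)
cache_hit_rate = sum(r["cached"] for r in requests) / len(requests)

print(alert)                      # True -- the smoke detector fired
print(avg_tokens_when_slow)      # 3650.0 -- slow requests carry big prompts
print(round(cache_hit_rate, 2))  # 0.33
```

The alert alone tells you something is wrong; the follow-up queries tell you why (here, slow requests correlate with large prompts).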
LLM Proxy
Definition: An LLM proxy is a server that sits between your application and LLM providers, intercepts API requests, logs them, optionally modifies them (for caching, routing, etc.), forwards them to the actual provider, and returns responses to your app—all while capturing metadata for observability.
Simple Example: Instead of your app calling api.openai.com directly, it calls oai.helicone.ai which forwards the request to OpenAI, logs it to ClickHouse, and returns the response. Your app sees no difference, but every call is now visible in a dashboard.
Analogy: Think of a proxy like a security checkpoint at an airport. Every traveler (API request) passes through, gets logged (passport scanned), potentially gets screened or routed to different gates (caching, rate limiting), and continues to their destination. The checkpoint doesn't prevent travel—it adds visibility and control without changing the traveler's final destination.
Time to First Token (TTFT)
Definition: Time to first token measures the latency from when you submit a request to when the model returns its first generated token in a streaming response. This metric captures model load time, queue waiting, and the initialization phase before text generation begins.
Simple Example: You ask a streaming LLM "Summarize this 10-page document" and see the first word appear in 1.2 seconds. That's your TTFT. The remaining tokens stream over the next 8 seconds, but the user's sense of responsiveness was set by that initial 1.2-second delay.
Analogy: TTFT is like the time between ordering food at a restaurant and seeing your server bring the first plate. Even if the full meal takes 30 minutes, seeing something arrive quickly makes you feel attended to. A 30-second wait before the first plate would feel agonizing, even if the remaining dishes arrive quickly thereafter.
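Measured in code, TTFT is just a timestamp taken when the first chunk of a stream arrives. A minimal sketch, using a simulated token generator in place of a real streaming call (with a real client you would pass the iterator from client.chat.completions.create(..., stream=True)):

```python
import time

def measure_ttft(stream):
    """Return (seconds_to_first_token, tokens) for any streaming iterator."""
    start = time.perf_counter()
    ttft = None
    tokens = []
    for token in stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # clock stops at first token
        tokens.append(token)
    return ttft, tokens

# Simulated stream standing in for a real streaming completion.
def fake_stream():
    time.sleep(0.05)  # model "thinking" before the first token
    yield "Type"
    for tok in [" 2", " diabetes", " symptoms", " include", " ..."]:
        yield tok

ttft, tokens = measure_ttft(fake_stream())
print(f"TTFT: {ttft * 1000:.0f}ms, {len(tokens)} tokens streamed")
```

Note that total latency (time until the loop finishes) and TTFT are different numbers; a proxy like Helicone records both for you.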
Token Cost
Definition: Token cost is the financial expense of an LLM API call, calculated by multiplying input tokens by the provider's input price-per-token and output tokens by the output price-per-token. Prices vary dramatically by model (GPT-4: $10/M input tokens, GPT-4o-mini: $0.15/M input tokens).
Simple Example: Your prompt is 500 tokens (input) and the response is 300 tokens (output). If using GPT-4o-mini at $0.15/M input and $0.60/M output, your cost is: (500 × $0.15 / 1,000,000) + (300 × $0.60 / 1,000,000) = $0.000075 + $0.000180 = $0.000255 (~$0.26 per thousand such requests).
Analogy: Token cost is like paying for data by the byte when traveling abroad. Sending a short text message (small token count) costs pennies, but streaming a video (large token count) could cost dollars. You need to track every kilobyte to avoid bill shock. Similarly, tracking every token prevents LLM cost overruns.
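The calculation is simple enough to sketch directly. The function below reproduces the worked example, using the GPT-4o-mini per-million-token prices quoted above:

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in dollars: tokens multiplied by price per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# The worked example: 500 input / 300 output tokens on GPT-4o-mini
cost = token_cost(500, 300, input_price_per_m=0.15, output_price_per_m=0.60)
print(f"${cost:.6f}")  # $0.000255
print(f"~${cost * 1000:.2f} per thousand such requests")
```

Helicone performs this arithmetic automatically for every logged request, using its pricing table for 300+ models.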
AI Gateway
Definition: An AI Gateway is a unified API endpoint that presents a consistent OpenAI-compatible interface but can route requests to 100+ different LLM providers (OpenAI, Anthropic, Google, AWS Bedrock, etc.) based on model name, allowing you to switch providers without changing code.
Simple Example: You use a single OpenAI client pointed at ai-gateway.helicone.ai. When you request model="gpt-4o", it routes to OpenAI. When you request model="claude-sonnet-4", it routes to Anthropic. Same client, same format, different providers.
Analogy: An AI Gateway is like an international airport hub. You book all your flights through one airline (the gateway) using one app and one loyalty program, but your actual flights might be operated by partner airlines (different LLM providers). You never interact with each individual airline—the hub handles routing, but you get seamless travel.
Making the Blueprint
Now, let's make the blueprint for adding Helicone to your LLM application. This six-step plan shows you exactly what needs to happen, in order, without any code yet. Understanding the flow first makes execution straightforward.
Step 1: Create a Helicone Account
Sign up at helicone.ai and generate your API key. The free tier includes 10,000 requests per month with full feature access—no credit card required. Your API key will look like sk-helicone-XXXXXXXXXX and acts as your authentication token for all requests.
Why this step matters: Helicone needs to know who you are to associate logged requests with your account and enforce your plan limits.
Step 2: Configure Provider API Keys in Dashboard
Navigate to the Helicone dashboard's "Provider Keys" section and add your OpenAI, Anthropic, or other LLM provider API keys. These keys stay in Helicone's secure vault—you won't expose them in your application code when using the AI Gateway approach.
Why this step matters: The AI Gateway needs your provider keys to forward requests on your behalf. Storing them in Helicone's dashboard (rather than your codebase) centralizes key management and makes rotation easier.
Step 3: Change Base URL in Your Code
In your application, modify your LLM client initialization to point at Helicone's AI Gateway (https://ai-gateway.helicone.ai) instead of the provider's URL. This is typically a single-line change: update the base_url parameter.
Why this step matters: Routing your requests through Helicone's infrastructure is what enables logging, caching, and rate limiting. The base URL change redirects your traffic through the proxy.
Step 4: Add Authentication Header
Replace your provider API key with your Helicone API key in the client initialization. When using the AI Gateway, your Helicone key serves as the primary authentication—Helicone looks up your provider keys automatically.
Why this step matters: This authenticates you to Helicone's system and tells it which account should receive the logged data.
Step 5: Make Your First API Call
Run your application and make an LLM API call exactly as you normally would. Your code's request/response logic doesn't change—the only difference is the routing path. The call flows through Helicone, gets logged, forwards to the LLM provider, and returns the response to your app.
Why this step matters: This is the moment you verify that everything works. If successful, your application functions normally and you gain observability as a side effect.
Step 6: View Dashboard Metrics
Open the Helicone dashboard and navigate to the Requests page. You'll see your API call logged with full details: timestamp, model, input tokens, output tokens, calculated cost, latency, time-to-first-token (for streaming), status code, and complete request/response bodies.
Why this step matters: This confirms that Helicone captured your data and that you now have queryable, searchable visibility into all LLM operations.
Helicone offers three integration methods. The AI Gateway (recommended) adds ~1-5ms latency but provides unified multi-provider routing. Provider-specific proxies add ~50-80ms latency but let you keep provider keys local. Async logging adds zero latency but sacrifices proxy features like caching and rate limiting. For most production applications, the AI Gateway's minimal latency overhead is worthwhile for the operational simplicity.
6-Step Integration Flow

1. Generate API key → 2. Add provider keys to dashboard → 3. Point base URL at ai-gateway.helicone.ai → 4. Authenticate with Helicone API key → 5. Call the LLM as normal → 6. View cost and latency metrics
Executing the Blueprint
Let's carry out the blueprint plan with real, working code you can use immediately.
Complete Code Examples
All examples from this tutorial series are available in the GitHub repository, which includes a healthcare triage assistant, multi-provider routing, framework integrations (LangChain, AutoGen, CrewAI), and async logging, all with comprehensive documentation.
Python: AI Gateway Integration (Recommended)
The AI Gateway approach is the simplest and fastest path to Helicone observability. Here's a complete before/after comparison:
from openai import OpenAI
import os

# BEFORE Helicone: standard OpenAI client
# client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# AFTER Helicone: change two lines
client = OpenAI(
    base_url="https://ai-gateway.helicone.ai",  # Point to Helicone
    api_key=os.getenv("HELICONE_API_KEY"),      # Use Helicone key
)

# Everything else stays exactly the same
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful medical assistant."},
        {"role": "user", "content": "What are the symptoms of Type 2 diabetes?"}
    ],
    max_tokens=500
)

print(response.choices[0].message.content)
# Every call is automatically logged: tokens, cost, latency, TTFT
What just happened: By changing the base_url from OpenAI's default to ai-gateway.helicone.ai and swapping your API key, every request now flows through Helicone. The AI Gateway looks up your OpenAI key (configured in Step 2), forwards the request, logs the round-trip, and returns the response. Your application code is unchanged—same parameters, same response format, same error handling.
TypeScript: AI Gateway Integration
The pattern is identical in TypeScript:
import OpenAI from "openai";

// BEFORE Helicone
// const client = new OpenAI();

// AFTER Helicone
const client = new OpenAI({
  baseURL: "https://ai-gateway.helicone.ai",
  apiKey: process.env.HELICONE_API_KEY,
});

const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    { role: "system", content: "You are a helpful medical assistant." },
    { role: "user", content: "What are the symptoms of Type 2 diabetes?" }
  ],
  max_tokens: 500,
});

console.log(response.choices[0].message.content);
// Automatically logged: tokens, cost ($), latency (ms), TTFT, model, status
Alternative: Provider-Specific Proxy
If you prefer managing provider keys locally (never uploading them to Helicone's dashboard), use the provider-specific proxy approach:
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),    # Your key stays local
    base_url="https://oai.helicone.ai/v1",  # Helicone's OpenAI proxy
    default_headers={
        "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}"
    }
)

# Use exactly as before—all calls are logged
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello, world!"}]
)
Key difference: Here you're passing both your OpenAI key (in api_key) and your Helicone key (in headers). Helicone never sees your OpenAI key—it's sent directly to OpenAI's API through the proxy. This adds ~50-80ms latency vs. the AI Gateway's ~1-5ms, but some organizations prefer this model for security/compliance reasons.
Three Integration Methods Compared

| Method | Latency Overhead | Trade-off |
|---|---|---|
| AI Gateway | ~1-5ms | Unified multi-provider routing; provider keys stored in Helicone's vault |
| Provider Proxy | ~50-80ms | Provider keys stay local; one proxy URL per provider |
| Async Logging | None | No proxy features (caching, rate limiting) |
Multi-Provider Routing with AI Gateway
The AI Gateway's killer feature is provider-agnostic routing. Switch between OpenAI, Claude, and Gemini by changing one string:
client = OpenAI(
    base_url="https://ai-gateway.helicone.ai",
    api_key=os.getenv("HELICONE_API_KEY"),
)

# OpenAI GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain HIPAA compliance"}]
)

# Anthropic Claude—same client, same format
response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Explain HIPAA compliance"}]
)

# Google Gemini—same client, same format
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "Explain HIPAA compliance"}]
)
All three requests use the same OpenAI-compatible Python client. Helicone's AI Gateway translates the request to each provider's format, handles authentication, logs everything uniformly, and returns results in OpenAI's response schema. No provider-specific SDKs, no format conversions, no switching between clients.
Cost tracking benefit: Because Helicone logs all three requests in a unified format, you can compare costs across providers directly in the dashboard. See instantly that Gemini Flash costs 10× less than GPT-4o for the same task.
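As a back-of-envelope sketch of that comparison, with illustrative per-million-token prices (these figures are assumptions for the sketch; check each provider's current pricing page):

```python
# Illustrative list prices in dollars per million tokens -- assumptions,
# not authoritative pricing.
PRICES = {
    "gpt-4o":           {"in": 2.50, "out": 10.00},
    "claude-sonnet-4":  {"in": 3.00, "out": 15.00},
    "gemini-2.0-flash": {"in": 0.10, "out": 0.40},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request under the assumed price table."""
    p = PRICES[model]
    return (input_tokens * p["in"] + output_tokens * p["out"]) / 1_000_000

# The same summarization task (2,000 input / 500 output tokens) on each model
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2000, 500):.6f}")
```

Under these assumed prices, the Flash-class model comes out more than an order of magnitude cheaper per request; the dashboard gives you the same comparison from real logged traffic instead of a price table.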
Why Multi-Provider Routing Matters
Cost Optimization
Route to cheapest provider for each task. Gemini Flash: 10× cheaper than GPT-4o for summaries.
Vendor Independence
Never locked into one provider. Switch models without rewriting code or learning new SDKs.
Automatic Fallbacks
Part 2 covers fallback chains: try GPT-4o → if fails, fallback to Claude → if fails, use Gemini.
Easy A/B Testing
Compare GPT-4o vs Claude Sonnet on the same prompts. See quality + cost differences side-by-side.
| Provider | Helicone Proxy URL | Notes |
|---|---|---|
| OpenAI | oai.helicone.ai/v1 | Dedicated subdomain |
| Anthropic | anthropic.helicone.ai | Dedicated subdomain |
| Azure OpenAI | gateway.helicone.ai | Uses Helicone-Target-Url header |
| Google Gemini | gateway.helicone.ai | Uses Helicone-Target-Url header |
| Together AI | together.helicone.ai | Dedicated subdomain |
| Groq | groq.helicone.ai | Dedicated subdomain |
| DeepSeek | deepseek.helicone.ai | Dedicated subdomain |
| AWS Bedrock | bedrock.helicone.ai | Dedicated subdomain |
| Any other | gateway.helicone.ai | Universal gateway with Helicone-Target-Url |
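For the gateway.helicone.ai rows, the extra Helicone-Target-Url header tells the proxy where to forward the request. A sketch of assembling those headers; the header names come from the table above, while the helper function and the target URL are illustrative placeholders:

```python
import os

# Hypothetical helper: builds the headers a universal-gateway request needs.
# The target URL below is a placeholder, not a real provider endpoint.
def helicone_gateway_headers(target_url: str) -> dict:
    return {
        "Content-Type": "application/json",
        "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY', 'sk-helicone-placeholder')}",
        "Helicone-Target-Url": target_url,
    }

headers = helicone_gateway_headers("https://api.example-provider.com")

# These headers would accompany a POST to https://gateway.helicone.ai,
# e.g. via requests.post or an OpenAI client's default_headers parameter.
print(headers["Helicone-Target-Url"])
```

The provider-specific subdomains skip this header entirely because the forwarding target is implied by the URL itself.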
Real-World Example: Healthcare AI Triage Assistant
Here's a complete example that demonstrates Helicone's value in a production scenario. This healthcare triage assistant classifies patient symptoms and uses Helicone headers to enable department-level cost analytics, per-patient tracking, and prompt versioning:
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://ai-gateway.helicone.ai",
    api_key=os.getenv("HELICONE_API_KEY"),
)

def triage_patient(patient_id: str, symptoms: str, department: str) -> str:
    """Classify patient symptoms with full Helicone observability."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a medical triage assistant. Classify urgency as: "
                    "EMERGENCY, URGENT, STANDARD, or LOW-PRIORITY. Give rationale."
                ),
            },
            {"role": "user", "content": f"Patient symptoms: {symptoms}"},
        ],
        max_tokens=200,
        temperature=0.1,  # Low temp for consistency
        extra_headers={
            "Helicone-User-Id": patient_id,                 # Per-patient analytics
            "Helicone-Property-Department": department,     # Filter by department
            "Helicone-Property-App": "triage-assistant",    # App-wide tagging
            "Helicone-Property-Environment": "production",  # Track by environment
            "Helicone-Prompt-Id": "triage-classifier-v1",   # Prompt versioning
        },
    )
    return response.choices[0].message.content

# Usage
result = triage_patient(
    patient_id="patient-7829",
    symptoms="Severe chest pain, shortness of breath, radiating to left arm",
    department="cardiology"
)
print(result)
# Output: "EMERGENCY — Symptoms consistent with acute coronary syndrome..."
What this unlocks in the Helicone dashboard:
- Per-department costs: filter by `Property: Department = cardiology` to see total cardiology LLM spend
- Per-patient history: filter by `User: patient-7829` to view all triage requests for this patient
- Prompt versioning: filter by `Prompt-Id: triage-classifier-v1` to analyze this specific prompt's performance and costs over time
- Environment tracking: separate production from staging costs
This example uses just five Helicone headers to transform a basic LLM call into a fully instrumented, production-ready operation. Check out the complete code with error handling and additional examples in the GitHub repository.
Your Helicone Dashboard at a Glance
- Requests: every API call logged with full context (e.g., tokens: 150 in / 89 out, cost: $0.004, latency: 1,230ms, TTFT: 340ms)
- Cost Analytics: track spending across models and users
- User Analytics: per-user costs and consumption patterns
- Powerful Filters: query by any dimension with HQL
- Session Tracing: visualize multi-step agent workflows
- Alerts: get notified before problems escalate
All this data is automatically captured from your 2-line code change
What's Next in Part 2
With observability in place, Part 2 transforms Helicone from a logging tool into a production control plane. We'll cover:
- Sessions and tracing: Visualize multi-agent workflows as hierarchical trees. Track a 47-step agent workflow and see exactly which agent consumed your budget.
- Intelligent caching: Reduce LLM costs by 20-30% by caching responses. Works for identical requests or semantically similar ones (bucket caching).
- Rate limiting: Enforce per-user cost budgets ($5/day per user), request quotas (1000 requests/hour), or cost-based limits (500 cents/hour). Prevent runaway costs.
- Retries and fallbacks: Automatically retry failed requests with exponential backoff, or fall back to cheaper providers (try GPT-4o, fall back to Claude if it fails).
- Prompt management: Store prompts in Helicone's Playground, version them, and deploy updates without redeploying code.
Every feature is configured through HTTP headers—no SDK changes required. See you in Part 2!
Start Building Today
Clone the repository, add your Helicone API key, and run any example in under 60 seconds.