## Why AI Observability Matters
Traditional application observability answers questions like: did the request succeed, how long did it take, which service was the bottleneck? AI applications add four new categories of questions that standard APM tooling cannot answer without domain-specific instrumentation.
Token cost attribution. LLM calls are priced per token. A single application may expose a dozen features backed by AI — chat, summarization, classification, embedding generation. Without token-level observability, you cannot tell which feature is responsible for your monthly bill. You need per-feature, per-user-segment cost attribution before you can optimize.
Latency drift detection. Model providers update their models without notice. A model upgrade can shift median response time by hundreds of milliseconds in either direction. Without a historical latency baseline per model version, regressions are invisible until users complain.
Hallucination detection via output monitoring. You cannot directly detect hallucinations programmatically, but you can build proxy metrics: output length variance, confidence score distributions, and downstream validation failure rates. Sudden changes in these metrics indicate model behavior shifts worth investigating.
Quota attribution per user or tenant. In multi-tenant applications, a single tenant hitting your service aggressively can exhaust your OpenAI token quota and degrade service for every other tenant. Observability that breaks down token consumption by tenant enables you to enforce fair-use policies before quota is exhausted.
OpenTelemetry provides the standard, vendor-neutral instrumentation layer to capture all four categories. Semantic Kernel is designed to emit this data with minimal configuration.
## OpenTelemetry GenAI Semantic Conventions
The OpenTelemetry project defines GenAI semantic conventions — a standard schema for span attributes that describe AI operations. These conventions ensure that dashboards, APM tools, and custom queries can interpret AI telemetry from any compliant SDK without bespoke parsing.
The key attributes for .NET AI developers are:
| Attribute | Type | Description |
|---|---|---|
| `gen_ai.system` | string | The AI provider: `openai`, `azure_openai`, `anthropic` |
| `gen_ai.request.model` | string | The model or deployment name requested |
| `gen_ai.usage.input_tokens` | int | Tokens consumed by the prompt |
| `gen_ai.usage.output_tokens` | int | Tokens in the completion response |
| `gen_ai.response.finish_reason` | string | Why generation stopped: `stop`, `length`, `content_filter`, `tool_calls` |
Semantic Kernel follows these conventions automatically for every AI call it orchestrates. When you query your traces by gen_ai.system = "azure_openai", you get a filtered view of all Azure OpenAI calls across your entire application — regardless of which plugin or function triggered them.
This convention-first approach is what makes AI observability portable. The same dashboard configuration works for a Semantic Kernel app today and a raw MEAI IChatClient app tomorrow — as long as both emit the standard attributes.
## Enabling Semantic Kernel Telemetry
The Semantic Kernel architecture separates AI orchestration into a layered pipeline — plugins, planners, filters, and AI service connectors. Each layer emits telemetry when instrumentation is enabled. The switch that unlocks full telemetry including prompt content must be set before the Kernel is built.
```csharp
// MUST be called before building the Kernel
AppContext.SetSwitch(
    "Microsoft.SemanticKernel.Experimental.GenAI.EnableOTelDiagnosticsSensitive",
    true);

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource("Microsoft.SemanticKernel*")
        .AddOtlpExporter())
    .WithMetrics(metrics => metrics
        .AddMeter("Microsoft.SemanticKernel*")
        .AddOtlpExporter());

builder.Services.AddKernel()
    .AddAzureOpenAIChatCompletion(
        deploymentName: builder.Configuration["AzureOpenAI:Deployment"]!,
        endpoint: builder.Configuration["AzureOpenAI:Endpoint"]!,
        apiKey: builder.Configuration["AzureOpenAI:ApiKey"]!);
```
The wildcard Microsoft.SemanticKernel* as an ActivitySource name and Meter name catches all activity sources and meters registered by SK — the kernel itself, individual connectors (Azure OpenAI, OpenAI), and any SK plugins that emit their own spans. You do not need to enumerate them individually.
What you get from this configuration out of the box:
- A trace span for every `InvokePromptAsync` or `InvokeFunctionAsync` call
- Child spans for each AI service call, with `gen_ai.*` attributes
- Metrics: request count, latency histograms, token usage counters, all tagged by model and operation
The `Experimental` segment in the switch name signals that the attribute schema may evolve as the OpenTelemetry GenAI conventions mature. In practice, the attributes have been stable across SK 1.x releases.
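If you want spans, latency, and token metrics without exporting prompt and completion text, Semantic Kernel also documents a non-sensitive variant of the switch. A minimal sketch:

```csharp
// Emits spans, metrics, and token counts, but omits prompt/completion content
AppContext.SetSwitch(
    "Microsoft.SemanticKernel.Experimental.GenAI.EnableOTelDiagnostics",
    true);
```

This is the safer default for environments where you never want prompt text in telemetry, rather than relying on downstream redaction alone.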
## Tracing IChatClient Calls from MEAI
Microsoft Extensions for AI (MEAI) provides the IChatClient abstraction that works across multiple AI providers — Azure OpenAI, Ollama, OpenAI, and any provider implementing the interface. When you use IChatClient directly (outside of Semantic Kernel), you add your own ActivitySource to produce compliant spans.
```csharp
using System.Diagnostics;
using Microsoft.Extensions.AI;

public class ObservableChatService
{
    private static readonly ActivitySource _source = new("MyApp.AI", "1.0.0");
    private readonly IChatClient _client;

    public ObservableChatService(IChatClient client) => _client = client;

    public async Task<string> ChatAsync(string userMessage, CancellationToken ct = default)
    {
        using var activity = _source.StartActivity("chat.completion", ActivityKind.Client);
        activity?.SetTag("gen_ai.request.model", "gpt-4o");
        activity?.SetTag("user.message.length", userMessage.Length);

        var messages = new List<ChatMessage>
        {
            new(ChatRole.User, userMessage)
        };

        var result = await _client.CompleteAsync(messages, cancellationToken: ct);
        var text = result.Message.Text ?? string.Empty;

        activity?.SetTag("gen_ai.usage.input_tokens", result.Usage?.InputTokenCount ?? 0);
        activity?.SetTag("gen_ai.usage.output_tokens", result.Usage?.OutputTokenCount ?? 0);
        activity?.SetStatus(ActivityStatusCode.Ok);

        return text;
    }
}
```
Register the custom activity source in your OpenTelemetry configuration so the SDK picks it up:
```csharp
builder.Services.AddOpenTelemetry()
    .WithTracing(t => t.AddSource("MyApp.AI"));
```
The pattern here — start a span, set gen_ai.* tags before the call, read token counts from the response and set them after — matches what Semantic Kernel does internally. Using the same attribute names means your Jaeger queries and Application Insights workbooks work identically whether the call came from SK or raw IChatClient.
Use ActivityKind.Client for calls to external AI services. This maps correctly in distributed trace views — the span appears as an outbound call to an external dependency, not as an internal service operation.
## Custom Metrics: Token Usage and Cost
Span attributes capture per-request data. Metrics aggregate that data over time — total tokens consumed per hour, average cost per chat session, p99 latency by model. Both are necessary for a complete observability picture.
Use System.Diagnostics.Metrics (the .NET runtime metrics API, which OpenTelemetry bridges automatically):
```csharp
using System.Diagnostics;          // TagList
using System.Diagnostics.Metrics;

public class AiMetrics
{
    private static readonly Meter _meter = new("MyApp.AI.Metrics", "1.0.0");
    private static readonly Counter<long> _inputTokens =
        _meter.CreateCounter<long>("ai.tokens.input", "tokens", "Input tokens consumed");
    private static readonly Counter<long> _outputTokens =
        _meter.CreateCounter<long>("ai.tokens.output", "tokens", "Output tokens generated");
    private static readonly Histogram<double> _requestCost =
        _meter.CreateHistogram<double>("ai.request.cost", "USD", "Estimated cost per request");

    public void RecordUsage(string model, int inputTokens, int outputTokens)
    {
        var tags = new TagList { { "gen_ai.request.model", model } };
        _inputTokens.Add(inputTokens, tags);
        _outputTokens.Add(outputTokens, tags);

        // GPT-4o pricing: $5/1M input, $15/1M output (as of early 2026)
        var cost = (inputTokens / 1_000_000.0 * 5.0) + (outputTokens / 1_000_000.0 * 15.0);
        _requestCost.Record(cost, tags);
    }
}
```
Register the meter so OpenTelemetry exports it:
```csharp
builder.Services.AddOpenTelemetry()
    .WithMetrics(m => m.AddMeter("MyApp.AI.Metrics"));
```
Inject AiMetrics as a singleton and call RecordUsage after each completion. Tag metrics with gen_ai.request.model so dashboards can break down consumption by model. Add additional tags for feature name (feature.name), user segment (user.tier), or tenant identifier (tenant.id) to enable the cost attribution use case described in the opening section.
A Counter<long> for tokens is append-only and efficient — the right shape for an ever-growing cumulative count. The Histogram<double> for cost captures the distribution, letting you see not just total spend but the per-request cost distribution — useful for identifying outlier requests that consumed unexpectedly large token budgets.
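To make the per-feature and per-tenant cost attribution from the opening section concrete, here is a sketch of an extended variant (the class name `AttributedAiMetrics` and the extra parameters are illustrative, not part of any library). The cost math is factored into a pure static helper so it can be checked independently; the pricing figures are the same assumptions as above:

```csharp
using System.Diagnostics;          // TagList
using System.Diagnostics.Metrics;

public class AttributedAiMetrics
{
    private static readonly Meter _meter = new("MyApp.AI.Metrics", "1.0.0");
    private static readonly Counter<long> _inputTokens =
        _meter.CreateCounter<long>("ai.tokens.input", "tokens", "Input tokens consumed");
    private static readonly Counter<long> _outputTokens =
        _meter.CreateCounter<long>("ai.tokens.output", "tokens", "Output tokens generated");
    private static readonly Histogram<double> _requestCost =
        _meter.CreateHistogram<double>("ai.request.cost", "USD", "Estimated cost per request");

    // Pure helper: assumed GPT-4o pricing of $5/1M input, $15/1M output tokens
    public static double EstimateCostUsd(int inputTokens, int outputTokens) =>
        (inputTokens / 1_000_000.0 * 5.0) + (outputTokens / 1_000_000.0 * 15.0);

    public void RecordUsage(
        string model, int inputTokens, int outputTokens,
        string featureName, string tenantId)
    {
        // Extra dimensions let dashboards break down spend by feature and tenant
        var tags = new TagList
        {
            { "gen_ai.request.model", model },
            { "feature.name", featureName },
            { "tenant.id", tenantId }
        };
        _inputTokens.Add(inputTokens, tags);
        _outputTokens.Add(outputTokens, tags);
        _requestCost.Record(EstimateCostUsd(inputTokens, outputTokens), tags);
    }
}
```

A call site would pass the feature and tenant alongside the token counts, for example `metrics.RecordUsage("gpt-4o", 1200, 300, "summarization", tenantId)`. Keep tag cardinality bounded: tenant identifiers are fine for tens or hundreds of tenants, but unbounded per-user tags belong on spans, not metrics.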
## Setting Up .NET Aspire Dashboard
.NET Aspire provides a development-time orchestration and observability dashboard that receives OpenTelemetry data over OTLP with zero additional configuration. For local development, it is the fastest path from code to visible traces and metrics.
In your AppHost project:
```csharp
// In your AppHost project (Program.cs)
var builder = DistributedApplication.CreateBuilder(args);

var api = builder.AddProject<Projects.MyAIApi>("api");

// Aspire automatically wires OTLP → dashboard
builder.Build().Run();
```
In your API project, a single call configures everything:
```csharp
// In your API project (Program.cs)
builder.AddServiceDefaults(); // Configures OTel + OTLP exporter pointing to Aspire dashboard

// All SK traces and your custom metrics appear in the Aspire dashboard at https://localhost:18888
```
AddServiceDefaults() is the Aspire convention that wires up OpenTelemetry tracing and metrics with the OTLP exporter pre-configured to send to the Aspire dashboard endpoint. It also adds health checks, service discovery, and resilience defaults.
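As a rough, simplified sketch of the telemetry half of a typical Aspire ServiceDefaults project (the generated code also adds health checks, service discovery, and resilience; `UseOtlpExporter` and the `OTEL_EXPORTER_OTLP_ENDPOINT` variable are the standard OpenTelemetry mechanisms Aspire relies on, and details vary by Aspire version):

```csharp
// Simplified approximation of what AddServiceDefaults() wires up for telemetry
builder.Logging.AddOpenTelemetry(logging =>
{
    logging.IncludeFormattedMessage = true;
    logging.IncludeScopes = true;
});

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation())
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation());

// The Aspire AppHost injects OTEL_EXPORTER_OTLP_ENDPOINT into each project;
// when present, export traces, metrics, and logs over OTLP to the dashboard
if (!string.IsNullOrWhiteSpace(
        builder.Configuration["OTEL_EXPORTER_OTLP_ENDPOINT"]))
{
    builder.Services.AddOpenTelemetry().UseOtlpExporter();
}
```

Seeing the expanded form makes it clear why your own `AddSource("MyApp.AI")` and `AddMeter("MyApp.AI.Metrics")` calls still need to be added on top of the defaults.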
The Aspire dashboard at https://localhost:18888 gives you:
- A live trace view with parent-child span relationships rendered visually
- A structured log view correlated with traces
- A metrics explorer with time-series charts for any metric your app emits
Because the Aspire dashboard accepts standard OTLP, every SK span and every custom AiMetrics metric appears automatically — no dashboard configuration required.
## Exporting to Jaeger and Application Insights
### Jaeger for Local Development Without Aspire
If you are not using .NET Aspire orchestration, Jaeger provides a full-featured distributed tracing backend that runs in a single Docker container:
```shell
docker run -d -p 16686:16686 -p 4317:4317 jaegertracing/all-in-one
```
Configure the OTLP exporter to point at Jaeger’s OTLP gRPC endpoint:
```csharp
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource("Microsoft.SemanticKernel*")
        .AddSource("MyApp.AI")
        .AddOtlpExporter(otlp =>
        {
            otlp.Endpoint = new Uri("http://localhost:4317"); // Jaeger OTLP endpoint
        }));
```
The Jaeger UI at http://localhost:16686 lets you search traces by service, operation name, duration range, and tag values. Filtering by gen_ai.request.model = gpt-4o immediately narrows the view to AI calls made to that model.
### Application Insights for Production
For production deployments, Application Insights is the natural choice when your workload runs in Azure. The Azure Monitor OpenTelemetry distro handles the export configuration:
```csharp
// Add package: Azure.Monitor.OpenTelemetry.AspNetCore
using Azure.Monitor.OpenTelemetry.AspNetCore;

builder.Services.AddOpenTelemetry()
    .UseAzureMonitor(options =>
    {
        options.ConnectionString = builder.Configuration["ApplicationInsights:ConnectionString"];
    })
    .WithTracing(tracing => tracing
        .AddSource("Microsoft.SemanticKernel*")
        .AddSource("MyApp.AI"));
```
UseAzureMonitor() registers the Application Insights exporter for traces, metrics, and logs simultaneously. All gen_ai.* attributes are preserved in Application Insights as custom dimensions, making them queryable via KQL:
```kusto
dependencies
| where customDimensions["gen_ai.system"] == "azure_openai"
| summarize avg(duration), sum(toint(customDimensions["gen_ai.usage.input_tokens"])) by bin(timestamp, 1h)
```
This query gives you hourly average latency and total input tokens consumed against Azure OpenAI — the core of a cost and performance dashboard.
## Building a Prompt Quality Monitoring Pipeline with PII Masking
Enabling EnableOTelDiagnosticsSensitive exports prompt content as span attributes, which is valuable for debugging but a serious risk in production — the telemetry backend becomes a secondary store of potentially sensitive user data.
The correct architecture for production is to keep sensitive diagnostics enabled (for span structure and token counts) but intercept spans before export to redact prompt content. OpenTelemetry’s BaseProcessor<Activity> hook provides exactly this interception point.
When managing PII in your AI pipelines, the approach mirrors what is covered in securing AI applications in .NET — the same principle of stripping sensitive data at the boundary applies whether that boundary is an LLM API call or a telemetry export.
```csharp
using System.Diagnostics;
using OpenTelemetry;

public class PiiMaskingProcessor : BaseProcessor<Activity>
{
    private static readonly string[] _sensitiveTagKeys =
    [
        "gen_ai.prompt",
        "gen_ai.completion",
        "gen_ai.system.message"
    ];

    public override void OnEnd(Activity activity)
    {
        foreach (var tag in _sensitiveTagKeys)
        {
            if (activity.GetTagItem(tag) is not null)
            {
                activity.SetTag(tag, "[REDACTED]");
            }
        }
    }
}
```
Register the processor in the tracing pipeline:
```csharp
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource("Microsoft.SemanticKernel*")
        .AddSource("MyApp.AI")
        .AddProcessor(new PiiMaskingProcessor())
        .AddOtlpExporter());
```
The processor runs in the export pipeline: OnEnd is called after the span is completed but before it is serialized and sent to the exporter. Replacing the tag value with [REDACTED] means the span still appears in traces — with its timing, parent-child relationships, and token count attributes intact — but prompt and completion text never reaches the telemetry backend.
For environments where you need some prompt visibility for debugging, consider a tiered approach: redact fully in production, keep sensitive data for spans sampled into a restricted-access debug trace store in staging.
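One way to implement that tiering is to make redaction a constructor flag driven by the hosting environment. A sketch, with the class name `PiiAwareProcessor` chosen here for illustration (not an SK or OTel API):

```csharp
using System.Diagnostics;
using OpenTelemetry;

// Redacts sensitive tags only when enabled,
// e.g. .AddProcessor(new PiiAwareProcessor(builder.Environment.IsProduction()))
public class PiiAwareProcessor : BaseProcessor<Activity>
{
    private static readonly string[] _sensitiveTagKeys =
        ["gen_ai.prompt", "gen_ai.completion", "gen_ai.system.message"];

    private readonly bool _redact;

    public PiiAwareProcessor(bool redact) => _redact = redact;

    public override void OnEnd(Activity activity)
    {
        if (!_redact) return; // staging/dev: leave prompt content intact

        foreach (var key in _sensitiveTagKeys)
        {
            if (activity.GetTagItem(key) is not null)
            {
                activity.SetTag(key, "[REDACTED]");
            }
        }
    }
}
```

If staging traces retain prompt content, make sure that trace store itself is access-restricted, since it now holds the same sensitive data the production pipeline strips.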
### Extending the Processor for Custom PII Patterns
The _sensitiveTagKeys list above targets the three standard SK-emitted sensitive attributes. Extend it based on your application’s attribute schema. If you add custom tags like user.query or document.content, include them in the redaction list:
```csharp
private static readonly string[] _sensitiveTagKeys =
[
    "gen_ai.prompt",
    "gen_ai.completion",
    "gen_ai.system.message",
    "user.query",       // custom attribute added by ObservableChatService
    "document.content"  // custom attribute from RAG pipeline
];
```
### Combining Sampling and Redaction
High-volume AI services emit a large number of spans. Configure a sampling strategy alongside redaction to control both storage cost and data exposure:
```csharp
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .SetSampler(new TraceIdRatioBasedSampler(0.1)) // sample 10% of traces
        .AddSource("Microsoft.SemanticKernel*")
        .AddSource("MyApp.AI")
        .AddProcessor(new PiiMaskingProcessor())
        .AddOtlpExporter());
```
A 10% sample rate dramatically reduces storage cost while retaining statistical validity for latency and error rate analysis. For debugging specific failures, use AlwaysOnSampler in a non-production environment or implement a tail-based sampler that captures 100% of error spans regardless of the configured ratio.
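In-process .NET samplers decide at span start, before success or failure is known, so tail-based sampling is usually done in the OpenTelemetry Collector instead. A sketch of a Collector `tail_sampling` processor configuration that keeps every error trace plus a 10% sample of the rest (the wait time and percentage are assumptions to tune):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s        # buffer spans until the trace is complete
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-rest
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
```

With this approach the application exports all spans to a local Collector, and the Collector applies the sampling decision with full knowledge of each trace's outcome.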
## Putting It All Together: A Complete Observability Configuration
Here is a complete, production-ready observability setup combining SK telemetry, custom metrics, PII masking, and dual export targets (Aspire for local, Application Insights for production):
```csharp
using Azure.Monitor.OpenTelemetry.AspNetCore;
using OpenTelemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;

// MUST be before Kernel is built
AppContext.SetSwitch(
    "Microsoft.SemanticKernel.Experimental.GenAI.EnableOTelDiagnosticsSensitive",
    true);

var builder = WebApplication.CreateBuilder(args);
var isProduction = builder.Environment.IsProduction();

var otelBuilder = builder.Services.AddOpenTelemetry()
    .WithTracing(tracing =>
    {
        tracing
            .AddSource("Microsoft.SemanticKernel*")
            .AddSource("MyApp.AI")
            .AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            .AddProcessor(new PiiMaskingProcessor());

        if (!isProduction)
        {
            tracing.AddOtlpExporter(); // Aspire dashboard or Jaeger for local dev
        }
        // In production, exporters are registered by UseAzureMonitor below
    })
    .WithMetrics(metrics =>
    {
        metrics
            .AddMeter("Microsoft.SemanticKernel*")
            .AddMeter("MyApp.AI.Metrics")
            .AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation();

        if (!isProduction)
        {
            metrics.AddOtlpExporter();
        }
    });

if (isProduction)
{
    // Application Insights for production
    otelBuilder.UseAzureMonitor(options =>
    {
        options.ConnectionString = builder.Configuration["ApplicationInsights:ConnectionString"];
    });
}

builder.Services.AddKernel()
    .AddAzureOpenAIChatCompletion(
        deploymentName: builder.Configuration["AzureOpenAI:Deployment"]!,
        endpoint: builder.Configuration["AzureOpenAI:Endpoint"]!,
        apiKey: builder.Configuration["AzureOpenAI:ApiKey"]!);

builder.Services.AddSingleton<AiMetrics>();
builder.Services.AddScoped<ObservableChatService>();

var app = builder.Build();
app.MapDefaultEndpoints(); // Aspire health check endpoints
app.Run();
```
This single configuration handles the complete observability lifecycle: spans from SK and custom sources flow through the PII masking processor, then to the correct backend depending on environment. Custom metrics from AiMetrics are exported on the same channel.
## Further Reading
- OpenTelemetry .NET docs
- OpenTelemetry GenAI Semantic Conventions
- .NET Aspire Telemetry documentation
- Azure Monitor OpenTelemetry distro
For a complete background service implementation with telemetry, see Build AI Background Services in .NET.