## Why AI Observability Matters
Traditional application observability answers questions like: did the request succeed, how long did it take, which service was the bottleneck? AI applications add four new categories of questions that standard APM tooling cannot answer without domain-specific instrumentation.
Token cost attribution. LLM calls are priced per token. A single application may expose a dozen features backed by AI — chat, summarization, classification, embedding generation. Without token-level observability, you cannot tell which feature is responsible for your monthly bill. You need per-feature, per-user-segment cost attribution before you can optimize.
Latency drift detection. Model providers update their models without notice. A model upgrade can shift median response time by hundreds of milliseconds in either direction. Without a historical latency baseline per model version, regressions are invisible until users complain.
Hallucination detection via output monitoring. You cannot directly detect hallucinations programmatically, but you can build proxy metrics: output length variance, confidence score distributions, and downstream validation failure rates. Sudden changes in these metrics indicate model behavior shifts worth investigating.
Quota attribution per user or tenant. In multi-tenant applications, a single tenant hitting your service aggressively can exhaust your OpenAI token quota and degrade service for every other tenant. Observability that breaks down token consumption by tenant enables you to enforce fair-use policies before quota is exhausted.
OpenTelemetry provides the standard, vendor-neutral instrumentation layer to capture all four categories. Semantic Kernel is designed to emit this data with minimal configuration.
## OpenTelemetry GenAI Semantic Conventions
The OpenTelemetry project defines GenAI semantic conventions — a standard schema for span attributes that describe AI operations. These conventions ensure that dashboards, APM tools, and custom queries can interpret AI telemetry from any compliant SDK without bespoke parsing.
The key attributes for .NET AI developers are:
| Attribute | Type | Description |
|---|---|---|
| `gen_ai.system` | string | The AI provider: `openai`, `azure_openai`, `anthropic` |
| `gen_ai.request.model` | string | The model or deployment name requested |
| `gen_ai.usage.input_tokens` | int | Tokens consumed by the prompt |
| `gen_ai.usage.output_tokens` | int | Tokens in the completion response |
| `gen_ai.response.finish_reason` | string | Why generation stopped: `stop`, `length`, `content_filter`, `tool_calls` |
Semantic Kernel follows these conventions automatically for every AI call it orchestrates. When you query your traces by gen_ai.system = "azure_openai", you get a filtered view of all Azure OpenAI calls across your entire application — regardless of which plugin or function triggered them.
This convention-first approach is what makes AI observability portable. The same dashboard configuration works for a Semantic Kernel app today and a raw MEAI IChatClient app tomorrow — as long as both emit the standard attributes.
## Enabling Semantic Kernel Telemetry
The Semantic Kernel architecture separates AI orchestration into a layered pipeline — plugins, planners, filters, and AI service connectors. Each layer emits telemetry when instrumentation is enabled. The switch that unlocks full telemetry including prompt content must be set before the Kernel is built.
```csharp
// MUST be called before building the Kernel
AppContext.SetSwitch(
    "Microsoft.SemanticKernel.Experimental.GenAI.EnableOTelDiagnosticsSensitive",
    true);

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource("Microsoft.SemanticKernel*")
        .AddOtlpExporter())
    .WithMetrics(metrics => metrics
        .AddMeter("Microsoft.SemanticKernel*")
        .AddOtlpExporter());

builder.Services.AddKernel()
    .AddAzureOpenAIChatCompletion(
        deploymentName: builder.Configuration["AzureOpenAI:Deployment"]!,
        endpoint: builder.Configuration["AzureOpenAI:Endpoint"]!,
        apiKey: builder.Configuration["AzureOpenAI:ApiKey"]!);
```
The wildcard Microsoft.SemanticKernel* as an ActivitySource name and Meter name catches all activity sources and meters registered by SK — the kernel itself, individual connectors (Azure OpenAI, OpenAI), and any SK plugins that emit their own spans. You do not need to enumerate them individually.
What you get from this configuration out of the box:
- A trace span for every `InvokePromptAsync` or `InvokeFunctionAsync` call
- Child spans for each AI service call, with `gen_ai.*` attributes
- Metrics: request count, latency histograms, token usage counters, all tagged by model and operation
The `Experimental` segment in the switch name signals that the attribute schema may evolve as the OpenTelemetry GenAI conventions mature. In practice, the attributes have been stable across SK 1.x releases.
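If you want spans, latency, and token metrics without exporting prompt and completion text, Semantic Kernel also documents a non-sensitive variant of the switch. A minimal sketch:

```csharp
// Emits spans, metrics, and token counts, but omits prompt/completion content
AppContext.SetSwitch(
    "Microsoft.SemanticKernel.Experimental.GenAI.EnableOTelDiagnostics",
    true);
```

This is the safer default for environments where you never want prompt text in telemetry, rather than relying on downstream redaction alone.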
## Tracing IChatClient Calls from MEAI
Microsoft Extensions for AI (MEAI) provides the IChatClient abstraction that works across multiple AI providers — Azure OpenAI, Ollama, OpenAI, and any provider implementing the interface. When you use IChatClient directly (outside of Semantic Kernel), you add your own ActivitySource to produce compliant spans.
```csharp
using System.Diagnostics;
using Microsoft.Extensions.AI;

public class ObservableChatService
{
    private static readonly ActivitySource _source = new("MyApp.AI", "1.0.0");
    private readonly IChatClient _client;

    public ObservableChatService(IChatClient client) => _client = client;

    public async Task<string> ChatAsync(string userMessage, CancellationToken ct = default)
    {
        using var activity = _source.StartActivity("chat.completion", ActivityKind.Client);
        activity?.SetTag("gen_ai.request.model", "gpt-4o");
        activity?.SetTag("user.message.length", userMessage.Length);

        var messages = new List<ChatMessage>
        {
            new(ChatRole.User, userMessage)
        };

        var result = await _client.CompleteAsync(messages, cancellationToken: ct);
        var text = result.Message.Text ?? string.Empty;

        activity?.SetTag("gen_ai.usage.input_tokens", result.Usage?.InputTokenCount ?? 0);
        activity?.SetTag("gen_ai.usage.output_tokens", result.Usage?.OutputTokenCount ?? 0);
        activity?.SetStatus(ActivityStatusCode.Ok);

        return text;
    }
}
```
Register the custom activity source in your OpenTelemetry configuration so the SDK picks it up:
```csharp
builder.Services.AddOpenTelemetry()
    .WithTracing(t => t.AddSource("MyApp.AI"));
```
The pattern here — start a span, set gen_ai.* tags before the call, read token counts from the response and set them after — matches what Semantic Kernel does internally. Using the same attribute names means your Jaeger queries and Application Insights workbooks work identically whether the call came from SK or raw IChatClient.
Use ActivityKind.Client for calls to external AI services. This maps correctly in distributed trace views — the span appears as an outbound call to an external dependency, not as an internal service operation.
## Custom Metrics: Token Usage and Cost
Span attributes capture per-request data. Metrics aggregate that data over time — total tokens consumed per hour, average cost per chat session, p99 latency by model. Both are necessary for a complete observability picture.
Use System.Diagnostics.Metrics (the .NET runtime metrics API, which OpenTelemetry bridges automatically):
```csharp
using System.Diagnostics;          // TagList
using System.Diagnostics.Metrics;

public class AiMetrics
{
    private static readonly Meter _meter = new("MyApp.AI.Metrics", "1.0.0");
    private static readonly Counter<long> _inputTokens =
        _meter.CreateCounter<long>("ai.tokens.input", "tokens", "Input tokens consumed");
    private static readonly Counter<long> _outputTokens =
        _meter.CreateCounter<long>("ai.tokens.output", "tokens", "Output tokens generated");
    private static readonly Histogram<double> _requestCost =
        _meter.CreateHistogram<double>("ai.request.cost", "USD", "Estimated cost per request");

    public void RecordUsage(string model, int inputTokens, int outputTokens)
    {
        var tags = new TagList { { "gen_ai.request.model", model } };
        _inputTokens.Add(inputTokens, tags);
        _outputTokens.Add(outputTokens, tags);

        // GPT-4o pricing: $5/1M input, $15/1M output (as of early 2026)
        var cost = (inputTokens / 1_000_000.0 * 5.0) + (outputTokens / 1_000_000.0 * 15.0);
        _requestCost.Record(cost, tags);
    }
}
```
Register the meter so OpenTelemetry exports it:
```csharp
builder.Services.AddOpenTelemetry()
    .WithMetrics(m => m.AddMeter("MyApp.AI.Metrics"));
```
Inject AiMetrics as a singleton and call RecordUsage after each completion. Tag metrics with gen_ai.request.model so dashboards can break down consumption by model. Add additional tags for feature name (feature.name), user segment (user.tier), or tenant identifier (tenant.id) to enable the cost attribution use case described in the opening section.
A Counter<long> for tokens is append-only and efficient — the right shape for an ever-growing cumulative count. The Histogram<double> for cost captures the distribution, letting you see not just total spend but the per-request cost distribution — useful for identifying outlier requests that consumed unexpectedly large token budgets.
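To make the per-feature and per-tenant cost attribution from the opening section concrete, here is a sketch of an extended variant (the class name `AttributedAiMetrics` and the extra parameters are illustrative, not part of any library). The cost math is factored into a pure static helper so it can be checked independently; the pricing figures are the same assumptions as above:

```csharp
using System.Diagnostics;          // TagList
using System.Diagnostics.Metrics;

public class AttributedAiMetrics
{
    private static readonly Meter _meter = new("MyApp.AI.Metrics", "1.0.0");
    private static readonly Counter<long> _inputTokens =
        _meter.CreateCounter<long>("ai.tokens.input", "tokens", "Input tokens consumed");
    private static readonly Counter<long> _outputTokens =
        _meter.CreateCounter<long>("ai.tokens.output", "tokens", "Output tokens generated");
    private static readonly Histogram<double> _requestCost =
        _meter.CreateHistogram<double>("ai.request.cost", "USD", "Estimated cost per request");

    // Pure helper: assumed GPT-4o pricing of $5/1M input, $15/1M output tokens
    public static double EstimateCostUsd(int inputTokens, int outputTokens) =>
        (inputTokens / 1_000_000.0 * 5.0) + (outputTokens / 1_000_000.0 * 15.0);

    public void RecordUsage(
        string model, int inputTokens, int outputTokens,
        string featureName, string tenantId)
    {
        // Extra dimensions let dashboards break down spend by feature and tenant
        var tags = new TagList
        {
            { "gen_ai.request.model", model },
            { "feature.name", featureName },
            { "tenant.id", tenantId }
        };
        _inputTokens.Add(inputTokens, tags);
        _outputTokens.Add(outputTokens, tags);
        _requestCost.Record(EstimateCostUsd(inputTokens, outputTokens), tags);
    }
}
```

A call site would pass the feature and tenant alongside the token counts, for example `metrics.RecordUsage("gpt-4o", 1200, 300, "summarization", tenantId)`. Keep tag cardinality bounded: tenant identifiers are fine for tens or hundreds of tenants, but unbounded per-user tags belong on spans, not metrics.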
## Setting Up .NET Aspire Dashboard
.NET Aspire provides a development-time orchestration and observability dashboard that receives OpenTelemetry data over OTLP with zero additional configuration. For local development, it is the fastest path from code to visible traces and metrics.
In your AppHost project:
```csharp
// In your AppHost project (Program.cs)
var builder = DistributedApplication.CreateBuilder(args);

var api = builder.AddProject<Projects.MyAIApi>("api");

// Aspire automatically wires OTLP → dashboard
builder.Build().Run();
```
In your API project, a single call configures everything:
```csharp
// In your API project (Program.cs)
builder.AddServiceDefaults(); // Configures OTel + OTLP exporter pointing to Aspire dashboard

// All SK traces and your custom metrics appear in the Aspire dashboard at https://localhost:18888
```
AddServiceDefaults() is the Aspire convention that wires up OpenTelemetry tracing and metrics with the OTLP exporter pre-configured to send to the Aspire dashboard endpoint. It also adds health checks, service discovery, and resilience defaults.
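As a rough, simplified sketch of the telemetry half of a typical Aspire ServiceDefaults project (the generated code also adds health checks, service discovery, and resilience; `UseOtlpExporter` and the `OTEL_EXPORTER_OTLP_ENDPOINT` variable are the standard OpenTelemetry mechanisms Aspire relies on, and details vary by Aspire version):

```csharp
// Simplified approximation of what AddServiceDefaults() wires up for telemetry
builder.Logging.AddOpenTelemetry(logging =>
{
    logging.IncludeFormattedMessage = true;
    logging.IncludeScopes = true;
});

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation())
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation());

// The Aspire AppHost injects OTEL_EXPORTER_OTLP_ENDPOINT into each project;
// when present, export traces, metrics, and logs over OTLP to the dashboard
if (!string.IsNullOrWhiteSpace(
        builder.Configuration["OTEL_EXPORTER_OTLP_ENDPOINT"]))
{
    builder.Services.AddOpenTelemetry().UseOtlpExporter();
}
```

Seeing the expanded form makes it clear why your own `AddSource("MyApp.AI")` and `AddMeter("MyApp.AI.Metrics")` calls still need to be added on top of the defaults.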
The Aspire dashboard at https://localhost:18888 gives you:
- A live trace view with parent-child span relationships rendered visually
- A structured log view correlated with traces
- A metrics explorer with time-series charts for any metric your app emits
Because the Aspire dashboard accepts standard OTLP, every SK span and every custom AiMetrics metric appears automatically — no dashboard configuration required.
## Exporting to Jaeger and Application Insights
### Jaeger for Local Development Without Aspire
If you are not using .NET Aspire orchestration, Jaeger provides a full-featured distributed tracing backend that runs in a single Docker container:
```shell
docker run -d -p 16686:16686 -p 4317:4317 jaegertracing/all-in-one
```
Configure the OTLP exporter to point at Jaeger’s OTLP gRPC endpoint:
```csharp
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource("Microsoft.SemanticKernel*")
        .AddSource("MyApp.AI")
        .AddOtlpExporter(otlp =>
        {
            otlp.Endpoint = new Uri("http://localhost:4317"); // Jaeger OTLP endpoint
        }));
```
The Jaeger UI at http://localhost:16686 lets you search traces by service, operation name, duration range, and tag values. Filtering by gen_ai.request.model = gpt-4o immediately narrows the view to AI calls made to that model.
### Application Insights for Production
For production deployments, Application Insights is the natural choice when your workload runs in Azure. The Azure Monitor OpenTelemetry distro handles the export configuration:
```csharp
// Add package: Azure.Monitor.OpenTelemetry.AspNetCore
using Azure.Monitor.OpenTelemetry.AspNetCore;

builder.Services.AddOpenTelemetry()
    .UseAzureMonitor(options =>
    {
        options.ConnectionString = builder.Configuration["ApplicationInsights:ConnectionString"];
    })
    .WithTracing(tracing => tracing
        .AddSource("Microsoft.SemanticKernel*")
        .AddSource("MyApp.AI"));
```
UseAzureMonitor() registers the Application Insights exporter for traces, metrics, and logs simultaneously. All gen_ai.* attributes are preserved in Application Insights as custom dimensions, making them queryable via KQL:
```kusto
dependencies
| where customDimensions["gen_ai.system"] == "azure_openai"
| summarize avg(duration), sum(toint(customDimensions["gen_ai.usage.input_tokens"])) by bin(timestamp, 1h)
```
This query gives you hourly average latency and total input tokens consumed against Azure OpenAI — the core of a cost and performance dashboard.
## Building a Prompt Quality Monitoring Pipeline with PII Masking
Enabling EnableOTelDiagnosticsSensitive exports prompt content as span attributes, which is valuable for debugging but a serious risk in production — the telemetry backend becomes a secondary store of potentially sensitive user data.
The correct architecture for production is to keep sensitive diagnostics enabled (for span structure and token counts) but intercept spans before export to redact prompt content. OpenTelemetry’s BaseProcessor<Activity> hook provides exactly this interception point.
When managing PII in your AI pipelines, the approach mirrors what is covered in securing AI applications in .NET — the same principle of stripping sensitive data at the boundary applies whether that boundary is an LLM API call or a telemetry export.
```csharp
using System.Diagnostics;
using OpenTelemetry;

public class PiiMaskingProcessor : BaseProcessor<Activity>
{
    private static readonly string[] _sensitiveTagKeys =
    [
        "gen_ai.prompt",
        "gen_ai.completion",
        "gen_ai.system.message"
    ];

    public override void OnEnd(Activity activity)
    {
        foreach (var tag in _sensitiveTagKeys)
        {
            if (activity.GetTagItem(tag) is not null)
            {
                activity.SetTag(tag, "[REDACTED]");
            }
        }
    }
}
```
Register the processor in the tracing pipeline:
```csharp
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource("Microsoft.SemanticKernel*")
        .AddSource("MyApp.AI")
        .AddProcessor(new PiiMaskingProcessor())
        .AddOtlpExporter());
```
The processor runs in the export pipeline: OnEnd is called after the span is completed but before it is serialized and sent to the exporter. Replacing the tag value with [REDACTED] means the span still appears in traces — with its timing, parent-child relationships, and token count attributes intact — but prompt and completion text never reaches the telemetry backend.
For environments where you need some prompt visibility for debugging, consider a tiered approach: redact fully in production, keep sensitive data for spans sampled into a restricted-access debug trace store in staging.
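One way to implement that tiering is to make redaction a constructor flag driven by the hosting environment. A sketch, with the class name `PiiAwareProcessor` chosen here for illustration (not an SK or OTel API):

```csharp
using System.Diagnostics;
using OpenTelemetry;

// Redacts sensitive tags only when enabled,
// e.g. .AddProcessor(new PiiAwareProcessor(builder.Environment.IsProduction()))
public class PiiAwareProcessor : BaseProcessor<Activity>
{
    private static readonly string[] _sensitiveTagKeys =
        ["gen_ai.prompt", "gen_ai.completion", "gen_ai.system.message"];

    private readonly bool _redact;

    public PiiAwareProcessor(bool redact) => _redact = redact;

    public override void OnEnd(Activity activity)
    {
        if (!_redact) return; // staging/dev: leave prompt content intact

        foreach (var key in _sensitiveTagKeys)
        {
            if (activity.GetTagItem(key) is not null)
            {
                activity.SetTag(key, "[REDACTED]");
            }
        }
    }
}
```

If staging traces retain prompt content, make sure that trace store itself is access-restricted, since it now holds the same sensitive data the production pipeline strips.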
### Extending the Processor for Custom PII Patterns
The _sensitiveTagKeys list above targets the three standard SK-emitted sensitive attributes. Extend it based on your application’s attribute schema. If you add custom tags like user.query or document.content, include them in the redaction list:
```csharp
private static readonly string[] _sensitiveTagKeys =
[
    "gen_ai.prompt",
    "gen_ai.completion",
    "gen_ai.system.message",
    "user.query",       // custom attribute added by ObservableChatService
    "document.content"  // custom attribute from RAG pipeline
];
```
### Combining Sampling and Redaction
High-volume AI services emit a large number of spans. Configure a sampling strategy alongside redaction to control both storage cost and data exposure:
```csharp
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .SetSampler(new TraceIdRatioBasedSampler(0.1)) // sample 10% of traces
        .AddSource("Microsoft.SemanticKernel*")
        .AddSource("MyApp.AI")
        .AddProcessor(new PiiMaskingProcessor())
        .AddOtlpExporter());
```
A 10% sample rate dramatically reduces storage cost while retaining statistical validity for latency and error rate analysis. For debugging specific failures, use AlwaysOnSampler in a non-production environment or implement a tail-based sampler that captures 100% of error spans regardless of the configured ratio.
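In-process .NET samplers decide at span start, before success or failure is known, so tail-based sampling is usually done in the OpenTelemetry Collector instead. A sketch of a Collector `tail_sampling` processor configuration that keeps every error trace plus a 10% sample of the rest (the wait time and percentage are assumptions to tune):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s        # buffer spans until the trace is complete
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-rest
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
```

With this approach the application exports all spans to a local Collector, and the Collector applies the sampling decision with full knowledge of each trace's outcome.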
## Putting It All Together: A Complete Observability Configuration
Here is a complete, production-ready observability setup combining SK telemetry, custom metrics, PII masking, and dual export targets (Aspire for local, Application Insights for production):
```csharp
using Azure.Monitor.OpenTelemetry.AspNetCore;
using OpenTelemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;

// MUST be before Kernel is built
AppContext.SetSwitch(
    "Microsoft.SemanticKernel.Experimental.GenAI.EnableOTelDiagnosticsSensitive",
    true);

var builder = WebApplication.CreateBuilder(args);
var isProduction = builder.Environment.IsProduction();

var otelBuilder = builder.Services.AddOpenTelemetry()
    .WithTracing(tracing =>
    {
        tracing
            .AddSource("Microsoft.SemanticKernel*")
            .AddSource("MyApp.AI")
            .AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            .AddProcessor(new PiiMaskingProcessor());

        if (!isProduction)
        {
            tracing.AddOtlpExporter(); // Aspire dashboard or Jaeger for local dev
        }
        // In production, exporters are registered by UseAzureMonitor below
    })
    .WithMetrics(metrics =>
    {
        metrics
            .AddMeter("Microsoft.SemanticKernel*")
            .AddMeter("MyApp.AI.Metrics")
            .AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation();

        if (!isProduction)
        {
            metrics.AddOtlpExporter();
        }
    });

if (isProduction)
{
    // Application Insights for production
    otelBuilder.UseAzureMonitor(options =>
    {
        options.ConnectionString = builder.Configuration["ApplicationInsights:ConnectionString"];
    });
}

builder.Services.AddKernel()
    .AddAzureOpenAIChatCompletion(
        deploymentName: builder.Configuration["AzureOpenAI:Deployment"]!,
        endpoint: builder.Configuration["AzureOpenAI:Endpoint"]!,
        apiKey: builder.Configuration["AzureOpenAI:ApiKey"]!);

builder.Services.AddSingleton<AiMetrics>();
builder.Services.AddScoped<ObservableChatService>();

var app = builder.Build();
app.MapDefaultEndpoints(); // Aspire health check endpoints
app.Run();
```
This single configuration handles the complete observability lifecycle: spans from SK and custom sources flow through the PII masking processor, then to the correct backend depending on environment. Custom metrics from AiMetrics are exported on the same channel.
## Further Reading
- OpenTelemetry .NET docs
- OpenTelemetry GenAI Semantic Conventions
- .NET Aspire Telemetry documentation
- Azure Monitor OpenTelemetry distro
For a complete background service implementation with telemetry, see Build AI Background Services in .NET.