
Build an AI-Powered Minimal API with .NET 9 and Azure OpenAI

Intermediate Original .NET 9 Microsoft.Extensions.AI 10.3.0 Azure.AI.OpenAI 2.1.0
By Rajesh Mishra · Mar 21, 2026 · 17 min read
Verified Mar 2026 .NET 9 Microsoft.Extensions.AI 10.3.0
In 30 Seconds

This workshop builds a production-ready AI Minimal API with .NET 9 and Microsoft.Extensions.AI 10.3.0. It covers project scaffolding, IChatClient DI registration via AddAzureOpenAIChatClient(), a streaming SSE chat endpoint using CompleteStreamingAsync(), a structured output extraction endpoint using JSON schema formatting, an embeddings endpoint via IEmbeddingGenerator, resilience via ConfigureHttpClientDefaults with AddStandardResilienceHandler(), OpenAPI documentation with Scalar, and deployment to Azure Container Apps using a Dockerfile and managed identity via DefaultAzureCredential.

What You'll Build

A production-ready AI API built on .NET 9 Minimal API and Microsoft.Extensions.AI: IChatClient via DI, a streaming SSE chat endpoint, structured output extraction, embeddings, and deployment to Azure Container Apps.


Minimal API paired with Microsoft.Extensions.AI is the most direct path to a production AI backend in .NET 9. No controller ceremony, no heavy framework overhead — just a handful of POST endpoints backed by provider-agnostic AI abstractions that compose cleanly with ASP.NET Core’s DI container.

This workshop builds the full stack from dotnet new to Azure Container Apps deployment. By the end you will have a runnable API with a streaming chat endpoint, a structured output extraction endpoint, an embeddings endpoint, resilience middleware, and Scalar-powered API docs — all in a single Program.cs that stays under 100 lines.

Prerequisites

  • .NET 9 SDK installed
  • An Azure subscription with an Azure OpenAI resource provisioned
  • A deployed chat model (GPT-4o or GPT-4o-mini) and an embedding model (text-embedding-3-small)
  • Azure CLI installed for Container Apps deployment
  • Docker Desktop (for the final deployment section)

1. Project Setup

Scaffold a new Minimal API project and install the required packages. In .NET 8 and later, the webapi template produces a Minimal API by default, so no extra flag is needed:

dotnet new webapi -o AiApi
cd AiApi

dotnet add package Microsoft.Extensions.AI --version 10.3.0
dotnet add package Microsoft.Extensions.AI.AzureAIInference --version 10.3.0
dotnet add package Azure.AI.OpenAI --version 2.1.0
dotnet add package Microsoft.Extensions.Http.Resilience --version 9.3.0
dotnet add package Microsoft.AspNetCore.OpenApi --version 9.0.0
dotnet add package Scalar.AspNetCore --version 2.0.0
dotnet add package Azure.Identity --version 1.13.1

The Microsoft.Extensions.AI package provides IChatClient and IEmbeddingGenerator. The Microsoft.Extensions.AI.AzureAIInference package provides the AddAzureOpenAIChatClient() and AddAzureOpenAIEmbeddingGenerator() extension methods that wire those interfaces into ASP.NET Core DI.

Add your Azure OpenAI configuration to appsettings.Development.json:

{
  "AzureOpenAI": {
    "Endpoint": "https://your-resource.openai.azure.com/",
    "ApiKey": "your-api-key",
    "ChatDeployment": "gpt-4o",
    "EmbeddingDeployment": "text-embedding-3-small"
  }
}

The ApiKey is for local development only. In the deployment section, you will replace it with managed identity via DefaultAzureCredential.

2. DI Registration

The entire dependency setup lives in Program.cs. MEAI’s extension methods make this concise:

using Azure;
using Azure.Identity;
using Microsoft.Extensions.AI;
using Microsoft.AspNetCore.OpenApi;
using Scalar.AspNetCore;

var builder = WebApplication.CreateBuilder(args);

var endpoint = builder.Configuration["AzureOpenAI:Endpoint"]!;
var apiKey = builder.Configuration["AzureOpenAI:ApiKey"];
var chatDeployment = builder.Configuration["AzureOpenAI:ChatDeployment"]!;
var embeddingDeployment = builder.Configuration["AzureOpenAI:EmbeddingDeployment"]!;

// Use API key locally; managed identity in production
AzureKeyCredential? keyCredential = !string.IsNullOrEmpty(apiKey)
    ? new AzureKeyCredential(apiKey)
    : null;

// Register IChatClient
if (keyCredential is not null)
{
    builder.Services.AddAzureOpenAIChatClient(
        new Uri(endpoint),
        keyCredential);
}
else
{
    builder.Services.AddAzureOpenAIChatClient(
        new Uri(endpoint),
        new DefaultAzureCredential());
}

// Register IEmbeddingGenerator<string, Embedding<float>>
if (keyCredential is not null)
{
    builder.Services.AddAzureOpenAIEmbeddingGenerator(
        new Uri(endpoint),
        keyCredential);
}
else
{
    builder.Services.AddAzureOpenAIEmbeddingGenerator(
        new Uri(endpoint),
        new DefaultAzureCredential());
}

// Resilience: standard retry, circuit breaker, timeout on all HttpClients
builder.Services.ConfigureHttpClientDefaults(b =>
    b.AddStandardResilienceHandler());

// OpenAPI
builder.Services.AddOpenApi();

AddAzureOpenAIChatClient() registers IChatClient as a singleton. AddAzureOpenAIEmbeddingGenerator() registers IEmbeddingGenerator<string, Embedding<float>> as a singleton. Both are stateless and thread-safe — singleton lifetime is correct for these clients.

For a deep dive on the DI patterns behind these registrations, including keyed services for multi-provider setups, see Dependency Injection for AI Services in ASP.NET Core.

3. Chat Streaming SSE Endpoint

Server-Sent Events (SSE) deliver a streaming AI response to any HTTP client — browsers with the native EventSource API, curl, or any fetch-based client — without WebSockets.

// Request model
record ChatRequest(string Message, string? SystemPrompt = null);

app.MapPost("/chat", async (
    ChatRequest request,
    IChatClient chatClient,
    HttpResponse response,
    CancellationToken ct) =>
{
    // Configure SSE headers before writing any body
    response.ContentType = "text/event-stream";
    response.Headers.CacheControl = "no-cache";
    response.Headers.Connection = "keep-alive";

    var messages = new List<ChatMessage>();

    if (!string.IsNullOrEmpty(request.SystemPrompt))
        messages.Add(new ChatMessage(ChatRole.System, request.SystemPrompt));

    messages.Add(new ChatMessage(ChatRole.User, request.Message));

    await foreach (var update in chatClient.CompleteStreamingAsync(messages, cancellationToken: ct))
    {
        var text = update.Text;
        if (!string.IsNullOrEmpty(text))
        {
            // SSE format: each event is "data: {payload}\n\n"
            await response.WriteAsync($"data: {text}\n\n", ct);
            await response.Body.FlushAsync(ct);
        }
    }

    // Signal stream end with a sentinel event
    await response.WriteAsync("data: [DONE]\n\n", ct);
    await response.Body.FlushAsync(ct);
})
.WithName("StreamChat")
.WithSummary("Stream a chat response as Server-Sent Events")
.WithDescription("Accepts a user message and optional system prompt. Streams tokens as SSE data events. Ends with data: [DONE].")
.Produces(200, contentType: "text/event-stream");

Key implementation details:

  • response.ContentType and response.Headers.CacheControl must be set before any body write — headers are sent on the first flush.
  • update.Text is the text fragment property on StreamingChatCompletionUpdate. Do not use .Content — that property does not exist on the MEAI streaming type.
  • Double newline \n\n after each data line is required by the SSE specification. A single \n only terminates the field; the event is not dispatched until the blank line.
  • FlushAsync() after each write sends the buffered bytes to the client immediately, rather than waiting for the response to complete.

Test with curl:

curl -X POST http://localhost:5000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Explain dependency injection in three sentences."}' \
  --no-buffer

The --no-buffer flag tells curl to print each SSE event as it arrives rather than buffering the full response.
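For a .NET client, the same stream can be consumed with nothing but the BCL. The sketch below assumes the API is listening on http://localhost:5000 and mirrors the [DONE] sentinel the endpoint emits:

```csharp
using System.Net.Http.Json;

// Minimal SSE consumer for the /chat endpoint, BCL only.
using var http = new HttpClient();

using var request = new HttpRequestMessage(HttpMethod.Post, "http://localhost:5000/chat")
{
    Content = JsonContent.Create(new { message = "Explain dependency injection in three sentences." })
};

// ResponseHeadersRead streams the body instead of buffering the whole response.
using var response = await http.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
response.EnsureSuccessStatusCode();

await using var stream = await response.Content.ReadAsStreamAsync();
using var reader = new StreamReader(stream);

while (await reader.ReadLineAsync() is { } line)
{
    if (!line.StartsWith("data: ")) continue;   // skip blank separator lines
    var payload = line["data: ".Length..];
    if (payload == "[DONE]") break;             // sentinel written by the endpoint
    Console.Write(payload);
}
```

Reading line by line and splitting on the `data: ` prefix is a simplification that works for this endpoint's single-field events; a full SSE parser would also handle `event:`, `id:`, and multi-line data fields.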

4. Structured Output Endpoint

Structured output guarantees the model returns JSON that conforms to a specific schema — useful for extraction, classification, and entity recognition tasks where you need to deserialize the response reliably.

// The target schema for extraction
record ExtractedContact(
    string Name,
    string? Email,
    string? Phone,
    string? Company);

record ExtractRequest(string Text);

app.MapPost("/extract", async (
    ExtractRequest request,
    IChatClient chatClient,
    CancellationToken ct) =>
{
    var systemPrompt = """
        Extract contact information from the provided text.
        Return a JSON object with fields: Name (string), Email (string or null),
        Phone (string or null), Company (string or null).
        If a field is not present in the text, set it to null.
        Return only the JSON object, no explanation.
        """;

    var messages = new List<ChatMessage>
    {
        new ChatMessage(ChatRole.System, systemPrompt),
        new ChatMessage(ChatRole.User, request.Text)
    };

    var result = await chatClient.CompleteAsync(messages, cancellationToken: ct);

    var responseText = result.Message.Text ?? string.Empty;

    // Strip markdown code fences if the model wraps the JSON
    if (responseText.StartsWith("```"))
    {
        responseText = responseText
            .Replace("```json", "")
            .Replace("```", "")
            .Trim();
    }

    try
    {
        var contact = JsonSerializer.Deserialize<ExtractedContact>(
            responseText,
            new JsonSerializerOptions { PropertyNameCaseInsensitive = true });

        return Results.Ok(contact);
    }
    catch (JsonException)
    {
        return Results.BadRequest(new { error = "Model returned invalid JSON", raw = responseText });
    }
})
.WithName("ExtractContact")
.WithSummary("Extract contact information from unstructured text")
.WithDescription("Uses the AI model with a structured output prompt to extract Name, Email, Phone, and Company fields.")
.Produces<ExtractedContact>(200)
.ProducesProblem(400);

The JsonSerializer.Deserialize fallback approach works reliably across all MEAI providers and versions. If you are on a MEAI version that exposes ChatResponseFormat.ForJsonSchema(), you can pass it in ChatOptions.ResponseFormat to engage the model’s native JSON schema mode, which eliminates the need for the system prompt instruction and produces more reliable output.
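If your MEAI version does expose it, the call shape looks roughly like this. Treat ForJsonSchema, AIJsonUtilities.CreateJsonSchema, and the exact parameter names as version-dependent assumptions rather than a guaranteed API:

```csharp
// Sketch only: the availability and signature of ForJsonSchema()
// and AIJsonUtilities.CreateJsonSchema() vary across MEAI versions.
var options = new ChatOptions
{
    ResponseFormat = ChatResponseFormat.ForJsonSchema(
        schema: AIJsonUtilities.CreateJsonSchema(typeof(ExtractedContact)),
        schemaName: "extracted_contact",
        schemaDescription: "Contact details extracted from free text")
};

// With a schema-constrained response, the system prompt no longer needs to
// describe the JSON shape, and code-fence stripping becomes unnecessary.
var result = await chatClient.CompleteAsync(messages, options, cancellationToken: ct);
```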

Test the extraction endpoint:

curl -X POST http://localhost:5000/extract \
  -H "Content-Type: application/json" \
  -d '{"text": "Reach out to Jane Smith at [email protected] or call 555-1234. She works at Acme Corp."}'

Expected response:

{
  "name": "Jane Smith",
  "email": "[email protected]",
  "phone": "555-1234",
  "company": "Acme Corp"
}

5. Embeddings Endpoint

Embeddings convert text into dense vector representations that enable semantic search, clustering, and similarity comparisons. The IEmbeddingGenerator<string, Embedding<float>> interface from MEAI provides a provider-agnostic way to generate them.

record EmbedRequest(string[] Texts);
record EmbedResponse(float[][] Embeddings, int Dimensions);

app.MapPost("/embed", async (
    EmbedRequest request,
    IEmbeddingGenerator<string, Embedding<float>> generator,
    CancellationToken ct) =>
{
    if (request.Texts is null || request.Texts.Length == 0)
        return Results.BadRequest(new { error = "Texts array must not be empty." });

    if (request.Texts.Length > 100)
        return Results.BadRequest(new { error = "Maximum 100 texts per request." });

    var generated = await generator.GenerateAsync(request.Texts, cancellationToken: ct);

    var vectors = generated
        .Select(e => e.Vector.ToArray())
        .ToArray();

    return Results.Ok(new EmbedResponse(
        Embeddings: vectors,
        Dimensions: vectors[0].Length));
})
.WithName("GenerateEmbeddings")
.WithSummary("Generate text embeddings using Azure OpenAI")
.WithDescription("Converts an array of text strings to float vector embeddings. Maximum 100 texts per request.")
.Produces<EmbedResponse>(200)
.ProducesProblem(400);

GenerateAsync accepts an IEnumerable<string> and returns GeneratedEmbeddings<Embedding<float>>. The result is indexable — generated[0].Vector is the ReadOnlyMemory<float> for the first input. Calling .ToArray() materializes it into a standard array suitable for JSON serialization.

Test the embeddings endpoint:

curl -X POST http://localhost:5000/embed \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Hello world", "Azure OpenAI embeddings in .NET"]}'

The response will contain two float arrays of 1536 dimensions each (for text-embedding-3-small).
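Embeddings are rarely an end in themselves; the usual next step is comparing them. Cosine similarity between two vectors returned by /embed is plain arithmetic with no MEAI dependency:

```csharp
// Cosine similarity between two equal-length embedding vectors.
// Returns 1.0 for identical directions and 0.0 for orthogonal vectors.
static double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same dimensionality.");

    double dot = 0, magA = 0, magB = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot  += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
}

Console.WriteLine(CosineSimilarity(new float[] { 1, 0 }, new float[] { 0, 1 })); // 0
Console.WriteLine(CosineSimilarity(new float[] { 1, 2 }, new float[] { 1, 2 })); // 1
```

Scores closer to 1 indicate semantically similar texts; this is the building block for semantic search over the vectors your /embed endpoint returns.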

6. Resilience Middleware

The single line added in the DI setup section — builder.Services.ConfigureHttpClientDefaults(b => b.AddStandardResilienceHandler()) — automatically applies a standard resilience pipeline to every HttpClient instance the DI container creates, including those used internally by MEAI.

The standard resilience handler from Microsoft.Extensions.Http.Resilience configures:

Strategy                Default
Total request timeout   30 seconds
Retry                   3 retries, exponential backoff with jitter
Circuit breaker         Opens at 10% failure ratio over a 30-second window; breaks for 5 seconds
Attempt timeout         10 seconds per attempt

For most AI APIs this is appropriate out of the box. For Azure OpenAI specifically, if you need to handle 429 (rate limit) responses correctly and respect Retry-After headers, you will want additional configuration. See Add Resilience to AI Calls in .NET — Polly Retry, Circuit Breaker, and Rate Limiting for the full treatment, including custom Polly pipelines that read Retry-After and implement client-side token bucket rate limiting.
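When the defaults need tuning, for example longer attempt timeouts for slow generations, AddStandardResilienceHandler accepts an options callback. The property names below follow HttpStandardResilienceOptions; verify them against your installed Microsoft.Extensions.Http.Resilience version:

```csharp
// Tune the standard pipeline rather than replacing it with custom Polly code.
builder.Services.ConfigureHttpClientDefaults(b =>
    b.AddStandardResilienceHandler(options =>
    {
        // Long generations can exceed the 10-second per-attempt default.
        options.AttemptTimeout.Timeout = TimeSpan.FromSeconds(60);

        // The options validator requires the circuit breaker's sampling window
        // to be at least double the attempt timeout.
        options.CircuitBreaker.SamplingDuration = TimeSpan.FromSeconds(120);

        // The total timeout must cover all attempts plus backoff delays.
        options.TotalRequestTimeout.Timeout = TimeSpan.FromMinutes(5);

        // Retry harder against a rate-limited AI endpoint.
        options.Retry.MaxRetryAttempts = 5;
        options.Retry.UseJitter = true;
    }));
```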

There is no built-in runtime probe for the handler. If you want a breadcrumb in the startup logs, a plain informational message after building the app is enough:

// Informational only: this does not inspect the pipeline,
// it just records that the configuration code ran
app.Logger.LogInformation("Standard resilience handler configured for all HttpClient instances.");

7. OpenAPI Documentation with Scalar

The .WithName(), .WithSummary(), and .WithDescription() calls on each endpoint automatically populate the OpenAPI document. Wire up Scalar to render an interactive UI:

var app = builder.Build();

app.MapOpenApi();
app.MapScalarApiReference(options =>
{
    options.Title = "AI Minimal API";
    options.Theme = ScalarTheme.DeepSpace;
});

// ... endpoint registrations ...

app.Run();

Navigate to /scalar/v1 in the browser to see the interactive documentation. The Scalar UI allows you to send requests directly from the browser, which is useful for manual testing during development.

To see the raw OpenAPI JSON document:

curl http://localhost:5000/openapi/v1.json

The output is a standard OpenAPI 3.1 document that integrates with any API gateway, Postman collection generator, or client SDK generator.

Complete Program.cs

Here is the full Program.cs combining all sections:

using System.Text.Json;
using Azure;
using Azure.Identity;
using Microsoft.Extensions.AI;
using Scalar.AspNetCore;

var builder = WebApplication.CreateBuilder(args);

var endpoint = builder.Configuration["AzureOpenAI:Endpoint"]!;
var apiKey = builder.Configuration["AzureOpenAI:ApiKey"];

AzureKeyCredential? keyCredential = !string.IsNullOrEmpty(apiKey)
    ? new AzureKeyCredential(apiKey)
    : null;

// Register IChatClient
if (keyCredential is not null)
{
    builder.Services.AddAzureOpenAIChatClient(new Uri(endpoint), keyCredential);
}
else
{
    builder.Services.AddAzureOpenAIChatClient(new Uri(endpoint), new DefaultAzureCredential());
}

// Register IEmbeddingGenerator<string, Embedding<float>>
if (keyCredential is not null)
{
    builder.Services.AddAzureOpenAIEmbeddingGenerator(new Uri(endpoint), keyCredential);
}
else
{
    builder.Services.AddAzureOpenAIEmbeddingGenerator(new Uri(endpoint), new DefaultAzureCredential());
}

// Resilience on all HttpClient instances
builder.Services.ConfigureHttpClientDefaults(b => b.AddStandardResilienceHandler());

// OpenAPI
builder.Services.AddOpenApi();

var app = builder.Build();

app.MapOpenApi();
app.MapScalarApiReference(options =>
{
    options.Title = "AI Minimal API";
    options.Theme = ScalarTheme.DeepSpace;
});

// ── /chat — Streaming SSE ──────────────────────────────────────────────────
app.MapPost("/chat", async (
    ChatRequest request,
    IChatClient chatClient,
    HttpResponse response,
    CancellationToken ct) =>
{
    response.ContentType = "text/event-stream";
    response.Headers.CacheControl = "no-cache";
    response.Headers.Connection = "keep-alive";

    var messages = new List<ChatMessage>();
    if (!string.IsNullOrEmpty(request.SystemPrompt))
        messages.Add(new ChatMessage(ChatRole.System, request.SystemPrompt));
    messages.Add(new ChatMessage(ChatRole.User, request.Message));

    await foreach (var update in chatClient.CompleteStreamingAsync(messages, cancellationToken: ct))
    {
        var text = update.Text;
        if (!string.IsNullOrEmpty(text))
        {
            await response.WriteAsync($"data: {text}\n\n", ct);
            await response.Body.FlushAsync(ct);
        }
    }

    await response.WriteAsync("data: [DONE]\n\n", ct);
    await response.Body.FlushAsync(ct);
})
.WithName("StreamChat")
.WithSummary("Stream a chat response as Server-Sent Events")
.Produces(200, contentType: "text/event-stream");

// ── /extract — Structured Output ──────────────────────────────────────────
app.MapPost("/extract", async (
    ExtractRequest request,
    IChatClient chatClient,
    CancellationToken ct) =>
{
    var systemPrompt = """
        Extract contact information from the provided text.
        Return a JSON object with fields: Name (string), Email (string or null),
        Phone (string or null), Company (string or null).
        If a field is not present in the text, set it to null.
        Return only the JSON object, no explanation.
        """;

    var messages = new List<ChatMessage>
    {
        new ChatMessage(ChatRole.System, systemPrompt),
        new ChatMessage(ChatRole.User, request.Text)
    };

    var result = await chatClient.CompleteAsync(messages, cancellationToken: ct);
    var responseText = (result.Message.Text ?? string.Empty).Trim();

    if (responseText.StartsWith("```"))
        responseText = responseText.Replace("```json", "").Replace("```", "").Trim();

    try
    {
        var contact = JsonSerializer.Deserialize<ExtractedContact>(
            responseText, new JsonSerializerOptions { PropertyNameCaseInsensitive = true });
        return Results.Ok(contact);
    }
    catch (JsonException)
    {
        return Results.BadRequest(new { error = "Model returned invalid JSON", raw = responseText });
    }
})
.WithName("ExtractContact")
.WithSummary("Extract contact information from unstructured text")
.Produces<ExtractedContact>(200)
.ProducesProblem(400);

// ── /embed — Embeddings ────────────────────────────────────────────────────
app.MapPost("/embed", async (
    EmbedRequest request,
    IEmbeddingGenerator<string, Embedding<float>> generator,
    CancellationToken ct) =>
{
    if (request.Texts is null || request.Texts.Length == 0)
        return Results.BadRequest(new { error = "Texts array must not be empty." });

    if (request.Texts.Length > 100)
        return Results.BadRequest(new { error = "Maximum 100 texts per request." });

    var generated = await generator.GenerateAsync(request.Texts, cancellationToken: ct);
    var vectors = generated.Select(e => e.Vector.ToArray()).ToArray();

    return Results.Ok(new EmbedResponse(Embeddings: vectors, Dimensions: vectors[0].Length));
})
.WithName("GenerateEmbeddings")
.WithSummary("Generate text embeddings")
.Produces<EmbedResponse>(200)
.ProducesProblem(400);

app.Run();

// Type declarations must come after all top-level statements (CS8803),
// so the request/response models live at the bottom of the file.
record ChatRequest(string Message, string? SystemPrompt = null);
record ExtractedContact(string Name, string? Email, string? Phone, string? Company);
record ExtractRequest(string Text);
record EmbedRequest(string[] Texts);
record EmbedResponse(float[][] Embeddings, int Dimensions);

8. Deploy to Azure Container Apps

Dockerfile

Create a multi-stage Dockerfile in the project root:

# Build stage
FROM mcr.microsoft.com/dotnet/sdk:9.0 AS build
WORKDIR /src
COPY ["AiApi.csproj", "."]
RUN dotnet restore "./AiApi.csproj"
COPY . .
RUN dotnet publish "AiApi.csproj" -c Release -o /app/publish --no-restore

# Runtime stage
FROM mcr.microsoft.com/dotnet/aspnet:9.0 AS final
WORKDIR /app
EXPOSE 8080
COPY --from=build /app/publish .
ENTRYPOINT ["dotnet", "AiApi.dll"]

The runtime image uses the ASP.NET Core runtime (not the SDK), keeping the final image lean. Since .NET 8, the official ASP.NET Core images listen on port 8080 by default, which matches the --target-port used in the Container Apps deployment below.

Build and Push to Azure Container Registry

# Create a resource group and registry
az group create --name rg-ai-api --location eastus
az acr create --resource-group rg-ai-api --name youracr --sku Basic --admin-enabled false

# Build and push using ACR Tasks (no Docker daemon required)
az acr build --registry youracr --image ai-api:latest .

Deploy to Azure Container Apps

# Create a Container Apps environment
az containerapp env create \
  --name cae-ai-api \
  --resource-group rg-ai-api \
  --location eastus

# Deploy the container app with a system-assigned managed identity
az containerapp create \
  --name ca-ai-api \
  --resource-group rg-ai-api \
  --environment cae-ai-api \
  --image youracr.azurecr.io/ai-api:latest \
  --target-port 8080 \
  --ingress external \
  --assign-identity system \
  --env-vars \
    "AzureOpenAI__Endpoint=https://your-resource.openai.azure.com/" \
  --registry-server youracr.azurecr.io \
  --registry-identity system

Notice that AzureOpenAI:ApiKey is intentionally absent from --env-vars. The app uses DefaultAzureCredential when no API key is configured — the managed identity handles authentication.

Grant the Managed Identity Access to Azure OpenAI

# Get the managed identity's principal ID
PRINCIPAL_ID=$(az containerapp show \
  --name ca-ai-api \
  --resource-group rg-ai-api \
  --query "identity.principalId" \
  --output tsv)

# Get the Azure OpenAI resource ID
OPENAI_RESOURCE_ID=$(az cognitiveservices account show \
  --name your-openai-resource \
  --resource-group rg-openai \
  --query id \
  --output tsv)

# Assign "Cognitive Services OpenAI User" role
az role assignment create \
  --assignee $PRINCIPAL_ID \
  --role "Cognitive Services OpenAI User" \
  --scope $OPENAI_RESOURCE_ID

With this role assignment, the container app’s managed identity can call Azure OpenAI. No API keys are stored anywhere — not in environment variables, not in secrets, not in code.

Verify the deployment:

# Get the app URL
APP_URL=$(az containerapp show \
  --name ca-ai-api \
  --resource-group rg-ai-api \
  --query "properties.configuration.ingress.fqdn" \
  --output tsv)

# Test the chat endpoint
curl -X POST "https://$APP_URL/chat" \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello from Azure Container Apps!"}' \
  --no-buffer

⚠ Production Considerations

  • SSE endpoints must disable response buffering. If you use a reverse proxy (NGINX, Azure Front Door) without streaming support, tokens accumulate server-side and the client receives them all at once — defeating the point. Set X-Accel-Buffering: no for NGINX or configure Azure Front Door's streaming mode.
  • IChatClient is registered as Singleton by default. Never store mutable request state (conversation history, per-user context) inside a service that holds a Singleton IChatClient reference — that state leaks across requests. Keep conversation history in a Scoped service or pass it explicitly into each CompleteAsync call.
  • DefaultAzureCredential probes multiple credential sources in order. In local development, sign in with az login first; otherwise the chain can spend time probing ManagedIdentityCredential, which only fails after a timeout when no managed identity exists, before it reaches AzureCliCredential. For faster, clearer failures, use AzureCliCredential explicitly in local dev and reserve DefaultAzureCredential for production.
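The last point can be made explicit in code. The sketch below uses Azure.Identity types that exist today; the environment check is one reasonable way to split dev and production:

```csharp
using Azure.Core;
using Azure.Identity;

// Fail fast locally, use managed identity in production.
TokenCredential credential = builder.Environment.IsDevelopment()
    ? new AzureCliCredential()         // requires a prior `az login`
    : new ManagedIdentityCredential(); // system-assigned identity on Container Apps

// Pass `credential` to the AddAzureOpenAI* registrations instead of
// constructing DefaultAzureCredential inline.
```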


🧠 Architect’s Note

Minimal API is the right host for AI endpoints in .NET 9. The low ceremony matches the typical AI endpoint surface: a handful of POST routes, no complex routing hierarchies, and DI-first composition. The MEAI abstractions keep your endpoint handlers thin — they receive IChatClient or IEmbeddingGenerator via parameter injection and hand off to the client. Business logic, prompt construction, and response parsing belong in dedicated services registered in DI, not inline in the endpoint handler.
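As a sketch of that split, the hypothetical ContactExtractor below owns the prompt and the parsing, leaving the endpoint as a one-liner. The class name, lifetime, and registration are illustrative, not part of the workshop code:

```csharp
using System.Text.Json;
using Microsoft.Extensions.AI;

// Hypothetical service: owns prompt construction and response parsing
// so the endpoint handler stays thin.
public sealed class ContactExtractor(IChatClient chatClient)
{
    private static readonly JsonSerializerOptions JsonOpts =
        new() { PropertyNameCaseInsensitive = true };

    public async Task<ExtractedContact?> ExtractAsync(string text, CancellationToken ct)
    {
        var messages = new List<ChatMessage>
        {
            new(ChatRole.System, "Extract contact information. Return only a JSON object."),
            new(ChatRole.User, text)
        };

        var result = await chatClient.CompleteAsync(messages, cancellationToken: ct);
        return JsonSerializer.Deserialize<ExtractedContact>(
            result.Message.Text ?? "{}", JsonOpts);
    }
}

// Registration and the resulting thin endpoint:
// builder.Services.AddScoped<ContactExtractor>();
// app.MapPost("/extract", (ExtractRequest r, ContactExtractor x, CancellationToken ct)
//     => x.ExtractAsync(r.Text, ct));
```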

AI-Friendly Summary

Summary

This workshop builds a production-ready AI Minimal API with .NET 9 and Microsoft.Extensions.AI 10.3.0. It covers project scaffolding, IChatClient DI registration via AddAzureOpenAIChatClient(), a streaming SSE chat endpoint using CompleteStreamingAsync(), a structured output extraction endpoint using JSON schema formatting, an embeddings endpoint via IEmbeddingGenerator, resilience via ConfigureHttpClientDefaults with AddStandardResilienceHandler(), OpenAPI documentation with Scalar, and deployment to Azure Container Apps using a Dockerfile and managed identity via DefaultAzureCredential.

Key Takeaways

  • Register IChatClient and IEmbeddingGenerator via MEAI extension methods — no manual Azure SDK wiring
  • Streaming SSE requires text/event-stream content type, data: prefix, double newline, and FlushAsync per chunk
  • Structured output: use ChatResponseFormat.ForJsonSchema or fall back to system prompt + JsonSerializer.Deserialize
  • AddStandardResilienceHandler() gives retry, circuit breaker, and timeout with zero manual Polly configuration
  • Managed identity via DefaultAzureCredential removes API key secrets from Container Apps entirely
  • Program.cs stays under 100 lines with Minimal API — all AI capabilities compose as DI registrations

Implementation Checklist

  • Scaffold with dotnet new webapi and add MEAI + Azure OpenAI packages
  • Register IChatClient with AddAzureOpenAIChatClient(endpoint, credential)
  • Implement POST /chat endpoint streaming tokens as SSE with text/event-stream
  • Implement POST /extract endpoint returning structured JSON from model output
  • Implement POST /embed endpoint using IEmbeddingGenerator<string, Embedding<float>>
  • Add AddStandardResilienceHandler() via ConfigureHttpClientDefaults
  • Configure OpenAPI with Scalar UI for interactive endpoint documentation
  • Build Dockerfile and deploy to Azure Container Apps with managed identity

Frequently Asked Questions

Why use Microsoft.Extensions.AI instead of the raw Azure.AI.OpenAI SDK for Minimal API?

Microsoft.Extensions.AI provides the IChatClient and IEmbeddingGenerator abstractions that register cleanly into ASP.NET Core's DI container via AddAzureOpenAIChatClient(). Your endpoint handlers stay provider-agnostic — swapping from Azure OpenAI to Ollama for local dev changes one line of DI registration, not every endpoint handler. The raw SDK requires you to manage AzureOpenAIClient lifetimes manually and precludes easy mocking in tests.

How do I stream a chat response as Server-Sent Events from a Minimal API endpoint?

Set HttpResponse.ContentType to 'text/event-stream' and HttpResponse.Headers.CacheControl to 'no-cache'. Then iterate CompleteStreamingAsync() and for each StreamingChatCompletionUpdate, write 'data: {update.Text}\n\n' to HttpResponse.Body and call FlushAsync() after each write. The client consumes this with EventSource in the browser or any SSE-aware HTTP client.

What is the correct way to use structured JSON output with Microsoft.Extensions.AI 10.3.0?

Call CompleteAsync with a ChatOptions that sets ResponseFormat using ChatResponseFormat.ForJsonSchema(typeof(T), strict: true) if your MEAI version exposes that API. In practice, structured output support varies across MEAI versions, so a safe fallback is to instruct the model via a system message to return JSON matching your schema, then JsonSerializer.Deserialize<T> the response text. This approach works with all providers.

How does IEmbeddingGenerator work in a Minimal API endpoint?

Register IEmbeddingGenerator<string, Embedding<float>> via builder.Services.AddAzureOpenAIEmbeddingGenerator(). Inject it into your endpoint handler via the handler parameter list. Call GenerateAsync(new[] { text }) which returns GeneratedEmbeddings<Embedding<float>>. Access each embedding via result[0].Vector which is a ReadOnlyMemory<float>.

How do I add resilience to Minimal API AI calls without writing Polly pipelines manually?

Call builder.Services.ConfigureHttpClientDefaults(b => b.AddStandardResilienceHandler()) from Microsoft.Extensions.Resilience. This automatically applies retry with exponential backoff, circuit breaker, and timeout to all HttpClient instances created by the DI container — including the ones MEAI uses internally. No manual ResiliencePipelineBuilder required for standard scenarios.

How do I deploy a .NET 9 Minimal API with AI to Azure Container Apps?

Build a multi-stage Dockerfile: SDK image for build, runtime image for final. Push to Azure Container Registry. Create the Container App with az containerapp create, enabling a managed identity via --assign-identity system. Grant the managed identity 'Cognitive Services OpenAI User' role on your Azure OpenAI resource, then use DefaultAzureCredential in your code — no API keys stored in environment variables or secrets.

Can I add OpenAPI/Swagger documentation to AI endpoints in a .NET 9 Minimal API?

Yes. Add Microsoft.AspNetCore.OpenApi and Scalar.AspNetCore packages. Call builder.Services.AddOpenApi() and app.MapScalarApiReference() after app.MapOpenApi(). Decorate each endpoint with .WithName(), .WithSummary(), and .WithDescription() to document the AI endpoints. Scalar provides a clean interactive UI at /scalar/v1.


#Minimal API #.NET 9 #Azure OpenAI #Microsoft.Extensions.AI #.NET AI