Minimal API paired with Microsoft.Extensions.AI is the most direct path to a production AI backend in .NET 9. No controller ceremony, no heavy framework overhead — just a handful of POST endpoints backed by provider-agnostic AI abstractions that compose cleanly with ASP.NET Core’s DI container.
This workshop builds the full stack from dotnet new to Azure Container Apps deployment. By the end you will have a runnable API with a streaming chat endpoint, a structured output extraction endpoint, an embeddings endpoint, resilience middleware, and Scalar-powered API docs — all in a single, compact Program.cs.
Prerequisites
- .NET 9 SDK installed
- An Azure subscription with an Azure OpenAI resource provisioned
- A deployed chat model (GPT-4o or GPT-4o-mini) and an embedding model (text-embedding-3-small)
- Azure CLI installed for Container Apps deployment
- Docker Desktop (for the final deployment section)
1. Project Setup
Scaffold a new Minimal API project and install the required packages:
dotnet new webapi -o AiApi
cd AiApi
dotnet add package Microsoft.Extensions.AI
dotnet add package Microsoft.Extensions.AI.OpenAI
dotnet add package Azure.AI.OpenAI
dotnet add package Microsoft.Extensions.Http.Resilience
dotnet add package Microsoft.AspNetCore.OpenApi
dotnet add package Scalar.AspNetCore
dotnet add package Azure.Identity
In .NET 9, dotnet new webapi produces a Minimal API project by default (pass --use-controllers if you want the controller style instead). Version pins are omitted here because the Microsoft.Extensions.AI packages rev quickly; pass --version explicitly (or --prerelease while a package is still in preview) if you need a reproducible build.
The Microsoft.Extensions.AI package provides the IChatClient and IEmbeddingGenerator abstractions along with the AddChatClient() and AddEmbeddingGenerator() extension methods that wire them into ASP.NET Core DI. The Microsoft.Extensions.AI.OpenAI package supplies the AsIChatClient() and AsIEmbeddingGenerator() adapters that turn the Azure.AI.OpenAI client types into those abstractions. (Names have shifted across preview releases; earlier previews called these AsChatClient() and AsEmbeddingGenerator().)
Add your Azure OpenAI configuration to appsettings.Development.json:
{
"AzureOpenAI": {
"Endpoint": "https://your-resource.openai.azure.com/",
"ApiKey": "your-api-key",
"ChatDeployment": "gpt-4o",
"EmbeddingDeployment": "text-embedding-3-small"
}
}
The ApiKey is for local development only. In the deployment section, you will replace it with managed identity via DefaultAzureCredential.
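Even for local development, keeping the key out of a file that can be committed is safer. The .NET user-secrets store works with the same configuration keys; a sketch (substitute your real values):

```shell
# Store the Azure OpenAI settings in the user-secrets store instead of
# appsettings.Development.json; keys map 1:1 to the configuration paths.
dotnet user-secrets init
dotnet user-secrets set "AzureOpenAI:Endpoint" "https://your-resource.openai.azure.com/"
dotnet user-secrets set "AzureOpenAI:ApiKey" "your-api-key"
dotnet user-secrets set "AzureOpenAI:ChatDeployment" "gpt-4o"
dotnet user-secrets set "AzureOpenAI:EmbeddingDeployment" "text-embedding-3-small"
```

builder.Configuration reads user secrets automatically in the Development environment, so no code changes are needed.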
2. DI Registration
The entire dependency setup lives in Program.cs. MEAI’s extension methods make this concise:
using Azure;
using Azure.AI.OpenAI;
using Azure.Identity;
using Microsoft.Extensions.AI;
using Scalar.AspNetCore;
var builder = WebApplication.CreateBuilder(args);
var endpoint = builder.Configuration["AzureOpenAI:Endpoint"]!;
var apiKey = builder.Configuration["AzureOpenAI:ApiKey"];
var chatDeployment = builder.Configuration["AzureOpenAI:ChatDeployment"]!;
var embeddingDeployment = builder.Configuration["AzureOpenAI:EmbeddingDeployment"]!;
// Use an API key locally; fall back to managed identity (DefaultAzureCredential) in production
var azureClient = !string.IsNullOrEmpty(apiKey)
    ? new AzureOpenAIClient(new Uri(endpoint), new AzureKeyCredential(apiKey))
    : new AzureOpenAIClient(new Uri(endpoint), new DefaultAzureCredential());
// Register IChatClient against the chat deployment
builder.Services.AddChatClient(
    azureClient.GetChatClient(chatDeployment).AsIChatClient());
// Register IEmbeddingGenerator<string, Embedding<float>> against the embedding deployment
builder.Services.AddEmbeddingGenerator(
    azureClient.GetEmbeddingClient(embeddingDeployment).AsIEmbeddingGenerator());
// Resilience: standard retry, circuit breaker, timeout on all HttpClients
builder.Services.ConfigureHttpClientDefaults(b =>
b.AddStandardResilienceHandler());
// OpenAPI
builder.Services.AddOpenApi();
AddChatClient() registers IChatClient as a singleton, and AddEmbeddingGenerator() registers IEmbeddingGenerator<string, Embedding<float>> as a singleton. The underlying Azure OpenAI clients are stateless and thread-safe, so singleton lifetime is correct. Note that the deployment name is bound at registration time via GetChatClient(chatDeployment) and GetEmbeddingClient(embeddingDeployment); endpoint handlers never need to know which deployment they are talking to.
For a deep dive on the DI patterns behind these registrations, including keyed services for multi-provider setups, see Dependency Injection for AI Services in ASP.NET Core.
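As a taste of the keyed-service pattern that article covers, MEAI also ships AddKeyedChatClient(). A sketch, assuming the AzureOpenAIClient instance is named azureClient as in the registration above, with illustrative deployment names and keys:

```csharp
// Register two chat deployments under distinct service keys.
builder.Services.AddKeyedChatClient("fast",
    azureClient.GetChatClient("gpt-4o-mini").AsIChatClient());
builder.Services.AddKeyedChatClient("smart",
    azureClient.GetChatClient("gpt-4o").AsIChatClient());

// In an endpoint, select a deployment via [FromKeyedServices]:
app.MapPost("/chat/fast", (
    ChatRequest request,
    [FromKeyedServices("fast")] IChatClient chatClient,
    CancellationToken ct) => { /* ... */ });
```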
3. Chat Streaming SSE Endpoint
Server-Sent Events (SSE) deliver a streaming AI response to any HTTP client — browsers with the native EventSource API, curl, or any fetch-based client — without WebSockets.
// Request model (in a top-level-statement Program.cs, type declarations go after all statements)
record ChatRequest(string Message, string? SystemPrompt = null);
app.MapPost("/chat", async (
ChatRequest request,
IChatClient chatClient,
HttpResponse response,
CancellationToken ct) =>
{
// Configure SSE headers before writing any body
response.ContentType = "text/event-stream";
response.Headers.CacheControl = "no-cache";
response.Headers.Connection = "keep-alive";
var messages = new List<ChatMessage>();
if (!string.IsNullOrEmpty(request.SystemPrompt))
messages.Add(new ChatMessage(ChatRole.System, request.SystemPrompt));
messages.Add(new ChatMessage(ChatRole.User, request.Message));
await foreach (var update in chatClient.GetStreamingResponseAsync(messages, cancellationToken: ct))
{
var text = update.Text;
if (!string.IsNullOrEmpty(text))
{
// SSE format: each event is "data: {payload}\n\n"
await response.WriteAsync($"data: {text}\n\n", ct);
await response.Body.FlushAsync(ct);
}
}
// Signal stream end with a sentinel event
await response.WriteAsync("data: [DONE]\n\n", ct);
await response.Body.FlushAsync(ct);
})
.WithName("StreamChat")
.WithSummary("Stream a chat response as Server-Sent Events")
.WithDescription("Accepts a user message and optional system prompt. Streams tokens as SSE data events. Ends with data: [DONE].")
.Produces(200, contentType: "text/event-stream");
Key implementation details:
- response.ContentType and response.Headers.CacheControl must be set before any body write; headers are sent on the first flush.
- update.Text is the text fragment property on ChatResponseUpdate (called StreamingChatCompletionUpdate in early MEAI previews). There is no .Content string property on the streaming type; non-text content lives in update.Contents.
- The double newline \n\n after each data line is required by the SSE specification. A single \n only terminates the field; the event is not dispatched until the blank line.
- FlushAsync() after each write sends the buffered bytes to the client immediately, rather than waiting for the response to complete.
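Concretely, the bytes on the wire for a short streamed reply look like this (token boundaries vary by model):

```
data: Dependency

data:  injection is

data:  a design pattern...

data: [DONE]
```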
Test with curl:
curl -X POST http://localhost:5000/chat \
-H "Content-Type: application/json" \
-d '{"message": "Explain dependency injection in three sentences."}' \
--no-buffer
The --no-buffer flag tells curl to print each SSE event as it arrives rather than buffering the full response.
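On the consumer side, the browser's native EventSource only issues GET requests, so for this POST endpoint a fetch-based reader is the practical client. A sketch; the URL and payload shape follow this workshop's /chat contract, so adjust them for your host:

```javascript
// Parse complete SSE events out of a text buffer. Returns the extracted
// "data:" payloads plus whatever trailing partial event remains unparsed,
// since a network chunk can end in the middle of an event.
function parseSseEvents(buffer) {
  const events = [];
  let sep;
  while ((sep = buffer.indexOf("\n\n")) !== -1) {
    const rawEvent = buffer.slice(0, sep);
    buffer = buffer.slice(sep + 2);
    for (const line of rawEvent.split("\n")) {
      if (line.startsWith("data: ")) events.push(line.slice(6));
    }
  }
  return { events, rest: buffer };
}

// Stream a chat reply, invoking onToken for each text fragment.
async function streamChat(message, onToken, baseUrl = "http://localhost:5000") {
  const response = await fetch(`${baseUrl}/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  });
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const parsed = parseSseEvents(buffer);
    buffer = parsed.rest; // carry the partial event into the next read
    for (const data of parsed.events) {
      if (data === "[DONE]") return;
      onToken(data);
    }
  }
}
```

Call streamChat("...", token => appendToUi(token)) from your page; the buffer carry-over is the important detail, since chunk boundaries rarely align with event boundaries.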
4. Structured Output Endpoint
Structured output guarantees the model returns JSON that conforms to a specific schema — useful for extraction, classification, and entity recognition tasks where you need to deserialize the response reliably.
// The target schema for extraction
record ExtractedContact(
string Name,
string? Email,
string? Phone,
string? Company);
record ExtractRequest(string Text);
app.MapPost("/extract", async (
ExtractRequest request,
IChatClient chatClient,
CancellationToken ct) =>
{
var systemPrompt = """
Extract contact information from the provided text.
Return a JSON object with fields: Name (string), Email (string or null),
Phone (string or null), Company (string or null).
If a field is not present in the text, set it to null.
Return only the JSON object, no explanation.
""";
var messages = new List<ChatMessage>
{
new ChatMessage(ChatRole.System, systemPrompt),
new ChatMessage(ChatRole.User, request.Text)
};
var result = await chatClient.GetResponseAsync(messages, cancellationToken: ct);
var responseText = result.Text.Trim();
// Strip markdown code fences if the model wraps the JSON
if (responseText.StartsWith("```"))
{
responseText = responseText
.Replace("```json", "")
.Replace("```", "")
.Trim();
}
try
{
var contact = JsonSerializer.Deserialize<ExtractedContact>(
responseText,
new JsonSerializerOptions { PropertyNameCaseInsensitive = true });
return Results.Ok(contact);
}
catch (JsonException)
{
return Results.BadRequest(new { error = "Model returned invalid JSON", raw = responseText });
}
})
.WithName("ExtractContact")
.WithSummary("Extract contact information from unstructured text")
.WithDescription("Uses the AI model with a structured output prompt to extract Name, Email, Phone, and Company fields.")
.Produces<ExtractedContact>(200)
.ProducesProblem(400);
The JsonSerializer.Deserialize fallback approach works reliably across MEAI providers and versions. If you are on a MEAI version that exposes ChatResponseFormat.ForJsonSchema(), you can pass it via ChatOptions.ResponseFormat to engage the model's native JSON schema mode, which eliminates the need for the system prompt instruction and the fence stripping, and produces more reliable output.
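Recent MEAI builds also include a typed convenience extension that handles the schema, the request, and the deserialization in one call. If your version has it, the endpoint body collapses to a sketch like this (availability varies by MEAI version, so verify before relying on it):

```csharp
// GetResponseAsync<T> asks the model for JSON matching the schema inferred
// from ExtractedContact and deserializes the reply into typed.Result.
var typed = await chatClient.GetResponseAsync<ExtractedContact>(
    $"Extract contact information from: {request.Text}",
    cancellationToken: ct);
return Results.Ok(typed.Result);
```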
Test the extraction endpoint:
curl -X POST http://localhost:5000/extract \
-H "Content-Type: application/json" \
-d '{"text": "Reach out to Jane Smith at [email protected] or call 555-1234. She works at Acme Corp."}'
Expected response:
{
"name": "Jane Smith",
"email": "[email protected]",
"phone": "555-1234",
"company": "Acme Corp"
}
5. Embeddings Endpoint
Embeddings convert text into dense vector representations that enable semantic search, clustering, and similarity comparisons. The IEmbeddingGenerator<string, Embedding<float>> interface from MEAI provides a provider-agnostic way to generate them.
record EmbedRequest(string[] Texts);
record EmbedResponse(float[][] Embeddings, int Dimensions);
app.MapPost("/embed", async (
EmbedRequest request,
IEmbeddingGenerator<string, Embedding<float>> generator,
CancellationToken ct) =>
{
if (request.Texts is null || request.Texts.Length == 0)
return Results.BadRequest(new { error = "Texts array must not be empty." });
if (request.Texts.Length > 100)
return Results.BadRequest(new { error = "Maximum 100 texts per request." });
var generated = await generator.GenerateAsync(request.Texts, cancellationToken: ct);
var vectors = generated
.Select(e => e.Vector.ToArray())
.ToArray();
return Results.Ok(new EmbedResponse(
Embeddings: vectors,
Dimensions: vectors[0].Length));
})
.WithName("GenerateEmbeddings")
.WithSummary("Generate text embeddings using Azure OpenAI")
.WithDescription("Converts an array of text strings to float vector embeddings. Maximum 100 texts per request.")
.Produces<EmbedResponse>(200)
.ProducesProblem(400);
GenerateAsync accepts an IEnumerable<string> and returns GeneratedEmbeddings<Embedding<float>>. The result is indexable — generated[0].Vector is the ReadOnlyMemory<float> for the first input. Calling .ToArray() materializes it into a standard array suitable for JSON serialization.
Test the embeddings endpoint:
curl -X POST http://localhost:5000/embed \
-H "Content-Type: application/json" \
-d '{"texts": ["Hello world", "Azure OpenAI embeddings in .NET"]}'
The response will contain two float arrays of 1536 dimensions each (for text-embedding-3-small).
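To make those vectors useful, compare them: cosine similarity is the standard metric for semantic closeness. .NET ships a SIMD-accelerated implementation in the System.Numerics.Tensors NuGet package; a sketch, where embeddings is an assumed float[][] holding the two vectors returned by /embed:

```csharp
// dotnet add package System.Numerics.Tensors
using System.Numerics.Tensors;

// Returns a value in [-1, 1]; closer to 1 means more semantically similar.
float similarity = TensorPrimitives.CosineSimilarity(embeddings[0], embeddings[1]);
```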
6. Resilience Middleware
The single line added in the DI setup section — builder.Services.ConfigureHttpClientDefaults(b => b.AddStandardResilienceHandler()) — applies a standard resilience pipeline to every HttpClient that the DI container's IHttpClientFactory creates. One caveat: the Azure OpenAI SDK manages its own HTTP transport by default, so its calls are governed by the SDK's built-in retry policy rather than this pipeline unless you route a factory-created HttpClient into it via the client options' Transport property. The handler still protects any other outbound HttpClient in the app.
The standard resilience handler from Microsoft.Extensions.Http.Resilience configures:
| Strategy | Default |
|---|---|
| Total request timeout | 30 seconds |
| Retry | 3 retries, exponential backoff, jitter |
| Circuit breaker | Opens at 10% failure over 30s, breaks for 5s |
| Attempt timeout | 10 seconds per attempt |
For most AI APIs this is appropriate out of the box. For Azure OpenAI specifically, if you need to handle 429 (rate limit) responses correctly and respect Retry-After headers, you will want additional configuration. See Add Resilience to AI Calls in .NET — Polly Retry, Circuit Breaker, and Rate Limiting for the full treatment, including custom Polly pipelines that read Retry-After and implement client-side token bucket rate limiting.
To verify the resilience handler is active, exercise a failing or throttled call and watch the logs: the pipeline logs retry attempts and circuit state changes under Polly-prefixed logger categories, which is far more conclusive than a startup log message.
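When the defaults do not fit, AddStandardResilienceHandler accepts an options delegate. Long AI completions routinely exceed the 30-second total timeout, so a sketch with illustrative values:

```csharp
builder.Services.ConfigureHttpClientDefaults(http =>
    http.AddStandardResilienceHandler(options =>
    {
        // Long AI completions can exceed the 30s default total timeout.
        options.TotalRequestTimeout.Timeout = TimeSpan.FromMinutes(2);
        options.AttemptTimeout.Timeout = TimeSpan.FromSeconds(60);
        // Options validation requires the circuit breaker's sampling window
        // to be at least double the attempt timeout, so widen it to match.
        options.CircuitBreaker.SamplingDuration = TimeSpan.FromMinutes(2);
        options.Retry.MaxRetryAttempts = 5;
    }));
```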
7. OpenAPI Documentation with Scalar
The .WithName(), .WithSummary(), and .WithDescription() calls on each endpoint automatically populate the OpenAPI document. Wire up Scalar to render an interactive UI:
var app = builder.Build();
app.MapOpenApi();
app.MapScalarApiReference(options =>
{
options.Title = "AI Minimal API";
options.Theme = ScalarTheme.DeepSpace;
});
// ... endpoint registrations ...
app.Run();
Navigate to /scalar/v1 in the browser to see the interactive documentation. The Scalar UI allows you to send requests directly from the browser, which is useful for manual testing during development.
To see the raw OpenAPI JSON document:
curl http://localhost:5000/openapi/v1.json
The output is a standard OpenAPI 3.0 document (as generated by Microsoft.AspNetCore.OpenApi in .NET 9) that integrates with any API gateway, Postman collection generator, or client SDK generator.
Complete Program.cs
Here is the full Program.cs combining all sections:
using System.Text.Json;
using Azure;
using Azure.AI.OpenAI;
using Azure.Identity;
using Microsoft.Extensions.AI;
using Scalar.AspNetCore;
var builder = WebApplication.CreateBuilder(args);
var endpoint = builder.Configuration["AzureOpenAI:Endpoint"]!;
var apiKey = builder.Configuration["AzureOpenAI:ApiKey"];
var chatDeployment = builder.Configuration["AzureOpenAI:ChatDeployment"]!;
var embeddingDeployment = builder.Configuration["AzureOpenAI:EmbeddingDeployment"]!;
// API key locally; managed identity in production
var azureClient = !string.IsNullOrEmpty(apiKey)
    ? new AzureOpenAIClient(new Uri(endpoint), new AzureKeyCredential(apiKey))
    : new AzureOpenAIClient(new Uri(endpoint), new DefaultAzureCredential());
// Register IChatClient
builder.Services.AddChatClient(
    azureClient.GetChatClient(chatDeployment).AsIChatClient());
// Register IEmbeddingGenerator<string, Embedding<float>>
builder.Services.AddEmbeddingGenerator(
    azureClient.GetEmbeddingClient(embeddingDeployment).AsIEmbeddingGenerator());
// Resilience on all HttpClientFactory-created HttpClient instances
builder.Services.ConfigureHttpClientDefaults(b => b.AddStandardResilienceHandler());
// OpenAPI
builder.Services.AddOpenApi();
var app = builder.Build();
app.MapOpenApi();
app.MapScalarApiReference(options =>
{
options.Title = "AI Minimal API";
options.Theme = ScalarTheme.DeepSpace;
});
// ── /chat — Streaming SSE ──────────────────────────────────────────────────
app.MapPost("/chat", async (
    ChatRequest request,
    IChatClient chatClient,
    HttpResponse response,
    CancellationToken ct) =>
{
    response.ContentType = "text/event-stream";
    response.Headers.CacheControl = "no-cache";
    response.Headers.Connection = "keep-alive";
    var messages = new List<ChatMessage>();
    if (!string.IsNullOrEmpty(request.SystemPrompt))
        messages.Add(new ChatMessage(ChatRole.System, request.SystemPrompt));
    messages.Add(new ChatMessage(ChatRole.User, request.Message));
    await foreach (var update in chatClient.GetStreamingResponseAsync(messages, cancellationToken: ct))
    {
        var text = update.Text;
        if (!string.IsNullOrEmpty(text))
        {
            await response.WriteAsync($"data: {text}\n\n", ct);
            await response.Body.FlushAsync(ct);
        }
    }
    await response.WriteAsync("data: [DONE]\n\n", ct);
    await response.Body.FlushAsync(ct);
})
.WithName("StreamChat")
.WithSummary("Stream a chat response as Server-Sent Events")
.Produces(200, contentType: "text/event-stream");
// ── /extract — Structured Output ──────────────────────────────────────────
app.MapPost("/extract", async (
    ExtractRequest request,
    IChatClient chatClient,
    CancellationToken ct) =>
{
    var systemPrompt = """
        Extract contact information from the provided text.
        Return a JSON object with fields: Name (string), Email (string or null),
        Phone (string or null), Company (string or null).
        If a field is not present in the text, set it to null.
        Return only the JSON object, no explanation.
        """;
    var messages = new List<ChatMessage>
    {
        new ChatMessage(ChatRole.System, systemPrompt),
        new ChatMessage(ChatRole.User, request.Text)
    };
    var result = await chatClient.GetResponseAsync(messages, cancellationToken: ct);
    var responseText = result.Text.Trim();
    if (responseText.StartsWith("```"))
        responseText = responseText.Replace("```json", "").Replace("```", "").Trim();
    try
    {
        var contact = JsonSerializer.Deserialize<ExtractedContact>(
            responseText, new JsonSerializerOptions { PropertyNameCaseInsensitive = true });
        return Results.Ok(contact);
    }
    catch (JsonException)
    {
        return Results.BadRequest(new { error = "Model returned invalid JSON", raw = responseText });
    }
})
.WithName("ExtractContact")
.WithSummary("Extract contact information from unstructured text")
.Produces<ExtractedContact>(200)
.ProducesProblem(400);
// ── /embed — Embeddings ────────────────────────────────────────────────────
app.MapPost("/embed", async (
    EmbedRequest request,
    IEmbeddingGenerator<string, Embedding<float>> generator,
    CancellationToken ct) =>
{
    if (request.Texts is null || request.Texts.Length == 0)
        return Results.BadRequest(new { error = "Texts array must not be empty." });
    if (request.Texts.Length > 100)
        return Results.BadRequest(new { error = "Maximum 100 texts per request." });
    var generated = await generator.GenerateAsync(request.Texts, cancellationToken: ct);
    var vectors = generated.Select(e => e.Vector.ToArray()).ToArray();
    return Results.Ok(new EmbedResponse(Embeddings: vectors, Dimensions: vectors[0].Length));
})
.WithName("GenerateEmbeddings")
.WithSummary("Generate text embeddings")
.Produces<EmbedResponse>(200)
.ProducesProblem(400);
app.Run();
// Request/response models — type declarations must come after all top-level statements,
// otherwise the compiler reports CS8803
record ChatRequest(string Message, string? SystemPrompt = null);
record ExtractRequest(string Text);
record ExtractedContact(string Name, string? Email, string? Phone, string? Company);
record EmbedRequest(string[] Texts);
record EmbedResponse(float[][] Embeddings, int Dimensions);
8. Deploy to Azure Container Apps
Dockerfile
Create a multi-stage Dockerfile in the project root:
# Build stage
FROM mcr.microsoft.com/dotnet/sdk:9.0 AS build
WORKDIR /src
COPY ["AiApi.csproj", "."]
RUN dotnet restore "./AiApi.csproj"
COPY . .
RUN dotnet publish "AiApi.csproj" -c Release -o /app/publish --no-restore
# Runtime stage
FROM mcr.microsoft.com/dotnet/aspnet:9.0 AS final
WORKDIR /app
EXPOSE 8080
COPY --from=build /app/publish .
ENTRYPOINT ["dotnet", "AiApi.dll"]
The runtime image uses the ASP.NET Core runtime (not the SDK), keeping the final image lean. The .NET 8 and later ASP.NET Core images listen on port 8080 by default, which matches the target port configured for Azure Container Apps below.
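A .dockerignore keeps the build context small and local secrets out of the image. A minimal one for this project (note the Development settings file, which holds the API key):

```
bin/
obj/
*.user
.git/
appsettings.Development.json
```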
Build and Push to Azure Container Registry
# Create a resource group and registry
az group create --name rg-ai-api --location eastus
az acr create --resource-group rg-ai-api --name youracr --sku Basic --admin-enabled false
# Build and push using ACR Tasks (no Docker daemon required)
az acr build --registry youracr --image ai-api:latest .
Deploy to Azure Container Apps
# Create a Container Apps environment
az containerapp env create \
--name cae-ai-api \
--resource-group rg-ai-api \
--location eastus
# Deploy the container app with a system-assigned managed identity
az containerapp create \
--name ca-ai-api \
--resource-group rg-ai-api \
--environment cae-ai-api \
--image youracr.azurecr.io/ai-api:latest \
--target-port 8080 \
--ingress external \
--system-assigned \
--env-vars \
"AzureOpenAI__Endpoint=https://your-resource.openai.azure.com/" \
--registry-server youracr.azurecr.io \
--registry-identity system
Notice that AzureOpenAI:ApiKey is intentionally absent from --env-vars. The app uses DefaultAzureCredential when no API key is configured — the managed identity handles authentication.
Grant the Managed Identity Access to Azure OpenAI
# Get the managed identity's principal ID
PRINCIPAL_ID=$(az containerapp show \
--name ca-ai-api \
--resource-group rg-ai-api \
--query "identity.principalId" \
--output tsv)
# Get the Azure OpenAI resource ID
OPENAI_RESOURCE_ID=$(az cognitiveservices account show \
--name your-openai-resource \
--resource-group rg-openai \
--query id \
--output tsv)
# Assign "Cognitive Services OpenAI User" role
az role assignment create \
--assignee $PRINCIPAL_ID \
--role "Cognitive Services OpenAI User" \
--scope $OPENAI_RESOURCE_ID
With this role assignment, the container app’s managed identity can call Azure OpenAI. No API keys are stored anywhere — not in environment variables, not in secrets, not in code.
Verify the deployment:
# Get the app URL
APP_URL=$(az containerapp show \
--name ca-ai-api \
--resource-group rg-ai-api \
--query "properties.configuration.ingress.fqdn" \
--output tsv)
# Test the chat endpoint
curl -X POST "https://$APP_URL/chat" \
-H "Content-Type: application/json" \
-d '{"message": "Hello from Azure Container Apps!"}' \
--no-buffer