Why This Layer Exists
Before Microsoft.Extensions.AI, every AI provider in .NET had its own client library with its own API surface. Azure OpenAI used AzureOpenAIClient. OpenAI used OpenAIClient. Ollama used third-party libraries with completely different method signatures. Switching providers meant rewriting every call site.
Microsoft.Extensions.AI solves this the same way ILogger solved logging — by defining a common interface that all providers implement. You program against IChatClient, register the provider in DI, and swap implementations without changing your business code.
Where It Sits in the Stack
If you only need chat completion or embedding generation, Microsoft.Extensions.AI is all you need. SK and Agent Framework are additional layers that add orchestration and agent capabilities on top.
The Two Core Interfaces
IChatClient
The interface for text generation — chat, completion, reasoning:
public interface IChatClient : IDisposable
{
    Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default);

    IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default);

    object? GetService(Type serviceType, object? serviceKey = null);
}
Two response methods, buffered and streaming, plus GetService for reaching provider-specific services such as metadata. That, together with IDisposable, is the entire contract.
IEmbeddingGenerator
The interface for vector embeddings:
public interface IEmbeddingGenerator<TInput, TEmbedding> : IDisposable
    where TEmbedding : Embedding
{
    Task<GeneratedEmbeddings<TEmbedding>> GenerateAsync(
        IEnumerable<TInput> values,
        EmbeddingGenerationOptions? options = null,
        CancellationToken cancellationToken = default);
}
Turns text (or other inputs) into vectors. Used by RAG systems, semantic search, and memory stores.
Getting Started
Install the package:
dotnet add package Microsoft.Extensions.AI
For Azure OpenAI, add the provider library and the OpenAI adapter package (which supplies the AsIChatClient extension):
dotnet add package Azure.AI.OpenAI
dotnet add package Microsoft.Extensions.AI.OpenAI
Basic Chat Completion
using Azure.AI.OpenAI;
using Microsoft.Extensions.AI;

IChatClient client = new AzureOpenAIClient(
        new Uri(Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!),
        new System.ClientModel.ApiKeyCredential(
            Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!))
    .GetChatClient("chat-deployment")
    .AsIChatClient();

var response = await client.GetResponseAsync("What is dependency injection?");
Console.WriteLine(response.Text);
The key method is .AsIChatClient() — it wraps the provider-specific client in the IChatClient interface.
Using Dependency Injection
The real power shows in DI-based applications:
using Azure.AI.OpenAI;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

// Register the AI client — change this one line to switch providers
builder.Services.AddChatClient(services =>
    new AzureOpenAIClient(
            new Uri(builder.Configuration["AzureOpenAI:Endpoint"]!),
            new System.ClientModel.ApiKeyCredential(
                builder.Configuration["AzureOpenAI:Key"]!))
        .GetChatClient("chat-deployment")
        .AsIChatClient());

builder.Services.AddTransient<MyAiService>();

var app = builder.Build();
var service = app.Services.GetRequiredService<MyAiService>();
await service.RunAsync();
Your service depends only on IChatClient:
public class MyAiService
{
    private readonly IChatClient _chatClient;

    public MyAiService(IChatClient chatClient)
    {
        _chatClient = chatClient;
    }

    public async Task RunAsync()
    {
        var response = await _chatClient.GetResponseAsync(
            "Explain the strategy pattern in 3 sentences.");
        Console.WriteLine(response.Text);
    }
}
Switching to Ollama
To swap Azure OpenAI for a local Ollama instance, change only the registration:
dotnet add package Microsoft.Extensions.AI.Ollama
// Before (Azure OpenAI)
builder.Services.AddChatClient(services =>
    new AzureOpenAIClient(endpoint, key)
        .GetChatClient("chat-deployment")
        .AsIChatClient());

// After (Ollama - local)
builder.Services.AddChatClient(services =>
    new OllamaChatClient(new Uri("http://localhost:11434"), "llama3"));
MyAiService doesn’t change at all. It still receives an IChatClient and calls GetResponseAsync on it.
Middleware Pipelines
The killer feature of Microsoft.Extensions.AI is composable middleware. You can wrap any IChatClient with cross-cutting concerns:
Logging
builder.Services.AddChatClient(services =>
    new AzureOpenAIClient(endpoint, key)
        .GetChatClient("chat-deployment")
        .AsIChatClient()
        .AsBuilder()
        .UseLogging()
        .Build(services));
Every request and response gets logged through ILogger.
Caching
builder.Services.AddDistributedMemoryCache();

builder.Services.AddChatClient(services =>
    new AzureOpenAIClient(endpoint, key)
        .GetChatClient("chat-deployment")
        .AsIChatClient()
        .AsBuilder()
        .UseDistributedCache()
        .Build(services));
Identical prompts return cached responses, saving tokens and reducing latency.
Stacking Middleware
Middleware composes, and registration order matters: the first middleware added becomes the outermost wrapper, so it sees every request before the layers inside it:
builder.Services.AddChatClient(services =>
    new AzureOpenAIClient(endpoint, key)
        .GetChatClient("chat-deployment")
        .AsIChatClient()
        .AsBuilder()
        .UseLogging()           // Log all requests
        .UseDistributedCache()  // Cache after logging
        .UseOpenTelemetry()     // Trace all calls
        .Build(services));
This gives you observability (logging + telemetry) and performance (caching) across every AI call in your application, regardless of which service makes the call. You can also add resilience middleware to handle 429 rate limit errors from Azure OpenAI — see Fix Azure OpenAI 429 Too Many Requests in .NET for the Polly-based patterns that integrate cleanly with this pipeline.
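These Use* helpers are decorators under the hood: each one wraps an inner IChatClient and delegates inward. The following is a minimal sketch of that pattern using a simplified stand-in interface and a fake provider (these are illustrative types, not the real Microsoft.Extensions.AI ones), so it runs without any AI service:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Simplified stand-in for IChatClient; illustration only.
public interface ISimpleChatClient
{
    Task<string> GetResponseAsync(string prompt);
}

// Innermost "provider": returns a canned response.
public sealed class FakeProviderClient : ISimpleChatClient
{
    public Task<string> GetResponseAsync(string prompt) =>
        Task.FromResult($"echo: {prompt}");
}

// Logging decorator: records the prompt, then delegates inward.
public sealed class LoggingClient : ISimpleChatClient
{
    private readonly ISimpleChatClient _inner;
    public List<string> Log { get; } = new();

    public LoggingClient(ISimpleChatClient inner) => _inner = inner;

    public async Task<string> GetResponseAsync(string prompt)
    {
        Log.Add(prompt);
        return await _inner.GetResponseAsync(prompt);
    }
}

// Caching decorator: identical prompts skip the inner client.
public sealed class CachingClient : ISimpleChatClient
{
    private readonly ISimpleChatClient _inner;
    private readonly Dictionary<string, string> _cache = new();

    public CachingClient(ISimpleChatClient inner) => _inner = inner;

    public async Task<string> GetResponseAsync(string prompt)
    {
        if (_cache.TryGetValue(prompt, out var cached)) return cached;
        var response = await _inner.GetResponseAsync(prompt);
        _cache[prompt] = response;
        return response;
    }
}
```

Wiring new LoggingClient(new CachingClient(new FakeProviderClient())) mirrors the UseLogging().UseDistributedCache() order above: a repeated prompt never reaches the provider, but the logging layer still sees it.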
Streaming
Every IChatClient supports streaming out of the box:
await foreach (var update in _chatClient.GetStreamingResponseAsync(
    "Explain SOLID principles"))
{
    Console.Write(update.Text);
}
This works with Azure OpenAI, OpenAI, Ollama — any provider. The streaming contract is part of the interface.
Embedding Generation
For RAG and semantic search, use IEmbeddingGenerator:
using Azure.AI.OpenAI;
using Microsoft.Extensions.AI;

IEmbeddingGenerator<string, Embedding<float>> embeddingGenerator =
    new AzureOpenAIClient(endpoint, key)
        .GetEmbeddingClient("text-embedding-3-small")
        .AsIEmbeddingGenerator();

var embeddings = await embeddingGenerator.GenerateAsync(
    ["What is dependency injection?", "Explain the repository pattern"]);

foreach (var embedding in embeddings)
{
    Console.WriteLine($"Vector dimension: {embedding.Vector.Length}");
}
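The vectors are only useful once compared. Most vector stores rank results by cosine similarity; here is a minimal sketch of that computation (in real use you would pass embedding.Vector.Span values produced by GenerateAsync rather than hand-made vectors):

```csharp
using System;

public static class CosineSimilarity
{
    // Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
    // Semantic search ranks stored chunks by this score against
    // the query embedding.
    public static double Compute(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, magA = 0, magB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            magA += a[i] * a[i];
            magB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
    }
}
```

Identical vectors score 1.0, orthogonal ones 0.0; a RAG pipeline embeds the query, computes this score against each stored chunk, and keeps the top matches.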
When to Use This vs Semantic Kernel
The decision tree is straightforward:
Use Microsoft.Extensions.AI directly when:
- You need chat completion in a service
- You need embeddings for vector search
- You want provider-agnostic code with DI
- You don’t need function calling, plugins, or memory
- You want the lightest possible dependency
Use Semantic Kernel when:
- You need automatic function calling (tool use)
- You need plugins exposing C# code to the LLM
- You need memory (vector stores, chat history management)
- You need prompt templates with variable substitution
- You’re building features that chain multiple AI operations
SK uses IChatClient internally. Registering SK in your DI still gives you provider-agnostic code — SK just adds the orchestration layer.
Integration with Semantic Kernel
SK consumes IChatClient implementations. When you configure SK with AddAzureOpenAIChatCompletion, it registers an IChatClient under the hood:
// These are equivalent:
// 1. Direct Extensions.AI registration
builder.Services.AddChatClient(services => azureClient.AsIChatClient());

// 2. SK registration (also registers IChatClient)
builder.Services.AddKernel()
    .AddAzureOpenAIChatCompletion("chat-deployment", endpoint, key);
This means services that only need IChatClient can coexist with services that need the full Kernel — both share the same underlying AI connection.
Next Steps
- What is Semantic Kernel? — The orchestration layer built on these interfaces
- Microsoft.Extensions.AI 1.0.3 Released — Release notes and what’s new
- Semantic Kernel Memory and Vector Stores — Using IEmbeddingGenerator for RAG
- University: Microsoft.Extensions.AI vs Semantic Kernel vs Agent Framework — Decision guide for choosing the right layer