What You Will Build
By the end of this workshop you will have a running Blazor Server chatbot that:
- Streams AI responses token-by-token to the browser using Semantic Kernel and Azure OpenAI
- Maintains isolated per-user conversation history across turns in a single browser session
- Calls a [KernelFunction] plugin automatically when the LLM decides it is relevant
- Validates input and surfaces errors gracefully without crashing the streaming loop
- Ships as a Docker container with managed identity for secret management
Step 1 — Project Setup
Create a new Blazor Server project and install the required NuGet packages:
dotnet new blazorserver -o AIChatbot
cd AIChatbot
dotnet add package Microsoft.SemanticKernel --version 1.54.0
dotnet add package Azure.AI.OpenAI --version 2.1.0
Microsoft.SemanticKernel brings in the Semantic Kernel core, the IChatCompletionService abstraction, and the Azure OpenAI connector. Azure.AI.OpenAI provides the underlying HTTP client and authentication support.
Open appsettings.json and add your Azure OpenAI credentials:
{
"AzureOpenAI": {
"Endpoint": "https://<your-resource>.openai.azure.com/",
"ApiKey": "<your-api-key>",
"DeploymentName": "gpt-4o"
},
"Logging": {
"LogLevel": {
"Default": "Information"
}
}
}
Never commit the API key to source control. You will replace this with Key Vault references in the deployment step.
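For local development, one option is the .NET user-secrets tooling, which keeps values in a per-user store outside the repository (a sketch; the key names mirror the appsettings.json structure, with ":" as the section separator):

```shell
# Initialize user secrets for this project (values live outside the repo)
dotnet user-secrets init

# Store the Azure OpenAI settings locally; builder.Configuration reads these
# automatically in the Development environment
dotnet user-secrets set "AzureOpenAI:Endpoint" "https://<your-resource>.openai.azure.com/"
dotnet user-secrets set "AzureOpenAI:ApiKey" "<your-api-key>"
```

With this in place, you can delete the ApiKey value from appsettings.json entirely during development.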
Step 2 — Dependency Injection Configuration
Open Program.cs and configure Semantic Kernel and the per-user chat history:
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using AIChatbot.Plugins;
var builder = WebApplication.CreateBuilder(args);
// Add Blazor services
builder.Services.AddRazorPages();
builder.Services.AddServerSideBlazor();
// Configure Semantic Kernel with Azure OpenAI
var endpoint = builder.Configuration["AzureOpenAI:Endpoint"]!;
var apiKey = builder.Configuration["AzureOpenAI:ApiKey"]!;
var deployment = builder.Configuration["AzureOpenAI:DeploymentName"]!;
builder.Services.AddKernel()
.AddAzureOpenAIChatCompletion(deployment, endpoint, apiKey);
// Register the weather plugin so it participates in function calling
builder.Services.AddScoped<WeatherPlugin>();
// Register ChatHistory as Scoped — each Blazor circuit gets its own instance
// This is the critical isolation point for per-user conversation state
builder.Services.AddScoped(_ =>
new ChatHistory("You are a helpful .NET assistant. Answer concisely and accurately."));
var app = builder.Build();
app.UseHttpsRedirection();
app.UseStaticFiles();
app.UseRouting();
app.MapBlazorHub();
app.MapFallbackToPage("/_Host");
app.Run();
AddKernel() returns an IKernelBuilder. Chaining .AddAzureOpenAIChatCompletion() registers the Azure OpenAI chat completion service as the IChatCompletionService implementation inside the kernel’s internal service provider.
ChatHistory is registered as Scoped. In Blazor Server, the DI Scoped lifetime maps to the SignalR circuit lifetime — one scope per browser tab connection. This means every user gets an isolated ChatHistory for the duration of their session without any manual session management.
Step 3 — Building the Chat UI Component
Create the chat component at Pages/Chat.razor:
@page "/chat"
@rendermode InteractiveServer
@attribute [StreamRendering(true)]
@inject Kernel Kernel
@inject ChatHistory ChatHistory
@inject WeatherPlugin WeatherPlugin
@using Microsoft.SemanticKernel
@using Microsoft.SemanticKernel.ChatCompletion
@using Microsoft.SemanticKernel.Connectors.OpenAI
@using System.Text
<PageTitle>AI Chatbot</PageTitle>
<div class="chat-container">
<div class="chat-messages" id="chatMessages">
@foreach (var message in _displayMessages)
{
<div class="message @(message.IsUser ? "user-message" : "assistant-message")">
<div class="message-bubble">
<span class="message-role">@(message.IsUser ? "You" : "Assistant")</span>
<p class="message-content">@message.Content</p>
</div>
</div>
}
@if (_isStreaming)
{
<div class="message assistant-message">
<div class="message-bubble">
<span class="message-role">Assistant</span>
<p class="message-content">@_streamingBuffer<span class="cursor">|</span></p>
</div>
</div>
}
@if (!string.IsNullOrEmpty(_errorMessage))
{
<div class="message error-message">
<div class="message-bubble error">
<p class="message-content">@_errorMessage</p>
</div>
</div>
}
</div>
<div class="chat-input-area">
<textarea
@bind="_userInput"
@bind:event="oninput"
@onkeydown="HandleKeyDown"
placeholder="Type a message..."
rows="2"
disabled="@_isStreaming"
class="chat-input"></textarea>
<button
@onclick="SendMessageAsync"
disabled="@(_isStreaming || string.IsNullOrWhiteSpace(_userInput))"
class="send-button">
@(_isStreaming ? "Thinking..." : "Send")
</button>
</div>
</div>
@code {
private record DisplayMessage(string Content, bool IsUser);
private readonly List<DisplayMessage> _displayMessages = [];
private string _userInput = "";
private string _streamingBuffer = "";
private bool _isStreaming;
private string _errorMessage = "";
private async Task HandleKeyDown(KeyboardEventArgs e)
{
if (e.Key == "Enter" && !e.ShiftKey && !_isStreaming)
{
await SendMessageAsync();
}
}
private async Task SendMessageAsync()
{
var userText = _userInput.Trim();
// Validate input
if (string.IsNullOrWhiteSpace(userText))
return;
if (userText.Length > 2000)
{
_errorMessage = "Message too long. Please keep messages under 2000 characters.";
await InvokeAsync(StateHasChanged);
return;
}
_errorMessage = "";
_userInput = "";
_isStreaming = true;
_streamingBuffer = "";
_displayMessages.Add(new DisplayMessage(userText, IsUser: true));
await InvokeAsync(StateHasChanged);
await StreamResponseAsync(userText);
}
}
The component uses two rendering mechanisms together. @rendermode InteractiveServer enables two-way Blazor interactivity over SignalR. [StreamRendering(true)] allows the server to stream incremental HTML updates during the initial render cycle, which means users see the loading state immediately rather than waiting for the full page.
The _displayMessages list holds completed messages that are rendered as static bubbles. The _streamingBuffer string holds the in-progress assistant response being built token by token.
Step 4 — Implementing the Streaming Loop
Add the StreamResponseAsync method to the @code block in Chat.razor:
private async Task StreamResponseAsync(string userText)
{
var responseBuffer = new StringBuilder();
try
{
// Add the user message to the tracked history
ChatHistory.AddUserMessage(userText);
// Configure execution settings — Auto enables function calling
var settings = new OpenAIPromptExecutionSettings
{
FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
};
// Get IChatCompletionService from the kernel's service provider
var chatCompletionService = Kernel.Services
.GetRequiredService<IChatCompletionService>();
// Register the weather plugin for this request
var kernelWithPlugin = Kernel.Clone();
kernelWithPlugin.Plugins.AddFromObject(WeatherPlugin, "WeatherPlugin");
// Stream the response token by token
await foreach (var chunk in chatCompletionService
.GetStreamingChatMessageContentsAsync(
ChatHistory,
settings,
kernelWithPlugin))
{
if (chunk.Content is not null)
{
responseBuffer.Append(chunk.Content);
_streamingBuffer = responseBuffer.ToString();
// Marshal the UI update to the Blazor render thread
await InvokeAsync(StateHasChanged);
}
}
// Streaming complete — move buffer to display messages
var fullResponse = responseBuffer.ToString();
ChatHistory.AddAssistantMessage(fullResponse);
_displayMessages.Add(new DisplayMessage(fullResponse, IsUser: false));
_streamingBuffer = "";
// Apply sliding window to prevent unbounded history growth
ApplySlidingWindow(maxMessages: 20);
}
catch (Exception ex) when (ex is not OperationCanceledException)
{
_errorMessage = $"Something went wrong: {ex.Message}. Please try again.";
// Surface the partial response in the UI if we have something useful
if (responseBuffer.Length > 0)
{
_displayMessages.Add(
new DisplayMessage(responseBuffer.ToString() + " [response truncated]", IsUser: false));
}
}
finally
{
_isStreaming = false;
await InvokeAsync(StateHasChanged);
}
}
private void ApplySlidingWindow(int maxMessages)
{
// ChatHistory[0] is always the system message — never remove it
int nonSystemCount = ChatHistory.Count - 1;
int excess = nonSystemCount - maxMessages;
if (excess > 0)
{
ChatHistory.RemoveRange(1, excess);
}
}
The critical line is await InvokeAsync(StateHasChanged). Blazor Server components have a synchronization context that is tied to the SignalR circuit dispatcher. The await foreach loop runs asynchronously — potentially on a thread pool thread — which is outside the component's render context. Calling StateHasChanged() directly from a background thread throws an InvalidOperationException. InvokeAsync marshals the call to the correct context.
For a deep understanding of how GetStreamingChatMessageContentsAsync and IAsyncEnumerable work under the hood, including backpressure and cancellation token handling, see Build a Streaming Chat API with Azure OpenAI and .NET.
Step 5 — Per-User Chat History Management
The ChatHistory injected into the component is already isolated per circuit because it is registered as Scoped. However, without truncation, history grows without bound and will eventually exhaust the model’s context window.
The ApplySlidingWindow call after each assistant response keeps the last 20 non-system messages. For a detailed comparison of sliding window, token-aware truncation, summarization, and hybrid strategies, see Semantic Kernel Chat History Management — Sliding Windows, Summarization, and Token-Aware Truncation.
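As an illustration of the token-aware variant, here is a minimal sketch over a simplified message model. The plain Message record and the 4-characters-per-token estimate are both assumptions for brevity; a real implementation would operate on ChatHistory and use an actual tokenizer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

record Message(string Role, string Content);

static class TokenAwareTruncation
{
    // Rough heuristic: ~4 characters per token for English text.
    // A production implementation would use a real tokenizer library.
    static int EstimateTokens(string text) => Math.Max(1, text.Length / 4);

    // Keeps the system message (index 0) plus the most recent messages
    // that fit within the token budget, preserving chronological order.
    public static List<Message> Truncate(List<Message> history, int tokenBudget)
    {
        var system = history[0];
        var kept = new List<Message>();
        int used = EstimateTokens(system.Content);

        // Walk backwards from the newest message, keeping whatever still fits
        foreach (var message in Enumerable.Reverse(history.Skip(1).ToList()))
        {
            int cost = EstimateTokens(message.Content);
            if (used + cost > tokenBudget) break;
            used += cost;
            kept.Add(message);
        }

        kept.Reverse(); // restore chronological order
        kept.Insert(0, system);
        return kept;
    }
}
```

The key difference from the message-count window above is that a single very long message consumes more of the budget than several short ones, which tracks the model's actual context limit more closely.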
For applications where users resume conversations across browser sessions (after closing the tab), you need external persistence. The pattern is to serialize ChatHistory to Redis on circuit close and deserialize it on reconnect:
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using System.Text.Json;
// In a ChatHistoryPersistenceService.cs
public class ChatHistoryPersistenceService(IDistributedCache cache)
{
private static readonly string SystemPrompt =
"You are a helpful .NET assistant. Answer concisely and accurately.";
public async Task<ChatHistory> LoadAsync(string userId, CancellationToken ct = default)
{
var bytes = await cache.GetAsync(userId, ct);
if (bytes is null || bytes.Length == 0)
return new ChatHistory(SystemPrompt);
var records = JsonSerializer.Deserialize<List<HistoryRecord>>(bytes) ?? [];
var history = new ChatHistory();
foreach (var record in records)
{
history.Add(new ChatMessageContent(
new AuthorRole(record.Role),
record.Content));
}
return history;
}
public async Task SaveAsync(string userId, ChatHistory history, CancellationToken ct = default)
{
var records = history
.Select(m => new HistoryRecord(m.Role.ToString(), m.Content ?? ""))
.ToList();
var bytes = JsonSerializer.SerializeToUtf8Bytes(records);
await cache.SetAsync(userId, bytes, new DistributedCacheEntryOptions
{
SlidingExpiration = TimeSpan.FromHours(24)
}, ct);
}
private record HistoryRecord(string Role, string Content);
}
Register in Program.cs (AddStackExchangeRedisCache requires the Microsoft.Extensions.Caching.StackExchangeRedis package):
builder.Services.AddStackExchangeRedisCache(options =>
{
options.Configuration = builder.Configuration["Redis:ConnectionString"];
options.InstanceName = "chatbot:";
});
builder.Services.AddScoped<ChatHistoryPersistenceService>();
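To actually hook SaveAsync to the circuit lifecycle, one option is a custom CircuitHandler. This is a sketch: ICurrentUserService is a hypothetical abstraction standing in for however your app resolves a stable user id (for example from authentication claims):

```csharp
using Microsoft.AspNetCore.Components.Server.Circuits;
using Microsoft.SemanticKernel.ChatCompletion;

// Hypothetical abstraction for resolving the current user's id;
// adapt this to your authentication setup.
public interface ICurrentUserService
{
    string UserId { get; }
}

// Saves the circuit's scoped ChatHistory when the SignalR circuit closes.
public class ChatPersistenceCircuitHandler(
    ChatHistory history,
    ChatHistoryPersistenceService persistence,
    ICurrentUserService currentUser) : CircuitHandler
{
    public override async Task OnCircuitClosedAsync(
        Circuit circuit, CancellationToken cancellationToken)
    {
        await persistence.SaveAsync(currentUser.UserId, history, cancellationToken);
    }
}
```

Register it as Scoped (builder.Services.AddScoped&lt;CircuitHandler, ChatPersistenceCircuitHandler&gt;();) so it resolves the same ChatHistory instance as the circuit's components. The matching LoadAsync call would go in the chat component's OnInitializedAsync.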
Step 6 — Function Calling with KernelFunction Plugins
Create a Plugins folder and add a WeatherPlugin.cs:
using System.ComponentModel;
using Microsoft.SemanticKernel;
namespace AIChatbot.Plugins;
/// <summary>
/// Provides current weather data for a given city.
/// In production, replace the stub with a real weather API call.
/// </summary>
public class WeatherPlugin
{
[KernelFunction("get_current_weather")]
[Description("Gets the current weather conditions for a specified city. " +
"Returns temperature in Celsius and a brief conditions summary.")]
public string GetCurrentWeather(
[Description("The city name to get weather for, e.g. 'London' or 'Seattle'")] string city)
{
// Stub implementation — replace with a real weather API in production
var conditions = city.ToLowerInvariant() switch
{
"london" => "12°C, overcast with light drizzle",
"seattle" => "9°C, partly cloudy",
"new york" => "18°C, sunny",
_ => "20°C, clear skies"
};
return $"Current weather in {city}: {conditions}";
}
}
The [KernelFunction] attribute marks the method as a tool the LLM can call. The [Description] attributes are sent to the model as part of the function schema — precise descriptions are critical for accurate function selection. The model cannot see your code; it only sees the descriptions.
In the streaming loop from Step 4, the plugin is added to a cloned kernel:
var kernelWithPlugin = Kernel.Clone();
kernelWithPlugin.Plugins.AddFromObject(WeatherPlugin, "WeatherPlugin");
Cloning the kernel ensures the plugin registration is scoped to this request, so repeated calls do not stack duplicate plugin registrations on the injected kernel instance. With FunctionChoiceBehavior.Auto(), the model will call get_current_weather automatically when the user asks about weather — no routing code required.
When the model calls a function, Semantic Kernel intercepts the tool call from the streaming response, invokes your C# method, feeds the result back to the model, and continues streaming the final response. This entire loop happens transparently inside GetStreamingChatMessageContentsAsync.
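Because the plugin is a plain C# method, you can also exercise it outside the chat loop. This sketch (assuming the cloned kernel from above) invokes the function directly by name, which is useful when debugging descriptions and argument binding:

```csharp
// Invoke the registered function directly, bypassing the LLM
var result = await kernelWithPlugin.InvokeAsync(
    "WeatherPlugin",
    "get_current_weather",
    new KernelArguments { ["city"] = "London" });

// With the stub above, prints something like:
// "Current weather in London: 12°C, overcast with light drizzle"
Console.WriteLine(result.GetValue<string>());
```

If direct invocation works but the model never calls the function, the problem is almost always the [Description] text rather than the code.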
Step 7 — Content Safety and Input Validation
The input validation in SendMessageAsync covers basic length checks. Expand this with server-side prompt injection resistance:
private static readonly string[] ForbiddenPatterns =
[
"ignore previous instructions",
"disregard your system prompt",
"you are now",
"act as if you are"
];
private bool IsInputSafe(string input)
{
var lower = input.ToLowerInvariant();
return !ForbiddenPatterns.Any(pattern => lower.Contains(pattern));
}
private async Task SendMessageAsync()
{
var userText = _userInput.Trim();
if (string.IsNullOrWhiteSpace(userText))
return;
if (userText.Length > 2000)
{
_errorMessage = "Message too long. Please keep messages under 2000 characters.";
await InvokeAsync(StateHasChanged);
return;
}
if (!IsInputSafe(userText))
{
_errorMessage = "Your message contains content that cannot be processed. Please rephrase.";
await InvokeAsync(StateHasChanged);
return;
}
// Continue with the streaming request...
}
Azure OpenAI also applies server-side content filtering. When a request triggers a content filter, the SDK throws an HttpOperationException with status 400. The catch block in StreamResponseAsync already handles this — the error message displays to the user without crashing the component.
For production deployments, configure Azure AI Content Safety as an additional filter layer and add Semantic Kernel filters (prompt filters and function invocation filters) to log and monitor what the model is receiving and returning. See Semantic Kernel Filters and Middleware in C# for implementation patterns.
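As a sketch of what such a filter might look like (the logger injection and log wording are illustrative, not a prescribed pattern):

```csharp
using Microsoft.Extensions.Logging;
using Microsoft.SemanticKernel;

// Logs every automatic function call the model makes, before and after invocation
public class LoggingFunctionFilter(ILogger<LoggingFunctionFilter> logger)
    : IFunctionInvocationFilter
{
    public async Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        logger.LogInformation("Invoking {Plugin}.{Function}",
            context.Function.PluginName, context.Function.Name);

        await next(context); // run the actual function (or short-circuit by not calling next)

        logger.LogInformation("Function result: {Result}",
            context.Result.GetValue<object>());
    }
}
```

Registering the filter in DI (builder.Services.AddSingleton&lt;IFunctionInvocationFilter, LoggingFunctionFilter&gt;();) makes the kernel built by AddKernel() pick it up automatically.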
Step 8 — Dockerfile and Azure Deployment
Blazor Server requires a running ASP.NET Core process for SignalR — it cannot be deployed as a static site. Create a multi-stage Dockerfile at the project root:
# Stage 1: Build
FROM mcr.microsoft.com/dotnet/sdk:9.0 AS build
WORKDIR /src
COPY ["AIChatbot.csproj", "./"]
RUN dotnet restore "AIChatbot.csproj"
COPY . .
RUN dotnet publish "AIChatbot.csproj" \
-c Release \
-o /app/publish \
--no-restore
# Stage 2: Runtime
FROM mcr.microsoft.com/dotnet/aspnet:9.0 AS final
WORKDIR /app
COPY --from=build /app/publish .
# Non-root user for container security
RUN adduser --disabled-password --gecos "" appuser
USER appuser
EXPOSE 8080
ENTRYPOINT ["dotnet", "AIChatbot.dll"]
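A minimal .dockerignore alongside the Dockerfile keeps build output and local secrets out of the image context (a suggested starting point, not an exhaustive list):

```
bin/
obj/
.git/
*.user
appsettings.Development.json
```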
Build and test the container locally:
docker build -t aichatbot:local .
docker run -p 8080:8080 \
-e AzureOpenAI__Endpoint="https://<your-resource>.openai.azure.com/" \
-e AzureOpenAI__ApiKey="<your-key>" \
-e AzureOpenAI__DeploymentName="gpt-4o" \
aichatbot:local
Managed Identity and Key Vault for Production
For production, replace the API key with managed identity authentication. Update Program.cs:
using Azure.Identity;
using Microsoft.SemanticKernel;
// Use DefaultAzureCredential — works with managed identity on Azure App Service
// and developer identity locally (via az login)
var credential = new DefaultAzureCredential();
builder.Services.AddKernel()
.AddAzureOpenAIChatCompletion(
deploymentName: builder.Configuration["AzureOpenAI:DeploymentName"]!,
endpoint: builder.Configuration["AzureOpenAI:Endpoint"]!,
credentials: credential);
In Azure App Service:
- Enable the system-assigned managed identity on the App Service resource
- Grant the managed identity the Cognitive Services OpenAI User role on the Azure OpenAI resource
- Remove the ApiKey from your configuration entirely
The DefaultAzureCredential from Azure.Identity automatically uses the managed identity token when running on Azure, and falls back to your local az login credentials during development — no code changes needed between environments.
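If some settings must remain secrets (for example a Redis connection string), a common pattern is to load Key Vault into IConfiguration with the same credential. This sketch assumes the Azure.Extensions.AspNetCore.Configuration.Secrets package; the vault URL is a placeholder:

```csharp
using Azure.Identity;

// Pull Key Vault secrets into IConfiguration; secret names such as
// "AzureOpenAI--Endpoint" map to the config key "AzureOpenAI:Endpoint"
builder.Configuration.AddAzureKeyVault(
    new Uri("https://<your-vault>.vault.azure.net/"),
    new DefaultAzureCredential());
```

Because Key Vault entries are merged into configuration before builder.Build(), the existing builder.Configuration["AzureOpenAI:..."] lookups in Program.cs keep working unchanged.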
Complete Component File
Here is the complete Pages/Chat.razor for reference:
@page "/chat"
@rendermode InteractiveServer
@attribute [StreamRendering(true)]
@inject Kernel Kernel
@inject ChatHistory ChatHistory
@inject WeatherPlugin WeatherPlugin
@using Microsoft.SemanticKernel
@using Microsoft.SemanticKernel.ChatCompletion
@using Microsoft.SemanticKernel.Connectors.OpenAI
@using System.Text
<PageTitle>AI Chatbot</PageTitle>
<div class="chat-container">
<div class="chat-messages">
@foreach (var message in _displayMessages)
{
<div class="message @(message.IsUser ? "user-message" : "assistant-message")">
<div class="message-bubble">
<span class="message-role">@(message.IsUser ? "You" : "Assistant")</span>
<p class="message-content">@message.Content</p>
</div>
</div>
}
@if (_isStreaming)
{
<div class="message assistant-message">
<div class="message-bubble">
<span class="message-role">Assistant</span>
<p class="message-content">@_streamingBuffer<span class="cursor">|</span></p>
</div>
</div>
}
@if (!string.IsNullOrEmpty(_errorMessage))
{
<div class="message error-message">
<div class="message-bubble error">
<p>@_errorMessage</p>
</div>
</div>
}
</div>
<div class="chat-input-area">
<textarea
@bind="_userInput"
@bind:event="oninput"
@onkeydown="HandleKeyDown"
placeholder="Type a message... (Enter to send, Shift+Enter for new line)"
rows="2"
disabled="@_isStreaming"
class="chat-input"></textarea>
<button
@onclick="SendMessageAsync"
disabled="@(_isStreaming || string.IsNullOrWhiteSpace(_userInput))"
class="send-button">
@(_isStreaming ? "Thinking..." : "Send")
</button>
</div>
</div>
@code {
private record DisplayMessage(string Content, bool IsUser);
private readonly List<DisplayMessage> _displayMessages = [];
private string _userInput = "";
private string _streamingBuffer = "";
private bool _isStreaming;
private string _errorMessage = "";
private static readonly string[] ForbiddenPatterns =
[
"ignore previous instructions",
"disregard your system prompt",
"you are now",
"act as if you are"
];
private async Task HandleKeyDown(KeyboardEventArgs e)
{
if (e.Key == "Enter" && !e.ShiftKey && !_isStreaming)
{
await SendMessageAsync();
}
}
private bool IsInputSafe(string input)
{
var lower = input.ToLowerInvariant();
return !ForbiddenPatterns.Any(pattern => lower.Contains(pattern));
}
private async Task SendMessageAsync()
{
var userText = _userInput.Trim();
if (string.IsNullOrWhiteSpace(userText))
return;
if (userText.Length > 2000)
{
_errorMessage = "Message too long. Please keep messages under 2000 characters.";
await InvokeAsync(StateHasChanged);
return;
}
if (!IsInputSafe(userText))
{
_errorMessage = "Your message contains content that cannot be processed. Please rephrase.";
await InvokeAsync(StateHasChanged);
return;
}
_errorMessage = "";
_userInput = "";
_isStreaming = true;
_streamingBuffer = "";
_displayMessages.Add(new DisplayMessage(userText, IsUser: true));
await InvokeAsync(StateHasChanged);
await StreamResponseAsync(userText);
}
private async Task StreamResponseAsync(string userText)
{
var responseBuffer = new StringBuilder();
try
{
ChatHistory.AddUserMessage(userText);
var settings = new OpenAIPromptExecutionSettings
{
FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
};
var chatCompletionService = Kernel.Services
.GetRequiredService<IChatCompletionService>();
var kernelWithPlugin = Kernel.Clone();
kernelWithPlugin.Plugins.AddFromObject(WeatherPlugin, "WeatherPlugin");
await foreach (var chunk in chatCompletionService
.GetStreamingChatMessageContentsAsync(
ChatHistory,
settings,
kernelWithPlugin))
{
if (chunk.Content is not null)
{
responseBuffer.Append(chunk.Content);
_streamingBuffer = responseBuffer.ToString();
await InvokeAsync(StateHasChanged);
}
}
var fullResponse = responseBuffer.ToString();
ChatHistory.AddAssistantMessage(fullResponse);
_displayMessages.Add(new DisplayMessage(fullResponse, IsUser: false));
_streamingBuffer = "";
// Apply sliding window to prevent context window exhaustion
int nonSystemCount = ChatHistory.Count - 1;
int excess = nonSystemCount - 20;
if (excess > 0)
{
ChatHistory.RemoveRange(1, excess);
}
}
catch (Exception ex) when (ex is not OperationCanceledException)
{
_errorMessage = $"Something went wrong: {ex.Message}. Please try again.";
if (responseBuffer.Length > 0)
{
_displayMessages.Add(
new DisplayMessage(responseBuffer.ToString() + " [truncated]", IsUser: false));
}
}
finally
{
_isStreaming = false;
await InvokeAsync(StateHasChanged);
}
}
}
What You Learned
This workshop covered the full path from an empty directory to a streaming Blazor Server AI chatbot. You configured Semantic Kernel with AddKernel().AddAzureOpenAIChatCompletion() and isolated per-user conversation state with a Scoped ChatHistory. You implemented token-by-token streaming using GetStreamingChatMessageContentsAsync and await InvokeAsync(StateHasChanged) for thread-safe UI updates. You added automatic function calling with a [KernelFunction] plugin and FunctionChoiceBehavior.Auto(). You applied sliding window truncation to prevent context window exhaustion. Finally, you containerized the app and configured managed identity for production secret management.