
Build a Streaming AI Chatbot with Blazor and Semantic Kernel

Intermediate · .NET 9 · Microsoft.SemanticKernel 1.54.0 · Azure.AI.OpenAI 2.1.0
By Rajesh Mishra · Mar 21, 2026 · 18 min read
Verified Mar 2026 .NET 9 Microsoft.SemanticKernel 1.54.0
In 30 Seconds

This workshop builds a production-ready streaming AI chatbot using Blazor Server, Semantic Kernel 1.54.0, and Azure OpenAI. It covers project setup, DI configuration with AddKernel and AddAzureOpenAIChatCompletion, streaming token output with GetStreamingChatMessageContentsAsync and InvokeAsync(StateHasChanged), per-user ChatHistory as a Scoped service with sliding window truncation, function calling via KernelFunction plugins, content safety validation, and containerized deployment to Azure App Service with managed identity.


What You Will Build

By the end of this workshop you will have a running Blazor Server chatbot that:

  • Streams AI responses token-by-token to the browser using Semantic Kernel and Azure OpenAI
  • Maintains isolated per-user conversation history across turns in a single browser session
  • Calls a [KernelFunction] plugin automatically when the LLM decides it is relevant
  • Validates input and surfaces errors gracefully without crashing the streaming loop
  • Ships as a Docker container with managed identity secrets management

The architecture looks like this:

Browser ── user message ──> Blazor Server
Blazor Server ── GetStreamingChatMessageContentsAsync ──> Semantic Kernel
Semantic Kernel ── stream request ──> Azure OpenAI
Azure OpenAI ── token chunks ──> Semantic Kernel ── IAsyncEnumerable ──> Blazor Server
Blazor Server ── incremental token updates over SignalR ──> Browser

Step 1 — Project Setup

Create a new Blazor Server project and install the required NuGet packages:

dotnet new blazor -o AIChatbot --interactivity Server
cd AIChatbot
dotnet add package Microsoft.SemanticKernel --version 1.54.0
dotnet add package Azure.AI.OpenAI --version 2.1.0

Microsoft.SemanticKernel brings in the Semantic Kernel core, the IChatCompletionService abstraction, and the Azure OpenAI connector. Azure.AI.OpenAI provides the underlying HTTP client and authentication support.

Open appsettings.json and add your Azure OpenAI credentials:

{
  "AzureOpenAI": {
    "Endpoint": "https://<your-resource>.openai.azure.com/",
    "ApiKey": "<your-api-key>",
    "DeploymentName": "gpt-4o"
  },
  "Logging": {
    "LogLevel": {
      "Default": "Information"
    }
  }
}

Never commit the API key to source control. You will replace this with Key Vault references in the deployment step.

Step 2 — Dependency Injection Configuration

Open Program.cs and configure Semantic Kernel and the per-user chat history:

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using AIChatbot.Components;
using AIChatbot.Plugins;

var builder = WebApplication.CreateBuilder(args);

// Add Blazor services (Razor Components with the interactive server render mode,
// which the @rendermode InteractiveServer component below requires)
builder.Services.AddRazorComponents()
    .AddInteractiveServerComponents();

// Configure Semantic Kernel with Azure OpenAI
var endpoint = builder.Configuration["AzureOpenAI:Endpoint"]!;
var apiKey = builder.Configuration["AzureOpenAI:ApiKey"]!;
var deployment = builder.Configuration["AzureOpenAI:DeploymentName"]!;

builder.Services.AddKernel()
    .AddAzureOpenAIChatCompletion(deployment, endpoint, apiKey);

// Register the weather plugin so it participates in function calling
builder.Services.AddScoped<WeatherPlugin>();

// Register ChatHistory as Scoped — each Blazor circuit gets its own instance
// This is the critical isolation point for per-user conversation state
builder.Services.AddScoped(_ =>
    new ChatHistory("You are a helpful .NET assistant. Answer concisely and accurately."));

var app = builder.Build();

app.UseHttpsRedirection();
app.UseStaticFiles();
app.UseRouting();
app.UseAntiforgery();

app.MapRazorComponents<App>()
    .AddInteractiveServerRenderMode();

app.Run();

AddKernel() returns an IKernelBuilder. Chaining .AddAzureOpenAIChatCompletion() registers the Azure OpenAI chat completion service as the IChatCompletionService implementation inside the kernel’s internal service provider.

ChatHistory is registered as Scoped. In Blazor Server, the DI Scoped lifetime maps to the SignalR circuit lifetime — one scope per browser tab connection. This means every user gets an isolated ChatHistory for the duration of their session without any manual session management.
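As a mental model only, the circuit-to-scope mapping behaves like a dictionary of histories keyed by circuit id. The sketch below is plain illustrative C# (CircuitScopes and FakeHistory are made-up names, not Blazor or Semantic Kernel APIs); real Blazor does this through the DI scope it attaches to each circuit:

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for ChatHistory — illustration only
class FakeHistory
{
    public List<string> Messages { get; } = new();
}

class CircuitScopes
{
    private readonly Dictionary<string, FakeHistory> _scopes = new();

    // Each circuit id resolves to its own history instance,
    // mirroring how a Scoped service behaves per Blazor circuit
    public FakeHistory Resolve(string circuitId) =>
        _scopes.TryGetValue(circuitId, out var h)
            ? h
            : _scopes[circuitId] = new FakeHistory();
}
```

Two tabs (two circuits) therefore never see each other's messages, which is exactly the isolation the Scoped registration buys you for free.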

Step 3 — Building the Chat UI Component

Create the chat component at Components/Pages/Chat.razor:

@page "/chat"
@rendermode InteractiveServer
@attribute [StreamRendering(true)]
@inject Kernel Kernel
@inject ChatHistory ChatHistory
@inject WeatherPlugin WeatherPlugin
@using Microsoft.SemanticKernel
@using Microsoft.SemanticKernel.ChatCompletion
@using Microsoft.SemanticKernel.Connectors.OpenAI
@using System.Text

<PageTitle>AI Chatbot</PageTitle>

<div class="chat-container">
    <div class="chat-messages" id="chatMessages">
        @foreach (var message in _displayMessages)
        {
            <div class="message @(message.IsUser ? "user-message" : "assistant-message")">
                <div class="message-bubble">
                    <span class="message-role">@(message.IsUser ? "You" : "Assistant")</span>
                    <p class="message-content">@message.Content</p>
                </div>
            </div>
        }

        @if (_isStreaming)
        {
            <div class="message assistant-message">
                <div class="message-bubble">
                    <span class="message-role">Assistant</span>
                    <p class="message-content">@_streamingBuffer<span class="cursor">|</span></p>
                </div>
            </div>
        }

        @if (!string.IsNullOrEmpty(_errorMessage))
        {
            <div class="message error-message">
                <div class="message-bubble error">
                    <p class="message-content">@_errorMessage</p>
                </div>
            </div>
        }
    </div>

    <div class="chat-input-area">
        <textarea
            @bind="_userInput"
            @bind:event="oninput"
            @onkeydown="HandleKeyDown"
            placeholder="Type a message..."
            rows="2"
            disabled="@_isStreaming"
            class="chat-input"></textarea>
        <button
            @onclick="SendMessageAsync"
            disabled="@(_isStreaming || string.IsNullOrWhiteSpace(_userInput))"
            class="send-button">
            @(_isStreaming ? "Thinking..." : "Send")
        </button>
    </div>
</div>

@code {
    private record DisplayMessage(string Content, bool IsUser);

    private readonly List<DisplayMessage> _displayMessages = [];
    private string _userInput = "";
    private string _streamingBuffer = "";
    private bool _isStreaming;
    private string _errorMessage = "";

    private async Task HandleKeyDown(KeyboardEventArgs e)
    {
        if (e.Key == "Enter" && !e.ShiftKey && !_isStreaming)
        {
            await SendMessageAsync();
        }
    }

    private async Task SendMessageAsync()
    {
        var userText = _userInput.Trim();

        // Validate input
        if (string.IsNullOrWhiteSpace(userText))
            return;

        if (userText.Length > 2000)
        {
            _errorMessage = "Message too long. Please keep messages under 2000 characters.";
            await InvokeAsync(StateHasChanged);
            return;
        }

        _errorMessage = "";
        _userInput = "";
        _isStreaming = true;
        _streamingBuffer = "";

        _displayMessages.Add(new DisplayMessage(userText, IsUser: true));
        await InvokeAsync(StateHasChanged);

        await StreamResponseAsync(userText);
    }
}

The component uses two rendering mechanisms together. @rendermode InteractiveServer enables two-way Blazor interactivity over SignalR. [StreamRendering(true)] allows the server to stream incremental HTML updates during the initial render cycle, which means users see the loading state immediately rather than waiting for the full page.

The _displayMessages list holds completed messages that are rendered as static bubbles. The _streamingBuffer string holds the in-progress assistant response being built token by token.
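The accumulation pattern can be exercised outside Blazor in plain C#: an IAsyncEnumerable&lt;string&gt; stands in for the model's chunk stream (FakeTokenStream and CollectSnapshotsAsync are illustrative stubs, not Semantic Kernel APIs):

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

static class StreamingDemo
{
    // Stand-in for GetStreamingChatMessageContentsAsync: yields chunks one at a time
    static async IAsyncEnumerable<string> FakeTokenStream(params string[] chunks)
    {
        foreach (var chunk in chunks)
        {
            await Task.Yield(); // simulate asynchronous arrival
            yield return chunk;
        }
    }

    // Mirrors the component: append each chunk, record the running buffer
    public static async Task<List<string>> CollectSnapshotsAsync()
    {
        var buffer = new StringBuilder();
        var snapshots = new List<string>();

        await foreach (var chunk in FakeTokenStream("Hel", "lo ", "world"))
        {
            buffer.Append(chunk);
            snapshots.Add(buffer.ToString()); // in Blazor: set _streamingBuffer + InvokeAsync(StateHasChanged)
        }
        return snapshots;
    }
}
```

Each snapshot corresponds to one StateHasChanged push in the real component.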

Step 4 — Implementing the Streaming Loop

Add the StreamResponseAsync method to the @code block in Chat.razor:

private async Task StreamResponseAsync(string userText)
{
    var responseBuffer = new StringBuilder();

    try
    {
        // Add the user message to the tracked history
        ChatHistory.AddUserMessage(userText);

        // Configure execution settings — Auto enables function calling
        var settings = new OpenAIPromptExecutionSettings
        {
            FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
        };

        // Get IChatCompletionService from the kernel's service provider
        var chatCompletionService = Kernel.Services
            .GetRequiredService<IChatCompletionService>();

        // Register the weather plugin for this request
        var kernelWithPlugin = Kernel.Clone();
        kernelWithPlugin.Plugins.AddFromObject(WeatherPlugin, "WeatherPlugin");

        // Stream the response token by token
        await foreach (var chunk in chatCompletionService
            .GetStreamingChatMessageContentsAsync(
                ChatHistory,
                settings,
                kernelWithPlugin))
        {
            if (chunk.Content is not null)
            {
                responseBuffer.Append(chunk.Content);
                _streamingBuffer = responseBuffer.ToString();

                // Marshal the UI update to the Blazor render thread
                await InvokeAsync(StateHasChanged);
            }
        }

        // Streaming complete — move buffer to display messages
        var fullResponse = responseBuffer.ToString();
        ChatHistory.AddAssistantMessage(fullResponse);

        _displayMessages.Add(new DisplayMessage(fullResponse, IsUser: false));
        _streamingBuffer = "";

        // Apply sliding window to prevent unbounded history growth
        ApplySlidingWindow(maxMessages: 20);
    }
    catch (Exception ex) when (ex is not OperationCanceledException)
    {
        _errorMessage = $"Something went wrong: {ex.Message}. Please try again.";

        // Surface the partial response in the UI if we have something useful
        if (responseBuffer.Length > 0)
        {
            _displayMessages.Add(
                new DisplayMessage(responseBuffer.ToString() + " [response truncated]", IsUser: false));
        }
    }
    finally
    {
        _isStreaming = false;
        await InvokeAsync(StateHasChanged);
    }
}

private void ApplySlidingWindow(int maxMessages)
{
    // ChatHistory[0] is always the system message — never remove it
    int nonSystemCount = ChatHistory.Count - 1;
    int excess = nonSystemCount - maxMessages;

    if (excess > 0)
    {
        ChatHistory.RemoveRange(1, excess);
    }
}

The critical line is await InvokeAsync(StateHasChanged). Blazor Server components have a synchronization context tied to the circuit's dispatcher. The await foreach loop can resume on thread pool threads outside that context, and calling StateHasChanged() directly from such a thread throws an InvalidOperationException because the call is not on the dispatcher. InvokeAsync marshals the call back onto the correct context so each render is thread-safe.

For a deep understanding of how GetStreamingChatMessageContentsAsync and IAsyncEnumerable work under the hood, including backpressure and cancellation token handling, see Build a Streaming Chat API with Azure OpenAI and .NET.

Step 5 — Per-User Chat History Management

The ChatHistory injected into the component is already isolated per circuit because it is registered as Scoped. However, without truncation, history grows without bound and will eventually exhaust the model’s context window.

The ApplySlidingWindow call after each assistant response keeps the last 20 non-system messages. For a detailed comparison of sliding window, token-aware truncation, summarization, and hybrid strategies, see Semantic Kernel Chat History Management — Sliding Windows, Summarization, and Token-Aware Truncation.
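For contrast, a token-aware window can be sketched in a few lines. This is a simplification: it estimates tokens as roughly one per four characters of English text and operates on plain role/content pairs rather than Semantic Kernel's ChatHistory type:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TokenWindow
{
    // Crude estimate: ~4 characters per token for English text
    static int EstimateTokens(string text) => Math.Max(1, text.Length / 4);

    // Index 0 is treated as the system message and is never removed
    public static void Truncate(List<(string Role, string Content)> history, int tokenBudget)
    {
        while (history.Count > 1 &&
               history.Skip(1).Sum(m => EstimateTokens(m.Content)) > tokenBudget)
        {
            history.RemoveAt(1); // drop the oldest non-system message
        }
    }
}
```

A real implementation would use the model's actual tokenizer, but the shape of the loop is the same: evict oldest-first until the budget is met, never touching the system prompt.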

For applications where users resume conversations across browser sessions (after closing the tab), you need external persistence. The pattern is to serialize ChatHistory to Redis on circuit close and deserialize it on reconnect:

using Microsoft.Extensions.Caching.Distributed;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using System.Text.Json;

// In a ChatHistoryPersistenceService.cs
public class ChatHistoryPersistenceService(IDistributedCache cache)
{
    private static readonly string SystemPrompt =
        "You are a helpful .NET assistant. Answer concisely and accurately.";

    public async Task<ChatHistory> LoadAsync(string userId, CancellationToken ct = default)
    {
        var bytes = await cache.GetAsync(userId, ct);

        if (bytes is null || bytes.Length == 0)
            return new ChatHistory(SystemPrompt);

        var records = JsonSerializer.Deserialize<List<HistoryRecord>>(bytes) ?? [];
        var history = new ChatHistory();

        foreach (var record in records)
        {
            history.Add(new ChatMessageContent(
                new AuthorRole(record.Role),
                record.Content));
        }

        return history;
    }

    public async Task SaveAsync(string userId, ChatHistory history, CancellationToken ct = default)
    {
        var records = history
            .Select(m => new HistoryRecord(m.Role.ToString(), m.Content ?? ""))
            .ToList();

        var bytes = JsonSerializer.SerializeToUtf8Bytes(records);

        await cache.SetAsync(userId, bytes, new DistributedCacheEntryOptions
        {
            SlidingExpiration = TimeSpan.FromHours(24)
        }, ct);
    }

    private record HistoryRecord(string Role, string Content);
}
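The serialization round trip can be verified without Redis by standing in a plain dictionary for IDistributedCache. Everything below is illustrative (HistoryRoundTrip is not part of the service above), but the JSON shape matches the one the service writes:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

record HistoryRecord(string Role, string Content);

static class HistoryRoundTrip
{
    // Stand-in for IDistributedCache: userId -> serialized history bytes
    static readonly Dictionary<string, byte[]> Cache = new();

    public static void Save(string userId, List<HistoryRecord> records) =>
        Cache[userId] = JsonSerializer.SerializeToUtf8Bytes(records);

    public static List<HistoryRecord> Load(string userId) =>
        Cache.TryGetValue(userId, out var bytes)
            ? JsonSerializer.Deserialize<List<HistoryRecord>>(bytes) ?? new()
            : new();
}
```

Because HistoryRecord is a record, round-tripped entries compare equal by value, which makes this easy to unit test before wiring in the real cache.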

Register in Program.cs (the Redis cache provider lives in the Microsoft.Extensions.Caching.StackExchangeRedis package):

builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = builder.Configuration["Redis:ConnectionString"];
    options.InstanceName = "chatbot:";
});
builder.Services.AddScoped<ChatHistoryPersistenceService>();

Step 6 — Function Calling with KernelFunction Plugins

Create a Plugins folder and add a WeatherPlugin.cs:

using System.ComponentModel;
using Microsoft.SemanticKernel;

namespace AIChatbot.Plugins;

/// <summary>
/// Provides current weather data for a given city.
/// In production, replace the stub with a real weather API call.
/// </summary>
public class WeatherPlugin
{
    [KernelFunction("get_current_weather")]
    [Description("Gets the current weather conditions for a specified city. " +
                 "Returns temperature in Celsius and a brief conditions summary.")]
    public string GetCurrentWeather(
        [Description("The city name to get weather for, e.g. 'London' or 'Seattle'")] string city)
    {
        // Stub implementation — replace with a real weather API in production
        var conditions = city.ToLowerInvariant() switch
        {
            "london" => "12°C, overcast with light drizzle",
            "seattle" => "9°C, partly cloudy",
            "new york" => "18°C, sunny",
            _ => "20°C, clear skies"
        };

        return $"Current weather in {city}: {conditions}";
    }
}

The [KernelFunction] attribute marks the method as a tool the LLM can call. The [Description] attributes are sent to the model as part of the function schema — precise descriptions are critical for accurate function selection. The model cannot see your code; it only sees the descriptions.

In the streaming loop from Step 4, the plugin is added to a cloned kernel:

var kernelWithPlugin = Kernel.Clone();
kernelWithPlugin.Plugins.AddFromObject(WeatherPlugin, "WeatherPlugin");

Cloning the kernel keeps the plugin registration request-scoped instead of mutating the plugin collection of the injected kernel. With FunctionChoiceBehavior.Auto(), the model calls get_current_weather automatically when the user asks about weather, with no routing code required.

When the model calls a function, Semantic Kernel intercepts the tool call from the streaming response, invokes your C# method, feeds the result back to the model, and continues streaming the final response. This entire loop happens transparently inside GetStreamingChatMessageContentsAsync.
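The shape of that hidden loop can be sketched in plain C#. FakeModel is a stub that first requests a tool and then answers once a result is present; none of this is Semantic Kernel's actual wire protocol, only the control flow it runs for you:

```csharp
using System;
using System.Collections.Generic;

static class ToolLoopDemo
{
    // Local function registry, analogous to [KernelFunction] methods
    static readonly Dictionary<string, Func<string, string>> Tools = new()
    {
        ["get_current_weather"] = city => $"Current weather in {city}: 12°C, overcast"
    };

    // Stub model: asks for a tool on the first turn, answers once it has the result
    static string FakeModel(List<string> transcript) =>
        transcript.Exists(line => line.StartsWith("tool:"))
            ? "It is 12°C and overcast in London."
            : "call:get_current_weather:London";

    public static string Run(string userMessage)
    {
        var transcript = new List<string> { $"user:{userMessage}" };

        while (true)
        {
            var reply = FakeModel(transcript);
            if (!reply.StartsWith("call:"))
                return reply; // final answer, streamed to the user in the real system

            // Parse "call:<name>:<arg>", invoke the tool, append the result
            var parts = reply.Split(':', 3);
            transcript.Add($"tool:{Tools[parts[1]](parts[2])}");
        }
    }
}
```

Semantic Kernel runs this invoke-and-resume cycle internally, possibly several times per turn, before the final text chunks reach your await foreach loop.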

Step 7 — Content Safety and Input Validation

The input validation in SendMessageAsync covers basic length checks. Expand this with server-side prompt injection resistance:

private static readonly string[] ForbiddenPatterns =
[
    "ignore previous instructions",
    "disregard your system prompt",
    "you are now",
    "act as if you are"
];

private bool IsInputSafe(string input)
{
    var lower = input.ToLowerInvariant();
    return !ForbiddenPatterns.Any(pattern => lower.Contains(pattern));
}

private async Task SendMessageAsync()
{
    var userText = _userInput.Trim();

    if (string.IsNullOrWhiteSpace(userText))
        return;

    if (userText.Length > 2000)
    {
        _errorMessage = "Message too long. Please keep messages under 2000 characters.";
        await InvokeAsync(StateHasChanged);
        return;
    }

    if (!IsInputSafe(userText))
    {
        _errorMessage = "Your message contains content that cannot be processed. Please rephrase.";
        await InvokeAsync(StateHasChanged);
        return;
    }

    // Continue with the streaming request...
}

Azure OpenAI also applies server-side content filtering. When a request triggers a content filter, the SDK throws an HttpOperationException with status 400. The catch block in StreamResponseAsync already handles this — the error message displays to the user without crashing the component.

For production deployments, configure Azure AI Content Safety as an additional filter layer and add Semantic Kernel filters (prompt filters and function invocation filters) to log and monitor what the model is receiving and returning. See Semantic Kernel Filters and Middleware in C# for implementation patterns.

Step 8 — Dockerfile and Azure Deployment

Blazor Server requires a running ASP.NET Core process for SignalR — it cannot be deployed as a static site. Create a multi-stage Dockerfile at the project root:

# Stage 1: Build
FROM mcr.microsoft.com/dotnet/sdk:9.0 AS build
WORKDIR /src

COPY ["AIChatbot.csproj", "./"]
RUN dotnet restore "AIChatbot.csproj"

COPY . .
RUN dotnet publish "AIChatbot.csproj" \
    -c Release \
    -o /app/publish \
    --no-restore

# Stage 2: Runtime
FROM mcr.microsoft.com/dotnet/aspnet:9.0 AS final
WORKDIR /app

COPY --from=build /app/publish .

# Non-root user for container security
RUN adduser --disabled-password --gecos "" appuser
USER appuser

EXPOSE 8080
ENTRYPOINT ["dotnet", "AIChatbot.dll"]

Build and test the container locally:

docker build -t aichatbot:local .
docker run -p 8080:8080 \
  -e AzureOpenAI__Endpoint="https://<your-resource>.openai.azure.com/" \
  -e AzureOpenAI__ApiKey="<your-key>" \
  -e AzureOpenAI__DeploymentName="gpt-4o" \
  aichatbot:local

Managed Identity and Key Vault for Production

For production, replace the API key with managed identity authentication. Update Program.cs:

using Azure.Identity;
using Microsoft.SemanticKernel;

// Use DefaultAzureCredential — works with managed identity on Azure App Service
// and developer identity locally (via az login)
var credential = new DefaultAzureCredential();

builder.Services.AddKernel()
    .AddAzureOpenAIChatCompletion(
        deploymentName: builder.Configuration["AzureOpenAI:DeploymentName"]!,
        endpoint: builder.Configuration["AzureOpenAI:Endpoint"]!,
        credentials: credential);

In Azure App Service:

  1. Enable the system-assigned managed identity on the App Service resource
  2. Grant the managed identity the Cognitive Services OpenAI User role on the Azure OpenAI resource
  3. Remove the ApiKey from your configuration entirely

The DefaultAzureCredential from Azure.Identity automatically uses the managed identity token when running on Azure, and falls back to your local az login credentials during development — no code changes needed between environments.

Complete Component File

Here is the complete Components/Pages/Chat.razor for reference:

@page "/chat"
@rendermode InteractiveServer
@attribute [StreamRendering(true)]
@inject Kernel Kernel
@inject ChatHistory ChatHistory
@inject WeatherPlugin WeatherPlugin
@using Microsoft.SemanticKernel
@using Microsoft.SemanticKernel.ChatCompletion
@using Microsoft.SemanticKernel.Connectors.OpenAI
@using System.Text

<PageTitle>AI Chatbot</PageTitle>

<div class="chat-container">
    <div class="chat-messages">
        @foreach (var message in _displayMessages)
        {
            <div class="message @(message.IsUser ? "user-message" : "assistant-message")">
                <div class="message-bubble">
                    <span class="message-role">@(message.IsUser ? "You" : "Assistant")</span>
                    <p class="message-content">@message.Content</p>
                </div>
            </div>
        }

        @if (_isStreaming)
        {
            <div class="message assistant-message">
                <div class="message-bubble">
                    <span class="message-role">Assistant</span>
                    <p class="message-content">@_streamingBuffer<span class="cursor">|</span></p>
                </div>
            </div>
        }

        @if (!string.IsNullOrEmpty(_errorMessage))
        {
            <div class="message error-message">
                <div class="message-bubble error">
                    <p>@_errorMessage</p>
                </div>
            </div>
        }
    </div>

    <div class="chat-input-area">
        <textarea
            @bind="_userInput"
            @bind:event="oninput"
            @onkeydown="HandleKeyDown"
            placeholder="Type a message... (Enter to send, Shift+Enter for new line)"
            rows="2"
            disabled="@_isStreaming"
            class="chat-input"></textarea>
        <button
            @onclick="SendMessageAsync"
            disabled="@(_isStreaming || string.IsNullOrWhiteSpace(_userInput))"
            class="send-button">
            @(_isStreaming ? "Thinking..." : "Send")
        </button>
    </div>
</div>

@code {
    private record DisplayMessage(string Content, bool IsUser);

    private readonly List<DisplayMessage> _displayMessages = [];
    private string _userInput = "";
    private string _streamingBuffer = "";
    private bool _isStreaming;
    private string _errorMessage = "";

    private static readonly string[] ForbiddenPatterns =
    [
        "ignore previous instructions",
        "disregard your system prompt",
        "you are now",
        "act as if you are"
    ];

    private async Task HandleKeyDown(KeyboardEventArgs e)
    {
        if (e.Key == "Enter" && !e.ShiftKey && !_isStreaming)
        {
            await SendMessageAsync();
        }
    }

    private bool IsInputSafe(string input)
    {
        var lower = input.ToLowerInvariant();
        return !ForbiddenPatterns.Any(pattern => lower.Contains(pattern));
    }

    private async Task SendMessageAsync()
    {
        var userText = _userInput.Trim();

        if (string.IsNullOrWhiteSpace(userText))
            return;

        if (userText.Length > 2000)
        {
            _errorMessage = "Message too long. Please keep messages under 2000 characters.";
            await InvokeAsync(StateHasChanged);
            return;
        }

        if (!IsInputSafe(userText))
        {
            _errorMessage = "Your message contains content that cannot be processed. Please rephrase.";
            await InvokeAsync(StateHasChanged);
            return;
        }

        _errorMessage = "";
        _userInput = "";
        _isStreaming = true;
        _streamingBuffer = "";

        _displayMessages.Add(new DisplayMessage(userText, IsUser: true));
        await InvokeAsync(StateHasChanged);

        await StreamResponseAsync(userText);
    }

    private async Task StreamResponseAsync(string userText)
    {
        var responseBuffer = new StringBuilder();

        try
        {
            ChatHistory.AddUserMessage(userText);

            var settings = new OpenAIPromptExecutionSettings
            {
                FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
            };

            var chatCompletionService = Kernel.Services
                .GetRequiredService<IChatCompletionService>();

            var kernelWithPlugin = Kernel.Clone();
            kernelWithPlugin.Plugins.AddFromObject(WeatherPlugin, "WeatherPlugin");

            await foreach (var chunk in chatCompletionService
                .GetStreamingChatMessageContentsAsync(
                    ChatHistory,
                    settings,
                    kernelWithPlugin))
            {
                if (chunk.Content is not null)
                {
                    responseBuffer.Append(chunk.Content);
                    _streamingBuffer = responseBuffer.ToString();
                    await InvokeAsync(StateHasChanged);
                }
            }

            var fullResponse = responseBuffer.ToString();
            ChatHistory.AddAssistantMessage(fullResponse);
            _displayMessages.Add(new DisplayMessage(fullResponse, IsUser: false));
            _streamingBuffer = "";

            // Apply sliding window to prevent context window exhaustion
            int nonSystemCount = ChatHistory.Count - 1;
            int excess = nonSystemCount - 20;
            if (excess > 0)
            {
                ChatHistory.RemoveRange(1, excess);
            }
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            _errorMessage = $"Something went wrong: {ex.Message}. Please try again.";

            if (responseBuffer.Length > 0)
            {
                _displayMessages.Add(
                    new DisplayMessage(responseBuffer.ToString() + " [truncated]", IsUser: false));
            }
        }
        finally
        {
            _isStreaming = false;
            await InvokeAsync(StateHasChanged);
        }
    }
}

What You Learned

This workshop covered the full path from an empty directory to a streaming Blazor Server AI chatbot. You configured Semantic Kernel with AddKernel().AddAzureOpenAIChatCompletion() and isolated per-user conversation state with a Scoped ChatHistory. You implemented token-by-token streaming using GetStreamingChatMessageContentsAsync and await InvokeAsync(StateHasChanged) for thread-safe UI updates. You added automatic function calling with a [KernelFunction] plugin and FunctionChoiceBehavior.Auto(). You applied sliding window truncation to prevent context window exhaustion. Finally, you containerized the app and configured managed identity for production secret management.

⚠ Production Considerations

  • Never call StateHasChanged() directly inside an async streaming loop. Blazor Server components run on the SignalR hub dispatcher thread. Calling StateHasChanged from a background thread causes race conditions and dropped updates. Always use await InvokeAsync(StateHasChanged).
  • Do not register ChatHistory as Singleton or Transient. Singleton leaks conversation history across all users. Transient creates a new empty history every injection point within a single request, breaking the conversation. Scoped is the only correct lifetime for per-user conversation state in Blazor Server.
  • Sliding window truncation alone loses long-range context. For sessions longer than 20 turns, consider the hybrid summarization pattern described in the chat history management guide to maintain coherence while staying within the model's token limits.
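One possible sketch of that hybrid pattern, on plain strings rather than ChatHistory: when the window overflows, fold the evicted messages into a single summary entry instead of discarding them (Summarize here is a stub; in practice you would ask the model itself, usually a cheap one, for the summary):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class HybridWindow
{
    // Stub summarizer — in practice, a model call over the evicted turns
    static string Summarize(IEnumerable<string> messages) =>
        $"[Summary of {messages.Count()} earlier messages]";

    // Index 0 is the system prompt; keep it, keep the last `keep` messages,
    // and fold everything in between into one summary entry
    public static void Compact(List<string> history, int keep)
    {
        int evictCount = history.Count - 1 - keep;
        if (evictCount <= 1) return; // nothing worth summarizing

        var evicted = history.GetRange(1, evictCount);
        history.RemoveRange(1, evictCount);
        history.Insert(1, Summarize(evicted));
    }
}
```

The history stays bounded like the sliding window, but long-range facts survive as a compressed summary message immediately after the system prompt.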


🧠 Architect’s Note

Blazor Server is an excellent fit for AI chatbots because the persistent SignalR circuit eliminates the need for external session storage during a browser session. The server holds the ChatHistory in memory for the lifetime of the circuit. The downside is scale-out: if you run multiple server instances, a user who reconnects after a server restart loses their in-memory history. For production multi-instance deployments, pair this with the Redis persistence pattern from the Semantic Kernel chat history management guide and restore ChatHistory from Redis on circuit reconnect.

AI-Friendly Summary


Key Takeaways

  • Use AddKernel().AddAzureOpenAIChatCompletion() in Program.cs to wire up Semantic Kernel with Azure OpenAI
  • Register ChatHistory as Scoped so each Blazor circuit (browser tab) gets an isolated conversation
  • Call await InvokeAsync(StateHasChanged) inside the streaming loop — never call StateHasChanged() directly from a background thread
  • FunctionChoiceBehavior.Auto() in OpenAIPromptExecutionSettings enables automatic function calling without manual dispatch
  • Apply sliding window truncation to ChatHistory after each turn to prevent context window exhaustion
  • Containerize with a multi-stage Dockerfile and use managed identity + Key Vault for secrets in production

Implementation Checklist

  • Create a .NET 9 Blazor Server project and add Microsoft.SemanticKernel 1.54.0
  • Configure DI with AddKernel().AddAzureOpenAIChatCompletion() and register ChatHistory as Scoped
  • Build the chat UI component with @rendermode InteractiveServer and message bubble layout
  • Add [StreamRendering(true)] attribute and implement the streaming loop with InvokeAsync(StateHasChanged)
  • Register a KernelFunction plugin and set FunctionChoiceBehavior.Auto() in execution settings
  • Implement sliding window truncation on ChatHistory after each assistant response
  • Add input validation and graceful error handling around the streaming loop
  • Write a multi-stage Dockerfile and configure managed identity for Key Vault secret access

Frequently Asked Questions

How does streaming work in Blazor Server with Semantic Kernel?

Blazor Server uses SignalR to push UI updates from server to browser. When you call GetStreamingChatMessageContentsAsync from Semantic Kernel, it returns an IAsyncEnumerable of StreamingChatMessageContent. You iterate over it in an async loop, append each chunk to a StringBuilder, update a bound string property, and call await InvokeAsync(StateHasChanged) to push each chunk through SignalR to the browser in real time.

What does the [StreamRendering(true)] attribute do in Blazor?

The [StreamRendering(true)] attribute (or @attribute [StreamRendering(true)]) tells the Blazor Server runtime to incrementally flush HTML to the browser during the initial render pass. It enables the component to show loading states or partial content while async operations complete. For AI chatbots, it works together with StateHasChanged to push streaming token updates to the browser without a full page refresh.

Should ChatHistory be Scoped or Singleton in Blazor Server?

ChatHistory must be registered as Scoped in Blazor Server. Each circuit (browser tab session) gets its own Scoped container, so each user gets an isolated ChatHistory instance. Registering as Singleton would cause all users to share the same conversation history, which is a critical data isolation bug.

How do I call StateHasChanged safely from an async streaming loop?

Use await InvokeAsync(StateHasChanged) instead of calling StateHasChanged() directly. In Blazor Server, UI updates must occur on the component's render thread. InvokeAsync marshals the call to the correct synchronization context, preventing thread-safety issues when the streaming loop runs on a background thread.

What is OpenAIPromptExecutionSettings.FunctionChoiceBehavior.Auto()?

FunctionChoiceBehavior.Auto() tells Semantic Kernel to automatically detect when the LLM wants to call a registered KernelFunction plugin and invoke it on your behalf. The LLM decides which functions to call based on the function descriptions and the user's intent. Setting this to Auto enables the full agent loop: the model can call functions, receive results, and continue generating a response.

How do I register a Semantic Kernel plugin for function calling?

Create a class with methods decorated with [KernelFunction] and [Description] attributes. Register it in DI, then add it to the kernel with kernel.Plugins.AddFromObject(new YourPlugin(), "PluginName"). With FunctionChoiceBehavior.Auto() in your execution settings, the LLM will call your plugin methods when appropriate.

How do I deploy a Blazor Server AI chatbot to Azure App Service?

Blazor Server requires a persistent SignalR connection, so it runs as a standard ASP.NET Core server process — not a static site. Containerize the app with a Dockerfile based on the official ASP.NET Core runtime image. Deploy to Azure App Service (Linux container plan) and use a managed identity with Key Vault references for the Azure OpenAI API key to avoid storing secrets in environment variables.


#Blazor #Semantic Kernel #Streaming #AI Chatbot #.NET AI