Build a Real-Time AI Chat App with SignalR and Azure OpenAI

.NET 9 · Azure.AI.OpenAI 2.1.0
By Rajesh Mishra · Feb 28, 2026 · Verified: Feb 28, 2026 · 20 min read

Server-Sent Events and REST endpoints work fine for one-off AI requests, but a real-time chat experience needs something more. Users expect to see tokens appear as they are generated, send follow-up messages immediately, and maintain context across the conversation. SignalR provides all of this through a persistent bidirectional connection that handles reconnection, transport negotiation, and message delivery out of the box.

This workshop builds a complete real-time AI chat application. The backend is an ASP.NET Core 9 project with a SignalR Hub that accepts user messages, sends them to Azure OpenAI, and streams tokens back to the browser. The frontend is a single HTML page with vanilla JavaScript — no framework, no build step. Everything runs end to end.

Prerequisites

  • .NET 9 SDK installed
  • An Azure OpenAI resource with a deployed gpt-4o or gpt-4o-mini model
  • Your Azure OpenAI endpoint, API key, and deployment name
  • A modern browser (Chrome, Edge, Firefox, Safari)

For the foundational streaming pattern, the Streaming Chat API workshop covers the underlying SDK calls in detail.

Step 1 — Scaffold the Project

dotnet new web -n RealtimeAiChat
cd RealtimeAiChat
dotnet add package Azure.AI.OpenAI --version 2.1.0

We start with dotnet new web rather than webapi because this project also serves static HTML. The minimal template gives us just Program.cs with no extra scaffolding.

Step 2 — Application Configuration

appsettings.json

{
  "AzureOpenAI": {
    "Endpoint": "https://<your-resource>.openai.azure.com/",
    "ApiKey": "<your-api-key>",
    "DeploymentName": "gpt-4o"
  },
  "Logging": {
    "LogLevel": {
      "Default": "Information"
    }
  }
}

Step 3 — Define the Settings Model

Models/AzureOpenAISettings.cs

namespace RealtimeAiChat.Models;

public sealed class AzureOpenAISettings
{
    public const string SectionName = "AzureOpenAI";

    public required string Endpoint { get; init; }
    public required string ApiKey { get; init; }
    public required string DeploymentName { get; init; }
}

Step 4 — Build the Conversation Store

Each SignalR connection maintains its own conversation history. The store maps connection IDs to message lists and provides thread-safe access for concurrent hub invocations.

Services/ConversationStore.cs

using System.Collections.Concurrent;
using OpenAI.Chat;

namespace RealtimeAiChat.Services;

public sealed class ConversationStore
{
    private readonly ConcurrentDictionary<string, List<ChatMessage>> _conversations = new();
    private readonly ILogger<ConversationStore> _logger;

    private const int MaxHistoryMessages = 50;

    public ConversationStore(ILogger<ConversationStore> logger)
    {
        _logger = logger;
    }

    public List<ChatMessage> GetOrCreate(string connectionId)
    {
        return _conversations.GetOrAdd(connectionId, _ =>
        {
            _logger.LogInformation(
                "Created conversation for connection {ConnectionId}", connectionId);
            return new List<ChatMessage>
            {
                new SystemChatMessage(
                    "You are a helpful AI assistant. Be concise, accurate, and friendly.")
            };
        });
    }

    public void AddUserMessage(string connectionId, string content)
    {
        var messages = GetOrCreate(connectionId);
        lock (messages)
        {
            messages.Add(new UserChatMessage(content));
            TrimHistory(messages);
        }
    }

    public void AddAssistantMessage(string connectionId, string content)
    {
        var messages = GetOrCreate(connectionId);
        lock (messages)
        {
            messages.Add(new AssistantChatMessage(content));
            TrimHistory(messages);
        }
    }

    public void Remove(string connectionId)
    {
        if (_conversations.TryRemove(connectionId, out _))
        {
            _logger.LogInformation(
                "Removed conversation for connection {ConnectionId}", connectionId);
        }
    }

    private static void TrimHistory(List<ChatMessage> messages)
    {
        // Keep system message + last N messages
        while (messages.Count > MaxHistoryMessages + 1)
        {
            messages.RemoveAt(1); // Remove oldest after system message
        }
    }
}

The MaxHistoryMessages cap keeps a growing conversation from exceeding the model's context window. The trim strategy removes the oldest messages first while always preserving the system prompt. For production applications, consider a more sophisticated approach that summarizes older messages instead of discarding them.
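The summarization idea can be sketched with the summarizer injected as a delegate, which in practice would be a call to a cheap model deployment. Everything here is illustrative: `SummarizingTrimmer` and the plain-string message list are hypothetical, not part of the workshop's code, and a real store would operate on ChatMessage instances.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical alternative to TrimHistory: fold the oldest messages into one
// summary message rather than dropping them. The summarize delegate stands in
// for a model call; strings stand in for ChatMessage to keep the sketch simple.
public static class SummarizingTrimmer
{
    public static List<string> Trim(
        List<string> messages,                        // index 0 is the system prompt
        int maxHistory,
        Func<IReadOnlyList<string>, string> summarize)
    {
        if (messages.Count <= maxHistory + 1)
            return messages;                          // under the cap, nothing to do

        // Summarize everything older than the most recent messages;
        // the summary itself counts toward the cap.
        int excess = messages.Count - maxHistory;
        var oldest = messages.Skip(1).Take(excess).ToList();
        var summary = "Earlier conversation summary: " + summarize(oldest);

        var trimmed = new List<string> { messages[0], summary };
        trimmed.AddRange(messages.Skip(1 + excess)); // keep the most recent messages
        return trimmed;                              // count == maxHistory + 1
    }
}
```

The trade-off is an extra model call per trim, so in practice you would trigger it in batches rather than on every message.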

Step 5 — Build the ChatHub

The hub is the heart of the application. It receives messages from clients, calls Azure OpenAI with the full conversation history, and streams tokens back one at a time.

Hubs/ChatHub.cs

using System.ClientModel;
using Azure.AI.OpenAI;
using Microsoft.AspNetCore.SignalR;
using Microsoft.Extensions.Options;
using OpenAI.Chat;
using RealtimeAiChat.Models;
using RealtimeAiChat.Services;

namespace RealtimeAiChat.Hubs;

public sealed class ChatHub : Hub
{
    private readonly AzureOpenAIClient _aiClient;
    private readonly ConversationStore _store;
    private readonly AzureOpenAISettings _settings;
    private readonly ILogger<ChatHub> _logger;

    public ChatHub(
        AzureOpenAIClient aiClient,
        ConversationStore store,
        IOptions<AzureOpenAISettings> settings,
        ILogger<ChatHub> logger)
    {
        _aiClient = aiClient;
        _store = store;
        _settings = settings.Value;
        _logger = logger;
    }

    public async Task SendMessage(string message)
    {
        var connectionId = Context.ConnectionId;

        if (string.IsNullOrWhiteSpace(message))
        {
            await Clients.Caller.SendAsync("ReceiveError", "Message cannot be empty.");
            return;
        }

        _store.AddUserMessage(connectionId, message);

        // Notify the client that the AI is thinking
        await Clients.Caller.SendAsync("ReceiveStatusUpdate", "thinking");

        try
        {
            var chatClient = _aiClient.GetChatClient(_settings.DeploymentName);
            var messages = _store.GetOrCreate(connectionId);

            var fullResponse = new System.Text.StringBuilder();

            // Signal the start of a new AI response
            await Clients.Caller.SendAsync("ReceiveStreamStart");

            await foreach (StreamingChatCompletionUpdate update in
                chatClient.CompleteChatStreamingAsync(messages))
            {
                foreach (ChatMessageContentPart part in update.ContentUpdate)
                {
                    fullResponse.Append(part.Text);
                    await Clients.Caller.SendAsync("ReceiveStreamToken", part.Text);
                }
            }

            // Signal the end of the stream
            await Clients.Caller.SendAsync("ReceiveStreamEnd");

            // Store the complete response for conversation history
            _store.AddAssistantMessage(connectionId, fullResponse.ToString());

            _logger.LogInformation(
                "Completed response for {ConnectionId}: {Length} chars",
                connectionId, fullResponse.Length);
        }
        catch (ClientResultException ex) when (ex.Status == 429)
        {
            _logger.LogWarning("Rate limit hit for {ConnectionId}", connectionId);
            await Clients.Caller.SendAsync("ReceiveError",
                "The AI service is busy. Please wait a moment and try again.");
        }
        catch (ClientResultException ex) when (ex.Status == 401)
        {
            _logger.LogError("Auth failure for Azure OpenAI");
            await Clients.Caller.SendAsync("ReceiveError",
                "AI service authentication failed. Contact the administrator.");
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Error processing message for {ConnectionId}", connectionId);
            await Clients.Caller.SendAsync("ReceiveError",
                "An error occurred while generating the response. Please try again.");
        }
    }

    public async Task ClearHistory()
    {
        _store.Remove(Context.ConnectionId);
        _store.GetOrCreate(Context.ConnectionId);
        await Clients.Caller.SendAsync("ReceiveStatusUpdate", "history_cleared");
    }

    public override async Task OnDisconnectedAsync(Exception? exception)
    {
        _store.Remove(Context.ConnectionId);

        if (exception is not null)
        {
            _logger.LogWarning(exception,
                "Connection {ConnectionId} disconnected with error",
                Context.ConnectionId);
        }

        await base.OnDisconnectedAsync(exception);
    }
}

Each call to SendMessage follows a clear lifecycle: validate input, add to history, notify the client that processing has started, stream tokens individually, signal completion, and store the full response. Error handling catches rate limits and auth failures without crashing the connection.

The ClearHistory method lets users reset their conversation. OnDisconnectedAsync cleans up state when a client disconnects, preventing memory leaks.

Step 6 — Wire Up Program.cs

using System.ClientModel.Primitives;
using Azure;
using Azure.AI.OpenAI;
using RealtimeAiChat.Hubs;
using RealtimeAiChat.Models;
using RealtimeAiChat.Services;

var builder = WebApplication.CreateBuilder(args);

// Bind configuration
builder.Services.Configure<AzureOpenAISettings>(
    builder.Configuration.GetSection(AzureOpenAISettings.SectionName));

// Register AzureOpenAIClient as singleton
builder.Services.AddSingleton(sp =>
{
    var settings = builder.Configuration
        .GetSection(AzureOpenAISettings.SectionName)
        .Get<AzureOpenAISettings>()
        ?? throw new InvalidOperationException("AzureOpenAI settings missing.");

    var options = new AzureOpenAIClientOptions
    {
        RetryPolicy = new ClientRetryPolicy(maxRetries: 3)
    };

    return new AzureOpenAIClient(
        new Uri(settings.Endpoint),
        new AzureKeyCredential(settings.ApiKey),
        options);
});

// Register services
builder.Services.AddSingleton<ConversationStore>();

// Add SignalR
builder.Services.AddSignalR();

var app = builder.Build();

// Serve static files (for the HTML frontend)
app.UseDefaultFiles();
app.UseStaticFiles();

// Map the chat hub
app.MapHub<ChatHub>("/chatHub");

app.Run();

UseDefaultFiles looks for index.html in the wwwroot folder. UseStaticFiles serves it. The SignalR hub is mapped at /chatHub.

Step 7 — Build the Frontend

Create the wwwroot folder and add a single HTML file. This frontend uses no build tools, no framework — just the SignalR JavaScript client loaded from a CDN.

wwwroot/index.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>AI Chat</title>
    <style>
        * { margin: 0; padding: 0; box-sizing: border-box; }
        body {
            font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
            background: #f5f5f5;
            height: 100vh;
            display: flex;
            flex-direction: column;
        }
        header {
            background: #1a1a2e;
            color: white;
            padding: 16px 24px;
            display: flex;
            justify-content: space-between;
            align-items: center;
        }
        header h1 { font-size: 1.2rem; font-weight: 600; }
        #status {
            font-size: 0.8rem;
            padding: 4px 12px;
            border-radius: 12px;
            background: #333;
        }
        #status.connected { background: #2d6a4f; }
        #status.disconnected { background: #d00000; }
        #chat-container {
            flex: 1;
            overflow-y: auto;
            padding: 24px;
            display: flex;
            flex-direction: column;
            gap: 16px;
        }
        .message {
            max-width: 75%;
            padding: 12px 16px;
            border-radius: 12px;
            line-height: 1.5;
            white-space: pre-wrap;
            word-wrap: break-word;
        }
        .message.user {
            align-self: flex-end;
            background: #1a1a2e;
            color: white;
        }
        .message.assistant {
            align-self: flex-start;
            background: white;
            border: 1px solid #ddd;
        }
        .message.error {
            align-self: center;
            background: #fee;
            border: 1px solid #fcc;
            color: #c00;
            font-size: 0.9rem;
        }
        .thinking {
            align-self: flex-start;
            color: #888;
            font-style: italic;
            font-size: 0.9rem;
        }
        #input-area {
            padding: 16px 24px;
            background: white;
            border-top: 1px solid #ddd;
            display: flex;
            gap: 12px;
        }
        #message-input {
            flex: 1;
            padding: 12px 16px;
            border: 1px solid #ddd;
            border-radius: 8px;
            font-size: 1rem;
            outline: none;
        }
        #message-input:focus { border-color: #1a1a2e; }
        button {
            padding: 12px 24px;
            border: none;
            border-radius: 8px;
            font-size: 1rem;
            cursor: pointer;
        }
        #send-btn {
            background: #1a1a2e;
            color: white;
        }
        #send-btn:disabled { opacity: 0.5; cursor: not-allowed; }
        #clear-btn {
            background: #eee;
            color: #333;
        }
    </style>
</head>
<body>
    <header>
        <h1>AI Chat</h1>
        <span id="status">Connecting...</span>
    </header>
    <div id="chat-container"></div>
    <div id="input-area">
        <input id="message-input" type="text"
               placeholder="Type your message..."
               autocomplete="off" disabled />
        <button id="send-btn" disabled>Send</button>
        <button id="clear-btn">Clear</button>
    </div>

    <script src="https://cdnjs.cloudflare.com/ajax/libs/microsoft-signalr/8.0.7/signalr.min.js"></script>
    <script>
        const chatContainer = document.getElementById('chat-container');
        const messageInput = document.getElementById('message-input');
        const sendBtn = document.getElementById('send-btn');
        const clearBtn = document.getElementById('clear-btn');
        const statusEl = document.getElementById('status');

        let currentAssistantMessage = null;
        let isStreaming = false;

        // Build the SignalR connection
        const connection = new signalR.HubConnectionBuilder()
            .withUrl('/chatHub')
            .withAutomaticReconnect([0, 2000, 5000, 10000, 30000])
            .configureLogging(signalR.LogLevel.Warning)
            .build();

        // Connection lifecycle
        connection.onreconnecting(() => {
            setStatus('Reconnecting...', 'disconnected');
            setInputEnabled(false);
        });

        connection.onreconnected(() => {
            setStatus('Connected', 'connected');
            setInputEnabled(true);
        });

        connection.onclose(() => {
            setStatus('Disconnected', 'disconnected');
            setInputEnabled(false);
        });

        // Hub event handlers
        connection.on('ReceiveStreamStart', () => {
            isStreaming = true;
            removeThinking();
            currentAssistantMessage = addMessage('', 'assistant');
        });

        connection.on('ReceiveStreamToken', (token) => {
            if (currentAssistantMessage) {
                currentAssistantMessage.textContent += token;
                scrollToBottom();
            }
        });

        connection.on('ReceiveStreamEnd', () => {
            isStreaming = false;
            currentAssistantMessage = null;
            setInputEnabled(true);
        });

        connection.on('ReceiveStatusUpdate', (status) => {
            if (status === 'thinking') {
                addThinking();
            } else if (status === 'history_cleared') {
                chatContainer.innerHTML = '';
                addMessage('Conversation cleared.', 'error');
            }
        });

        connection.on('ReceiveError', (error) => {
            isStreaming = false;
            currentAssistantMessage = null;
            removeThinking();
            addMessage(error, 'error');
            setInputEnabled(true);
        });

        // UI helpers
        function addMessage(text, role) {
            const div = document.createElement('div');
            div.className = `message ${role}`;
            div.textContent = text;
            chatContainer.appendChild(div);
            scrollToBottom();
            return div;
        }

        function addThinking() {
            const div = document.createElement('div');
            div.className = 'thinking';
            div.id = 'thinking-indicator';
            div.textContent = 'AI is thinking...';
            chatContainer.appendChild(div);
            scrollToBottom();
        }

        function removeThinking() {
            const el = document.getElementById('thinking-indicator');
            if (el) el.remove();
        }

        function scrollToBottom() {
            chatContainer.scrollTop = chatContainer.scrollHeight;
        }

        function setStatus(text, cls) {
            statusEl.textContent = text;
            statusEl.className = cls;
        }

        function setInputEnabled(enabled) {
            messageInput.disabled = !enabled;
            sendBtn.disabled = !enabled;
            if (enabled) messageInput.focus();
        }

        // Send message
        async function sendMessage() {
            const message = messageInput.value.trim();
            if (!message || isStreaming) return;

            addMessage(message, 'user');
            messageInput.value = '';
            setInputEnabled(false);

            try {
                await connection.invoke('SendMessage', message);
            } catch (err) {
                addMessage('Failed to send message: ' + err.message, 'error');
                setInputEnabled(true);
            }
        }

        sendBtn.addEventListener('click', sendMessage);
        messageInput.addEventListener('keydown', (e) => {
            if (e.key === 'Enter' && !e.shiftKey) {
                e.preventDefault();
                sendMessage();
            }
        });

        clearBtn.addEventListener('click', async () => {
            try {
                await connection.invoke('ClearHistory');
            } catch (err) {
                addMessage('Failed to clear: ' + err.message, 'error');
            }
        });

        // Start the connection
        async function start() {
            try {
                await connection.start();
                setStatus('Connected', 'connected');
                setInputEnabled(true);
            } catch (err) {
                setStatus('Failed to connect', 'disconnected');
                console.error('SignalR connection error:', err);
                setTimeout(start, 5000);
            }
        }

        start();
    </script>
</body>
</html>

The frontend handles five distinct events from the hub: ReceiveStreamStart, ReceiveStreamToken, ReceiveStreamEnd, ReceiveStatusUpdate, and ReceiveError. This protocol gives the UI full control over rendering. The “AI is thinking…” indicator appears immediately and is removed as soon as the stream starts.

The withAutomaticReconnect configuration uses a progressive back-off strategy — 0, 2, 5, 10, then 30 seconds between attempts. If the server restarts during development, the client reconnects without the user lifting a finger.

Step 8 — Run and Test

Start the application:

dotnet run

Open your browser to http://localhost:5000 (or the port shown in the console). Type a message and press Enter. You should see:

  1. Your message appears on the right in a dark bubble.
  2. “AI is thinking…” appears briefly on the left.
  3. The AI response streams in token by token.
  4. The input re-enables once streaming finishes.

Try a multi-turn conversation:

  • “What is dependency injection?”
  • “Can you show me an example in C#?”
  • “How would I test that class?”

Each follow-up builds on the previous context because the ConversationStore maintains the full message history for each connection.

Click Clear to reset the conversation and start fresh.

Step 9 — Connection Lifecycle Details

SignalR manages several edge cases automatically, but there are behaviors worth understanding:

Reconnection. When the WebSocket drops (network interruption, server restart during development), the client follows the withAutomaticReconnect schedule. During reconnection, the connection gets a new ID. That means conversation history from the previous connection is lost. For production, persist history to a database keyed by user ID rather than connection ID.

Concurrent messages. The hub processes one SendMessage invocation at a time per connection by default. If a user hammers the send button, messages queue up. The frontend disables the input during streaming to prevent this.

Memory pressure. The ConversationStore trims history at 50 messages. Without this cap, a long conversation could grow to thousands of tokens, eventually exceeding the model’s context window. The trim strategy keeps the system prompt and the most recent messages, discarding the oldest ones.

Step 10 — Production Considerations

A few things to address before deploying this to production.

Authentication. Add ASP.NET Core authentication middleware and decorate the hub with [Authorize]. Key conversation history by user ID instead of connection ID so conversations survive reconnections.
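A minimal sketch of that keying change, using plain strings and a hypothetical UserConversationStore; in a real hub you would read Context.UserIdentifier (populated by the authentication middleware) rather than Context.ConnectionId:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical store keyed by user ID: two connections from the same user
// (for example, before and after a reconnect) see the same history.
public sealed class UserConversationStore
{
    private readonly ConcurrentDictionary<string, List<string>> _byUser = new();

    public List<string> GetOrCreate(string userId) =>
        _byUser.GetOrAdd(userId, _ => new List<string> { "system prompt" });
}
```

With this keying, OnDisconnectedAsync would no longer remove history on disconnect; an expiry policy or database persistence would handle cleanup instead.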

Scaling. SignalR uses in-memory state by default. For multiple server instances behind a load balancer, add the Azure SignalR Service backplane or Redis backplane so messages reach the correct client regardless of which server handles the WebSocket.
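If you choose Azure SignalR Service, the switch in Program.cs is small, assuming the Microsoft.Azure.SignalR package is installed and a connection string is available in configuration (Azure:SignalR:ConnectionString by default):

```csharp
// Requires the Microsoft.Azure.SignalR package. With this in place, the
// service holds the client connections and fans messages out across instances.
builder.Services.AddSignalR().AddAzureSignalR();
```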

Rate limiting. Apply per-user rate limits either in the hub or through middleware to prevent abuse. If an Azure OpenAI 429 error occurs, the hub returns a friendly message rather than crashing. See the fix 429 rate-limit errors guide for deeper strategies.
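One way to sketch a per-user limit is a fixed-window counter consulted at the top of SendMessage before calling Azure OpenAI. This is an illustration, not production code (ASP.NET Core also ships rate-limiting middleware in Microsoft.AspNetCore.RateLimiting); the clock is passed in so the logic stays testable, and in the hub you would pass DateTime.UtcNow:

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical fixed-window limiter: at most maxPerWindow calls per key per window.
public sealed class PerUserRateLimiter
{
    private readonly int _maxPerWindow;
    private readonly TimeSpan _window;
    private readonly ConcurrentDictionary<string, (DateTime WindowStart, int Count)> _state = new();

    public PerUserRateLimiter(int maxPerWindow, TimeSpan window)
    {
        _maxPerWindow = maxPerWindow;
        _window = window;
    }

    public bool TryAcquire(string key, DateTime now)
    {
        var allowed = false;
        // Note: AddOrUpdate delegates may run more than once under contention,
        // which is acceptable for an approximate limit like this one.
        _state.AddOrUpdate(key,
            _ => { allowed = true; return (now, 1); },
            (_, s) =>
            {
                if (now - s.WindowStart >= _window) { allowed = true; return (now, 1); }
                if (s.Count < _maxPerWindow) { allowed = true; return (s.WindowStart, s.Count + 1); }
                allowed = false;
                return s;
            });
        return allowed;
    }
}
```

When TryAcquire returns false, the hub would send a ReceiveError to the caller instead of invoking the model.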

Cancellation. Currently, if a user disconnects mid-stream, the await foreach loop keeps consuming tokens on the server. Pass Context.ConnectionAborted as the cancellation token to CompleteChatStreamingAsync to stop generation when the client disappears.
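The cancellation pattern can be demonstrated without the SDK by standing in a fake token stream for the model. In ChatHub, the equivalent change is passing Context.ConnectionAborted as the cancellationToken argument of CompleteChatStreamingAsync and catching OperationCanceledException; everything named here (FakeTokens, the token source) is illustration only:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for the model's token stream so the pattern runs without the SDK.
static async IAsyncEnumerable<string> FakeTokens(
    [EnumeratorCancellation] CancellationToken ct = default)
{
    foreach (var piece in new[] { "Hel", "lo", " wor", "ld" })
    {
        ct.ThrowIfCancellationRequested(); // generation stops here once cancelled
        await Task.Yield();
        yield return piece;
    }
}

using var cts = new CancellationTokenSource();
var received = new List<string>();
try
{
    await foreach (var piece in FakeTokens(cts.Token))
    {
        received.Add(piece);
        if (received.Count == 2) cts.Cancel(); // simulate the client disconnecting
    }
}
catch (OperationCanceledException)
{
    // The stream ends early instead of running to completion on the server.
}
```

In the hub, Context.ConnectionAborted plays the role of cts.Token, so a disconnect aborts the stream automatically.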

Complete Project Structure

RealtimeAiChat/
  Program.cs
  appsettings.json
  Models/
    AzureOpenAISettings.cs
  Services/
    ConversationStore.cs
  Hubs/
    ChatHub.cs
  wwwroot/
    index.html

What You Learned

You built a real-time AI chat application from an empty project. SignalR provides the transport layer — persistent, bidirectional, with automatic reconnection. Azure OpenAI provides the intelligence. The ChatHub bridges the two, streaming tokens from the model to the browser as they are generated. The conversation store maintains multi-turn context per connection. The frontend handles the full lifecycle of connection, streaming, error display, and history clearing without any framework dependencies.

For the underlying streaming SDK patterns, review the Streaming Chat API workshop. For prompt techniques that make the AI more useful in interactive conversations, the Prompt Engineering Fundamentals guide covers system messages, few-shot examples, and chain-of-thought strategies.

AI-Friendly Summary

Summary

A complete workshop for building a real-time AI chat application using ASP.NET Core SignalR and Azure OpenAI. Covers project scaffolding, ChatHub implementation with streaming token relay, inline HTML/JS frontend, per-connection conversation history, connection lifecycle management, DI configuration, error handling, and full end-to-end testing.

Key Takeaways

  • Use SignalR Hubs to relay streaming Azure OpenAI tokens to browser clients in real time
  • Manage per-connection conversation history with ConcurrentDictionary keyed by connection ID
  • Handle connection lifecycle events to clean up AI resources and conversation state
  • Build a framework-free HTML/JS frontend using the SignalR JavaScript client
  • Implement error handling that notifies clients of AI failures without breaking the connection

Implementation Checklist

  • Scaffold .NET 9 project with SignalR and install Azure.AI.OpenAI 2.1.0
  • Configure DI with AzureOpenAIClient singleton
  • Implement ChatHub with SendMessage and streaming response relay
  • Build per-connection conversation history store
  • Handle OnDisconnectedAsync for cleanup
  • Create inline HTML/JS frontend with SignalR client
  • Add error handling for AI failures
  • Test the complete chat flow end to end

Frequently Asked Questions

Can I stream Azure OpenAI responses through SignalR?

Yes. Azure OpenAI's CompleteChatStreamingAsync returns tokens incrementally. On the server, iterate with await foreach and call client.SendAsync for each token. SignalR's persistent WebSocket connection delivers each token to the browser with minimal latency, typically a few milliseconds per hop.

How do I manage conversation history per user?

Associate conversation state with the SignalR connection ID or an authenticated user ID. Store the message list in a ConcurrentDictionary keyed by connection ID for anonymous users, or by user claim for authenticated sessions. Clear state on disconnect to prevent memory growth.

Is SignalR suitable for AI chat applications?

SignalR is well-suited for AI chat because it maintains a persistent bidirectional connection, handles reconnection automatically, and supports server-to-client streaming. It eliminates the polling overhead of REST-based approaches and works across WebSocket, Server-Sent Events, and Long Polling transports.

How do I handle disconnections gracefully?

Override OnDisconnectedAsync in your Hub to clean up conversation state and release resources. SignalR's built-in reconnection (with the withAutomaticReconnect option in the JavaScript client) restores the connection transparently. For critical state, persist conversations to a database so users can resume after reconnection.


#SignalR #Real-Time #Azure OpenAI #ASP.NET Core #.NET AI