Build a RAG Chatbot with .NET 8+, Semantic Kernel, and Azure Cosmos DB

Tested with: .NET 8 · Semantic Kernel 1.34.0 · Azure.AI.OpenAI 2.2.0 · Microsoft.Azure.Cosmos 3.44.0
By Rajesh Mishra · Feb 14, 2026 · Verified: Feb 18, 2026 · 25 min read

What You’ll Build

A production-ready RAG (Retrieval-Augmented Generation) chatbot that:

  • Ingests documents and generates vector embeddings
  • Stores embeddings alongside operational data in Azure Cosmos DB
  • Retrieves relevant context using vector similarity search
  • Generates accurate, grounded answers using Azure OpenAI GPT-4o
  • Streams responses to the frontend in real-time

Architecture Overview

User Query → .NET 8+ API → Semantic Kernel
                   ↓
                   Azure Cosmos DB (Vector Search)
                   ↓
                   Retrieved Context + Query
                   ↓
                   Azure OpenAI GPT-4o
                   ↓
                   Streaming Response → User

Step 1: Project Setup

Create a new .NET 8+ Web API project:

dotnet new webapi -n RagChatbot --framework net8.0
cd RagChatbot

Install required packages:

dotnet add package Microsoft.SemanticKernel --version 1.34.0
dotnet add package Azure.AI.OpenAI --version 2.2.0
dotnet add package Microsoft.Azure.Cosmos --version 3.44.0

Step 2: Configure Services

Set up dependency injection in Program.cs:

using Azure;
using Azure.AI.OpenAI;
using Microsoft.Azure.Cosmos;
using Microsoft.SemanticKernel;

var builder = WebApplication.CreateBuilder(args);

// Azure OpenAI
builder.Services.AddSingleton(sp =>
{
    var options = new AzureOpenAIClientOptions();
    options.Retry.MaxRetries = 5;
    options.Retry.Mode = Azure.Core.RetryMode.Exponential;

    return new AzureOpenAIClient(
        new Uri(builder.Configuration["AzureOpenAI:Endpoint"]!),
        new AzureKeyCredential(builder.Configuration["AzureOpenAI:ApiKey"]!),
        options
    );
});

// Cosmos DB
builder.Services.AddSingleton(sp =>
{
    var cosmosClient = new CosmosClient(
        builder.Configuration["CosmosDB:ConnectionString"],
        new CosmosClientOptions
        {
            ApplicationName = "rag-chatbot",
            ConnectionMode = ConnectionMode.Direct,
            SerializerOptions = new CosmosSerializationOptions
            {
                PropertyNamingPolicy = CosmosPropertyNamingPolicy.CamelCase
            }
        }
    );
    return cosmosClient;
});

// Semantic Kernel
builder.Services.AddSingleton(sp =>
{
    var kernelBuilder = Kernel.CreateBuilder();
    kernelBuilder.AddAzureOpenAIChatCompletion(
        deploymentName: "gpt-4o",
        endpoint: builder.Configuration["AzureOpenAI:Endpoint"]!,
        apiKey: builder.Configuration["AzureOpenAI:ApiKey"]!
    );
    return kernelBuilder.Build();
});

// Application services consumed by the chat endpoint
builder.Services.AddSingleton<EmbeddingService>();
builder.Services.AddSingleton<VectorSearchService>();

var app = builder.Build();
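For reference, the configuration keys read above correspond to an appsettings.json (or user-secrets store) shaped like this — the endpoint, key, and account values are placeholders:

```json
{
  "AzureOpenAI": {
    "Endpoint": "https://your-resource.openai.azure.com/",
    "ApiKey": "<your-azure-openai-api-key>"
  },
  "CosmosDB": {
    "ConnectionString": "AccountEndpoint=https://your-account.documents.azure.com:443/;AccountKey=<your-key>;"
  }
}
```

In production, prefer managed identity or Azure Key Vault over raw keys in configuration files.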

Step 3: Document Model

Define the document structure with vector embedding support:

public class DocumentChunk
{
    public string Id { get; set; } = Guid.NewGuid().ToString();
    public string DocumentId { get; set; } = "";
    public string Content { get; set; } = "";
    public string Title { get; set; } = "";
    public float[] Embedding { get; set; } = [];
    public string Category { get; set; } = "";
    public DateTime CreatedAt { get; set; } = DateTime.UtcNow;
}
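The ingestion pipeline (expanded on in Next Steps) has to split source documents into pieces sized for embedding before they become DocumentChunk records. A minimal word-based chunker with overlap — the sizes here are illustrative defaults, not tuned values:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TextChunker
{
    // Splits text into chunks of roughly chunkSize words, overlapping by
    // `overlap` words so content at a boundary appears in both neighbors.
    public static List<string> Chunk(string text, int chunkSize = 200, int overlap = 40)
    {
        var words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        var chunks = new List<string>();
        for (int start = 0; start < words.Length; start += chunkSize - overlap)
        {
            chunks.Add(string.Join(' ', words.Skip(start).Take(chunkSize)));
            if (start + chunkSize >= words.Length) break;
        }
        return chunks;
    }
}
```

Each returned chunk maps to one DocumentChunk, embedded individually in Step 4.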

Step 4: Embedding Service

Generate embeddings using Azure OpenAI:

public class EmbeddingService(AzureOpenAIClient client)
{
    // GetEmbeddingClient takes your Azure OpenAI *deployment* name; it matches
    // the model name here only if you named the deployment after the model.
    private readonly OpenAI.Embeddings.EmbeddingClient _embeddingClient
        = client.GetEmbeddingClient("text-embedding-3-small");

    public async Task<float[]> GenerateEmbeddingAsync(string text)
    {
        var result = await _embeddingClient.GenerateEmbeddingAsync(text);
        return result.Value.ToFloats().ToArray();
    }
}
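VectorDistance in the next step ranks results by distance; with a cosine distance function (the common choice for these embeddings, and an assumption about your container's policy here), distance is 1 minus cosine similarity. A small helper for local sanity checks — for example, confirming that two related texts embed close together:

```csharp
using System;

public static class VectorMath
{
    // Cosine similarity: 1.0 means same direction, 0 means orthogonal.
    // Cosine *distance*, as used for ranking, is 1 - similarity.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```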

Step 5: Vector Search with Cosmos DB

Query Cosmos DB using vector similarity:

public class VectorSearchService(CosmosClient cosmosClient)
{
    private readonly Container _container =
        cosmosClient.GetContainer("ragdb", "documents");

    public async Task<List<DocumentChunk>> SearchAsync(
        float[] queryEmbedding, int topK = 5)
    {
        var queryDef = new QueryDefinition(
            "SELECT TOP @topK c.id, c.content, c.title, c.category, " +
            "VectorDistance(c.embedding, @embedding) AS score " +
            "FROM c ORDER BY VectorDistance(c.embedding, @embedding)")
            .WithParameter("@topK", topK)
            .WithParameter("@embedding", queryEmbedding);

        var results = new List<DocumentChunk>();
        using var feed = _container.GetItemQueryIterator<DocumentChunk>(queryDef);
        
        while (feed.HasMoreResults)
        {
            var response = await feed.ReadNextAsync();
            results.AddRange(response);
        }

        return results;
    }
}
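The query above only works if the documents container was created with a vector embedding policy and a vector index on /embedding. A one-time setup sketch using the Cosmos SDK's vector support — verify the type names against your SDK version; /documentId as partition key and 1536 dimensions (text-embedding-3-small's output size) are assumptions carried over from this tutorial's model:

```csharp
using System.Collections.ObjectModel;
using Microsoft.Azure.Cosmos;

var database = cosmosClient.GetDatabase("ragdb");

var containerProperties = new ContainerProperties(
    id: "documents", partitionKeyPath: "/documentId")
{
    // Declare which path holds vectors, their element type, size, and metric
    VectorEmbeddingPolicy = new VectorEmbeddingPolicy(new Collection<Embedding>
    {
        new Embedding
        {
            Path = "/embedding",
            DataType = VectorDataType.Float32,
            DistanceFunction = DistanceFunction.Cosine,
            Dimensions = 1536   // text-embedding-3-small output size
        }
    })
};

// DiskANN scales to larger collections; a flat index suits small ones
containerProperties.IndexingPolicy.VectorIndexes.Add(new VectorIndexPath
{
    Path = "/embedding",
    Type = VectorIndexType.DiskANN
});

await database.CreateContainerIfNotExistsAsync(containerProperties);
```

Vector search must also be enabled as a capability on the Cosmos DB account itself, as noted in the implementation checklist.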

Step 6: Chat API Endpoint

Wire everything together in a streaming chat endpoint:

app.MapPost("/api/chat", async (
    ChatRequest request,
    EmbeddingService embeddingService,
    VectorSearchService searchService,
    Kernel kernel,
    HttpContext httpContext) =>
{
    // 1. Generate embedding for the user's query
    var queryEmbedding = await embeddingService
        .GenerateEmbeddingAsync(request.Message);

    // 2. Search for relevant document chunks
    var relevantDocs = await searchService
        .SearchAsync(queryEmbedding, topK: 5);

    // 3. Build context from retrieved documents
    var context = string.Join("\n\n",
        relevantDocs.Select(d => $"[{d.Title}]: {d.Content}"));

    // 4. Generate answer with context
    var prompt = $"""
        You are a helpful assistant. Answer the user's question based on
        the provided context. If the context doesn't contain relevant
        information, say so clearly.

        Context:
        {context}

        Question: {request.Message}
        """;

    // 5. Stream the grounded answer back to the client as it is generated
    httpContext.Response.ContentType = "text/plain; charset=utf-8";
    await foreach (var chunk in kernel.InvokePromptStreamingAsync(prompt))
    {
        await httpContext.Response.WriteAsync(chunk.ToString(), httpContext.RequestAborted);
        await httpContext.Response.Body.FlushAsync(httpContext.RequestAborted);
    }
});

app.Run();

public record ChatRequest(string Message);

Next Steps

  • Add authentication with Microsoft Entra ID (formerly Azure AD)
  • Implement conversation memory with Cosmos DB
  • Add document ingestion pipeline with Azure Functions
  • Deploy to Azure Container Apps
  • Add Application Insights telemetry

Summary

This end-to-end tutorial builds a production-ready RAG chatbot using .NET 8 LTS (or later), Semantic Kernel for orchestration, Azure OpenAI for embeddings and chat completion, and Azure Cosmos DB for vector storage. Covers project setup, embedding generation, vector indexing, retrieval pipeline, and chat API with streaming.

Key Takeaways

  • Azure Cosmos DB supports native vector search — no separate vector store needed
  • Semantic Kernel manages the orchestration between retrieval and generation
  • Use text-embedding-3-small for cost-effective embeddings
  • Streaming responses improve perceived latency significantly
  • The DI-first approach makes the architecture testable and maintainable

Implementation Checklist

  • Create Azure OpenAI resource with GPT-4o and embedding deployments
  • Create Azure Cosmos DB account with vector search enabled
  • Scaffold .NET 8 Web API project
  • Install Semantic Kernel, Azure.AI.OpenAI, and Cosmos DB packages
  • Implement document ingestion with embedding generation
  • Configure Cosmos DB vector index
  • Build retrieval pipeline with Semantic Kernel
  • Implement streaming chat API endpoint
  • Add error handling and rate limit retries
  • Deploy to Azure Container Apps

Frequently Asked Questions

Why use Azure Cosmos DB for RAG instead of a dedicated vector database?

Azure Cosmos DB now supports native vector search alongside your operational data. This eliminates the need for a separate vector store, reduces architectural complexity, and ensures your embeddings stay in sync with your source data — all within a single globally distributed database.

What embedding model should I use?

For most .NET RAG applications, use Azure OpenAI's text-embedding-3-small or text-embedding-3-large. The small model offers excellent quality/cost ratio for most use cases.


#RAG #Semantic Kernel #Azure Cosmos DB #Azure OpenAI #.NET 8+ #Tutorial