
Build a RAG Chatbot with Semantic Kernel + Cosmos DB Vector Store (.NET 10)

Verified Apr 2026 · .NET 10 · Microsoft.SemanticKernel 1.54.0 · Azure.AI.OpenAI 2.2.0 · Microsoft.Azure.Cosmos 3.44.0
By Rajesh Mishra · Feb 14, 2026 · 25 min read
In 30 Seconds

This end-to-end tutorial builds a production-ready RAG chatbot using .NET 10, Semantic Kernel for orchestration, Azure OpenAI for embeddings and chat completion, and Azure Cosmos DB for vector storage. Covers project setup, embedding generation, vector indexing, retrieval pipeline, and chat API with streaming.

What You’ll Build

A production-ready RAG (Retrieval-Augmented Generation) chatbot that:

  • Ingests documents and generates vector embeddings
  • Stores embeddings alongside operational data in Azure Cosmos DB
  • Retrieves relevant context using vector similarity search
  • Generates accurate, grounded answers using your Azure OpenAI chat deployment
  • Streams responses to the frontend in real-time

Architecture Overview

User query → .NET 10 API → Semantic Kernel → Azure Cosmos DB vector search → retrieved context + query → Azure OpenAI chat deployment → streaming response → user
RAG chatbot architecture — query flows through Semantic Kernel to vector search, retrieves context, and generates a grounded streaming response.

Step 1: Project Setup

Create a new .NET 10 Web API project:

dotnet new webapi -n RagChatbot --framework net10.0
cd RagChatbot

Install required packages:

dotnet add package Microsoft.SemanticKernel --version 1.54.0
dotnet add package Azure.AI.OpenAI --version 2.2.0
dotnet add package Microsoft.Azure.Cosmos --version 3.44.0

Step 2: Configure Services

Set up dependency injection in Program.cs:

using Azure;
using Azure.AI.OpenAI;
using Microsoft.Azure.Cosmos;
using Microsoft.SemanticKernel;

var builder = WebApplication.CreateBuilder(args);

// Azure OpenAI
builder.Services.AddSingleton(sp =>
{
    // Azure.AI.OpenAI 2.x is built on System.ClientModel, so retries are
    // configured with a pipeline retry policy (exponential backoff by
    // default) rather than Azure.Core's options.Retry.
    var options = new AzureOpenAIClientOptions
    {
        RetryPolicy = new System.ClientModel.Primitives.ClientRetryPolicy(maxRetries: 5)
    };

    return new AzureOpenAIClient(
        new Uri(builder.Configuration["AzureOpenAI:Endpoint"]!),
        new AzureKeyCredential(builder.Configuration["AzureOpenAI:ApiKey"]!),
        options
    );
});

// Cosmos DB
builder.Services.AddSingleton(sp =>
{
    var cosmosClient = new CosmosClient(
        builder.Configuration["CosmosDB:ConnectionString"],
        new CosmosClientOptions
        {
            ApplicationName = "rag-chatbot",
            ConnectionMode = ConnectionMode.Direct,
            SerializerOptions = new CosmosSerializationOptions
            {
                PropertyNamingPolicy = CosmosPropertyNamingPolicy.CamelCase
            }
        }
    );
    return cosmosClient;
});

// Semantic Kernel
builder.Services.AddSingleton(sp =>
{
    var kernelBuilder = Kernel.CreateBuilder();
    kernelBuilder.AddAzureOpenAIChatCompletion(
        deploymentName: "chat-deployment",
        endpoint: builder.Configuration["AzureOpenAI:Endpoint"]!,
        apiKey: builder.Configuration["AzureOpenAI:ApiKey"]!
    );
    return kernelBuilder.Build();
});

// Application services consumed by the chat endpoint (defined in Steps 4 and 5)
builder.Services.AddSingleton<EmbeddingService>();
builder.Services.AddSingleton<VectorSearchService>();

Step 3: Document Model

Define the document structure with vector embedding support:

public class DocumentChunk
{
    public string Id { get; set; } = Guid.NewGuid().ToString();
    public string DocumentId { get; set; } = "";
    public string Content { get; set; } = "";
    public string Title { get; set; } = "";
    public float[] Embedding { get; set; } = [];
    public string Category { get; set; } = "";
    public DateTime CreatedAt { get; set; } = DateTime.UtcNow;
}
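Storing this model is not enough on its own: the VectorDistance queries used later need the container to declare a vector embedding policy and a vector index, otherwise every query degrades to a full scan. A configuration sketch, assuming Microsoft.Azure.Cosmos 3.44+, an existing ragdb database, and 1536 dimensions (the output size of text-embedding-3-small); the VectorContainerSetup name and the /documentId partition key are illustrative choices, not from the original:

```csharp
using System.Collections.ObjectModel;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class VectorContainerSetup
{
    // Creates the "documents" container with a vector embedding policy and
    // index so VectorDistance queries can use an approximate index instead
    // of scanning every item. Partitioning by /documentId keeps all chunks
    // of one document in the same logical partition.
    public static async Task EnsureContainerAsync(CosmosClient cosmosClient)
    {
        var database = cosmosClient.GetDatabase("ragdb");

        var embeddingPolicy = new VectorEmbeddingPolicy(new Collection<Embedding>
        {
            new Embedding
            {
                Path = "/embedding",
                DataType = VectorDataType.Float32,
                DistanceFunction = DistanceFunction.Cosine,
                Dimensions = 1536 // text-embedding-3-small output size
            }
        });

        var properties = new ContainerProperties("documents", "/documentId")
        {
            VectorEmbeddingPolicy = embeddingPolicy
        };
        properties.IndexingPolicy.VectorIndexes.Add(new VectorIndexPath
        {
            Path = "/embedding",
            Type = VectorIndexType.QuantizedFlat
        });

        await database.CreateContainerIfNotExistsAsync(properties);
    }
}
```

Run this once at startup, after registering the CosmosClient from Step 2.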

Step 4: Embedding Service

Generate embeddings using Azure OpenAI:

public class EmbeddingService(AzureOpenAIClient client)
{
    private readonly OpenAI.Embeddings.EmbeddingClient _embeddingClient
        = client.GetEmbeddingClient("text-embedding-3-small");

    public async Task<float[]> GenerateEmbeddingAsync(string text)
    {
        var result = await _embeddingClient.GenerateEmbeddingAsync(text);
        return result.Value.ToFloats().ToArray();
    }
}
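Before calling this service during ingestion, documents need to be split into pieces that fit the embedding model's input limits. A minimal sliding-window splitter (the TextChunker name and the 1,000-character size / 200-character overlap defaults are illustrative choices, not from the original); the overlap keeps sentences that straddle a boundary visible in both neighboring chunks:

```csharp
using System;
using System.Collections.Generic;

// Minimal sliding-window chunker. Overlap between consecutive chunks helps
// retrieval recall at the cost of some duplicated storage.
public static class TextChunker
{
    public static List<string> Chunk(string text, int chunkSize = 1000, int overlap = 200)
    {
        if (overlap >= chunkSize)
            throw new ArgumentException("Overlap must be smaller than chunk size.");

        var chunks = new List<string>();
        var step = chunkSize - overlap;

        for (var start = 0; start < text.Length; start += step)
        {
            var length = Math.Min(chunkSize, text.Length - start);
            chunks.Add(text.Substring(start, length));
            if (start + length >= text.Length) break; // last chunk reached
        }
        return chunks;
    }
}
```

Each resulting string becomes one DocumentChunk.Content, embedded individually with GenerateEmbeddingAsync.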

Step 5: Vector Search with Cosmos DB

Query Cosmos DB using vector similarity:

public class VectorSearchService(CosmosClient cosmosClient)
{
    private readonly Container _container =
        cosmosClient.GetContainer("ragdb", "documents");

    public async Task<List<DocumentChunk>> SearchAsync(
        float[] queryEmbedding, int topK = 5)
    {
        var queryDef = new QueryDefinition(
            "SELECT TOP @topK c.id, c.content, c.title, c.category, " +
            "VectorDistance(c.embedding, @embedding) AS score " +
            "FROM c ORDER BY VectorDistance(c.embedding, @embedding)")
            .WithParameter("@topK", topK)
            .WithParameter("@embedding", queryEmbedding);

        var results = new List<DocumentChunk>();
        using var feed = _container.GetItemQueryIterator<DocumentChunk>(queryDef);
        
        while (feed.HasMoreResults)
        {
            var response = await feed.ReadNextAsync();
            results.AddRange(response);
        }

        return results;
    }
}
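The VectorDistance ranking above runs server-side, which makes retrieval logic awkward to unit-test without a live container. When the container's embedding policy uses the cosine distance function, the underlying measure is ordinary cosine similarity (1.0 for identical direction, 0.0 for orthogonal vectors), which is easy to reproduce locally. A small helper for tests; VectorMath is illustrative, not part of the Cosmos SDK:

```csharp
using System;

// Cosine similarity between two embedding vectors. Useful for testing
// ranking logic locally without a live Cosmos DB container.
public static class VectorMath
{
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimensions.");

        double dot = 0, magA = 0, magB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            magA += a[i] * a[i];
            magB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
    }
}
```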

Step 6: Chat API Endpoint

Wire everything together in a streaming chat endpoint:

app.MapPost("/api/chat", async (
    ChatRequest request,
    EmbeddingService embeddingService,
    VectorSearchService searchService,
    Kernel kernel,
    HttpContext httpContext) =>
{
    // 1. Generate embedding for the user's query
    var queryEmbedding = await embeddingService
        .GenerateEmbeddingAsync(request.Message);

    // 2. Search for relevant document chunks
    var relevantDocs = await searchService
        .SearchAsync(queryEmbedding, topK: 5);

    // 3. Build context from retrieved documents
    var context = string.Join("\n\n",
        relevantDocs.Select(d => $"[{d.Title}]: {d.Content}"));

    // 4. Generate answer with context
    var prompt = $"""
        You are a helpful assistant. Answer the user's question based on
        the provided context. If the context doesn't contain relevant
        information, say so clearly.

        Context:
        {context}

        Question: {request.Message}
        """;

    // 5. Stream the grounded answer back to the client as it is generated
    httpContext.Response.ContentType = "text/plain; charset=utf-8";
    await foreach (var update in kernel.InvokePromptStreamingAsync(prompt))
    {
        await httpContext.Response.WriteAsync(update.ToString());
        await httpContext.Response.Body.FlushAsync();
    }
});

// Request payload bound from the JSON body
public record ChatRequest(string Message);
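Step 3 of the endpoint concatenates every retrieved chunk into the prompt unconditionally; with a larger topK or long chunks this can overflow the chat model's context window. A small guard that adds chunks in rank order until a character budget is spent, using the rough heuristic of about 4 characters per token for English text (the ContextBuilder name and the 8,000-character default are illustrative):

```csharp
using System.Collections.Generic;
using System.Text;

// Builds the grounding context from ranked chunks, stopping before a
// character budget is exceeded. Character count is a cheap proxy for
// tokens; swap in a real tokenizer for production budgets.
public static class ContextBuilder
{
    public static string Build(IEnumerable<(string Title, string Content)> chunks,
                               int maxChars = 8000)
    {
        var sb = new StringBuilder();
        foreach (var (title, content) in chunks)
        {
            var entry = $"[{title}]: {content}\n\n";
            if (sb.Length + entry.Length > maxChars) break; // budget spent
            sb.Append(entry);
        }
        return sb.ToString().TrimEnd();
    }
}
```

In the endpoint, relevantDocs.Select(d => (d.Title, d.Content)) feeds straight into Build in place of the string.Join call.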

If your RAG pipeline starts hitting rate limits under load, see Fix Azure OpenAI 429 Too Many Requests in .NET.

Next Steps

  • Add authentication with Azure AD
  • Implement conversation memory with Cosmos DB
  • Add document ingestion pipeline with Azure Functions
  • Deploy to Azure Container Apps
  • Add Application Insights telemetry

For comprehensive chat history management patterns, see Semantic Kernel Chat History Management — Sliding Windows, Summarization, and Token-Aware Truncation.

AI-Friendly Summary

Key Takeaways

  • Azure Cosmos DB supports native vector search — no separate vector store needed
  • Semantic Kernel manages the orchestration between retrieval and generation
  • Use text-embedding-3-small for cost-effective embeddings
  • Streaming responses improve perceived latency significantly
  • The DI-first approach makes the architecture testable and maintainable

Implementation Checklist

  • Create Azure OpenAI resource with chat and embedding deployments
  • Create Azure Cosmos DB account with vector search enabled
  • Scaffold .NET 10 Web API project
  • Install Semantic Kernel, Azure.AI.OpenAI, and Cosmos DB packages
  • Implement document ingestion with embedding generation
  • Configure Cosmos DB vector index
  • Build retrieval pipeline with Semantic Kernel
  • Implement streaming chat API endpoint
  • Add error handling and rate limit retries
  • Deploy to Azure Container Apps

Frequently Asked Questions

Why use Azure Cosmos DB for RAG instead of a dedicated vector database?

Azure Cosmos DB now supports native vector search alongside your operational data. This eliminates the need for a separate vector store, reduces architectural complexity, and ensures your embeddings stay in sync with your source data — all within a single globally-distributed database.

What embedding model should I use?

For most .NET RAG applications, use Azure OpenAI's text-embedding-3-small or text-embedding-3-large. The small model offers excellent quality/cost ratio for most use cases.

#RAG #Semantic Kernel #Azure Cosmos DB #Azure OpenAI #.NET 10 #Tutorial