
Build a RAG Chatbot with Semantic Kernel + Cosmos DB Vector Store (.NET 10)

Verified Apr 2026 · .NET 10 · Microsoft.SemanticKernel 1.54.0 · Azure.AI.OpenAI 2.2.0 · Microsoft.Azure.Cosmos 3.44.0
By Rajesh Mishra · Feb 14, 2026 · 25 min read
In 30 Seconds

This end-to-end tutorial builds a production-ready RAG chatbot using .NET 10, Semantic Kernel for orchestration, Azure OpenAI for embeddings and chat completion, and Azure Cosmos DB for vector storage. Covers project setup, embedding generation, vector indexing, retrieval pipeline, and chat API with streaming.

What You’ll Build

A production-ready RAG (Retrieval-Augmented Generation) chatbot that:

  • Ingests documents and generates vector embeddings
  • Stores embeddings alongside operational data in Azure Cosmos DB
  • Retrieves relevant context using vector similarity search
  • Generates accurate, grounded answers using your Azure OpenAI chat deployment
  • Streams responses to the frontend in real-time

Architecture Overview

User query → .NET 10 API → Semantic Kernel → Azure Cosmos DB vector search → retrieved context + query → Azure OpenAI chat deployment → streaming response → user
RAG chatbot architecture — query flows through Semantic Kernel to vector search, retrieves context, and generates a grounded streaming response.

Step 1: Project Setup

Create a new .NET 10 Web API project:

dotnet new webapi -n RagChatbot --framework net10.0
cd RagChatbot

Install required packages:

dotnet add package Microsoft.SemanticKernel --version 1.54.0
dotnet add package Azure.AI.OpenAI --version 2.2.0
dotnet add package Microsoft.Azure.Cosmos --version 3.44.0

Step 2: Configure Services

Set up dependency injection in Program.cs:

using Azure;
using Azure.AI.OpenAI;
using Microsoft.Azure.Cosmos;
using Microsoft.SemanticKernel;

var builder = WebApplication.CreateBuilder(args);

// Azure OpenAI
builder.Services.AddSingleton(sp =>
{
    // Azure.AI.OpenAI 2.x is built on System.ClientModel, so retries are
    // configured with a pipeline retry policy (exponential backoff by
    // default) rather than Azure.Core's options.Retry.
    var options = new AzureOpenAIClientOptions
    {
        RetryPolicy = new System.ClientModel.Primitives.ClientRetryPolicy(maxRetries: 5)
    };

    return new AzureOpenAIClient(
        new Uri(builder.Configuration["AzureOpenAI:Endpoint"]!),
        new AzureKeyCredential(builder.Configuration["AzureOpenAI:ApiKey"]!),
        options
    );
});

// Cosmos DB
builder.Services.AddSingleton(sp =>
{
    var cosmosClient = new CosmosClient(
        builder.Configuration["CosmosDB:ConnectionString"],
        new CosmosClientOptions
        {
            ApplicationName = "rag-chatbot",
            ConnectionMode = ConnectionMode.Direct,
            SerializerOptions = new CosmosSerializationOptions
            {
                PropertyNamingPolicy = CosmosPropertyNamingPolicy.CamelCase
            }
        }
    );
    return cosmosClient;
});

// Semantic Kernel
builder.Services.AddSingleton(sp =>
{
    var kernelBuilder = Kernel.CreateBuilder();
    kernelBuilder.AddAzureOpenAIChatCompletion(
        deploymentName: "chat-deployment",
        endpoint: builder.Configuration["AzureOpenAI:Endpoint"]!,
        apiKey: builder.Configuration["AzureOpenAI:ApiKey"]!
    );
    return kernelBuilder.Build();
});

// Application services consumed by the chat endpoint (defined in Steps 4 and 5)
builder.Services.AddSingleton<EmbeddingService>();
builder.Services.AddSingleton<VectorSearchService>();

Step 3: Document Model

Define the document structure with vector embedding support:

public class DocumentChunk
{
    public string Id { get; set; } = Guid.NewGuid().ToString();
    public string DocumentId { get; set; } = "";
    public string Content { get; set; } = "";
    public string Title { get; set; } = "";
    public float[] Embedding { get; set; } = [];
    public string Category { get; set; } = "";
    public DateTime CreatedAt { get; set; } = DateTime.UtcNow;
}
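Storing this model is not enough on its own: the VectorDistance queries used later need the container to declare a vector embedding policy and a vector index, otherwise every query degrades to a full scan. A configuration sketch, assuming Microsoft.Azure.Cosmos 3.44+, an existing ragdb database, and 1536 dimensions (the output size of text-embedding-3-small); the VectorContainerSetup name and the /documentId partition key are illustrative choices, not from the original:

```csharp
using System.Collections.ObjectModel;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class VectorContainerSetup
{
    // Creates the "documents" container with a vector embedding policy and
    // index so VectorDistance queries can use an approximate index instead
    // of scanning every item. Partitioning by /documentId keeps all chunks
    // of one document in the same logical partition.
    public static async Task EnsureContainerAsync(CosmosClient cosmosClient)
    {
        var database = cosmosClient.GetDatabase("ragdb");

        var embeddingPolicy = new VectorEmbeddingPolicy(new Collection<Embedding>
        {
            new Embedding
            {
                Path = "/embedding",
                DataType = VectorDataType.Float32,
                DistanceFunction = DistanceFunction.Cosine,
                Dimensions = 1536 // text-embedding-3-small output size
            }
        });

        var properties = new ContainerProperties("documents", "/documentId")
        {
            VectorEmbeddingPolicy = embeddingPolicy
        };
        properties.IndexingPolicy.VectorIndexes.Add(new VectorIndexPath
        {
            Path = "/embedding",
            Type = VectorIndexType.QuantizedFlat
        });

        await database.CreateContainerIfNotExistsAsync(properties);
    }
}
```

Run this once at startup, after registering the CosmosClient from Step 2.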

Step 4: Embedding Service

Generate embeddings using Azure OpenAI:

public class EmbeddingService(AzureOpenAIClient client)
{
    private readonly OpenAI.Embeddings.EmbeddingClient _embeddingClient
        = client.GetEmbeddingClient("text-embedding-3-small");

    public async Task<float[]> GenerateEmbeddingAsync(string text)
    {
        var result = await _embeddingClient.GenerateEmbeddingAsync(text);
        return result.Value.ToFloats().ToArray();
    }
}
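Before calling this service during ingestion, documents need to be split into pieces that fit the embedding model's input limits. A minimal sliding-window splitter (the TextChunker name and the 1,000-character size / 200-character overlap defaults are illustrative choices, not from the original); the overlap keeps sentences that straddle a boundary visible in both neighboring chunks:

```csharp
using System;
using System.Collections.Generic;

// Minimal sliding-window chunker. Overlap between consecutive chunks helps
// retrieval recall at the cost of some duplicated storage.
public static class TextChunker
{
    public static List<string> Chunk(string text, int chunkSize = 1000, int overlap = 200)
    {
        if (overlap >= chunkSize)
            throw new ArgumentException("Overlap must be smaller than chunk size.");

        var chunks = new List<string>();
        var step = chunkSize - overlap;

        for (var start = 0; start < text.Length; start += step)
        {
            var length = Math.Min(chunkSize, text.Length - start);
            chunks.Add(text.Substring(start, length));
            if (start + length >= text.Length) break; // last chunk reached
        }
        return chunks;
    }
}
```

Each resulting string becomes one DocumentChunk.Content, embedded individually with GenerateEmbeddingAsync.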

Step 5: Vector Search with Cosmos DB

Query Cosmos DB using vector similarity:

public class VectorSearchService(CosmosClient cosmosClient)
{
    private readonly Container _container =
        cosmosClient.GetContainer("ragdb", "documents");

    public async Task<List<DocumentChunk>> SearchAsync(
        float[] queryEmbedding, int topK = 5)
    {
        var queryDef = new QueryDefinition(
            "SELECT TOP @topK c.id, c.content, c.title, c.category, " +
            "VectorDistance(c.embedding, @embedding) AS score " +
            "FROM c ORDER BY VectorDistance(c.embedding, @embedding)")
            .WithParameter("@topK", topK)
            .WithParameter("@embedding", queryEmbedding);

        var results = new List<DocumentChunk>();
        using var feed = _container.GetItemQueryIterator<DocumentChunk>(queryDef);
        
        while (feed.HasMoreResults)
        {
            var response = await feed.ReadNextAsync();
            results.AddRange(response);
        }

        return results;
    }
}
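The VectorDistance ranking above runs server-side, which makes retrieval logic awkward to unit-test without a live container. When the container's embedding policy uses the cosine distance function, the underlying measure is ordinary cosine similarity (1.0 for identical direction, 0.0 for orthogonal vectors), which is easy to reproduce locally. A small helper for tests; VectorMath is illustrative, not part of the Cosmos SDK:

```csharp
using System;

// Cosine similarity between two embedding vectors. Useful for testing
// ranking logic locally without a live Cosmos DB container.
public static class VectorMath
{
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimensions.");

        double dot = 0, magA = 0, magB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            magA += a[i] * a[i];
            magB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
    }
}
```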

Step 6: Chat API Endpoint

Wire everything together in a streaming chat endpoint:

app.MapPost("/api/chat", async (
    ChatRequest request,
    EmbeddingService embeddingService,
    VectorSearchService searchService,
    Kernel kernel,
    HttpContext httpContext) =>
{
    // 1. Generate embedding for the user's query
    var queryEmbedding = await embeddingService
        .GenerateEmbeddingAsync(request.Message);

    // 2. Search for relevant document chunks
    var relevantDocs = await searchService
        .SearchAsync(queryEmbedding, topK: 5);

    // 3. Build context from retrieved documents
    var context = string.Join("\n\n",
        relevantDocs.Select(d => $"[{d.Title}]: {d.Content}"));

    // 4. Generate answer with context
    var prompt = $"""
        You are a helpful assistant. Answer the user's question based on
        the provided context. If the context doesn't contain relevant
        information, say so clearly.

        Context:
        {context}

        Question: {request.Message}
        """;

    // 5. Stream the grounded answer back to the client as it is generated
    httpContext.Response.ContentType = "text/plain; charset=utf-8";
    await foreach (var update in kernel.InvokePromptStreamingAsync(prompt))
    {
        await httpContext.Response.WriteAsync(update.ToString());
        await httpContext.Response.Body.FlushAsync();
    }
});

// Request payload bound from the JSON body
public record ChatRequest(string Message);
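Step 3 of the endpoint concatenates every retrieved chunk into the prompt unconditionally; with a larger topK or long chunks this can overflow the chat model's context window. A small guard that adds chunks in rank order until a character budget is spent, using the rough heuristic of about 4 characters per token for English text (the ContextBuilder name and the 8,000-character default are illustrative):

```csharp
using System.Collections.Generic;
using System.Text;

// Builds the grounding context from ranked chunks, stopping before a
// character budget is exceeded. Character count is a cheap proxy for
// tokens; swap in a real tokenizer for production budgets.
public static class ContextBuilder
{
    public static string Build(IEnumerable<(string Title, string Content)> chunks,
                               int maxChars = 8000)
    {
        var sb = new StringBuilder();
        foreach (var (title, content) in chunks)
        {
            var entry = $"[{title}]: {content}\n\n";
            if (sb.Length + entry.Length > maxChars) break; // budget spent
            sb.Append(entry);
        }
        return sb.ToString().TrimEnd();
    }
}
```

In the endpoint, relevantDocs.Select(d => (d.Title, d.Content)) feeds straight into Build in place of the string.Join call.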

If your RAG pipeline starts hitting rate limits under load, see Fix Azure OpenAI 429 Too Many Requests in .NET.

Next Steps

  • Add authentication with Azure AD
  • Implement conversation memory with Cosmos DB
  • Add document ingestion pipeline with Azure Functions
  • Deploy to Azure Container Apps
  • Add Application Insights telemetry

For comprehensive chat history management patterns, see Semantic Kernel Chat History Management — Sliding Windows, Summarization, and Token-Aware Truncation.

AI-Friendly Summary

Key Takeaways

  • Azure Cosmos DB supports native vector search — no separate vector store needed
  • Semantic Kernel manages the orchestration between retrieval and generation
  • Use text-embedding-3-small for cost-effective embeddings
  • Streaming responses improve perceived latency significantly
  • The DI-first approach makes the architecture testable and maintainable

Implementation Checklist

  • Create Azure OpenAI resource with chat and embedding deployments
  • Create Azure Cosmos DB account with vector search enabled
  • Scaffold .NET 10 Web API project
  • Install Semantic Kernel, Azure.AI.OpenAI, and Cosmos DB packages
  • Implement document ingestion with embedding generation
  • Configure Cosmos DB vector index
  • Build retrieval pipeline with Semantic Kernel
  • Implement streaming chat API endpoint
  • Add error handling and rate limit retries
  • Deploy to Azure Container Apps

Frequently Asked Questions

Why use Azure Cosmos DB for RAG instead of a dedicated vector database?

Azure Cosmos DB now supports native vector search alongside your operational data. This eliminates the need for a separate vector store, reduces architectural complexity, and ensures your embeddings stay in sync with your source data — all within a single globally-distributed database.

What embedding model should I use?

For most .NET RAG applications, use Azure OpenAI's text-embedding-3-small or text-embedding-3-large. The small model offers excellent quality/cost ratio for most use cases.

#RAG #Semantic Kernel #Azure Cosmos DB #Azure OpenAI #.NET 10 #Tutorial