Build a RAG Chatbot with .NET 8+, Semantic Kernel, and Azure Cosmos DB

Tested with: .NET 8 · Semantic Kernel 1.34.0 · Azure.AI.OpenAI 2.2.0 · Microsoft.Azure.Cosmos 3.44.0
By Rajesh Mishra · Feb 14, 2026 · Verified: Feb 18, 2026 · 25 min read

What You’ll Build

A production-ready RAG (Retrieval-Augmented Generation) chatbot that:

  • Ingests documents and generates vector embeddings
  • Stores embeddings alongside operational data in Azure Cosmos DB
  • Retrieves relevant context using vector similarity search
  • Generates accurate, grounded answers using Azure OpenAI GPT-4o
  • Streams responses to the frontend in real-time

Architecture Overview

User Query → .NET 8+ API → Semantic Kernel
                   ↓
                   Azure Cosmos DB (Vector Search)
                   ↓
                   Retrieved Context + Query
                   ↓
                   Azure OpenAI GPT-4o
                   ↓
                   Streaming Response → User

Step 1: Project Setup

Create a new .NET 8+ Web API project:

dotnet new webapi -n RagChatbot --framework net8.0
cd RagChatbot

Install required packages:

dotnet add package Microsoft.SemanticKernel --version 1.34.0
dotnet add package Azure.AI.OpenAI --version 2.2.0
dotnet add package Microsoft.Azure.Cosmos --version 3.44.0

Step 2: Configure Services

Set up dependency injection in Program.cs:

using Azure;
using Azure.AI.OpenAI;
using Microsoft.Azure.Cosmos;
using Microsoft.SemanticKernel;

var builder = WebApplication.CreateBuilder(args);

// Azure OpenAI
builder.Services.AddSingleton(sp =>
{
    var options = new AzureOpenAIClientOptions();
    options.Retry.MaxRetries = 5;
    options.Retry.Mode = Azure.Core.RetryMode.Exponential;

    return new AzureOpenAIClient(
        new Uri(builder.Configuration["AzureOpenAI:Endpoint"]!),
        new AzureKeyCredential(builder.Configuration["AzureOpenAI:ApiKey"]!),
        options
    );
});

// Cosmos DB
builder.Services.AddSingleton(sp =>
{
    var cosmosClient = new CosmosClient(
        builder.Configuration["CosmosDB:ConnectionString"],
        new CosmosClientOptions
        {
            ApplicationName = "rag-chatbot",
            ConnectionMode = ConnectionMode.Direct,
            SerializerOptions = new CosmosSerializationOptions
            {
                PropertyNamingPolicy = CosmosPropertyNamingPolicy.CamelCase
            }
        }
    );
    return cosmosClient;
});

// Semantic Kernel
builder.Services.AddSingleton(sp =>
{
    var kernelBuilder = Kernel.CreateBuilder();
    kernelBuilder.AddAzureOpenAIChatCompletion(
        deploymentName: "gpt-4o",
        endpoint: builder.Configuration["AzureOpenAI:Endpoint"]!,
        apiKey: builder.Configuration["AzureOpenAI:ApiKey"]!
    );
    return kernelBuilder.Build();
});

// Application services consumed by the chat endpoint
builder.Services.AddSingleton<EmbeddingService>();
builder.Services.AddSingleton<VectorSearchService>();

var app = builder.Build();
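For reference, the configuration keys read above correspond to an appsettings.json (or user-secrets store) shaped like this — the endpoint, key, and account values are placeholders:

```json
{
  "AzureOpenAI": {
    "Endpoint": "https://your-resource.openai.azure.com/",
    "ApiKey": "<your-azure-openai-api-key>"
  },
  "CosmosDB": {
    "ConnectionString": "AccountEndpoint=https://your-account.documents.azure.com:443/;AccountKey=<your-key>;"
  }
}
```

In production, prefer managed identity or Azure Key Vault over raw keys in configuration files.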

Step 3: Document Model

Define the document structure with vector embedding support:

public class DocumentChunk
{
    public string Id { get; set; } = Guid.NewGuid().ToString();
    public string DocumentId { get; set; } = "";
    public string Content { get; set; } = "";
    public string Title { get; set; } = "";
    public float[] Embedding { get; set; } = [];
    public string Category { get; set; } = "";
    public DateTime CreatedAt { get; set; } = DateTime.UtcNow;
}
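The ingestion pipeline (expanded on in Next Steps) has to split source documents into pieces sized for embedding before they become DocumentChunk records. A minimal word-based chunker with overlap — the sizes here are illustrative defaults, not tuned values:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TextChunker
{
    // Splits text into chunks of roughly chunkSize words, overlapping by
    // `overlap` words so content at a boundary appears in both neighbors.
    public static List<string> Chunk(string text, int chunkSize = 200, int overlap = 40)
    {
        var words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        var chunks = new List<string>();
        for (int start = 0; start < words.Length; start += chunkSize - overlap)
        {
            chunks.Add(string.Join(' ', words.Skip(start).Take(chunkSize)));
            if (start + chunkSize >= words.Length) break;
        }
        return chunks;
    }
}
```

Each returned chunk maps to one DocumentChunk, embedded individually in Step 4.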

Step 4: Embedding Service

Generate embeddings using Azure OpenAI:

public class EmbeddingService(AzureOpenAIClient client)
{
    // GetEmbeddingClient takes your Azure OpenAI *deployment* name; it matches
    // the model name here only if you named the deployment after the model.
    private readonly OpenAI.Embeddings.EmbeddingClient _embeddingClient
        = client.GetEmbeddingClient("text-embedding-3-small");

    public async Task<float[]> GenerateEmbeddingAsync(string text)
    {
        var result = await _embeddingClient.GenerateEmbeddingAsync(text);
        return result.Value.ToFloats().ToArray();
    }
}
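VectorDistance in the next step ranks results by distance; with a cosine distance function (the common choice for these embeddings, and an assumption about your container's policy here), distance is 1 minus cosine similarity. A small helper for local sanity checks — for example, confirming that two related texts embed close together:

```csharp
using System;

public static class VectorMath
{
    // Cosine similarity: 1.0 means same direction, 0 means orthogonal.
    // Cosine *distance*, as used for ranking, is 1 - similarity.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```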

Step 5: Vector Search with Cosmos DB

Query Cosmos DB using vector similarity:

public class VectorSearchService(CosmosClient cosmosClient)
{
    private readonly Container _container =
        cosmosClient.GetContainer("ragdb", "documents");

    public async Task<List<DocumentChunk>> SearchAsync(
        float[] queryEmbedding, int topK = 5)
    {
        var queryDef = new QueryDefinition(
            "SELECT TOP @topK c.id, c.content, c.title, c.category, " +
            "VectorDistance(c.embedding, @embedding) AS score " +
            "FROM c ORDER BY VectorDistance(c.embedding, @embedding)")
            .WithParameter("@topK", topK)
            .WithParameter("@embedding", queryEmbedding);

        var results = new List<DocumentChunk>();
        using var feed = _container.GetItemQueryIterator<DocumentChunk>(queryDef);
        
        while (feed.HasMoreResults)
        {
            var response = await feed.ReadNextAsync();
            results.AddRange(response);
        }

        return results;
    }
}
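The query above only works if the documents container was created with a vector embedding policy and a vector index on /embedding. A one-time setup sketch using the Cosmos SDK's vector support — verify the type names against your SDK version; /documentId as partition key and 1536 dimensions (text-embedding-3-small's output size) are assumptions carried over from this tutorial's model:

```csharp
using System.Collections.ObjectModel;
using Microsoft.Azure.Cosmos;

var database = cosmosClient.GetDatabase("ragdb");

var containerProperties = new ContainerProperties(
    id: "documents", partitionKeyPath: "/documentId")
{
    // Declare which path holds vectors, their element type, size, and metric
    VectorEmbeddingPolicy = new VectorEmbeddingPolicy(new Collection<Embedding>
    {
        new Embedding
        {
            Path = "/embedding",
            DataType = VectorDataType.Float32,
            DistanceFunction = DistanceFunction.Cosine,
            Dimensions = 1536   // text-embedding-3-small output size
        }
    })
};

// DiskANN scales to larger collections; a flat index suits small ones
containerProperties.IndexingPolicy.VectorIndexes.Add(new VectorIndexPath
{
    Path = "/embedding",
    Type = VectorIndexType.DiskANN
});

await database.CreateContainerIfNotExistsAsync(containerProperties);
```

Vector search must also be enabled as a capability on the Cosmos DB account itself, as noted in the implementation checklist.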

Step 6: Chat API Endpoint

Wire everything together in a streaming chat endpoint:

app.MapPost("/api/chat", async (
    ChatRequest request,
    EmbeddingService embeddingService,
    VectorSearchService searchService,
    Kernel kernel,
    HttpContext httpContext) =>
{
    // 1. Generate embedding for the user's query
    var queryEmbedding = await embeddingService
        .GenerateEmbeddingAsync(request.Message);

    // 2. Search for relevant document chunks
    var relevantDocs = await searchService
        .SearchAsync(queryEmbedding, topK: 5);

    // 3. Build context from retrieved documents
    var context = string.Join("\n\n",
        relevantDocs.Select(d => $"[{d.Title}]: {d.Content}"));

    // 4. Generate answer with context
    var prompt = $"""
        You are a helpful assistant. Answer the user's question based on
        the provided context. If the context doesn't contain relevant
        information, say so clearly.

        Context:
        {context}

        Question: {request.Message}
        """;

    // 5. Stream the grounded answer back to the client as it is generated
    httpContext.Response.ContentType = "text/plain; charset=utf-8";
    await foreach (var chunk in kernel.InvokePromptStreamingAsync(prompt))
    {
        await httpContext.Response.WriteAsync(chunk.ToString(), httpContext.RequestAborted);
        await httpContext.Response.Body.FlushAsync(httpContext.RequestAborted);
    }
});

app.Run();

public record ChatRequest(string Message);

Next Steps

  • Add authentication with Microsoft Entra ID (formerly Azure AD)
  • Implement conversation memory with Cosmos DB
  • Add document ingestion pipeline with Azure Functions
  • Deploy to Azure Container Apps
  • Add Application Insights telemetry

Summary

This end-to-end tutorial builds a production-ready RAG chatbot using .NET 8 LTS (or later), Semantic Kernel for orchestration, Azure OpenAI for embeddings and chat completion, and Azure Cosmos DB for vector storage. Covers project setup, embedding generation, vector indexing, retrieval pipeline, and chat API with streaming.

Key Takeaways

  • Azure Cosmos DB supports native vector search — no separate vector store needed
  • Semantic Kernel manages the orchestration between retrieval and generation
  • Use text-embedding-3-small for cost-effective embeddings
  • Streaming responses improve perceived latency significantly
  • The DI-first approach makes the architecture testable and maintainable

Implementation Checklist

  • Create Azure OpenAI resource with GPT-4o and embedding deployments
  • Create Azure Cosmos DB account with vector search enabled
  • Scaffold .NET 8 Web API project
  • Install Semantic Kernel, Azure.AI.OpenAI, and Cosmos DB packages
  • Implement document ingestion with embedding generation
  • Configure Cosmos DB vector index
  • Build retrieval pipeline with Semantic Kernel
  • Implement streaming chat API endpoint
  • Add error handling and rate limit retries
  • Deploy to Azure Container Apps

Frequently Asked Questions

Why use Azure Cosmos DB for RAG instead of a dedicated vector database?

Azure Cosmos DB now supports native vector search alongside your operational data. This eliminates the need for a separate vector store, reduces architectural complexity, and ensures your embeddings stay in sync with your source data — all within a single globally distributed database.

What embedding model should I use?

For most .NET RAG applications, use Azure OpenAI's text-embedding-3-small or text-embedding-3-large. The small model offers excellent quality/cost ratio for most use cases.


#RAG #Semantic Kernel #Azure Cosmos DB #Azure OpenAI #.NET 8+ #Tutorial