What You’ll Build
A production-ready RAG (Retrieval-Augmented Generation) chatbot that:
- Ingests documents and generates vector embeddings
- Stores embeddings alongside operational data in Azure Cosmos DB
- Retrieves relevant context using vector similarity search
- Generates accurate, grounded answers using Azure OpenAI GPT-4o
- Streams responses to the frontend in real-time
Architecture Overview
User Query → .NET 8+ API → Semantic Kernel
↓
Azure Cosmos DB (Vector Search)
↓
Retrieved Context + Query
↓
Azure OpenAI GPT-4o
↓
Streaming Response → User
Step 1: Project Setup
Create a new .NET 8+ Web API project:
dotnet new webapi -n RagChatbot --framework net8.0
cd RagChatbot
Install required packages:
dotnet add package Microsoft.SemanticKernel --version 1.34.0
dotnet add package Azure.AI.OpenAI --version 2.2.0
dotnet add package Microsoft.Azure.Cosmos --version 3.44.0
Step 2: Configure Services
Set up dependency injection in Program.cs:
using Azure;
using Azure.AI.OpenAI;
using Microsoft.Azure.Cosmos;
using Microsoft.SemanticKernel;
using System.ClientModel.Primitives;

var builder = WebApplication.CreateBuilder(args);

// Azure OpenAI
builder.Services.AddSingleton(sp =>
{
    // Azure.AI.OpenAI 2.x is built on System.ClientModel, so retries are
    // configured via RetryPolicy (exponential backoff by default)
    var options = new AzureOpenAIClientOptions
    {
        RetryPolicy = new ClientRetryPolicy(maxRetries: 5)
    };
    return new AzureOpenAIClient(
        new Uri(builder.Configuration["AzureOpenAI:Endpoint"]!),
        new AzureKeyCredential(builder.Configuration["AzureOpenAI:ApiKey"]!),
        options);
});

// Cosmos DB
builder.Services.AddSingleton(sp =>
{
    return new CosmosClient(
        builder.Configuration["CosmosDB:ConnectionString"],
        new CosmosClientOptions
        {
            ApplicationName = "rag-chatbot",
            ConnectionMode = ConnectionMode.Direct,
            SerializerOptions = new CosmosSerializationOptions
            {
                PropertyNamingPolicy = CosmosPropertyNamingPolicy.CamelCase
            }
        });
});

// Semantic Kernel
builder.Services.AddSingleton(sp =>
{
    var kernelBuilder = Kernel.CreateBuilder();
    kernelBuilder.AddAzureOpenAIChatCompletion(
        deploymentName: "gpt-4o",
        endpoint: builder.Configuration["AzureOpenAI:Endpoint"]!,
        apiKey: builder.Configuration["AzureOpenAI:ApiKey"]!);
    return kernelBuilder.Build();
});

// Application services consumed by the chat endpoint (defined in Steps 4 and 5)
builder.Services.AddSingleton<EmbeddingService>();
builder.Services.AddSingleton<VectorSearchService>();
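The registrations above read settings that this guide doesn't otherwise show. A minimal appsettings.json sketch with placeholder values — the section and key names must match the Configuration lookups in Program.cs:

```json
{
  "AzureOpenAI": {
    "Endpoint": "https://<your-resource>.openai.azure.com/",
    "ApiKey": "<your-api-key>"
  },
  "CosmosDB": {
    "ConnectionString": "AccountEndpoint=https://<your-account>.documents.azure.com:443/;AccountKey=<your-key>;"
  }
}
```

For anything beyond local development, prefer managed identity or Azure Key Vault over keys stored in configuration files.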
Step 3: Document Model
Define the document structure with vector embedding support:
public class DocumentChunk
{
    public string Id { get; set; } = Guid.NewGuid().ToString();
    public string DocumentId { get; set; } = "";
    public string Content { get; set; } = "";
    public string Title { get; set; } = "";
    public float[] Embedding { get; set; } = [];
    public string Category { get; set; } = "";
    public DateTime CreatedAt { get; set; } = DateTime.UtcNow;
}
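Because Step 2 configures the Cosmos serializer with a CamelCase naming policy, chunks are stored with lowercase-first property names — which is why the query in Step 5 references c.embedding rather than c.Embedding. A stored item looks roughly like this (embedding truncated for readability; text-embedding-3-small produces 1536 dimensions):

```json
{
  "id": "3f2c9b6e-1a2b-4c5d-8e9f-0a1b2c3d4e5f",
  "documentId": "handbook-01",
  "content": "Employees accrue 20 days of paid leave per year...",
  "title": "Leave Policy",
  "embedding": [0.0123, -0.0456, 0.0789],
  "category": "hr",
  "createdAt": "2025-01-15T09:30:00Z"
}
```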
Step 4: Embedding Service
Generate embeddings using Azure OpenAI:
public class EmbeddingService(AzureOpenAIClient client)
{
    private readonly OpenAI.Embeddings.EmbeddingClient _embeddingClient
        = client.GetEmbeddingClient("text-embedding-3-small");

    public async Task<float[]> GenerateEmbeddingAsync(string text)
    {
        var result = await _embeddingClient.GenerateEmbeddingAsync(text);
        return result.Value.ToFloats().ToArray();
    }
}
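This guide assumes documents have already been split into chunks. As a sketch of what ingestion could look like, the service below pairs EmbeddingService with the container from Step 5 using naive fixed-size splitting — the DocumentIngestionService name, the chunk size, and the /documentId partition key are illustrative assumptions, and production pipelines usually split on paragraph or sentence boundaries with overlap:

```csharp
public class DocumentIngestionService(
    EmbeddingService embeddingService,
    CosmosClient cosmosClient)
{
    private readonly Container _container =
        cosmosClient.GetContainer("ragdb", "documents");

    public async Task IngestAsync(
        string documentId, string title, string category, string text,
        int maxChunkChars = 2000)
    {
        // Naive fixed-size chunking; real pipelines split on semantic
        // boundaries and overlap adjacent chunks to preserve context.
        for (var offset = 0; offset < text.Length; offset += maxChunkChars)
        {
            var content = text.Substring(
                offset, Math.Min(maxChunkChars, text.Length - offset));

            var chunk = new DocumentChunk
            {
                DocumentId = documentId,
                Title = title,
                Category = category,
                Content = content,
                Embedding = await embeddingService.GenerateEmbeddingAsync(content)
            };

            // Assumes the container is partitioned on /documentId
            await _container.UpsertItemAsync(chunk, new PartitionKey(chunk.DocumentId));
        }
    }
}
```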
Step 5: Vector Search with Cosmos DB
Query Cosmos DB using vector similarity:
public class VectorSearchService(CosmosClient cosmosClient)
{
    private readonly Container _container =
        cosmosClient.GetContainer("ragdb", "documents");

    public async Task<List<DocumentChunk>> SearchAsync(
        float[] queryEmbedding, int topK = 5)
    {
        var queryDef = new QueryDefinition(
            "SELECT TOP @topK c.id, c.content, c.title, c.category, " +
            "VectorDistance(c.embedding, @embedding) AS score " +
            "FROM c ORDER BY VectorDistance(c.embedding, @embedding)")
            .WithParameter("@topK", topK)
            .WithParameter("@embedding", queryEmbedding);

        var results = new List<DocumentChunk>();
        using var feed = _container.GetItemQueryIterator<DocumentChunk>(queryDef);
        while (feed.HasMoreResults)
        {
            var response = await feed.ReadNextAsync();
            results.AddRange(response);
        }
        return results;
    }
}
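VectorDistance only works against a container created with a vector embedding policy, ideally with a vector index on the embedding path. A one-time setup sketch, assuming the Cosmos DB account has the vector search feature enabled and an SDK version that exposes the vector policy types (the type names below follow the Microsoft.Azure.Cosmos API — verify them against your SDK version):

```csharp
using System.Collections.ObjectModel;
using Microsoft.Azure.Cosmos;

public static class VectorContainerSetup
{
    public static async Task CreateAsync(CosmosClient client)
    {
        var database = (await client
            .CreateDatabaseIfNotExistsAsync("ragdb")).Database;

        var properties = new ContainerProperties("documents", "/documentId")
        {
            // Declare the embedding path so VectorDistance can use it;
            // 1536 dimensions matches text-embedding-3-small
            VectorEmbeddingPolicy = new VectorEmbeddingPolicy(
                new Collection<Embedding>
                {
                    new Embedding
                    {
                        Path = "/embedding",
                        DataType = VectorDataType.Float32,
                        DistanceFunction = DistanceFunction.Cosine,
                        Dimensions = 1536
                    }
                })
        };

        // A vector index keeps similarity queries from scanning every item
        properties.IndexingPolicy.VectorIndexes.Add(new VectorIndexPath
        {
            Path = "/embedding",
            Type = VectorIndexType.QuantizedFlat
        });

        await database.CreateContainerIfNotExistsAsync(properties);
    }
}
```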
Step 6: Chat API Endpoint
Define the request payload, then wire everything together in a streaming chat endpoint. Rather than waiting for the full completion, InvokePromptStreamingAsync yields content chunks as the model produces them, and each chunk is flushed to the client immediately:

app.MapPost("/api/chat", async (
    ChatRequest request,
    EmbeddingService embeddingService,
    VectorSearchService searchService,
    Kernel kernel,
    HttpContext httpContext) =>
{
    // 1. Generate embedding for the user's query
    var queryEmbedding = await embeddingService
        .GenerateEmbeddingAsync(request.Message);

    // 2. Search for relevant document chunks
    var relevantDocs = await searchService
        .SearchAsync(queryEmbedding, topK: 5);

    // 3. Build context from retrieved documents
    var context = string.Join("\n\n",
        relevantDocs.Select(d => $"[{d.Title}]: {d.Content}"));

    // 4. Generate the answer with context, streaming tokens as they arrive
    var prompt = $"""
        You are a helpful assistant. Answer the user's question based on
        the provided context. If the context doesn't contain relevant
        information, say so clearly.

        Context:
        {context}

        Question: {request.Message}
        """;

    httpContext.Response.ContentType = "text/plain; charset=utf-8";
    await foreach (var chunk in kernel.InvokePromptStreamingAsync(prompt))
    {
        await httpContext.Response.WriteAsync(chunk.ToString());
        await httpContext.Response.Body.FlushAsync();
    }
});

// The request payload; in Program.cs, type declarations go after the top-level statements
public record ChatRequest(string Message);
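A streamed response is only useful if the client reads it incrementally. A minimal C# console client sketch — the URL and port are placeholders for wherever your API listens; HttpCompletionOption.ResponseHeadersRead is what stops HttpClient from buffering the whole body before returning:

```csharp
using var http = new HttpClient();

using var request = new HttpRequestMessage(
    HttpMethod.Post, "http://localhost:5000/api/chat")
{
    Content = new StringContent(
        """{"message": "What is our leave policy?"}""",
        System.Text.Encoding.UTF8, "application/json")
};

// ResponseHeadersRead returns as soon as headers arrive,
// so the body can be read token-by-token as it streams
using var response = await http.SendAsync(
    request, HttpCompletionOption.ResponseHeadersRead);
response.EnsureSuccessStatusCode();

using var stream = await response.Content.ReadAsStreamAsync();
using var reader = new StreamReader(stream);

var buffer = new char[256];
int read;
while ((read = await reader.ReadAsync(buffer, 0, buffer.Length)) > 0)
{
    Console.Write(new string(buffer, 0, read));
}
```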
Next Steps
- Add authentication with Microsoft Entra ID (formerly Azure AD)
- Implement conversation memory with Cosmos DB
- Add document ingestion pipeline with Azure Functions
- Deploy to Azure Container Apps
- Add Application Insights telemetry