
Semantic Kernel Memory + Vector Stores: Azure AI Search & Cosmos DB in C#

Verified Apr 2026 · Intermediate · .NET 10 · Microsoft.SemanticKernel 1.54.0 · Microsoft.SemanticKernel.Connectors.AzureAISearch 1.54.0 · Microsoft.SemanticKernel.Connectors.AzureCosmosDBNoSQL 1.54.0
By Rajesh Mishra · Mar 12, 2026 · 11 min read
In 30 Seconds

A guide to the Semantic Kernel memory system in C#. Covers the three memory levels (working, short-term, long-term), vector store connectors for Azure AI Search and Cosmos DB, embedding generation, and RAG implementation patterns.

How Semantic Kernel Thinks About Memory

Memory in Semantic Kernel operates at three distinct levels. Understanding when to use each saves you from over-engineering simple scenarios or under-engineering complex ones.

  • Working memory — ChatHistory, scoped to the current session
  • Short-term memory — session persistence in Redis / Cosmos DB (serialize on session end, reload on session start)
  • Long-term memory — vector store in Azure AI Search / Cosmos DB, queried via semantic search
Three SK memory levels — most applications need working memory + a vector store. Add session persistence only when multi-session continuity is a real user requirement.

Working Memory: ChatHistory

The simplest form of memory. Every message, tool call, and result in the current conversation lives in a ChatHistory object that gets sent to the LLM on every turn.

var history = new ChatHistory();
history.AddSystemMessage("You are a .NET expert assistant.");
history.AddUserMessage("How do I configure dependency injection?");

// The LLM sees the full history on every call
var response = await chatService.GetChatMessageContentAsync(history, settings, kernel);
history.Add(response); // Response is added for future turns

Working memory is automatic and ephemeral — it lives as long as the ChatHistory instance. When the user closes the tab, it’s gone.

Limitation: Context windows have token limits. A long-context model call with a 128K window costs more as history grows. For long conversations, you need a strategy to manage history length.
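One simple strategy is to trim the oldest turns while always keeping the system message. A minimal sketch of the idea, using a plain list of (role, content) pairs to stand in for ChatHistory so the logic is easy to see — the same loop translates directly to ChatHistory messages:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class HistoryTrimmer
{
    // Keep the first (system) message plus the most recent maxTurns messages.
    public static List<(string Role, string Content)> Trim(
        List<(string Role, string Content)> history, int maxTurns)
    {
        if (history.Count <= maxTurns + 1) return history;

        var trimmed = new List<(string, string)> { history[0] }; // system message
        trimmed.AddRange(history.Skip(history.Count - maxTurns)); // newest turns
        return trimmed;
    }
}
```

Trimming loses older context; for higher fidelity you can instead summarize dropped turns into a single synthetic message before discarding them.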

Short-Term Memory: Session Persistence

Short-term memory bridges sessions. The user asks a question Monday, comes back Wednesday, and picks up where they left off. You implement this by serializing ChatHistory to a session store.

using System.Text.Json;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

public class SessionMemory
{
    private readonly IDistributedCache _cache;

    public SessionMemory(IDistributedCache cache) => _cache = cache;

    // DTO for serialization — AuthorRole round-trips via its string label
    private sealed record StoredMessage(string Role, string? Content, string? AuthorName);

    public async Task SaveAsync(string sessionId, ChatHistory history)
    {
        var messages = history.Select(m =>
            new StoredMessage(m.Role.Label, m.Content, m.AuthorName));

        var json = JsonSerializer.Serialize(messages);
        await _cache.SetStringAsync(
            $"chat:{sessionId}",
            json,
            new DistributedCacheEntryOptions
            {
                SlidingExpiration = TimeSpan.FromDays(7)
            });
    }

    public async Task<ChatHistory> LoadAsync(string sessionId)
    {
        var json = await _cache.GetStringAsync($"chat:{sessionId}");
        if (json is null) return new ChatHistory();

        var history = new ChatHistory();
        var messages = JsonSerializer.Deserialize<List<StoredMessage>>(json);
        // Reconstruct history from stored messages
        foreach (var msg in messages ?? [])
        {
            history.Add(new ChatMessageContent(new AuthorRole(msg.Role), msg.Content)
            {
                AuthorName = msg.AuthorName
            });
        }
        return history;
    }
}

Use Redis, Azure Cosmos DB, or SQL Server as your distributed cache depending on your existing infrastructure.
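Wiring this up is mostly DI configuration. A sketch of a Redis-backed registration, assuming the Microsoft.Extensions.Caching.StackExchangeRedis package and a hypothetical REDIS_CONNECTION environment variable:

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// IDistributedCache backed by Redis; SessionMemory resolves it via DI
services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = Environment.GetEnvironmentVariable("REDIS_CONNECTION");
    options.InstanceName = "sk-sessions:"; // key prefix, keeps session keys grouped
});
services.AddSingleton<SessionMemory>();
```

Swapping Redis for Cosmos DB or SQL Server only changes this registration — SessionMemory itself depends solely on IDistributedCache.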

Long-Term Memory: Vector Stores

Long-term memory stores knowledge that persists indefinitely and is retrieved by semantic similarity. This is the foundation of RAG (Retrieval-Augmented Generation) in Semantic Kernel.

Ingestion: Documents → Chunk (split into segments) → Embed (text-embedding-3-small) → Vector Store (Azure AI Search / Cosmos DB). Query: User Query → Semantic Search (similarity lookup against stored vectors) → Top-K results → Context injected into prompt.
Vector store ingestion (top path) embeds document chunks into the store. At query time (bottom path) the user query is embedded and matched against stored chunks to retrieve relevant context.

Setting Up a Vector Store

Step 1: Define Your Data Model

Semantic Kernel uses attributed classes to define how data maps to vector store records:

using Microsoft.Extensions.VectorData;

public class DocumentChunk
{
    [VectorStoreRecordKey]
    public string Id { get; set; } = string.Empty;

    [VectorStoreRecordData]
    public string Text { get; set; } = string.Empty;

    [VectorStoreRecordData]
    public string Source { get; set; } = string.Empty;

    [VectorStoreRecordData]
    public string Category { get; set; } = string.Empty;

    [VectorStoreRecordData(IsFilterable = true)]
    public DateTime IndexedAt { get; set; }

    [VectorStoreRecordVector(1536)] // Matches text-embedding-3-small dimensions
    public ReadOnlyMemory<float> Embedding { get; set; }
}

The VectorStoreRecordVector(1536) attribute tells the connector the expected embedding dimensions. This must match your embedding model.
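A dimension mismatch often surfaces only as an indexing error or silently bad search results, so it is worth asserting early. A small hypothetical guard (EmbeddingGuard is not part of SK — it just checks against the 1536 declared above):

```csharp
using System;

static class EmbeddingGuard
{
    public const int ExpectedDimensions = 1536; // must match [VectorStoreRecordVector(1536)]

    public static ReadOnlyMemory<float> Validate(ReadOnlyMemory<float> embedding)
    {
        if (embedding.Length != ExpectedDimensions)
            throw new InvalidOperationException(
                $"Embedding has {embedding.Length} dimensions; the index expects {ExpectedDimensions}. " +
                "Check that the same embedding model is used for ingestion and query.");
        return embedding;
    }
}
```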

Step 2: Configure the Embedding Service

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Embeddings; // ITextEmbeddingGenerationService

var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAITextEmbeddingGeneration(
        deploymentName: "text-embedding-3-small",
        endpoint: Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!,
        apiKey: Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!)
    .Build();

var embeddingService = kernel.GetRequiredService<ITextEmbeddingGenerationService>();

Step 3: Choose a Vector Store Connector

Azure AI Search

Best for production search scenarios. Supports hybrid search (vector + keyword), semantic reranking, and filtering.

dotnet add package Microsoft.SemanticKernel.Connectors.AzureAISearch
using Azure;
using Azure.Search.Documents.Indexes;
using Microsoft.SemanticKernel.Connectors.AzureAISearch;

var searchClient = new SearchIndexClient(
    new Uri(Environment.GetEnvironmentVariable("AZURE_SEARCH_ENDPOINT")!),
    new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_SEARCH_KEY")!));

var vectorStore = new AzureAISearchVectorStore(searchClient);
var collection = vectorStore.GetCollection<string, DocumentChunk>("knowledge-base");

// Create the index if it doesn't exist
await collection.CreateCollectionIfNotExistsAsync();

Azure Cosmos DB

Best when your application data already lives in Cosmos DB. Use the same database for both transactional data and vector search.

dotnet add package Microsoft.SemanticKernel.Connectors.AzureCosmosDBNoSQL
using Microsoft.Azure.Cosmos;
using Microsoft.SemanticKernel.Connectors.AzureCosmosDBNoSQL;

var cosmosClient = new CosmosClient(
    Environment.GetEnvironmentVariable("COSMOS_ENDPOINT")!,
    Environment.GetEnvironmentVariable("COSMOS_KEY")!);

var vectorStore = new AzureCosmosDBNoSQLVectorStore(
    cosmosClient.GetDatabase("ai-memory"));
var collection = vectorStore.GetCollection<string, DocumentChunk>("documents");

In-Memory (Development)

No external services required — everything runs in process. Useful for testing and prototyping.

dotnet add package Microsoft.SemanticKernel.Connectors.InMemory
using Microsoft.SemanticKernel.Connectors.InMemory;

var vectorStore = new InMemoryVectorStore();
var collection = vectorStore.GetCollection<string, DocumentChunk>("test-collection");
await collection.CreateCollectionIfNotExistsAsync();
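A quick smoke test of the abstraction — a sketch assuming the DocumentChunk model above and the `collection` and `embeddingService` instances from the earlier steps (error handling omitted):

```csharp
// Upsert one record, then read it back by key. The IVectorStoreRecordCollection
// API is the same across connectors, so this code also runs unchanged against
// the Azure AI Search and Cosmos DB collections.
var text = "Semantic Kernel is a .NET AI orchestration library.";

var chunk = new DocumentChunk
{
    Id = "doc1_0",
    Text = text,
    Source = "doc1",
    Category = "docs",
    IndexedAt = DateTime.UtcNow,
    Embedding = await embeddingService.GenerateEmbeddingAsync(text)
};

await collection.UpsertAsync(chunk);
var fetched = await collection.GetAsync("doc1_0");
Console.WriteLine(fetched?.Text);
```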

Ingesting Documents

The ingestion pipeline: chunk text → generate embeddings → store in vector database.

public class DocumentIngestionService
{
    private readonly IVectorStoreRecordCollection<string, DocumentChunk> _collection;
    private readonly ITextEmbeddingGenerationService _embeddingService;

    public DocumentIngestionService(
        IVectorStoreRecordCollection<string, DocumentChunk> collection,
        ITextEmbeddingGenerationService embeddingService)
    {
        _collection = collection;
        _embeddingService = embeddingService;
    }

    public async Task IngestAsync(string documentText, string source, string category)
    {
        var chunks = ChunkText(documentText, maxTokens: 400, overlap: 50);

        foreach (var (text, index) in chunks.Select((t, i) => (t, i)))
        {
            var embedding = await _embeddingService.GenerateEmbeddingAsync(text);

            var chunk = new DocumentChunk
            {
                Id = $"{source}_{index}",
                Text = text,
                Source = source,
                Category = category,
                IndexedAt = DateTime.UtcNow,
                Embedding = embedding
            };

            await _collection.UpsertAsync(chunk);
        }
    }

    private static List<string> ChunkText(string text, int maxTokens, int overlap)
    {
        // Simple sentence-based chunking
        var sentences = text.Split(new[] { ". ", ".\n" }, StringSplitOptions.RemoveEmptyEntries);
        var chunks = new List<string>();
        var currentChunk = new List<string>();
        var currentLength = 0;

        foreach (var sentence in sentences)
        {
            var sentenceTokens = sentence.Split(' ').Length; // Rough estimate

            if (currentLength + sentenceTokens > maxTokens && currentChunk.Count > 0)
            {
                chunks.Add(string.Join(". ", currentChunk) + ".");

                // Carry trailing sentences forward until the overlap token budget is spent
                var overlapSentences = new List<string>();
                var overlapLength = 0;
                foreach (var s in Enumerable.Reverse(currentChunk))
                {
                    overlapLength += s.Split(' ').Length;
                    if (overlapLength > overlap) break;
                    overlapSentences.Insert(0, s);
                }
                currentChunk = overlapSentences;
                currentLength = overlapSentences.Sum(s => s.Split(' ').Length);
            }

            currentChunk.Add(sentence.TrimEnd('.'));
            currentLength += sentenceTokens;
        }

        if (currentChunk.Count > 0)
            chunks.Add(string.Join(". ", currentChunk) + ".");

        return chunks;
    }
}
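The whitespace split above undercounts tokens, especially for code-heavy text. A common rough heuristic for English is ~4 characters per token; a sketch of a slightly better estimator (still an approximation — use a real tokenizer library such as SharpToken when accuracy matters):

```csharp
using System;

static class TokenEstimator
{
    // Rough heuristic: English text averages ~4 characters per token.
    public static int Estimate(string text) =>
        Math.Max(1, (int)Math.Ceiling(text.Length / 4.0));
}
```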

Searching Memory (The “R” in RAG)

Retrieve relevant context before the LLM call. For a complete production implementation of this pattern with Azure AI Search, see Build a Semantic Search API in .NET with Azure AI Search.

public class MemorySearchService
{
    private readonly IVectorStoreRecordCollection<string, DocumentChunk> _collection;
    private readonly ITextEmbeddingGenerationService _embeddingService;

    public MemorySearchService(
        IVectorStoreRecordCollection<string, DocumentChunk> collection,
        ITextEmbeddingGenerationService embeddingService)
    {
        _collection = collection;
        _embeddingService = embeddingService;
    }

    public async Task<IReadOnlyList<DocumentChunk>> SearchAsync(
        string query, int limit = 3)
    {
        var queryEmbedding = await _embeddingService.GenerateEmbeddingAsync(query);

        var searchResults = await _collection.VectorizedSearchAsync(
            queryEmbedding,
            new VectorSearchOptions { Top = limit });

        var results = new List<DocumentChunk>();
        await foreach (var result in searchResults.Results)
        {
            results.Add(result.Record);
        }

        return results;
    }
}

Putting It Together: Memory-Enhanced Agent

var memorySearch = new MemorySearchService(collection, embeddingService);
var chatService = kernel.GetRequiredService<IChatCompletionService>();

var history = new ChatHistory("""
    You are a .NET AI engineering assistant. 
    Answer questions using the provided context. 
    If the context doesn't contain the answer, say so.
    Always cite which source document your answer comes from.
    """);

while (true)
{
    Console.Write("Question: ");
    var question = Console.ReadLine();
    if (string.IsNullOrEmpty(question)) break;

    // Search memory for relevant context
    var relevantChunks = await memorySearch.SearchAsync(question, limit: 3);

    if (relevantChunks.Count > 0)
    {
        var contextBlock = string.Join("\n\n", relevantChunks.Select(c =>
            $"[Source: {c.Source}]\n{c.Text}"));

        history.AddUserMessage($"""
            Context from knowledge base:
            {contextBlock}

            Question: {question}
            """);
    }
    else
    {
        history.AddUserMessage(question);
    }

    var response = await chatService.GetChatMessageContentAsync(history);
    history.Add(response);
    Console.WriteLine($"\nAnswer: {response.Content}\n");
}

Choosing an Embedding Model

Model                    Dimensions   Speed    Quality   Cost
text-embedding-3-small   1536         Fast     Good      Lowest
text-embedding-3-large   3072         Medium   Best      Medium
text-embedding-ada-002   1536         Fast     Good      Low

For most .NET applications, text-embedding-3-small provides the best balance. Use text-embedding-3-large only when retrieval precision is critical (legal, medical).

Next Steps

⚠ Production Considerations

  • Don't embed entire documents as single vectors. Chunk them into 200-500 token segments with overlap. A single embedding for a 10-page document loses the nuance that makes retrieval useful.
  • Embedding model must match between ingestion and query. If you index with text-embedding-3-small and query with text-embedding-ada-002, similarity scores are meaningless. Pin your embedding model version.
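Vector search typically ranks by cosine similarity, and the reason cross-model scores are meaningless is that each model embeds text into its own unrelated space. For reference, a minimal cosine similarity implementation:

```csharp
using System;

static class Cosine
{
    // cos(a, b) = (a · b) / (|a| * |b|), in [-1, 1]; 1 means identical direction
    public static double Similarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimensions.");

        double dot = 0, magA = 0, magB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            magA += a[i] * a[i];
            magB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
    }
}
```

The dimension check is also why mixing 1536- and 3072-dimension models fails outright rather than just scoring badly.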


🧠 Architect’s Note

Vector stores are caches of understanding, not databases of record. The source documents live in blob storage or a CMS. The vectors are a derived index that you can regenerate. Design your ingestion pipeline accordingly — if you need to re-embed everything with a new model, you should be able to do it in one batch job.

AI-Friendly Summary


Key Takeaways

  • Working memory = ChatHistory (current conversation context)
  • Short-term memory = serialized chat history in session store
  • Long-term memory = vector stores with embedding search
  • Azure AI Search and Cosmos DB are the primary Azure vector store options
  • Use ITextEmbeddingGenerationService for converting text to vectors

Implementation Checklist

  • Choose a vector store connector (Azure AI Search, Cosmos DB, or in-memory)
  • Configure an embedding model (Azure OpenAI text-embedding-3-small)
  • Define a data model with VectorStoreRecordKey and VectorStoreRecordVector attributes
  • Implement upsert for adding documents to the store
  • Implement search for retrieving relevant context before LLM calls

Frequently Asked Questions

What is a vector store?

A vector store is a database optimized for storing and searching vector embeddings — numerical representations of text, images, or other data. When you search a vector store, it finds items that are semantically similar to your query, not just keyword matches.

Which vector store should I use with Semantic Kernel?

For production on Azure, use Azure AI Search (managed service, hybrid search, built-in reranking) or Azure Cosmos DB (if you already use Cosmos for your application data). For development and testing, use the in-memory store. For self-hosted, consider Qdrant or PostgreSQL with pgvector.

How is Semantic Kernel memory different from RAG?

Semantic Kernel memory is the framework's abstraction for storing and retrieving contextual data. RAG (Retrieval-Augmented Generation) is the pattern of retrieving relevant documents before generating a response. SK memory is one way to implement the retrieval step of RAG.


#Semantic Kernel #Vector Store #Memory #RAG #Azure AI Search