
Semantic Kernel Memory + Vector Stores: Azure AI Search & Cosmos DB in C#

Verified Apr 2026 · Intermediate · .NET 10 · Microsoft.SemanticKernel 1.54.0 · Microsoft.SemanticKernel.Connectors.AzureAISearch 1.54.0 · Microsoft.SemanticKernel.Connectors.AzureCosmosDBNoSQL 1.54.0
By Rajesh Mishra · Mar 12, 2026 · 11 min read
In 30 Seconds

A guide to the Semantic Kernel memory system in C#. Covers the three memory levels (working, short-term, long-term), vector store connectors for Azure AI Search and Cosmos DB, embedding generation, and RAG implementation patterns.

How Semantic Kernel Thinks About Memory

Memory in Semantic Kernel operates at three distinct levels. Understanding when to use each saves you from over-engineering simple scenarios or under-engineering complex ones.

  • Working memory — ChatHistory, scoped to the current session
  • Short-term memory — session persistence in Redis / Cosmos DB (serialize on session end, reload on session start)
  • Long-term memory — vector store in Azure AI Search / Cosmos DB, queried via semantic search
Three SK memory levels — most applications need working memory + a vector store. Add session persistence only when multi-session continuity is a real user requirement.

Working Memory: ChatHistory

The simplest form of memory. Every message, tool call, and result in the current conversation lives in a ChatHistory object that gets sent to the LLM on every turn.

var history = new ChatHistory();
history.AddSystemMessage("You are a .NET expert assistant.");
history.AddUserMessage("How do I configure dependency injection?");

// The LLM sees the full history on every call
var response = await chatService.GetChatMessageContentAsync(history, settings, kernel);
history.Add(response); // Response is added for future turns

Working memory is automatic and ephemeral — it lives as long as the ChatHistory instance. When the user closes the tab, it’s gone.

Limitation: Context windows have token limits. A long-context model call with a 128K window costs more as history grows. For long conversations, you need a strategy to manage history length.
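One simple strategy is to trim the oldest turns while always keeping the system message. A minimal sketch of the idea, using a plain list of (role, content) pairs to stand in for ChatHistory so the logic is easy to see — the same loop translates directly to ChatHistory messages:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class HistoryTrimmer
{
    // Keep the first (system) message plus the most recent maxTurns messages.
    public static List<(string Role, string Content)> Trim(
        List<(string Role, string Content)> history, int maxTurns)
    {
        if (history.Count <= maxTurns + 1) return history;

        var trimmed = new List<(string, string)> { history[0] }; // system message
        trimmed.AddRange(history.Skip(history.Count - maxTurns)); // newest turns
        return trimmed;
    }
}
```

Trimming loses older context; for higher fidelity you can instead summarize dropped turns into a single synthetic message before discarding them.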

Short-Term Memory: Session Persistence

Short-term memory bridges sessions. The user asks a question Monday, comes back Wednesday, and picks up where they left off. You implement this by serializing ChatHistory to a session store.

using System.Text.Json;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

public class SessionMemory
{
    private readonly IDistributedCache _cache;

    public SessionMemory(IDistributedCache cache) => _cache = cache;

    // DTO for serialization — AuthorRole round-trips via its string label
    private sealed record StoredMessage(string Role, string? Content, string? AuthorName);

    public async Task SaveAsync(string sessionId, ChatHistory history)
    {
        var messages = history.Select(m =>
            new StoredMessage(m.Role.Label, m.Content, m.AuthorName));

        var json = JsonSerializer.Serialize(messages);
        await _cache.SetStringAsync(
            $"chat:{sessionId}",
            json,
            new DistributedCacheEntryOptions
            {
                SlidingExpiration = TimeSpan.FromDays(7)
            });
    }

    public async Task<ChatHistory> LoadAsync(string sessionId)
    {
        var json = await _cache.GetStringAsync($"chat:{sessionId}");
        if (json is null) return new ChatHistory();

        var history = new ChatHistory();
        var messages = JsonSerializer.Deserialize<List<StoredMessage>>(json);
        // Reconstruct history from stored messages
        foreach (var msg in messages ?? [])
        {
            history.Add(new ChatMessageContent(new AuthorRole(msg.Role), msg.Content)
            {
                AuthorName = msg.AuthorName
            });
        }
        return history;
    }
}

Use Redis, Azure Cosmos DB, or SQL Server as your distributed cache depending on your existing infrastructure.
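Wiring this up is mostly DI configuration. A sketch of a Redis-backed registration, assuming the Microsoft.Extensions.Caching.StackExchangeRedis package and a hypothetical REDIS_CONNECTION environment variable:

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// IDistributedCache backed by Redis; SessionMemory resolves it via DI
services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = Environment.GetEnvironmentVariable("REDIS_CONNECTION");
    options.InstanceName = "sk-sessions:"; // key prefix, keeps session keys grouped
});
services.AddSingleton<SessionMemory>();
```

Swapping Redis for Cosmos DB or SQL Server only changes this registration — SessionMemory itself depends solely on IDistributedCache.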

Long-Term Memory: Vector Stores

Long-term memory stores knowledge that persists indefinitely and is retrieved by semantic similarity. This is the foundation of RAG (Retrieval-Augmented Generation) in Semantic Kernel.

Ingestion: Documents → Chunk (split into segments) → Embed (text-embedding-3-small) → Vector Store (Azure AI Search / Cosmos DB). Query: User Query → Semantic Search (similarity lookup against stored vectors) → Top-K results → Context injected into prompt.
Vector store ingestion (top path) embeds document chunks into the store. At query time (bottom path) the user query is embedded and matched against stored chunks to retrieve relevant context.

Setting Up a Vector Store

Step 1: Define Your Data Model

Semantic Kernel uses attributed classes to define how data maps to vector store records:

using Microsoft.Extensions.VectorData;

public class DocumentChunk
{
    [VectorStoreRecordKey]
    public string Id { get; set; } = string.Empty;

    [VectorStoreRecordData]
    public string Text { get; set; } = string.Empty;

    [VectorStoreRecordData]
    public string Source { get; set; } = string.Empty;

    [VectorStoreRecordData]
    public string Category { get; set; } = string.Empty;

    [VectorStoreRecordData(IsFilterable = true)]
    public DateTime IndexedAt { get; set; }

    [VectorStoreRecordVector(1536)] // Matches text-embedding-3-small dimensions
    public ReadOnlyMemory<float> Embedding { get; set; }
}

The VectorStoreRecordVector(1536) attribute tells the connector the expected embedding dimensions. This must match your embedding model.
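A dimension mismatch often surfaces only as an indexing error or silently bad search results, so it is worth asserting early. A small hypothetical guard (EmbeddingGuard is not part of SK — it just checks against the 1536 declared above):

```csharp
using System;

static class EmbeddingGuard
{
    public const int ExpectedDimensions = 1536; // must match [VectorStoreRecordVector(1536)]

    public static ReadOnlyMemory<float> Validate(ReadOnlyMemory<float> embedding)
    {
        if (embedding.Length != ExpectedDimensions)
            throw new InvalidOperationException(
                $"Embedding has {embedding.Length} dimensions; the index expects {ExpectedDimensions}. " +
                "Check that the same embedding model is used for ingestion and query.");
        return embedding;
    }
}
```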

Step 2: Configure the Embedding Service

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Embeddings; // ITextEmbeddingGenerationService

var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAITextEmbeddingGeneration(
        deploymentName: "text-embedding-3-small",
        endpoint: Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!,
        apiKey: Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!)
    .Build();

var embeddingService = kernel.GetRequiredService<ITextEmbeddingGenerationService>();

Step 3: Choose a Vector Store Connector

Azure AI Search

Best for production search scenarios. Supports hybrid search (vector + keyword), semantic reranking, and filtering.

dotnet add package Microsoft.SemanticKernel.Connectors.AzureAISearch
using Azure;
using Azure.Search.Documents.Indexes;
using Microsoft.SemanticKernel.Connectors.AzureAISearch;

var searchClient = new SearchIndexClient(
    new Uri(Environment.GetEnvironmentVariable("AZURE_SEARCH_ENDPOINT")!),
    new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_SEARCH_KEY")!));

var vectorStore = new AzureAISearchVectorStore(searchClient);
var collection = vectorStore.GetCollection<string, DocumentChunk>("knowledge-base");

// Create the index if it doesn't exist
await collection.CreateCollectionIfNotExistsAsync();

Azure Cosmos DB

Best when your application data already lives in Cosmos DB. Use the same database for both transactional data and vector search.

dotnet add package Microsoft.SemanticKernel.Connectors.AzureCosmosDBNoSQL
using Microsoft.Azure.Cosmos;
using Microsoft.SemanticKernel.Connectors.AzureCosmosDBNoSQL;

var cosmosClient = new CosmosClient(
    Environment.GetEnvironmentVariable("COSMOS_ENDPOINT")!,
    Environment.GetEnvironmentVariable("COSMOS_KEY")!);

var vectorStore = new AzureCosmosDBNoSQLVectorStore(
    cosmosClient.GetDatabase("ai-memory"));
var collection = vectorStore.GetCollection<string, DocumentChunk>("documents");

In-Memory (Development)

No external services required — everything runs in process. Useful for testing and prototyping.

dotnet add package Microsoft.SemanticKernel.Connectors.InMemory
using Microsoft.SemanticKernel.Connectors.InMemory;

var vectorStore = new InMemoryVectorStore();
var collection = vectorStore.GetCollection<string, DocumentChunk>("test-collection");
await collection.CreateCollectionIfNotExistsAsync();
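A quick smoke test of the abstraction — a sketch assuming the DocumentChunk model above and the `collection` and `embeddingService` instances from the earlier steps (error handling omitted):

```csharp
// Upsert one record, then read it back by key. The IVectorStoreRecordCollection
// API is the same across connectors, so this code also runs unchanged against
// the Azure AI Search and Cosmos DB collections.
var text = "Semantic Kernel is a .NET AI orchestration library.";

var chunk = new DocumentChunk
{
    Id = "doc1_0",
    Text = text,
    Source = "doc1",
    Category = "docs",
    IndexedAt = DateTime.UtcNow,
    Embedding = await embeddingService.GenerateEmbeddingAsync(text)
};

await collection.UpsertAsync(chunk);
var fetched = await collection.GetAsync("doc1_0");
Console.WriteLine(fetched?.Text);
```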

Ingesting Documents

The ingestion pipeline: chunk text → generate embeddings → store in vector database.

public class DocumentIngestionService
{
    private readonly IVectorStoreRecordCollection<string, DocumentChunk> _collection;
    private readonly ITextEmbeddingGenerationService _embeddingService;

    public DocumentIngestionService(
        IVectorStoreRecordCollection<string, DocumentChunk> collection,
        ITextEmbeddingGenerationService embeddingService)
    {
        _collection = collection;
        _embeddingService = embeddingService;
    }

    public async Task IngestAsync(string documentText, string source, string category)
    {
        var chunks = ChunkText(documentText, maxTokens: 400, overlap: 50);

        foreach (var (text, index) in chunks.Select((t, i) => (t, i)))
        {
            var embedding = await _embeddingService.GenerateEmbeddingAsync(text);

            var chunk = new DocumentChunk
            {
                Id = $"{source}_{index}",
                Text = text,
                Source = source,
                Category = category,
                IndexedAt = DateTime.UtcNow,
                Embedding = embedding
            };

            await _collection.UpsertAsync(chunk);
        }
    }

    private static List<string> ChunkText(string text, int maxTokens, int overlap)
    {
        // Simple sentence-based chunking
        var sentences = text.Split(new[] { ". ", ".\n" }, StringSplitOptions.RemoveEmptyEntries);
        var chunks = new List<string>();
        var currentChunk = new List<string>();
        var currentLength = 0;

        foreach (var sentence in sentences)
        {
            var sentenceTokens = sentence.Split(' ').Length; // Rough estimate

            if (currentLength + sentenceTokens > maxTokens && currentChunk.Count > 0)
            {
                chunks.Add(string.Join(". ", currentChunk) + ".");

                // Carry trailing sentences forward until the overlap token budget is spent
                var overlapSentences = new List<string>();
                var overlapLength = 0;
                foreach (var s in Enumerable.Reverse(currentChunk))
                {
                    overlapLength += s.Split(' ').Length;
                    if (overlapLength > overlap) break;
                    overlapSentences.Insert(0, s);
                }
                currentChunk = overlapSentences;
                currentLength = overlapSentences.Sum(s => s.Split(' ').Length);
            }

            currentChunk.Add(sentence.TrimEnd('.'));
            currentLength += sentenceTokens;
        }

        if (currentChunk.Count > 0)
            chunks.Add(string.Join(". ", currentChunk) + ".");

        return chunks;
    }
}
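The whitespace split above undercounts tokens, especially for code-heavy text. A common rough heuristic for English is ~4 characters per token; a sketch of a slightly better estimator (still an approximation — use a real tokenizer library such as SharpToken when accuracy matters):

```csharp
using System;

static class TokenEstimator
{
    // Rough heuristic: English text averages ~4 characters per token.
    public static int Estimate(string text) =>
        Math.Max(1, (int)Math.Ceiling(text.Length / 4.0));
}
```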

Searching Memory (The “R” in RAG)

Retrieve relevant context before the LLM call. For a complete production implementation of this pattern with Azure AI Search, see Build a Semantic Search API in .NET with Azure AI Search.

public class MemorySearchService
{
    private readonly IVectorStoreRecordCollection<string, DocumentChunk> _collection;
    private readonly ITextEmbeddingGenerationService _embeddingService;

    public MemorySearchService(
        IVectorStoreRecordCollection<string, DocumentChunk> collection,
        ITextEmbeddingGenerationService embeddingService)
    {
        _collection = collection;
        _embeddingService = embeddingService;
    }

    public async Task<IReadOnlyList<DocumentChunk>> SearchAsync(
        string query, int limit = 3)
    {
        var queryEmbedding = await _embeddingService.GenerateEmbeddingAsync(query);

        var searchResults = await _collection.VectorizedSearchAsync(
            queryEmbedding,
            new VectorSearchOptions { Top = limit });

        var results = new List<DocumentChunk>();
        await foreach (var result in searchResults.Results)
        {
            results.Add(result.Record);
        }

        return results;
    }
}

Putting It Together: Memory-Enhanced Agent

var memorySearch = new MemorySearchService(collection, embeddingService);
var chatService = kernel.GetRequiredService<IChatCompletionService>();

var history = new ChatHistory("""
    You are a .NET AI engineering assistant. 
    Answer questions using the provided context. 
    If the context doesn't contain the answer, say so.
    Always cite which source document your answer comes from.
    """);

while (true)
{
    Console.Write("Question: ");
    var question = Console.ReadLine();
    if (string.IsNullOrEmpty(question)) break;

    // Search memory for relevant context
    var relevantChunks = await memorySearch.SearchAsync(question, limit: 3);

    if (relevantChunks.Count > 0)
    {
        var contextBlock = string.Join("\n\n", relevantChunks.Select(c =>
            $"[Source: {c.Source}]\n{c.Text}"));

        history.AddUserMessage($"""
            Context from knowledge base:
            {contextBlock}

            Question: {question}
            """);
    }
    else
    {
        history.AddUserMessage(question);
    }

    var response = await chatService.GetChatMessageContentAsync(history);
    history.Add(response);
    Console.WriteLine($"\nAnswer: {response.Content}\n");
}

Choosing an Embedding Model

Model                    Dimensions   Speed    Quality   Cost
text-embedding-3-small   1536         Fast     Good      Lowest
text-embedding-3-large   3072         Medium   Best      Medium
text-embedding-ada-002   1536         Fast     Good      Low

For most .NET applications, text-embedding-3-small provides the best balance. Use text-embedding-3-large only when retrieval precision is critical (legal, medical).

Next Steps

⚠ Production Considerations

  • Don't embed entire documents as single vectors. Chunk them into 200-500 token segments with overlap. A single embedding for a 10-page document loses the nuance that makes retrieval useful.
  • Embedding model must match between ingestion and query. If you index with text-embedding-3-small and query with text-embedding-ada-002, similarity scores are meaningless. Pin your embedding model version.
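Vector search typically ranks by cosine similarity, and the reason cross-model scores are meaningless is that each model embeds text into its own unrelated space. For reference, a minimal cosine similarity implementation:

```csharp
using System;

static class Cosine
{
    // cos(a, b) = (a · b) / (|a| * |b|), in [-1, 1]; 1 means identical direction
    public static double Similarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimensions.");

        double dot = 0, magA = 0, magB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            magA += a[i] * a[i];
            magB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
    }
}
```

The dimension check is also why mixing 1536- and 3072-dimension models fails outright rather than just scoring badly.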


🧠 Architect’s Note

Vector stores are caches of understanding, not databases of record. The source documents live in blob storage or a CMS. The vectors are a derived index that you can regenerate. Design your ingestion pipeline accordingly — if you need to re-embed everything with a new model, you should be able to do it in one batch job.

AI-Friendly Summary


Key Takeaways

  • Working memory = ChatHistory (current conversation context)
  • Short-term memory = serialized chat history in session store
  • Long-term memory = vector stores with embedding search
  • Azure AI Search and Cosmos DB are the primary Azure vector store options
  • Use ITextEmbeddingGenerationService for converting text to vectors

Implementation Checklist

  • Choose a vector store connector (Azure AI Search, Cosmos DB, or in-memory)
  • Configure an embedding model (Azure OpenAI text-embedding-3-small)
  • Define a data model with VectorStoreRecordKey and VectorStoreRecordVector attributes
  • Implement upsert for adding documents to the store
  • Implement search for retrieving relevant context before LLM calls

Frequently Asked Questions

What is a vector store?

A vector store is a database optimized for storing and searching vector embeddings — numerical representations of text, images, or other data. When you search a vector store, it finds items that are semantically similar to your query, not just keyword matches.

Which vector store should I use with Semantic Kernel?

For production on Azure, use Azure AI Search (managed service, hybrid search, built-in reranking) or Azure Cosmos DB (if you already use Cosmos for your application data). For development and testing, use the in-memory store. For self-hosted, consider Qdrant or PostgreSQL with pgvector.

How is Semantic Kernel memory different from RAG?

Semantic Kernel memory is the framework's abstraction for storing and retrieving contextual data. RAG (Retrieval-Augmented Generation) is the pattern of retrieving relevant documents before generating a response. SK memory is one way to implement the retrieval step of RAG.


#Semantic Kernel #Vector Store #Memory #RAG #Azure AI Search