How Semantic Kernel Thinks About Memory
Memory in Semantic Kernel operates at three distinct levels. Understanding when to use each saves you from over-engineering simple scenarios or under-engineering complex ones.
Working Memory: ChatHistory
The simplest form of memory. Every message, tool call, and result in the current conversation lives in a ChatHistory object that gets sent to the LLM on every turn.
var history = new ChatHistory();
history.AddSystemMessage("You are a .NET expert assistant.");
history.AddUserMessage("How do I configure dependency injection?");
// The LLM sees the full history on every call
var response = await chatService.GetChatMessageContentAsync(history, settings, kernel);
history.Add(response); // Response is added for future turns
Working memory is automatic and ephemeral — it lives as long as the ChatHistory instance. When the user closes the tab, it’s gone.
Limitation: Context windows are finite, and the full history is re-sent (and billed) on every call, so even a 128K-token window gets slower and more expensive as the conversation grows. For long conversations, you need a strategy to cap history length.
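One common strategy is a sliding window over the history. The helper below is a minimal sketch, not an SK API — the name TrimHistory and the window size are assumptions, and it presumes the system message sits at index 0:

```csharp
// Illustrative sketch: keep the system prompt plus the most recent
// maxMessages entries; evict the oldest of the rest.
static void TrimHistory(ChatHistory history, int maxMessages = 20)
{
    while (history.Count > maxMessages + 1)
    {
        history.RemoveAt(1); // Index 0 holds the system message; drop the next-oldest
    }
}
```

Call it before each completion request. For higher fidelity, summarize the evicted turns into a single message instead of dropping them outright.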
Short-Term Memory: Session Persistence
Short-term memory bridges sessions. The user asks a question Monday, comes back Wednesday, and picks up where they left off. You implement this by serializing ChatHistory to a session store.
public class SessionMemory
{
    // Plain DTO for round-tripping: AuthorRole is a struct that doesn't
    // deserialize cleanly with System.Text.Json, so store its string label.
    private sealed record StoredMessage(string Role, string? Content, string? AuthorName);

    private readonly IDistributedCache _cache;

    public SessionMemory(IDistributedCache cache) => _cache = cache;

    public async Task SaveAsync(string sessionId, ChatHistory history)
    {
        var messages = history.Select(m =>
            new StoredMessage(m.Role.Label, m.Content, m.AuthorName));
        var json = JsonSerializer.Serialize(messages);

        await _cache.SetStringAsync(
            $"chat:{sessionId}",
            json,
            new DistributedCacheEntryOptions
            {
                SlidingExpiration = TimeSpan.FromDays(7)
            });
    }

    public async Task<ChatHistory> LoadAsync(string sessionId)
    {
        var json = await _cache.GetStringAsync($"chat:{sessionId}");
        if (json is null) return new ChatHistory();

        var history = new ChatHistory();
        var messages = JsonSerializer.Deserialize<List<StoredMessage>>(json);

        // Reconstruct history from stored messages
        foreach (var msg in messages ?? [])
        {
            history.Add(new ChatMessageContent(new AuthorRole(msg.Role), msg.Content)
            {
                AuthorName = msg.AuthorName
            });
        }
        return history;
    }
}
Use Redis, Azure Cosmos DB, or SQL Server as your distributed cache depending on your existing infrastructure.
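As one possible wiring, a Redis-backed store can be registered in Program.cs. This sketch assumes the Microsoft.Extensions.Caching.StackExchangeRedis package and a connection string named "Redis" — both are assumptions, not requirements of SK:

```csharp
// Program.cs — register IDistributedCache backed by Redis
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = builder.Configuration.GetConnectionString("Redis");
    options.InstanceName = "sk-sessions:"; // Key prefix to isolate this app's entries
});

builder.Services.AddSingleton<SessionMemory>();
```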
Long-Term Memory: Vector Stores
Long-term memory stores knowledge that persists indefinitely and is retrieved by semantic similarity. This is the foundation of RAG (Retrieval-Augmented Generation) in Semantic Kernel.
Setting Up a Vector Store
Step 1: Define Your Data Model
Semantic Kernel uses attributed classes to define how data maps to vector store records:
using Microsoft.Extensions.VectorData;
public class DocumentChunk
{
    [VectorStoreRecordKey]
    public string Id { get; set; } = string.Empty;

    [VectorStoreRecordData]
    public string Text { get; set; } = string.Empty;

    [VectorStoreRecordData]
    public string Source { get; set; } = string.Empty;

    [VectorStoreRecordData]
    public string Category { get; set; } = string.Empty;

    [VectorStoreRecordData(IsFilterable = true)]
    public DateTime IndexedAt { get; set; }

    [VectorStoreRecordVector(1536)] // Matches text-embedding-3-small dimensions
    public ReadOnlyMemory<float> Embedding { get; set; }
}
The VectorStoreRecordVector(1536) attribute tells the connector the expected embedding dimensions. This must match your embedding model.
Step 2: Configure the Embedding Service
using Microsoft.SemanticKernel;
var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAITextEmbeddingGeneration(
        deploymentName: "text-embedding-3-small",
        endpoint: Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!,
        apiKey: Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!)
    .Build();

var embeddingService = kernel.GetRequiredService<ITextEmbeddingGenerationService>();
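A dimension mismatch between the deployed model and the data model's vector attribute typically only surfaces at upsert or search time. A quick startup probe (illustrative, not an SK feature) can fail fast instead:

```csharp
// Generate one throwaway embedding and verify its length matches
// the 1536 declared on DocumentChunk.Embedding.
var probe = await embeddingService.GenerateEmbeddingAsync("dimension probe");
if (probe.Length != 1536)
{
    throw new InvalidOperationException(
        $"Embedding model returned {probe.Length} dimensions; DocumentChunk expects 1536.");
}
```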
Step 3: Choose a Vector Store Connector
Azure AI Search
Best for production search scenarios. Supports hybrid search (vector + keyword), semantic reranking, and filtering.
dotnet add package Microsoft.SemanticKernel.Connectors.AzureAISearch
using Azure;
using Azure.Search.Documents.Indexes;
using Microsoft.SemanticKernel.Connectors.AzureAISearch;
var searchClient = new SearchIndexClient(
    new Uri(Environment.GetEnvironmentVariable("AZURE_SEARCH_ENDPOINT")!),
    new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_SEARCH_KEY")!));

var vectorStore = new AzureAISearchVectorStore(searchClient);
var collection = vectorStore.GetCollection<string, DocumentChunk>("knowledge-base");

// Create the index if it doesn't exist
await collection.CreateCollectionIfNotExistsAsync();
Azure Cosmos DB
Best when your application data already lives in Cosmos DB. Use the same database for both transactional data and vector search.
dotnet add package Microsoft.SemanticKernel.Connectors.AzureCosmosDBNoSQL
using Microsoft.Azure.Cosmos;
using Microsoft.SemanticKernel.Connectors.AzureCosmosDBNoSQL;
var cosmosClient = new CosmosClient(
    Environment.GetEnvironmentVariable("COSMOS_ENDPOINT")!,
    Environment.GetEnvironmentVariable("COSMOS_KEY")!);

var vectorStore = new AzureCosmosDBNoSQLVectorStore(
    cosmosClient.GetDatabase("ai-memory"));
var collection = vectorStore.GetCollection<string, DocumentChunk>("documents");
In-Memory (Development)
No dependencies. Useful for testing and prototyping.
using Microsoft.SemanticKernel.Connectors.InMemory;
var vectorStore = new InMemoryVectorStore();
var collection = vectorStore.GetCollection<string, DocumentChunk>("test-collection");
await collection.CreateCollectionIfNotExistsAsync();
Ingesting Documents
The ingestion pipeline: chunk text → generate embeddings → store in vector database.
public class DocumentIngestionService
{
    private readonly IVectorStoreRecordCollection<string, DocumentChunk> _collection;
    private readonly ITextEmbeddingGenerationService _embeddingService;

    public DocumentIngestionService(
        IVectorStoreRecordCollection<string, DocumentChunk> collection,
        ITextEmbeddingGenerationService embeddingService)
    {
        _collection = collection;
        _embeddingService = embeddingService;
    }

    public async Task IngestAsync(string documentText, string source, string category)
    {
        var chunks = ChunkText(documentText, maxTokens: 400, overlap: 50);

        foreach (var (text, index) in chunks.Select((t, i) => (t, i)))
        {
            var embedding = await _embeddingService.GenerateEmbeddingAsync(text);

            var chunk = new DocumentChunk
            {
                Id = $"{source}_{index}",
                Text = text,
                Source = source,
                Category = category,
                IndexedAt = DateTime.UtcNow,
                Embedding = embedding
            };

            await _collection.UpsertAsync(chunk);
        }
    }

    private static List<string> ChunkText(string text, int maxTokens, int overlap)
    {
        // Simple sentence-based chunking
        var sentences = text.Split(new[] { ". ", ".\n" }, StringSplitOptions.RemoveEmptyEntries);
        var chunks = new List<string>();
        var currentChunk = new List<string>();
        var currentLength = 0;

        foreach (var sentence in sentences)
        {
            var sentenceTokens = sentence.Split(' ').Length; // Rough estimate
            if (currentLength + sentenceTokens > maxTokens && currentChunk.Count > 0)
            {
                chunks.Add(string.Join(". ", currentChunk) + ".");

                // Carry trailing sentences into the next chunk until the
                // overlap token budget is covered
                var overlapSentences = new List<string>();
                var overlapLength = 0;
                foreach (var s in currentChunk.AsEnumerable().Reverse())
                {
                    if (overlapLength >= overlap) break;
                    overlapLength += s.Split(' ').Length;
                    overlapSentences.Insert(0, s);
                }
                currentChunk = overlapSentences;
                currentLength = overlapLength;
            }
            currentChunk.Add(sentence.TrimEnd('.'));
            currentLength += sentenceTokens;
        }

        if (currentChunk.Count > 0)
            chunks.Add(string.Join(". ", currentChunk) + ".");
        return chunks;
    }
}
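A hypothetical usage of the ingestion service, assuming a local markdown file (the path and category here are made up for illustration):

```csharp
var ingestion = new DocumentIngestionService(collection, embeddingService);

// Chunks the document, embeds each chunk, and upserts it into the store
var text = await File.ReadAllTextAsync("docs/di-guide.md");
await ingestion.IngestAsync(text, source: "di-guide.md", category: "docs");
```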
Searching Memory (The “R” in RAG)
Retrieve relevant context before the LLM call. For a complete production implementation of this pattern with Azure AI Search, see Build a Semantic Search API in .NET with Azure AI Search.
public class MemorySearchService
{
    private readonly IVectorStoreRecordCollection<string, DocumentChunk> _collection;
    private readonly ITextEmbeddingGenerationService _embeddingService;

    public MemorySearchService(
        IVectorStoreRecordCollection<string, DocumentChunk> collection,
        ITextEmbeddingGenerationService embeddingService)
    {
        _collection = collection;
        _embeddingService = embeddingService;
    }

    public async Task<IReadOnlyList<DocumentChunk>> SearchAsync(
        string query, int limit = 3)
    {
        var queryEmbedding = await _embeddingService.GenerateEmbeddingAsync(query);

        // VectorizedSearchAsync returns a Task, so it must be awaited
        var searchResults = await _collection.VectorizedSearchAsync(
            queryEmbedding,
            new VectorSearchOptions { Top = limit });

        var results = new List<DocumentChunk>();
        await foreach (var result in searchResults.Results)
        {
            results.Add(result.Record);
        }
        return results;
    }
}
Putting It Together: Memory-Enhanced Agent
var memorySearch = new MemorySearchService(collection, embeddingService);
var chatService = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory("""
    You are a .NET AI engineering assistant.
    Answer questions using the provided context.
    If the context doesn't contain the answer, say so.
    Always cite which source document your answer comes from.
    """);

while (true)
{
    Console.Write("Question: ");
    var question = Console.ReadLine();
    if (string.IsNullOrEmpty(question)) break;

    // Search memory for relevant context
    var relevantChunks = await memorySearch.SearchAsync(question, limit: 3);

    if (relevantChunks.Count > 0)
    {
        var contextBlock = string.Join("\n\n", relevantChunks.Select(c =>
            $"[Source: {c.Source}]\n{c.Text}"));

        history.AddUserMessage($"""
            Context from knowledge base:
            {contextBlock}

            Question: {question}
            """);
    }
    else
    {
        history.AddUserMessage(question);
    }

    var response = await chatService.GetChatMessageContentAsync(history);
    history.Add(response);
    Console.WriteLine($"\nAnswer: {response.Content}\n");
}
Choosing an Embedding Model
| Model | Dimensions | Speed | Quality | Cost |
|---|---|---|---|---|
| text-embedding-3-small | 1536 | Fast | Good | Lowest |
| text-embedding-3-large | 3072 | Medium | Best | Medium |
| text-embedding-ada-002 (legacy) | 1536 | Fast | Good | Medium |
For most .NET applications, text-embedding-3-small provides the best balance of quality and cost. Use text-embedding-3-large only when retrieval precision is critical (legal, medical). text-embedding-ada-002 is the previous-generation model; keep it only for indexes already embedded with it, since vectors from different models aren't comparable and switching requires re-indexing.
Next Steps
- Semantic Kernel Plugins Guide — Build tools that use memory search
- Getting Started with Semantic Kernel — Hands-on SK setup
- Build Your First AI Agent — Agent with memory integration