Semantic Kernel’s filter system gives you a structured way to intercept every function call, prompt render, and auto-invoked tool call in your AI pipeline. Think of it as ASP.NET Core middleware but for your AI layer — the same onion model, the same short-circuit semantics, and the same dependency injection support.
This guide covers all three filter types, five production-ready patterns, and the composition model for stacking filters without surprises.
The Three SK Filter Types
SK 1.x ships three filter interfaces. Choosing the right one determines what you intercept and when.
| Filter | Interface | Wraps | Use For |
|---|---|---|---|
| Function invocation | IFunctionInvocationFilter | Every function call (prompt and plugin) | Caching, logging, rate limiting |
| Prompt render | IPromptRenderFilter | Prompt template rendering before the model call | PII removal, injection detection |
| Auto function invocation | IAutoFunctionInvocationFilter | Tool calls triggered by FunctionChoiceBehavior.Auto() | Observability, cost tracking, approval gates |
IFunctionInvocationFilter is the workhorse. It fires for both AI prompt invocations (kernel.InvokePromptAsync) and [KernelFunction] plugin calls. If you only add one filter, make it this one.
IPromptRenderFilter fires before the fully rendered prompt string is sent to the model. It runs after template variable substitution, so context.RenderedPrompt contains the final text — useful for detecting injected instructions or scrubbing PII that slipped in through user-supplied variables.
IAutoFunctionInvocationFilter only fires when the model itself decides to call a function via function calling (FunctionChoiceBehavior.Auto()). Use it when you need an approval gate before the model executes tool calls, or when tracking which functions the model chose and how often.
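All three interfaces share the same delegate-based shape: code before `next(context)` runs on the way in, code after it runs on the way out. A minimal sketch (class name illustrative, assuming SK 1.x):

```csharp
using Microsoft.SemanticKernel;

// Minimal skeleton: before-next() code runs on the way in,
// after-next() code runs on the way out, mirroring middleware.
public class NoOpFilter : IFunctionInvocationFilter
{
    public async Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        // Before: inspect context.Function and context.Arguments here
        await next(context); // run the rest of the pipeline and the function
        // After: inspect or replace context.Result here
    }
}
```

Skipping the `await next(context)` call short-circuits the pipeline entirely, which is the mechanism the caching and content safety patterns below rely on.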
Registration Patterns
SK picks up filters from two places: the host DI container and the kernel’s own filter collections.
// Option A: DI registration (preferred for production — supports constructor injection)
builder.Services.AddSingleton<IFunctionInvocationFilter, CacheFilter>();
builder.Services.AddSingleton<IFunctionInvocationFilter, LoggingFilter>();
builder.Services.AddKernel()
.AddAzureOpenAIChatCompletion(deployment, endpoint, apiKey);
// Filters registered in DI are picked up automatically by AddKernel()
// Option B: Kernel-level (useful in scripts or tests)
var kernel = Kernel.CreateBuilder()
.AddAzureOpenAIChatCompletion(deployment, endpoint, apiKey)
.Build();
kernel.FunctionInvocationFilters.Add(new LoggingFilter(logger));
The DI approach (Option A) is strongly preferred for production. Constructor injection means your filters get their ILogger, IMemoryCache, or IContentSafetyService automatically — no manual wiring. Option B requires you to construct filters by hand, which becomes error-prone as filter dependencies grow.
Pattern 1 — Caching Filter
Semantic caching is one of the highest-leverage cost optimizations you can apply to a Semantic Kernel app. The idea: if the same arguments produce the same result repeatedly, skip the AI call and return from cache.
The short-circuit mechanism is key: set context.Result and return without calling next(context). The function body never runs.
using Microsoft.SemanticKernel;
using Microsoft.Extensions.Caching.Memory;
public class SemanticCacheFilter : IFunctionInvocationFilter
{
private readonly IMemoryCache _cache;
private readonly TimeSpan _ttl;
public SemanticCacheFilter(IMemoryCache cache, TimeSpan? ttl = null)
{
_cache = cache;
_ttl = ttl ?? TimeSpan.FromMinutes(30);
}
public async Task OnFunctionInvocationAsync(
FunctionInvocationContext context,
Func<FunctionInvocationContext, Task> next)
{
// Only cache AI prompt functions
if (context.Function.PluginName != "Prompts")
{
await next(context);
return;
}
var cacheKey = $"{context.Function.PluginName}:{context.Function.Name}:" +
string.Join("|", context.Arguments
.OrderBy(a => a.Key)
.Select(a => $"{a.Key}={a.Value}"));
if (_cache.TryGetValue(cacheKey, out string? cached))
{
// Short-circuit — skip the AI call entirely
context.Result = new FunctionResult(context.Function, cached);
return;
}
await next(context);
if (context.Result.GetValue<string>() is string result)
{
_cache.Set(cacheKey, result, _ttl);
}
}
}
This filter gates on PluginName == "Prompts" so it only caches semantic functions, not every plugin call. Adjust the predicate to match your plugin naming convention.
The cache key includes the function identity and all arguments in sorted order. Sorting by key ensures {input: "hello", lang: "en"} and {lang: "en", input: "hello"} produce the same key.
Pattern 2 — Structured Logging Filter
Structured logging in a filter gives you consistent telemetry across every AI call without touching your plugin code. Log before calling next() to capture the invocation start, and log after to capture latency and outcome.
using Microsoft.SemanticKernel;
using Microsoft.Extensions.Logging;
using System.Diagnostics;
public class StructuredLoggingFilter : IFunctionInvocationFilter
{
private readonly ILogger<StructuredLoggingFilter> _logger;
public StructuredLoggingFilter(ILogger<StructuredLoggingFilter> logger)
{
_logger = logger;
}
public async Task OnFunctionInvocationAsync(
FunctionInvocationContext context,
Func<FunctionInvocationContext, Task> next)
{
var sw = Stopwatch.StartNew();
_logger.LogInformation(
"Invoking {Plugin}.{Function} with {ArgCount} arguments",
context.Function.PluginName,
context.Function.Name,
context.Arguments.Count);
try
{
await next(context);
_logger.LogInformation(
"Completed {Plugin}.{Function} in {ElapsedMs}ms",
context.Function.PluginName,
context.Function.Name,
sw.ElapsedMilliseconds);
}
catch (Exception ex)
{
_logger.LogError(ex,
"Failed {Plugin}.{Function} after {ElapsedMs}ms",
context.Function.PluginName,
context.Function.Name,
sw.ElapsedMilliseconds);
throw;
}
}
}
Using structured log properties ({Plugin}, {Function}, {ElapsedMs}) means Application Insights and Seq can index and query these values. A query like where Function == "SummarizeDocument" | summarize percentiles(ElapsedMs, 50, 95) gives you p50/p95 latency per function with no extra instrumentation.
Always re-throw after logging the error. The filter pipeline expects exceptions to propagate — swallowing them here would cause the caller to receive a null result with no indication of failure.
Pattern 3 — Content Safety Pre-Screen Filter
Content safety filtering belongs at the infrastructure layer, not inside individual plugins. A filter ensures every function that accepts user input is screened consistently.
using Microsoft.SemanticKernel;
using Microsoft.Extensions.Logging;
public class ContentSafetyFilter : IFunctionInvocationFilter
{
private readonly IContentSafetyService _safetyService;
private readonly ILogger<ContentSafetyFilter> _logger;
public ContentSafetyFilter(
IContentSafetyService safetyService,
ILogger<ContentSafetyFilter> logger)
{
_safetyService = safetyService;
_logger = logger;
}
public async Task OnFunctionInvocationAsync(
FunctionInvocationContext context,
Func<FunctionInvocationContext, Task> next)
{
// Check the user input argument for safety
if (context.Arguments.TryGetValue("input", out var input) && input is string userText)
{
var isSafe = await _safetyService.IsSafeAsync(userText);
if (!isSafe)
{
_logger.LogWarning(
"Content safety check failed for {Plugin}.{Function}",
context.Function.PluginName, context.Function.Name);
// Short-circuit with a safe response — don't call the AI
context.Result = new FunctionResult(context.Function,
"I can't help with that request.");
return;
}
}
await next(context);
}
}
The filter checks the input argument by name. If your plugins use different argument names for user-supplied text, adjust the TryGetValue key or check multiple argument names. You can also iterate context.Arguments and screen all string-typed values if you prefer blanket coverage.
Returning a short-circuited FunctionResult with a safe refusal message is cleaner than throwing an exception: the caller gets a coherent response, and you avoid scattering exception-handling logic higher in the stack.
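For blanket coverage, a variant that screens every string-typed argument might look like this (a sketch, assuming the same hypothetical IContentSafetyService abstraction used above):

```csharp
using Microsoft.SemanticKernel;

// Variant sketch: screens every string-typed argument rather than
// a single named one, short-circuiting on the first unsafe value.
public class BlanketContentSafetyFilter : IFunctionInvocationFilter
{
    private readonly IContentSafetyService _safetyService;

    public BlanketContentSafetyFilter(IContentSafetyService safetyService)
        => _safetyService = safetyService;

    public async Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        foreach (var (name, value) in context.Arguments)
        {
            if (value is string text && !await _safetyService.IsSafeAsync(text))
            {
                // Short-circuit on the first unsafe argument
                context.Result = new FunctionResult(context.Function,
                    "I can't help with that request.");
                return;
            }
        }
        await next(context);
    }
}
```

The trade-off is extra safety-service calls per invocation, so prefer the named-argument check when your argument conventions are consistent.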
Pattern 4 — Per-User Rate Limiting Filter
Rate limiting in a filter prevents a single user from burning your token budget. .NET 7+ System.Threading.RateLimiting provides a production-ready token bucket implementation.
using System.Threading.RateLimiting;
using System.Collections.Concurrent;
using Microsoft.SemanticKernel;
public class UserRateLimitFilter : IFunctionInvocationFilter
{
private readonly ConcurrentDictionary<string, TokenBucketRateLimiter> _limiters = new();
public async Task OnFunctionInvocationAsync(
FunctionInvocationContext context,
Func<FunctionInvocationContext, Task> next)
{
// Get user ID from arguments (adjust key name to match your app)
if (!context.Arguments.TryGetValue("userId", out var userIdObj) ||
userIdObj is not string userId)
{
await next(context);
return;
}
var limiter = _limiters.GetOrAdd(userId, _ => new TokenBucketRateLimiter(
new TokenBucketRateLimiterOptions
{
TokenLimit = 10,
ReplenishmentPeriod = TimeSpan.FromMinutes(1),
TokensPerPeriod = 10,
AutoReplenishment = true,
QueueLimit = 0
}));
using var lease = await limiter.AcquireAsync(1);
if (!lease.IsAcquired)
{
throw new InvalidOperationException(
$"Rate limit exceeded for user {userId}. Try again in a moment.");
}
await next(context);
}
}
The ConcurrentDictionary grows unbounded if users are never evicted. For production, back this with a distributed cache (Redis) or use ASP.NET Core's built-in rate-limiting middleware (app.UseRateLimiter()) at the HTTP layer and pass the rate-limit decision in as an argument. The filter approach is useful when you need per-function rate limits rather than per-endpoint limits.
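Another in-process option, sketched under the same per-user keying assumption, is PartitionedRateLimiter from System.Threading.RateLimiting, which manages per-key limiter lifetime internally so you avoid the unbounded dictionary (the limits shown are illustrative):

```csharp
using System.Threading.RateLimiting;
using Microsoft.SemanticKernel;

// Sketch: PartitionedRateLimiter creates and disposes per-partition
// limiters internally, avoiding the unbounded ConcurrentDictionary.
public class PartitionedUserRateLimitFilter : IFunctionInvocationFilter
{
    private readonly PartitionedRateLimiter<string> _limiter =
        PartitionedRateLimiter.Create<string, string>(userId =>
            RateLimitPartition.GetTokenBucketLimiter(userId, _ =>
                new TokenBucketRateLimiterOptions
                {
                    TokenLimit = 10,
                    ReplenishmentPeriod = TimeSpan.FromMinutes(1),
                    TokensPerPeriod = 10,
                    AutoReplenishment = true,
                    QueueLimit = 0
                }));

    public async Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        if (context.Arguments.TryGetValue("userId", out var id) && id is string userId)
        {
            // Acquire against the user's partition; fails fast with QueueLimit = 0
            using var lease = await _limiter.AcquireAsync(userId);
            if (!lease.IsAcquired)
            {
                throw new InvalidOperationException(
                    $"Rate limit exceeded for user {userId}. Try again in a moment.");
            }
        }
        await next(context);
    }
}
```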
Pattern 5 — IPromptRenderFilter for PII Detection
IPromptRenderFilter fires after template variables are substituted but before the prompt is sent to the model. This is the right place to detect PII that users injected through form inputs.
using Microsoft.SemanticKernel;
using Microsoft.Extensions.Logging;
public class PiiDetectionFilter : IPromptRenderFilter
{
private readonly ILogger<PiiDetectionFilter> _logger;
private static readonly string[] _piiPatterns =
["\\b\\d{3}-\\d{2}-\\d{4}\\b", // SSN
"\\b\\d{16}\\b"]; // Credit card
public PiiDetectionFilter(ILogger<PiiDetectionFilter> logger)
{
_logger = logger;
}
public async Task OnPromptRenderAsync(
PromptRenderContext context,
Func<PromptRenderContext, Task> next)
{
await next(context);
// context.RenderedPrompt is available after next()
if (ContainsPii(context.RenderedPrompt))
{
_logger.LogWarning(
"Potential PII detected in prompt for {Plugin}.{Function}",
context.Function.PluginName, context.Function.Name);
// Optionally redact or throw — here we just log
}
}
private static bool ContainsPii(string? prompt)
{
if (prompt is null) return false;
return _piiPatterns.Any(p => System.Text.RegularExpressions.Regex.IsMatch(prompt, p));
}
}
Call await next(context) first, then read context.RenderedPrompt; the prompt is only populated after rendering completes. To modify the prompt before it reaches the model, overwrite context.RenderedPrompt after next() returns: the updated value is what gets sent to the model.
Note that IPromptRenderFilter fires for every prompt render in the kernel, including internal SK templates used for function schema generation and planning. To avoid log spam, always check context.Function.PluginName and only act on your own plugin prompts.
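A redacting variant might look like this (a minimal sketch: the class name and the redaction token are illustrative, and only the SSN pattern from above is shown):

```csharp
using System.Text.RegularExpressions;
using Microsoft.SemanticKernel;

// Sketch: after next() renders the prompt, overwrite context.RenderedPrompt;
// the modified text is what reaches the model.
public class PiiRedactionFilter : IPromptRenderFilter
{
    private static readonly Regex _ssn = new(@"\b\d{3}-\d{2}-\d{4}\b");

    public async Task OnPromptRenderAsync(
        PromptRenderContext context,
        Func<PromptRenderContext, Task> next)
    {
        await next(context); // render first so RenderedPrompt is populated
        if (context.RenderedPrompt is string prompt)
        {
            context.RenderedPrompt = _ssn.Replace(prompt, "[REDACTED]");
        }
    }
}
```

Redacting in the render filter means the model never sees the sensitive value, while the original arguments remain untouched for downstream logic.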
Composing Multiple Filters
Filters execute in an onion model: the first registered filter wraps all the others. Before-next() code runs in registration order; after-next() code runs in reverse. This mirrors ASP.NET Core middleware — the outermost filter sees the request first and the response last.
// Registration order determines execution order
builder.Services.AddSingleton<IFunctionInvocationFilter, StructuredLoggingFilter>(); // Outermost
builder.Services.AddSingleton<IFunctionInvocationFilter, UserRateLimitFilter>();
builder.Services.AddSingleton<IFunctionInvocationFilter, ContentSafetyFilter>();
builder.Services.AddSingleton<IFunctionInvocationFilter, SemanticCacheFilter>(); // Innermost
builder.Services.AddMemoryCache();
builder.Services.AddKernel()
.AddAzureOpenAIChatCompletion(deployment, endpoint, apiKey);
Execution flow:
LoggingFilter (before)
→ RateLimitFilter (before)
→ ContentSafetyFilter (before)
→ CacheFilter (before) → Function → CacheFilter (after)
← ContentSafetyFilter (after)
← RateLimitFilter (after)
← LoggingFilter (after)
Register StructuredLoggingFilter outermost so it captures the total latency of all filters plus the function, including cache hits. If you registered logging innermost, a cache hit would show near-zero latency and you'd lose visibility into the filter overhead.
Register SemanticCacheFilter innermost so the rate limiter and content safety check run even on cached requests. This ensures a user can’t bypass safety checks by hitting the cache.
Testing Filter Ordering
The easiest way to verify execution order in unit tests is to collect a trace list from each filter:
var trace = new List<string>();
var kernel = Kernel.CreateBuilder()
.AddAzureOpenAIChatCompletion(deployment, endpoint, apiKey)
.Build();
kernel.FunctionInvocationFilters.Add(new TraceFilter(trace, "outer"));
kernel.FunctionInvocationFilters.Add(new TraceFilter(trace, "inner"));
await kernel.InvokePromptAsync("hello");
// Expected: ["outer-before", "inner-before", "inner-after", "outer-after"]
This pattern is useful during development to confirm your filter stack behaves as designed before adding real logic.
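The TraceFilter referenced above is not part of SK; a minimal sketch could look like this:

```csharp
using Microsoft.SemanticKernel;

// Records a label before and after next() so tests can assert
// the onion ordering of the filter stack.
public class TraceFilter : IFunctionInvocationFilter
{
    private readonly List<string> _trace;
    private readonly string _name;

    public TraceFilter(List<string> trace, string name)
    {
        _trace = trace;
        _name = name;
    }

    public async Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        _trace.Add($"{_name}-before");
        await next(context);
        _trace.Add($"{_name}-after");
    }
}
```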
IAutoFunctionInvocationFilter for Tool Call Observability
When using FunctionChoiceBehavior.Auto(), the model decides which functions to call. IAutoFunctionInvocationFilter lets you observe or gate those model-driven calls separately from manually invoked functions.
public class AutoInvokeObservabilityFilter : IAutoFunctionInvocationFilter
{
private readonly ILogger<AutoInvokeObservabilityFilter> _logger;
public AutoInvokeObservabilityFilter(ILogger<AutoInvokeObservabilityFilter> logger)
{
_logger = logger;
}
public async Task OnAutoFunctionInvocationAsync(
AutoFunctionInvocationContext context,
Func<AutoFunctionInvocationContext, Task> next)
{
_logger.LogInformation(
"Model auto-invoking {Plugin}.{Function} (iteration {Iteration})",
context.Function.PluginName,
context.Function.Name,
context.FunctionSequenceIndex);
await next(context);
// Set Terminate = true to stop the auto-invoke loop after this call
// context.Terminate = true;
}
}
context.FunctionSequenceIndex tells you which function in the current auto-invoke sequence this is — useful for detecting runaway loops where the model keeps calling tools without converging. Set context.Terminate = true to force the loop to stop after the current function returns.
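A runaway-loop guard built on these two members might look like this (the iteration cap is an arbitrary illustrative threshold, not an SK default):

```csharp
using Microsoft.SemanticKernel;

// Sketch: stop the auto-invoke loop once the model has chained
// too many tool calls in a single response cycle.
public class AutoInvokeLoopGuard : IAutoFunctionInvocationFilter
{
    private const int MaxSequentialCalls = 5; // illustrative cap

    public async Task OnAutoFunctionInvocationAsync(
        AutoFunctionInvocationContext context,
        Func<AutoFunctionInvocationContext, Task> next)
    {
        await next(context);
        if (context.FunctionSequenceIndex >= MaxSequentialCalls)
        {
            // Force the loop to stop after the current function returns
            context.Terminate = true;
        }
    }
}
```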
Choosing Between Filter Types: Decision Guide
- Need to intercept every AI call and plugin call? → IFunctionInvocationFilter
- Need to inspect or modify the prompt text before the model sees it? → IPromptRenderFilter
- Need to gate or observe only model-driven tool calls? → IAutoFunctionInvocationFilter
- Need all three? → Register all three. They are independent filter chains and do not conflict.
For most production applications, you will register at minimum a StructuredLoggingFilter and a ContentSafetyFilter as IFunctionInvocationFilter. Add IPromptRenderFilter when you have prompt templates that include user-supplied variables. Add IAutoFunctionInvocationFilter when you use FunctionChoiceBehavior.Auto() with tools.
Further Reading
- SK Filters documentation
- University: AI Cost Optimization for .NET Developers
- University: OpenTelemetry for AI Applications in .NET