
Semantic Kernel Filters in C#: Caching, Logging, Content Safety

Intermediate · .NET 9 · Microsoft.SemanticKernel 1.54.0
By Rajesh Mishra · Mar 21, 2026 · 14 min read
Verified Mar 2026
In 30 Seconds

Semantic Kernel has three filter types: IFunctionInvocationFilter (wraps all function calls), IPromptRenderFilter (wraps prompt rendering), and IAutoFunctionInvocationFilter (wraps auto-invoked tool calls). Register filters via DI with services.AddSingleton<IFunctionInvocationFilter, MyFilter>() and AddKernel(). Common patterns: caching (short-circuit with context.Result), structured logging (log before/after next()), content safety (call safety API before next()), and token usage tracking (read context.Result.Metadata after next()).

Semantic Kernel’s filter system gives you a structured way to intercept every function call, prompt render, and auto-invoked tool call in your AI pipeline. Think of it as ASP.NET Core middleware but for your AI layer — the same onion model, the same short-circuit semantics, and the same dependency injection support.

This guide covers all three filter types, five production-ready patterns, and the composition model for stacking filters without surprises.

The Three SK Filter Types

SK 1.x ships three filter interfaces. Choosing the right one determines what you intercept and when.

| Filter | Interface | Wraps | Use For |
| --- | --- | --- | --- |
| Function invocation | IFunctionInvocationFilter | Every function call (prompt and plugin) | Caching, logging, rate limiting |
| Prompt render | IPromptRenderFilter | Prompt template rendering before the model call | PII removal, injection detection |
| Auto function invocation | IAutoFunctionInvocationFilter | Tool calls triggered by FunctionChoiceBehavior.Auto() | Observability, cost tracking, approval gates |

IFunctionInvocationFilter is the workhorse. It fires for both AI prompt invocations (kernel.InvokePromptAsync) and [KernelFunction] plugin calls. If you only add one filter, make it this one.

IPromptRenderFilter fires before the fully rendered prompt string is sent to the model. It runs after template variable substitution, so context.RenderedPrompt contains the final text — useful for detecting injected instructions or scrubbing PII that slipped in through user-supplied variables.

IAutoFunctionInvocationFilter only fires when the model itself decides to call a function via function calling (FunctionChoiceBehavior.Auto()). Use it when you need an approval gate before the model executes tool calls, or when tracking which functions the model chose and how often.

Registration Patterns

SK picks up filters from two places: the host DI container and the kernel’s own filter collections.

// Option A: DI registration (preferred for production — supports constructor injection)
builder.Services.AddSingleton<IFunctionInvocationFilter, CacheFilter>();
builder.Services.AddSingleton<IFunctionInvocationFilter, LoggingFilter>();
builder.Services.AddKernel()
    .AddAzureOpenAIChatCompletion(deployment, endpoint, apiKey);
// Filters registered in DI are picked up automatically by AddKernel()

// Option B: Kernel-level (useful in scripts or tests)
var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAIChatCompletion(deployment, endpoint, apiKey)
    .Build();
kernel.FunctionInvocationFilters.Add(new LoggingFilter(logger));

The DI approach (Option A) is strongly preferred for production. Constructor injection means your filters get their ILogger, IMemoryCache, or IContentSafetyService automatically — no manual wiring. Option B requires you to construct filters by hand, which becomes error-prone as filter dependencies grow.

Pattern 1 — Caching Filter

Semantic caching is one of the highest-leverage cost optimizations you can apply to a Semantic Kernel app. The idea: if the same arguments produce the same result repeatedly, skip the AI call and return from cache.

The short-circuit mechanism is key: set context.Result and return without calling next(context). The function body never runs.

using Microsoft.SemanticKernel;
using Microsoft.Extensions.Caching.Memory;

public class SemanticCacheFilter : IFunctionInvocationFilter
{
    private readonly IMemoryCache _cache;
    private readonly TimeSpan _ttl;

    public SemanticCacheFilter(IMemoryCache cache, TimeSpan? ttl = null)
    {
        _cache = cache;
        _ttl = ttl ?? TimeSpan.FromMinutes(30);
    }

    public async Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        // Only cache AI prompt functions
        if (context.Function.PluginName != "Prompts")
        {
            await next(context);
            return;
        }

        var cacheKey = $"{context.Function.PluginName}:{context.Function.Name}:" +
            string.Join("|", context.Arguments
                .OrderBy(a => a.Key)
                .Select(a => $"{a.Key}={a.Value}"));

        if (_cache.TryGetValue(cacheKey, out string? cached))
        {
            // Short-circuit — skip the AI call entirely
            context.Result = new FunctionResult(context.Function, cached);
            return;
        }

        await next(context);

        if (context.Result.GetValue<string>() is string result)
        {
            _cache.Set(cacheKey, result, _ttl);
        }
    }
}

This filter gates on PluginName == "Prompts" so it only caches semantic functions, not every plugin call. Adjust the predicate to match your plugin naming convention.

The cache key includes the function identity and all arguments in sorted order. Sorting by key ensures {input: "hello", lang: "en"} and {lang: "en", input: "hello"} produce the same key.
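The normalization can be checked in isolation with plain C# — no SK types involved. `BuildKey` here is a hypothetical helper that mirrors the filter's key construction, not part of the filter above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper mirroring the filter's key construction:
// function identity plus arguments sorted by name (ordinal).
static string BuildKey(string plugin, string function, IDictionary<string, object?> args) =>
    $"{plugin}:{function}:" + string.Join("|", args
        .OrderBy(a => a.Key, StringComparer.Ordinal)
        .Select(a => $"{a.Key}={a.Value}"));

var a = BuildKey("Prompts", "Translate",
    new Dictionary<string, object?> { ["input"] = "hello", ["lang"] = "en" });
var b = BuildKey("Prompts", "Translate",
    new Dictionary<string, object?> { ["lang"] = "en", ["input"] = "hello" });

Console.WriteLine(a == b); // True — argument order does not affect the key
Console.WriteLine(a);      // Prompts:Translate:input=hello|lang=en
```

Note the ordinal comparer: culture-sensitive sorting could produce different keys on differently configured hosts, which would silently fragment the cache.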

Pattern 2 — Structured Logging Filter

Structured logging in a filter gives you consistent telemetry across every AI call without touching your plugin code. Log before calling next() to capture the invocation start, and log after to capture latency and outcome.

using Microsoft.SemanticKernel;
using Microsoft.Extensions.Logging;
using System.Diagnostics;

public class StructuredLoggingFilter : IFunctionInvocationFilter
{
    private readonly ILogger<StructuredLoggingFilter> _logger;

    public StructuredLoggingFilter(ILogger<StructuredLoggingFilter> logger)
    {
        _logger = logger;
    }

    public async Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        var sw = Stopwatch.StartNew();

        _logger.LogInformation(
            "Invoking {Plugin}.{Function} with {ArgCount} arguments",
            context.Function.PluginName,
            context.Function.Name,
            context.Arguments.Count);

        try
        {
            await next(context);

            _logger.LogInformation(
                "Completed {Plugin}.{Function} in {ElapsedMs}ms",
                context.Function.PluginName,
                context.Function.Name,
                sw.ElapsedMilliseconds);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex,
                "Failed {Plugin}.{Function} after {ElapsedMs}ms",
                context.Function.PluginName,
                context.Function.Name,
                sw.ElapsedMilliseconds);
            throw;
        }
    }
}

Using structured log properties ({Plugin}, {Function}, {ElapsedMs}) means Application Insights and Seq can index and query these values. A query like where Function == "SummarizeDocument" | summarize avg(ElapsedMs) gives you p50/p95 latency per function with no extra instrumentation.

Always re-throw after logging the error. The filter pipeline expects exceptions to propagate — swallowing them here would hand the caller an empty result with no indication that anything failed.

Pattern 3 — Content Safety Pre-Screen Filter

Content safety filtering belongs at the infrastructure layer, not inside individual plugins. A filter ensures every function that accepts user input is screened consistently.

public class ContentSafetyFilter : IFunctionInvocationFilter
{
    private readonly IContentSafetyService _safetyService;
    private readonly ILogger<ContentSafetyFilter> _logger;

    public ContentSafetyFilter(
        IContentSafetyService safetyService,
        ILogger<ContentSafetyFilter> logger)
    {
        _safetyService = safetyService;
        _logger = logger;
    }

    public async Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        // Check the user input argument for safety
        if (context.Arguments.TryGetValue("input", out var input) && input is string userText)
        {
            var isSafe = await _safetyService.IsSafeAsync(userText);

            if (!isSafe)
            {
                _logger.LogWarning(
                    "Content safety check failed for {Plugin}.{Function}",
                    context.Function.PluginName, context.Function.Name);

                // Short-circuit with a safe response — don't call the AI
                context.Result = new FunctionResult(context.Function,
                    "I can't help with that request.");
                return;
            }
        }

        await next(context);
    }
}

The filter checks the input argument by name. If your plugins use different argument names for user-supplied text, adjust the TryGetValue key or check multiple argument names. You can also iterate context.Arguments and screen all string-typed values if you prefer blanket coverage.

Returning a short-circuited FunctionResult with a safe refusal message is cleaner than throwing an exception — the caller gets a coherent response and you avoid unhandled exception logic higher in the stack.

Pattern 4 — Per-User Rate Limiting Filter

Rate limiting in a filter prevents a single user from burning your token budget. .NET 7+ System.Threading.RateLimiting provides a production-ready token bucket implementation.

using System.Threading.RateLimiting;
using System.Collections.Concurrent;

public class UserRateLimitFilter : IFunctionInvocationFilter
{
    private readonly ConcurrentDictionary<string, TokenBucketRateLimiter> _limiters = new();

    public async Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        // Get user ID from arguments (adjust key name to match your app)
        if (!context.Arguments.TryGetValue("userId", out var userIdObj) ||
            userIdObj is not string userId)
        {
            await next(context);
            return;
        }

        var limiter = _limiters.GetOrAdd(userId, _ => new TokenBucketRateLimiter(
            new TokenBucketRateLimiterOptions
            {
                TokenLimit = 10,
                ReplenishmentPeriod = TimeSpan.FromMinutes(1),
                TokensPerPeriod = 10,
                AutoReplenishment = true,
                QueueLimit = 0
            }));

        using var lease = await limiter.AcquireAsync(1);
        if (!lease.IsAcquired)
        {
            throw new InvalidOperationException(
                $"Rate limit exceeded for user {userId}. Try again in a moment.");
        }

        await next(context);
    }
}

The ConcurrentDictionary grows unbounded unless stale users are evicted. For production, back this with a distributed store (Redis) or apply ASP.NET Core's built-in rate limiting middleware (UseRateLimiter) at the HTTP layer and pass the decision in as an argument. The filter approach earns its keep when you need per-function rate limits rather than per-endpoint limits.
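The token-bucket semantics are easy to observe with nothing but the BCL. A standalone sketch (the limits are illustrative; AutoReplenishment is disabled so the demo is deterministic):

```csharp
using System;
using System.Threading.RateLimiting;

// A bucket of 3 tokens, no queueing: acquisitions beyond the limit fail immediately.
var limiter = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
{
    TokenLimit = 3,
    TokensPerPeriod = 3,
    ReplenishmentPeriod = TimeSpan.FromMinutes(1),
    AutoReplenishment = false, // manual control for the demo
    QueueLimit = 0
});

for (var i = 1; i <= 4; i++)
{
    using var lease = limiter.AttemptAcquire(1);
    Console.WriteLine($"request {i}: acquired={lease.IsAcquired}");
}
// requests 1-3 acquire; request 4 is rejected until the bucket replenishes
```

With QueueLimit = 0 there is no waiting: a rejected lease comes back immediately with IsAcquired == false, which is what lets the filter fail fast instead of stalling the pipeline.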

Pattern 5 — IPromptRenderFilter for PII Detection

IPromptRenderFilter fires after template variables are substituted but before the prompt is sent to the model. This is the right place to detect PII that users injected through form inputs.

using Microsoft.SemanticKernel;

public class PiiDetectionFilter : IPromptRenderFilter
{
    private readonly ILogger<PiiDetectionFilter> _logger;
    private static readonly string[] _piiPatterns =
        ["\\b\\d{3}-\\d{2}-\\d{4}\\b", // SSN
         "\\b\\d{16}\\b"];               // Credit card

    public PiiDetectionFilter(ILogger<PiiDetectionFilter> logger)
    {
        _logger = logger;
    }

    public async Task OnPromptRenderAsync(
        PromptRenderContext context,
        Func<PromptRenderContext, Task> next)
    {
        await next(context);

        // context.RenderedPrompt is available after next()
        if (ContainsPii(context.RenderedPrompt))
        {
            _logger.LogWarning(
                "Potential PII detected in prompt for {Plugin}.{Function}",
                context.Function.PluginName, context.Function.Name);

            // Optionally redact or throw — here we just log
        }
    }

    private static bool ContainsPii(string? prompt)
    {
        if (prompt is null) return false;
        return _piiPatterns.Any(p => System.Text.RegularExpressions.Regex.IsMatch(prompt, p));
    }
}

Call await next(context) first, then read context.RenderedPrompt — the prompt is only populated after rendering completes. To modify what the model receives, overwrite context.RenderedPrompt after next() returns: the render filter finishes before the model call, so the overwritten value is what gets sent.

Note that IPromptRenderFilter fires for every prompt render in the kernel, including internal SK templates used for function schema generation and planning. To avoid log spam, always check context.Function.PluginName and only act on your own plugin prompts.

Composing Multiple Filters

Filters execute in an onion model: the first registered filter wraps all the others. Before-next() code runs in registration order; after-next() code runs in reverse. This mirrors ASP.NET Core middleware — the outermost filter sees the request first and the response last.

// Registration order determines execution order
builder.Services.AddSingleton<IFunctionInvocationFilter, StructuredLoggingFilter>(); // Outermost
builder.Services.AddSingleton<IFunctionInvocationFilter, UserRateLimitFilter>();
builder.Services.AddSingleton<IFunctionInvocationFilter, ContentSafetyFilter>();
builder.Services.AddSingleton<IFunctionInvocationFilter, SemanticCacheFilter>();     // Innermost

builder.Services.AddMemoryCache();
builder.Services.AddKernel()
    .AddAzureOpenAIChatCompletion(deployment, endpoint, apiKey);

Execution flow:

LoggingFilter (before)
  → RateLimitFilter (before)
    → ContentSafetyFilter (before)
      → CacheFilter (before) → Function → CacheFilter (after)
    ← ContentSafetyFilter (after)
  ← RateLimitFilter (after)
← LoggingFilter (after)

Register StructuredLoggingFilter outermost so it captures the total latency of all filters plus the function, including cache hits. If logging were innermost, the cache filter would short-circuit before logging ever ran — cache hits would vanish from your telemetry entirely.

Register SemanticCacheFilter innermost so the rate limiter and content safety check run even on cached requests. This ensures a user can’t bypass safety checks by hitting the cache.

Testing Filter Ordering

The easiest way to verify execution order in unit tests is to collect a trace list from each filter:

var trace = new List<string>();

var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAIChatCompletion(deployment, endpoint, apiKey)
    .Build();

kernel.FunctionInvocationFilters.Add(new TraceFilter(trace, "outer"));
kernel.FunctionInvocationFilters.Add(new TraceFilter(trace, "inner"));

await kernel.InvokePromptAsync("hello");

// Expected: ["outer-before", "inner-before", "inner-after", "outer-after"]
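TraceFilter itself is not shown above; a minimal version, assuming the SK filter types used throughout this article, could look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

public class TraceFilter : IFunctionInvocationFilter
{
    private readonly List<string> _trace;
    private readonly string _name;

    public TraceFilter(List<string> trace, string name)
    {
        _trace = trace;
        _name = name;
    }

    public async Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        _trace.Add($"{_name}-before"); // runs in registration order
        await next(context);
        _trace.Add($"{_name}-after");  // runs in reverse order
    }
}
```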

This pattern is useful during development to confirm your filter stack behaves as designed before adding real logic.

IAutoFunctionInvocationFilter for Tool Call Observability

When using FunctionChoiceBehavior.Auto(), the model decides which functions to call. IAutoFunctionInvocationFilter lets you observe or gate those model-driven calls separately from manually invoked functions.

public class AutoInvokeObservabilityFilter : IAutoFunctionInvocationFilter
{
    private readonly ILogger<AutoInvokeObservabilityFilter> _logger;

    public AutoInvokeObservabilityFilter(ILogger<AutoInvokeObservabilityFilter> logger)
    {
        _logger = logger;
    }

    public async Task OnAutoFunctionInvocationAsync(
        AutoFunctionInvocationContext context,
        Func<AutoFunctionInvocationContext, Task> next)
    {
        _logger.LogInformation(
            "Model auto-invoking {Plugin}.{Function} (iteration {Iteration})",
            context.Function.PluginName,
            context.Function.Name,
            context.FunctionSequenceIndex);

        await next(context);

        // Set Terminate = true to stop the auto-invoke loop after this call
        // context.Terminate = true;
    }
}

context.FunctionSequenceIndex tells you which function in the current auto-invoke sequence this is — useful for detecting runaway loops where the model keeps calling tools without converging. Set context.Terminate = true to force the loop to stop after the current function returns.

Choosing Between Filter Types: Decision Guide

  • Need to intercept every AI call and plugin call? → IFunctionInvocationFilter
  • Need to inspect or modify the prompt text before the model sees it? → IPromptRenderFilter
  • Need to gate or observe only model-driven tool calls? → IAutoFunctionInvocationFilter
  • Need all three? → Register all three. They are independent filter chains and do not conflict.

For most production applications, you will register at minimum a StructuredLoggingFilter and a ContentSafetyFilter as IFunctionInvocationFilter. Add IPromptRenderFilter when you have prompt templates that include user-supplied variables. Add IAutoFunctionInvocationFilter when you use FunctionChoiceBehavior.Auto() with tools.


⚠ Production Considerations

  • Filters registered via kernel.FunctionInvocationFilters.Add() are per-kernel-instance. In a Scoped kernel DI setup, you need to re-add filters for each new kernel scope. Register via DI services.AddSingleton<IFunctionInvocationFilter, MyFilter>() instead to ensure filters are always present regardless of how the Kernel is resolved.
  • IPromptRenderFilter fires for every prompt render, including internal SK prompt templates used for planning and tool schema generation. Logging every prompt render can generate significant log volume. Filter by context.Function.PluginName to target only your application's prompts.


🧠 Architect’s Note

SK filters are the right place for cross-cutting concerns: logging, caching, rate limiting, and content safety. Keep your business logic in [KernelFunction] plugins. This separation lets you swap or A/B test filter implementations without touching plugin code — and it makes testing plugins independently of infrastructure concerns much simpler.


Key Takeaways

  • Three filter types: IFunctionInvocationFilter, IPromptRenderFilter, IAutoFunctionInvocationFilter
  • Short-circuit execution: set context.Result and return WITHOUT calling next(context)
  • Register via DI: services.AddSingleton<IFunctionInvocationFilter, MyFilter>()
  • Filters run in registration order (before) and reverse order (after)
  • IPromptRenderFilter can read and modify context.RenderedPrompt before model call

Implementation Checklist

  • Choose the right filter type for your use case (function vs prompt render vs auto-invoke)
  • Register filters via DI for constructor injection support
  • For caching: check cache before next(context), set context.Result to skip execution
  • For logging: log before and after next(context) to capture args and result
  • For content safety: call safety API before next(context); if flagged, short-circuit with a safe refusal by setting context.Result
  • Test filter ordering by registering filters in different sequences

Frequently Asked Questions

What are the three filter types in Semantic Kernel?

IFunctionInvocationFilter wraps every function call (both AI calls and [KernelFunction] plugins). IPromptRenderFilter wraps the prompt template rendering step before the prompt is sent to the model. IAutoFunctionInvocationFilter wraps only the automatic function calls triggered by FunctionChoiceBehavior.Auto().

How do I register a Semantic Kernel filter in ASP.NET Core?

Two ways: (1) Kernel-level — kernel.FunctionInvocationFilters.Add(new MyFilter()); or (2) DI-level — builder.Services.AddSingleton<IFunctionInvocationFilter, MyFilter>(), which is picked up automatically by AddKernel(). DI registration is preferred for production as it supports constructor injection.

How do I short-circuit a function call from a filter without executing it?

Set context.Result before returning, without calling next(context). Example: if the cache has a hit, set context.Result = new FunctionResult(context.Function, cachedValue) and return. The function body never executes. This is the mechanism for both caching and content safety pre-screening.

What is the correct IFunctionInvocationFilter interface signature?

Task OnFunctionInvocationAsync(FunctionInvocationContext context, Func<FunctionInvocationContext, Task> next). Always await next(context) to execute the function. Access function metadata via context.Function.Name and context.Function.PluginName. Access arguments via context.Arguments. Access the result after next() via context.Result.

Can I use IPromptRenderFilter to modify the prompt before it is sent to the model?

Yes. In OnPromptRenderAsync, call await next(context) first; context.RenderedPrompt then contains the fully rendered prompt. Overwrite context.RenderedPrompt after next() to change what gets sent — the render filter completes before the model call, so the modified value takes effect. Use this for prompt injection detection or PII removal.

How do I read token usage from a filter after an AI call?

After await next(context), read context.Result.Metadata. For chat completions via the OpenAI-family connectors, the metadata typically contains a 'Usage' key; its CLR type varies by connector version, so inspect the object or pattern-match rather than hard-casting. You can also read context.Result.GetValue<ChatMessageContent>() and check its Metadata. Alternatively, rely on SK's OpenTelemetry instrumentation.
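As a sketch — hedged, because the 'Usage' metadata key and its shape depend on the connector and version you use, so verify against what your connector actually emits:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;
using Microsoft.SemanticKernel;

public class TokenUsageFilter : IFunctionInvocationFilter
{
    private readonly ILogger<TokenUsageFilter> _logger;

    public TokenUsageFilter(ILogger<TokenUsageFilter> logger) => _logger = logger;

    public async Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        await next(context);

        // "Usage" is what the OpenAI-family connectors emit; log the raw
        // object rather than hard-casting to a connector-specific type.
        if (context.Result.Metadata is { } md && md.TryGetValue("Usage", out var usage))
        {
            _logger.LogInformation(
                "{Plugin}.{Function} usage: {Usage}",
                context.Function.PluginName, context.Function.Name, usage);
        }
    }
}
```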

What is the execution order when multiple filters are registered?

Filters execute in registration order for the 'before' phase and in reverse order for the 'after' phase — like an onion model. The first registered filter is the outermost wrapper. Register logging first and caching second so logging captures cache hits too.


#Semantic Kernel #Filters #Middleware #IFunctionInvocationFilter #.NET AI