Single-Shot vs Multi-Step Workflows
The simplest AI integration is a single-shot prompt. User provides input, you call the model, you return the response. For many use cases — summarization, translation, simple Q&A — single-shot works fine. But the moment your requirements involve multiple reasoning steps, conditional logic, or quality validation, a single prompt starts to buckle.
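For reference, a single-shot call is one prompt in, one response out. A minimal sketch with Semantic Kernel — the deployment name and environment variable names are illustrative, mirroring the setup used later in this article:

```csharp
using Microsoft.SemanticKernel;

// Single-shot: one prompt, one model call, one response.
// "gpt-4o" and the environment variable names are illustrative assumptions.
Kernel kernel = Kernel.CreateBuilder()
    .AddAzureOpenAIChatCompletion(
        deploymentName: "gpt-4o",
        endpoint: Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!,
        apiKey: Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!)
    .Build();

var summary = await kernel.InvokePromptAsync(
    "Summarize in two sentences: {{$input}}",
    new() { ["input"] = "Your source text here." });

Console.WriteLine(summary);
```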
Consider generating a product description. A single prompt might produce acceptable output. But generating a product description that follows brand guidelines, includes SEO keywords, avoids competitor mentions, reads naturally in the target market’s dialect, and fits within a 150-word limit? That’s five distinct concerns. A single prompt trying to satisfy all of them will reliably fail at least one.
Multi-step workflows decompose complex tasks into focused stages. Each stage does one thing well, validates its output, and passes a clean result to the next stage. The result is more reliable, more debuggable, and more maintainable than a monolithic prompt.
This article covers the core orchestration patterns you need for production AI workflows in .NET, with real C# implementations for each.
Pattern 1: Prompt Chaining
Prompt chaining is the foundational pattern. The output of one LLM call feeds directly into the next. Each step has a narrow focus and a well-defined contract.
Here is a three-step chain that researches a topic, drafts content, and then compresses it to a target length:
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI; // required for OpenAIPromptExecutionSettings
Kernel kernel = Kernel.CreateBuilder()
.AddAzureOpenAIChatCompletion(
deploymentName: "gpt-4o",
endpoint: Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!,
apiKey: Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!)
.Build();
// Step 1: Extract key facts
var extractFunction = kernel.CreateFunctionFromPrompt(
"""
Extract the 5 most important technical facts about the following topic.
Return them as a numbered list. Topic: {{$input}}
""",
new OpenAIPromptExecutionSettings { Temperature = 0.3f, MaxTokens = 500 });
// Step 2: Draft content from facts
var draftFunction = kernel.CreateFunctionFromPrompt(
"""
Using these facts, write a concise technical paragraph suitable for a .NET developer audience.
Facts: {{$input}}
""",
new OpenAIPromptExecutionSettings { Temperature = 0.7f, MaxTokens = 400 });
// Step 3: Compress to target length
var compressFunction = kernel.CreateFunctionFromPrompt(
"""
Rewrite the following paragraph to be under 100 words while preserving all technical accuracy.
Paragraph: {{$input}}
""",
new OpenAIPromptExecutionSettings { Temperature = 0.2f, MaxTokens = 200 });
// Execute the chain
string topic = "gRPC performance advantages over REST in .NET microservices";
var facts = await kernel.InvokeAsync(extractFunction, new() { ["input"] = topic });
var draft = await kernel.InvokeAsync(draftFunction, new() { ["input"] = facts.ToString() });
var final = await kernel.InvokeAsync(compressFunction, new() { ["input"] = draft.ToString() });
Console.WriteLine(final);
Notice the different temperature settings at each stage. Extraction benefits from low temperature (factual precision), drafting from moderate temperature (natural language flow), and compression from low temperature again (minimal creative deviation). This kind of per-step tuning is impossible with a single-shot approach.
Pattern 2: Sequential Pipeline with Error Gates
Prompt chaining becomes a pipeline when you add validation gates between stages. Gates check the intermediate output and decide whether to proceed, retry, or abort. This prevents garbage from one stage from polluting everything downstream.
public class AiPipeline
{
private readonly Kernel _kernel;
private readonly ILogger<AiPipeline> _logger;
public AiPipeline(Kernel kernel, ILogger<AiPipeline> logger)
{
_kernel = kernel;
_logger = logger;
}
public async Task<PipelineResult> ExecuteAsync(string input, CancellationToken ct = default)
{
// Stage 1: Classification
var classifyResult = await ExecuteStageAsync("Classify", input,
"Classify this customer message as: billing, technical, general, or spam. " +
"Respond with only the category.", ct);
if (!Validate(classifyResult, ["billing", "technical", "general", "spam"]))
{
_logger.LogWarning("Classification failed validation: {Result}", classifyResult);
return PipelineResult.Failed("Classification produced invalid category.");
}
if (classifyResult == "spam")
{
return PipelineResult.Rejected("Message classified as spam.");
}
// Stage 2: Sentiment analysis
var sentimentResult = await ExecuteStageAsync("Sentiment", input,
"Rate the sentiment of this message as: positive, neutral, or negative. " +
"Respond with only the sentiment.", ct);
if (!Validate(sentimentResult, ["positive", "neutral", "negative"]))
{
_logger.LogWarning("Sentiment analysis failed validation: {Result}", sentimentResult);
return PipelineResult.Failed("Sentiment analysis produced invalid result.");
}
// Stage 3: Generate response (uses classification and sentiment as context)
string responsePrompt =
$"Category: {classifyResult}\nSentiment: {sentimentResult}\n" +
$"Original message: {input}\n\n" +
"Write an appropriate customer support response.";
var response = await ExecuteStageAsync("Response", responsePrompt,
"You are a customer support agent. Write a helpful, professional response.",
ct, normalize: false);
return PipelineResult.Success(response, classifyResult, sentimentResult);
}
private async Task<string> ExecuteStageAsync(
string stageName, string input, string prompt, CancellationToken ct, bool normalize = true)
{
_logger.LogInformation("Pipeline stage {Stage} starting", stageName);
// The template must reference {{$input}}, or the argument passed below is never injected.
var function = _kernel.CreateFunctionFromPrompt(prompt + "\n\n{{$input}}");
var result = await _kernel.InvokeAsync(function, new() { ["input"] = input }, ct);
_logger.LogInformation("Pipeline stage {Stage} completed", stageName);
// Lowercase only the constrained outputs (category, sentiment) so validation is
// case-insensitive; the free-text response stage keeps its original casing.
string text = result.ToString().Trim();
return normalize ? text.ToLowerInvariant() : text;
}
private static bool Validate(string result, string[] allowedValues) =>
allowedValues.Contains(result);
}
The key insight: each stage returns a constrained output that can be validated. Classification returns one of four categories. Sentiment returns one of three values. If validation fails, the pipeline stops cleanly rather than generating a response based on a misclassified message.
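The pipeline above assumes a `PipelineResult` type that is not shown. A minimal sketch of what it might look like (the name and shape are assumptions, not a library type):

```csharp
// Hypothetical result type assumed by AiPipeline — one possible shape.
public sealed record PipelineResult(
    bool IsSuccess,
    string? Response,
    string? Category,
    string? Sentiment,
    string? Reason)
{
    public static PipelineResult Success(string response, string category, string sentiment) =>
        new(true, response, category, sentiment, null);

    // Failed = a stage produced invalid output; Rejected = a gate deliberately stopped the flow.
    public static PipelineResult Failed(string reason) =>
        new(false, null, null, null, reason);

    public static PipelineResult Rejected(string reason) =>
        new(false, null, null, null, reason);
}
```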
Pattern 3: Parallel Fan-Out / Fan-In
When multiple independent AI operations need to happen, running them sequentially wastes time. The fan-out/fan-in pattern dispatches concurrent requests and merges the results.
public async Task<ContentAnalysis> AnalyzeContentAsync(string content)
{
// Fan-out: three independent analyses run concurrently
var summaryTask = SummarizeAsync(content);
var keywordsTask = ExtractKeywordsAsync(content);
var sentimentTask = AnalyzeSentimentAsync(content);
// Fan-in: await all results
await Task.WhenAll(summaryTask, keywordsTask, sentimentTask);
return new ContentAnalysis
{
// Awaiting an already-completed task is free and avoids .Result
Summary = await summaryTask,
Keywords = await keywordsTask,
Sentiment = await sentimentTask
};
}
private async Task<string> SummarizeAsync(string content)
{
var function = _kernel.CreateFunctionFromPrompt(
"Summarize this text in 2-3 sentences: {{$input}}");
var result = await _kernel.InvokeAsync(function, new() { ["input"] = content });
return result.ToString();
}
private async Task<List<string>> ExtractKeywordsAsync(string content)
{
var function = _kernel.CreateFunctionFromPrompt(
"Extract the top 5 keywords from this text as a comma-separated list: {{$input}}");
var result = await _kernel.InvokeAsync(function, new() { ["input"] = content });
return result.ToString().Split(',').Select(k => k.Trim()).ToList();
}
private async Task<string> AnalyzeSentimentAsync(string content)
{
var function = _kernel.CreateFunctionFromPrompt(
"Analyze the sentiment of this text. Respond with: positive, neutral, or negative. {{$input}}");
var result = await _kernel.InvokeAsync(function, new() { ["input"] = content });
return result.ToString().Trim();
}
Three LLM calls that might take 2 seconds each now complete in roughly 2 seconds total instead of 6. The only requirement is that the operations are truly independent — no stage depends on the output of another.
You can also combine patterns. Fan-out the first three analyses in parallel, then chain their combined results into a final synthesis step.
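That combination might look like this — a sketch building on the `AnalyzeContentAsync` method above, with the synthesis prompt wording as an illustrative assumption:

```csharp
// Fan-out the three analyses, then chain their merged results into one synthesis step.
public async Task<string> AnalyzeAndSynthesizeAsync(string content)
{
    // Parallel fan-out / fan-in (defined earlier in this article).
    ContentAnalysis analysis = await AnalyzeContentAsync(content);

    // Sequential chaining: feed the merged results into a final prompt.
    var synthesizeFunction = _kernel.CreateFunctionFromPrompt(
        """
        Write a one-paragraph editorial brief from this analysis.
        Summary: {{$summary}}
        Keywords: {{$keywords}}
        Sentiment: {{$sentiment}}
        """);

    var result = await _kernel.InvokeAsync(synthesizeFunction, new()
    {
        ["summary"] = analysis.Summary,
        ["keywords"] = string.Join(", ", analysis.Keywords),
        ["sentiment"] = analysis.Sentiment
    });

    return result.ToString();
}
```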
Pattern 4: Human-in-the-Loop Checkpoints
For high-stakes workflows — generating customer communications, modifying data, publishing content — an automated pipeline needs human approval gates. The AI proposes; a human disposes.
public class HumanInTheLoopWorkflow
{
private readonly Kernel _kernel;
private readonly IApprovalService _approvalService;
public HumanInTheLoopWorkflow(Kernel kernel, IApprovalService approvalService)
{
_kernel = kernel;
_approvalService = approvalService;
}
public async Task<WorkflowResult> GenerateAndPublishAsync(
string topic, string assigneeEmail, CancellationToken ct)
{
// Step 1: AI generates draft
var draftFunction = _kernel.CreateFunctionFromPrompt(
"Write a technical blog post about {{$input}}. Target audience: .NET developers.",
new OpenAIPromptExecutionSettings { Temperature = 0.7f, MaxTokens = 1500 });
var draft = await _kernel.InvokeAsync(draftFunction,
new() { ["input"] = topic }, ct);
// Step 2: Request human approval
string approvalId = await _approvalService.RequestApprovalAsync(
assignee: assigneeEmail,
content: draft.ToString(),
context: $"AI-generated blog post about: {topic}",
ct);
// Step 3: Wait for human decision
ApprovalDecision decision = await _approvalService.WaitForDecisionAsync(
approvalId, timeout: TimeSpan.FromHours(24), ct);
return decision switch
{
{ Approved: true } => await PublishAsync(draft.ToString(), ct),
{ Feedback: not null } => await ReviseAndResubmitAsync(
draft.ToString(), decision.Feedback, assigneeEmail, ct),
_ => WorkflowResult.Rejected("Approval denied without feedback.")
};
}
private async Task<WorkflowResult> ReviseAndResubmitAsync(
string original, string feedback, string assigneeEmail, CancellationToken ct)
{
var reviseFunction = _kernel.CreateFunctionFromPrompt(
"""
Revise the following draft based on the editor's feedback.
Original: {{$original}}
Feedback: {{$feedback}}
""");
var revised = await _kernel.InvokeAsync(reviseFunction,
new() { ["original"] = original, ["feedback"] = feedback }, ct);
// Re-enter the approval loop with the revised version
string approvalId = await _approvalService.RequestApprovalAsync(
assignee: assigneeEmail,
content: revised.ToString(),
context: $"Revised draft based on feedback: {feedback}",
ct);
var decision = await _approvalService.WaitForDecisionAsync(
approvalId, timeout: TimeSpan.FromHours(24), ct);
return decision.Approved
? await PublishAsync(revised.ToString(), ct)
: WorkflowResult.Rejected("Revision not approved.");
}
private Task<WorkflowResult> PublishAsync(string content, CancellationToken ct)
{
// Your publishing logic
return Task.FromResult(WorkflowResult.Published(content));
}
}
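The workflow above leans on a few supporting contracts that are not shown. A minimal sketch of what they might look like — the names and shapes are assumptions, not a library API:

```csharp
// Hypothetical approval contracts assumed by HumanInTheLoopWorkflow.
public interface IApprovalService
{
    // Creates an approval request and returns its tracking id.
    Task<string> RequestApprovalAsync(
        string assignee, string content, string context, CancellationToken ct);

    // Blocks until the assignee decides, or the timeout elapses.
    Task<ApprovalDecision> WaitForDecisionAsync(
        string approvalId, TimeSpan timeout, CancellationToken ct);
}

public sealed record ApprovalDecision(bool Approved, string? Feedback);

public sealed record WorkflowResult(string Status, string? Content)
{
    public static WorkflowResult Published(string content) => new("Published", content);
    public static WorkflowResult Rejected(string reason) => new("Rejected", reason);
}
```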
The critical design decision: the workflow pauses at the approval step. It does not proceed automatically. For long-running workflows like this, where human response time is measured in hours, you need durable execution.
Pattern 5: The Agent Loop
The agent loop is the most autonomous pattern. Instead of following a predetermined sequence, the agent dynamically decides what to do next based on observations. The cycle is: observe, plan, act, evaluate.
public class ResearchAgent
{
private readonly Kernel _kernel;
private readonly int _maxIterations;
public ResearchAgent(Kernel kernel, int maxIterations = 5)
{
_kernel = kernel;
_maxIterations = maxIterations;
}
public async Task<string> ResearchAsync(string question, CancellationToken ct = default)
{
var history = new List<string>();
string currentGoal = question;
for (int i = 0; i < _maxIterations; i++)
{
// Observe: gather current state
string context = string.Join("\n", history);
// Plan: decide next action
var planFunction = _kernel.CreateFunctionFromPrompt(
"""
Goal: {{$goal}}
Research so far: {{$context}}
What is the single most important thing to investigate next?
If the research is sufficient to answer the goal, respond with: DONE
Otherwise, describe the next research step in one sentence.
""");
var plan = await _kernel.InvokeAsync(planFunction,
new() { ["goal"] = currentGoal, ["context"] = context }, ct);
string nextStep = plan.ToString().Trim();
if (nextStep.Contains("DONE", StringComparison.OrdinalIgnoreCase))
break;
// Act: execute the planned step (using tools via function calling)
var actFunction = _kernel.CreateFunctionFromPrompt(
"Research the following: {{$step}}. Provide factual findings.",
new OpenAIPromptExecutionSettings
{
FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
});
var findings = await _kernel.InvokeAsync(actFunction,
new() { ["step"] = nextStep }, ct);
// Evaluate: record findings
history.Add($"Step {i + 1}: {nextStep}\nFindings: {findings}");
}
// Synthesize final answer
var synthesizeFunction = _kernel.CreateFunctionFromPrompt(
"""
Based on the following research, provide a comprehensive answer to: {{$goal}}
Research: {{$research}}
""");
var answer = await _kernel.InvokeAsync(synthesizeFunction,
new() { ["goal"] = question, ["research"] = string.Join("\n\n", history) }, ct);
return answer.ToString();
}
}
The agent loop is powerful but carries risk. Without the _maxIterations guard, a poorly converging agent can loop indefinitely, burning tokens and time. Always set hard limits.
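An iteration cap alone does not bound wall-clock time or cost. One way to sketch a stricter guard — the class name and thresholds are illustrative; a production version might also track token usage from result metadata:

```csharp
// Sketch: a run budget that caps both iterations and wall-clock time.
public sealed class AgentRunBudget
{
    private readonly int _maxIterations;
    private readonly TimeSpan _maxDuration;
    private readonly DateTimeOffset _start = DateTimeOffset.UtcNow;
    private int _iterations;

    public AgentRunBudget(int maxIterations, TimeSpan maxDuration)
    {
        _maxIterations = maxIterations;
        _maxDuration = maxDuration;
    }

    // Returns true while the loop may continue; call once per iteration.
    public bool TryConsume()
    {
        if (_iterations >= _maxIterations) return false;
        if (DateTimeOffset.UtcNow - _start > _maxDuration) return false;
        _iterations++;
        return true;
    }
}
```

In `ResearchAsync`, the `for` loop condition could become `while (budget.TryConsume())` with `var budget = new AgentRunBudget(5, TimeSpan.FromMinutes(2));`.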
For fully realized agent orchestration with multiple agents collaborating, see our guide on Microsoft Agent Framework and multi-agent patterns.
Semantic Kernel Planner vs Custom Orchestration
Semantic Kernel includes planning capabilities where the AI model itself decides which functions to call and in what order. The recommended approach uses automatic function calling:
OpenAIPromptExecutionSettings settings = new()
{
FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
};
var result = await kernel.InvokePromptAsync(
"Analyze the sales data, create a summary, and email it to the team.",
new(settings));
Use SK’s automatic function calling when:
- The workflow is exploratory — you don’t know the exact steps in advance
- The model needs to decide which tools to use based on context
- You have well-defined plugins and trust the model’s tool selection
Use custom orchestration when:
- The workflow has a fixed, known sequence of steps
- You need validation gates between stages
- Error handling must follow specific business rules
- You need guaranteed execution order for compliance or audit purposes
- Latency budgets require parallel execution of independent steps
In practice, many production systems combine both. Custom orchestration handles the overall workflow structure, while individual stages within that workflow might use automatic function calling for flexible tool use.
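One way that combination can look: the outer sequence is fixed code, while the middle stage gives the model tool freedom. This sketch assumes plugins are already registered on the kernel; the method name and prompt wording are illustrative:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;

public static class HybridOrchestration
{
    // Stages 1 and 3 are fixed prompts; stage 2 lets the model pick registered tools.
    public static async Task<string> ProcessRequestAsync(Kernel kernel, string request)
    {
        // Fixed stage: normalize the request deterministically.
        var clarified = await kernel.InvokePromptAsync(
            "Restate this request as a precise task description: " + request);

        // Flexible stage: automatic function calling over registered plugins.
        OpenAIPromptExecutionSettings settings = new()
        {
            FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
        };
        var gathered = await kernel.InvokePromptAsync(
            $"Complete this task using the available tools: {clarified}",
            new(settings));

        // Fixed stage: deterministic formatting of whatever the model produced.
        var final = await kernel.InvokePromptAsync(
            $"Format the following as a short status report:\n{gathered}");

        return final.ToString();
    }
}
```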
Durable Execution with Azure Durable Functions
AI workflows that involve human approval, long-running processing, or multi-step operations that span minutes to hours need durable execution. Standard in-memory orchestration fails if the process restarts.
Azure Durable Functions provides exactly this. The orchestrator function’s state is automatically checkpointed, so it survives restarts, deployments, and transient failures.
[Function("AiContentWorkflow")]
public static async Task<string> RunOrchestrator(
[OrchestrationTrigger] TaskOrchestrationContext context)
{
string topic = context.GetInput<string>()!;
// Step 1: Generate draft (calls an activity that invokes the LLM)
string draft = await context.CallActivityAsync<string>(
"GenerateDraft", topic);
// Step 2: Quality check
QualityResult quality = await context.CallActivityAsync<QualityResult>(
"CheckQuality", draft);
if (!quality.PassesThreshold)
{
// Retry with feedback
draft = await context.CallActivityAsync<string>(
"ReviseDraft", new RevisionInput(draft, quality.Feedback));
}
// Step 3: Human approval (waits for external event)
// Step 3: Human approval (waits for an external event; the timeout throws if no event arrives)
bool approved;
try
{
approved = await context.WaitForExternalEvent<bool>(
"ApprovalResult",
timeout: TimeSpan.FromHours(48));
}
catch (TaskCanceledException)
{
return "Workflow expired: no approval decision within 48 hours.";
}
if (!approved)
return "Workflow rejected by reviewer.";
// Step 4: Publish
string url = await context.CallActivityAsync<string>("PublishContent", draft);
return $"Published at: {url}";
}
[Function("GenerateDraft")]
public async Task<string> GenerateDraft(
[ActivityTrigger] string topic, FunctionContext context)
{
var function = _kernel.CreateFunctionFromPrompt(
"Write a detailed technical article about: {{$input}}");
var result = await _kernel.InvokeAsync(function, new() { ["input"] = topic });
return result.ToString();
}
The WaitForExternalEvent call is the key differentiator. The orchestrator suspends, its state is persisted, and it resumes when the approval event arrives — whether that’s 5 minutes or 48 hours later. No long-running process, no polling, no state management code.
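The other half of `WaitForExternalEvent` is an endpoint that raises the event. A sketch of an HTTP-triggered function delivering the reviewer's decision — the route, payload format, and class name are assumptions:

```csharp
using System.Net;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Azure.Functions.Worker.Http;
using Microsoft.DurableTask.Client;

public static class ApprovalEndpoint
{
    // Hypothetical endpoint: POST approvals/{instanceId} with body "true" or "false".
    [Function("SubmitApproval")]
    public static async Task<HttpResponseData> SubmitApproval(
        [HttpTrigger(AuthorizationLevel.Function, "post", Route = "approvals/{instanceId}")]
        HttpRequestData req,
        [DurableClient] DurableTaskClient client,
        string instanceId)
    {
        bool approved = bool.Parse(await new StreamReader(req.Body).ReadToEndAsync());

        // Resumes the orchestration suspended at WaitForExternalEvent("ApprovalResult", ...).
        await client.RaiseEventAsync(instanceId, "ApprovalResult", approved);

        return req.CreateResponse(HttpStatusCode.Accepted);
    }
}
```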
For workflows that combine AI processing with human review and multi-step approval chains, Durable Functions eliminates an entire class of reliability concerns. You can also integrate this approach with the patterns from our RAG chatbot workshop for end-to-end AI application architectures.
Testing AI Workflows
Non-deterministic systems require non-traditional testing strategies. Here is a three-level approach.
Level 1: Unit Tests with Mocked LLM
Test your orchestration logic without calling real AI services. Mock the kernel or the AI service to return predictable responses.
[Fact]
public async Task Pipeline_ClassifiesSpam_ReturnsRejected()
{
// Arrange: mock the kernel to return "spam" for classification
var mockKernel = CreateMockKernel(responses: new Dictionary<string, string>
{
["Classify"] = "spam"
});
var pipeline = new AiPipeline(mockKernel, NullLogger<AiPipeline>.Instance);
// Act
var result = await pipeline.ExecuteAsync("Buy cheap watches now!!!");
// Assert
Assert.False(result.IsSuccess);
Assert.Equal("Message classified as spam.", result.Reason);
}
This level tests that your pipeline logic — the gates, the routing, the error handling — works correctly. It says nothing about whether the AI model will actually classify correctly.
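The `CreateMockKernel` helper used above is a hypothetical test utility, not a library API. One way to sketch it is a fake `IChatCompletionService` that matches canned replies against a substring of the incoming prompt:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Sketch: canned responses keyed by a substring of the prompt text.
public sealed class FakeChatCompletionService : IChatCompletionService
{
    private readonly Dictionary<string, string> _responses;

    public FakeChatCompletionService(Dictionary<string, string> responses) =>
        _responses = responses;

    public IReadOnlyDictionary<string, object?> Attributes { get; } =
        new Dictionary<string, object?>();

    public Task<IReadOnlyList<ChatMessageContent>> GetChatMessageContentsAsync(
        ChatHistory chatHistory,
        PromptExecutionSettings? executionSettings = null,
        Kernel? kernel = null,
        CancellationToken cancellationToken = default)
    {
        // Match the last prompt against the canned-response keys.
        string prompt = chatHistory[^1].Content ?? "";
        string reply = _responses
            .FirstOrDefault(kv => prompt.Contains(kv.Key)).Value ?? "unknown";
        return Task.FromResult<IReadOnlyList<ChatMessageContent>>(
            new List<ChatMessageContent> { new(AuthorRole.Assistant, reply) });
    }

    public IAsyncEnumerable<StreamingChatMessageContent> GetStreamingChatMessageContentsAsync(
        ChatHistory chatHistory,
        PromptExecutionSettings? executionSettings = null,
        Kernel? kernel = null,
        CancellationToken cancellationToken = default) =>
        throw new NotSupportedException("Streaming is not needed for these tests.");
}
```

`CreateMockKernel` can then register this service before building: create a `Kernel.CreateBuilder()`, add `new FakeChatCompletionService(responses)` as a singleton `IChatCompletionService`, and call `Build()`.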
Level 2: Integration Tests with Real LLM
Call the real model, but use relaxed assertions. Don’t assert exact string equality — assert structural properties.
[Fact]
[Trait("Category", "Integration")]
public async Task SummarizeChain_ProducesOutput_WithinTokenBudget()
{
var result = await _chain.SummarizeAsync(TestData.SampleArticle);
Assert.NotNull(result);
Assert.InRange(result.Split(' ').Length, 20, 100); // Word count range
Assert.Contains(TestData.ExpectedKeyTerms,
term => result.Contains(term, StringComparison.OrdinalIgnoreCase));
}
Level 3: LLM-as-Judge Evaluation
Use a separate model call to evaluate output quality against a rubric.
public async Task<EvaluationResult> EvaluateAsync(string output, string rubric)
{
var function = _evaluationKernel.CreateFunctionFromPrompt(
"""
Evaluate the following output against the rubric. Score 1-5.
Output: {{$output}}
Rubric: {{$rubric}}
Respond as JSON: {"score": N, "reasoning": "..."}
""",
new OpenAIPromptExecutionSettings { Temperature = 0f });
var result = await _evaluationKernel.InvokeAsync(function,
new() { ["output"] = output, ["rubric"] = rubric });
// The prompt asks for lowercase JSON keys, so deserialize case-insensitively.
return JsonSerializer.Deserialize<EvaluationResult>(
result.ToString(),
new JsonSerializerOptions { PropertyNameCaseInsensitive = true })!;
}
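The `EvaluationResult` type is assumed; a minimal shape matching the `{"score": N, "reasoning": "..."}` contract in the prompt might be:

```csharp
using System.Text.Json.Serialization;

// Hypothetical DTO matching the JSON contract the evaluation prompt requests.
public sealed record EvaluationResult(
    [property: JsonPropertyName("score")] int Score,
    [property: JsonPropertyName("reasoning")] string Reasoning);
```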
Pin your model versions, use low temperature for evaluation, and run evaluation suites on a schedule rather than on every commit. AI testing is about statistical confidence, not binary pass/fail.
Choosing the Right Pattern
The patterns in this article form a toolkit, not a progression. Match the pattern to the problem:
| Pattern | Best For | Trade-Off |
|---|---|---|
| Prompt chaining | Multi-step transformations | Sequential latency |
| Pipeline with gates | Quality-critical workflows | Added validation complexity |
| Parallel fan-out | Independent concurrent analyses | Higher burst API usage |
| Human-in-the-loop | High-stakes decisions | Workflow pauses for humans |
| Agent loop | Exploratory, open-ended tasks | Unpredictable cost and duration |
Start simple. A prompt chain with one validation gate handles a surprising number of real-world workflows. Add complexity only when the simpler pattern demonstrably fails to meet your requirements.