Single-Shot vs Multi-Step Workflows
The simplest AI integration is a single-shot prompt. User provides input, you call the model, you return the response. For many use cases — summarization, translation, simple Q&A — single-shot works fine. But the moment your requirements involve multiple reasoning steps, conditional logic, or quality validation, a single prompt starts to buckle.
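For reference, a single-shot call is one prompt in, one response out. A minimal sketch with Semantic Kernel — the deployment name and environment variable names are illustrative, mirroring the setup used later in this article:

```csharp
using Microsoft.SemanticKernel;

// Single-shot: one prompt, one model call, one response.
// "gpt-4o" and the environment variable names are illustrative assumptions.
Kernel kernel = Kernel.CreateBuilder()
    .AddAzureOpenAIChatCompletion(
        deploymentName: "gpt-4o",
        endpoint: Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!,
        apiKey: Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!)
    .Build();

var summary = await kernel.InvokePromptAsync(
    "Summarize in two sentences: {{$input}}",
    new() { ["input"] = "Your source text here." });

Console.WriteLine(summary);
```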
Consider generating a product description. A single prompt might produce acceptable output. But generating a product description that follows brand guidelines, includes SEO keywords, avoids competitor mentions, reads naturally in the target market’s dialect, and fits within a 150-word limit? That’s five distinct concerns. A single prompt trying to satisfy all of them will reliably fail at least one.
Multi-step workflows decompose complex tasks into focused stages. Each stage does one thing well, validates its output, and passes a clean result to the next stage. The result is more reliable, more debuggable, and more maintainable than a monolithic prompt.
This article covers the core orchestration patterns you need for production AI workflows in .NET, with real C# implementations for each.
Pattern 1: Prompt Chaining
Prompt chaining is the foundational pattern. The output of one LLM call feeds directly into the next. Each step has a narrow focus and a well-defined contract.
Here is a three-step chain that researches a topic, drafts content, and then compresses it to a target length:
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI; // required for OpenAIPromptExecutionSettings
Kernel kernel = Kernel.CreateBuilder()
.AddAzureOpenAIChatCompletion(
deploymentName: "gpt-4o",
endpoint: Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!,
apiKey: Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!)
.Build();
// Step 1: Extract key facts
var extractFunction = kernel.CreateFunctionFromPrompt(
"""
Extract the 5 most important technical facts about the following topic.
Return them as a numbered list. Topic: {{$input}}
""",
new OpenAIPromptExecutionSettings { Temperature = 0.3f, MaxTokens = 500 });
// Step 2: Draft content from facts
var draftFunction = kernel.CreateFunctionFromPrompt(
"""
Using these facts, write a concise technical paragraph suitable for a .NET developer audience.
Facts: {{$input}}
""",
new OpenAIPromptExecutionSettings { Temperature = 0.7f, MaxTokens = 400 });
// Step 3: Compress to target length
var compressFunction = kernel.CreateFunctionFromPrompt(
"""
Rewrite the following paragraph to be under 100 words while preserving all technical accuracy.
Paragraph: {{$input}}
""",
new OpenAIPromptExecutionSettings { Temperature = 0.2f, MaxTokens = 200 });
// Execute the chain
string topic = "gRPC performance advantages over REST in .NET microservices";
var facts = await kernel.InvokeAsync(extractFunction, new() { ["input"] = topic });
var draft = await kernel.InvokeAsync(draftFunction, new() { ["input"] = facts.ToString() });
var final = await kernel.InvokeAsync(compressFunction, new() { ["input"] = draft.ToString() });
Console.WriteLine(final);
Notice the different temperature settings at each stage. Extraction benefits from low temperature (factual precision), drafting from moderate temperature (natural language flow), and compression from low temperature again (minimal creative deviation). This kind of per-step tuning is impossible with a single-shot approach.
Pattern 2: Sequential Pipeline with Error Gates
Prompt chaining becomes a pipeline when you add validation gates between stages. Gates check the intermediate output and decide whether to proceed, retry, or abort. This prevents garbage from one stage from polluting everything downstream.
public class AiPipeline
{
private readonly Kernel _kernel;
private readonly ILogger<AiPipeline> _logger;
public AiPipeline(Kernel kernel, ILogger<AiPipeline> logger)
{
_kernel = kernel;
_logger = logger;
}
public async Task<PipelineResult> ExecuteAsync(string input, CancellationToken ct = default)
{
// Stage 1: Classification
var classifyResult = await ExecuteStageAsync("Classify", input,
"Classify this customer message as: billing, technical, general, or spam. " +
"Respond with only the category.", ct);
if (!Validate(classifyResult, ["billing", "technical", "general", "spam"]))
{
_logger.LogWarning("Classification failed validation: {Result}", classifyResult);
return PipelineResult.Failed("Classification produced invalid category.");
}
if (classifyResult == "spam")
{
return PipelineResult.Rejected("Message classified as spam.");
}
// Stage 2: Sentiment analysis
var sentimentResult = await ExecuteStageAsync("Sentiment", input,
"Rate the sentiment of this message as: positive, neutral, or negative. " +
"Respond with only the sentiment.", ct);
if (!Validate(sentimentResult, ["positive", "neutral", "negative"]))
{
_logger.LogWarning("Sentiment analysis failed validation: {Result}", sentimentResult);
return PipelineResult.Failed("Sentiment analysis produced invalid result.");
}
// Stage 3: Generate response (uses classification and sentiment as context)
string responsePrompt =
$"Category: {classifyResult}\nSentiment: {sentimentResult}\n" +
$"Original message: {input}\n\n" +
"Write an appropriate customer support response.";
var response = await ExecuteStageAsync("Response", responsePrompt,
"You are a customer support agent. Write a helpful, professional response.",
ct, normalize: false);
return PipelineResult.Success(response, classifyResult, sentimentResult);
}
private async Task<string> ExecuteStageAsync(
string stageName, string input, string prompt, CancellationToken ct, bool normalize = true)
{
_logger.LogInformation("Pipeline stage {Stage} starting", stageName);
// The template must reference {{$input}}, or the argument passed below is never injected.
var function = _kernel.CreateFunctionFromPrompt(prompt + "\n\n{{$input}}");
var result = await _kernel.InvokeAsync(function, new() { ["input"] = input }, ct);
_logger.LogInformation("Pipeline stage {Stage} completed", stageName);
// Lowercase only the constrained outputs (category, sentiment) so validation is
// case-insensitive; the free-text response stage keeps its original casing.
string text = result.ToString().Trim();
return normalize ? text.ToLowerInvariant() : text;
}
private static bool Validate(string result, string[] allowedValues) =>
allowedValues.Contains(result);
}
The key insight: each stage returns a constrained output that can be validated. Classification returns one of four categories. Sentiment returns one of three values. If validation fails, the pipeline stops cleanly rather than generating a response based on a misclassified message.
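The pipeline above assumes a `PipelineResult` type that is not shown. A minimal sketch of what it might look like (the name and shape are assumptions, not a library type):

```csharp
// Hypothetical result type assumed by AiPipeline — one possible shape.
public sealed record PipelineResult(
    bool IsSuccess,
    string? Response,
    string? Category,
    string? Sentiment,
    string? Reason)
{
    public static PipelineResult Success(string response, string category, string sentiment) =>
        new(true, response, category, sentiment, null);

    // Failed = a stage produced invalid output; Rejected = a gate deliberately stopped the flow.
    public static PipelineResult Failed(string reason) =>
        new(false, null, null, null, reason);

    public static PipelineResult Rejected(string reason) =>
        new(false, null, null, null, reason);
}
```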
Pattern 3: Parallel Fan-Out / Fan-In
When multiple independent AI operations need to happen, running them sequentially wastes time. The fan-out/fan-in pattern dispatches concurrent requests and merges the results.
public async Task<ContentAnalysis> AnalyzeContentAsync(string content)
{
// Fan-out: three independent analyses run concurrently
var summaryTask = SummarizeAsync(content);
var keywordsTask = ExtractKeywordsAsync(content);
var sentimentTask = AnalyzeSentimentAsync(content);
// Fan-in: await all results
await Task.WhenAll(summaryTask, keywordsTask, sentimentTask);
return new ContentAnalysis
{
// Awaiting an already-completed task is free and avoids .Result
Summary = await summaryTask,
Keywords = await keywordsTask,
Sentiment = await sentimentTask
};
}
private async Task<string> SummarizeAsync(string content)
{
var function = _kernel.CreateFunctionFromPrompt(
"Summarize this text in 2-3 sentences: {{$input}}");
var result = await _kernel.InvokeAsync(function, new() { ["input"] = content });
return result.ToString();
}
private async Task<List<string>> ExtractKeywordsAsync(string content)
{
var function = _kernel.CreateFunctionFromPrompt(
"Extract the top 5 keywords from this text as a comma-separated list: {{$input}}");
var result = await _kernel.InvokeAsync(function, new() { ["input"] = content });
return result.ToString().Split(',').Select(k => k.Trim()).ToList();
}
private async Task<string> AnalyzeSentimentAsync(string content)
{
var function = _kernel.CreateFunctionFromPrompt(
"Analyze the sentiment of this text. Respond with: positive, neutral, or negative. {{$input}}");
var result = await _kernel.InvokeAsync(function, new() { ["input"] = content });
return result.ToString().Trim();
}
Three LLM calls that might take 2 seconds each now complete in roughly 2 seconds total instead of 6. The only requirement is that the operations are truly independent — no stage depends on the output of another.
You can also combine patterns. Fan-out the first three analyses in parallel, then chain their combined results into a final synthesis step.
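That combination might look like this — a sketch building on the `AnalyzeContentAsync` method above, with the synthesis prompt wording as an illustrative assumption:

```csharp
// Fan-out the three analyses, then chain their merged results into one synthesis step.
public async Task<string> AnalyzeAndSynthesizeAsync(string content)
{
    // Parallel fan-out / fan-in (defined earlier in this article).
    ContentAnalysis analysis = await AnalyzeContentAsync(content);

    // Sequential chaining: feed the merged results into a final prompt.
    var synthesizeFunction = _kernel.CreateFunctionFromPrompt(
        """
        Write a one-paragraph editorial brief from this analysis.
        Summary: {{$summary}}
        Keywords: {{$keywords}}
        Sentiment: {{$sentiment}}
        """);

    var result = await _kernel.InvokeAsync(synthesizeFunction, new()
    {
        ["summary"] = analysis.Summary,
        ["keywords"] = string.Join(", ", analysis.Keywords),
        ["sentiment"] = analysis.Sentiment
    });

    return result.ToString();
}
```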
Pattern 4: Human-in-the-Loop Checkpoints
For high-stakes workflows — generating customer communications, modifying data, publishing content — an automated pipeline needs human approval gates. The AI proposes; a human disposes.
public class HumanInTheLoopWorkflow
{
private readonly Kernel _kernel;
private readonly IApprovalService _approvalService;
public HumanInTheLoopWorkflow(Kernel kernel, IApprovalService approvalService)
{
_kernel = kernel;
_approvalService = approvalService;
}
public async Task<WorkflowResult> GenerateAndPublishAsync(
string topic, string assigneeEmail, CancellationToken ct)
{
// Step 1: AI generates draft
var draftFunction = _kernel.CreateFunctionFromPrompt(
"Write a technical blog post about {{$input}}. Target audience: .NET developers.",
new OpenAIPromptExecutionSettings { Temperature = 0.7f, MaxTokens = 1500 });
var draft = await _kernel.InvokeAsync(draftFunction,
new() { ["input"] = topic }, ct);
// Step 2: Request human approval
string approvalId = await _approvalService.RequestApprovalAsync(
assignee: assigneeEmail,
content: draft.ToString(),
context: $"AI-generated blog post about: {topic}",
ct);
// Step 3: Wait for human decision
ApprovalDecision decision = await _approvalService.WaitForDecisionAsync(
approvalId, timeout: TimeSpan.FromHours(24), ct);
return decision switch
{
{ Approved: true } => await PublishAsync(draft.ToString(), ct),
{ Feedback: not null } => await ReviseAndResubmitAsync(
draft.ToString(), decision.Feedback, assigneeEmail, ct),
_ => WorkflowResult.Rejected("Approval denied without feedback.")
};
}
private async Task<WorkflowResult> ReviseAndResubmitAsync(
string original, string feedback, string assigneeEmail, CancellationToken ct)
{
var reviseFunction = _kernel.CreateFunctionFromPrompt(
"""
Revise the following draft based on the editor's feedback.
Original: {{$original}}
Feedback: {{$feedback}}
""");
var revised = await _kernel.InvokeAsync(reviseFunction,
new() { ["original"] = original, ["feedback"] = feedback }, ct);
// Re-enter the approval loop with the revised version
string approvalId = await _approvalService.RequestApprovalAsync(
assignee: assigneeEmail,
content: revised.ToString(),
context: $"Revised draft based on feedback: {feedback}",
ct);
var decision = await _approvalService.WaitForDecisionAsync(
approvalId, timeout: TimeSpan.FromHours(24), ct);
return decision.Approved
? await PublishAsync(revised.ToString(), ct)
: WorkflowResult.Rejected("Revision not approved.");
}
private Task<WorkflowResult> PublishAsync(string content, CancellationToken ct)
{
// Your publishing logic
return Task.FromResult(WorkflowResult.Published(content));
}
}
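The workflow above leans on a few supporting contracts that are not shown. A minimal sketch of what they might look like — the names and shapes are assumptions, not a library API:

```csharp
// Hypothetical approval contracts assumed by HumanInTheLoopWorkflow.
public interface IApprovalService
{
    // Creates an approval request and returns its tracking id.
    Task<string> RequestApprovalAsync(
        string assignee, string content, string context, CancellationToken ct);

    // Blocks until the assignee decides, or the timeout elapses.
    Task<ApprovalDecision> WaitForDecisionAsync(
        string approvalId, TimeSpan timeout, CancellationToken ct);
}

public sealed record ApprovalDecision(bool Approved, string? Feedback);

public sealed record WorkflowResult(string Status, string? Content)
{
    public static WorkflowResult Published(string content) => new("Published", content);
    public static WorkflowResult Rejected(string reason) => new("Rejected", reason);
}
```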
The critical design decision: the workflow pauses at the approval step. It does not proceed automatically. For long-running workflows like this, where human response time is measured in hours, you need durable execution.
Pattern 5: The Agent Loop
The agent loop is the most autonomous pattern. Instead of following a predetermined sequence, the agent dynamically decides what to do next based on observations. The cycle is: observe, plan, act, evaluate.
public class ResearchAgent
{
private readonly Kernel _kernel;
private readonly int _maxIterations;
public ResearchAgent(Kernel kernel, int maxIterations = 5)
{
_kernel = kernel;
_maxIterations = maxIterations;
}
public async Task<string> ResearchAsync(string question, CancellationToken ct = default)
{
var history = new List<string>();
string currentGoal = question;
for (int i = 0; i < _maxIterations; i++)
{
// Observe: gather current state
string context = string.Join("\n", history);
// Plan: decide next action
var planFunction = _kernel.CreateFunctionFromPrompt(
"""
Goal: {{$goal}}
Research so far: {{$context}}
What is the single most important thing to investigate next?
If the research is sufficient to answer the goal, respond with: DONE
Otherwise, describe the next research step in one sentence.
""");
var plan = await _kernel.InvokeAsync(planFunction,
new() { ["goal"] = currentGoal, ["context"] = context }, ct);
string nextStep = plan.ToString().Trim();
if (nextStep.Contains("DONE", StringComparison.OrdinalIgnoreCase))
break;
// Act: execute the planned step (using tools via function calling)
var actFunction = _kernel.CreateFunctionFromPrompt(
"Research the following: {{$step}}. Provide factual findings.",
new OpenAIPromptExecutionSettings
{
FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
});
var findings = await _kernel.InvokeAsync(actFunction,
new() { ["step"] = nextStep }, ct);
// Evaluate: record findings
history.Add($"Step {i + 1}: {nextStep}\nFindings: {findings}");
}
// Synthesize final answer
var synthesizeFunction = _kernel.CreateFunctionFromPrompt(
"""
Based on the following research, provide a comprehensive answer to: {{$goal}}
Research: {{$research}}
""");
var answer = await _kernel.InvokeAsync(synthesizeFunction,
new() { ["goal"] = question, ["research"] = string.Join("\n\n", history) }, ct);
return answer.ToString();
}
}
The agent loop is powerful but carries risk. Without the _maxIterations guard, a poorly converging agent can loop indefinitely, burning tokens and time. Always set hard limits.
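An iteration cap alone does not bound wall-clock time or cost. One way to sketch a stricter guard — the class name and thresholds are illustrative; a production version might also track token usage from result metadata:

```csharp
// Sketch: a run budget that caps both iterations and wall-clock time.
public sealed class AgentRunBudget
{
    private readonly int _maxIterations;
    private readonly TimeSpan _maxDuration;
    private readonly DateTimeOffset _start = DateTimeOffset.UtcNow;
    private int _iterations;

    public AgentRunBudget(int maxIterations, TimeSpan maxDuration)
    {
        _maxIterations = maxIterations;
        _maxDuration = maxDuration;
    }

    // Returns true while the loop may continue; call once per iteration.
    public bool TryConsume()
    {
        if (_iterations >= _maxIterations) return false;
        if (DateTimeOffset.UtcNow - _start > _maxDuration) return false;
        _iterations++;
        return true;
    }
}
```

In `ResearchAsync`, the `for` loop condition could become `while (budget.TryConsume())` with `var budget = new AgentRunBudget(5, TimeSpan.FromMinutes(2));`.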
For fully realized agent orchestration with multiple agents collaborating, see our guide on Microsoft Agent Framework and multi-agent patterns.
Semantic Kernel Planner vs Custom Orchestration
Semantic Kernel includes planning capabilities where the AI model itself decides which functions to call and in what order. The recommended approach uses automatic function calling:
OpenAIPromptExecutionSettings settings = new()
{
FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
};
var result = await kernel.InvokePromptAsync(
"Analyze the sales data, create a summary, and email it to the team.",
new(settings));
Use SK’s automatic function calling when:
- The workflow is exploratory — you don’t know the exact steps in advance
- The model needs to decide which tools to use based on context
- You have well-defined plugins and trust the model’s tool selection
Use custom orchestration when:
- The workflow has a fixed, known sequence of steps
- You need validation gates between stages
- Error handling must follow specific business rules
- You need guaranteed execution order for compliance or audit purposes
- Latency budgets require parallel execution of independent steps
In practice, many production systems combine both. Custom orchestration handles the overall workflow structure, while individual stages within that workflow might use automatic function calling for flexible tool use.
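One way that combination can look: the outer sequence is fixed code, while the middle stage gives the model tool freedom. This sketch assumes plugins are already registered on the kernel; the method name and prompt wording are illustrative:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;

public static class HybridOrchestration
{
    // Stages 1 and 3 are fixed prompts; stage 2 lets the model pick registered tools.
    public static async Task<string> ProcessRequestAsync(Kernel kernel, string request)
    {
        // Fixed stage: normalize the request deterministically.
        var clarified = await kernel.InvokePromptAsync(
            "Restate this request as a precise task description: " + request);

        // Flexible stage: automatic function calling over registered plugins.
        OpenAIPromptExecutionSettings settings = new()
        {
            FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
        };
        var gathered = await kernel.InvokePromptAsync(
            $"Complete this task using the available tools: {clarified}",
            new(settings));

        // Fixed stage: deterministic formatting of whatever the model produced.
        var final = await kernel.InvokePromptAsync(
            $"Format the following as a short status report:\n{gathered}");

        return final.ToString();
    }
}
```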
Durable Execution with Azure Durable Functions
AI workflows that involve human approval, long-running processing, or multi-step operations that span minutes to hours need durable execution. Standard in-memory orchestration fails if the process restarts.
Azure Durable Functions provides exactly this. The orchestrator function’s state is automatically checkpointed, so it survives restarts, deployments, and transient failures.
[Function("AiContentWorkflow")]
public static async Task<string> RunOrchestrator(
[OrchestrationTrigger] TaskOrchestrationContext context)
{
string topic = context.GetInput<string>()!;
// Step 1: Generate draft (calls an activity that invokes the LLM)
string draft = await context.CallActivityAsync<string>(
"GenerateDraft", topic);
// Step 2: Quality check
QualityResult quality = await context.CallActivityAsync<QualityResult>(
"CheckQuality", draft);
if (!quality.PassesThreshold)
{
// Retry with feedback
draft = await context.CallActivityAsync<string>(
"ReviseDraft", new RevisionInput(draft, quality.Feedback));
}
// Step 3: Human approval (waits for external event)
// Step 3: Human approval (waits for an external event; the timeout throws if no event arrives)
bool approved;
try
{
approved = await context.WaitForExternalEvent<bool>(
"ApprovalResult",
timeout: TimeSpan.FromHours(48));
}
catch (TaskCanceledException)
{
return "Workflow expired: no approval decision within 48 hours.";
}
if (!approved)
return "Workflow rejected by reviewer.";
// Step 4: Publish
string url = await context.CallActivityAsync<string>("PublishContent", draft);
return $"Published at: {url}";
}
[Function("GenerateDraft")]
public async Task<string> GenerateDraft(
[ActivityTrigger] string topic, FunctionContext context)
{
var function = _kernel.CreateFunctionFromPrompt(
"Write a detailed technical article about: {{$input}}");
var result = await _kernel.InvokeAsync(function, new() { ["input"] = topic });
return result.ToString();
}
The WaitForExternalEvent call is the key differentiator. The orchestrator suspends, its state is persisted, and it resumes when the approval event arrives — whether that’s 5 minutes or 48 hours later. No long-running process, no polling, no state management code.
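The other half of `WaitForExternalEvent` is an endpoint that raises the event. A sketch of an HTTP-triggered function delivering the reviewer's decision — the route, payload format, and class name are assumptions:

```csharp
using System.Net;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Azure.Functions.Worker.Http;
using Microsoft.DurableTask.Client;

public static class ApprovalEndpoint
{
    // Hypothetical endpoint: POST approvals/{instanceId} with body "true" or "false".
    [Function("SubmitApproval")]
    public static async Task<HttpResponseData> SubmitApproval(
        [HttpTrigger(AuthorizationLevel.Function, "post", Route = "approvals/{instanceId}")]
        HttpRequestData req,
        [DurableClient] DurableTaskClient client,
        string instanceId)
    {
        bool approved = bool.Parse(await new StreamReader(req.Body).ReadToEndAsync());

        // Resumes the orchestration suspended at WaitForExternalEvent("ApprovalResult", ...).
        await client.RaiseEventAsync(instanceId, "ApprovalResult", approved);

        return req.CreateResponse(HttpStatusCode.Accepted);
    }
}
```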
For workflows that combine AI processing with human review and multi-step approval chains, Durable Functions eliminates an entire class of reliability concerns. You can also integrate this approach with the patterns from our RAG chatbot workshop for end-to-end AI application architectures.
Testing AI Workflows
Non-deterministic systems require non-traditional testing strategies. Here is a three-level approach.
Level 1: Unit Tests with Mocked LLM
Test your orchestration logic without calling real AI services. Mock the kernel or the AI service to return predictable responses.
[Fact]
public async Task Pipeline_ClassifiesSpam_ReturnsRejected()
{
// Arrange: mock the kernel to return "spam" for classification
var mockKernel = CreateMockKernel(responses: new Dictionary<string, string>
{
["Classify"] = "spam"
});
var pipeline = new AiPipeline(mockKernel, NullLogger<AiPipeline>.Instance);
// Act
var result = await pipeline.ExecuteAsync("Buy cheap watches now!!!");
// Assert
Assert.False(result.IsSuccess);
Assert.Equal("Message classified as spam.", result.Reason);
}
This level tests that your pipeline logic — the gates, the routing, the error handling — works correctly. It says nothing about whether the AI model will actually classify correctly.
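The `CreateMockKernel` helper used above is a hypothetical test utility, not a library API. One way to sketch it is a fake `IChatCompletionService` that matches canned replies against a substring of the incoming prompt:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Sketch: canned responses keyed by a substring of the prompt text.
public sealed class FakeChatCompletionService : IChatCompletionService
{
    private readonly Dictionary<string, string> _responses;

    public FakeChatCompletionService(Dictionary<string, string> responses) =>
        _responses = responses;

    public IReadOnlyDictionary<string, object?> Attributes { get; } =
        new Dictionary<string, object?>();

    public Task<IReadOnlyList<ChatMessageContent>> GetChatMessageContentsAsync(
        ChatHistory chatHistory,
        PromptExecutionSettings? executionSettings = null,
        Kernel? kernel = null,
        CancellationToken cancellationToken = default)
    {
        // Match the last prompt against the canned-response keys.
        string prompt = chatHistory[^1].Content ?? "";
        string reply = _responses
            .FirstOrDefault(kv => prompt.Contains(kv.Key)).Value ?? "unknown";
        return Task.FromResult<IReadOnlyList<ChatMessageContent>>(
            new List<ChatMessageContent> { new(AuthorRole.Assistant, reply) });
    }

    public IAsyncEnumerable<StreamingChatMessageContent> GetStreamingChatMessageContentsAsync(
        ChatHistory chatHistory,
        PromptExecutionSettings? executionSettings = null,
        Kernel? kernel = null,
        CancellationToken cancellationToken = default) =>
        throw new NotSupportedException("Streaming is not needed for these tests.");
}
```

`CreateMockKernel` can then register this service before building: create a `Kernel.CreateBuilder()`, add `new FakeChatCompletionService(responses)` as a singleton `IChatCompletionService`, and call `Build()`.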
Level 2: Integration Tests with Real LLM
Call the real model, but use relaxed assertions. Don’t assert exact string equality — assert structural properties.
[Fact]
[Trait("Category", "Integration")]
public async Task SummarizeChain_ProducesOutput_WithinTokenBudget()
{
var result = await _chain.SummarizeAsync(TestData.SampleArticle);
Assert.NotNull(result);
Assert.InRange(result.Split(' ').Length, 20, 100); // Word count range
Assert.Contains(TestData.ExpectedKeyTerms,
term => result.Contains(term, StringComparison.OrdinalIgnoreCase));
}
Level 3: LLM-as-Judge Evaluation
Use a separate model call to evaluate output quality against a rubric.
public async Task<EvaluationResult> EvaluateAsync(string output, string rubric)
{
var function = _evaluationKernel.CreateFunctionFromPrompt(
"""
Evaluate the following output against the rubric. Score 1-5.
Output: {{$output}}
Rubric: {{$rubric}}
Respond as JSON: {"score": N, "reasoning": "..."}
""",
new OpenAIPromptExecutionSettings { Temperature = 0f });
var result = await _evaluationKernel.InvokeAsync(function,
new() { ["output"] = output, ["rubric"] = rubric });
// The prompt asks for lowercase JSON keys, so deserialize case-insensitively.
return JsonSerializer.Deserialize<EvaluationResult>(
result.ToString(),
new JsonSerializerOptions { PropertyNameCaseInsensitive = true })!;
}
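The `EvaluationResult` type is assumed; a minimal shape matching the `{"score": N, "reasoning": "..."}` contract in the prompt might be:

```csharp
using System.Text.Json.Serialization;

// Hypothetical DTO matching the JSON contract the evaluation prompt requests.
public sealed record EvaluationResult(
    [property: JsonPropertyName("score")] int Score,
    [property: JsonPropertyName("reasoning")] string Reasoning);
```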
Pin your model versions, use low temperature for evaluation, and run evaluation suites on a schedule rather than on every commit. AI testing is about statistical confidence, not binary pass/fail.
Choosing the Right Pattern
The patterns in this article form a toolkit, not a progression. Match the pattern to the problem:
| Pattern | Best For | Trade-Off |
|---|---|---|
| Prompt chaining | Multi-step transformations | Sequential latency |
| Pipeline with gates | Quality-critical workflows | Added validation complexity |
| Parallel fan-out | Independent concurrent analyses | Higher burst API usage |
| Human-in-the-loop | High-stakes decisions | Workflow pauses for humans |
| Agent loop | Exploratory, open-ended tasks | Unpredictable cost and duration |
Start simple. A prompt chain with one validation gate handles a surprising number of real-world workflows. Add complexity only when the simpler pattern demonstrably fails to meet your requirements.