From Theory to Practice
In the previous article, we explored how LLMs generate text through next-token prediction. That understanding is not academic — it directly informs how you write prompts that produce reliable results.
Prompt engineering is where mechanical understanding meets practical application. The model predicts tokens one at a time, influenced by everything in its context window. Your job is to fill that context with information that biases the model toward the output you need.
This article covers the fundamentals with C# code you can use in production. We will work with Microsoft.Extensions.AI and Azure OpenAI, but the prompting principles apply to any LLM provider.
The Anatomy of a Prompt: Message Roles
Modern LLM APIs structure conversations as a sequence of messages, each with a role. Understanding these roles is foundational.
System Message
The system message sets the model’s behavior for the entire conversation. It is processed before any user input and influences every response the model generates. Think of it as configuration — it defines who the model is and how it should behave.
User Message
User messages contain the actual requests. In a chatbot, these are literally what the user types. In an automated pipeline, these are the prompts your application constructs.
Assistant Message
Assistant messages represent the model’s previous responses. Including them in the conversation history gives the model context about what it has already said, enabling multi-turn conversations.
Here is how these roles look in C# using Microsoft.Extensions.AI:
using Microsoft.Extensions.AI;
var messages = new List<ChatMessage>
{
    new(ChatRole.System, "You are a senior .NET architect. Answer questions about C# and .NET with precise, production-focused guidance. Use code examples when helpful."),
    new(ChatRole.User, "How should I implement the repository pattern with EF Core in .NET 9?")
};

var response = await chatClient.GetResponseAsync(messages);
Console.WriteLine(response.Text);
The order matters. System first, then alternating user and assistant messages to build conversation history. The model sees the entire message sequence and generates its response based on all of it.
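To continue the conversation, append the assistant's reply and the next user turn to the same list before calling the model again. A minimal sketch building on the snippet above (the follow-up question is illustrative):

```csharp
// Sketch: carry the conversation forward by appending the assistant's reply
// and the next user message, then calling the model again with the full history.
messages.Add(new(ChatRole.Assistant, response.Text));
messages.Add(new(ChatRole.User, "Can you show a unit-of-work variant of that repository?"));

var followUp = await chatClient.GetResponseAsync(messages);
Console.WriteLine(followUp.Text);
```

The model has no memory between calls; the only "state" is whatever message history you resend.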
Designing Effective System Messages
The system message is the single highest-leverage prompt engineering decision. A well-designed system message can eliminate entire categories of bad output. A vague one leaves the model guessing.
The Four Components
Every production system message should address four concerns:
1. Role — Who is the model?
You are a .NET technical documentation assistant for the DotNetStudioAI platform.
2. Task — What should it do?
Answer developer questions about C#, .NET, Azure AI services, and Semantic Kernel. Provide code examples targeting .NET 9 unless a different version is specified.
3. Format — How should it respond?
Use markdown formatting. Include code blocks with language identifiers. Keep responses concise — under 500 words unless the question requires a detailed walkthrough.
4. Boundaries — What should it refuse?
If a question is outside .NET development or AI integration, politely redirect. Do not speculate about unreleased features. Do not provide security advice beyond referencing official Microsoft documentation.
Combined into a single system message:
const string SystemPrompt = """
    You are a .NET technical documentation assistant for the DotNetStudioAI platform.
    Answer developer questions about C#, .NET, Azure AI services, and Semantic Kernel.
    Provide code examples targeting .NET 9 unless a different version is specified.
    Use markdown formatting. Include code blocks with language identifiers.
    Keep responses concise — under 500 words unless the question requires a detailed walkthrough.
    If a question is outside .NET development or AI integration, politely redirect.
    Do not speculate about unreleased features.
    Do not provide security advice beyond referencing official Microsoft documentation.
    """;
Notice the use of C# raw string literals ("""). They are ideal for multi-line system prompts — no escaping needed, clean indentation, easy to read and maintain.
Few-Shot Prompting
Few-shot prompting gives the model examples of the exact input-output pattern you want. It is the most reliable technique for controlling output format without fine-tuning.
When Zero-Shot Is Not Enough
Zero-shot prompting — just telling the model what to do without examples — works for simple, well-understood tasks. But when the output needs to follow a specific format or convention that the model would not naturally adopt, examples are far more effective than lengthy instructions.
Building Few-Shot Prompts in C#
In the message-based API, few-shot examples are pairs of user and assistant messages injected into the conversation before the real question:
var messages = new List<ChatMessage>
{
    new(ChatRole.System, "You are an API that classifies .NET exceptions into categories. Respond with only the category name."),

    // Few-shot example 1
    new(ChatRole.User, "System.NullReferenceException: Object reference not set to an instance of an object."),
    new(ChatRole.Assistant, "Null Reference"),

    // Few-shot example 2
    new(ChatRole.User, "System.Net.Http.HttpRequestException: Connection refused (localhost:5001)"),
    new(ChatRole.Assistant, "Network/Connectivity"),

    // Few-shot example 3
    new(ChatRole.User, "System.Text.Json.JsonException: The JSON value could not be converted to System.Int32."),
    new(ChatRole.Assistant, "Serialization"),

    // Actual request
    new(ChatRole.User, "System.InvalidOperationException: Sequence contains no elements")
};

var response = await chatClient.GetResponseAsync(messages);
// Typical output: a short category name such as "Collection/LINQ"; the model
// generalizes the pattern, so validate the result before relying on it
Three to five examples typically suffice. More examples consume tokens (and cost money) without proportionally improving quality. Choose examples that cover the diversity of cases you expect.
Dynamic Few-Shot Selection
For production systems, you might select examples dynamically based on the input. This is especially powerful when combined with embeddings — retrieve the most similar examples from a database rather than using a fixed set:
using System.Numerics.Tensors;
using Microsoft.Extensions.AI;

// The shape of FewShotExample is assumed here; adjust to your data model.
public record FewShotExample(string Input, string Output, ReadOnlyMemory<float> Embedding);

public class FewShotExampleSelector
{
    private readonly IEmbeddingGenerator<string, Embedding<float>> _embedder;
    private readonly List<FewShotExample> _examples;

    public FewShotExampleSelector(
        IEmbeddingGenerator<string, Embedding<float>> embedder,
        List<FewShotExample> examples)
    {
        _embedder = embedder;
        _examples = examples;
    }

    public async Task<List<FewShotExample>> SelectExamplesAsync(
        string userInput, int count = 3)
    {
        var inputEmbedding = await _embedder.GenerateAsync([userInput]);

        return _examples
            .Select(ex => new
            {
                Example = ex,
                // TensorPrimitives.CosineSimilarity ships in System.Numerics.Tensors
                Similarity = TensorPrimitives.CosineSimilarity(
                    inputEmbedding[0].Vector.Span, ex.Embedding.Span)
            })
            .OrderByDescending(x => x.Similarity)
            .Take(count)
            .Select(x => x.Example)
            .ToList();
    }
}
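Wiring the selected examples into a prompt might look like this (a sketch; `selector` is an instance of the class above, and the `Input`/`Output` property names on `FewShotExample` are assumptions):

```csharp
// Sketch: splice dynamically selected examples into the message list as
// user/assistant pairs before the real input.
var userInput = "System.ArgumentOutOfRangeException: Index was out of range.";
var examples = await selector.SelectExamplesAsync(userInput, count: 3);

var messages = new List<ChatMessage>
{
    new(ChatRole.System, "You are an API that classifies .NET exceptions into categories. Respond with only the category name.")
};

foreach (var ex in examples)
{
    messages.Add(new(ChatRole.User, ex.Input));
    messages.Add(new(ChatRole.Assistant, ex.Output));
}

messages.Add(new(ChatRole.User, userInput));
```

Because the examples closest to the real input are the ones included, the model sees the most relevant demonstrations for every request.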
Chain-of-Thought Prompting
Chain-of-thought (CoT) prompting asks the model to reason through a problem step by step before providing a final answer. This significantly improves accuracy on tasks that require multi-step reasoning — math, logic, code analysis, complex decision-making.
The technique works because of how next-token prediction works. When the model generates intermediate reasoning tokens, those tokens become part of its context and influence subsequent token predictions. The reasoning steps literally help the model “think” its way to a better answer.
var messages = new List<ChatMessage>
{
    new(ChatRole.System, """
        You are a .NET performance analyst. When analyzing code for performance issues:
        1. First, identify what the code does at a high level
        2. Then, analyze each operation's time and space complexity
        3. Identify specific bottlenecks with line references
        4. Finally, provide your recommendation with corrected code
        Always show your reasoning before your conclusion.
        """),
    new(ChatRole.User, """
        Analyze this code for performance issues:

        public List<Customer> GetActiveCustomers(List<Customer> customers, List<Order> orders)
        {
            var result = new List<Customer>();
            foreach (var customer in customers)
            {
                foreach (var order in orders)
                {
                    if (order.CustomerId == customer.Id && order.IsActive)
                    {
                        if (!result.Contains(customer))
                            result.Add(customer);
                    }
                }
            }
            return result;
        }
        """)
};
The model will walk through its analysis step by step rather than jumping to a conclusion. This produces more accurate and more useful output.
Structured Output: Getting JSON from LLMs
One of the most common tasks in production AI applications is extracting structured data from model responses. C# developers need strongly-typed objects, not free-form text.
The Prompt Strategy
Three elements work together: system instructions specifying the format, an example of the expected structure, and a low temperature to minimize deviation.
var messages = new List<ChatMessage>
{
    new(ChatRole.System, """
        You are a data extraction API. Extract structured information from
        user-provided text and return it as JSON.

        Always respond with valid JSON matching this exact schema:
        {
          "entities": [
            {
              "name": "string",
              "type": "Person | Organization | Technology",
              "context": "string (brief description of how the entity appears)"
            }
          ],
          "summary": "string (one-sentence summary)",
          "sentiment": "positive | neutral | negative"
        }

        Do not include any text outside the JSON object.
        """),
    new(ChatRole.User, "Microsoft announced that Semantic Kernel 2.0 will include native support for multi-agent orchestration, building on the success of their AI integration in Visual Studio and GitHub Copilot.")
};
Enabling JSON Mode
Azure OpenAI and other providers support a JSON mode that guarantees the response is valid JSON (though not necessarily matching your schema — that is your responsibility to validate):
using OpenAI.Chat; // ChatCompletionOptions and ChatResponseFormat live in the OpenAI SDK that Azure.AI.OpenAI builds on

var options = new ChatCompletionOptions
{
    Temperature = 0.1f,
    ResponseFormat = ChatResponseFormat.CreateJsonObjectFormat()
};
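If you are calling the model through Microsoft.Extensions.AI's IChatClient rather than the OpenAI SDK directly, the equivalent knob is ChatOptions.ResponseFormat. A sketch, assuming the chatClient and messages from earlier:

```csharp
using Microsoft.Extensions.AI;

// Sketch: JSON mode via the provider-agnostic abstraction. Note that
// Microsoft.Extensions.AI.ChatResponseFormat is a different type from the
// OpenAI SDK class of the same name.
var options = new ChatOptions
{
    Temperature = 0.1f,
    ResponseFormat = ChatResponseFormat.Json
};

var response = await chatClient.GetResponseAsync(messages, options);
```

The abstraction maps this to the provider's native JSON mode where one exists, which keeps your code portable across providers.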
Deserializing and Validating in C#
Never trust model output blindly. Always validate:
using System.Text.Json;

public record ExtractionResult(
    List<ExtractedEntity> Entities,
    string Summary,
    string Sentiment);

public record ExtractedEntity(
    string Name,
    string Type,
    string Context);

public static ExtractionResult? ParseAndValidate(string modelOutput)
{
    try
    {
        var result = JsonSerializer.Deserialize<ExtractionResult>(
            modelOutput,
            new JsonSerializerOptions { PropertyNameCaseInsensitive = true });

        if (result is null || result.Entities is null || result.Summary is null)
            return null;

        // Validate enum-like fields
        var validTypes = new HashSet<string> { "Person", "Organization", "Technology" };
        if (result.Entities.Any(e => !validTypes.Contains(e.Type)))
            return null;

        var validSentiments = new HashSet<string> { "positive", "neutral", "negative" };
        if (!validSentiments.Contains(result.Sentiment))
            return null;

        return result;
    }
    catch (JsonException)
    {
        return null;
    }
}
This pattern — prompt for JSON, enable JSON mode, deserialize with validation — is the standard approach for structured extraction in production .NET applications.
Temperature and Max Tokens: Tuning for Your Task
These two parameters have the most immediate impact on output quality and cost.
Temperature by Task Type
| Task | Temperature | Reasoning |
|---|---|---|
| JSON extraction | 0 - 0.1 | Deterministic output, minimal variation |
| Code generation | 0 - 0.2 | Correctness over creativity |
| Summarization | 0.3 - 0.5 | Some variety in phrasing, consistent content |
| Conversational chat | 0.5 - 0.7 | Natural-sounding, varied responses |
| Creative writing | 0.7 - 1.0 | Diverse, unexpected outputs |
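To keep these defaults consistent across a codebase, one option is to centralize them behind a small helper. A sketch; the TaskKind enum and the exact values are illustrative choices within the ranges above, not from any library:

```csharp
// Illustrative defaults drawn from the ranges in the table above.
public enum TaskKind { JsonExtraction, CodeGeneration, Summarization, Chat, CreativeWriting }

public static class TemperatureDefaults
{
    public static float For(TaskKind kind) => kind switch
    {
        TaskKind.JsonExtraction  => 0.0f,
        TaskKind.CodeGeneration  => 0.1f,
        TaskKind.Summarization   => 0.4f,
        TaskKind.Chat            => 0.6f,
        TaskKind.CreativeWriting => 0.8f,
        _ => 0.6f
    };
}
```

At a call site this becomes `new ChatOptions { Temperature = TemperatureDefaults.For(TaskKind.Summarization) }`, so a tuning change happens in one place instead of being scattered across handlers.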
Max Tokens
MaxOutputTokens on Microsoft.Extensions.AI's ChatOptions (MaxOutputTokenCount on the OpenAI SDK's ChatCompletionOptions) caps the model's response length. Set it thoughtfully:
var options = new ChatCompletionOptions
{
    Temperature = 0.1f,
    MaxOutputTokenCount = 500 // Cap response at ~375 words
};
Setting this too low truncates useful responses mid-sentence. Setting it too high wastes money when the model generates unnecessary padding. For JSON extraction, you can often estimate the maximum response size from your schema and set a tight limit.
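The estimate can start from a simple heuristic: for English text, one token is roughly four characters. A sketch (not a real tokenizer; when you need exact counts, use a tokenizer library such as Microsoft.ML.Tokenizers):

```csharp
using System;

public static class TokenBudget
{
    // Rough heuristic: ~4 characters per token for English text. The headroom
    // multiplier leaves room for outputs somewhat longer than the sample.
    public static int Estimate(string sampleOutput, double headroom = 1.25)
    {
        int estimatedTokens = (int)Math.Ceiling(sampleOutput.Length / 4.0);
        return (int)(estimatedTokens * headroom);
    }
}
```

Feed it a representative sample of your expected JSON output and use the result as the MaxOutputTokenCount value, rather than guessing a round number.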
Common Anti-Patterns and Fixes
After working with dozens of production LLM integrations, certain mistakes appear repeatedly. Here are the most damaging ones and how to fix them.
Anti-Pattern 1: Vague System Messages
Bad:
You are a helpful assistant.
Better:
You are a .NET 9 code review assistant. Analyze C# code for bugs, performance issues, and style violations. Reference specific line numbers. Suggest fixes with corrected code snippets. If the code has no issues, say so explicitly.
The vague version gives the model no guidance. It will produce generic, unfocused output. The specific version constrains the model’s behavior in ways that match your application’s needs.
Anti-Pattern 2: Instructions Without Structure
Bad:
Summarize this document and also extract any dates mentioned and list the people involved and tell me if the sentiment is positive or negative.
Better:
Analyze the following document. Provide your analysis in these sections:
## Summary
Two to three sentences summarizing the key points.
## People Mentioned
Bulleted list of names with their role or context.
## Key Dates
Bulleted list of dates with associated events.
## Overall Sentiment
One word: positive, neutral, or negative. Followed by a one-sentence justification.
Structured instructions produce structured output. The model follows formatting patterns when they are clearly demonstrated.
Anti-Pattern 3: Ignoring Token Economics
Every token in your system prompt is charged on every request. A 3,000-token system prompt across 100,000 daily requests is 300 million tokens per day in system prompt alone. At typical pricing, that adds up fast.
Audit your system prompts. Remove redundant instructions. Move examples to few-shot messages that you can conditionally include. Measure whether each sentence in your system prompt actually changes the model’s behavior — if it does not, remove it.
Anti-Pattern 4: No Retry or Validation
LLM responses are probabilistic. Even with temperature 0, the model can occasionally produce malformed output (especially at API boundaries like rate limits or timeouts). Always implement:
public async Task<ExtractionResult> ExtractWithRetryAsync(
    IChatClient client,
    List<ChatMessage> messages,
    int maxRetries = 3)
{
    for (int attempt = 0; attempt < maxRetries; attempt++)
    {
        var response = await client.GetResponseAsync(messages);
        var result = ParseAndValidate(response.Text);

        if (result is not null)
            return result;

        // Log the failed attempt for prompt iteration
        // (Log here is a static logger, e.g. Serilog's Log class)
        Log.Warning("Extraction attempt {Attempt} failed. Raw output: {Output}",
            attempt + 1, response.Text);
    }

    throw new InvalidOperationException(
        $"Failed to extract valid result after {maxRetries} attempts");
}
Building Reusable Prompt Templates
Scattering prompt strings across your codebase is a maintenance nightmare. Centralize them in template classes that are easy to find, test, and iterate on.
public static class PromptTemplates
{
    public static string CodeReviewSystem(string dotnetVersion = "9") => $"""
        You are a code review assistant specializing in .NET {dotnetVersion} and C#.

        For each code snippet submitted:
        1. Identify bugs, including null reference risks and unhandled exceptions
        2. Flag performance concerns with estimated impact
        3. Note style issues per current .NET conventions
        4. Provide corrected code for any issues found

        If the code is clean, explicitly state that no issues were found.
        Respond in markdown with separate sections for each category.
        """;

    public static string EntityExtractionSystem(string[] entityTypes) => $"""
        You are a named entity extraction API.
        Extract entities of these types: {string.Join(", ", entityTypes)}.
        Return a JSON array of objects with "name", "type", and "context" fields.
        Return an empty array if no entities are found.
        Do not include any text outside the JSON array.
        """;

    public static string SummarizationUser(string content, int maxSentences = 3) => $"""
        Summarize the following content in {maxSentences} sentences or fewer.
        Focus on actionable information relevant to software engineers.

        Content:
        {content}
        """;
}
This approach keeps prompt logic testable and versionable. When you need to iterate on a prompt — and you will — you change it in one place.
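At a call site, usage then reduces to referencing the template (a sketch, assuming the chatClient from the earlier examples; the submitted snippet is illustrative):

```csharp
// Sketch: building a request from a centralized template instead of an
// inline prompt string.
var messages = new List<ChatMessage>
{
    new(ChatRole.System, PromptTemplates.CodeReviewSystem(dotnetVersion: "8")),
    new(ChatRole.User, "public int Parse(string s) => int.Parse(s);")
};

var response = await chatClient.GetResponseAsync(messages);
Console.WriteLine(response.Text);
```

Because the template is a plain static method, you can also unit-test it directly, asserting that the generated prompt mentions the expected framework version.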
Complete Example: Azure OpenAI Structured Extraction
Here is a production-ready example that ties everything together — calling Azure OpenAI with Microsoft.Extensions.AI, using a structured prompt, and validating the response:
using Azure.AI.OpenAI;
using Azure.Identity;
using Microsoft.Extensions.AI;
using System.Text.Json;

// Configure the client with Managed Identity (no API keys in production)
var azureClient = new AzureOpenAIClient(
    new Uri("https://your-resource.openai.azure.com/"),
    new DefaultAzureCredential());

IChatClient chatClient = azureClient
    .GetChatClient("gpt-4o")
    .AsIChatClient();

// Build the prompt
var messages = new List<ChatMessage>
{
    new(ChatRole.System, """
        You are a technical content classifier for .NET developer articles.
        Classify the given article title and description into:
        - category: one of "tutorial", "reference", "troubleshooting", "news", "opinion"
        - difficulty: one of "beginner", "intermediate", "advanced"
        - tags: array of 3-5 relevant technology tags
        - summary: one-sentence summary of what the article covers
        Respond with valid JSON only.
        """),

    // Few-shot example
    new(ChatRole.User, """
        Title: Fix: CS8032 Instance of Analyzer Cannot Be Created in .NET 8+
        Description: How to resolve the CS8032 error when building .NET 8 projects with Roslyn analyzers.
        """),
    new(ChatRole.Assistant, """
        {"category":"troubleshooting","difficulty":"intermediate","tags":["Roslyn","Analyzers",".NET 8","Build Errors"],"summary":"Guide to resolving CS8032 analyzer instantiation errors caused by Roslyn API version mismatches in .NET 8+ projects."}
        """),

    // Actual request
    new(ChatRole.User, """
        Title: Build a RAG Chatbot with .NET, Semantic Kernel, and Azure Cosmos DB
        Description: End-to-end tutorial for building a production-ready RAG chatbot using Semantic Kernel and Cosmos DB vector search.
        """)
};

var options = new ChatOptions
{
    Temperature = 0.1f,
    MaxOutputTokens = 300
};

var response = await chatClient.GetResponseAsync(messages, options);

// Parse and validate
var classification = JsonSerializer.Deserialize<ArticleClassification>(
    response.Text,
    new JsonSerializerOptions { PropertyNameCaseInsensitive = true });

if (classification is not null)
{
    Console.WriteLine($"Category: {classification.Category}");
    Console.WriteLine($"Difficulty: {classification.Difficulty}");
    Console.WriteLine($"Tags: {string.Join(", ", classification.Tags)}");
    Console.WriteLine($"Summary: {classification.Summary}");
}

public record ArticleClassification(
    string Category,
    string Difficulty,
    string[] Tags,
    string Summary);
This example demonstrates the complete pattern: typed client setup with managed identity, system message with clear schema, few-shot example, low temperature for consistency, and strongly-typed deserialization.
What Comes Next
You now have the prompting fundamentals — roles, system messages, few-shot patterns, chain-of-thought, structured output, and template management. These techniques apply to every LLM interaction you build, regardless of provider.
The next step is understanding the provider landscape. Different models have different strengths, pricing, and API patterns. In Comparing LLM Providers: OpenAI, Azure, and Anthropic, we break down the trade-offs to help you choose the right provider for your use case.
For hands-on practice with Azure OpenAI streaming in C#, check out the Azure OpenAI Chat Completion with Streaming in .NET workshop.