Build a Document Summarizer with C# and Azure OpenAI

.NET 9 · Azure.AI.OpenAI 2.1.0
By Rajesh Mishra · Feb 28, 2026 · Verified: Feb 28, 2026 · 20 min read

Large documents strain a single prompt. A ten-page report might contain 4,000 tokens; a legal contract or research paper can easily exceed 50,000. Summarization at that scale demands a strategy: split the text, summarize the pieces, and combine the results. This workshop builds that strategy into a complete API.

You will create a .NET 9 Web API that accepts plain text documents, splits them into token-aware chunks, summarizes each chunk with Azure OpenAI, recursively combines those summaries, and streams the final result back to the client. Every line of code is included. The project runs end to end.

Prerequisites

  • .NET 9 SDK installed
  • An Azure OpenAI resource with a deployed gpt-4o (or gpt-4o-mini) model
  • Your Azure OpenAI endpoint, API key, and deployment name
  • Familiarity with how large language models work will help but is not required

Step 1 — Scaffold the Project

dotnet new webapi -n DocumentSummarizer
cd DocumentSummarizer
dotnet add package Azure.AI.OpenAI --version 2.1.0

Step 2 — Application Configuration

appsettings.json

{
  "AzureOpenAI": {
    "Endpoint": "https://<your-resource>.openai.azure.com/",
    "ApiKey": "<your-api-key>",
    "DeploymentName": "gpt-4o"
  },
  "Summarization": {
    "MaxTokensPerChunk": 3000,
    "MaxSummaryTokens": 800,
    "OverlapTokens": 200
  },
  "Logging": {
    "LogLevel": {
      "Default": "Information"
    }
  }
}

The Summarization section controls chunking behavior. MaxTokensPerChunk is the target size for each slice of text. MaxSummaryTokens caps how long each individual summary can be. OverlapTokens preserves context between adjacent chunks so the model does not miss information at boundaries.
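To make those budgets concrete, here is the arithmetic the chunker performs later in this workshop under its rough 4-characters-per-token heuristic (the ratio is an approximation for English text, not a real tokenizer):

```csharp
// Character budgets implied by the settings above, assuming
// ~4 characters per token for English text (a rough heuristic).
const int CharsPerToken = 4;
const int MaxTokensPerChunk = 3000;
const int OverlapTokens = 200;

int maxCharsPerChunk = MaxTokensPerChunk * CharsPerToken; // 12000 chars per chunk
int overlapChars = OverlapTokens * CharsPerToken;         // 800 chars carried forward

Console.WriteLine($"chunk budget: {maxCharsPerChunk} chars, overlap: {overlapChars} chars");
```

A 3,000-token chunk leaves comfortable headroom below gpt-4o's context window once the system prompt and the generated summary are accounted for.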

Step 3 — Define Models

Models/AzureOpenAISettings.cs

namespace DocumentSummarizer.Models;

public sealed class AzureOpenAISettings
{
    public const string SectionName = "AzureOpenAI";

    public required string Endpoint { get; init; }
    public required string ApiKey { get; init; }
    public required string DeploymentName { get; init; }
}

Models/SummarizationSettings.cs

namespace DocumentSummarizer.Models;

public sealed class SummarizationSettings
{
    public const string SectionName = "Summarization";

    public int MaxTokensPerChunk { get; init; } = 3000;
    public int MaxSummaryTokens { get; init; } = 800;
    public int OverlapTokens { get; init; } = 200;
}

Models/SummarizeRequest.cs

namespace DocumentSummarizer.Models;

public sealed class SummarizeRequest
{
    public required string Text { get; init; }
    public string? Title { get; init; }
    public string Style { get; init; } = "concise";
}

Models/SummarizeResponse.cs

namespace DocumentSummarizer.Models;

public sealed class SummarizeResponse
{
    public required string Summary { get; init; }
    public int OriginalTokenEstimate { get; init; }
    public int ChunkCount { get; init; }
    public int RecursionDepth { get; init; }
}

Step 4 — Build the Token-Aware Chunking Service

The chunking strategy must respect token limits while preserving readability. Splitting mid-sentence destroys context. The approach below splits on paragraph boundaries and falls back to sentence boundaries when a single paragraph exceeds the token budget.

Services/TextChunker.cs

using DocumentSummarizer.Models;
using Microsoft.Extensions.Options;

namespace DocumentSummarizer.Services;

public sealed class TextChunker
{
    private readonly SummarizationSettings _settings;

    // Rough approximation: 1 token ~ 4 characters for English text.
    // For production accuracy, use a proper tokenizer like Microsoft.ML.Tokenizers.
    private const int CharsPerToken = 4;

    public TextChunker(IOptions<SummarizationSettings> settings)
    {
        _settings = settings.Value;
    }

    public static int EstimateTokens(string text) =>
        (int)Math.Ceiling((double)text.Length / CharsPerToken);

    public List<string> Chunk(string text)
    {
        var maxChars = _settings.MaxTokensPerChunk * CharsPerToken;
        var overlapChars = _settings.OverlapTokens * CharsPerToken;

        if (text.Length <= maxChars)
            return [text];

        var paragraphs = text.Split(
            ["\r\n\r\n", "\n\n"],
            StringSplitOptions.RemoveEmptyEntries);

        var chunks = new List<string>();
        var currentChunk = new System.Text.StringBuilder();

        foreach (var paragraph in paragraphs)
        {
            // If a single paragraph exceeds the limit, split by sentences
            if (paragraph.Length > maxChars)
            {
                if (currentChunk.Length > 0)
                {
                    chunks.Add(currentChunk.ToString().Trim());
                    currentChunk.Clear();
                }
                chunks.AddRange(SplitBySentences(paragraph, maxChars));
                continue;
            }

            if (currentChunk.Length + paragraph.Length + 2 > maxChars)
            {
                chunks.Add(currentChunk.ToString().Trim());

                // Overlap: carry the end of the previous chunk forward
                var overlap = currentChunk.Length > overlapChars
                    ? currentChunk.ToString()[^overlapChars..]
                    : "";
                currentChunk.Clear();
                currentChunk.Append(overlap);
            }

            currentChunk.AppendLine(paragraph);
            currentChunk.AppendLine();
        }

        if (currentChunk.Length > 0)
            chunks.Add(currentChunk.ToString().Trim());

        return chunks;
    }

    private static List<string> SplitBySentences(string text, int maxChars)
    {
        // Note: Split discards the matched terminator, so "!" and "?"
        // endings are rejoined with ". " below -- an acceptable loss here.
        var sentences = text.Split(
            [". ", "! ", "? "],
            StringSplitOptions.RemoveEmptyEntries);

        var chunks = new List<string>();
        var current = new System.Text.StringBuilder();

        foreach (var sentence in sentences)
        {
            if (current.Length + sentence.Length + 2 > maxChars && current.Length > 0)
            {
                chunks.Add(current.ToString().Trim());
                current.Clear();
            }
            current.Append(sentence.TrimEnd());
            current.Append(". ");
        }

        if (current.Length > 0)
            chunks.Add(current.ToString().Trim());

        return chunks;
    }
}

The 4-characters-per-token estimate is deliberately conservative for English. Production systems should use a proper tokenizer, but this ratio works well for a workshop and avoids adding another dependency.
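When you do need exact counts, Microsoft.ML.Tokenizers provides tiktoken-compatible tokenizers. This is a sketch assuming the TiktokenTokenizer API shipped in recent versions of that package; verify the method names against the version you install:

```csharp
// dotnet add package Microsoft.ML.Tokenizers
using Microsoft.ML.Tokenizers;

public static class ExactTokenCounter
{
    // Loads the tiktoken vocabulary associated with the model name.
    private static readonly Tokenizer GptTokenizer =
        TiktokenTokenizer.CreateForModel("gpt-4o");

    public static int Count(string text) => GptTokenizer.CountTokens(text);
}
```

Swapping this in for TextChunker.EstimateTokens tightens the budget; the 4-character heuristic typically overestimates token counts for English prose, which at least errs on the safe side.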

Step 5 — Build the Summarization Service

The summarization service implements two strategies. For short documents that fit in one chunk, it calls the model once. For longer documents, it uses the map-reduce pattern: summarize each chunk, then summarize the summaries. If the combined summaries still exceed the context window, it recurses.

Services/SummarizationService.cs

using System.Runtime.CompilerServices;
using Azure.AI.OpenAI;
using DocumentSummarizer.Models;
using Microsoft.Extensions.Options;
using OpenAI.Chat;

namespace DocumentSummarizer.Services;

public sealed class SummarizationService
{
    private readonly AzureOpenAIClient _aiClient;
    private readonly AzureOpenAISettings _aiSettings;
    private readonly SummarizationSettings _sumSettings;
    private readonly TextChunker _chunker;
    private readonly ILogger<SummarizationService> _logger;

    public SummarizationService(
        AzureOpenAIClient aiClient,
        IOptions<AzureOpenAISettings> aiSettings,
        IOptions<SummarizationSettings> sumSettings,
        TextChunker chunker,
        ILogger<SummarizationService> logger)
    {
        _aiClient = aiClient;
        _aiSettings = aiSettings.Value;
        _sumSettings = sumSettings.Value;
        _chunker = chunker;
        _logger = logger;
    }

    public async Task<SummarizeResponse> SummarizeAsync(SummarizeRequest request)
    {
        var chunks = _chunker.Chunk(request.Text);
        _logger.LogInformation(
            "Document split into {ChunkCount} chunks", chunks.Count);

        int recursionDepth = 0;
        var summaries = await SummarizeChunksAsync(chunks, request.Style);

        // Recursive reduction: if combined summaries are still too long
        while (summaries.Count > 1)
        {
            recursionDepth++;
            _logger.LogInformation(
                "Recursion depth {Depth}: combining {Count} summaries",
                recursionDepth, summaries.Count);

            var combinedText = string.Join("\n\n", summaries);
            var subChunks = _chunker.Chunk(combinedText);

            if (subChunks.Count == 1)
            {
                // Fits in one prompt now -- do the final summary
                summaries = [await SummarizeSingleAsync(
                    subChunks[0], request.Style, isFinalPass: true)];
            }
            else
            {
                summaries = await SummarizeChunksAsync(subChunks, request.Style);
            }
        }

        return new SummarizeResponse
        {
            Summary = summaries[0],
            OriginalTokenEstimate = TextChunker.EstimateTokens(request.Text),
            ChunkCount = chunks.Count,
            RecursionDepth = recursionDepth
        };
    }

    public async IAsyncEnumerable<string> SummarizeStreamingAsync(
        SummarizeRequest request,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var chunks = _chunker.Chunk(request.Text);

        List<string> summaries;
        if (chunks.Count == 1)
        {
            // Stream the single-chunk summary directly
            await foreach (var token in StreamSummaryAsync(
                chunks[0], request.Style, true, cancellationToken))
            {
                yield return token;
            }
            yield break;
        }

        // Multi-chunk: summarize chunks, then stream the final pass
        summaries = await SummarizeChunksAsync(chunks, request.Style);

        while (summaries.Count > 1)
        {
            var combinedText = string.Join("\n\n", summaries);
            var subChunks = _chunker.Chunk(combinedText);

            if (subChunks.Count == 1)
                break;

            summaries = await SummarizeChunksAsync(subChunks, request.Style);
        }

        var finalInput = string.Join("\n\n", summaries);
        await foreach (var token in StreamSummaryAsync(
            finalInput, request.Style, true, cancellationToken))
        {
            yield return token;
        }
    }

    private async Task<List<string>> SummarizeChunksAsync(
        List<string> chunks, string style)
    {
        var tasks = chunks.Select(chunk =>
            SummarizeSingleAsync(chunk, style, isFinalPass: false));
        var results = await Task.WhenAll(tasks);
        return results.ToList();
    }

    private async Task<string> SummarizeSingleAsync(
        string text, string style, bool isFinalPass)
    {
        var chatClient = _aiClient.GetChatClient(_aiSettings.DeploymentName);

        var systemPrompt = isFinalPass
            ? $"You are a document summarizer. Produce a {style} final summary of the following content. Preserve key facts and conclusions."
            : $"You are a document summarizer. Produce a {style} summary of this section. Preserve all important details for later synthesis.";

        var messages = new List<ChatMessage>
        {
            new SystemChatMessage(systemPrompt),
            new UserChatMessage(text)
        };

        var options = new ChatCompletionOptions
        {
            MaxOutputTokenCount = _sumSettings.MaxSummaryTokens
        };

        ChatCompletion completion = await chatClient.CompleteChatAsync(
            messages, options);
        return completion.Content[0].Text;
    }

    private async IAsyncEnumerable<string> StreamSummaryAsync(
        string text,
        string style,
        bool isFinalPass,
        [EnumeratorCancellation] CancellationToken cancellationToken)
    {
        var chatClient = _aiClient.GetChatClient(_aiSettings.DeploymentName);

        var systemPrompt = isFinalPass
            ? $"You are a document summarizer. Produce a {style} final summary. Preserve key facts and conclusions."
            : $"You are a document summarizer. Produce a {style} section summary.";

        var messages = new List<ChatMessage>
        {
            new SystemChatMessage(systemPrompt),
            new UserChatMessage(text)
        };

        var options = new ChatCompletionOptions
        {
            MaxOutputTokenCount = _sumSettings.MaxSummaryTokens
        };

        await foreach (var update in chatClient.CompleteChatStreamingAsync(
            messages, options, cancellationToken))
        {
            foreach (var part in update.ContentUpdate)
            {
                yield return part.Text;
            }
        }
    }
}

Notice that SummarizeChunksAsync fires all chunk summaries in parallel using Task.WhenAll. This is one of the significant advantages of the map-reduce approach — N chunks can be processed concurrently, limited only by your Azure OpenAI rate quota.
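If the chunk count is large, an unbounded fan-out can trip those quota limits. A common mitigation, sketched here with an illustrative SummarizeWithThrottleAsync helper and a stand-in summarize delegate (neither appears in the project above), is to gate concurrency with a SemaphoreSlim:

```csharp
// Gate parallel chunk summaries so at most maxConcurrency requests
// are in flight at once; the delegate stands in for the real API call.
static async Task<List<string>> SummarizeWithThrottleAsync(
    IReadOnlyList<string> chunks,
    Func<string, Task<string>> summarize,
    int maxConcurrency = 4)
{
    using var gate = new SemaphoreSlim(maxConcurrency);

    var tasks = chunks.Select(async chunk =>
    {
        await gate.WaitAsync();
        try { return await summarize(chunk); }
        finally { gate.Release(); }
    });

    // WhenAll preserves input order, so summaries line up with chunks.
    return (await Task.WhenAll(tasks)).ToList();
}

// Usage with a stand-in summarizer:
var results = await SummarizeWithThrottleAsync(
    ["chunk one", "chunk two", "chunk three"],
    chunk => Task.FromResult($"summary of: {chunk}"),
    maxConcurrency: 2);

Console.WriteLine(results.Count); // 3
```

Inside SummarizationService you would pass SummarizeSingleAsync as the delegate and make maxConcurrency configurable alongside the other summarization settings.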

Step 6 — Wire Up Program.cs

using System.ClientModel;
using System.ClientModel.Primitives;
using Azure;
using Azure.AI.OpenAI;
using DocumentSummarizer.Models;
using DocumentSummarizer.Services;

var builder = WebApplication.CreateBuilder(args);

builder.Services.Configure<AzureOpenAISettings>(
    builder.Configuration.GetSection(AzureOpenAISettings.SectionName));
builder.Services.Configure<SummarizationSettings>(
    builder.Configuration.GetSection(SummarizationSettings.SectionName));

builder.Services.AddSingleton(sp =>
{
    var settings = builder.Configuration
        .GetSection(AzureOpenAISettings.SectionName)
        .Get<AzureOpenAISettings>()
        ?? throw new InvalidOperationException("AzureOpenAI settings missing.");

    var options = new AzureOpenAIClientOptions
    {
        RetryPolicy = new ClientRetryPolicy(maxRetries: 3)
    };

    return new AzureOpenAIClient(
        new Uri(settings.Endpoint),
        new AzureKeyCredential(settings.ApiKey),
        options);
});

builder.Services.AddSingleton<TextChunker>();
builder.Services.AddScoped<SummarizationService>();

var app = builder.Build();

// Non-streaming summarization
app.MapPost("/api/summarize", async (SummarizeRequest request, SummarizationService service) =>
{
    if (string.IsNullOrWhiteSpace(request.Text))
        return Results.BadRequest("Text is required.");

    try
    {
        var result = await service.SummarizeAsync(request);
        return Results.Ok(result);
    }
    catch (ClientResultException ex) when (ex.Status == 429)
    {
        return Results.Problem(
            "Rate limit exceeded. Try again later.", statusCode: 429);
    }
});

// Streaming summarization via SSE
app.MapPost("/api/summarize/stream", async (
    SummarizeRequest request,
    SummarizationService service,
    HttpContext context) =>
{
    if (string.IsNullOrWhiteSpace(request.Text))
    {
        context.Response.StatusCode = 400;
        await context.Response.WriteAsync("Text is required.");
        return;
    }

    context.Response.ContentType = "text/event-stream";
    context.Response.Headers.CacheControl = "no-cache";

    try
    {
        await foreach (var token in service.SummarizeStreamingAsync(
            request, context.RequestAborted))
        {
            // Escape newlines so each token stays on one SSE "data:" line;
            // clients must reverse this when rendering.
            var escaped = token.Replace("\n", "\\n").Replace("\r", "");
            await context.Response.WriteAsync($"data: {escaped}\n\n");
            await context.Response.Body.FlushAsync(context.RequestAborted);
        }
        await context.Response.WriteAsync("data: [DONE]\n\n");
        await context.Response.Body.FlushAsync();
    }
    catch (OperationCanceledException)
    {
        // Client disconnected
    }
});

app.Run();

Step 7 — Test the API

Start the application:

dotnet run

Summarize a short document (use the port that dotnet run prints; 5000 is assumed below):

curl -X POST http://localhost:5000/api/summarize \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Artificial intelligence has transformed software development. Modern AI models can generate code, review pull requests, and identify bugs. These capabilities reduce development time and improve code quality. However, developers must understand the limitations of AI tools to use them effectively. AI-generated code still requires human review for correctness, security, and maintainability.",
    "style": "concise"
  }'

Stream a summary:

curl -X POST http://localhost:5000/api/summarize/stream \
  -H "Content-Type: application/json" \
  -d '{"text": "<paste a longer document here>", "style": "detailed"}' \
  --no-buffer

For large documents, you will see the intermediate summaries processed (watch the log output) before the final streamed result arrives.
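On the client side, the SSE stream needs nothing more exotic than HttpClient. This sketch assumes the localhost:5000 address used above and the newline-escaping convention of the /api/summarize/stream endpoint:

```csharp
// Minimal SSE consumer for the streaming endpoint.
// The base address and port are assumptions -- match your launch output.
using System.Net.Http.Json;

using var http = new HttpClient();
var request = new HttpRequestMessage(
    HttpMethod.Post, "http://localhost:5000/api/summarize/stream")
{
    Content = JsonContent.Create(new { text = "<long document>", style = "concise" })
};

// ResponseHeadersRead lets us read the body as it arrives.
using var response = await http.SendAsync(
    request, HttpCompletionOption.ResponseHeadersRead);
using var reader = new StreamReader(await response.Content.ReadAsStreamAsync());

while (await reader.ReadLineAsync() is { } line)
{
    if (!line.StartsWith("data: ")) continue;
    var payload = line["data: ".Length..];
    if (payload == "[DONE]") break;
    Console.Write(payload.Replace("\\n", "\n")); // undo the server's escaping
}
```

A browser client would use EventSource-style parsing instead, but the wire format is the same.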

Step 8 — Supporting PDF Input (Extension Point)

This workshop uses plain text input for simplicity. To accept PDFs, add a text extraction layer. The open-source UglyToad.PdfPig library extracts text from PDFs cleanly:

dotnet add package UglyToad.PdfPig

using UglyToad.PdfPig;

public static string ExtractTextFromPdf(Stream pdfStream)
{
    using var document = PdfDocument.Open(pdfStream);
    var sb = new System.Text.StringBuilder();

    foreach (var page in document.GetPages())
    {
        sb.AppendLine(page.Text);
        sb.AppendLine();
    }

    return sb.ToString();
}

Feed the extracted text into the same SummarizeRequest pipeline. The chunking and summarization logic remains unchanged.

Complete Project Structure

DocumentSummarizer/
  Program.cs
  appsettings.json
  Models/
    AzureOpenAISettings.cs
    SummarizationSettings.cs
    SummarizeRequest.cs
    SummarizeResponse.cs
  Services/
    TextChunker.cs
    SummarizationService.cs

How the Pieces Fit Together

The flow is linear and composable. Text comes in through the API endpoint. The TextChunker splits it into pieces that respect the token budget. The SummarizationService fans out chunk summaries in parallel, collects them, and checks whether the combined result still exceeds the context window. If it does, the cycle repeats. Once the combined text fits in a single prompt, the service generates the final summary — either as a complete response or as a stream.

This map-reduce approach scales to documents of any size. A 100-page document might need two reduction passes; a 500-page document might need three. The number of passes grows logarithmically with document length, while the total number of API calls grows roughly linearly with the chunk count.
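The pass count can be sketched numerically. This is an illustrative model using the appsettings defaults, not a measurement; real chunk boundaries and summary lengths will vary:

```csharp
// Model: each reduction pass turns every MaxTokensPerChunk tokens of
// input into at most MaxSummaryTokens tokens of output.
static int EstimatePasses(int documentTokens, int chunkTokens = 3000, int summaryTokens = 800)
{
    int tokens = documentTokens;
    int passes = 0;
    while (tokens > chunkTokens)
    {
        int chunks = (int)Math.Ceiling(tokens / (double)chunkTokens);
        tokens = chunks * summaryTokens; // each chunk yields one summary
        passes++;
    }
    return passes + 1; // plus the final single-prompt pass
}

Console.WriteLine(EstimatePasses(4_000));   // 2 passes for a short report
Console.WriteLine(EstimatePasses(200_000)); // 5 passes for a long contract
```

Each pass multiplies the token count by roughly summaryTokens/chunkTokens (about 0.27 with the defaults), which is why depth grows so slowly.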

What You Learned

You built a document summarization pipeline that handles documents of arbitrary length. The key techniques were token-aware chunking with paragraph-boundary awareness, parallel chunk summarization with Task.WhenAll, recursive reduction for very long documents, and streaming output for the final pass. Every component is isolated behind a clear interface, making it straightforward to swap the chunking strategy, add PDF extraction, or replace the summarization model.

For prompt design techniques that improve summary quality, see the Prompt Engineering Fundamentals in C# guide. To extend this project with streaming chat capabilities, the Streaming Chat API workshop covers that pattern in depth.

AI-Friendly Summary

A complete workshop for building a document summarization API with C# and Azure OpenAI. Covers project setup, document models, token-aware text chunking, single-chunk and recursive multi-chunk summarization strategies, streaming summary output, ASP.NET Core Minimal API endpoint wiring, and error handling -- all in a runnable .NET 9 project.

Key Takeaways

  • Use a token-aware chunking strategy that splits text on paragraph boundaries within token budget
  • Apply the map-reduce pattern: summarize chunks individually then combine summaries
  • Use recursive summarization when combined summaries still exceed context window limits
  • Stream the final summary to reduce perceived latency for users
  • Keep chunk size at 60-70 percent of context window to leave room for system prompt and output

Implementation Checklist

  • Scaffold .NET 9 Web API project and install Azure.AI.OpenAI 2.1.0
  • Define document and summary models
  • Implement token-aware text chunking service
  • Build single-chunk summarization method
  • Build recursive multi-chunk summarization pipeline
  • Add streaming summary endpoint
  • Wire up DI and Minimal API endpoints
  • Add error handling and test with curl

Frequently Asked Questions

How do I summarize long documents with Azure OpenAI?

Split the document into chunks that fit within the model's context window, summarize each chunk independently, then combine those summaries into a final summary. This map-reduce pattern handles documents of any length without exceeding token limits.

What is recursive summarization?

Recursive summarization takes the summaries produced from individual chunks and feeds them back into the model for a second (or third) pass. If the combined summaries still exceed the context window, the process repeats until the result fits in a single prompt. This produces coherent, high-quality summaries from very large documents.

How do I handle token limits when summarizing?

Estimate token counts using a simple character-to-token ratio (roughly 4 characters per token for English text) or a proper tokenizer. Keep each chunk well below the model's context window -- typically 60-70 percent -- to leave room for the system prompt and the generated summary.

Can I summarize PDFs with Azure OpenAI?

Azure OpenAI works with text, not PDF binary data, so you need a text extraction step first. Libraries like PdfPig (for .NET) or services like Azure Document Intelligence extract the text, which you then pass to your summarization pipeline.


#Document Summarization #Azure OpenAI #Text Processing #ASP.NET Core #.NET AI