The Error
You’re calling Azure OpenAI from your .NET application and getting this response:
Azure.RequestFailedException: Service request failed.
Status: 503 (Service Unavailable)
Content:
{
"error": {
"code": "ServiceUnavailable",
"message": "The server is temporarily unable to handle the request. Please try again later."
}
}
Or in some cases, a more terse variant:
System.Net.Http.HttpRequestException: Response status code does not indicate success: 503 (Service Temporarily Unavailable).
This error means Azure OpenAI received your request but couldn’t process it. Unlike a 401 (wrong key) or 404 (wrong deployment), a 503 is recoverable — the service is alive but overloaded or briefly unavailable.
Fixes at a Glance
- Configure proper retry logic: set MaxRetryAttempts to 3–5 with DelayBackoffType.Exponential and UseJitter = true via Microsoft.Extensions.Http.Resilience
- Add multi-region failover: deploy the same model in a secondary Azure region and use a circuit breaker that routes traffic on sustained 503s
- Verify endpoint configuration: confirm the endpoint URL matches your Azure OpenAI resource exactly, including the resource name
Why It Happens
Cause 1: Regional Capacity Pressure (Most Common)
Azure OpenAI deployments share regional capacity. When a region is under heavy load (typically during business hours in East US or West Europe), individual requests can be rejected with a 503. This is transient: the same request would usually succeed if sent again a few seconds later.
The telltale sign: your application works fine most of the time but starts throwing 503 intermittently, especially during peak hours.
Cause 2: Misconfigured Endpoint
Less common but easy to miss. If your endpoint URL is almost correct — right domain format but wrong region or resource name — Azure may route the request to a load balancer that can’t find the backend, returning 503 instead of 404.
Check your configuration:
// Wrong: resource name typo or wrong region
// var endpoint = "https://my-oai-resrce.openai.azure.com/";
// Correct: exact resource name from the Azure portal
var endpoint = "https://my-oai-resource.openai.azure.com/";
Cause 3: Deployment Not Ready
When you create or update an Azure OpenAI deployment, there’s a brief provisioning window where the endpoint returns 503. If you just created the deployment, wait 2–3 minutes and try again.
Fix 1: Configure Proper Retry Logic
The Azure SDK has built-in retries, but the defaults are tuned for general Azure services, not for AI inference, which is inherently slower and more capacity-constrained.
Customize Azure SDK Retry Options
using System.ClientModel;            // ApiKeyCredential
using System.ClientModel.Primitives; // ClientRetryPolicy
using Azure.AI.OpenAI;
// Azure.AI.OpenAI 2.x builds on System.ClientModel, so retries are configured through a ClientRetryPolicy
var options = new AzureOpenAIClientOptions
{
RetryPolicy = new ClientRetryPolicy(maxRetries: 4)
};
var client = new AzureOpenAIClient(
new Uri("https://your-resource.openai.azure.com/"),
new ApiKeyCredential("your-key"),
options);
This tells the SDK to retry up to 4 times instead of the default 3. ClientRetryPolicy backs off exponentially between attempts on its own. Its constructor does not take delay values, so to control the initial or maximum delay you would subclass it and override GetNextDelay.
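To make the backoff behavior concrete, here is a minimal sketch of how an exponential delay with a cap and jitter can be computed. It is illustrative only and not the SDK's internal implementation. BackoffDelay and its parameters are hypothetical names, and the 2-second base and 30-second cap are example values.

```csharp
using System;

// Hypothetical helper, not part of any SDK: delay before the given retry attempt (1-based).
static TimeSpan BackoffDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay, Random rng)
{
    // Exponential growth: baseDelay * 2^(attempt - 1), capped at maxDelay.
    double exponentialMs = baseDelay.TotalMilliseconds * Math.Pow(2, attempt - 1);
    double cappedMs = Math.Min(exponentialMs, maxDelay.TotalMilliseconds);

    // "Equal jitter": keep half the delay fixed and randomize the other half,
    // so concurrent clients do not all retry at the same instant.
    double jitteredMs = cappedMs / 2 + rng.NextDouble() * (cappedMs / 2);
    return TimeSpan.FromMilliseconds(jitteredMs);
}

var random = new Random();
for (int attempt = 1; attempt <= 5; attempt++)
{
    TimeSpan delay = BackoffDelay(attempt, TimeSpan.FromSeconds(2), TimeSpan.FromSeconds(30), random);
    Console.WriteLine($"Retry {attempt}: wait {delay.TotalSeconds:F1}s");
}
```

With these values the waits fall in the ranges 1–2s, 2–4s, 4–8s, 8–16s, and 15–30s. The randomized half is what keeps many clients that failed together from retrying together.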
Production-Grade Resilience with Microsoft.Extensions.Http.Resilience
For applications where you need circuit breakers, hedging, or multi-tier fallback, use the official resilience library:
dotnet add package Microsoft.Extensions.Http.Resilience
Register your HTTP client with resilience:
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http.Resilience;
using Polly;
services.AddHttpClient("AzureOpenAI", client =>
{
client.BaseAddress = new Uri("https://your-resource.openai.azure.com/");
})
.AddResilienceHandler("openai-pipeline", builder =>
{
// Retry with exponential backoff + jitter
builder.AddRetry(new HttpRetryStrategyOptions
{
MaxRetryAttempts = 4,
Delay = TimeSpan.FromSeconds(2),
BackoffType = DelayBackoffType.Exponential,
UseJitter = true,
ShouldHandle = args => ValueTask.FromResult(
args.Outcome.Result?.StatusCode == System.Net.HttpStatusCode.ServiceUnavailable ||
args.Outcome.Result?.StatusCode == System.Net.HttpStatusCode.TooManyRequests)
});
// Circuit breaker — stop hammering if service is truly down
builder.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
{
SamplingDuration = TimeSpan.FromSeconds(30),
FailureRatio = 0.7,
MinimumThroughput = 5,
BreakDuration = TimeSpan.FromSeconds(15)
});
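// Per-attempt timeout: added last, so it is the innermost strategy
builder.AddTimeout(TimeSpan.FromSeconds(60));
});
This pipeline retries on both 503 and 429 (rate limit) errors, breaks the circuit if 70% of requests fail (preventing downstream harm), and caps each attempt at 60 seconds. Strategies wrap in the order they are added, so a timeout added after the retry applies per attempt. To bound the whole operation including retries, add a timeout before AddRetry.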
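The circuit breaker settings are easier to reason about with the break decision written out. The sketch below is a simplified model of that decision for a single sampling window, assuming the circuit opens once the failure share reaches the configured ratio and enough requests were seen. ShouldOpenCircuit is a hypothetical function for illustration; Polly's real implementation tracks a rolling window and is not reproduced here.

```csharp
using System;

// Hypothetical simplification of the break decision for one sampling window.
static bool ShouldOpenCircuit(int failures, int total, double failureRatio, int minimumThroughput)
{
    // Too few requests in the window: not enough evidence to break.
    if (total < minimumThroughput) return false;

    // Break when the share of failed requests reaches the configured ratio.
    return (double)failures / total >= failureRatio;
}

// Using the article's settings: FailureRatio = 0.7, MinimumThroughput = 5.
Console.WriteLine(ShouldOpenCircuit(3, 4, 0.7, 5)); // False: only 4 requests in the window
Console.WriteLine(ShouldOpenCircuit(3, 5, 0.7, 5)); // False: 60% failed
Console.WriteLine(ShouldOpenCircuit(4, 5, 0.7, 5)); // True: 80% failed
```

The MinimumThroughput guard matters for low-traffic services: without it, a single failed request in a quiet window would count as a 100% failure rate.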
Fix 2: Multi-Region Failover
For production systems that can’t afford downtime, deploy the same model in two Azure OpenAI regions and fail over automatically:
public class ResilientOpenAIService
{
private readonly AzureOpenAIClient _primary;
private readonly AzureOpenAIClient _secondary;
public ResilientOpenAIService()
{
// Azure.AI.OpenAI 2.x takes System.ClientModel.ApiKeyCredential for key-based auth
_primary = new AzureOpenAIClient(
new Uri("https://your-resource-eastus.openai.azure.com/"),
new ApiKeyCredential(Environment.GetEnvironmentVariable("AOAI_KEY_EASTUS")!));
_secondary = new AzureOpenAIClient(
new Uri("https://your-resource-westeurope.openai.azure.com/"),
new ApiKeyCredential(Environment.GetEnvironmentVariable("AOAI_KEY_WESTEUROPE")!));
}
public async Task<ChatCompletion> GetCompletionAsync(
string deploymentName, IEnumerable<ChatMessage> messages)
{
try
{
var client = _primary.GetChatClient(deploymentName);
return await client.CompleteChatAsync(messages);
}
// Azure.AI.OpenAI 2.x reports HTTP failures as System.ClientModel.ClientResultException
catch (ClientResultException ex) when (ex.Status == 503)
{
// Fail over to secondary region
var client = _secondary.GetChatClient(deploymentName);
return await client.CompleteChatAsync(messages);
}
}
}
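The two-client pattern above generalizes to any number of regions. Here is a minimal, SDK-independent sketch of that idea: try each region in order and move on only when the failure looks transient. FirstSuccessfulAsync and IsTransient are hypothetical helpers, and the two "regions" are simulated with plain delegates rather than real Azure OpenAI calls.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: run attempts in order, falling through on transient failures.
static async Task<T> FirstSuccessfulAsync<T>(
    IReadOnlyList<Func<Task<T>>> attempts,
    Func<Exception, bool> isTransient)
{
    for (int i = 0; i < attempts.Count; i++)
    {
        try
        {
            return await attempts[i]();
        }
        catch (Exception ex) when (isTransient(ex) && i < attempts.Count - 1)
        {
            // Transient failure and another region remains: try the next one.
        }
    }
    throw new InvalidOperationException("No attempts were supplied.");
}

// Treat only 503 as a reason to fail over in this sketch.
static bool IsTransient(Exception ex) =>
    ex is HttpRequestException { StatusCode: HttpStatusCode.ServiceUnavailable };

// Simulated regions: the first always returns 503, the second succeeds.
var regions = new List<Func<Task<string>>>
{
    () => Task.FromException<string>(
        new HttpRequestException("eastus busy", null, HttpStatusCode.ServiceUnavailable)),
    () => Task.FromResult("answer from westeurope"),
};

string result = await FirstSuccessfulAsync<string>(regions, IsTransient);
Console.WriteLine(result); // prints "answer from westeurope"
```

Non-transient errors and a failure in the last region propagate to the caller unchanged, so a bad key or a wrong deployment name still surfaces instead of being masked by failover.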
For Semantic Kernel users, the same pattern applies at the kernel level:
var primaryKernel = Kernel.CreateBuilder()
.AddAzureOpenAIChatCompletion("chat-deployment",
"https://resource-eastus.openai.azure.com/", keyEastUs)
.Build();
var fallbackKernel = Kernel.CreateBuilder()
.AddAzureOpenAIChatCompletion("chat-deployment",
"https://resource-westeurope.openai.azure.com/", keyWestEurope)
.Build();
Fix 3: Verify Your Endpoint Configuration
Before building retry infrastructure, rule out configuration errors:
// Quick diagnostic: paste into a .NET Interactive notebook or console app
using System.ClientModel;
using Azure.AI.OpenAI;
var endpoint = "https://your-resource.openai.azure.com/";
var key = "your-key";
var deployment = "chat-deployment";
try
{
var client = new AzureOpenAIClient(new Uri(endpoint), new ApiKeyCredential(key));
var chatClient = client.GetChatClient(deployment);
var response = await chatClient.CompleteChatAsync("Say hello");
Console.WriteLine($"Success: {response.Value.Content[0].Text}");
}
// Azure.AI.OpenAI 2.x reports HTTP failures as ClientResultException
catch (ClientResultException ex)
{
Console.WriteLine($"Status: {ex.Status}");
Console.WriteLine($"Error: {ex.Message}");
Console.WriteLine($"Check: endpoint={endpoint}, deployment={deployment}");
}
If this diagnostic fails with 503 repeatedly (not intermittently), the problem is likely your endpoint URL. Verify it matches the exact resource name in the Azure portal under Keys and Endpoint.
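A cheap pre-check can catch malformed endpoint values before any request is sent. The sketch below validates only the shape of the URL, assuming the default *.openai.azure.com host form used throughout this article; resources with other host names would need a different check. LooksLikeAzureOpenAIEndpoint is a hypothetical helper. It cannot detect a typo in the resource name, so the portal comparison is still required.

```csharp
using System;

// Hypothetical shape check: HTTPS, the default Azure OpenAI host suffix, and no extra path.
static bool LooksLikeAzureOpenAIEndpoint(string url) =>
    Uri.TryCreate(url, UriKind.Absolute, out var uri)
    && uri.Scheme == Uri.UriSchemeHttps
    && uri.Host.EndsWith(".openai.azure.com", StringComparison.OrdinalIgnoreCase)
    && uri.AbsolutePath == "/";

Console.WriteLine(LooksLikeAzureOpenAIEndpoint("https://my-oai-resource.openai.azure.com/")); // True
Console.WriteLine(LooksLikeAzureOpenAIEndpoint("http://my-oai-resource.openai.azure.com/"));  // False: not HTTPS
Console.WriteLine(LooksLikeAzureOpenAIEndpoint(
    "https://my-oai-resource.openai.azure.com/openai/deployments/chat")); // False: path included
```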
When to Escalate
If 503 errors persist for more than 15 minutes across multiple regions, it’s a service incident. Check:
- Azure Status Page — for declared outages
- Azure Monitor — for your resource’s health metrics
- Application Insights — for your application’s retry patterns
Persistent 503s across regions typically indicate a platform-level issue. Open a support ticket with your subscription ID and the timestamps of failed requests from your Application Insights logs.
Prevention Checklist
- Configure retry with exponential backoff and jitter (not fixed delays)
- Set circuit breakers to prevent cascading failures
- Deploy critical workloads to two or more regions
- Monitor retry rates — sudden spikes in retries = early warning
- Size your Provisioned Throughput Units (PTUs) based on actual p95 load, not average
- Test failover paths during development, not during the incident
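To illustrate why the checklist recommends sizing on p95 rather than the average, here is a small sketch that computes both from per-minute token counts. The sample numbers are made up for illustration, and Percentile is a hypothetical helper using the nearest-rank method.

```csharp
using System;
using System.Linq;

// Made-up tokens-per-minute samples, for illustration only.
double[] tokensPerMinute = { 12000, 15000, 11000, 14000, 13000, 52000, 12500, 13500, 48000, 12000 };

// Hypothetical helper: nearest-rank percentile (p in the range 0..1).
static double Percentile(double[] samples, double p)
{
    var sorted = samples.OrderBy(x => x).ToArray();
    int rank = (int)Math.Ceiling(p * sorted.Length); // 1-based rank
    return sorted[Math.Max(rank, 1) - 1];
}

double average = tokensPerMinute.Average();
double p95 = Percentile(tokensPerMinute, 0.95);

Console.WriteLine($"average: {average}"); // prints "average: 20300"
Console.WriteLine($"p95: {p95}");         // prints "p95: 52000"
```

In this made-up sample, capacity sized to the average would be exceeded during both bursts, which is exactly when 503s and 429s tend to appear.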
If your failover is solid but you’re hitting quota limits after recovering, see Fix Azure OpenAI 429 Too Many Requests in .NET for rate limiting and Polly circuit breaker patterns that complement 503 resilience.