Practical OpenTelemetry on Azure

Comprehensive technical guide to implementing OpenTelemetry in cloud environments, covering distributed tracing, metrics, and logging patterns for Azure Functions and ASP.NET Core Web APIs.


Fernando

  ·  13 min read

What is OpenTelemetry?

OpenTelemetry is a vendor-neutral, open-source standard for collecting traces, metrics, and logs from cloud-native applications through a single SDK. Traditional observability tooling requires vendor-specific instrumentation — switching platforms means rewriting all instrumentation code. OpenTelemetry solves this with one implementation that exports to any backend: Azure Monitor, Datadog, Grafana, Jaeger, Prometheus, or multiple simultaneously.

Understanding Observability vs Monitoring

Monitoring answers the question: “Is the system working?” It provides predefined dashboards, alerts, and health checks based on known failure modes.

Observability answers the question: “Why is the system behaving this way?” It enables arbitrary queries against telemetry data to debug unknown problems without predicting failure modes in advance.

Distributed systems require observability because failures emerge from complex interactions between components. Logs scattered across multiple services cannot reconstruct request flows. Observability provides the context to understand system behavior under all conditions, not just anticipated failure scenarios.

OpenTelemetry Signal Types: Traces, Metrics, and Logs

Distributed Tracing

A trace represents a single request’s journey through your distributed system. Each step is a span, connected by a unique trace ID across service boundaries — API gateway, function invocations, database calls, downstream services.

Why it matters: suppose 95% of requests complete in under 200 ms, but 5% take 8 seconds or more. Filtering traces by latency shows that the slow requests all cluster in the same time window and share one cause: database lock contention from a background job. That correlation is invisible in logs alone.
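To make the trace and span relationship concrete, here is a minimal sketch using only the .NET base class library (System.Diagnostics), which is what the OpenTelemetry .NET SDK builds on. The source name "Demo.Orders" and the span names are hypothetical, and the bare ActivityListener stands in for the SDK, which would normally do the listening and exporting.

```csharp
using System;
using System.Diagnostics;

// "Demo.Orders" is a hypothetical source name used only for this sketch.
var source = new ActivitySource("Demo.Orders");

// StartActivity returns null unless something is listening. The OpenTelemetry
// SDK normally plays this role; a bare listener stands in for it here.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Demo.Orders",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllData
};
ActivitySource.AddActivityListener(listener);

string requestTraceId, requestSpanId, queryTraceId, queryParentSpanId;

// Outer span: the incoming request.
using (var request = source.StartActivity("HandleRequest", ActivityKind.Server))
{
    requestTraceId = request!.TraceId.ToHexString();
    requestSpanId = request.SpanId.ToHexString();

    // Inner span: started while the request span is current, so it becomes its child.
    using (var query = source.StartActivity("QueryDatabase", ActivityKind.Client))
    {
        queryTraceId = query!.TraceId.ToHexString();
        queryParentSpanId = query.ParentSpanId.ToHexString();
    }
}

Console.WriteLine($"same trace ID: {requestTraceId == queryTraceId}");
Console.WriteLine($"child points at parent span: {requestSpanId == queryParentSpanId}");
```

Both spans carry the same trace ID, and the child records the parent's span ID. That pair of identifiers is what lets a backend reassemble the request's journey.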

Metrics

Numerical measurements aggregated over time. Metrics answer when and how often: error rate spikes, throughput drops, queue depth grows. Traces answer why: they show the specific failing request paths behind those numbers.
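As a rough illustration of what a metric looks like in code, the sketch below records a counter with the System.Diagnostics.Metrics API that OpenTelemetry .NET collects from. The meter name "Demo.Orders", the instrument name "orders.failed", and the "reason" tag are assumptions made for this example. The MeterListener only stands in for the SDK's aggregation and export pipeline.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical meter and instrument names for this sketch.
var meter = new Meter("Demo.Orders");
var failedOrders = meter.CreateCounter<long>("orders.failed");

long totalFailed = 0;

// The listener stands in for the OpenTelemetry SDK, which would aggregate and export.
using var meterListener = new MeterListener();
meterListener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "Demo.Orders")
    {
        l.EnableMeasurementEvents(instrument);
    }
};
meterListener.SetMeasurementEventCallback<long>(
    (instrument, measurement, tags, state) => totalFailed += measurement);
meterListener.Start();

// Each Add call is one measurement; the tag allows later breakdown by failure reason.
failedOrders.Add(1, new KeyValuePair<string, object?>("reason", "timeout"));
failedOrders.Add(2, new KeyValuePair<string, object?>("reason", "validation"));

Console.WriteLine($"failed orders recorded: {totalFailed}");
```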

Logs

Structured event records that provide narrative detail. Include trace ID and span ID in every log entry so you can pivot from a failing trace directly to the detailed event sequence it generated — logs become a narrative, not scattered noise.

Implementation Guide: Azure Functions vs Web APIs

OpenTelemetry implementation differs significantly between serverless Azure Functions and traditional ASP.NET Core Web APIs. Each platform has distinct architectural characteristics that affect instrumentation strategy.

This section provides step-by-step configuration instructions for both platforms.

Azure Functions Implementation

Azure Functions present unique observability challenges due to their ephemeral, event-driven nature. Unlike long-running processes, Functions consist of hundreds of short-lived executions that must maintain trace context across invocations.

Architectural consideration: Azure Functions use a dual-process model:

  1. Functions Host - Runtime that receives triggers and manages lifecycle
  2. Worker Process - Application code (Program.cs and function implementations)

Complete observability requires instrumenting both processes. Failure to instrument either process results in incomplete traces and broken context propagation.

Step 1: Configure the Functions Host

Enable OpenTelemetry output in host.json by adding telemetryMode at the root level:

{
    "version": "2.0",
    "telemetryMode": "OpenTelemetry",
    "logging": {
        "logLevel": {
            "default": "Warning"
        }
    }
}

This configuration instructs the Functions Host to emit telemetry as OpenTelemetry signals instead of using the legacy Application Insights SDK. Without this setting, duplicate telemetry and inconsistent trace correlation will occur.

Important: When telemetryMode is set to OpenTelemetry, the logging.applicationInsights section of host.json no longer applies. Log level filtering and other settings must be configured under the logging key directly. Additionally, the Azure portal’s log streaming feature is disabled when OpenTelemetry mode is active.

The application settings in your function app determine where the telemetry is sent:

  • APPLICATIONINSIGHTS_CONNECTION_STRING — sends OpenTelemetry data to Application Insights
  • OTEL_EXPORTER_OTLP_ENDPOINT — sends data to any OTLP-compliant endpoint (Datadog, Grafana, New Relic, etc.)
  • OTEL_EXPORTER_OTLP_HEADERS — authentication headers (API keys) for your OTLP provider

The Application Insights connection string and the OTLP endpoint settings can coexist to export to multiple backends simultaneously.

Step 2: Configure the Worker Process

For .NET isolated Functions (Worker Extension v2.x+), configure OpenTelemetry in Program.cs. The recommended approach uses IHostApplicationBuilder:

Exporting to Application Insights:

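using Azure.Monitor.OpenTelemetry.Exporter;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Azure.Functions.Worker.Builder;
using Microsoft.Azure.Functions.Worker.OpenTelemetry;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

// FunctionsApplication.CreateBuilder returns a builder that implements IHostApplicationBuilder
var builder = FunctionsApplication.CreateBuilder(args);
builder.ConfigureFunctionsWebApplication();

builder.Services.AddOpenTelemetry()
    .UseFunctionsWorkerDefaults()   // correlates worker telemetry with the host
    .UseAzureMonitorExporter();     // reads APPLICATIONINSIGHTS_CONNECTION_STRING

var host = builder.Build();
host.Run();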

Exporting to any OTLP-compliant endpoint (Datadog, Grafana, New Relic, etc.):

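using Microsoft.Azure.Functions.Worker;
using Microsoft.Azure.Functions.Worker.Builder;
using Microsoft.Azure.Functions.Worker.OpenTelemetry;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using OpenTelemetry;

// FunctionsApplication.CreateBuilder returns a builder that implements IHostApplicationBuilder
var builder = FunctionsApplication.CreateBuilder(args);
builder.ConfigureFunctionsWebApplication();

builder.Services.AddOpenTelemetry()
    .UseFunctionsWorkerDefaults()   // correlates worker telemetry with the host
    .UseOtlpExporter();             // reads OTEL_EXPORTER_OTLP_ENDPOINT automatically

var host = builder.Build();
host.Run();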

Both exporters can be chained on the same AddOpenTelemetry() call to export to multiple backends simultaneously.

Key notes:

  • UseFunctionsWorkerDefaults() correlates host and worker telemetry and respects OTEL_SERVICE_NAME to avoid duplicates. Do not add AddAspNetCoreInstrumentation() separately — the host already emits request telemetry and calling it again creates duplicates.
  • Service name defaults to the function app name. Override with OTEL_SERVICE_NAME for stable identity across slots.

Required NuGet packages:

<PackageReference Include="Microsoft.Azure.Functions.Worker.OpenTelemetry" Version="1.2.0" />
<PackageReference Include="OpenTelemetry.Extensions.Hosting" Version="1.11.0" />
<PackageReference Include="Azure.Monitor.OpenTelemetry.Exporter" Version="1.4.0" />

For OTLP export instead of Application Insights:

<PackageReference Include="Microsoft.Azure.Functions.Worker.OpenTelemetry" Version="1.2.0" />
<PackageReference Include="OpenTelemetry.Extensions.Hosting" Version="1.11.0" />
<PackageReference Include="OpenTelemetry.Exporter.OpenTelemetryProtocol" Version="1.11.0" />

Note: Microsoft.Azure.Functions.Worker.OpenTelemetry 1.2.0 (April 2026) is the first stable (non-preview) release. It adds support for propagating OpenTelemetry baggage to the worker process.

Tracking Cold Starts

Serverless functions experience cold starts where the first request to a function instance takes longer due to runtime initialization. These appear in traces as unusually long spans for initial requests.

Track cold starts explicitly by adding custom attributes to spans:

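// Requires: using System.Diagnostics; using System.Threading;

// Static state lives as long as the worker process, so this is 0 only
// until the first invocation on a freshly started instance.
private static int _isWarmedUp;

[Function("ProcessOrder")]
public async Task<HttpResponseData> Run(
    [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestData req,
    FunctionContext context)
{
    var activity = Activity.Current;

    // Interlocked.Exchange returns the previous value, so exactly one
    // invocation per process sees 0 and is tagged as the cold start,
    // even when several invocations arrive concurrently.
    if (Interlocked.Exchange(ref _isWarmedUp, 1) == 0)
    {
        activity?.SetTag("cold.start", true);
    }

    // Your function logic here
}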

This enables filtering traces by cold starts to analyze their performance characteristics separately from warm executions.

Web API Implementation

Traditional Web APIs (ASP.NET Core) are simpler to instrument due to their long-running process model with full control over the startup pipeline.

Configure OpenTelemetry in Program.cs.

Recommended approach — Azure Monitor Distro for ASP.NET Core:

using Azure.Monitor.OpenTelemetry.AspNetCore;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddControllers();

// Single call wires up traces, metrics, logs, and exports to Application Insights.
// Reads APPLICATIONINSIGHTS_CONNECTION_STRING from environment automatically.
builder.Services.AddOpenTelemetry().UseAzureMonitor();

var app = builder.Build();
app.MapControllers();
app.Run();

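UseAzureMonitor() is the recommended entry point for ASP.NET Core apps targeting Application Insights. It automatically wires up ASP.NET Core, HttpClient, and SQL client instrumentation along with ILogger log export, without manual configuration. EF Core and runtime instrumentation are not part of the distro's defaults and need to be added separately if required.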

Customized approach — for OTLP export or advanced configuration:

When you need fine-grained control (custom exporters, additional enrichment, EF Core SQL capture), the detailed configuration approach gives more flexibility:

using OpenTelemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddControllers();

builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource
        .AddService(
            serviceName: "order-api",
            serviceVersion: "1.0.0"))
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation(options =>
        {
            options.RecordException = true;
            options.EnrichWithHttpRequest = (activity, request) =>
            {
                // Custom tag name chosen for this example. The instrumentation already
                // sets http.route to the route template, so avoid overwriting it
                // with the raw path.
                activity.SetTag("app.request.path", request.Path.Value);
            };
        })
        .AddHttpClientInstrumentation()
        .AddEntityFrameworkCoreInstrumentation(options =>
        {
            options.SetDbStatementForText = true; // Capture SQL queries
        }))
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddRuntimeInstrumentation())
    // UseOtlpExporter is an extension on the OpenTelemetry builder, not on the
    // tracing or metrics builders. Call it once: it registers OTLP export for
    // all signals and reads OTEL_EXPORTER_OTLP_ENDPOINT from the environment.
    .UseOtlpExporter();

var app = builder.Build();
app.MapControllers();
app.Run();

Required NuGet packages:

For Application Insights (recommended):

<PackageReference Include="Azure.Monitor.OpenTelemetry.AspNetCore" Version="1.4.0" />

For OTLP export or custom configuration:

<PackageReference Include="OpenTelemetry.Extensions.Hosting" Version="1.11.0" />
<PackageReference Include="OpenTelemetry.Instrumentation.AspNetCore" Version="1.11.0" />
<PackageReference Include="OpenTelemetry.Instrumentation.Http" Version="1.11.0" />
<PackageReference Include="OpenTelemetry.Instrumentation.EntityFrameworkCore" Version="1.0.0-beta.1" />
<PackageReference Include="OpenTelemetry.Exporter.OpenTelemetryProtocol" Version="1.11.0" />

OpenTelemetry Collector: Centralized Telemetry Hub

The OpenTelemetry Collector pattern provides a powerful architecture for managing telemetry data.

Direct service-to-backend export (e.g., each service exporting directly to Azure Monitor) works but is inflexible. Adding a backend (such as Jaeger for local development traces or Prometheus for metrics) requires updating configuration across all services.

The Collector pattern solves this by acting as a centralized proxy:

Services → Collector → Multiple Backends

Your services send telemetry to the Collector using the OTLP protocol (OpenTelemetry Protocol). The Collector then routes that data to whatever backends you’ve configured—Azure Monitor, Datadog, Grafana, Jaeger, Prometheus, or all of the above.

Here’s a simplified Collector configuration (otel-collector-config.yaml):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024

  # Add resource attributes to all telemetry
  resource:
    attributes:
      - key: environment
        value: production
        action: upsert

exporters:
  # Azure Monitor for production observability (Collector contrib distribution)
  azuremonitor:
    connection_string: "${env:APPLICATIONINSIGHTS_CONNECTION_STRING}"

  # Jaeger for local development. Jaeger ingests OTLP natively, and the
  # dedicated jaeger exporter was removed from current Collector releases.
  otlp/jaeger:
    endpoint: "jaeger:4317"
    tls:
      insecure: true

  # Prometheus for metrics
  prometheus:
    endpoint: "0.0.0.0:8889"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [azuremonitor, otlp/jaeger]

    metrics:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [azuremonitor, prometheus]

Deployment options:

  • Sidecar container in Kubernetes clusters
  • Standalone service in Azure Container Instances for serverless functions
  • Docker container for local development

Sampling Strategies for Production

Capturing every trace in production is cost-prohibitive due to data volume. Production systems require sampling strategies.

Sampling intelligently selects which traces to retain while maintaining statistical validity for system analysis. Two primary strategies exist:

Head Sampling (Simple)

Decide at the start of a trace whether to record it, typically based on a percentage:

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .SetSampler(new TraceIdRatioBasedSampler(0.1))); // Sample 10% of traces

Head sampling is simple to implement but has limitations. It discards 90% of traces regardless of content, including potentially critical traces (errors, slow requests).
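The idea behind ratio-based head sampling can be sketched as a pure function of the trace ID. This is a simplified illustration, not the SDK's exact algorithm, and the ShouldSample helper is hypothetical. It shows why the decision is made without looking at the trace's content, and why every service that sees the same trace ID reaches the same decision.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: derives a keep/drop decision from the trace ID alone.
bool ShouldSample(ActivityTraceId traceId, double ratio)
{
    if (ratio >= 1.0) return true;
    if (ratio <= 0.0) return false;

    // Interpret the first 8 bytes (16 hex characters) of the trace ID as a number
    // and keep the trace when it falls below the ratio's share of the value range.
    ulong value = Convert.ToUInt64(traceId.ToHexString().Substring(0, 16), 16);
    return value < (ulong)(ratio * ulong.MaxValue);
}

var sampleTraceId = ActivityTraceId.CreateRandom();
bool firstDecision = ShouldSample(sampleTraceId, 0.1);
bool secondDecision = ShouldSample(sampleTraceId, 0.1);

Console.WriteLine($"trace {sampleTraceId}: sampled={firstDecision}");
Console.WriteLine($"decision is repeatable: {firstDecision == secondDecision}");
```

Nothing in the function knows whether the request will fail or be slow, which is exactly the limitation described above.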

Tail Sampling (Smart)

Make sampling decisions at the end of a trace based on what actually happened. Keep all errors, keep slow requests, sample everything else at a lower rate.

Tail sampling requires the Collector because you need to wait for the entire trace to complete:

processors:
  tail_sampling:
    policies:
      # Always sample errors
      - name: errors
        type: status_code
        status_code: {status_codes: [ERROR]}

      # Always sample slow requests (>1s)
      - name: slow-requests
        type: latency
        latency: {threshold_ms: 1000}

      # Sample 5% of successful, fast requests
      - name: baseline
        type: probabilistic
        probabilistic: {sampling_percentage: 5}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, batch]
      exporters: [azuremonitor]

This configuration provides complete error traces while controlling costs on successful requests.
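The decision those three policies encode can be sketched as a small function. This is only an illustration of the logic, assuming a trace is kept when any policy matches. The KeepTrace helper and its parameters are hypothetical, and the real work (buffering spans until the trace completes) happens inside the Collector.

```csharp
using System;

// Hypothetical helper mirroring the three policies above.
// `roll` is a uniform random number in [0, 1) supplied by the caller.
bool KeepTrace(bool hasErrorSpan, TimeSpan traceDuration, double roll)
{
    if (hasErrorSpan) return true;                                      // errors policy
    if (traceDuration > TimeSpan.FromMilliseconds(1000)) return true;   // slow-requests policy
    return roll < 0.05;                                                 // baseline policy: 5%
}

var rng = new Random();

// An error trace and a slow trace are kept regardless of the random roll.
Console.WriteLine(KeepTrace(true, TimeSpan.FromMilliseconds(40), rng.NextDouble()));
Console.WriteLine(KeepTrace(false, TimeSpan.FromSeconds(3), rng.NextDouble()));

// A fast, successful trace is kept only when the roll lands in the 5% band.
Console.WriteLine(KeepTrace(false, TimeSpan.FromMilliseconds(40), rng.NextDouble()));
```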

Context Propagation: The Hidden Magic

The most subtle part of distributed tracing is context propagation—ensuring trace IDs flow correctly across service boundaries. OpenTelemetry uses W3C Trace Context headers (traceparent, tracestate) to carry this information automatically for HTTP calls:

// Service A — OpenTelemetry injects traceparent header automatically
var response = await httpClient.GetAsync("https://service-b/api/orders");

// Service B — OpenTelemetry extracts it and this span becomes a child of Service A's span
public async Task<IActionResult> GetOrders() => Ok(await _orderRepository.GetAllAsync());

For anything outside HTTP — message queues (Azure Service Bus, RabbitMQ), background jobs, scheduled tasks — context does not propagate automatically. You need to manually capture Activity.Current?.Id when publishing and restore it as the parent span when consuming. The principle is the same across all of them: carry the traceparent value alongside your payload.
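A minimal sketch of that manual capture-and-restore step, using only System.Diagnostics. The dictionary stands in for a message's application properties, and the source name "Demo.Messaging" and span names are hypothetical. A real broker client would carry the header on the message itself, and the producer and consumer would run in different processes.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

var source = new ActivitySource("Demo.Messaging"); // hypothetical source name

// Stand-in for the OpenTelemetry SDK so that StartActivity creates real spans.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Demo.Messaging",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded
};
ActivitySource.AddActivityListener(listener);

// Stand-in for message headers / application properties.
var messageHeaders = new Dictionary<string, string>();
string producerTraceId, producerSpanId, consumerTraceId, consumerParentSpanId;

// Producer side: capture the current span's W3C identifier alongside the payload.
using (var publish = source.StartActivity("PublishOrder", ActivityKind.Producer))
{
    messageHeaders["traceparent"] = publish!.Id!;
    producerTraceId = publish.TraceId.ToHexString();
    producerSpanId = publish.SpanId.ToHexString();
}

// Consumer side: no ambient Activity exists here, so restore the parent explicitly.
var parentContext = ActivityContext.Parse(messageHeaders["traceparent"], null);
using (var consume = source.StartActivity("ProcessOrder", ActivityKind.Consumer, parentContext))
{
    consumerTraceId = consume!.TraceId.ToHexString();
    consumerParentSpanId = consume.ParentSpanId.ToHexString();
}

Console.WriteLine($"traceparent carried on message: {messageHeaders["traceparent"]}");
Console.WriteLine($"consumer joined producer's trace: {producerTraceId == consumerTraceId}");
```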

Local Development Setup

OpenTelemetry enables full observability during local development using the same OTLP exporters as production—only the endpoint URL differs. A docker-compose.yml with three services covers the essentials:

  • Jaeger (port 16686) — trace viewer, accepts OTLP on 4317/4318
  • Prometheus (port 9090) — metrics scraping and querying
  • Grafana (port 3000) — dashboards connecting to both

Point your local OTEL_EXPORTER_OTLP_ENDPOINT at http://localhost:4317 and run docker-compose up. You get production-like trace visibility before a single line ships to Azure — useful for catching race conditions, validating instrumentation, and testing sampling rules early.
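A minimal docker-compose.yml along those lines might look like the sketch below. The image tags and the COLLECTOR_OTLP_ENABLED setting are assumptions to adapt to the versions you run. Prometheus also needs a scrape configuration mounted before it collects anything, and Grafana needs its data sources added. Both are omitted here.

```yaml
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    environment:
      - COLLECTOR_OTLP_ENABLED=true   # needed on older Jaeger versions to accept OTLP
    ports:
      - "16686:16686"   # trace viewer UI
      - "4317:4317"     # OTLP over gRPC
      - "4318:4318"     # OTLP over HTTP

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
```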

Best Practices and Common Pitfalls

Critical lessons for successful OpenTelemetry implementation in production environments:

Start with the Collector

Don’t export directly from services to your observability backend. Start with the Collector from day one. It gives you flexibility, testability, and control.

Implement Instrumentation Incrementally

Attempting comprehensive instrumentation simultaneously leads to complexity and delays. Use a phased approach:

  1. Start with auto-instrumentation for HTTP and database calls
  2. Add custom spans for critical business operations
  3. Add detailed instrumentation to problem areas as they are identified

Incremental instrumentation is more effective than attempting perfect coverage initially.

Strategic Tag Implementation

Tags added to spans determine available query capabilities. Essential tags include:

  • User IDs (with appropriate privacy considerations)
  • Tenant IDs in multi-tenant systems
  • Active feature flags
  • Deployment version
  • Environment (staging, production)

Comprehensive tagging enables precise filtering: from “what happened” to “what happened for this specific user in this specific situation.”
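As a sketch of what such tagging looks like on a span, the snippet below sets a few of the tags listed above with the System.Diagnostics API. The tag names and values are illustrative assumptions rather than a required schema, and the listener again stands in for the OpenTelemetry SDK. Deployment version and environment are usually set once as resource attributes instead of on every span.

```csharp
using System;
using System.Diagnostics;

var source = new ActivitySource("Demo.Checkout"); // hypothetical source name

using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Demo.Checkout",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllData
};
ActivitySource.AddActivityListener(listener);

string? tenantTag;
object? flagTag;

using (var activity = source.StartActivity("Checkout"))
{
    // Illustrative tag names and values; pick a naming scheme and apply it consistently.
    activity?.SetTag("tenant.id", "tenant-42");
    activity?.SetTag("user.id", "user-123");              // subject to your privacy rules
    activity?.SetTag("feature_flag.new_pricing", true);

    tenantTag = activity?.GetTagItem("tenant.id") as string;
    flagTag = activity?.GetTagItem("feature_flag.new_pricing");
}

Console.WriteLine($"tenant.id={tenantTag}, feature_flag.new_pricing={flagTag}");
```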

Logs Still Matter

Traces and metrics get most of the attention in OpenTelemetry discussions, but don’t neglect structured logging. The combination of traces and correlated logs is more powerful than either alone.

Use the trace ID in your log messages:

logger.LogInformation(
    "Processing order {OrderId} [TraceId: {TraceId}]",
    order.Id,
    Activity.Current?.TraceId);

This correlation enables finding all logs for a given trace, or all traces matching log search criteria.

Cost Management

Observability costs can escalate rapidly without proper controls. As an illustrative example, unmanaged Azure Monitor telemetry can reach $5,000/month or more, and implementing tail sampling and data retention policies can bring that down to around $800/month while maintaining signal quality.

Cost control measures:

  • Implement sampling strategies (see Sampling section)
  • Define data retention policies
  • Monitor observability costs as a metric
  • Regularly review telemetry volume by service
  • Remove unnecessary instrumentation

Azure Functions Performance Considerations

Azure Functions with OpenTelemetry experience higher cold-start latency (~500ms additional on first request) compared to the legacy Application Insights SDK. This represents the trade-off for vendor neutrality and enhanced telemetry capabilities.

Mitigation strategies for latency-sensitive scenarios:

  • Implement warm-up functions
  • Use provisioned instances
  • Evaluate if vendor neutrality justifies the latency impact

Enable Log Scopes in the Worker Process

By default, the .NET isolated worker process does not include logging scopes in its telemetry. Scopes provide context like correlation IDs and operation names that flow through the logging pipeline. To enable them explicitly:

builder.Logging.AddOpenTelemetry(b => b.IncludeScopes = true);

This applies to both Azure Functions and Web APIs and is especially useful when using structured logging with custom context.

Log Filtering: Host vs Worker

In Azure Functions, log filtering requires understanding where each log originates:

  • Host process logs (runtime triggers, scaling, health) — filtered via host.json logging.logLevel (see Step 1 above)
  • Worker process logs (your function code) — filtered using OpenTelemetry settings in code. host.json filters have no effect on worker logs.

Documentation and Maturity

Microsoft.Azure.Functions.Worker.OpenTelemetry 1.2.0 (April 2026) is the first stable release, adding baggage propagation and improved resource detection. Some features are still maturing — e.g., resource attributes in Azure Monitor require OTEL_DOTNET_AZURE_MONITOR_ENABLE_RESOURCE_METRICS=true. Check GitHub issues for current status on specific instrumentation libraries before committing to production.

Observability Impact on Development Practices

Comprehensive observability changes day-to-day engineering. “What’s common among slow requests between 2 PM and 3 PM?” becomes a 30-second trace query instead of a 2-hour log trawl. Post-mortems shift from probable causes to definitive trace evidence. Performance work becomes data-driven rather than assumption-driven.

Key Takeaways

OpenTelemetry implementation fundamentals for cloud services:

  • Distributed tracing reveals system behavior that logs and metrics alone cannot capture
  • Azure Functions require dual instrumentation of both host and worker processes — telemetryMode: OpenTelemetry in host.json plus UseFunctionsWorkerDefaults() in the worker
  • Web APIs use UseAzureMonitor() for zero-config Application Insights, or UseOtlpExporter() for any OTLP-compliant backend
  • The Collector pattern provides flexibility to route telemetry to multiple backends without code changes
  • Sampling strategies control costs while preserving visibility into errors and performance issues
  • Local observability stack enables debugging during development rather than only in production
  • Observability changes development practices including architecture, debugging, and incident response
  • Comprehensive observability reduces mean time to resolution (MTTR) for production incidents and enables data-driven architecture decisions

Sources

This article synthesizes research and practical experience from multiple authoritative sources: