Building Resilient Customer Onboarding with Azure Durable Functions
From state machines to durable orchestrations: building customer onboarding workflows that survive failures, coordinate distributed services, and handle long-running processes with grace
The Moment State Machines Weren’t Enough
I was reviewing the customer onboarding system I’d built using finite state machines, proud of how cleanly it handled state transitions and prevented impossible states. Then production taught me a humbling lesson: state management is only half the battle.
During a routine deployment, the application restarted mid-onboarding for hundreds of customers. When the system came back online, I discovered our carefully designed state machine had a critical weakness—it didn’t remember where each customer was in their multi-day onboarding journey. Customers who had submitted documents were stuck waiting. Payment processing had stalled. Approval workflows had vanished into the ether.
The FSM told us what state each customer was in, but it couldn’t orchestrate the workflow across time, failures, and system boundaries. We needed something more—something that could survive restarts, coordinate long-running processes, and handle the inherent unreliability of distributed systems.
That’s when I discovered Azure Durable Functions.
What Traditional State Machines Miss
In my previous article on finite state machines for customer onboarding, I showed how FSMs prevent impossible states and make business logic explicit. That foundation remains critical—you need clear state definitions and valid transitions.
But production customer onboarding involves challenges that pure state machines don’t address:
Temporal Durability: Customer onboarding takes days or weeks. What happens when your application restarts? With a basic FSM, you know the customer is in the “DocumentsPending” state, but you don’t know:
- When the documents were requested
- How many reminders have been sent
- What the next scheduled action is
- What context was being tracked
Cross-Service Orchestration: Real onboarding coordinates multiple systems—document verification services, payment processors, credit bureaus, email providers, CRM systems. Each has different latencies, failure modes, and retry requirements.
Human-in-the-Loop Workflows: Regulatory approval might take 2-3 business days. How do you pause a workflow, wait for human action, and resume exactly where you left off?
Failure Resilience: External APIs fail. Databases time out. Networks partition. Your workflow needs automatic retry logic, compensation, and graceful degradation.
This is where orchestration patterns shine, and Azure Durable Functions provides the infrastructure to implement them reliably.
Azure Durable Functions: Orchestration That Survives
Azure Durable Functions extends the serverless model to support stateful workflows through an elegant programming model. Think of it as giving your functions a memory that persists across executions, failures, and even weeks of elapsed time.
The key insight: Durable Functions uses event sourcing at its core. Every orchestration is a stream of events recording what happened. When the orchestrator needs to resume, it replays these events to rebuild its state. This gives you:
- Automatic checkpointing - Your workflow state survives restarts
- Replay-based execution - Orchestrator code re-runs against the recorded history to rebuild its in-memory state
- Built-in retry logic - Configurable policies for each activity
- Long-running workflows - Hours, days, or weeks without holding resources
If you’re familiar with event sourcing, this pattern will feel natural—the orchestration history is your event log.
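Here is a toy, framework-free model of that idea. It is not how the Durable Task runtime is actually implemented: activity results are appended to a history list the first time they run, and a replay reads them back instead of calling the activity again. The history list and the CallActivity and RunOrchestrator helpers are hypothetical names for this sketch.

```csharp
using System;
using System.Collections.Generic;

// Toy replay model (hypothetical names; not the Durable Task runtime).
// The history is the event log: one recorded result per completed activity call.
var history = new List<string>();
var activityExecutions = 0;

// Returns the recorded result if this call already happened; otherwise runs and records it.
string CallActivity(int callIndex, Func<string> activity)
{
    if (callIndex < history.Count)
    {
        return history[callIndex]; // Replay: reuse the checkpointed result
    }

    activityExecutions++;
    var result = activity(); // First execution: do the real work
    history.Add(result);     // Checkpoint the result
    return result;
}

// The orchestrator always runs from the top; only calls missing from history do real work.
string RunOrchestrator()
{
    var call = 0;
    var docs = CallActivity(call++, () => "documents-verified");
    var payment = CallActivity(call++, () => "payment-captured");
    return $"{docs}, {payment}";
}

var firstRun = RunOrchestrator();  // Executes both activities and records them
var replayRun = RunOrchestrator(); // Simulated restart: state rebuilt from history alone

Console.WriteLine(firstRun);              // documents-verified, payment-captured
Console.WriteLine(replayRun == firstRun); // True
Console.WriteLine(activityExecutions);    // 2
```

The orchestrator code itself ran twice; only the side effects were skipped. This is why the orchestrator must make the same calls in the same order on every run.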
The Architecture of Resilience
Durable Functions separates concerns into three function types:
```csharp
// Assumes the .NET isolated worker model (Microsoft.Azure.Functions.Worker.Extensions.DurableTask)
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Azure.Functions.Worker.Http;
using Microsoft.DurableTask;
using Microsoft.DurableTask.Client;

// 1. Client Function - Starts the orchestration
[Function("OnboardingClient")]
public static async Task<HttpResponseData> StartOnboarding(
    [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestData req,
    [DurableClient] DurableTaskClient client)
{
    var customer = await req.ReadFromJsonAsync<CustomerInfo>();

    // Start the orchestration; the instance ID identifies this customer's workflow
    string instanceId = await client.ScheduleNewOrchestrationInstanceAsync(
        "CustomerOnboardingOrchestrator",
        customer);

    // Return status query URLs
    return client.CreateCheckStatusResponse(req, instanceId);
}

// 2. Orchestrator Function - Coordinates the workflow (must be deterministic)
[Function("CustomerOnboardingOrchestrator")]
public static async Task<OnboardingResult> RunOrchestrator(
    [OrchestrationTrigger] TaskOrchestrationContext context)
{
    var customer = context.GetInput<CustomerInfo>();

    try
    {
        // Each completed activity is checkpointed in the orchestration history
        var documentsVerified = await context.CallActivityAsync<bool>(
            "VerifyDocuments", customer.DocumentUrls);

        if (!documentsVerified)
        {
            await context.CallActivityAsync("RejectApplication",
                new { customer.Id, Reason = "Document verification failed" });
            return OnboardingResult.Rejected;
        }

        // Process payment (retry policies are covered later in this article)
        var paymentResult = await context.CallActivityAsync<PaymentResult>(
            "ProcessPayment", customer.PaymentInfo);

        if (!paymentResult.Success)
        {
            return OnboardingResult.Rejected;
        }

        // Run credit check
        var creditScore = await context.CallActivityAsync<decimal>(
            "CheckCreditScore", customer.SSN);

        if (creditScore < 650)
        {
            // Low credit score - require manual approval.
            // This overload throws if no decision arrives within 3 days.
            var approved = await context.WaitForExternalEvent<bool>(
                "ApprovalDecision", TimeSpan.FromDays(3));

            if (!approved)
            {
                return OnboardingResult.Rejected;
            }
        }

        // Activate services
        await context.CallActivityAsync("ActivateCustomerServices", customer.Id);

        return OnboardingResult.Success;
    }
    catch (Exception ex)
    {
        // Record the failure via an activity, then rethrow so the instance is marked Failed
        await context.CallActivityAsync("LogFailure",
            new { customer.Id, Error = ex.Message });
        throw;
    }
}

// 3. Activity Functions - Do the actual work (can have side effects)
[Function("VerifyDocuments")]
public static async Task<bool> VerifyDocuments(
    [ActivityTrigger] List<string> documentUrls)
{
    // Call external document verification service
    var verificationService = new DocumentVerificationService();
    return await verificationService.VerifyAsync(documentUrls);
}
```
This separation of concerns is powerful:
- Client functions start orchestrations and query status
- Orchestrator functions define workflow logic (must be deterministic)
- Activity functions perform actual work (can have side effects)
The Critical Determinism Requirement
Here’s something that caught me off guard initially: orchestrator functions must be deterministic. This means:
```csharp
// ❌ DON'T DO THIS - Non-deterministic!
public static async Task BadOrchestrator(TaskOrchestrationContext context)
{
    // Problem: DateTime.UtcNow returns different values on replay
    var deadline = DateTime.UtcNow.AddDays(7);

    // Problem: Guid.NewGuid() generates different IDs on replay
    var correlationId = Guid.NewGuid();

    // Problem: Random values break determinism
    var random = new Random();
    if (random.Next(100) > 50)
    {
        // This code path will be unpredictable on replay
    }

    // Problem: Direct HTTP calls bypass orchestration tracking and rerun on every replay
    var httpClient = new HttpClient();
    await httpClient.GetAsync("https://api.example.com");
}

// ✅ DO THIS - Fully deterministic
public static async Task GoodOrchestrator(TaskOrchestrationContext context)
{
    var customer = context.GetInput<CustomerInfo>();

    // Use context-provided time (read from history, so stable across replays)
    var deadline = context.CurrentUtcDateTime.AddDays(7);

    // Use context-provided GUID generation (replay-safe)
    var correlationId = context.NewGuid();

    // Move HTTP calls to activity functions
    var apiData = await context.CallActivityAsync<string>("CallExternalApi", null);

    // Use activities for non-deterministic operations
    var creditScore = await context.CallActivityAsync<decimal>(
        "CheckCreditScore", customer.SSN);
}
```
Why does this matter? During replay, the orchestrator re-executes from the beginning, using the history to return previously computed values instantly. context.CurrentUtcDateTime and context.NewGuid() are replay-safe: they return the same values on every replay. DateTime.UtcNow is not, so an orchestrator that uses it can make different decisions on replay than it did during the original execution, breaking the workflow’s consistency.
I learned this the hard way when a production orchestration made different credit decisions on replay because I’d used DateTime.UtcNow to calculate a customer’s age. The replay happened on their birthday, changing their age by one year, which shifted them into a different risk category. Debugging that was… enlightening.
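The birthday bug is easy to reproduce outside Azure. In this sketch, a hypothetical AgeOn helper takes the current time as a parameter, which is what passing context.CurrentUtcDateTime into your logic amounts to: the original run and the replay see the same recorded timestamp, while a fresh wall-clock read at replay time can land on the other side of the birthday. The dates are invented for illustration.

```csharp
using System;

// Whole years of age on a given date (hypothetical helper for this example)
int AgeOn(DateTime born, DateTime asOf)
{
    var age = asOf.Year - born.Year;
    if (asOf < born.AddYears(age)) age--; // Birthday not reached yet this year
    return age;
}

var birthDate = new DateTime(1990, 6, 15);

// Deterministic: the timestamp is recorded once and read back from history on replay
var recordedNow = new DateTime(2024, 6, 14, 23, 0, 0, DateTimeKind.Utc);
var originalAge = AgeOn(birthDate, recordedNow);
var replayAge = AgeOn(birthDate, recordedNow);

// Non-deterministic: the replay re-reads the wall clock, which is now the birthday
var wallClockAtReplay = new DateTime(2024, 6, 15, 9, 0, 0, DateTimeKind.Utc);
var driftedAge = AgeOn(birthDate, wallClockAtReplay);

Console.WriteLine($"{originalAge} {replayAge} {driftedAge}"); // 33 33 34
```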
The Fan-Out/Fan-In Pattern: Parallel Verification
Customer onboarding often requires multiple independent checks: identity verification, address validation, credit check, sanctions screening, fraud detection. Running these sequentially wastes time.
The fan-out/fan-in pattern executes multiple activities in parallel and aggregates results:
```csharp
[Function("ParallelVerificationOrchestrator")]
public static async Task<VerificationResult> ParallelVerification(
    [OrchestrationTrigger] TaskOrchestrationContext context)
{
    var customer = context.GetInput<CustomerInfo>();

    // Fan-out: Start all verifications in parallel
    var tasks = new List<Task<bool>>
    {
        context.CallActivityAsync<bool>("VerifyIdentity", customer.IdDocument),
        context.CallActivityAsync<bool>("ValidateAddress", customer.Address),
        context.CallActivityAsync<bool>("CheckCredit", customer.SSN),
        context.CallActivityAsync<bool>("ScreenSanctions", customer.Name),
        context.CallActivityAsync<bool>("AssessFraud", customer)
    };

    // Fan-in: Wait for all to complete
    var results = await Task.WhenAll(tasks);

    // Aggregate results
    if (results.All(r => r))
    {
        return VerificationResult.Approved;
    }

    // Log which checks failed (indexes follow the order of the task list above)
    var failedChecks = new List<string>();
    if (!results[0]) failedChecks.Add("Identity");
    if (!results[1]) failedChecks.Add("Address");
    if (!results[2]) failedChecks.Add("Credit");
    if (!results[3]) failedChecks.Add("Sanctions");
    if (!results[4]) failedChecks.Add("Fraud");

    await context.CallActivityAsync("LogFailedChecks",
        new { customer.Id, FailedChecks = failedChecks });

    return VerificationResult.Rejected;
}
```
In production, this pattern reduced our verification time from 45-60 seconds (sequential) to 12-15 seconds (parallel). For customers, that’s the difference between frustration and delight.
The beauty of this pattern is that Durable Functions handles all the complexity of tracking parallel executions, collecting results, and ensuring the orchestrator only resumes when all activities complete.
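One fragile spot in the orchestrator above is mapping results back to check names by array index: reordering the task list silently mislabels failures. This framework-free sketch keeps each name paired with its task instead. RunCheckAsync is a hypothetical stand-in for context.CallActivityAsync<bool>, with simulated outcomes in which only the address check fails.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for an activity call: simulated latency, canned outcome
async Task<bool> RunCheckAsync(string name, bool outcome)
{
    await Task.Delay(50);
    return outcome;
}

// Simulated outcomes for this run: only the address check fails
var outcomes = new Dictionary<string, bool>
{
    ["Identity"] = true,
    ["Address"] = false,
    ["Credit"] = true,
    ["Sanctions"] = true,
    ["Fraud"] = true,
};

// Fan-out: start every check before awaiting any, keeping each name next to its task.
// ToList starts each task exactly once; a lazy Select would re-create them per enumeration.
var pending = outcomes
    .Select(o => (Name: o.Key, Check: RunCheckAsync(o.Key, o.Value)))
    .ToList();

// Fan-in: WhenAll returns results in the same order as the tasks passed in
var results = await Task.WhenAll(pending.Select(p => p.Check));

var failedChecks = pending
    .Zip(results, (p, passed) => (p.Name, Passed: passed))
    .Where(r => !r.Passed)
    .Select(r => r.Name)
    .ToList();

Console.WriteLine(failedChecks.Count == 0
    ? "Approved"
    : $"Rejected: {string.Join(", ", failedChecks)}"); // Rejected: Address
```

Because Task.WhenAll preserves input order, zipping the results back onto the named list is safe no matter which check finishes first.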
Human-in-the-Loop: The Approval Pattern
Regulatory compliance often requires human approval for high-risk customers. Durable Functions handles this elegantly with external events:
```csharp
using System.Net; // HttpStatusCode

[Function("OnboardingWithApprovalOrchestrator")]
public static async Task<OnboardingResult> OnboardingWithApproval(
    [OrchestrationTrigger] TaskOrchestrationContext context)
{
    var customer = context.GetInput<CustomerInfo>();

    // Automated checks
    var verificationPassed = await context.CallActivityAsync<bool>(
        "VerifyDocuments", customer);

    if (!verificationPassed)
    {
        return OnboardingResult.Rejected;
    }

    // Credit score check
    var creditScore = await context.CallActivityAsync<decimal>(
        "CheckCreditScore", customer.SSN);

    if (creditScore < 600)
    {
        // Low credit score - notify compliance team
        await context.CallActivityAsync("NotifyComplianceTeam",
            new { customer.Id, CreditScore = creditScore });

        // Wait for human decision, racing the event against a 3-day durable timer
        using var cts = new CancellationTokenSource();

        var approvalTask = context.WaitForExternalEvent<ApprovalDecision>(
            "ApprovalDecision");
        var timeoutTask = context.CreateTimer(
            context.CurrentUtcDateTime.AddDays(3), cts.Token);

        var completedTask = await Task.WhenAny(approvalTask, timeoutTask);

        if (completedTask == approvalTask)
        {
            cts.Cancel(); // Cancel the timer
            var decision = await approvalTask;

            if (!decision.Approved)
            {
                await context.CallActivityAsync("SendRejectionEmail", customer.Id);
                return OnboardingResult.Rejected;
            }
        }
        else
        {
            // Timeout - escalate
            await context.CallActivityAsync("EscalateToSeniorCompliance", customer.Id);

            // Wait indefinitely for escalated approval
            // (raised as "EscalatedApproval" by a separate endpoint, not shown)
            var escalatedDecision = await context.WaitForExternalEvent<ApprovalDecision>(
                "EscalatedApproval");

            if (!escalatedDecision.Approved)
            {
                return OnboardingResult.Rejected;
            }
        }
    }

    // Continue with activation
    await context.CallActivityAsync("ActivateServices", customer.Id);
    return OnboardingResult.Success;
}

// API endpoint for compliance team to approve/reject
[Function("ApproveOnboarding")]
public static async Task<HttpResponseData> ApproveOnboarding(
    [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestData req,
    [DurableClient] DurableTaskClient client)
{
    var approval = await req.ReadFromJsonAsync<ApprovalRequest>();

    // Send external event to waiting orchestration
    await client.RaiseEventAsync(
        approval.InstanceId,
        "ApprovalDecision",
        new ApprovalDecision
        {
            Approved = approval.Approved,
            ReviewedBy = approval.ReviewerId,
            Comments = approval.Comments
        });

    var response = req.CreateResponse(HttpStatusCode.OK);
    await response.WriteAsJsonAsync(new { Status = "Decision recorded" });
    return response;
}
```
The orchestration can pause for days, consuming no compute while it waits: the orchestrator is unloaded, and its progress lives in the history. When the compliance officer clicks “Approve” or “Reject” in the admin portal, the raised event wakes the workflow, which resumes exactly where it left off.
This pattern transformed our compliance workflow. Before Durable Functions, we used polling—checking every few hours if a decision had been made. Now, the decision instantly triggers the next step, improving both response time and resource efficiency.
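Under the hood, the approval logic is a race: whichever of the external event and the durable timer finishes first wins. This console sketch reproduces that race outside Azure under stated assumptions: a TaskCompletionSource stands in for the external event, Task.Delay stands in for the durable timer, and the timeouts are shortened. Inside a real orchestrator you must use the context's own APIs, because Task.Delay is not replay-safe. AwaitDecisionAsync is a hypothetical name.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical simulation of the approval race: "Approved", "Rejected", or "Escalated"
async Task<string> AwaitDecisionAsync(Task<bool> approvalEvent, TimeSpan timeout)
{
    using var cts = new CancellationTokenSource();
    var timer = Task.Delay(timeout, cts.Token); // Stand-in for the durable timer

    var winner = await Task.WhenAny(approvalEvent, timer);
    if (winner == approvalEvent)
    {
        cts.Cancel(); // Stop the timer once the decision is in
        return await approvalEvent ? "Approved" : "Rejected";
    }

    return "Escalated"; // Deadline passed with no decision
}

// Case 1: the reviewer answers before the deadline
var reviewer = new TaskCompletionSource<bool>();
var fastDecisionTask = AwaitDecisionAsync(reviewer.Task, TimeSpan.FromSeconds(5));
reviewer.SetResult(true); // Stand-in for raising the external event
var fastDecision = await fastDecisionTask;

// Case 2: nobody answers, so the (shortened) timer wins
var silentReviewer = new TaskCompletionSource<bool>();
var slowDecision = await AwaitDecisionAsync(silentReviewer.Task, TimeSpan.FromMilliseconds(50));

Console.WriteLine($"{fastDecision}, {slowDecision}"); // Approved, Escalated
```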
Error Handling and Retry Strategies
Production systems fail. APIs time out. Networks partition. Durable Functions lets you attach a retry policy to each activity call:
```csharp
[Function("ResilientOnboardingOrchestrator")]
public static async Task<OnboardingResult> ResilientOnboarding(
    [OrchestrationTrigger] TaskOrchestrationContext context)
{
    var customer = context.GetInput<CustomerInfo>();

    var creditRetry = new RetryPolicy(
        maxNumberOfAttempts: 3,
        firstRetryInterval: TimeSpan.FromSeconds(5),
        backoffCoefficient: 2.0, // Exponential backoff
        maxRetryInterval: TimeSpan.FromMinutes(5),
        retryTimeout: TimeSpan.FromMinutes(30));

    try
    {
        // Credit check with automatic retry (input comes before options)
        var creditCheck = await context.CallActivityAsync<CreditResult>(
            "CheckCreditScore",
            customer.SSN,
            TaskOptions.FromRetryPolicy(creditRetry));

        // Payment processing with different retry strategy
        var paymentRetry = new RetryPolicy(
            maxNumberOfAttempts: 5, // More retries for critical payment
            firstRetryInterval: TimeSpan.FromSeconds(10));

        var payment = await context.CallActivityAsync<PaymentResult>(
            "ProcessPayment",
            customer.PaymentInfo,
            TaskOptions.FromRetryPolicy(paymentRetry));

        // Retries happen only when the activity throws; a declined payment returns normally
        if (!payment.Success)
        {
            // Implement compensating transaction
            await context.CallActivityAsync("RefundPayment", payment.TransactionId);
            throw new PaymentFailedException("Payment processing failed");
        }

        return OnboardingResult.Success;
    }
    catch (PaymentFailedException)
    {
        // Thrown by this orchestrator itself, so it arrives unwrapped
        await context.CallActivityAsync("NotifyPaymentFailure", customer.Id);
        return OnboardingResult.PaymentFailed;
    }
    catch (Exception ex)
    {
        // Activity failures that exhaust their retries surface here as TaskFailedException
        await context.CallActivityAsync("LogCriticalError",
            new { customer.Id, Error = ex.ToString() });

        // Notify operations team
        await context.CallActivityAsync("AlertOpsTeam",
            new { customer.Id, Severity = "Critical" });

        throw; // Orchestration will be marked as failed
    }
}
```
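Before shipping a policy, it helps to print the schedule it implies. The sketch below assumes the common exponential-backoff model: the delay before retry n is firstRetryInterval × backoffCoefficient^(n−1), capped at maxRetryInterval, and maxNumberOfAttempts counts the first try. Treat it as an approximation and check the SDK's RetryPolicy documentation for exact semantics, including how retryTimeout is applied. RetryDelays is a hypothetical helper, not SDK code.

```csharp
using System;
using System.Collections.Generic;

// Assumed model of RetryPolicy delays (not SDK code): the delay before retry n is
// firstRetryInterval * backoffCoefficient^(n - 1), capped at maxRetryInterval.
List<TimeSpan> RetryDelays(int maxAttempts, TimeSpan first, double coefficient, TimeSpan max)
{
    var delays = new List<TimeSpan>();
    // maxAttempts counts the first try, so there are maxAttempts - 1 retries
    for (var retry = 1; retry < maxAttempts; retry++)
    {
        var seconds = first.TotalSeconds * Math.Pow(coefficient, retry - 1);
        delays.Add(TimeSpan.FromSeconds(Math.Min(seconds, max.TotalSeconds)));
    }
    return delays;
}

// The credit-check policy above: 3 attempts, 5 s first interval, x2 backoff, 5 min cap
var creditDelays = RetryDelays(3, TimeSpan.FromSeconds(5), 2.0, TimeSpan.FromMinutes(5));
Console.WriteLine(string.Join(", ", creditDelays)); // 00:00:05, 00:00:10

// A longer policy shows the cap taking over after 160 s
var longDelays = RetryDelays(10, TimeSpan.FromSeconds(5), 2.0, TimeSpan.FromMinutes(5));
Console.WriteLine(string.Join(", ", longDelays.ConvertAll(d => d.TotalSeconds)));
// 5, 10, 20, 40, 80, 160, 300, 300, 300
```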
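Circuit Breaker Pattern for External Services
When an external service becomes unhealthy, continuing to call it wastes time and resources. A circuit breaker inside the activity stops calling the service after repeated failures. This example uses the Polly library (v7 API):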
```csharp
using System.Net.Http.Json;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;
using Polly;
using Polly.CircuitBreaker;

public class CircuitBreakerActivity
{
    // Static so state survives across invocations: Functions may create a new class
    // instance per invocation, which would reset an instance-level breaker every time.
    // Each scaled-out worker process still keeps its own breaker.
    private static readonly HttpClient _httpClient = new HttpClient();

    private static readonly AsyncCircuitBreakerPolicy _circuitBreaker = Policy
        .Handle<HttpRequestException>()
        .CircuitBreakerAsync(
            exceptionsAllowedBeforeBreaking: 5,
            durationOfBreak: TimeSpan.FromMinutes(2),
            onBreak: (exception, duration) =>
            {
                // Log circuit breaker opened
                Console.WriteLine($"Circuit breaker opened for {duration}");
            },
            onReset: () =>
            {
                // Log circuit breaker reset
                Console.WriteLine("Circuit breaker reset");
            });

    [Function("CheckSanctionsList")]
    public async Task<bool> CheckSanctions(
        [ActivityTrigger] string customerName,
        FunctionContext executionContext)
    {
        var log = executionContext.GetLogger<CircuitBreakerActivity>();

        try
        {
            return await _circuitBreaker.ExecuteAsync(async () =>
            {
                var response = await _httpClient.GetAsync(
                    $"https://sanctions-api.example.com/check?name={Uri.EscapeDataString(customerName)}");

                response.EnsureSuccessStatusCode();

                var result = await response.Content.ReadFromJsonAsync<SanctionsResult>();
                return !result.IsListed;
            });
        }
        catch (BrokenCircuitException)
        {
            log.LogWarning("Sanctions API circuit breaker is open, using fallback");

            // Fallback strategy: escalate for manual review
            return false; // Treat as potential match to trigger manual review
        }
    }
}
```
This pattern saved us during a third-party sanctions API outage. Instead of overwhelming the failing service with retries, the circuit breaker opened after 5 consecutive failures, and every check during the break was routed to manual review. Once the API recovered, the circuit breaker closed and normal automated processing resumed.
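Polly does the bookkeeping in the example above. To show what that bookkeeping is, here is a deliberately small breaker written from scratch: it counts consecutive failures, opens at a threshold, rejects calls while open, and lets a trial call through once the break has elapsed (half-open). This is a hypothetical sketch with an injected clock; it is not Polly's implementation and is not thread-safe.

```csharp
using System;

// Minimal circuit breaker (hypothetical sketch, single-threaded, injected clock)
var threshold = 5;
var breakDuration = TimeSpan.FromMinutes(2);
var consecutiveFailures = 0;
DateTime? openedAt = null;

// Returns "success", "failure", or "rejected" (short-circuited while open)
string Execute(Action call, DateTime now)
{
    if (openedAt is DateTime opened && now - opened < breakDuration)
    {
        return "rejected"; // Open: fail fast without touching the service
    }

    // Closed, or half-open once the break has elapsed: let the call through
    try
    {
        call();
        consecutiveFailures = 0; // Success closes the circuit
        openedAt = null;
        return "success";
    }
    catch (Exception)
    {
        consecutiveFailures++;
        // A failed half-open trial re-opens at once; otherwise trip at the threshold
        if (openedAt is not null || consecutiveFailures >= threshold)
        {
            openedAt = now;
        }
        return "failure";
    }
}

var t0 = new DateTime(2024, 1, 1, 12, 0, 0, DateTimeKind.Utc);
void Outage() => throw new InvalidOperationException("sanctions API down");
void Healthy() { }

// Five consecutive failures trip the breaker
for (var i = 0; i < threshold; i++) Execute(Outage, t0);

// While open, calls are rejected without reaching the service
var duringBreak = Execute(Healthy, t0.AddSeconds(30));

// After the break, a trial call goes through and closes the circuit
var afterBreak = Execute(Healthy, t0.AddMinutes(2));

Console.WriteLine($"{duringBreak}, {afterBreak}"); // rejected, success
```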
Testing Durable Orchestrations
Testing orchestrators requires special considerations due to their asynchronous, stateful nature:
```csharp
// MockOrchestrationContext is a hypothetical, hand-rolled test double (not an SDK type):
// a TaskOrchestrationContext subclass that returns canned results and records calls.
public class OnboardingOrchestratorTests
{
    [Fact]
    public async Task Should_Approve_Low_Risk_Customer()
    {
        // Arrange
        var context = new MockOrchestrationContext();
        var customer = new CustomerInfo
        {
            Id = Guid.NewGuid(),
            CreditScore = 750,
            AnnualIncome = 85000,
            EmploymentStatus = "Employed"
        };

        context.SetInput(customer);

        // Mock activity results
        context.MockCallActivityAsync<bool>("VerifyDocuments", true);
        context.MockCallActivityAsync<PaymentResult>("ProcessPayment",
            new PaymentResult { Success = true });
        context.MockCallActivityAsync<decimal>("CheckCreditScore", 750);

        // Act
        var result = await CustomerOnboardingOrchestrator.RunOrchestrator(context);

        // Assert
        Assert.Equal(OnboardingResult.Success, result);
        context.VerifyActivityCalled("ActivateCustomerServices", Times.Once);
    }

    [Fact]
    public async Task Should_Require_Approval_For_Low_Credit_Score()
    {
        // Arrange
        var context = new MockOrchestrationContext();
        var customer = new CustomerInfo { Id = Guid.NewGuid() };

        context.SetInput(customer);
        context.MockCallActivityAsync<bool>("VerifyDocuments", true);
        context.MockCallActivityAsync<PaymentResult>("ProcessPayment",
            new PaymentResult { Success = true });
        context.MockCallActivityAsync<decimal>("CheckCreditScore", 550); // Low credit

        // Mock approval event
        context.MockWaitForExternalEvent<bool>("ApprovalDecision", true);

        // Act
        var result = await CustomerOnboardingOrchestrator.RunOrchestrator(context);

        // Assert: RunOrchestrator waits for approval directly (it does not notify compliance)
        Assert.Equal(OnboardingResult.Success, result);
        context.VerifyExternalEventAwaited("ApprovalDecision", Times.Once);
    }

    [Fact]
    public async Task Should_Complete_Onboarding_End_To_End()
    {
        // Integration test sketch: needs a running Functions host with a storage backend
        // (e.g., Azurite locally). StartHostAndGetClientAsync is a hypothetical helper.
        DurableTaskClient client = await TestHostHelper.StartHostAndGetClientAsync();

        var customer = new CustomerInfo { Id = Guid.NewGuid() };

        // Start orchestration
        var instanceId = await client.ScheduleNewOrchestrationInstanceAsync(
            "CustomerOnboardingOrchestrator", customer);

        // Wait for completion, giving up after 30 seconds
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
        var metadata = await client.WaitForInstanceCompletionAsync(instanceId, false, cts.Token);

        Assert.Equal(OrchestrationRuntimeStatus.Completed, metadata.RuntimeStatus);
    }
}
```
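Because an orchestrator touches the outside world only through its context, unit tests can swap in a test double for TaskOrchestrationContext (an abstract class, so a hand-written fake or a mocking library both work) and script the activity results. Integration tests need a running Functions host and a storage backend; locally, the Azurite storage emulator lets you exercise real orchestrations without deploying to Azure, at the cost of slower tests.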
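If you would rather not fake the SDK's context at all, another option is to keep the branching logic in a plain method that receives activity calls as delegates, with the real orchestrator passing thin adapters over its context. The sketch below is hypothetical: DecideAsync, the dictionary-backed fake, and the string results are illustrations that mirror the first orchestrator's flow and its 650 credit threshold, not an SDK pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Orchestration logic that sees activities only through delegates (hypothetical shape)
async Task<string> DecideAsync(
    Func<string, object, Task<object>> callActivity,
    Func<Task<bool>> waitForApproval)
{
    if (!(bool)(await callActivity("VerifyDocuments", null))) return "Rejected";

    var creditScore = (decimal)(await callActivity("CheckCreditScore", null));
    if (creditScore < 650 && !await waitForApproval()) return "Rejected";

    await callActivity("ActivateCustomerServices", null);
    return "Success";
}

// Test double: canned results per activity name, plus a record of every call
var calls = new List<string>();
Func<string, object, Task<object>> FakeActivities(Dictionary<string, object> canned) =>
    (name, input) =>
    {
        calls.Add(name);
        return Task.FromResult(canned.TryGetValue(name, out var value) ? value : null);
    };

// Low-risk customer: approval must not be consulted (it would reject)
var lowRisk = await DecideAsync(
    FakeActivities(new() { ["VerifyDocuments"] = true, ["CheckCreditScore"] = 750m }),
    () => Task.FromResult(false));
Console.WriteLine($"{lowRisk}: {string.Join(" -> ", calls)}");
// Success: VerifyDocuments -> CheckCreditScore -> ActivateCustomerServices

// Low credit score: the reviewer's decision (here, a rejection) decides the outcome
calls.Clear();
var lowCredit = await DecideAsync(
    FakeActivities(new() { ["VerifyDocuments"] = true, ["CheckCreditScore"] = 550m }),
    () => Task.FromResult(false));
Console.WriteLine($"{lowCredit}: {string.Join(" -> ", calls)}");
// Rejected: VerifyDocuments -> CheckCreditScore
```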
Production Lessons and Best Practices
After running Durable Functions in production for customer onboarding, here are the hard-won lessons:
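1. Always Implement Idempotency
Activities run at least once, not exactly once. A retry policy, or a worker crash before an activity's result is recorded, can run the same activity again, so activities must be idempotent: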
```csharp
[Function("SendWelcomeEmail")]
public static async Task SendWelcomeEmail(
    [ActivityTrigger] Guid customerId,
    FunctionContext executionContext)
{
    var log = executionContext.GetLogger("SendWelcomeEmail");

    // Check if email already sent (idempotency).
    // _emailRepository and _emailService are assumed to be static fields or injected services.
    var emailSent = await _emailRepository.HasWelcomeEmailBeenSentAsync(customerId);

    if (emailSent)
    {
        log.LogInformation("Welcome email already sent to {CustomerId}, skipping", customerId);
        return;
    }

    // Send email
    await _emailService.SendAsync(customerId, "WelcomeTemplate");

    // Record that email was sent. A crash between these two calls can still produce
    // one duplicate on retry; closing that gap needs provider-side idempotency.
    await _emailRepository.RecordEmailSentAsync(customerId, "Welcome");
}
```
I learned this lesson when a payment processing activity retried during a transient network issue, charging a customer twice. We had to refund the duplicate charge and implement idempotency checks across all financial activities. That was an expensive lesson.
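For the double-charge scenario, the usual remedy is an idempotency key derived deterministically from the orchestration, for example its instance ID (context.InstanceId) plus a step name, so every retry of the same step presents the same key. The in-memory sketch below assumes a hypothetical processor that deduplicates on that key; payment APIs that support idempotency keys do this on their side, and in production the ledger must be durable.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical processor-side ledger: idempotency key -> transaction ID
var processed = new Dictionary<string, string>();
var chargesMade = 0;

string Charge(string idempotencyKey, decimal amount)
{
    // A repeated key returns the original transaction instead of charging again
    if (processed.TryGetValue(idempotencyKey, out var existing))
    {
        return existing;
    }

    chargesMade++; // Only new keys produce a charge (amount is unused in this toy)
    var transactionId = $"txn-{chargesMade}";
    processed[idempotencyKey] = transactionId;
    return transactionId;
}

// Same orchestration instance + same step => same key on every retry
var instanceId = "onboarding-42"; // hypothetical instance ID
var key = $"{instanceId}:ProcessPayment";

var firstAttempt = Charge(key, 99.00m);
var retryAttempt = Charge(key, 99.00m); // e.g., the response was lost and the activity ran again
var otherCustomer = Charge("onboarding-43:ProcessPayment", 99.00m);

Console.WriteLine($"{firstAttempt} {retryAttempt} {otherCustomer}"); // txn-1 txn-1 txn-2
```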
2. Set Appropriate Timeouts
Long-running external calls need timeouts to prevent workflows from hanging:
```csharp
[Function("RobustOrchestrator")]
public static async Task<OnboardingResult> RobustOrchestrator(
    [OrchestrationTrigger] TaskOrchestrationContext context)
{
    var customer = context.GetInput<CustomerInfo>();

    try
    {
        // Race the slow external service against a 5-minute durable timer
        using var cts = new CancellationTokenSource();
        var activityTask = context.CallActivityAsync<bool>(
            "SlowExternalVerification", customer);
        var timeoutTask = context.CreateTimer(
            context.CurrentUtcDateTime.AddMinutes(5), cts.Token);

        var completedTask = await Task.WhenAny(activityTask, timeoutTask);

        if (completedTask == timeoutTask)
        {
            // Timeout occurred. The activity is not cancelled; its eventual result is ignored.
            await context.CallActivityAsync("LogTimeout", customer.Id);
            return OnboardingResult.TimedOut;
        }

        cts.Cancel(); // Cancel timer if activity completed first
        return await activityTask ? OnboardingResult.Success : OnboardingResult.Failed;
    }
    catch (TaskFailedException)
    {
        // The activity itself failed before the timeout
        return OnboardingResult.Error;
    }
}
```
3. Monitor Orchestration Performance
Observability is critical for production orchestrations:
```csharp
[Function("MonitoredOnboardingOrchestrator")]
public static async Task<OnboardingResult> MonitoredOrchestrator(
    [OrchestrationTrigger] TaskOrchestrationContext context,
    FunctionContext executionContext)
{
    var log = executionContext.GetLogger("MonitoredOnboardingOrchestrator");
    var customer = context.GetInput<CustomerInfo>();
    var startTime = context.CurrentUtcDateTime; // Replay-safe timestamp

    try
    {
        // ProcessOnboardingAsync holds the workflow steps (not shown)
        var result = await ProcessOnboardingAsync(context, customer);

        var duration = context.CurrentUtcDateTime - startTime;

        // Log performance metrics (only on non-replay), as structured properties
        if (!context.IsReplaying)
        {
            log.LogInformation(
                "Onboarding for {CustomerId} finished with {Result} in {DurationSeconds}s",
                customer.Id, result, duration.TotalSeconds);
        }

        return result;
    }
    catch (Exception ex)
    {
        if (!context.IsReplaying)
        {
            log.LogError(ex, "Onboarding failed for customer {CustomerId}", customer.Id);
        }
        throw;
    }
}
```
4. Implement Comprehensive Logging for Debugging
Orchestrators replay frequently, so logging needs care:
```csharp
[Function("WellLoggedOrchestrator")]
public static async Task<OnboardingResult> WellLoggedOrchestrator(
    [OrchestrationTrigger] TaskOrchestrationContext context,
    FunctionContext executionContext)
{
    // Regular (not replay-safe) logger, so each call below is guarded by hand
    var log = executionContext.GetLogger("WellLoggedOrchestrator");
    var customer = context.GetInput<CustomerInfo>();

    // Only log during actual execution, not replay
    if (!context.IsReplaying)
    {
        log.LogInformation("Starting onboarding for customer {CustomerId}", customer.Id);
    }

    var docResult = await context.CallActivityAsync<bool>("VerifyDocuments", customer);

    if (!context.IsReplaying)
    {
        log.LogInformation("Document verification for {CustomerId}: {Passed}", customer.Id, docResult);
    }

    if (!docResult)
    {
        if (!context.IsReplaying)
        {
            log.LogWarning("Rejecting customer {CustomerId} due to document verification failure", customer.Id);
        }
        return OnboardingResult.Rejected;
    }

    // Continue workflow...

    return OnboardingResult.Success;
}
```
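The IsReplaying check prevents log spam. Without it, an orchestration that awaits 10 activities one after another logs the same “Starting onboarding” message 11 times: once on the first execution and once more on each replay after an activity completes. In the isolated worker model, context.CreateReplaySafeLogger(...) returns a logger that applies this check for you.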
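The count of 11 follows from how replay works: each time one of the 10 activities completes, the orchestrator runs again from the top. The toy model below (hypothetical, counting instead of logging) also adds one log line after each activity. Those per-activity lines add 1 + 2 + ... + 10 = 55 more unguarded messages, which shows how unguarded logging grows with every step.

```csharp
using System;

// Toy model of an orchestrator that awaits `activities` activities one after another.
// Episode 0 is the first execution; episode k is the replay triggered when activity k completes.
(int Unguarded, int Guarded) CountLogLines(int activities)
{
    var unguarded = 0;
    var guarded = 0;

    for (var episode = 0; episode <= activities; episode++)
    {
        // "Starting onboarding" sits at the top, so it runs in every episode,
        // but only episode 0 reaches it without replaying
        unguarded++;
        if (episode == 0) guarded++;

        // "Activity i finished" runs in every episode from i onward,
        // and is live (not replaying) only in episode i itself
        for (var i = 1; i <= episode; i++)
        {
            unguarded++;
            if (i == episode) guarded++;
        }
    }

    return (unguarded, guarded);
}

var (totalUnguarded, totalGuarded) = CountLogLines(10);
Console.WriteLine($"without IsReplaying: {totalUnguarded}, with IsReplaying: {totalGuarded}");
// without IsReplaying: 66, with IsReplaying: 11
```

With the guard, each line is emitted exactly once, in the episode where it first runs live.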
Combining FSM + Durable Functions: The Hybrid Approach
The most powerful approach combines both patterns:
- Finite State Machines define valid states and transitions
- Durable Functions orchestrate long-running workflows with resilience
```csharp
// Combining FSM with Durable Functions
[Function("HybridOnboardingOrchestrator")]
public static async Task<OnboardingResult> HybridOnboarding(
    [OrchestrationTrigger] TaskOrchestrationContext context)
{
    var customer = context.GetInput<CustomerInfo>();

    // Not persisted on its own: every replay re-fires the same triggers in the same
    // order, so the state machine is rebuilt deterministically with the orchestration
    var fsm = new OnboardingStateMachine();

    // The orchestrator respects FSM transitions
    fsm.Fire(OnboardingTrigger.SubmitDocuments);

    var docsVerified = await context.CallActivityAsync<bool>(
        "VerifyDocuments", customer);

    if (!docsVerified)
    {
        fsm.Fire(OnboardingTrigger.RejectApplication);
        return OnboardingResult.Rejected;
    }

    fsm.Fire(OnboardingTrigger.VerifyIdentity);
    fsm.Fire(OnboardingTrigger.RequirePayment);

    var paymentResult = await context.CallActivityAsync<PaymentResult>(
        "ProcessPayment", customer.PaymentInfo);

    // Keep retrying until payment succeeds or the FSM refuses another attempt
    while (!paymentResult.Success)
    {
        fsm.Fire(OnboardingTrigger.PaymentFailed);

        // FSM enforces valid retry logic (assumes a guard on RetryPayment caps attempts)
        if (!fsm.CanFire(OnboardingTrigger.RetryPayment))
        {
            // Max retries reached
            return OnboardingResult.PaymentFailed;
        }

        // Assumes RetryPayment returns the FSM to its payment-pending state
        fsm.Fire(OnboardingTrigger.RetryPayment);

        // Durable timer: wait an hour before the next attempt
        await context.CreateTimer(context.CurrentUtcDateTime.AddHours(1), CancellationToken.None);

        paymentResult = await context.CallActivityAsync<PaymentResult>(
            "ProcessPayment", customer.PaymentInfo);
    }

    fsm.Fire(OnboardingTrigger.PaymentSuccessful);

    // Credit check
    var creditScore = await context.CallActivityAsync<decimal>("CheckCreditScore", customer.SSN);

    if (creditScore < 600)
    {
        fsm.Fire(OnboardingTrigger.RequireApproval);

        // Throws if no decision arrives within 3 days
        var approved = await context.WaitForExternalEvent<bool>("Approval", TimeSpan.FromDays(3));

        if (!approved)
        {
            fsm.Fire(OnboardingTrigger.RejectApplication);
            return OnboardingResult.Rejected;
        }

        fsm.Fire(OnboardingTrigger.GrantApproval);
    }

    fsm.Fire(OnboardingTrigger.ActivateService);

    await context.CallActivityAsync("ActivateServices", customer.Id);

    // Log final state for audit
    await context.CallActivityAsync("LogFinalState",
        new { customer.Id, FinalState = fsm.CurrentState });

    return OnboardingResult.Success;
}
```
This hybrid approach gives you the best of both worlds:
- State machine validation ensures only valid transitions
- Durable orchestration handles timing, retries, and persistence
- Clear audit trail from state machine history
- Resilient execution from Durable Functions infrastructure
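The OnboardingStateMachine type is not shown in this article. If you need a starting point, a transition table is enough to get Fire and CanFire semantics. The sketch below is hypothetical: it covers only a few of the triggers used above and represents states and triggers as strings to stay short. Because Fire is pure and deterministic, re-firing the same triggers during replay always lands in the same state.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical transition table: (current state, trigger) -> next state
var transitions = new Dictionary<(string State, string Trigger), string>
{
    [("Draft", "SubmitDocuments")] = "DocumentsSubmitted",
    [("DocumentsSubmitted", "VerifyIdentity")] = "IdentityVerified",
    [("DocumentsSubmitted", "RejectApplication")] = "Rejected",
    [("IdentityVerified", "RequirePayment")] = "PaymentPending",
    [("PaymentPending", "PaymentSuccessful")] = "Paid",
    [("Paid", "ActivateService")] = "Active",
};

var current = "Draft";
var audit = new List<string> { current }; // Every state entered, in order

bool CanFire(string trigger) => transitions.ContainsKey((current, trigger));

void Fire(string trigger)
{
    // Refuse transitions the table does not allow instead of silently ignoring them
    if (!transitions.TryGetValue((current, trigger), out var next))
        throw new InvalidOperationException($"'{trigger}' is not valid in state '{current}'");
    current = next;
    audit.Add(next);
}

Fire("SubmitDocuments");
Fire("VerifyIdentity");
Fire("RequirePayment");
Console.WriteLine(CanFire("ActivateService")); // False: payment has not succeeded yet
Fire("PaymentSuccessful");
Fire("ActivateService");
Console.WriteLine(string.Join(" -> ", audit));
// Draft -> DocumentsSubmitted -> IdentityVerified -> PaymentPending -> Paid -> Active
```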
Beyond Customer Onboarding
The patterns explored here apply to any long-running, stateful workflow:
- Order fulfillment workflows - Multi-warehouse inventory, shipment tracking, delivery confirmation
- Claims processing - Document collection, fraud detection, adjudication, payment
- Content moderation pipelines - Automated screening, human review, appeal processing
- Data pipeline orchestration - Extract, transform, validate, load with quality checks
- Multi-stage approval workflows - Budget approvals, hiring processes, procurement
Durable Functions transforms workflows that previously required complex state management infrastructure and custom retry logic into clean, maintainable code that’s resilient by default.