Retries & Timeouts
How retries work
When a task fails or times out, Sayiir automatically retries it with exponential backoff. The retry state—including the current retry count and next retry time—is checkpointed along with the workflow state.
This means that if your process crashes during a backoff period, the retry will resume correctly after recovery, preserving the retry count and backoff timing.
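Conceptually, the checkpointed retry state can be pictured as a small record persisted alongside the workflow state. The sketch below is illustrative only — the names `RetryState`, `retry_count`, and `next_retry_at` are assumptions, not Sayiir's actual schema:

```python
from dataclasses import dataclass

# Illustrative only: the minimum that durable retry state must capture
# for crash recovery to work (not Sayiir's actual schema).
@dataclass
class RetryState:
    retry_count: int      # how many retries have already happened
    next_retry_at: float  # absolute timestamp of the next attempt

# Because next_retry_at is an absolute time rather than a relative delay,
# a process that crashes mid-backoff can resume and still wait exactly
# the remaining duration.
state = RetryState(retry_count=1, next_retry_at=1_700_000_010.0)
```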
Configuring retries
You can configure retry behavior per task. The simplest approach is passing an integer for the retry count (uses sensible defaults: 1s initial delay, 2× backoff). For full control, use RetryPolicy.
RetryPolicy parameters:
- max_retries: Maximum number of retry attempts
- initial_delay_secs: First backoff duration in seconds
- backoff_multiplier: Multiplier for exponential backoff (e.g., 2.0 doubles the delay each retry)
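Putting these parameters together, the delay before retry *n* is `initial_delay_secs * backoff_multiplier ** (n - 1)`. A small standalone sketch of that schedule (plain Python, no Sayiir imports — this is the arithmetic, not library internals):

```python
def backoff_schedule(max_retries: int, initial_delay_secs: float,
                     backoff_multiplier: float) -> list[float]:
    """Delay (in seconds) before each successive retry attempt."""
    return [initial_delay_secs * backoff_multiplier ** n
            for n in range(max_retries)]

# Matches the RetryPolicy(max_retries=3, initial_delay_secs=0.5,
# backoff_multiplier=3.0) example in this section:
print(backoff_schedule(3, 0.5, 3.0))  # [0.5, 1.5, 4.5]
```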
Int shorthand
When the defaults (1s initial delay, 2× backoff) are fine, pass an integer:
```python
import requests

from sayiir import task, Flow, run_durable_workflow

@task(timeout="10s", retries=2)
def call_api(url: str) -> dict:
    return requests.get(url).json()
```

With this configuration:
- First failure: retry after 1 second
- Second failure: retry after 2 seconds (1.0 × 2.0)
- Third failure: workflow fails (retries exhausted)
Full RetryPolicy
When you need custom backoff timing, use RetryPolicy:
```python
import requests

from sayiir import task, RetryPolicy

@task(
    timeout="10s",
    retries=RetryPolicy(max_retries=3, initial_delay_secs=0.5, backoff_multiplier=3.0),
)
def call_flaky_api(url: str) -> dict:
    return requests.get(url).json()
```

With this configuration:
- First failure: retry after 0.5 seconds
- Second failure: retry after 1.5 seconds (0.5 × 3.0)
- Third failure: retry after 4.5 seconds (1.5 × 3.0)
- Fourth failure: workflow fails (retries exhausted)
```typescript
import { task, flow, runDurableWorkflow } from "sayiir";

const callApi = task("call_api", async (url: string) => {
  const res = await fetch(url);
  return await res.json();
}, {
  timeout: "10s",
  retry: {
    maxAttempts: 2,
    initialDelay: "1s",
    backoffMultiplier: 2.0,
  },
});

const process = task("process", (data: Record<string, unknown>) => {
  return `processed ${Object.keys(data).length} items`;
});

const workflow = flow<string>("resilient")
  .then(callApi)
  .then(process)
  .build();

const status = runDurableWorkflow(workflow, "job-1", "https://api.example.com/data", backend);
```

With this configuration:
- First failure: retry after 1 second
- Second failure: retry after 2 seconds (1.0 × 2.0)
- Third failure: workflow fails (maxAttempts=2)
Using TaskMetadata
```rust
use sayiir_core::task::{TaskMetadata, RetryPolicy};
use std::time::Duration;

let workflow = WorkflowBuilder::new(ctx)
    .with_registry()
    .then("call_api", |url: String| async move {
        Ok(reqwest::get(&url).await?.json::<serde_json::Value>().await?)
    })
    .with_metadata(TaskMetadata {
        timeout: Some(Duration::from_secs(10)),
        retries: Some(RetryPolicy {
            max_retries: 2,
            initial_delay: Duration::from_secs(1),
            backoff_multiplier: 2.0,
        }),
        ..Default::default()
    })
    .then("process", |data: serde_json::Value| async move {
        Ok(format!("processed {} keys", data.as_object().map_or(0, |o| o.len())))
    })
    .build()?;
```

Using the macro
```rust
use sayiir_macros::task;
use sayiir_core::BoxError;

#[task(timeout = "10s", retries = 2, backoff = "1s")]
async fn call_api(url: String) -> Result<serde_json::Value, BoxError> {
    Ok(reqwest::get(&url).await?.json().await?)
}

#[task]
async fn process(data: serde_json::Value) -> Result<String, BoxError> {
    Ok(format!("processed {} keys", data.as_object().map_or(0, |o| o.len())))
}

// Register macro-decorated tasks into the registry
let mut registry = TaskRegistry::new();
CallApiTask::register(&mut registry, codec.clone(), CallApiTask::new());
ProcessTask::register(&mut registry, codec.clone(), ProcessTask::new());

let workflow = WorkflowBuilder::new(ctx)
    .with_existing_registry(registry)
    .then_registered::<serde_json::Value>(CallApiTask::task_id())
    .then_registered::<String>(ProcessTask::task_id())
    .build()?;
```

Timeout behavior
When a task exceeds its configured timeout, Sayiir treats it as a failure and applies the retry policy. The task is cancelled and retried with exponential backoff, just like any other error.
```python
@task(timeout="5s", retries=3)
def slow_operation(data: str) -> str:
    # If this takes >5 seconds, it will be retried
    time.sleep(10)
    return data
```

```typescript
const slowOperation = task("slow_operation", async (data: string) => {
  // If this takes >5 seconds, it will be retried
  await new Promise((r) => setTimeout(r, 10_000));
  return data;
}, {
  timeout: "5s",
  retry: { maxAttempts: 3, initialDelay: "2s" },
});
```

```rust
#[task(timeout = "5s", retries = 3, backoff = "2s")]
async fn slow_operation(data: String) -> Result<String, BoxError> {
    // If this takes >5 seconds, it will be retried
    tokio::time::sleep(Duration::from_secs(10)).await;
    Ok(data)
}
```

Durable retry state
Retry state is fully durable. If your process crashes during a backoff period, the workflow will resume with:
- The correct retry count preserved
- The remaining backoff time calculated correctly
- No duplicate executions
For example, if a task fails and is scheduled to retry in 10 seconds, but the process crashes after 3 seconds, resuming the workflow will wait the remaining 7 seconds before retrying the task.
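The remaining wait can be derived purely from the persisted absolute retry time. An illustrative calculation (plain Python arithmetic, not Sayiir internals):

```python
def remaining_backoff(next_retry_at: float, now: float) -> float:
    """Seconds left to wait before retrying after a resume."""
    return max(0.0, next_retry_at - now)

# A task failed at t=100 with a 10s backoff, so next_retry_at=110.
# The process crashed at t=103 and the workflow resumes at t=103:
print(remaining_backoff(110.0, 103.0))  # 7.0
```

If the process resumes after the scheduled retry time has already passed, the remaining wait clamps to zero and the retry runs immediately.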
```python
from sayiir import run_durable_workflow, resume_workflow

# Start workflow with retries
status = run_durable_workflow(workflow, "job-1", "https://flaky-api.com", backend=backend)

# Process crashes during backoff...

# Resume later - retry state is preserved
status = resume_workflow(workflow, "job-1", backend=backend)
```

```typescript
import { runDurableWorkflow, resumeWorkflow } from "sayiir";

// Start workflow with retries
const status = runDurableWorkflow(workflow, "job-1", "https://flaky-api.com", backend);

// Process crashes during backoff...

// Resume later - retry state is preserved
const status2 = resumeWorkflow(workflow, "job-1", backend);
```

```rust
// Start workflow with retries
let status = runner.run(&workflow, "job-1", "https://flaky-api.com".to_string()).await?;

// Process crashes during backoff...

// Resume later - retry state is preserved
let status = runner.resume(&workflow, "job-1").await?;
```