Retries & Timeouts

When a task fails or times out, Sayiir automatically retries it with exponential backoff. The retry state—including the current retry count and next retry time—is checkpointed along with the workflow state.

This means that if your process crashes during a backoff period, the retry will resume correctly after recovery, preserving the retry count and backoff timing.

You can configure retry behavior per task using RetryPolicy. The policy specifies:

  • max_retries: Maximum number of retry attempts
  • initial_delay: First backoff duration
  • backoff_multiplier: Multiplier for exponential backoff (e.g., 2.0 doubles the delay each retry)

from sayiir import task, Flow, RetryPolicy, run_durable_workflow
import requests

@task(
    timeout_secs=10,
    retries=RetryPolicy(
        max_retries=2,
        initial_delay_secs=1.0,
        backoff_multiplier=2.0
    )
)
def call_api(url: str) -> dict:
    return requests.get(url).json()

@task
def process(data: dict) -> str:
    return f"processed {len(data)} items"

workflow = Flow("resilient").then(call_api).then(process).build()
status = run_durable_workflow(workflow, "job-1", "https://api.example.com/data")

With this configuration, retries are scheduled as follows (a sketch of the arithmetic appears after the list):

  • First failure: retry after 1 second
  • Second failure: retry after 2 seconds (1.0 × 2.0)
  • Third failure: workflow fails (max_retries=2)
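
That schedule is simply initial_delay_secs multiplied by backoff_multiplier once per additional attempt. A minimal sketch of the arithmetic (an illustration only, not Sayiir's internal code):

def backoff_delay(attempt: int, initial_delay_secs: float = 1.0,
                  backoff_multiplier: float = 2.0) -> float:
    """Delay (in seconds) before retry number `attempt`, counting from 1."""
    return initial_delay_secs * backoff_multiplier ** (attempt - 1)

print(backoff_delay(1))  # 1.0 -> first retry after 1 second
print(backoff_delay(2))  # 2.0 -> second retry after 2 seconds
# A third failure exhausts max_retries=2, so the workflow fails instead of retrying.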

When a task exceeds its configured timeout, Sayiir treats it as a failure and applies the retry policy. The task is cancelled and retried with exponential backoff, just like any other error.

import time

@task(timeout_secs=5, retries=RetryPolicy(max_retries=3, initial_delay_secs=2.0))
def slow_operation(data: str) -> str:
    # If this takes longer than 5 seconds, it is cancelled and retried
    time.sleep(10)
    return data

Retry state is fully durable. If your process crashes during a backoff period, the workflow will resume with:

  • The correct retry count preserved
  • The remaining backoff time calculated correctly
  • No duplicate executions

For example, if a task fails and is scheduled to retry in 10 seconds, but the process crashes after 3 seconds, resuming the workflow will wait the remaining 7 seconds before retrying the task.

from sayiir import run_durable_workflow, resume_workflow

# Start workflow with retries
status = run_durable_workflow(workflow, "job-1", "https://flaky-api.com", backend=backend)

# Process crashes during backoff...

# Resume later - retry state is preserved
status = resume_workflow(workflow, "job-1", backend=backend)
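
To make the resume timing concrete, here is a back-of-the-envelope version of the 10-second example above. The variable names are illustrative only and are not part of Sayiir's API or checkpoint format:

failed_at = 0.0                    # task failed at t = 0 s
next_retry_at = failed_at + 10.0   # retry checkpointed for t = 10 s
crashed_at = 3.0                   # process crashed at t = 3 s

# On resume, only the remaining portion of the backoff is waited out.
remaining = max(0.0, next_retry_at - crashed_at)
print(remaining)  # 7.0 -> the workflow waits 7 more seconds before retrying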