
Roadmap

Sayiir is a durable workflow engine. It’s great at what it does: checkpointed execution, parallel tasks, conditional branching, loops, workflow composition, retries, signals, and crash recovery — all with zero infrastructure.

This page is about what’s missing and what we’re building next. We’re honest about gaps because that’s how you decide if Sayiir is right for your use case today, or if you should wait.


Shipped

These features are stable and production-ready across Rust, Python, and Node.js:

Durable checkpointing

Continuation-based recovery. No replay, no determinism constraints.

Fork/join parallelism

Run branches in parallel, merge results with a join task.

Retries & timeouts

Exponential backoff, per-task timeouts, durable retry state.
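
The backoff arithmetic is simple to picture. A minimal sketch of an exponential schedule with a delay cap — parameter names here are illustrative, not Sayiir's actual retry configuration:

```python
def backoff_schedule(base=0.5, factor=2.0, max_delay=30.0, attempts=5):
    """Illustrative exponential-backoff delays (seconds) between retries."""
    return [min(base * factor**i, max_delay) for i in range(attempts)]

# base=0.5 -> [0.5, 1.0, 2.0, 4.0, 8.0]
```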

Conditional branching

Route work based on data with route. Key-based routing with optional defaults.
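
As a rough mental model — not Sayiir's actual route signature — key-based routing with an optional default looks like this in plain Python:

```python
def route(key_fn, routes, default=None):
    """Pick a handler based on a key extracted from the previous task's output.
    Illustrative only; the real `route` API may differ."""
    def dispatch(data):
        handler = routes.get(key_fn(data), default)
        if handler is None:
            raise KeyError(f"no route for {key_fn(data)!r} and no default")
        return handler(data)
    return dispatch

classify = route(
    key_fn=lambda d: d["kind"],
    routes={"invoice": lambda d: "billing", "bug": lambda d: "triage"},
    default=lambda d: "inbox",
)
```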

Signals & events

Wait for external events with optional timeouts. Signals are durably buffered.

Delays

Durable delays that don’t hold workers.

Distributed workers

Multiple workers polling a shared PostgreSQL backend.

Python & Node.js bindings

Thin wrappers around the shared Rust core. Pydantic and Zod integration.

Loops with exit conditions

Iterative workflows with LoopResult (again/done). Max iterations policy, durable checkpointing per iteration.
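
The again/done shape can be modeled in plain Python. This is a stand-in stub — Sayiir's real LoopResult type and loop runner will differ:

```python
from dataclasses import dataclass
from typing import Any

# Stub mirroring the LoopResult (again/done) idea; models only the control flow.
@dataclass
class LoopResult:
    done: bool
    value: Any

def run_loop(step, state, max_iterations=10):
    """Run `step` until it reports done, enforcing a max-iterations policy."""
    for _ in range(max_iterations):
        result = step(state)
        if result.done:
            return result.value
        state = result.value  # in Sayiir, each iteration is durably checkpointed
    raise RuntimeError("max iterations exceeded")

# Double a counter until it reaches at least 100
final = run_loop(lambda n: LoopResult(done=n >= 100, value=n if n >= 100 else n * 2), 3)
```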

Workflow composition

Inline child workflows with then_flow / thenFlow. Task registries merge automatically. Build modular pipelines from reusable sub-workflows.
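
A toy model of inline composition, assuming then_flow simply inlines a child workflow's tasks into the parent pipeline (the names and merge semantics here are illustrative; in Sayiir the task registries also merge automatically):

```python
class Flow:
    """Minimal stand-in for a workflow: an ordered list of tasks."""
    def __init__(self, tasks=None):
        self.tasks = list(tasks or [])

    def then(self, task):
        self.tasks.append(task)
        return self

    def then_flow(self, child):
        # Composing a child workflow inlines its tasks into the parent pipeline
        self.tasks.extend(child.tasks)
        return self

    def run(self, value):
        for task in self.tasks:
            value = task(value)
        return value

enrich = Flow().then(lambda x: x + 1).then(lambda x: x * 2)       # reusable sub-workflow
pipeline = Flow().then(lambda x: x * 10).then_flow(enrich)        # parent composes it
```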

PostgreSQL backend

Production-grade persistence with ACID transactions, claim-based distribution, and snapshot history.
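
Claim-based distribution typically leans on Postgres row locking. The snippet below is the common FOR UPDATE SKIP LOCKED idiom for letting many workers claim tasks without contention; the table and column names are invented for illustration and are not Sayiir's actual schema:

```python
# Hypothetical claim query -- each worker atomically claims one unclaimed task.
# SKIP LOCKED makes concurrent workers skip rows another worker is claiming.
CLAIM_SQL = """
UPDATE tasks
SET claimed_by = %(worker_id)s, claimed_at = now()
WHERE id = (
    SELECT id FROM tasks
    WHERE claimed_by IS NULL
    ORDER BY created_at
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING id;
"""
```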

Task execution context

Read-only access to workflow ID, instance ID, task ID, and task metadata from within running tasks. All three bindings.


Workflow context

Status: Planned

Today, each task only sees the output of the previous task. Passing context through a fork requires workarounds — each branch must carry context in its output so the join task can reassemble it.

What it enables:

  • Any task can read/write shared workflow-level state
  • No more passing context through branches manually
  • Cleaner data flow in complex workflows

Proposed API:

@task
def search_web(query: dict, ctx: WorkflowContext) -> list[dict]:
    depth = ctx.get("depth", "detailed")
    results = do_search(query["topic"], depth)
    ctx.set("web_results_count", len(results))
    return results

Observability

Status: In progress

Production observability for understanding what’s happening: which tasks are running, how long they take, where failures occur.

  • OpenTelemetry integration (span-per-task) ✅ Shipped — see Observability & Logging
  • Structured logging with correlation IDs ✅ Shipped — instance_id, task_id, worker_id on all spans
  • Prometheus/OpenMetrics export (task latency, queue depth, worker utilization)

This ships as part of the open-source library, independent of Sayiir Server.

Streaming

Status: Exploring

LLM responses benefit from token-by-token streaming. Sayiir tasks are atomic — they run to completion and get checkpointed. Streaming conflicts with this model.

What we’re exploring:

  • A @streaming_task decorator that streams partial results while still checkpointing the final output
  • Streaming as an opt-in mode that trades durability guarantees for responsiveness
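
One possible shape for the @streaming_task idea, sketched as a plain decorator that forwards chunks as they are produced and checkpoints only the assembled result — entirely hypothetical, this API does not exist yet:

```python
def streaming_task(fn):
    """Hypothetical decorator: the task body yields partial chunks; only the
    final assembled output would be checkpointed like any atomic task result."""
    def wrapper(*args, **kwargs):
        chunks = []
        for chunk in fn(*args, **kwargs):
            chunks.append(chunk)  # in practice: forward each chunk to the client
        final = "".join(chunks)
        # here the engine would durably checkpoint `final`
        return final
    return wrapper

@streaming_task
def generate_answer(prompt):
    for word in ["durable ", "workflows ", "stream"]:
        yield word
```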

This is the hardest problem on the roadmap because it touches the core execution model.


Advanced production controls

Features for high-scale and specialized production use cases.

  • Concurrency control (max N instances of workflow X)
  • Rate limiting (max N tasks/second)
  • Priority queues (urgent workflows first)
  • Dead letter queue for permanently failed tasks
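
To make the rate-limiting item concrete, here is a standard token-bucket sketch — an illustration of the "max N tasks/second" policy, not a Sayiir feature:

```python
import time

class TokenBucket:
    """Token bucket: refills at `rate` tokens/second up to `capacity`,
    allowing short bursts while enforcing a sustained rate."""
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, time.monotonic()

    def try_acquire(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=2)  # ~5 tasks/second, burst of 2
```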

Continue-as-new

Support long-running workflows that loop indefinitely (monitoring, polling, recurring processing) without unbounded state growth.

  • continue_as_new(input) primitive — restart with fresh state
  • Completed tasks from previous iteration discarded, keeping snapshot size constant
  • Less critical than in replay-based engines (Sayiir has no growing history), but needed for workflows accumulating results over thousands of iterations
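
The proposed primitive's control flow can be modeled with an exception that restarts the workflow body with fresh input — a toy illustration of the semantics, not the planned implementation:

```python
class ContinueAsNew(Exception):
    """Signal that the workflow should restart with fresh input."""
    def __init__(self, new_input):
        self.new_input = new_input

def run_workflow(body, initial_input):
    current = initial_input
    while True:
        try:
            return body(current)
        except ContinueAsNew as restart:
            # Previous iteration's task history is dropped; only the new
            # input carries over, so snapshot size stays constant.
            current = restart.new_input

def poll(state):
    if state["round"] < 3:
        raise ContinueAsNew({"round": state["round"] + 1})
    return state["round"]
```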

Workflow versioning

The checkpoint model makes versioning fundamentally easier than in replay-based systems:

  • Detect definition hash mismatch on resume
  • Migration strategies: complete-in-place, drain-and-restart, version routing
  • No replay storms — just resume from last checkpoint

Scheduling

  • Cron-style recurring workflow triggers
  • Backfill support
  • Timezone-aware scheduling

Edge runtimes

  • Durable Objects as persistence backend
  • WASM compilation of core runtime
  • Cold start optimization

SQLite backend

Single-binary durable execution with zero infrastructure. For CLI tools, edge functions, embedded systems.

  • SQLite via rusqlite with WAL mode
  • Expose to all language bindings
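
The WAL choice is easy to demonstrate with Python's built-in sqlite3 module (Sayiir would use rusqlite, but the pragma is the same): WAL mode lets readers proceed concurrently with a writer, which matters when several components share one database file.

```python
import os
import sqlite3
import tempfile

# Open a throwaway database file and switch it to write-ahead logging.
path = os.path.join(tempfile.mkdtemp(), "sayiir.db")  # illustrative filename
conn = sqlite3.connect(path)
conn.execute("PRAGMA journal_mode=WAL")
mode = conn.execute("PRAGMA journal_mode").fetchone()[0]  # "wal" once enabled
```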

Sayiir Server

Commercial offering for teams that need operational tooling on top of the open-source core.

  • Web dashboard — Workflow visualization, real-time monitoring, manual interventions
  • Multi-tenancy — Namespace isolation, resource quotas, RBAC
  • Managed scheduling — Cron triggers, webhook ingestion, recurring workflows
  • Security — Encryption at rest (AES-256-GCM), mTLS, audit logging, secret management
  • Kubernetes-native — Helm chart, HPA auto-scaling, rolling upgrades, worker affinity

See the Sayiir Server page for details.


Get involved

The roadmap is shaped by real use cases. If you’re hitting one of these gaps, or have a use case we haven’t considered:

  • Open an issue with your use case
  • Join the Discord to discuss design decisions
  • PRs welcome — especially for observability and workflow context