
Roadmap

Sayiir is a durable workflow engine. It’s great at what it does: checkpointed execution, parallel tasks, conditional branching, loops, workflow composition, retries, signals, and crash recovery — all with zero infrastructure.

This page is about what’s missing and what we’re building next. We’re honest about gaps because that’s how you decide if Sayiir is right for your use case today, or if you should wait.


Shipped

These features are stable and production-ready across Rust, Python, and Node.js:

Durable checkpointing

Continuation-based recovery. No replay, no determinism constraints.

Fork/join parallelism

Run branches in parallel, merge results with a join task.

Retries & timeouts

Exponential backoff, per-task timeouts, durable retry state.
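
The backoff arithmetic is simple to picture. A minimal sketch of an exponential schedule with a delay cap — parameter names here are illustrative, not Sayiir's actual retry configuration:

```python
def backoff_schedule(base=0.5, factor=2.0, max_delay=30.0, attempts=5):
    """Illustrative exponential-backoff delays (seconds) between retries."""
    return [min(base * factor**i, max_delay) for i in range(attempts)]

# base=0.5 -> [0.5, 1.0, 2.0, 4.0, 8.0]
```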

Conditional branching

Route work based on data with route. Key-based routing with optional defaults.
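
As a rough mental model — not Sayiir's actual route signature — key-based routing with an optional default looks like this in plain Python:

```python
def route(key_fn, routes, default=None):
    """Pick a handler based on a key extracted from the previous task's output.
    Illustrative only; the real `route` API may differ."""
    def dispatch(data):
        handler = routes.get(key_fn(data), default)
        if handler is None:
            raise KeyError(f"no route for {key_fn(data)!r} and no default")
        return handler(data)
    return dispatch

classify = route(
    key_fn=lambda d: d["kind"],
    routes={"invoice": lambda d: "billing", "bug": lambda d: "triage"},
    default=lambda d: "inbox",
)
```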

Signals & events

Wait for external events with optional timeouts. Signals are durably buffered.

Delays

Durable delays that don’t hold workers.

Distributed workers

Multiple workers polling a shared PostgreSQL backend.

Python & Node.js bindings

Thin wrappers around the shared Rust core. Pydantic and Zod integration.

Loops with exit conditions

Iterative workflows with LoopResult (again/done). Max iterations policy, durable checkpointing per iteration.
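
The again/done shape can be modeled in plain Python. This is a stand-in stub — Sayiir's real LoopResult type and loop runner will differ:

```python
from dataclasses import dataclass
from typing import Any

# Stub mirroring the LoopResult (again/done) idea; models only the control flow.
@dataclass
class LoopResult:
    done: bool
    value: Any

def run_loop(step, state, max_iterations=10):
    """Run `step` until it reports done, enforcing a max-iterations policy."""
    for _ in range(max_iterations):
        result = step(state)
        if result.done:
            return result.value
        state = result.value  # in Sayiir, each iteration is durably checkpointed
    raise RuntimeError("max iterations exceeded")

# Double a counter until it reaches at least 100
final = run_loop(lambda n: LoopResult(done=n >= 100, value=n if n >= 100 else n * 2), 3)
```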

Workflow composition

Inline child workflows with then_flow / thenFlow. Task registries merge automatically. Build modular pipelines from reusable sub-workflows.
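
A toy model of inline composition, assuming then_flow simply inlines a child workflow's tasks into the parent pipeline (the names and merge semantics here are illustrative; in Sayiir the task registries also merge automatically):

```python
class Flow:
    """Minimal stand-in for a workflow: an ordered list of tasks."""
    def __init__(self, tasks=None):
        self.tasks = list(tasks or [])

    def then(self, task):
        self.tasks.append(task)
        return self

    def then_flow(self, child):
        # Composing a child workflow inlines its tasks into the parent pipeline
        self.tasks.extend(child.tasks)
        return self

    def run(self, value):
        for task in self.tasks:
            value = task(value)
        return value

enrich = Flow().then(lambda x: x + 1).then(lambda x: x * 2)       # reusable sub-workflow
pipeline = Flow().then(lambda x: x * 10).then_flow(enrich)        # parent composes it
```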

PostgreSQL backend

Production-grade persistence with ACID transactions, claim-based distribution, and snapshot history.
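
Claim-based distribution typically leans on Postgres row locking. The snippet below is the common FOR UPDATE SKIP LOCKED idiom for letting many workers claim tasks without contention; the table and column names are invented for illustration and are not Sayiir's actual schema:

```python
# Hypothetical claim query -- each worker atomically claims one unclaimed task.
# SKIP LOCKED makes concurrent workers skip rows another worker is claiming.
CLAIM_SQL = """
UPDATE tasks
SET claimed_by = %(worker_id)s, claimed_at = now()
WHERE id = (
    SELECT id FROM tasks
    WHERE claimed_by IS NULL
    ORDER BY created_at
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING id;
"""
```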

Task execution context

Read-only access to workflow ID, instance ID, task ID, and task metadata from within running tasks. All three bindings.


Workflow context

Status: Planned

Today, each task only sees the output of the previous task. Passing context through a fork requires workarounds — each branch must carry context in its output so the join task can reassemble it.

What it enables:

  • Any task can read/write shared workflow-level state
  • No more passing context through branches manually
  • Cleaner data flow in complex workflows

Proposed API:

@task
def search_web(query: dict, ctx: WorkflowContext) -> list[dict]:
    depth = ctx.get("depth", "detailed")
    results = do_search(query["topic"], depth)
    ctx.set("web_results_count", len(results))
    return results

Observability

Status: In progress

Production observability for understanding what’s happening: which tasks are running, how long they take, where failures occur.

  • OpenTelemetry integration (span-per-task) ✅ Shipped — see Observability & Logging
  • Structured logging with correlation IDs ✅ Shipped — instance_id, task_id, worker_id on all spans
  • Prometheus/OpenMetrics export (task latency, queue depth, worker utilization)

This ships as part of the open-source library, independent of Sayiir Server.

Streaming

Status: Exploring

LLM responses benefit from token-by-token streaming. Sayiir tasks are atomic — they run to completion and get checkpointed. Streaming conflicts with this model.

What we’re exploring:

  • A @streaming_task decorator that streams partial results while still checkpointing the final output
  • Streaming as an opt-in mode that trades durability guarantees for responsiveness
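
One possible shape for the @streaming_task idea, sketched as a plain decorator that forwards chunks as they are produced and checkpoints only the assembled result — entirely hypothetical, this API does not exist yet:

```python
def streaming_task(fn):
    """Hypothetical decorator: the task body yields partial chunks; only the
    final assembled output would be checkpointed like any atomic task result."""
    def wrapper(*args, **kwargs):
        chunks = []
        for chunk in fn(*args, **kwargs):
            chunks.append(chunk)  # in practice: forward each chunk to the client
        final = "".join(chunks)
        # here the engine would durably checkpoint `final`
        return final
    return wrapper

@streaming_task
def generate_answer(prompt):
    for word in ["durable ", "workflows ", "stream"]:
        yield word
```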

This is the hardest problem on the roadmap because it touches the core execution model.


Advanced production controls

Features for high-scale and specialized production use cases.

  • Concurrency control (max N instances of workflow X)
  • Rate limiting (max N tasks/second)
  • Priority queues (urgent workflows first)
  • Dead letter queue for permanently failed tasks
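
To make the rate-limiting item concrete, here is a standard token-bucket sketch — an illustration of the "max N tasks/second" policy, not a Sayiir feature:

```python
import time

class TokenBucket:
    """Token bucket: refills at `rate` tokens/second up to `capacity`,
    allowing short bursts while enforcing a sustained rate."""
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, time.monotonic()

    def try_acquire(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=2)  # ~5 tasks/second, burst of 2
```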

Continue-as-new

Support long-running workflows that loop indefinitely (monitoring, polling, recurring processing) without unbounded state growth.

  • continue_as_new(input) primitive — restart with fresh state
  • Completed tasks from previous iteration discarded, keeping snapshot size constant
  • Less critical than in replay-based engines (Sayiir has no growing history), but needed for workflows accumulating results over thousands of iterations
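
The proposed primitive's control flow can be modeled with an exception that restarts the workflow body with fresh input — a toy illustration of the semantics, not the planned implementation:

```python
class ContinueAsNew(Exception):
    """Signal that the workflow should restart with fresh input."""
    def __init__(self, new_input):
        self.new_input = new_input

def run_workflow(body, initial_input):
    current = initial_input
    while True:
        try:
            return body(current)
        except ContinueAsNew as restart:
            # Previous iteration's task history is dropped; only the new
            # input carries over, so snapshot size stays constant.
            current = restart.new_input

def poll(state):
    if state["round"] < 3:
        raise ContinueAsNew({"round": state["round"] + 1})
    return state["round"]
```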

Workflow versioning

The checkpoint model makes versioning fundamentally easier than in replay-based systems:

  • Detect definition hash mismatch on resume
  • Migration strategies: complete-in-place, drain-and-restart, version routing
  • No replay storms — just resume from last checkpoint

Scheduling

  • Cron-style recurring workflow triggers
  • Backfill support
  • Timezone-aware scheduling

Edge runtimes

  • Durable Objects as persistence backend
  • WASM compilation of core runtime
  • Cold start optimization

SQLite backend

Single-binary durable execution with zero infrastructure. For CLI tools, edge functions, embedded systems.

  • SQLite via rusqlite with WAL mode
  • Expose to all language bindings
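
The WAL choice is easy to demonstrate with Python's built-in sqlite3 module (Sayiir would use rusqlite, but the pragma is the same): WAL mode lets readers proceed concurrently with a writer, which matters when several components share one database file.

```python
import os
import sqlite3
import tempfile

# Open a throwaway database file and switch it to write-ahead logging.
path = os.path.join(tempfile.mkdtemp(), "sayiir.db")  # illustrative filename
conn = sqlite3.connect(path)
conn.execute("PRAGMA journal_mode=WAL")
mode = conn.execute("PRAGMA journal_mode").fetchone()[0]  # "wal" once enabled
```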

Sayiir Server

Commercial offering for teams that need operational tooling on top of the open-source core.

  • Web dashboard — Workflow visualization, real-time monitoring, manual interventions
  • Multi-tenancy — Namespace isolation, resource quotas, RBAC
  • Managed scheduling — Cron triggers, webhook ingestion, recurring workflows
  • Security — Encryption at rest (AES-256-GCM), mTLS, audit logging, secret management
  • Kubernetes-native — Helm chart, HPA auto-scaling, rolling upgrades, worker affinity

See the Sayiir Server page for details.


Get involved

The roadmap is shaped by real use cases. If you’re hitting one of these gaps, or have a use case we haven’t considered:

  • Open an issue with your use case
  • Join the Discord to discuss design decisions
  • PRs welcome — especially for observability and workflow context