# Serialization & Migration
## Choosing a Codec

A codec controls how task inputs and outputs are serialized when checkpointed. The codec is set per-backend instance, not per-workflow.
### JsonCodec (default)

Human-readable JSON via serde_json. Types must derive `serde::Serialize` and `serde::Deserialize`.
```rust
use sayiir_runtime::prelude::*;

// JsonCodec is the default — no explicit codec needed
let workflow = workflow! {
    name: "order_pipeline",
    registry: registry,
    steps: [validate, charge, ship]
}.unwrap();
```

### RkyvCodec
Zero-copy binary serialization via rkyv. Types must derive `rkyv::Archive`, `rkyv::Serialize`, and `rkyv::Deserialize`.
```rust
use sayiir_runtime::serialization::RkyvCodec;

let workflow = workflow! {
    name: "order_pipeline",
    codec: RkyvCodec,
    registry: registry,
    steps: [validate, charge, ship]
}.unwrap();
```

### When to Use Which
| Criteria | JsonCodec | RkyvCodec |
|---|---|---|
| Readability | Human-readable, easy to inspect | Binary, requires tooling |
| Performance | Slower, larger payloads | Fast, zero-copy deserialization |
| Ecosystem | Language-agnostic (serde) | Rust-only |
| Debugging | Easy — jq on snapshots | Harder — opaque bytes |
| Best for | Development, mixed-language stacks | High-throughput Rust workloads |
Rule of thumb: Start with JsonCodec (the default). Switch to RkyvCodec when serialization shows up in profiling.
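Before switching codecs, a rough back-of-envelope check is to measure how much JSON a typical task payload actually produces. A minimal sketch (the payload shape is invented for illustration):

```python
import json

# Rough profiling proxy: how large is a typical checkpointed payload as JSON?
payload = {"order_id": "o-1", "amount": 9.99, "items": ["sku-1"] * 100}
encoded = json.dumps(payload)
print(len(encoded), "bytes per checkpointed payload")
```

If payloads are small and infrequent, the readability of JsonCodec usually outweighs RkyvCodec's speed.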
## Making Types Serializable

Every type that flows through a task (inputs and outputs) must be serializable by the codec you choose.
With JsonCodec (serde):

```rust
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize)]
struct OrderInput {
    order_id: String,
    amount: f64,
    /// New field — old snapshots without it will use the default
    #[serde(default)]
    currency: String,
}
```

With RkyvCodec (rkyv):

```rust
use rkyv::{Archive, Serialize, Deserialize};

#[derive(Archive, Serialize, Deserialize)]
struct OrderInput {
    order_id: String,
    amount: f64,
}
```

Supporting both codecs:

```rust
use serde::{Serialize as SerdeSerialize, Deserialize as SerdeDeserialize};
use rkyv::{Archive, Serialize as RkyvSerialize, Deserialize as RkyvDeserialize};

#[derive(
    SerdeSerialize, SerdeDeserialize,
    Archive, RkyvSerialize, RkyvDeserialize,
)]
struct OrderInput {
    order_id: String,
    amount: f64,
}
```

For Python, any type that `json.dumps` / `json.loads` can handle works (dicts, lists, strings, numbers, booleans, None). For Node.js, anything that survives `JSON.stringify` / `JSON.parse`.
## Definition Hash — What Breaks Resumability

When a workflow is built, Sayiir computes a SHA-256 definition hash from the workflow’s structural shape. This hash is stored with every snapshot. On resume, the engine compares the current hash to the snapshot’s hash — if they differ, the workflow cannot resume and a `DefinitionMismatch` error is raised.
### What’s Hashed

The hash covers the structure of the workflow:
- Task IDs and their sequential order
- Timeout values
- Retry policies (max retries, initial delay, backoff multiplier)
- Task `version` strings (from metadata)
- Fork IDs and branch structure
- Branch IDs, keys, and default handlers
- Loop IDs, max iterations, and on_max policy
- Delay durations and IDs
- Signal names, IDs, and timeouts
### What’s NOT Hashed

- Task implementations (the function body)
- The codec used
- The backend used
- Other task metadata (tags, descriptions, display names)
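The split between hashed and non-hashed fields can be pictured with a toy hash over the structural shape. This is an illustrative sketch only, not Sayiir's actual hashing scheme:

```python
import hashlib
import json

def definition_hash(shape: dict) -> str:
    # Canonicalize the structural fields and hash them. Anything left out
    # of `shape` (descriptions, tags, the function body) cannot move the hash.
    canonical = json.dumps(shape, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

base    = {"tasks": [{"id": "validate", "timeout": 30}, {"id": "charge", "timeout": 30}]}
renamed = {"tasks": [{"id": "validate_v2", "timeout": 30}, {"id": "charge", "timeout": 30}]}
retimed = {"tasks": [{"id": "validate", "timeout": 60}, {"id": "charge", "timeout": 30}]}

print(definition_hash(base) != definition_hash(renamed))  # True
print(definition_hash(base) != definition_hash(retimed))  # True
```

Renaming a task ID or changing a timeout lands in the hashed structure, so either change produces a new hash and blocks resumption; editing a description would not.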
### Breaking vs Non-Breaking Changes

| Change | Hash Impact | Safe for In-Flight? |
|---|---|---|
| Change task logic (same signature) | No change | Yes |
| Swap backend (InMemory → Postgres) | No change | Yes |
| Switch codec (Json → Rkyv) | No change | Yes* |
| Update task tags or descriptions | No change | Yes |
| Add a new task to the pipeline | Changes | No |
| Remove a task | Changes | No |
| Reorder tasks | Changes | No |
| Change a timeout value | Changes | No |
| Change retry policy | Changes | No |
| Add/remove a fork branch | Changes | No |
| Change loop max_iterations or on_max | Changes | No |
| Rename a task ID | Changes | No |
| Change a signal name | Changes | No |
| Bump task `version` | Changes | No |
* Switching codecs is hash-safe but existing serialized bytes must still be decodable by the new codec. In practice, switching codecs on a live system with in-flight data requires a drain first.
## Task Version — Opting In to Schema Change Detection

The definition hash does not cover task input/output types. If you change the shape of a task’s data types, the hash stays the same — and in-flight workflows may fail at runtime when they try to deserialize old cached results with the new type.
To make the engine detect these schema changes, set the version field in your task metadata. The version string is included in the definition hash, so bumping it forces a new workflow version and prevents in-flight workflows from resuming with the old schema.
Python:

```python
from sayiir import task, TaskMetadata

@task(metadata=TaskMetadata(version="2.0"))
def process_order(order: dict) -> dict:
    return {"status": "processed", "total": order["total"]}
```

Node.js:

```typescript
import { task } from "sayiir";

const processOrder = task("process_order", (order) => {
  return { status: "processed", total: order.total };
}, { version: "2.0" });
```

Rust:

```rust
use sayiir_core::task::TaskMetadata;

let workflow = WorkflowBuilder::new(ctx)
    .then("process_order", |order: Order| async move {
        Ok(ProcessedOrder { status: "processed".into(), total: order.total })
    })
    .with_metadata(TaskMetadata {
        version: Some("2.0".into()),
        ..Default::default()
    })
    .build()?;
```

When you change a task’s input or output type in a breaking way, bump the version string. The engine will reject resuming in-flight workflows that were started with the old version, and you can drain them before deploying.
### The Error

When a hash mismatch occurs:

```
Workflow definition mismatch: expected hash 'a1b2c3...', found 'd4e5f6...'
```

In Rust this is `BuildError::DefinitionMismatch` (at build time) or `WorkflowError::DefinitionMismatch` (at runtime). Python and Node.js raise equivalent exceptions.
## Schema Evolution for Task Inputs/Outputs

Even when the workflow structure doesn’t change (same definition hash), the data types flowing through tasks can evolve. Here’s how to do it safely with JsonCodec / serde.
### Adding Fields

Use `#[serde(default)]` so existing snapshots without the new field deserialize correctly:

```rust
#[derive(Serialize, Deserialize)]
struct OrderInput {
    order_id: String,
    amount: f64,
    #[serde(default)]
    priority: Option<String>, // new field — old data deserializes as None
}
```

In Python, this happens naturally with `dict.get("priority")`.
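The same tolerance can be sketched in plain Python by defaulting missing keys when decoding an old snapshot (the payload and field names here are hypothetical):

```python
import json

# Old snapshot payload, written before `priority` existed.
old_payload = '{"order_id": "o-1", "amount": 9.99}'

def decode_order(raw: str) -> dict:
    data = json.loads(raw)
    # Equivalent of #[serde(default)]: a missing field falls back to a default.
    data.setdefault("priority", None)
    return data

order = decode_order(old_payload)
print(order["priority"])  # None
```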
### Removing Fields

Don’t delete the field while in-flight workflows may still carry it. Instead, stop writing it:

```rust
#[derive(Serialize, Deserialize)]
struct OrderInput {
    order_id: String,
    amount: f64,
    #[serde(skip_serializing, default)]
    legacy_field: Option<String>, // still readable, no longer written
}
```

Once all in-flight workflows have drained, you can remove the field entirely.
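A plain-Python sketch of "still readable, no longer written" (the payload and field names are hypothetical):

```python
import json

# Old snapshot written before the field was deprecated.
old_snapshot = '{"order_id": "o-1", "amount": 9.99, "legacy_field": "x"}'

def decode(raw: str) -> dict:
    data = json.loads(raw)
    data.setdefault("legacy_field", None)  # default when the field is absent
    return data

def encode(data: dict) -> str:
    # Mirror of #[serde(skip_serializing)]: never write the deprecated field.
    return json.dumps({k: v for k, v in data.items() if k != "legacy_field"})

order = decode(old_snapshot)
print("legacy_field" in order)                      # True: still readable
print("legacy_field" in json.loads(encode(order)))  # False: no longer written
```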
### Renaming Fields

Use `#[serde(rename)]` or `#[serde(alias)]` for backward compatibility:

```rust
#[derive(Serialize, Deserialize)]
struct OrderInput {
    #[serde(alias = "order_id")]
    id: String, // renamed from order_id — old snapshots still work
    amount: f64,
}
```

### Enum Variants
Section titled “Enum Variants”- Adding a variant is safe — existing data won’t contain it.
- Removing a variant breaks deserialization for in-flight workflows that stored it.
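The same rule can be sketched with a plain string enum in Python (the variant names are invented):

```python
# Hypothetical set of variants for an order-status enum.
KNOWN_STATUSES = {"pending", "paid", "shipped"}

def decode_status(value: str) -> str:
    # Adding a new variant to KNOWN_STATUSES is safe: old snapshots never
    # contain it. Removing one is breaking: an in-flight workflow may have
    # checkpointed exactly that value.
    if value not in KNOWN_STATUSES:
        raise ValueError(f"unknown status variant: {value!r}")
    return value

print(decode_status("paid"))
```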
## Migration Strategy

Sayiir separates two concerns when deploying changes:
- Structural changes (adding/removing tasks, changing timeouts, reordering steps) — automatically gated by the definition hash. The engine rejects mismatches; you cannot accidentally resume an old workflow with a new structure.
- Schema changes (modifying the shape of task inputs or outputs) — your responsibility. The definition hash does not cover data types, so the engine won’t catch a schema mismatch until deserialization fails at runtime.
The recommended approach for both is version-pinning with draining: pin running workflow instances to their original definition, let them finish, and deploy the new version for new instances.
### Safe Deployments (No Hash Change)

These changes can be deployed at any time, even with in-flight workflows:
- Bug fixes in task logic — the function body isn’t hashed
- Infrastructure swaps — changing backends, scaling workers
- Metadata updates — tags, descriptions
- Backward-compatible schema changes — see Schema Evolution above
Just deploy as usual. In-flight workflows resume normally.
### Structural Changes (Hash Changes)

When you change the workflow structure (add/remove tasks, change timeouts, reorder steps, etc.), the definition hash changes. In-flight workflows cannot resume with the new definition — the engine enforces this automatically.
**Simplest approach: use a new instance ID.** If your application controls instance IDs, the easiest migration path is to start new workflow instances with fresh IDs under the new definition. Old instances tied to the old definition will either drain naturally or can be cancelled. This avoids any coordination between old and new workers.

```
# Old definition — instances "order-100", "order-101" still running
# New definition — new instances get "order-102", "order-103", ...
# Old instances drain on their own; no conflict.
```

**Drain-and-restart approach:** If you need the same instance IDs to carry over:
1. Stop starting new workflow instances with the old definition
2. Wait for in-flight workflows to complete (drain)
3. Deploy the new workflow definition
4. Start new instances with the new definition
```mermaid
graph LR
  A[Stop new submissions] --> B[Drain in-flight]
  B --> C[Deploy new definition]
  C --> D[Resume traffic]
```
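The drain step is just a poll loop over however your deployment exposes the number of running instances. In the sketch below, `in_flight_counter` is a hypothetical callable, not a Sayiir API:

```python
import time

def drain(in_flight_counter, poll_seconds=5.0, timeout_seconds=3600.0):
    """Block until no workflow instances remain in flight.

    `in_flight_counter` is a stand-in; wire it to whatever your backend
    exposes (a database query, a metrics endpoint, ...).
    """
    deadline = time.monotonic() + timeout_seconds
    while in_flight_counter() > 0:
        if time.monotonic() > deadline:
            raise TimeoutError("in-flight workflows did not drain in time")
        time.sleep(poll_seconds)

# Simulated usage: a counter that reaches zero after a couple of polls.
remaining = [2]
def fake_counter():
    remaining[0] = max(0, remaining[0] - 1)
    return remaining[0]

drain(fake_counter, poll_seconds=0)
print("drained")
```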
### Breaking Schema Changes

A schema change is breaking when old serialized data cannot be deserialized into the new type — for example, renaming a field without an alias, changing a field’s type, or removing a required field.
Because the definition hash does not cover data types, the engine won’t prevent you from deploying a breaking schema change. Instead, tasks will fail at runtime with a `DeserializationError` (Python), `CODEC_ERROR` (Node.js), or `RuntimeError::Codec` (Rust) when they try to decode a cached result that no longer matches the new type.
Recommended approach: drain before deploying breaking schema changes.
1. Stop starting new instances
2. Let in-flight workflows complete (they still use the old schema)
3. Deploy the new code with the updated types
4. Start new instances — only new data flows through the new schema
For gradual migrations, you can also run old and new workers side by side: old workers continue processing old instances while new workers handle new ones. In distributed mode, workers already skip tasks whose definition hash they don’t recognize, so this works naturally.
## Distributed Workers

When running multiple workers, all workers must agree on the definition hash for workflows they process. During a rolling deployment:
- Workers with the old definition will skip tasks created with the new definition
- Workers with the new definition will skip tasks created with the old definition
This means you can:
- Blue/green deploy — bring up new workers alongside old ones. Old workers drain old instances, new workers handle new instances. Shut down old workers once all old instances complete.
- Drain, then deploy all workers at once — simplest if you can tolerate a brief pause in processing.
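The skip behavior can be sketched as a filter over a shared task queue. The field names and hash values below are hypothetical:

```python
# Tasks in a shared queue are tagged with the definition hash they were
# created under (field name is hypothetical).
OLD_HASH, NEW_HASH = "a1b2c3", "d4e5f6"

tasks = [
    {"id": "t1", "definition_hash": OLD_HASH},
    {"id": "t2", "definition_hash": NEW_HASH},
]

def claimable(queue, worker_hash):
    # A worker only claims tasks whose hash matches its own definition.
    return [t["id"] for t in queue if t["definition_hash"] == worker_hash]

print(claimable(tasks, OLD_HASH))  # ['t1']
print(claimable(tasks, NEW_HASH))  # ['t2']
```

During a blue/green deploy the two hashes partition the queue cleanly, which is why old and new workers can run side by side without stealing each other's work.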
## Summary

| Scenario | Hash changes? | Automatic protection? | Recommended strategy |
|---|---|---|---|
| Bug fix in task logic | No | N/A (safe) | Deploy normally |
| Add `#[serde(default)]` field | No | N/A (safe) | Deploy normally |
| Add/remove/reorder a task | Yes | Engine rejects mismatch | New instance IDs, or drain first |
| Change timeout or retry policy | Yes | Engine rejects mismatch | New instance IDs, or drain first |
| Rename a field without alias | No | No — fails at runtime | Drain first |
| Change a field’s type | No | No — fails at runtime | Drain first |