# Serialization & Migration
## Choosing a Codec

A codec controls how task inputs and outputs are serialized when checkpointed. The codec is set per-backend instance, not per-workflow.
### JsonCodec (default)

Human-readable JSON via serde_json. Types must derive `serde::Serialize` and `serde::Deserialize`.
```rust
use sayiir_runtime::prelude::*;

// JsonCodec is the default — no explicit codec needed
let workflow = workflow! {
    name: "order_pipeline",
    registry: registry,
    steps: [validate, charge, ship]
}.unwrap();
```

### RkyvCodec
Zero-copy binary serialization via rkyv. Types must derive `rkyv::Archive`, `rkyv::Serialize`, and `rkyv::Deserialize`.
```rust
use sayiir_runtime::serialization::RkyvCodec;

let workflow = workflow! {
    name: "order_pipeline",
    codec: RkyvCodec,
    registry: registry,
    steps: [validate, charge, ship]
}.unwrap();
```

### When to Use Which
| Criteria | JsonCodec | RkyvCodec |
|---|---|---|
| Readability | Human-readable, easy to inspect | Binary, requires tooling |
| Performance | Slower, larger payloads | Fast, zero-copy deserialization |
| Ecosystem | Language-agnostic (serde) | Rust-only |
| Debugging | Easy — jq on snapshots | Harder — opaque bytes |
| Best for | Development, mixed-language stacks | High-throughput Rust workloads |
Rule of thumb: Start with JsonCodec (the default). Switch to RkyvCodec when serialization shows up in profiling.
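Before switching codecs, a rough back-of-envelope check is to measure how much JSON a typical task payload actually produces. A minimal sketch (the payload shape is invented for illustration):

```python
import json

# Rough profiling proxy: how large is a typical checkpointed payload as JSON?
payload = {"order_id": "o-1", "amount": 9.99, "items": ["sku-1"] * 100}
encoded = json.dumps(payload)
print(len(encoded), "bytes per checkpointed payload")
```

If payloads are small and infrequent, the readability of JsonCodec usually outweighs RkyvCodec's speed.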
## Making Types Serializable

Every type that flows through a task (inputs and outputs) must be serializable by the codec you choose.
With JsonCodec (serde):

```rust
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize)]
struct OrderInput {
    order_id: String,
    amount: f64,
    /// New field — old snapshots without it will use the default
    #[serde(default)]
    currency: String,
}
```

With RkyvCodec (rkyv):

```rust
use rkyv::{Archive, Serialize, Deserialize};

#[derive(Archive, Serialize, Deserialize)]
struct OrderInput {
    order_id: String,
    amount: f64,
}
```

Supporting both codecs:

```rust
use serde::{Serialize as SerdeSerialize, Deserialize as SerdeDeserialize};
use rkyv::{Archive, Serialize as RkyvSerialize, Deserialize as RkyvDeserialize};

#[derive(
    SerdeSerialize, SerdeDeserialize,
    Archive, RkyvSerialize, RkyvDeserialize,
)]
struct OrderInput {
    order_id: String,
    amount: f64,
}
```

For Python, any type that `json.dumps` / `json.loads` can handle works (dicts, lists, strings, numbers, booleans, None). For Node.js, anything that survives `JSON.stringify` / `JSON.parse`.
## Definition Hash — What Breaks Resumability

When a workflow is built, Sayiir computes a SHA-256 definition hash from the workflow’s structural shape. This hash is stored with every snapshot. On resume, the engine compares the current hash to the snapshot’s hash — if they differ, the workflow cannot resume and a `DefinitionMismatch` error is raised.
### What’s Hashed

The hash covers the structure of the workflow:
- Task IDs and their sequential order
- Timeout values
- Retry policies (max retries, initial delay, backoff multiplier)
- Task `version` strings (from metadata)
- Fork IDs and branch structure
- Branch IDs, keys, and default handlers
- Loop IDs, max iterations, and on_max policy
- Delay durations and IDs
- Signal names, IDs, and timeouts
### What’s NOT Hashed

- Task implementations (the function body)
- The codec used
- The backend used
- Other task metadata (tags, descriptions, display names)
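The split between hashed and non-hashed fields can be pictured with a toy hash over the structural shape. This is an illustrative sketch only, not Sayiir's actual hashing scheme:

```python
import hashlib
import json

def definition_hash(shape: dict) -> str:
    # Canonicalize the structural fields and hash them. Anything left out
    # of `shape` (descriptions, tags, the function body) cannot move the hash.
    canonical = json.dumps(shape, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

base    = {"tasks": [{"id": "validate", "timeout": 30}, {"id": "charge", "timeout": 30}]}
renamed = {"tasks": [{"id": "validate_v2", "timeout": 30}, {"id": "charge", "timeout": 30}]}
retimed = {"tasks": [{"id": "validate", "timeout": 60}, {"id": "charge", "timeout": 30}]}

print(definition_hash(base) != definition_hash(renamed))  # True
print(definition_hash(base) != definition_hash(retimed))  # True
```

Renaming a task ID or changing a timeout lands in the hashed structure, so either change produces a new hash and blocks resumption; editing a description would not.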
### Breaking vs Non-Breaking Changes

| Change | Hash Impact | Safe for In-Flight? |
|---|---|---|
| Change task logic (same signature) | No change | Yes |
| Swap backend (InMemory → Postgres) | No change | Yes |
| Switch codec (Json → Rkyv) | No change | Yes* |
| Update task tags or descriptions | No change | Yes |
| Add a new task to the pipeline | Changes | No |
| Remove a task | Changes | No |
| Reorder tasks | Changes | No |
| Change a timeout value | Changes | No |
| Change retry policy | Changes | No |
| Add/remove a fork branch | Changes | No |
| Change loop max_iterations or on_max | Changes | No |
| Rename a task ID | Changes | No |
| Change a signal name | Changes | No |
| Bump task `version` | Changes | No |
* Switching codecs is hash-safe but existing serialized bytes must still be decodable by the new codec. In practice, switching codecs on a live system with in-flight data requires a drain first.
## Task Version — Opting In to Schema Change Detection

The definition hash does not cover task input/output types. If you change the shape of a task’s data types, the hash stays the same — and in-flight workflows may fail at runtime when they try to deserialize old cached results with the new type.
To make the engine detect these schema changes, set the version field in your task metadata. The version string is included in the definition hash, so bumping it forces a new workflow version and prevents in-flight workflows from resuming with the old schema.
Python:

```python
from sayiir import task, TaskMetadata

@task(metadata=TaskMetadata(version="2.0"))
def process_order(order: dict) -> dict:
    return {"status": "processed", "total": order["total"]}
```

Node.js:

```typescript
import { task } from "sayiir";

const processOrder = task("process_order", (order) => {
  return { status: "processed", total: order.total };
}, { version: "2.0" });
```

Rust:

```rust
use sayiir_core::task::TaskMetadata;

let workflow = WorkflowBuilder::new(ctx)
    .then("process_order", |order: Order| async move {
        Ok(ProcessedOrder { status: "processed".into(), total: order.total })
    })
    .with_metadata(TaskMetadata {
        version: Some("2.0".into()),
        ..Default::default()
    })
    .build()?;
```

When you change a task’s input or output type in a breaking way, bump the version string. The engine will reject resuming in-flight workflows that were started with the old version, and you can drain them before deploying.
### The Error

When a hash mismatch occurs:

```
Workflow definition mismatch: expected hash 'a1b2c3...', found 'd4e5f6...'
```

In Rust this is `BuildError::DefinitionMismatch` (at build time) or `WorkflowError::DefinitionMismatch` (at runtime). Python and Node.js raise equivalent exceptions.
## Schema Evolution for Task Inputs/Outputs

Even when the workflow structure doesn’t change (same definition hash), the data types flowing through tasks can evolve. Here’s how to do it safely with JsonCodec / serde.
### Adding Fields

Use `#[serde(default)]` so existing snapshots without the new field deserialize correctly:

```rust
#[derive(Serialize, Deserialize)]
struct OrderInput {
    order_id: String,
    amount: f64,
    #[serde(default)]
    priority: Option<String>, // new field — old data deserializes as None
}
```

In Python, this happens naturally with `dict.get("priority")`.
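The same tolerance can be sketched in plain Python by defaulting missing keys when decoding an old snapshot (the payload and field names here are hypothetical):

```python
import json

# Old snapshot payload, written before `priority` existed.
old_payload = '{"order_id": "o-1", "amount": 9.99}'

def decode_order(raw: str) -> dict:
    data = json.loads(raw)
    # Equivalent of #[serde(default)]: a missing field falls back to a default.
    data.setdefault("priority", None)
    return data

order = decode_order(old_payload)
print(order["priority"])  # None
```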
### Removing Fields

Don’t delete the field while in-flight workflows may still carry it. Instead, stop writing it:

```rust
#[derive(Serialize, Deserialize)]
struct OrderInput {
    order_id: String,
    amount: f64,
    #[serde(skip_serializing, default)]
    legacy_field: Option<String>, // still readable, no longer written
}
```

Once all in-flight workflows have drained, you can remove the field entirely.
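A plain-Python sketch of "still readable, no longer written" (the payload and field names are hypothetical):

```python
import json

# Old snapshot written before the field was deprecated.
old_snapshot = '{"order_id": "o-1", "amount": 9.99, "legacy_field": "x"}'

def decode(raw: str) -> dict:
    data = json.loads(raw)
    data.setdefault("legacy_field", None)  # default when the field is absent
    return data

def encode(data: dict) -> str:
    # Mirror of #[serde(skip_serializing)]: never write the deprecated field.
    return json.dumps({k: v for k, v in data.items() if k != "legacy_field"})

order = decode(old_snapshot)
print("legacy_field" in order)                      # True: still readable
print("legacy_field" in json.loads(encode(order)))  # False: no longer written
```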
### Renaming Fields

Use `#[serde(rename)]` or `#[serde(alias)]` for backward compatibility:

```rust
#[derive(Serialize, Deserialize)]
struct OrderInput {
    #[serde(alias = "order_id")]
    id: String, // renamed from order_id — old snapshots still work
    amount: f64,
}
```

### Enum Variants
Section titled “Enum Variants”- Adding a variant is safe — existing data won’t contain it.
- Removing a variant breaks deserialization for in-flight workflows that stored it.
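The same rule can be sketched with a plain string enum in Python (the variant names are invented):

```python
# Hypothetical set of variants for an order-status enum.
KNOWN_STATUSES = {"pending", "paid", "shipped"}

def decode_status(value: str) -> str:
    # Adding a new variant to KNOWN_STATUSES is safe: old snapshots never
    # contain it. Removing one is breaking: an in-flight workflow may have
    # checkpointed exactly that value.
    if value not in KNOWN_STATUSES:
        raise ValueError(f"unknown status variant: {value!r}")
    return value

print(decode_status("paid"))
```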
## Migration Strategy

Sayiir separates two concerns when deploying changes:
- Structural changes (adding/removing tasks, changing timeouts, reordering steps) — automatically gated by the definition hash. The engine rejects mismatches; you cannot accidentally resume an old workflow with a new structure.
- Schema changes (modifying the shape of task inputs or outputs) — your responsibility. The definition hash does not cover data types, so the engine won’t catch a schema mismatch until deserialization fails at runtime.
The recommended approach for both is version-pinning with draining: pin running workflow instances to their original definition, let them finish, and deploy the new version for new instances.
### Safe Deployments (No Hash Change)

These changes can be deployed at any time, even with in-flight workflows:
- Bug fixes in task logic — the function body isn’t hashed
- Infrastructure swaps — changing backends, scaling workers
- Metadata updates — tags, descriptions
- Backward-compatible schema changes — see Schema Evolution above
Just deploy as usual. In-flight workflows resume normally.
### Structural Changes (Hash Changes)

When you change the workflow structure (add/remove tasks, change timeouts, reorder steps, etc.), the definition hash changes. In-flight workflows cannot resume with the new definition — the engine enforces this automatically.
**Simplest approach: use a new instance ID.** If your application controls instance IDs, the easiest migration path is to start new workflow instances with fresh IDs under the new definition. Old instances tied to the old definition will either drain naturally or can be cancelled. This avoids any coordination between old and new workers.

```
# Old definition — instances "order-100", "order-101" still running
# New definition — new instances get "order-102", "order-103", ...
# Old instances drain on their own; no conflict.
```

**Drain-and-restart approach:** If you need the same instance IDs to carry over:
1. Stop starting new workflow instances with the old definition
2. Wait for in-flight workflows to complete (drain)
3. Deploy the new workflow definition
4. Start new instances with the new definition
```mermaid
graph LR
  A[Stop new submissions] --> B[Drain in-flight]
  B --> C[Deploy new definition]
  C --> D[Resume traffic]
```
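The drain step is just a poll loop over however your deployment exposes the number of running instances. In the sketch below, `in_flight_counter` is a hypothetical callable, not a Sayiir API:

```python
import time

def drain(in_flight_counter, poll_seconds=5.0, timeout_seconds=3600.0):
    """Block until no workflow instances remain in flight.

    `in_flight_counter` is a stand-in; wire it to whatever your backend
    exposes (a database query, a metrics endpoint, ...).
    """
    deadline = time.monotonic() + timeout_seconds
    while in_flight_counter() > 0:
        if time.monotonic() > deadline:
            raise TimeoutError("in-flight workflows did not drain in time")
        time.sleep(poll_seconds)

# Simulated usage: a counter that reaches zero after a couple of polls.
remaining = [2]
def fake_counter():
    remaining[0] = max(0, remaining[0] - 1)
    return remaining[0]

drain(fake_counter, poll_seconds=0)
print("drained")
```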
### Breaking Schema Changes

A schema change is breaking when old serialized data cannot be deserialized into the new type — for example, renaming a field without an alias, changing a field’s type, or removing a required field.
Because the definition hash does not cover data types, the engine won’t prevent you from deploying a breaking schema change. Instead, tasks will fail at runtime with a `DeserializationError` (Python), `CODEC_ERROR` (Node.js), or `RuntimeError::Codec` (Rust) when they try to decode a cached result that no longer matches the new type.
Recommended approach: drain before deploying breaking schema changes.
1. Stop starting new instances
2. Let in-flight workflows complete (they still use the old schema)
3. Deploy the new code with the updated types
4. Start new instances — only new data flows through the new schema
For gradual migrations, you can also run old and new workers side by side: old workers continue processing old instances while new workers handle new ones. In distributed mode, workers already skip tasks whose definition hash they don’t recognize, so this works naturally.
## Distributed Workers

When running multiple workers, all workers must agree on the definition hash for workflows they process. During a rolling deployment:
- Workers with the old definition will skip tasks created with the new definition
- Workers with the new definition will skip tasks created with the old definition
This means you can:
- Blue/green deploy — bring up new workers alongside old ones. Old workers drain old instances, new workers handle new instances. Shut down old workers once all old instances complete.
- Drain, then deploy all workers at once — simplest if you can tolerate a brief pause in processing.
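The skip behavior can be sketched as a filter over a shared task queue. The field names and hash values below are hypothetical:

```python
# Tasks in a shared queue are tagged with the definition hash they were
# created under (field name is hypothetical).
OLD_HASH, NEW_HASH = "a1b2c3", "d4e5f6"

tasks = [
    {"id": "t1", "definition_hash": OLD_HASH},
    {"id": "t2", "definition_hash": NEW_HASH},
]

def claimable(queue, worker_hash):
    # A worker only claims tasks whose hash matches its own definition.
    return [t["id"] for t in queue if t["definition_hash"] == worker_hash]

print(claimable(tasks, OLD_HASH))  # ['t1']
print(claimable(tasks, NEW_HASH))  # ['t2']
```

During a blue/green deploy the two hashes partition the queue cleanly, which is why old and new workers can run side by side without stealing each other's work.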
## Summary

| Scenario | Hash changes? | Automatic protection? | Recommended strategy |
|---|---|---|---|
| Bug fix in task logic | No | N/A (safe) | Deploy normally |
| Add `#[serde(default)]` field | No | N/A (safe) | Deploy normally |
| Add/remove/reorder a task | Yes | Engine rejects mismatch | New instance IDs, or drain first |
| Change timeout or retry policy | Yes | Engine rejects mismatch | New instance IDs, or drain first |
| Rename a field without alias | No | No — fails at runtime | Drain first |
| Change a field’s type | No | No — fails at runtime | Drain first |