Sayiir vs Airflow
Apache Airflow is the most widely adopted workflow orchestration platform, especially in data engineering. It’s mature, feature-rich, and has a massive ecosystem of integrations. Sayiir approaches the same problem from a different angle: embedded workflows in your application code, not a separate orchestration platform.
This page explains when to use each, the key design differences, and how Sayiir’s event-driven model differs from Airflow’s batch-scheduling design.
Design Philosophy: Batch-First vs Event-Driven
Airflow was designed for batch data pipelines. The core mental model is: “Run this DAG every day at 3 AM, process yesterday’s data, write results.” DAGs are defined as Python files that Airflow parses on a schedule. The scheduler decides when to run tasks based on cron expressions, time intervals, and dependencies.
Sayiir is event-driven and code-first. Workflows are defined programmatically at runtime, triggered by application events (user signups, API requests, messages from a queue). The workflow definition is normal async code that executes in response to events, not parsed files evaluated on a schedule.
What This Means in Practice
In Airflow, you define a DAG in a Python file that the scheduler evaluates:
```python
# Airflow DAG file (parsed by scheduler)
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta

def process_data():
    # Process yesterday's data
    pass

def send_report():
    # Email results
    pass

with DAG(
    "daily_report",
    schedule_interval="0 3 * * *",  # Every day at 3 AM
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    task1 = PythonOperator(task_id="process", python_callable=process_data)
    task2 = PythonOperator(task_id="report", python_callable=send_report)
    task1 >> task2
```

In Sayiir, you define a workflow as a function and call it when an event happens:
```python
# Sayiir workflow (triggered by application code)
from sayiir import task, Flow, run_workflow

@task
async def process_data(user_id: int):
    # Process this user's data
    pass

@task
async def send_report(user_id: int):
    # Email this user's results
    pass

workflow = Flow("user_report").then(process_data).then(send_report).build()

# Trigger on user request (app is your web framework's router, e.g. FastAPI)
@app.post("/generate-report")
async def generate_report(user_id: int):
    result = await run_workflow(workflow, user_id)
    return {"status": "complete"}
```

Airflow schedules DAGs. Sayiir embeds workflows in your application.
Infrastructure: Platform vs Library
Airflow is a platform with four core components:
- Scheduler — evaluates DAGs, schedules tasks, monitors execution
- Webserver — UI for monitoring, logs, manual triggers
- Workers — execute tasks (can be distributed via Celery or Kubernetes)
- Metadata database — stores DAG state, task history, configuration
This is a complete orchestration system with a rich UI, extensive plugin ecosystem, and powerful scheduling capabilities. It’s also substantial infrastructure to deploy and operate.
Sayiir is a library. Import it, configure storage, run. No scheduler, no webserver, no separate workers. Your application is the workflow engine.
This means:
- Zero infrastructure — no services to deploy or manage
- Embedded in your app — workflows run in the same process as your application
- Serverless-friendly — works on Lambda, Cloud Run, anywhere
- Test with zero setup — in-memory backend, no Docker needed
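For example, a workflow can be exercised end to end in a plain pytest test. This is a hedged sketch: it assumes the task, Flow, and run_workflow APIs shown on this page, that the default (unconfigured) backend is in-memory, and that run_workflow returns the final task’s output; it also needs the pytest-asyncio plugin for the async test.

```python
# Hypothetical test sketch: runs a Sayiir workflow in-process, no Docker or services.
# Assumptions: default backend is in-memory; run_workflow returns the last task's output.
import pytest
from sayiir import task, Flow, run_workflow

@task
async def double(x: int) -> int:
    # Trivial step: double the input
    return x * 2

@task
async def describe(x: int) -> str:
    # Format the previous step's output
    return f"result={x}"

@pytest.mark.asyncio  # requires pytest-asyncio
async def test_double_then_describe():
    workflow = Flow("double_then_describe").then(double).then(describe).build()
    result = await run_workflow(workflow, 21)
    assert result == "result=42"  # assumes run_workflow returns the final task's output
```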
Airflow’s platform provides a mature UI, scheduling engine, and plugin ecosystem. Sayiir’s library model provides simplicity and portability.
DAG Definition: Static Files vs Runtime Construction
Airflow parses DAG files on a schedule (default: every 30 seconds). Your DAG definition must be importable and evaluable by the scheduler, which introduces constraints:
- DAG files must be valid Python that the scheduler can import
- Dynamic DAGs (generated at runtime) require workarounds (DAG serialization, DAG factories)
- Changing a DAG requires deploying new files to the scheduler
- DAG structure is evaluated at parse time, not run time
Sayiir constructs workflows programmatically at runtime. Your workflow definition is normal code that builds a graph:
```python
# Sayiir: Build workflow dynamically
workflow = Flow("process_user")

if user.is_premium:
    workflow = workflow.then(fetch_premium_data)
else:
    workflow = workflow.then(fetch_basic_data)

workflow = workflow.then(send_email).build()
```

In Airflow, dynamic branching uses specialized operators (BranchPythonOperator, TaskFlow API). In Sayiir, it’s just normal code.
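For comparison, here is a rough sketch of the same branch expressed with Airflow’s BranchPythonOperator; the task bodies and the premium check are placeholders.

```python
# Airflow branching sketch: the branch callable returns the task_id to follow;
# tasks on the other path are skipped.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import BranchPythonOperator, PythonOperator

def choose_branch(**context):
    # Placeholder check; real code might read dag_run.conf or a user table
    is_premium = (context["dag_run"].conf or {}).get("is_premium", False)
    return "fetch_premium_data" if is_premium else "fetch_basic_data"

with DAG(
    "process_user",
    schedule_interval=None,  # triggered manually or via API, not on a cron
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    branch = BranchPythonOperator(task_id="branch", python_callable=choose_branch)
    fetch_premium = PythonOperator(task_id="fetch_premium_data", python_callable=lambda: None)
    fetch_basic = PythonOperator(task_id="fetch_basic_data", python_callable=lambda: None)
    send_email = PythonOperator(
        task_id="send_email",
        python_callable=lambda: None,
        trigger_rule="none_failed_min_one_success",  # run after whichever branch executed
    )
    branch >> [fetch_premium, fetch_basic]
    [fetch_premium, fetch_basic] >> send_email
```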
Data Passing: XCom vs Direct
Airflow uses XCom (cross-communication) to pass data between tasks. XCom stores serialized values in the metadata database. This has limitations:
- Size limits — XCom is backed by the database (typically Postgres). Passing large data requires workarounds (store in S3, pass reference).
- Serialization — data must be JSON-serializable by default (unless you use a custom XCom backend).
- Boilerplate — tasks must explicitly push/pull from XCom or use the TaskFlow API.
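For reference, a minimal TaskFlow-style sketch of passing a return value between Airflow tasks; the value still travels through XCom in the metadata database, so the size and serialization caveats above apply. The DAG name and payload are illustrative.

```python
# Airflow TaskFlow sketch: return values are passed via XCom (metadata database),
# so they should stay small and JSON-serializable unless a custom XCom backend is used.
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule_interval=None, start_date=datetime(2024, 1, 1), catchup=False)
def xcom_pipeline():
    @task
    def fetch_data() -> dict:
        return {"small": "metadata"}  # stored as an XCom row

    @task
    def process_data(data: dict) -> str:
        return f"Processed {len(data)} keys"

    process_data(fetch_data())  # dependency + XCom hand-off

xcom_pipeline()
```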
Sayiir passes data directly between tasks via checkpoints. After a task completes, its output is serialized and stored. The next task receives it as input. No size limits (other than your storage backend), no boilerplate.
```python
# Sayiir: Direct data passing
@task
async def fetch_data() -> dict:
    return {"large": "data"}

@task
async def process_data(data: dict) -> str:
    return f"Processed {len(data)} keys"

workflow = Flow("pipeline").then(fetch_data).then(process_data).build()
```

Airflow’s XCom is designed for metadata, not large data. Sayiir’s checkpoints handle any serializable data.
Real-Time vs Batch
Airflow was built for scheduled batch jobs. The scheduler evaluates DAGs on intervals and triggers runs based on cron expressions. You can trigger DAGs via API, but the core model is time-based scheduling.
Sayiir is event-driven. Workflows run in response to application events (HTTP requests, queue messages, database changes). There’s no scheduler evaluating files. Your code calls run_workflow() when something happens.
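For example, a workflow can be triggered from a queue consumer just as easily as from the HTTP handler shown earlier. This is a hedged sketch: the queue client and message shape are assumptions, and only task, Flow, and run_workflow come from this page.

```python
# Sketch: triggering a Sayiir workflow from queue messages instead of a scheduler.
# The queue object (async-iterable of dicts) and message shape are hypothetical.
from sayiir import task, Flow, run_workflow

@task
async def process_order(order_id: int):
    ...  # charge, reserve inventory, etc.

@task
async def notify_customer(order_id: int):
    ...  # send confirmation

workflow = Flow("order_processing").then(process_order).then(notify_customer).build()

async def consume(queue):
    # Each message is an event; the workflow runs when the event arrives,
    # not when a scheduler decides it is time.
    async for message in queue:
        await run_workflow(workflow, message["order_id"])
```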
If your workloads are:
- Batch-oriented — process all yesterday’s transactions, generate daily reports, hourly ETL → Airflow
- Event-driven — user signup flow, order processing, webhook handlers → Sayiir
Both can do both, but the design center is different.
UI and Observability
Airflow ships with a comprehensive web UI:
- DAG visualization
- Task logs
- Historical run data
- Manual triggers and retries
- Gantt charts, calendar view, code editor
This is a massive value-add for teams that need operational visibility without building custom tooling.
Sayiir’s open-source core has no built-in UI. You can use your existing APM tooling, structured logging, and metrics exporters. Sayiir Server (coming soon) adds a web dashboard with real-time workflow monitoring, execution history, and observability — closing this gap for teams that need operational visibility without building custom tooling.
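As one way to lean on existing tooling, the sketch below wraps workflow runs with standard-library logging so they show up in whatever log pipeline you already have. The run_and_log helper is illustrative, not part of Sayiir’s API; only run_workflow comes from this page.

```python
# Sketch: observability via existing tooling rather than a built-in UI.
# Standard library only, plus the run_workflow call shown earlier on this page.
import logging
import time

from sayiir import run_workflow

logger = logging.getLogger("workflows")

async def run_and_log(workflow, payload, *, name: str):
    start = time.monotonic()
    try:
        result = await run_workflow(workflow, payload)
        logger.info("workflow finished",
                    extra={"workflow": name, "status": "ok",
                           "duration_s": time.monotonic() - start})
        return result
    except Exception:
        logger.exception("workflow failed",
                         extra={"workflow": name, "status": "error",
                                "duration_s": time.monotonic() - start})
        raise
```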
For teams that already have observability infrastructure, Sayiir integrates naturally. For teams starting fresh, Sayiir Server will provide a comparable experience.
Plugin Ecosystem
Airflow has hundreds of integrations (operators, sensors, hooks) for AWS, GCP, Azure, databases, data warehouses, and orchestration tools. If you’re building data pipelines, the Airflow ecosystem likely has a pre-built operator for your use case.
Sayiir is a general-purpose workflow library. No pre-built integrations. You write tasks using normal async Python or Rust. This means flexibility but requires more integration code.
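In practice, an “integration” is just a task that calls the service’s client library or HTTP API. Below is a hedged sketch using httpx (any async HTTP client works); the endpoint and payload are placeholders.

```python
# Sketch: hand-rolled integration as a plain async task (no pre-built operator).
# httpx is an ordinary third-party HTTP client; the URL and payload are placeholders.
import httpx

from sayiir import task

@task
async def post_to_webhook(payload: dict) -> int:
    # Roughly what an Airflow HTTP operator would do, written by hand.
    async with httpx.AsyncClient(timeout=10.0) as client:
        response = await client.post("https://example.com/webhook", json=payload)
        response.raise_for_status()
        return response.status_code
```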
For data engineering teams, Airflow’s ecosystem is a major advantage. For application developers, Sayiir’s simplicity is a better fit.
When to Choose Airflow
Choose Airflow if you need:
- Batch-oriented data pipelines — scheduled ETL, daily reports, hourly aggregations
- Rich web UI — monitoring, logs, manual interventions, operational visibility
- Mature ecosystem — hundreds of pre-built integrations for data tools
- Proven scheduler — cron expressions, intervals, complex dependencies
- Large community — extensive documentation, active Slack, many examples
Airflow is the de facto standard for data orchestration. If you’re building data pipelines, it’s a safe choice.
When to Choose Sayiir
Choose Sayiir if you need:
- Event-driven workflows — user signups, order processing, webhook handlers
- Embedded in your application — workflows are part of your service, not a separate platform
- No infrastructure — library, not platform. Import and run.
- Real-time responsiveness — workflows triggered by application events, not a scheduler
- Code-first — workflows defined programmatically at runtime, not static DAG files
Sayiir is the simplest path to durable workflows in application code, especially for event-driven use cases.
Summary
Airflow is a mature, feature-rich orchestration platform with a massive ecosystem and comprehensive UI. It’s the right choice for batch-oriented data pipelines, especially for data engineering teams.
Sayiir is a library for embedding durable workflows in your application. It’s the right choice for event-driven use cases where you want durability without deploying a separate orchestration platform.
Both are open source (Airflow: Apache 2.0, Sayiir: MIT). Both solve workflow orchestration. The difference is design center: batch-first platform vs event-driven library.
If your workloads are scheduled batch jobs, choose Airflow. If your workloads are event-driven application logic, choose Sayiir.
See also: Sayiir vs Temporal and Comparison Overview