Salt is a Workflow/Task Scheduler framework that does two things:
- Helps reduce dependency-management complexity by providing tools to package your workflow into executables along with their dependencies;
- Provides a framework that can be used to define distributed DAGs.

You can write DAGs and tasks in a supported language, define your dependencies, and package everything together using Salt.
In most workflow managers (e.g. Airflow), dependency management can get extremely complex.
Salt simplifies dependency management while still providing a DAG/Task scheduling framework. Your workflow is self-contained with its dependencies and needs nothing but the Salt framework to execute.
Salt development requires the pdm Python package manager to be installed.
Install the project in editable mode:

```shell
pip install -e .[dev]
```
DAGs and Tasks are defined in user packages using the Salt framework.
The Salt framework exposes Python bindings that let users build tasks and DAGs.
The tasks and DAGs are built into a single-file package using Nuitka and then simply executed by Salt.
The scheduling logic is completely detached from the task/DAG logic and is configured separately through the Web UI.
When Salt executes the compiled tasks and DAGs, the underlying framework automatically communicates with the Salt backend to register the DAG and link it to its configuration, so it can be scheduled by the scheduler.
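As a rough sketch of what defining a workflow could look like (the `salt.task` decorator and `salt.DAG` context manager below are hypothetical names, not the confirmed Salt API):

```python
# Hypothetical sketch of a user package defining a Salt workflow.
# `salt.task` and `salt.DAG` are illustrative assumptions, not the real API.
import salt

@salt.task
def extract() -> list[int]:
    return [1, 2, 3]

@salt.task
def transform(rows: list[int]) -> list[int]:
    return [r * 2 for r in rows]

# Declare the DAG: `transform` depends on the output of `extract`.
with salt.DAG("example_workflow") as dag:
    rows = extract()
    transform(rows)
```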
To register a DAG module, the user publishes it using Salt commands.
TODO: the publishing process is yet to be defined.
- Allow cycles and loops in DAGs.
- Add validation features for data computed by tasks, not just task success/failure.
- Allow DAGs to change their shape at run-time. Each task is pushed only at the moment of execution, and never pre-parsed.
  - This allows DAGs to change depending on data computed during execution (see the sketch after this list).
- Provide a solid data-driven & event-driven integration approach.
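Since these are planned features, any example is purely speculative; run-time shaping might eventually look something like this (none of these names are a committed API):

```python
# Purely illustrative: a DAG whose shape depends on run-time data.
# `salt.task` / `salt.DAG` and loop-based fan-out are assumptions.
import salt

@salt.task
def list_partitions() -> list[str]:
    return ["2024-01", "2024-02", "2024-03"]

@salt.task
def process(partition: str) -> None:
    print(f"processing {partition}")

with salt.DAG("dynamic_workflow") as dag:
    # Because tasks are pushed only at execution time and never pre-parsed,
    # the fan-out below can depend on what list_partitions() returned.
    for partition in list_partitions():
        process(partition)
```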
Developer
[Dev's Code: graph.py / graph.cs]
→ The developer writes code in Python or C# using the Salt framework to define Workflows and Tasks.
↓
Workflow Build
[Salt Build Tool - Python] → Builds PEX/.exe/Docker/.zip
[Salt Build Tool - C#] → TODO
↓
Workflow Registry
[Workflow Registry + Metadata DB]
→ The user registers the workflow using the Salt Workflow-Registry command.
→ The workflow registry stores workflow metadata in the Redis backend and the binary in S3 storage.
↓
Scheduling
[Scheduler]
→ An event or data change triggers workflow or task runs.
→ Queues the task (task type + binary ref + input data). Queuing happens by publishing ready-to-be-picked tasks to a table (e.g. a Redis backend, or any resource that can be locked to avoid race conditions). Workers lock a task and execute it, finally storing the returned data in the backend database so the scheduler can access it.
Q: How do inputs work for the first queued task?
↓
Workflow Pickup & Execution
[Generic Worker Fleet (K8s / Celery)]
→ Locks and picks a task from the Tasks table.
→ Pulls the task binary from the task resource. Caching is vital here so the binary is not pulled every time.
→ Runs it (e.g., ./task.bin --input <args-id>). Args are stored in a backend resource such as Redis; the Task framework automatically pulls these and passes them to the code.
Q: An XCOM-like approach? There are a lot of problems with serialization, especially with custom types.
→ Reports status/output.
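A minimal sketch of the lock-and-execute step above, assuming a Redis backend (the key names `salt:ready`, `salt:lock:*`, `salt:task:*`, `salt:result:*` and the worker id are all illustrative, not Salt's actual layout):

```python
# Minimal sketch of a worker locking and executing a queued task,
# assuming a Redis backend. All key names are illustrative.
import subprocess

import redis

r = redis.Redis()

def try_run_one_task() -> bool:
    # The "table" of ready-to-be-picked tasks is modeled as a Redis set.
    for raw_id in r.smembers("salt:ready"):
        task_id = raw_id.decode()
        # SET NX acts as the lock: only one worker can claim a given task.
        if not r.set(f"salt:lock:{task_id}", "worker-1", nx=True, ex=3600):
            continue
        # The args id would come from the task's metadata row.
        args_id = r.hget(f"salt:task:{task_id}", "args_id").decode()
        # Run the pre-built task binary, passing the args id as input.
        result = subprocess.run(
            ["./task.bin", "--input", args_id], capture_output=True
        )
        # Report status/output so the scheduler can pick it up.
        r.hset(f"salt:result:{task_id}", mapping={
            "status": result.returncode,
            "output": result.stdout,
        })
        r.srem("salt:ready", task_id)
        return True
    return False
```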
```shell
pip install salt
salt build <project_path>
```
The previous command outputs a main.pex file built from your Python workflow wheel.
This file is a standalone executable which bakes in all the dependencies needed to execute your workflow.
The build process happens in a Linux Docker container to reduce platform-compatibility issues. Right now Salt only supports running workflows on Linux, but in the future it could support multiple build systems and platforms.
Of course, the executable must be built for the same platform used by the workflow/task worker.
To register a Python Workflow and start scheduling it:

```shell
pip install salt
salt register-workflow <pyproject_path>
```

The project must already have been built through salt build.
To generate the gRPC server code:

```shell
pip install salt
salt generate-server-code ./src/salt/backend/workflow_service/grpc/protos ./src
```
Alternatively, use the SDK scripts in ./scripts/sdk.
Note: the Protobuf Python codegen relies on the protos folder structure to generate Python imports.
Therefore, it's important to keep a mirrored sub-folder tree inside /grpc/protos, so that the folder tree is used to build the imports in the generated packages.
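For instance, a workflow.proto that should be imported as salt.server.workflow_pb2 must live at the mirrored path inside the protos root (the layout below is illustrative):

```text
grpc/protos/
└── salt/
    └── server/
        └── workflow.proto
```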
e.g. in workflow_pb2_grpc we then get:

```python
from salt.server import workflow_pb2 as salt_dot_server_dot_workflow__pb2
```

instead of

```python
import workflow_pb2
```

The workflow registry backend takes care of ingesting, registering, and storing workflow binaries.
Workflow metadata is stored in a registry table using Redis as the backend, and the binaries are stored in an S3 bucket.
Both metadata and binaries can be looked up using the workflow's unique key.
The workflow registry table is then consulted by the Scheduler, which, in conjunction with the scheduling configuration, takes care of executing workflows.
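A rough sketch of what registration could store, assuming redis-py and boto3 (the bucket name, key scheme, and metadata fields are made up for illustration, not Salt's actual layout):

```python
# Rough sketch of workflow registration storage, assuming Redis + S3.
# Bucket name, key scheme, and metadata fields are illustrative.
import boto3
import redis

r = redis.Redis()
s3 = boto3.client("s3")

def register_workflow(key: str, binary_path: str, version: str) -> None:
    # The binary goes to S3 under the workflow's unique key...
    with open(binary_path, "rb") as f:
        s3.put_object(Bucket="salt-workflows", Key=f"{key}/main.pex", Body=f)
    # ...and the metadata row goes to the Redis registry table, same key.
    r.hset(f"salt:workflow:{key}", mapping={
        "version": version,
        "binary": f"s3://salt-workflows/{key}/main.pex",
    })
```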
Scheduling must be configured through a salt.yaml file placed in the workflow folder.
The salt.yaml is evaluated and pushed to the Workflow Registry on workflow registration (the salt register-workflow command).
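The salt.yaml schema is not documented here; as a purely hypothetical sketch, a scheduling configuration might look like this (every field name below is an assumption, not a confirmed schema):

```yaml
# Hypothetical salt.yaml: all field names are assumptions.
workflow: example_workflow
schedule:
  cron: "0 6 * * *"  # e.g. run daily at 06:00
retries: 3
```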