
ginwakeup/Salt


Salt


Salt is a Workflow/Task Scheduler framework that does two things:

  1. Helps reduce the burden of dependency management by providing tools to package your workflow into executables along with their dependencies;
  2. Provides a framework for defining distributed DAGs.

You can write DAGs and tasks in a supported language, define your dependencies, and package everything together using Salt.

Why Salt

In most workflow managers (e.g., Airflow), dependency management can become extremely complex.

Salt simplifies dependency management while still providing a DAG/task scheduling framework. Each workflow is self-contained with its dependencies and needs nothing but the Salt framework to execute.

Getting Started

Salt development requires the pdm Python package manager to be installed.

Install the project in editable mode:

pip install -e ".[dev]"

Core Concepts

DAGs and Tasks

DAGs and Tasks are defined in user packages using the Salt framework.

The Salt framework exposes Python bindings that let users build tasks and DAGs. Tasks and DAGs are built into a single-file package using Nuitka and then executed by Salt.

The scheduling logic is completely decoupled from the task/DAG logic and is configured separately through the Web UI.

When Salt executes the compiled tasks and DAGs, the underlying framework automatically communicates with the Salt backend to register the DAG and link it to its configuration, so that the scheduler can schedule it.
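The following is a minimal sketch of what a DAG definition and its registration-on-execution step could look like. `Dag`, `Dag.task`, `InMemoryBackend` and `run_dag` are hypothetical names, not Salt's actual bindings; an in-memory dictionary stands in for the Salt backend, and the standard library's `graphlib` provides the dependency ordering.

```python
from graphlib import TopologicalSorter
from typing import Callable


class InMemoryBackend:
    """Hypothetical stand-in for the Salt backend: maps DAG names to task graphs."""

    def __init__(self) -> None:
        self.registered: dict[str, dict[str, set[str]]] = {}

    def register(self, dag_name: str, graph: dict[str, set[str]]) -> None:
        # The real backend would also link the DAG to its scheduling configuration.
        self.registered[dag_name] = graph


class Dag:
    """Hypothetical DAG container; not Salt's real API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tasks: dict[str, Callable[[], object]] = {}
        self.deps: dict[str, set[str]] = {}

    def task(self, *upstream: str):
        # Decorator that records a task and the names of the tasks it depends on.
        def decorator(fn: Callable[[], object]) -> Callable[[], object]:
            self.tasks[fn.__name__] = fn
            self.deps[fn.__name__] = set(upstream)
            return fn
        return decorator


def run_dag(dag: Dag, backend: InMemoryBackend) -> list[str]:
    # Register first, mirroring "the framework communicates with the backend on execution".
    backend.register(dag.name, dag.deps)
    order = list(TopologicalSorter(dag.deps).static_order())
    for name in order:
        dag.tasks[name]()
    return order


dag = Dag("example")


@dag.task()
def extract():
    return "raw"


@dag.task("extract")
def transform():
    return "clean"


@dag.task("transform")
def load():
    return "done"


backend = InMemoryBackend()
print(run_dag(dag, backend))  # ['extract', 'transform', 'load']
```

Keeping the graph as plain data (task name to upstream names) is what would let the framework ship it to a backend without shipping the task code itself.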

To register a DAG module, the user publishes it using Salt commands.

TODO: The publishing process is yet to be defined.

Feature Considerations

  • Allow cycles and loops in DAGs
  • Add validation features for data computed by tasks, not just task success/failure.
  • Allow DAGs to change their shape at run-time. Each Task is pushed only at the moment of execution, and never pre-parsed.
    • This allows DAGs to change depending on data computed during execution.
  • Provide a solid data-driven and event-driven integration approach.
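The run-time shape change listed above could work roughly as follows: instead of pre-parsing a graph, each task returns the names of the tasks to push next, so the runner only learns about a task when it is pushed. The sketch also permits loops, guarded by a hypothetical `max_steps` budget. `run_dynamic`, `fetch` and `report` are illustrative names, not part of Salt.

```python
from collections import deque
from typing import Callable

# Each task receives the shared state and returns the names of the next tasks to push.
Task = Callable[[dict], list[str]]


def run_dynamic(tasks: dict[str, Task], start: str, max_steps: int = 100) -> list[str]:
    """Execute tasks as they are pushed; the graph is never known up front."""
    state: dict = {}
    queue = deque([start])
    executed: list[str] = []
    while queue:
        if len(executed) >= max_steps:
            # Loops are allowed, so a budget prevents running forever.
            raise RuntimeError("step budget exhausted")
        name = queue.popleft()
        executed.append(name)
        queue.extend(tasks[name](state))
    return executed


def fetch(state: dict) -> list[str]:
    state["attempts"] = state.get("attempts", 0) + 1
    state["rows"] = 10 if state["attempts"] >= 2 else 0
    # Data computed at run time decides the next step: retry until rows arrive.
    return ["fetch"] if state["rows"] == 0 else ["report"]


def report(state: dict) -> list[str]:
    state["summary"] = f"{state['rows']} rows after {state['attempts']} attempts"
    return []


print(run_dynamic({"fetch": fetch, "report": report}, "fetch"))
# ['fetch', 'fetch', 'report']
```

Here `fetch` loops back to itself once because its first attempt returns no rows, which is exactly the kind of data-dependent cycle a pre-parsed DAG cannot express.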

Architecture Notes

Developer [Dev's Code: graph.py / graph.cs]
  • The developer writes code in Python or C# using the Salt framework to define workflows and tasks.

Workflow Build
  • [Salt Build Tool - Python] Builds PEX/.exe/Docker/.zip.
  • [Salt Build Tool - C#] TODO

Workflow Registry [Workflow Registry + Metadata DB]
  • The user registers the workflow using the Salt Workflow-Registry command.
  • The workflow registry stores workflow metadata in a Redis backend and the binary in S3 storage.

Scheduling [Scheduler]
  • An event or data change triggers workflow or task runs.
  • The scheduler queues the task (task type + binary ref + input data) by publishing ready-to-be-picked tasks to a table (e.g., a Redis backend or any resource that can be locked to avoid race conditions).
  • Workers lock a task and execute it, then store the returned data in the backend database so the scheduler can access it.
  • Open question: how are inputs provided to the first queued task?

Workflow Pickup & Execution / Workers [Generic Worker Fleet (K8s / Celery)]
  • Locks and picks a task from the Tasks page.
  • Pulls the task binary from the task resource. Caching is vital here so the binary is not pulled every time.
  • Runs it (e.g., ./task.bin --input <args-id>). Args are stored in a backend resource such as Redis; the task framework automatically pulls them and passes them to the code.
  • Open question: an XCom-like approach? Serialization causes many problems, especially with custom types.
  • Reports status/output.
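The queue-and-worker flow above can be sketched in a single process. Here a dict guarded by `threading.Lock` stands in for the lockable task table (Redis in the notes), a lock-protected dict stands in for the worker's binary cache, and a Python callable stands in for running `./task.bin --input <args-id>`. `TaskTable`, `fetch_binary`, `worker` and the JSON args store are hypothetical names chosen for illustration; the JSON round trip also hints at the serialization concern, since only JSON-compatible inputs survive it.

```python
import json
import threading
from typing import Callable, Optional

ARGS_STORE: dict[str, str] = {"args-1": json.dumps({"n": 3})}  # stands in for Redis
FETCHES: list[str] = []  # records real (non-cached) binary downloads
_binary_cache: dict[str, Callable[[dict], object]] = {}
_cache_lock = threading.Lock()


class TaskTable:
    """Hypothetical lockable task table: each queued task is claimed exactly once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.queued: dict[str, dict] = {}
        self.results: dict[str, object] = {}

    def publish(self, task_id: str, binary_ref: str, args_id: str) -> None:
        with self._lock:
            self.queued[task_id] = {"binary_ref": binary_ref, "args_id": args_id}

    def claim(self) -> Optional[tuple[str, dict]]:
        # Popping under the lock prevents two workers from taking the same task.
        with self._lock:
            if not self.queued:
                return None
            task_id = next(iter(self.queued))
            return task_id, self.queued.pop(task_id)

    def store_result(self, task_id: str, result: object) -> None:
        with self._lock:
            self.results[task_id] = result


def fetch_binary(binary_ref: str) -> Callable[[dict], object]:
    # A real worker would download the binary (e.g. from S3); here it is cached per ref.
    with _cache_lock:
        if binary_ref not in _binary_cache:
            FETCHES.append(binary_ref)
            _binary_cache[binary_ref] = lambda args: args["n"] ** 2
        return _binary_cache[binary_ref]


def worker(table: TaskTable) -> None:
    while (claimed := table.claim()) is not None:
        task_id, task = claimed
        run = fetch_binary(task["binary_ref"])
        # The task framework resolves inputs by id; only JSON-safe values survive.
        args = json.loads(ARGS_STORE[task["args_id"]])
        table.store_result(task_id, run(args))


table = TaskTable()
for i in range(4):
    table.publish(f"task-{i}", "square.bin", "args-1")
threads = [threading.Thread(target=worker, args=(table,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(table.results.items()), FETCHES)
# [('task-0', 9), ('task-1', 9), ('task-2', 9), ('task-3', 9)] ['square.bin']
```

Two workers share four tasks, yet the binary is fetched once: the claim lock gives each task a single owner, and the cache lock keeps concurrent cache misses from triggering duplicate downloads.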

Build a Python Workflow

pip install salt
salt build <project_path>

The previous command outputs a main.pex file built from your Python workflow wheel. This file is a standalone executable that bakes in all the dependencies needed to execute your workflow.

The build runs in a Linux Docker container to reduce platform compatibility issues. Right now Salt only supports running workflows on Linux, but in the future it could support multiple build systems and platforms.

Of course, the executable must be built for the same platform as the one used by the workflow/task worker.
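Because a built executable only runs on the platform it was built for, a worker could refuse mismatched binaries before running them. The sketch below assumes, hypothetically, that the build step records a platform tag such as `linux-x86_64` alongside the binary; `platform_tag` and `check_compatible` are illustrative names, not Salt functions. It relies only on the standard `platform` module.

```python
import platform
from typing import Optional


def platform_tag(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Build a tag like 'linux-x86_64'; defaults to the current machine."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    # Normalize a common alias so 'amd64' and 'x86_64' compare equal.
    if machine == "amd64":
        machine = "x86_64"
    return f"{system}-{machine}"


def check_compatible(built_tag: str, worker_tag: Optional[str] = None) -> None:
    """Raise before execution if the binary targets a different platform."""
    worker_tag = worker_tag or platform_tag()
    if built_tag != worker_tag:
        raise RuntimeError(f"binary built for {built_tag}, worker is {worker_tag}")


print(platform_tag("Linux", "x86_64"))  # linux-x86_64
check_compatible("linux-x86_64", platform_tag("Linux", "AMD64"))  # passes silently
```

Failing fast on the worker gives a clear error instead of an opaque "exec format error" when the binary is launched.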

Register a Python Workflow

To register a Python workflow and start scheduling it:

pip install salt
salt register-workflow <pyproject_path>

The project must already have been built with salt build.

Generate Server gRPC Code

pip install salt

salt generate-server-code ./src/salt/backend/workflow_service/grpc/protos ./src

Alternatively, use the SDK scripts in ./scripts/sdk.

Note: Protobuf's Python codegen derives the generated Python imports from the protos folder structure. It is therefore important to keep a mirrored sub-folder tree inside /grpc/protos, so that this folder tree is used to build the imports in the generated packages. For example, in workflow_pb2_grpc we then get:

from salt.server import workflow_pb2 as salt_dot_server_dot_workflow__pb2

instead of

import workflow_pb2
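The generated import depends only on the .proto file's path relative to the protos root, which is why the folder tree matters. The helper below reproduces, to our understanding, the naming rule used by protoc's Python generator: slashes become dots, `_pb2` is appended, and the alias doubles underscores before replacing dots with `_dot_`. `generated_import` is a hypothetical helper written for illustration, not part of Salt or protoc.

```python
def generated_import(proto_path: str) -> str:
    """Return the import line protoc's Python codegen emits for a proto dependency."""
    # 'salt/server/workflow.proto' -> module 'salt.server.workflow_pb2'
    stem = proto_path.removesuffix(".proto")
    module = stem.replace("-", "_").replace("/", ".") + "_pb2"
    # Alias: escape underscores first, then replace dots, to avoid name collisions.
    alias = module.replace("_", "__").replace(".", "_dot_")
    package, _, name = module.rpartition(".")
    if package:
        return f"from {package} import {name} as {alias}"
    return f"import {name} as {alias}"


print(generated_import("salt/server/workflow.proto"))
# from salt.server import workflow_pb2 as salt_dot_server_dot_workflow__pb2
print(generated_import("workflow.proto"))
# import workflow_pb2 as workflow__pb2
```

A proto kept flat at the root yields the bare top-level import, which only resolves if the generated file's directory is on `sys.path`; mirroring `salt/server/` yields a package-qualified import instead.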

Workflow Registry

The workflow registry backend takes care of ingesting, registering and storing workflow binaries.

The workflow metadata is stored in a registry table that uses Redis as its backend, and the binaries are stored in an S3 bucket. Both metadata and binaries can be looked up by the workflow's unique key.

The Scheduler then consults the workflow registry table and, together with the scheduling configuration, takes care of executing workflows.
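A minimal sketch of the registry's storage layout, assuming (the README does not say how keys are derived) that the unique key is a SHA-256 digest of the binary. Two in-memory dicts stand in for the Redis metadata table and the S3 bucket; with real services both halves would be written under the same key. `WorkflowRegistry` and its methods are hypothetical.

```python
import hashlib
import json


class WorkflowRegistry:
    """Hypothetical registry: metadata and binary share one unique key."""

    def __init__(self) -> None:
        self.metadata_table: dict[str, str] = {}  # stands in for Redis
        self.bucket: dict[str, bytes] = {}        # stands in for S3

    def register(self, name: str, version: str, binary: bytes) -> str:
        # Assumption: a content hash makes the key unique and registration idempotent.
        key = hashlib.sha256(binary).hexdigest()
        self.bucket[key] = binary
        self.metadata_table[key] = json.dumps({"name": name, "version": version})
        return key

    def lookup(self, key: str) -> tuple[dict, bytes]:
        # The scheduler can resolve both metadata and binary from the same key.
        return json.loads(self.metadata_table[key]), self.bucket[key]


registry = WorkflowRegistry()
key = registry.register("etl", "0.1.0", b"fake-binary-bytes")
meta, blob = registry.lookup(key)
print(meta["name"], len(key))  # etl 64
```

A content-derived key means re-registering an identical binary maps to the same entry rather than creating a duplicate.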

Scheduler

Scheduling must be configured through a salt.yaml file placed in the workflow folder.

The salt.yaml file is evaluated and pushed to the Workflow Registry during workflow registration (the salt register-workflow command).
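The README does not document the salt.yaml schema. Purely as an illustration, the sketch below assumes the file has already been parsed into a dict (for example with a YAML library) and uses hypothetical `cron` or `events` keys; it shows the "evaluate, then push on registration" step with a plain dict standing in for the Workflow Registry. `evaluate_schedule` and `register_workflow` are illustrative names.

```python
def evaluate_schedule(config: dict) -> dict:
    """Validate a parsed salt.yaml (hypothetical schema) and return a normalized copy."""
    cron = config.get("cron")
    events = config.get("events", [])
    if cron is None and not events:
        raise ValueError("salt.yaml must define 'cron' or 'events'")
    if cron is not None and len(cron.split()) != 5:
        # Standard cron expressions have five fields: minute hour day month weekday.
        raise ValueError(f"invalid cron expression: {cron!r}")
    return {"cron": cron, "events": list(events)}


def register_workflow(registry: dict, workflow_key: str, config: dict) -> None:
    # Evaluation happens before anything is pushed, so a bad file fails registration.
    registry[workflow_key] = {"schedule": evaluate_schedule(config)}


registry: dict = {}
register_workflow(registry, "etl", {"cron": "0 3 * * *"})
print(registry["etl"])
# {'schedule': {'cron': '0 3 * * *', 'events': []}}
```

Validating at registration time keeps malformed schedules out of the registry, so the Scheduler only ever reads configurations that have already been checked.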
