Regardless of the Data Service type, Ascend also stores metadata outside the component tables to help with orchestration. The two most important pieces of that metadata are the Data SHA (D-SHA), calculated per Ascend Partition, and the Component SHA (C-SHA). These SHAs enable Ascend to take a declarative approach to orchestration. Ascend schedules internal tasks by tracking changes in the D-SHA and C-SHA and propagating those changes downstream until every component reaches an up-to-date (UTD) state.
At a high level, Ascend Orchestration Automation can be simplified into the following processes:
- If Ascend detects a change in one or more input partitions, it reprocesses the necessary input partitions and updates the output partitions' SHAs; and/or
- If Ascend detects a change in the component config, it reprocesses the whole component and updates the output partitions' SHAs.
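The two rules above can be sketched in a few lines of Python. This is only an illustration of SHA-based change detection, not Ascend's implementation: `ComponentState`, `partitions_to_reprocess`, and `mark_up_to_date` are hypothetical names, and SHA-256 over string contents is an assumed stand-in for however Ascend actually computes the D-SHA and C-SHA.

```python
import hashlib
from dataclasses import dataclass, field


def sha(data: str) -> str:
    """Hash a string; an assumed stand-in for Ascend's SHA calculation."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


@dataclass
class ComponentState:
    """Hypothetical record of what a component saw on its last run."""
    config_sha: str = ""                             # plays the role of the C-SHA
    input_shas: dict = field(default_factory=dict)   # partition id -> D-SHA


def partitions_to_reprocess(state: ComponentState, config: str, inputs: dict) -> list:
    """Return the sorted ids of the input partitions that need reprocessing.

    `inputs` maps a partition id to that partition's content.
    """
    # Rule 2: a config change invalidates the whole component.
    if sha(config) != state.config_sha:
        return sorted(inputs)
    # Rule 1: otherwise only partitions whose D-SHA changed are reprocessed.
    return sorted(
        pid for pid, content in inputs.items()
        if state.input_shas.get(pid) != sha(content)
    )


def mark_up_to_date(state: ComponentState, config: str, inputs: dict) -> None:
    """Record the current SHAs after a successful run."""
    state.config_sha = sha(config)
    state.input_shas = {pid: sha(content) for pid, content in inputs.items()}
```

In this sketch, a component that has just run reports nothing to reprocess; editing one partition's content selects only that partition, while editing the config string selects every partition.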
The declarative orchestration approach enables Ascend users to:
- Pause/unpause pipelines at any point
- Recover pipelines from the point of failure while avoiding unnecessary reprocessing of the whole pipeline and maintaining data integrity
- Make changes to a pipeline that is up to date or running while avoiding unnecessary processing
- Stop propagation of an Ascend Partition without requiring user interaction if a component outputs the same dataset mid-pipeline
A good example of the power of declarative orchestration is an incremental pipeline in which an arbitrary historical partition is modified in the data source. In such a scenario, Ascend triggers reprocessing of only the modified partition, propagating it through the pipeline until it reaches the end or until a component completes without changing its corresponding output partition. In complex pipelines, this can save users substantial compute resources because unaffected partitions and downstream components are never recomputed.
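The early-stop behavior described above can be illustrated with a small Python sketch. It assumes a strictly linear pipeline and uses SHA-256 over string contents as a stand-in for the D-SHA; the `propagate` function and its parameters are hypothetical and do not correspond to any Ascend API.

```python
import hashlib


def sha(data: str) -> str:
    """Hash a string; an assumed stand-in for a partition's D-SHA."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def propagate(partition: str, transforms: list, stored_shas: list) -> int:
    """Push one modified partition through a linear pipeline.

    `transforms` holds one str -> str function per component.
    `stored_shas` holds each component's previous output SHA for this
    partition and is updated in place.
    Returns the number of components that recomputed the partition.
    """
    ran = 0
    data = partition
    for i, transform in enumerate(transforms):
        data = transform(data)
        ran += 1
        new_sha = sha(data)
        if new_sha == stored_shas[i]:
            # Output is unchanged, so everything downstream is still up to date.
            break
        stored_shas[i] = new_sha
    return ran
```

For instance, with the components `[str.strip, str.lower]`, changing a source partition from `"Hello"` to `" Hello "` recomputes only the first component: its output is still `"Hello"`, so the change never reaches the second one.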