Understand component types and states used in Ascend dataflows

A Component represents an individual data set: it maintains its schema, its output data set, and statistics about that output. Components can be chained together to make an Ascend Dataflow.

Component Types

Below is a breakdown of the different component types available in a Dataflow. Every pipeline starts with a Read Connector, whose data can be transformed any number of times and optionally terminates in a Write Connector or a Data Feed.

| Component Type | Description |
| --- | --- |
| Read Connectors | Read Connectors connect to and synchronize data coming into a Dataflow. Ascend supports many out-of-the-box connectors, as well as a Custom Read Connectors framework for writing your own. |
| Transforms | Transforms manipulate data. They can be written in SQL, PySpark, Scala, or Java. |
| Write Connectors | Write Connectors output data from a Dataflow to an external system and automatically keep the data set synchronized. |
| Data Feeds | Data Feeds connect multiple Dataflows together. They also expose "data as an API" to external systems such as Jupyter, Tableau, and Zeppelin. |
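The pipeline shape described above can be sketched in plain Python. Note that these classes are illustrative only and are not the Ascend SDK; the names (`Component`, `chain`) are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Illustrative sketch only -- NOT the Ascend SDK. It models the pipeline
# shape described above: a Read Connector feeding any number of
# Transforms, optionally terminating in a Write Connector or Data Feed.

@dataclass
class Component:
    name: str
    kind: str  # "read_connector", "transform", "write_connector", "data_feed"
    upstream: list = field(default_factory=list)

def chain(*components: Component) -> Component:
    """Wire components into a linear pipeline; return the terminal one."""
    for up, down in zip(components, components[1:]):
        down.upstream.append(up)
    return components[-1]

pipeline = chain(
    Component("raw_events", "read_connector"),
    Component("clean_events", "transform"),
    Component("daily_rollup", "transform"),
    Component("warehouse_sink", "write_connector"),
)
print(pipeline.upstream[0].name)  # -> daily_rollup
```

Each component here keeps a reference to its upstream components, which is what lets a Dataflow determine processing order and propagate state changes downstream.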

Component Tables for Read Connectors and Transforms

Each Ascend Read Connector or Transform has an underlying component table where Ascend stores the data set displayed in the component's Records tab. The data store in which the component table lives is determined by the Data Service Type.

Component States

Each component has a state associated with it, and these states change as the Dataflow is processed. Below is a breakdown of all possible component states.

| Icon | State Name | Description |
| --- | --- | --- |
| UpToDate | Up to Date | This component has been computed in its entirety, and there is currently no activity upstream that will cause it to change. |
| Running | Running | At least one task is actively computing data. |
| Error | Error | An error occurred while computing data. |
| WaitingAnalyze | Pending Analysis | Upstream components are currently processing, and this component cannot determine whether it needs to process until they finish. |
| WaitingUpstream | Waiting on Upstream | Computation cannot start until upstream components have been computed. |
| Ready | Ready to Run | The component is ready to start computation but is either waiting for cluster resources to become available or in a backoff period from a previous error. |
| Blocked | Blocked by Upstream | An upstream component has an error, and this component's computation is blocked until that error is resolved. |
| WaitingUpstream | Waiting for Cluster Capacity | The computation cannot begin while the cluster is busy and cannot provide the necessary computational resources to the component. |
| WaitingUpstream | Waiting to Update | A data store maintenance operation (typically triggered by a component rename or configuration change) is queued and awaiting execution. |
| Running | Updating Metadata | A data store maintenance operation (typically triggered by a component rename or configuration change) is currently in progress. |
| WaitingUpstream | Waiting to Sweep | The component is done processing and awaiting a final data sweep to remove any outdated data partitions. |
| Running | Sweeping | The component is done processing and currently performing a final data sweep to remove any outdated data partitions. |

Component Lifecycle State Diagram

The diagram in Figure 1 below describes the behavior of the system.

Figure 1: Component lifecycle state diagram