Scheduled Triggers
Overview
The Scheduled Trigger component is an Ascend utility for orchestrating and initiating data processes on a pre-set schedule. Each time it runs, it creates a new partition stamped with the current UTC timestamp. It acts as a declarative element at the beginning of a data flow, much like a read connector, but its sole purpose is to trigger other components based on time.
Unique Timestamp and Partition-Level SHA
Each execution of the Scheduled Trigger component generates a UTC timestamp and a unique partition-level SHA (Secure Hash Algorithm) digest. The timestamp is more than a piece of data: by providing a fresh value on every run, together with a new partition SHA, the component guarantees there is always a unique piece of data available to downstream processes. The partition-level SHA matters because in Ascend it is the change or creation of a partition SHA, not merely the presence of new data, that prompts subsequent components in the data flow to activate.
Creating a Scheduled Trigger
Scheduled Triggers are built on Ascend's Read Connector framework, so the creation process mirrors that of a Read Connector. First, create a Connection by selecting "Scheduled Trigger" from the Connections Catalog. Once you've created the connection, follow the same steps you would to create a Read Connector.
Keep in mind that no data is actually ingested. Instead, each run generates the current timestamp, which acts as the trigger for running transforms and pipelines.
Working with Scheduled Triggers
Scheduled Triggers work with all Transform component types, on all Data Planes, and in a variety of ways. However, some Transform types and use cases need special attention.
PySpark Transforms
When you're working in PySpark, a Scheduled Trigger can initiate the execution of PySpark code at a predetermined time. This is particularly useful for automating data flows that require timely updates or processing.
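As a minimal sketch, assuming Ascend's PySpark transform interface of a transform(spark_session, inputs) function and assuming the trigger's timestamp column is named "timestamp" (check your trigger's actual schema), a transform downstream of a Scheduled Trigger might look like this:

```python
from typing import List

from pyspark.sql import DataFrame, SparkSession
import pyspark.sql.functions as F


def transform(spark_session: SparkSession, inputs: List[DataFrame]) -> DataFrame:
    # inputs[0] is the Scheduled Trigger's partition -- a single row
    # carrying the run's UTC timestamp. The column name "timestamp"
    # is an assumption; verify it against the trigger's schema.
    trigger = inputs[0]

    # Stamp the processing time onto the output so downstream
    # components can see exactly which run produced each row.
    return trigger.withColumn("processed_at", F.current_timestamp())
```

Because the trigger writes a new partition on every run, this transform re-executes on the trigger's schedule even though no external data was ingested.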
SQL Transforms
SQL Transforms in Ascend require every upstream component to appear in the query. If you are not using the trigger's timestamp in your query, you can work around this constraint by adding the trigger's name as a comment within the SQL, which is enough for the transform to recognize its connection to the Scheduled Trigger. This sidesteps the issue of a transform failing to register an input change when the input isn't referenced in the query logic, as in the sketch below.
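For instance, with hypothetical component names, a transform that queries an orders component but should still re-run whenever the trigger fires could reference the trigger in a comment:

```sql
-- daily_trigger
-- Naming the Scheduled Trigger in a comment (above) establishes the
-- dependency even though its timestamp isn't used in the query itself.
SELECT
  customer_id,
  SUM(amount) AS total_amount
FROM orders
GROUP BY customer_id
```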
Triggering a Dataflow at a Specific Time
The Scheduled Trigger excels in scenarios where a data flow needs to commence at a precise moment. For example, if some parts of a data flow need to run twice a day while others run only once, you can create multiple Scheduled Triggers, each with its own schedule, and link each one to the appropriate transforms.
Utilizing the SDK Call
You can also invoke Scheduled Triggers through the Ascend SDK by calling refresh_read_connector. This is essential when a trigger needs to fire not on a fixed schedule but on demand, opening up a broader range of use cases such as kicking off data flows or specific processes in response to external events or conditions.
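A minimal sketch of an on-demand refresh with the Ascend Python SDK follows. The import path, hostname, component IDs, and the exact refresh_read_connector parameter names here are assumptions; consult the SDK reference for your version.

```python
from ascend.sdk.client import Client

# Hypothetical host and component IDs -- substitute your own.
client = Client(hostname="myenv.ascend.io")

# Force the Scheduled Trigger to run now, outside its schedule.
# The refresh creates a new partition with a fresh UTC timestamp,
# which in turn activates everything downstream of the trigger.
client.refresh_read_connector(
    data_service_id="my_data_service",
    dataflow_id="my_dataflow",
    id="my_scheduled_trigger",
)
```

A typical pattern is to call this from an external scheduler or event handler, so a file landing, a webhook, or a completed upstream job can kick off the dataflow without waiting for the next scheduled run.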
In summary, the Scheduled Trigger component is a powerful and flexible tool for orchestrating data processes on Ascend. Its ability to generate new partitions with unique UTC timestamps ensures consistent triggering of downstream components, making it an invaluable asset in complex data workflows.