Data Quality is currently available on Snowflake, Databricks, and BigQuery Data Planes.
Data Quality checks are assertions about the dataset associated with a component. They measure objective properties of the data such as completeness, accuracy, and consistency. Ascend provides built-in Standard and Custom Data Quality Checks using SQL queries. Once the data is ingested or transformed, the quality checks run before Ascend marks the component as UP TO DATE.
There are two types of Data Quality Checks in Ascend:
- A Standard Check is predefined. Standard Checks include NOT NULL, NOT IN, and UNIQUE.
- A Custom Check is a custom SQL query the user writes that returns failing records.
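To make the Custom Check concept concrete, here is a minimal sketch of the underlying idea: a SQL query whose result set is the failing records, so an empty result means the check passes. The table, columns, and business rule are hypothetical, and sqlite3 stands in for your Data Plane's warehouse.

```python
import sqlite3

# Hypothetical example data: a "rides" table with a fare_amount column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (ride_id INTEGER, fare_amount REAL)")
conn.executemany(
    "INSERT INTO rides VALUES (?, ?)",
    [(1, 12.5), (2, -3.0), (3, 0.0), (4, 25.0)],
)

# A Custom Check is a SQL query that returns the failing records.
# The (assumed) business rule here: fares must be positive.
custom_check_sql = "SELECT * FROM rides WHERE fare_amount <= 0"

failing_rows = conn.execute(custom_check_sql).fetchall()
print(failing_rows)  # -> [(2, -3.0), (3, 0.0)]: rows returned, so the check fails
```

Any rows returned count against the check's threshold, as described in the threshold section below.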
Data Quality Checks at the Component & Partition Level
Each Data Quality check can run at the component or partition level. Component-level checks run SQL queries against the entire dataset for the component. Partition-level checks run SQL queries against only the newly added partition whenever one is added to the dataset. This makes partition-level checks more efficient for incremental data pipelines, because the same queries are not repeatedly run against the entire dataset.
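The difference in scope can be sketched as follows, assuming a hypothetical table partitioned by a date column (this is an illustration of the scoping idea, not how Ascend identifies partitions internally):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_date TEXT, user_id INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [
    ("2024-01-01", 1), ("2024-01-01", None),  # older, already-checked partition
    ("2024-01-02", 3), ("2024-01-02", 4),     # newly added partition
])

# Component-level: scan the entire dataset on every run.
component_sql = "SELECT COUNT(*) FROM events WHERE user_id IS NULL"

# Partition-level: scan only the newly added partition.
new_partition = "2024-01-02"
partition_sql = (
    "SELECT COUNT(*) FROM events WHERE user_id IS NULL AND event_date = ?"
)

whole_count = conn.execute(component_sql).fetchone()[0]
new_only_count = conn.execute(partition_sql, (new_partition,)).fetchone()[0]
print(whole_count, new_only_count)  # -> 1 0
```

The partition-level query touches far less data on each incremental run, which is the efficiency gain described above.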
For Standard Checks, the Ascend UI sets the Default Run On to By Partition; for Custom Checks, it sets the Default Run On to By Component. These settings are configurable in the UI, API, and SDK; if the test level is not configured, the default is partition-level. For a comprehensive overview of how partitioning works in Ascend, see Data Partitioning.
Data Quality Checks Threshold
For each Data Quality check, the failure effect can be a warning or an error for the component, and the thresholds for the failure effects are configurable. The current UI default failure effect is warning with a threshold of 0. This means a check fails if any rows are returned, for either a Standard Check or a Custom Check, and the component shows a warning status. Currently, failure thresholds for the warning and error behaviors of checks can only be accessed and set through the Ascend SDK or API.
Example: for a Standard Check of NOT NULL, you can set a threshold of 10 and the failure effect to warning. Ascend counts how many rows have null values. If only a single row containing null values is found, no error or warning is displayed, because the count does not exceed the threshold of 10.
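The threshold semantics described above can be sketched in a few lines. This is a simplified model of the behavior, not Ascend's implementation; the function name and signature are invented for illustration.

```python
# Simplified model of threshold semantics: a check only surfaces its
# failure effect when the failing-row count exceeds the threshold.
def evaluate_check(failing_row_count: int, threshold: int = 0,
                   failure_effect: str = "warning") -> str:
    """Return the resulting status for a quality check."""
    if failing_row_count > threshold:
        return failure_effect  # "warning" or "error"
    return "passed"

# UI default: threshold 0, effect "warning" -> any failing row warns.
print(evaluate_check(1))                # -> warning
# NOT NULL check with threshold 10: one null row stays under the threshold.
print(evaluate_check(1, threshold=10))  # -> passed
```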
How to Set Standard Checks
Standard Checks allow you to set specific checks for each column, with optional defaults for "run on" and "failure behavior." This gives you greater control over prioritizing data integrity by column. When configuring a Read Connector or Transform, complete the following steps before selecting CREATE.
Step 1: Set the Default Run On and Default Failure Behavior (Optional)
The Default Run On sets the default validation target for every column check (either Each Partition or the Whole Component). The Default Failure Behavior sets the failure behavior for every column check (either Return Warning, Raise Error, or Do Nothing).
Step 2: Select columns for running Standard Checks
Select + Add Standard Check to add a new standard check row. Using the generated schema, identify the columns to run quality checks on. Insert the column name into the new row.
Step 3: Set the quality check parameters
Standard Checks allow Ascend to construct SQL queries using SELECT COUNT(*). If a query returns a count outside the specified number for a NOT NULL, NOT IN, or UNIQUE check, the component displays your selected default failure behavior.
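The shapes of the COUNT(*) queries that the three Standard Checks imply can be sketched as below. The table and values are hypothetical, sqlite3 stands in for the warehouse, and the exact SQL Ascend generates may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (vendor_id TEXT, trip_id INTEGER)")
conn.executemany("INSERT INTO trips VALUES (?, ?)",
                 [("A", 1), ("B", 2), (None, 3), ("Z", 2)])

# Illustrative COUNT(*) queries counting rows that VIOLATE each check.
not_null_sql = "SELECT COUNT(*) FROM trips WHERE vendor_id IS NULL"
not_in_sql   = "SELECT COUNT(*) FROM trips WHERE vendor_id IN ('Z')"
unique_sql   = ("SELECT COUNT(*) FROM (SELECT trip_id FROM trips "
                "GROUP BY trip_id HAVING COUNT(*) > 1)")

violation_counts = [conn.execute(q).fetchone()[0]
                    for q in (not_null_sql, not_in_sql, unique_sql)]
print(violation_counts)  # -> [1, 1, 1]: a non-zero count means the check fails
```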
The Run On and Failure Behavior columns allow you to override the previously set defaults for specific column checks.
Once your Standard Check parameters are set, continue the configuration for your Read Connector or Transform.
How to Set Custom Checks
Custom Checks let you create custom SQL queries to run against the data, giving you greater control than Standard Checks. When configuring a Read Connector or Transform, complete the following steps before selecting CREATE.
Step 1: Set the Default Run On and Default Failure Behavior
The Default Run On sets the default validation target for every column check (either Each Partition or Whole Component). The Default Failure Behavior sets the failure behavior for every column check (either Return Warning, Raise Error, or Do Nothing).
Step 2: Custom SQL
Create and insert your custom SQL query. Once your Custom Check query is set, continue the configuration for your Read Connector or Transform.
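A Custom Check can express rules that per-column Standard Checks cannot, such as referential integrity across tables. Below is a hedged sketch of one such query; the tables and values are hypothetical and sqlite3 stands in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders    (order_id INTEGER, customer_id INTEGER);
CREATE TABLE customers (customer_id INTEGER);
INSERT INTO customers VALUES (1), (2);
INSERT INTO orders VALUES (100, 1), (101, 2), (102, 99);
""")

# Custom Check: every order must reference an existing customer.
# The query returns the failing records (orders with no matching customer).
custom_check_sql = """
SELECT o.* FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL
"""
failing = conn.execute(custom_check_sql).fetchall()
print(failing)  # -> [(102, 99)]: order 102 references a missing customer
```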
Reviewing Data Quality Check Errors and Warnings
Errors and warnings for Data Quality checks appear on the component itself and within the component view. Ascend also writes the failed rows to a separate table.
Component Errors and Warnings
The first location is on the component itself within the graph. For any component, Return Warning will log the failure and allow the component to continue. Raise Error will log the failure and stop the component from running.
A description of the warning also appears in the details pane within the component view. First, the Data Quality tab will indicate if a check failed for the partition or component.
The error is also detailed in the Errors and Warnings tab.
Data Quality Checks Table
For any Data Plane, both Standard Checks and Custom Checks store their data quality test queries and results in a new table. The table appears in the same database as the original data table, under a name derived from the original table's name. For example, in Snowflake, an ingested table named YELLOW_CAB has a companion table containing its quality test results.
Data Quality Checks through Ascend SDK
Data Quality Checks are available through the SDK. Additionally, you can access and set thresholds for the warning and error behaviors of checks.
We're always happy to help with any other questions you might have! Send us an email, ping us on Slack, or chat with us.