Data Shares
Data Shares boost collaboration via compartmentalization & self-service data discovery, enabling subscriptions across Data Planes of different types.
Data Share functionality is only available in Data Plane/Gen 2 environments. Legacy environments will continue to show Data Feeds.
Overview
Data Shares improve upon Data Feeds by enabling data subscription across different Data Planes. In cases where the subscribing Data Plane has the same connection ID and Data Plane type as the publishing Data Plane, the Data Share can be configured to not materialize data, thus behaving like a Data Feed. When importing data across different Data Plane types, a copy of the data is materialized in the subscribing Data Plane.
Terminology
- Data Share: The publishing end of a Data Share Connection.
- Data Share Connector: The subscribing end of a Data Share Connection.
Materialized vs. Non-Materialized Data Shares
Materialized and non-materialized Data Shares are two ways to share data between publishing and subscribing Data Services. The choice affects both how data is processed and which access controls must be in place.
Materialized Data Shares
A materialized Data Share Connector makes a copy of the input data in the subscribing Data Service. This setting is supported for any combination of Data Plane types spanned by the Data Share Connection. The copy is made by running a Spark job for each partition of the Data Share's output data.
Non-Materialized Data Shares
A non-materialized Data Share Connector does not make a copy of the input data, instead reading it directly off the Data Plane of the publishing Data Share when processing jobs for downstream components. Only Data Share Connections within the same Data Plane type support this mode.
Configuring Data Share Materialization
Data Share materialization can only be configured when the Data Share Connector is created. Once the Connector has been created, the materialization setting cannot be changed.
When creating a Data Share Connector from the UI, the setting will default to non-materialized if both Data Share and Data Share Connector are in the same Data Plane, and to materialized otherwise.
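The default described above can be sketched as a small decision function. This is an illustrative model only, not Ascend's implementation: the `DataPlane` class, its fields, and the example plane type strings are hypothetical, and "same Data Plane" is assumed to mean the same connection ID and the same Data Plane type, as described in the Overview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPlane:
    # Hypothetical stand-in for a Data Plane; not an Ascend API type.
    connection_id: str
    plane_type: str  # illustrative values only, e.g. "spark" or "snowflake"


def default_materialized(publisher: DataPlane, subscriber: DataPlane) -> bool:
    """Return the assumed UI default: True means materialized."""
    # Assumption: "same Data Plane" = same connection ID and same type.
    same_plane = (
        publisher.connection_id == subscriber.connection_id
        and publisher.plane_type == subscriber.plane_type
    )
    return not same_plane


spark_a = DataPlane("conn-1", "spark")
spark_b = DataPlane("conn-2", "spark")
warehouse = DataPlane("conn-3", "snowflake")

print(default_materialized(spark_a, spark_a))    # False
print(default_materialized(spark_a, spark_b))    # True
print(default_materialized(spark_a, warehouse))  # True
```

Because the setting cannot be changed after creation, it is worth checking the default against the intended behavior before creating the Connector.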
For non-materialized Data Share Connectors spanning different Data Planes, the creator of the Data Share Connector is responsible for ensuring that the Data Share Connector's credentials have read access to the Data Share's Data Plane.
Scheduled Data Shares
Scheduled Data Shares are an advanced feature for managing data flow between data pipelines, intended primarily for time-based production pipelines. They subscribe to data in the same way as their continuous counterparts.
Materialization and Scheduled Data Shares
Scheduled Data Share Connectors can only be created for materialized Data Shares. They behave similarly to a Read Connector, capturing the latest available upstream data at predetermined time intervals. They are considered up-to-date until a new period begins or a refresh is manually triggered.
Limitations of Scheduled Data Shares
Scheduled Data Shares have some limitations. In particular, these connectors do not automatically reprocess when an upstream change occurs, whether it is a configuration or a schema modification. To propagate such changes, the downstream Data Share Connector must be refreshed manually.
For example, if an upstream Transform is updated with new code, the Data Share subscriber will not update automatically and must be refreshed manually.
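The refresh behavior described in this section can be modeled with a toy simulation. This is a minimal sketch under stated assumptions, not how the platform is implemented: the `ScheduledConnectorSketch` class, its method names, and the use of integer timestamps and version numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScheduledConnectorSketch:
    # Hypothetical model of a scheduled Data Share Connector.
    period_seconds: int
    snapshot_version: Optional[int] = None  # upstream version last captured
    period_index: Optional[int] = None      # period of the last capture

    def observe(self, now: int, upstream_version: int) -> Optional[int]:
        """Capture upstream only when a new period has begun."""
        current = now // self.period_seconds
        if self.period_index is None or current != self.period_index:
            self.period_index = current
            self.snapshot_version = upstream_version
        # Mid-period upstream changes are ignored until the next period.
        return self.snapshot_version

    def manual_refresh(self, upstream_version: int) -> int:
        """Capture upstream immediately, regardless of the schedule."""
        self.snapshot_version = upstream_version
        return self.snapshot_version


connector = ScheduledConnectorSketch(period_seconds=3600)
print(connector.observe(now=0, upstream_version=1))     # 1
print(connector.observe(now=1800, upstream_version=2))  # 1
print(connector.manual_refresh(upstream_version=2))     # 2
print(connector.observe(now=3600, upstream_version=3))  # 3
```

The second call shows the limitation: the upstream moved to version 2 mid-period, but the subscriber keeps version 1 until it is manually refreshed or the next period starts.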
Scheduled Data Shares are well suited to time-based production pipelines, but take the limitations described here into account when adding them to your data architecture.
Data Type Compatibility When Crossing Data Plane Types
When crossing Data Plane types, certain data types of the source Data Plane may not be natively supported in the destination Data Plane. The Ascend platform handles these situations in one of three ways:
- If applicable, it will attempt to convert the data to a supported format.
- Known cases of incompatibility are detected and result in an error before the compute job is run.
- In unhandled or unknown cases of incompatibility, an error will likely occur during the compute job when it attempts to write the output data.
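The three outcomes listed above can be sketched as a lookup over rule tables. This is only an illustration: the plane names, data type names, rule tables, and the `classify` function are hypothetical and do not reflect the platform's actual conversion rules.

```python
from enum import Enum


class Outcome(Enum):
    CONVERT = "convert"                  # converted to a supported format
    FAIL_BEFORE_JOB = "fail_before_job"  # known incompatibility, early error
    ATTEMPT_WRITE = "attempt_write"      # no rule; any error surfaces at write


# Hypothetical rule tables keyed by (source plane, destination plane, type).
CONVERSIONS = {("plane_a", "plane_b", "timestamp_ns"): "timestamp_us"}
KNOWN_INCOMPATIBLE = {("plane_a", "plane_b", "map")}


def classify(source: str, dest: str, dtype: str) -> Outcome:
    """Decide how a column type is handled when crossing plane types."""
    key = (source, dest, dtype)
    if key in CONVERSIONS:
        return Outcome.CONVERT
    if key in KNOWN_INCOMPATIBLE:
        return Outcome.FAIL_BEFORE_JOB
    # Either natively supported or an unknown incompatibility: the job
    # runs, and a failure (if any) appears when writing the output.
    return Outcome.ATTEMPT_WRITE


print(classify("plane_a", "plane_b", "timestamp_ns").value)  # convert
print(classify("plane_a", "plane_b", "map").value)           # fail_before_job
print(classify("plane_a", "plane_b", "string").value)        # attempt_write
```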
Access Control Model
Data Shares are by default accessible to all other Data Services but can be configured to limit access to explicitly selected Data Services only.
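As a rough sketch of this access model, a Data Share can be thought of as carrying an optional allow-list of Data Services. The `can_subscribe` function and the service names below are hypothetical and only illustrate the default-open behavior; they are not part of any Ascend API.

```python
from typing import FrozenSet, Optional


def can_subscribe(
    allowed_services: Optional[FrozenSet[str]], subscriber: str
) -> bool:
    """Return True if the subscriber Data Service may use the Data Share."""
    # Assumption: None models the default, where every Data Service has access.
    if allowed_services is None:
        return True
    # Otherwise only explicitly selected Data Services have access.
    return subscriber in allowed_services


print(can_subscribe(None, "marketing"))                    # True
print(can_subscribe(frozenset({"finance"}), "finance"))    # True
print(can_subscribe(frozenset({"finance"}), "marketing"))  # False
```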
Legacy Data Feeds and Data Shares
When Data Shares are enabled in your environment, you will not be able to create any new Data Feed Publishers. However, existing Data Feed connections will continue to work, and you can still subscribe to existing Data Feeds, which appear alongside available Data Shares as subscription options.