Replication Strategies for Blob Store

Ascend's replication strategies for S3, ADLS, and GCS: manage file versions, retain deleted files, and switch strategies.

🚧

Available in Gen2 environments only.

This feature is available exclusively in Gen2 environments and Ascend Cloud.

For blob store sources, Ascend ingests data by replicating an exact copy of the current source data. With Amazon S3, Azure Data Lake Storage (ADLS), or Google Cloud Storage (GCS) Blob Read Connectors, Ascend can also retain files that have been deleted from the source.

Data Replication Strategies for S3, ADLS, and GCS

For Amazon S3, Azure ADLS, and Google Cloud Storage, Ascend offers three data replication strategies. Each one defines how file versions and deletions are handled.

  1. Default Strategy: Applies to all blob store data sources. For S3, ADLS, and GCS, it is used when the replication strategy field is left empty.

    • Matches the current state of the source.
    • If files are deleted from the source, they're also removed from Ascend.
  2. Keep Latest File Versions:

    • Modified files in the source are re-ingested into Ascend, replacing the previous versions.
    • The last version of each deleted file is retained in Ascend.
  3. Keep All File Versions:

    • Every version of a file from the source is preserved in Ascend.
    • File versions are differentiated using a timestamp in the partition name (e.g., file1.parquet@2023-01-01T12:00:00).
    • All versions of deleted files are also retained.
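The difference between the three strategies can be illustrated with a small simulation. This is only a sketch of the behavior described above, not Ascend's implementation: the function `replicate`, the strategy labels, and the in-memory `snapshots` structure are all hypothetical, and it assumes the version timestamp is the time at which a change was observed.

```python
from datetime import datetime

def replicate(strategy, snapshots):
    """Simulate a replica after observing successive source states.

    snapshots: chronological list of (timestamp, {file_name: content}).
    strategy:  "default", "keep_latest", or "keep_all" (hypothetical labels).
    Returns a dict mapping partition name -> content.
    """
    replica = {}
    previous = {}
    for ts, source in snapshots:
        if strategy == "default":
            # Mirror the source exactly: deleted files disappear.
            replica = dict(source)
        elif strategy == "keep_latest":
            # Overwrite modified files, never drop deleted ones.
            replica.update(source)
        elif strategy == "keep_all":
            # Store every new or changed file under a timestamped name.
            # Assumption: the timestamp is the observation time.
            for name, content in source.items():
                if previous.get(name) != content:
                    replica[f"{name}@{ts.isoformat()}"] = content
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        previous = source
    return replica

# Day 1: two files exist. Day 2: file1 is modified and file2 is deleted.
snapshots = [
    (datetime(2023, 1, 1, 12, 0, 0), {"file1.parquet": "v1", "file2.parquet": "a"}),
    (datetime(2023, 1, 2, 12, 0, 0), {"file1.parquet": "v2"}),
]

for strategy in ("default", "keep_latest", "keep_all"):
    print(strategy, sorted(replicate(strategy, snapshots)))
```

In this sketch, the default strategy ends with only the modified `file1.parquet`, Keep Latest File Versions also retains the deleted `file2.parquet`, and Keep All File Versions holds three timestamped partitions, including both versions of `file1.parquet`.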

Changing Strategies

When you change the replication strategy, Ascend takes a snapshot of the last state. This snapshot becomes your new starting point. To change strategies after a Read Connector has been created and has ingested data, you'll need to create a new Read Connector.

Selecting a Replication Strategy

For Amazon S3, Azure ADLS, and Google Cloud Storage, select either Keep Latest File Versions or Keep All File Versions. If you make no selection, the default strategy is used.

Figure: Locating the data replication setting.

🚧

Not compatible with Object Aggregation

Data replication strategies are not compatible with Object Aggregation: a connector cannot have both features active at the same time. See Object Aggregation Strategies for Blob Store.