Write Connector Partitioning
Write Connectors use the partitioning strategy of their upstream Transform, if they support it, to decide how to optimize their writes. If a Write Connector does not support the partitioning strategy of its upstream Transform, it falls back to a Full Reduction. The native blob storage Write Connectors support all partitioning types. The native Data Warehouse Write Connectors support only the Full Reduction and Timestamp Partitioning methods.
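The fallback rule above can be sketched as a small lookup. This is a minimal illustration, not the product's implementation: the Partitioning enum, the connector kinds "blob_storage" and "data_warehouse", and the function effective_partitioning are hypothetical names, and the table only covers the three partitioning types discussed on this page.

```python
from enum import Enum


class Partitioning(Enum):
    """Hypothetical labels for the upstream partitioning types on this page."""
    FULL_REDUCTION = "full_reduction"
    TIMESTAMP = "timestamp"
    MAPPING = "mapping"


# Hypothetical capability table mirroring the support matrix described above.
SUPPORTED = {
    "blob_storage": {Partitioning.FULL_REDUCTION, Partitioning.TIMESTAMP, Partitioning.MAPPING},
    "data_warehouse": {Partitioning.FULL_REDUCTION, Partitioning.TIMESTAMP},
}


def effective_partitioning(connector_kind: str, upstream: Partitioning) -> Partitioning:
    """Use the upstream strategy if the connector supports it, else fall back to Full Reduction."""
    if upstream in SUPPORTED[connector_kind]:
        return upstream
    return Partitioning.FULL_REDUCTION


# A Data Warehouse connector fed by a Mapping Transform falls back to Full Reduction.
print(effective_partitioning("data_warehouse", Partitioning.MAPPING))
```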
Write Connector from a Full Reduction Transform
With an upstream Full Reduction Transform, the Write Connector generates a single task to write out the data. Any time the upstream partitions change, the full data set is rewritten to the destination.
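A minimal sketch of how a Full Reduction write might be planned, assuming each upstream partition can be identified by a hypothetical ID and a content fingerprint. The function plan_full_reduction is illustrative only; it shows that any change produces one task covering the whole data set, and it assumes that an unchanged input produces no write at all.

```python
from typing import Dict, List


def plan_full_reduction(previous: Dict[str, str], current: Dict[str, str]) -> List[List[str]]:
    """Return the write tasks for an upstream Full Reduction.

    `previous` and `current` map hypothetical partition IDs to content fingerprints.
    Any difference triggers exactly one task that rewrites every current partition.
    """
    if previous == current:
        return []  # nothing changed upstream, so nothing is written
    return [sorted(current)]  # a single task covering the full data set


# Changing only partition "b" still yields one task that rewrites both partitions.
print(plan_full_reduction({"a": "1", "b": "2"}, {"a": "1", "b": "3"}))
```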
Write Connector from a Timestamp Partitioning Transform
With an upstream Timestamp Partitioning Transform, the Write Connector generates one task per partition. This strategy allows the destination to be loaded and updated incrementally. It also allows the Write Connector to create folders whose names contain a date value from the data, a layout commonly used when preparing data for querying in engines such as Athena, Presto, Redshift Spectrum, and Databricks.
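In contrast to a Full Reduction, a timestamp-partitioned write only needs to touch the partitions that changed. The sketch below assumes hypothetical partition keys (such as "2021-03") mapped to content fingerprints; plan_timestamp_partitions is an illustrative name, not a product API. It emits one task per new or changed partition.

```python
from typing import Dict, List


def plan_timestamp_partitions(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return one write task (identified by partition key) per new or changed partition.

    Keys are hypothetical timestamp partition keys, values are content fingerprints.
    Unchanged partitions are skipped, which is what makes the load incremental.
    """
    return sorted(key for key, fp in current.items() if previous.get(key) != fp)


previous = {"2021-01": "a", "2021-02": "b"}
current = {"2021-01": "a", "2021-02": "c", "2021-03": "d"}
print(plan_timestamp_partitions(previous, current))
```

Only "2021-02" (changed) and "2021-03" (new) are written; "2021-01" is left untouched. Handling partitions that disappear upstream is left out of this sketch.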
Maintaining Timestamp Partitions through Mapping Transforms
To take advantage of timestamp partitioning in a Write Connector (for example, to use a date value from a column as part of a folder name), the Write Connector does not need to be directly downstream of the Transform that partitions the data by timestamp. Any number of Mapping Transforms may sit in between, as long as the partitioning column is carried through and the partition structure (each partition containing values for a single timestamp granule) is maintained.
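The condition above can be expressed as a check that every partition still carries the partitioning column and holds rows from exactly one timestamp granule. This is a hedged sketch: rows are modeled as plain dictionaries, monthly granularity is assumed through the hypothetical month_granule helper, and preserves_timestamp_partitions is an illustrative name.

```python
from datetime import datetime
from typing import Callable, Dict, List


def month_granule(ts: datetime) -> str:
    """Hypothetical monthly granule, e.g. '2021-03'."""
    return ts.strftime("%Y-%m")


def preserves_timestamp_partitions(
    partitions: Dict[str, List[dict]],
    column: str,
    granule: Callable[[datetime], str] = month_granule,
) -> bool:
    """Return True if each partition still holds rows from a single timestamp granule."""
    for rows in partitions.values():
        if any(column not in row for row in rows):
            return False  # the partitioning column was not carried through
        if len({granule(row[column]) for row in rows}) > 1:
            return False  # rows from different granules share one partition
    return True


partitions = {
    "p1": [{"event_month": datetime(2021, 1, 5)}, {"event_month": datetime(2021, 1, 20)}],
    "p2": [{"event_month": datetime(2021, 2, 1)}],
}
print(preserves_timestamp_partitions(partitions, "event_month"))
```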
The screenshot below shows an example configuration.
In this configuration, event_month is the name of the upstream column. The date format string follows the pattern syntax of Java's SimpleDateFormat class.
The screenshot below shows the folders and files that the Write Connector created in the sample S3 location.
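To illustrate how a SimpleDateFormat pattern turns a timestamp into a folder name, the sketch below translates a small subset of pattern letters (yyyy, MM, dd, HH, mm, ss) into Python strftime codes. It is an approximation under stated assumptions: quoted literals and all other pattern letters are not supported, and format_simple_date, partition_folder, and the "events" prefix are hypothetical.

```python
from datetime import datetime

# A small, hypothetical subset of Java SimpleDateFormat letters mapped to strftime codes.
# Quoted literals ('...') and all other pattern letters are NOT handled in this sketch.
_SDF_TO_STRFTIME = {
    "yyyy": "%Y",
    "MM": "%m",
    "dd": "%d",
    "HH": "%H",
    "mm": "%M",
    "ss": "%S",
}


def format_simple_date(pattern: str, value: datetime) -> str:
    """Render `value` using the supported subset of SimpleDateFormat pattern letters."""
    out = []
    i = 0
    while i < len(pattern):
        for token, code in _SDF_TO_STRFTIME.items():
            if pattern.startswith(token, i):  # case-sensitive, so MM and mm stay distinct
                out.append(value.strftime(code))
                i += len(token)
                break
        else:
            # Any other character is copied as-is (real SimpleDateFormat would
            # treat unquoted letters as pattern letters).
            out.append(pattern[i])
            i += 1
    return "".join(out)


def partition_folder(prefix: str, pattern: str, value: datetime) -> str:
    """Build a hypothetical output folder path such as 'events/2021/03/05'."""
    return f"{prefix}/{format_simple_date(pattern, value)}"


print(partition_folder("events", "yyyy/MM/dd", datetime(2021, 3, 5)))
```

For example, the pattern yyyy/MM/dd renders March 5, 2021 as 2021/03/05, giving one folder per day of data.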
Write Connector from a Mapping Transform
With an upstream Mapping Transform, the Write Connector maintains the same number of partitions in the destination, except that partitions with 0 records are discarded.
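A minimal sketch of the Mapping case, assuming partitions are modeled as a hypothetical mapping from partition ID to a list of rows: every non-empty upstream partition yields one destination partition, and empty ones are skipped. plan_mapping_partitions is an illustrative name only.

```python
from typing import Dict, List


def plan_mapping_partitions(partitions: Dict[str, List[dict]]) -> List[str]:
    """Keep one destination partition per upstream partition, skipping empty ones."""
    return [pid for pid, rows in partitions.items() if rows]


print(plan_mapping_partitions({"p0": [{"x": 1}], "p1": [], "p2": [{"x": 2}]}))
```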