Partitioning with a Python Connector

Creating partitions, partition trees, and subtrees within a custom Ascend Python Connector.

How Ascend Creates Partitions with Python Connectors

When ingesting data, Ascend places that data inside partitions to optimize the scheduling of processing. Partitions can be arranged into trees and subtrees, allowing Ascend to schedule processing only for unprocessed partitions. Check out Data Partitioning to learn how Ascend uses different partition strategies.

To do this, Ascend first parses the data, looking for objects containing the metadata in yield. For each yield, Ascend checks whether is_prefix is set to True, in which case the object is returned as a partition under a prefix. By default, metadata is None, which places objects in a root-level partition: Ascend obtains all the files and folders and returns the folders as prefixes and the files as non-prefixes within a single partition.
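The folders-as-prefixes, files-as-non-prefixes shape described above can be sketched against a local directory. This is only an illustration under stated assumptions: the `list_objects(context, metadata)` signature, the `context["root"]` key, and the fingerprint format are hypothetical and are not taken from the Ascend API reference. The temporary directory exists only to make the sketch runnable.

```python
import os
import tempfile
from typing import Any, Dict, Iterator, Optional


def list_objects(context: dict, metadata: Optional[dict]) -> Iterator[Dict[str, Any]]:
    """List one level: folders become prefixes, files become leaves."""
    root = context["root"]  # hypothetical context key, an assumption
    # metadata is None at the root; otherwise list under the given prefix.
    prefix = "" if metadata is None else metadata["name"]
    with os.scandir(os.path.join(root, prefix)) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_dir():
                yield {"name": f"{prefix}{entry.name}/", "is_prefix": True}
            else:
                stat = entry.stat()
                # Placeholder fingerprint built from size and modification time.
                yield {
                    "name": f"{prefix}{entry.name}",
                    "is_prefix": False,
                    "fingerprint": f"{stat.st_size}-{stat.st_mtime_ns}",
                }


# Build a tiny directory tree and list its root level.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "2019"))
    with open(os.path.join(root, "notes.txt"), "w") as handle:
        handle.write("hello")
    listing = list(list_objects({"root": root}, None))

for obj in listing:
    print(obj["name"], obj["is_prefix"])
```

Each child name is built as the parent prefix plus the entry name, so the same function can be called again with a folder's yielded object as metadata to list the next level down.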

How to Specify Your Partition Structure

To create partitions, you'll need to specify metadata. Within the Python Read Connector, list_objects by default reads files only as objects in a single partition, because metadata defaults to None. With list_objects, yield, and metadata specified, Ascend can create partitions according to your needs.

Within yield, the is_prefix key lets you set up your partition hierarchy. A value of False means there is no hierarchy, and all the objects are treated as leaves. A value of True means the object is a prefix, which creates a hierarchy with composite partitions.

To create composite partitions, use if metadata is None: and elif metadata["name"] == "prefix/": statements before your yield statements. The first set of yield statements generates the top-level partitions, with the specified prefix names. Next, each elif statement tells Ascend that, if the specified partition prefix exists, the objects yielded in that branch belong within that partition. Any yield statements following an elif statement tell Ascend to create leaf partitions and place objects within the leaf partitions matching the prefix.

## These statements go inside your list_objects function.
## To specify a hierarchy, first create composite partitions and/or standalone partitions.
if metadata is None:
  yield {'name': "2019/", 'is_prefix': True}
  yield {'name': "2020/", 'is_prefix': True}
  yield {'name': "2021/01", 'is_prefix': False, 'fingerprint': "dfgh"}

## Next, add partitions to the composite partition.
elif metadata["name"] == "2019/":
  yield {'name': "2019/01", 'is_prefix': True}
  yield {'name': "2019/02", 'is_prefix': True}

elif metadata["name"] == "2020/":
  yield {'name': "2020/01", 'is_prefix': True}
  yield {'name': "2020/02", 'is_prefix': True, 'fingerprint': "dfgh"}
  yield {'name': "2020/03", 'is_prefix': True}

## You can continue these iterative statements to create as many
## composite partitions and partitions as needed.

To create more partitions, you need multiple yields or an iterative cycle of yields. For every non-prefix (every leaf), we'll call the read function and read the result as a new partition, which shows up in the Partition ID of the component.
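An iterative cycle of yields can be sketched with plain loops, paired with a small driver that imitates how the tree is walked. The `walk` driver is purely hypothetical and is not Ascend code; it only illustrates that prefixes are listed again while leaves are handed on to the read step. The `list_objects(context, metadata)` signature, the `context` argument, and the fingerprint strings are also assumptions made for this sketch.

```python
from typing import Any, Dict, Iterator, List, Optional

YEARS = ["2019", "2020"]
MONTHS = ["01", "02", "03"]


def list_objects(context: dict, metadata: Optional[dict]) -> Iterator[Dict[str, Any]]:
    if metadata is None:
        # Top level: one composite partition (prefix) per year.
        for year in YEARS:
            yield {"name": f"{year}/", "is_prefix": True}
    else:
        # Second level: one leaf per month under the requested year prefix.
        # Each child name is the parent name plus a suffix.
        parent = metadata["name"]
        for month in MONTHS:
            name = f"{parent}{month}"
            yield {"name": name, "is_prefix": False, "fingerprint": f"fp-{name}"}


def walk(context: dict, metadata: Optional[dict] = None) -> List[str]:
    """Hypothetical driver: recurse into prefixes and collect leaf names."""
    leaves: List[str] = []
    for obj in list_objects(context, metadata):
        if obj["is_prefix"]:
            leaves.extend(walk(context, obj))
        else:
            leaves.append(obj["name"])
    return leaves


leaf_names = walk({})
print(leaf_names)
```

In this sketch, the six month-level leaves are the objects that would each be read as a separate partition, while the two year-level prefixes only group them.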

Partition Hierarchy Constraints

When implementing a partition hierarchy, we have two constraints:

  • Name of Objects: When no metadata is specified, the name defaults to None. Once a name is specified, every child you return on that list_objects call must be named as the parent name plus a suffix.
  • Prefix Relationships: When you return a set of child partitions, they should not relate to each other: no child partition's name should be a prefix of another child partition's name. A file and a folder can have the same name on the same level, but no two files or two folders can share a name.
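One way to catch violations of these two constraints before deploying a connector is a small local check. The helper below, `check_children`, is hypothetical and encodes one literal reading of the constraints; how Ascend itself enforces them is not covered here.

```python
from typing import Any, Dict, List, Optional


def check_children(parent_name: Optional[str], children: List[Dict[str, Any]]) -> List[str]:
    """Return constraint violations for one list_objects result (empty if none)."""
    parent = parent_name or ""  # metadata is None at the root, so no parent name
    problems: List[str] = []

    # Constraint 1: each child name is the parent name plus a non-empty suffix.
    for child in children:
        name = child["name"]
        if not name.startswith(parent) or name == parent:
            problems.append(f"{name!r} is not {parent!r} plus a suffix")

    # Constraint 2: no child name may be a prefix of a sibling's name.
    for i, a in enumerate(children):
        for j, b in enumerate(children):
            if i == j or not b["name"].startswith(a["name"]):
                continue
            # A file and a folder may share a name; two of the same kind may not.
            if a["name"] == b["name"] and a["is_prefix"] != b["is_prefix"]:
                continue
            problems.append(f"{a['name']!r} is a prefix of sibling {b['name']!r}")
    return problems


good = [{"name": "2020/01", "is_prefix": False}, {"name": "2020/02", "is_prefix": False}]
bad = [{"name": "2020/0", "is_prefix": True}, {"name": "2020/01", "is_prefix": False}]
print(check_children("2020/", good))
print(check_children("2020/", bad))
```

Running the check on the objects yielded for each parent, for example in a unit test, surfaces naming mistakes early without needing a full ingest.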