Partitioning with a Python Connector
Creating partitions, partition trees, and subtrees within a custom Ascend Python Connector.
How Ascend Creates Partitions with Python Connectors
When ingesting data, Ascend places that data inside partitions to optimize the scheduling of processing. Partitions can be arranged into trees and subtrees, which lets Ascend schedule processing only for unprocessed partitions. Check out Data Partitioning to learn how Ascend uses different partition strategies.
To do this, Ascend calls `list_objects` and inspects each object it yields. For each yielded object, Ascend checks for `is_prefix: True`; if it is set, the object is treated as a prefix, and the objects listed under it are returned as a partition under that prefix. By default, `metadata` is `None`, which places objects in a single root-level partition: Ascend obtains all the files and folders, returning the folders as prefixes and the files as non-prefix objects.
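The following sketch approximates that default root-level listing over a flat list of object keys, such as you might get from an object store. The helper name `root_listing`, the sample keys, and the use of `/` as the folder separator are assumptions for illustration, not Ascend APIs.

```python
from typing import Dict, Iterator, List


def root_listing(keys: List[str]) -> Iterator[Dict[str, object]]:
    """Hypothetical helper: yield the root-level objects for flat object keys.

    A key containing "/" belongs to a top-level folder, which is yielded once
    as a prefix. A key without "/" is a top-level file, yielded as a
    non-prefix object.
    """
    seen_folders = set()
    for key in sorted(keys):
        if "/" in key:
            folder = key.split("/", 1)[0] + "/"
            if folder not in seen_folders:
                seen_folders.add(folder)
                yield {"name": folder, "is_prefix": True}
        else:
            yield {"name": key, "is_prefix": False}
```

For the keys `["a.csv", "2019/x.csv", "2019/y.csv", "2020/z.csv"]`, this yields the prefixes `2019/` and `2020/` followed by the file `a.csv`.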
How to Specify Your Partition Structure
In order to create partitions, you'll need to specify `metadata`. Within the Python Read Connector, `list_objects` reads files as objects in a single partition by default, because `metadata` defaults to `None`. With `list_objects`, `yield`, and `metadata` specified, Ascend can create partitions according to your needs.
Within each `yield`, the `is_prefix` key lets you set up your partition hierarchy. A value of `False` marks the object as a leaf; if every object is yielded with `False`, there is no hierarchy and all objects are treated as leaves on a single level. A value of `True` marks the object as a prefix, meaning there is a hierarchy with composite partitions.
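As a minimal sketch of the flat case, the hypothetical `list_objects` below yields every object with `is_prefix: False`, so all objects end up as leaves on a single level. The `KEYS` list and the one-argument signature are assumptions for illustration; the real connector function may receive additional arguments.

```python
from typing import Any, Dict, Iterator, Optional

# Hypothetical object keys standing in for files in a source system.
KEYS = ["2019/01.csv", "2019/02.csv", "2020/01.csv"]


def list_objects(metadata: Optional[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    # Simplified signature for illustration only.
    if metadata is None:
        # Every object is a leaf (is_prefix=False), so there is no hierarchy:
        # all objects sit side by side on one level.
        for key in KEYS:
            yield {"name": key, "is_prefix": False}
    # No prefixes are ever yielded, so there is nothing to expand further.
```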
To create composite partitions, use an `if metadata is None:` statement and one or more `elif metadata["name"] == "prefix/":` statements, each followed by `yield` statements. The `yield` statements under `if metadata is None:` generate the top-level partitions with the specified prefix names. When Ascend lists a prefix, `metadata` holds that prefix object, so the matching `elif` statement tells Ascend which objects to place within that partition. The `yield` statements following an `elif` statement then create the leaf partitions under the matching prefix.
## To specify a hierarchy, first create composite partitions and/or standalone partitions.
## The signature is simplified here; your connector template may pass additional arguments.
def list_objects(metadata):
    if metadata is None:
        yield {"name": "2019/", "is_prefix": True}
        yield {"name": "2020/", "is_prefix": True}
        yield {"name": "2021/01", "is_prefix": False, "fingerprint": "dfgh"}
    ## Next, add leaf partitions to each composite partition.
    elif metadata["name"] == "2019/":
        yield {"name": "2019/01", "is_prefix": False}
        yield {"name": "2019/02", "is_prefix": False}
    elif metadata["name"] == "2020/":
        yield {"name": "2020/01", "is_prefix": False}
        yield {"name": "2020/02", "is_prefix": False, "fingerprint": "dfgh"}
        yield {"name": "2020/03", "is_prefix": False}
    ## You can continue these iterative statements to create as many
    ## composite partitions and partitions as needed.
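When the number of prefixes grows, the `if`/`elif` chain can also be driven by a lookup table keyed on the parent name. This is a hypothetical variant, not an Ascend-provided pattern: the `TREE` dictionary, the placeholder fingerprint strings, and the one-argument signature are all assumptions for illustration.

```python
from typing import Any, Dict, Iterator, List, Optional

# Hypothetical hierarchy: parent name -> children. None is the root level.
# Fingerprint strings are placeholders.
TREE: Dict[Optional[str], List[Dict[str, Any]]] = {
    None: [
        {"name": "2019/", "is_prefix": True},
        {"name": "2020/", "is_prefix": True},
    ],
    "2019/": [
        {"name": "2019/01", "is_prefix": False, "fingerprint": "fp-2019-01"},
        {"name": "2019/02", "is_prefix": False, "fingerprint": "fp-2019-02"},
    ],
    "2020/": [
        {"name": "2020/01", "is_prefix": False, "fingerprint": "fp-2020-01"},
    ],
}


def list_objects(metadata: Optional[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    # metadata is None for the root call; otherwise it is a prefix object
    # yielded earlier, so its "name" selects which children to return.
    parent = None if metadata is None else metadata["name"]
    yield from TREE.get(parent, [])
```

The behavior matches an `if`/`elif` chain with one branch per key; unknown prefixes simply yield nothing.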
To create more partitions, you need multiple yields or an iterative cycle of yields (for example, a loop). For every non-prefix object (every leaf), Ascend calls the read function and reads that object as a new partition, which shows up in the Partition ID of the component.
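The sketch below generates partitions in loops instead of writing out each `yield`, and pairs it with a hypothetical `collect_leaves` traversal that mimics the expansion order described above: every prefix is listed again, and every leaf is collected as one partition to read. The year range, the month-based names, and `collect_leaves` itself are assumptions for illustration; fingerprints are omitted for brevity.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Obj = Dict[str, Any]


def list_objects(metadata: Optional[Obj]) -> Iterator[Obj]:
    # Simplified signature for illustration.
    if metadata is None:
        # One composite partition (prefix) per year, generated in a loop.
        for year in range(2019, 2022):
            yield {"name": f"{year}/", "is_prefix": True}
    else:
        # One leaf per month under the requested year; each child name is the
        # parent name plus a suffix, e.g. "2020/" + "03" -> "2020/03".
        for month in range(1, 13):
            yield {"name": f"{metadata['name']}{month:02d}", "is_prefix": False}


def collect_leaves(lister: Callable[[Optional[Obj]], Iterator[Obj]],
                   metadata: Optional[Obj] = None) -> List[str]:
    """Hypothetical traversal: expand every prefix and collect the leaf names."""
    leaves: List[str] = []
    for obj in lister(metadata):
        if obj["is_prefix"]:
            # Descend: list the prefix again, passing it as metadata.
            leaves.extend(collect_leaves(lister, obj))
        else:
            # Each leaf would become one partition handed to the read function.
            leaves.append(obj["name"])
    return leaves
```

Calling `collect_leaves(list_objects)` returns 36 leaf names, from `2019/01` through `2021/12`.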
Partition Hierarchy Constraints
When implementing a partition hierarchy, we have two constraints:
- Name of Objects: When no `metadata` is specified (the root-level call), there is no parent `name`; it defaults to `None`. When `metadata` is specified, every child you return from that `list_objects` call must have a `name` equal to the parent `name` plus a suffix.
- Prefix Relationships: The child partitions returned by one call should not relate to each other: one child partition's name must not be a prefix of another child partition's name. A file and a folder can have the same name on the same level, but no two files or two folders can share a name.
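To catch these mistakes before Ascend does, you could check each set of children yourself. The `validate_children` helper below is a hypothetical sketch of the two constraints as stated above, not an Ascend API; it treats a child as "the parent name plus a suffix" when its name starts with the parent name and is longer.

```python
from typing import Any, Dict, List, Optional


def validate_children(parent: Optional[str], children: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical check: return the constraint violations for one listing."""
    problems: List[str] = []
    seen = set()
    for child in children:
        name = child["name"]
        # Name of Objects: a child's name is the parent name plus a suffix.
        if parent is not None and (not name.startswith(parent) or name == parent):
            problems.append(f"{name!r} is not {parent!r} plus a suffix")
        # No two files, or two folders, may share a name.
        key = (name, bool(child["is_prefix"]))
        if key in seen:
            problems.append(f"duplicate {'folder' if key[1] else 'file'} {name!r}")
        seen.add(key)
    # Prefix Relationships: no name may be a prefix of a different name.
    # A file and a folder with the identical name are allowed.
    names = sorted({c["name"] for c in children})
    for a in names:
        for b in names:
            if a != b and b.startswith(a):
                problems.append(f"{a!r} is a prefix of {b!r}")
    return problems
```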