Ascend Python SDK

Installation & Usage

Requirements: Python 3.8+

pip install

You can install the package directly via pip.

pip install ascend-io-sdk

Then import the package:

import ascend

Getting Started

  1. Follow the installation procedure
  2. Create credentials (access key + secret key) in the Ascend UI.
  3. Instantiate a client by passing your credentials to the Client constructor.
from ascend.sdk.client import Client
from ascend.protos.api.api_pb2 import DataService

client = Client(hostname="trial.ascend.io", access_key="youraccesskey", secret_key="yoursecretkey")
client.create_data_service(DataService(id="demos", name="demos", description="test data service"))

Authorization

Credentials can be passed to the client directly via the access_key and secret_key keyword arguments, as in the following:

from ascend.sdk.client import Client

client = Client(hostname="trial.ascend.io", access_key="youraccesskey", secret_key="yoursecretkey")

Alternatively, the client can read credentials from a config file located at ~/.ascend/credentials. The file should have the following structure:

[trial]
ascend_access_key_id=<your_key_id_here>
ascend_secret_access_key=<your_secret_key_here>

where [trial] should be replaced with your Ascend subdomain.
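Since the credentials file uses INI syntax, its structure can be illustrated with Python's standard configparser. This is only a sketch of the file format — the SDK parses the file itself — and the profile name and key values below are placeholders:

```python
import configparser

# Parse an Ascend-style credentials file (INI format).
config = configparser.ConfigParser()
config.read_string("""\
[trial]
ascend_access_key_id=AKEXAMPLE123
ascend_secret_access_key=sEcReTkEyExAmPlE
""")

# Look up the keys for a given subdomain profile.
profile = config["trial"]
access_key = profile["ascend_access_key_id"]
secret_key = profile["ascend_secret_access_key"]
print(access_key)  # AKEXAMPLE123
```

In practice the SDK reads the real file from ~/.ascend/credentials when no keys are passed to the Client.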

Then, when constructing the client, you do not need to pass in the credentials directly:

from ascend.sdk.client import Client

client = Client(hostname="trial.ascend.io")

Overview

The SDK Python API comes in two flavors:

  1. A low-level CRUD API to access, create, update, and delete resources in Ascend.
  2. A high-level declarative API to declare the desired end-state of Dataflows and DataServices.

The use cases for (1) and (2) are different and complementary:

  • The low-level API is useful for iterating on a Dataflow, and accessing detailed information about components such as schema, records, partitions, lineage, etc. Resources in this API are defined as protobuf objects, and the CRUD operations are available as instance methods on the SDK Client.
  • The high-level API is useful when you want to export and commit your Dataflow or DataService to source-control, or have it be part of a CICD pipeline. Resources in this API are defined as Dataflow resource definition objects. Utility classes such as DataflowApplier or DataServiceApplier are used to apply/sync these definitions with your account. Under the hood, the high-level API is implemented on top of the low-level API.

📘 Review the detailed SDK Client Reference for complete client API documentation.

Getting started with Dataflow resource definitions

There are a couple of ways to get started. You could write a Dataflow from scratch based on the API reference and the examples in this tutorial. However, the typical workflow for a new user is to first create DataServices and Dataflows on the Ascend UI, and then download/export these to their local file-system.

Download an existing Dataflow

To download a Dataflow:

from ascend.sdk.client import Client
from ascend.sdk.render import download_dataflow

client = Client("myhost.ascend.io")
download_dataflow(client,
                  data_service_id="my_data_service",
                  dataflow_id="my_dataflow",
                  resource_base_path="./")

This generates a Dataflow definition in a Python file named my_dataflow.py, and stores inline code from Transforms and ReadConnectors as separate files in a sub-folder named my_dataflow/ under resource_base_path.
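As an illustration, with hypothetical component ids the generated layout might look like this (actual file names derive from your own component ids):

```
./
  my_dataflow.py            # generated Dataflow definition script
  my_dataflow/              # inline code extracted from components
    my_sql_transform.sql
    my_pyspark_transform.py
    my_read_connector.py
```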

Feel free to inspect and refactor the generated files. You'll notice that the Dataflow definition consists roughly of:

  1. A series of import statements to pull in the required SDK and protobuf classes.
  2. A Dataflow resource definition that includes component definitions, data feeds, data feed connectors, and component groups. Note how relationships between resources are expressed. For instance, dependencies between components are expressed via fields such as input_id (for WriteConnectors) and input_ids (for Transforms).
  3. Creation of an SDK Client instance.
  4. Creation of a DataflowApplier and application of the constructed Dataflow definition.

Once downloaded, this Python script can be run any number of times to achieve the desired end-state for the Dataflow.
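Putting those parts together, a generated my_dataflow.py is roughly shaped like the following sketch. The component definitions are elided, the ids and hostname are placeholders, and running it requires the ascend-io-sdk package and valid credentials — treat this as an outline of the structure, not a runnable file:

```
# 1. Imports for the required SDK and protobuf classes.
from ascend.sdk.applier import DataflowApplier
from ascend.sdk.client import Client
from ascend.sdk.definitions import Dataflow, Transform

# 2. The Dataflow resource definition, including its components.
dataflow = Dataflow(
    id="my_dataflow",
    name="My Dataflow",
    components=[
        # ... Transform(id="my_transform", input_ids=[...], operator=...), etc.
    ],
)

# 3. An SDK Client instance.
client = Client(hostname="myhost.ascend.io")

# 4. Apply the definition to reach the desired end-state.
DataflowApplier(client).apply(data_service_id="my_data_service", dataflow=dataflow)
```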

The steps for downloading a DataService are similar:

from ascend.sdk.client import Client
from ascend.sdk.render import download_data_service

client = Client("myhost.ascend.io")
download_data_service(client,
                      data_service_id="my_data_service",
                      resource_base_path="./")

This generates a DataService definition in a Python file named my_data_service.py under resource_base_path, creates a sub-folder for each Dataflow, and stores inline code for Transforms and ReadConnectors in these sub-folders.

Getting started with Protobuf classes

The Ascend model classes and types are all generated from Google's Protocol Buffers (protobuf). For developers who are used to protobufs, these classes should feel familiar. For developers new to protobuf classes, however, this section provides a brief overview.

Constructing an Object

Attributes of an object can be passed to the constructor with keyword syntax, as in the following:

from ascend.protos.api.api_pb2 import DataService

ds = DataService(id="demos", name="demos", description="test data service")

Alternatively, fields can be set directly through assignment:

from ascend.protos.api.api_pb2 import DataService

ds = DataService()
ds.id = "demos"
ds.name = "demos"
ds.description = "test data service"

Nested Types

Fields for a nested type can be set within the constructor, like so:

from ascend.protos.external.external_pb2 import Aws, Credentials

b = Aws.S3.Bucket(name="mybucket", credential_id=Credentials.Id(value="id-3"))

However, assigning a message to a nested field directly is not allowed:

b = Aws.S3.Bucket()
b.credential_id = Credentials.Id(value="id-3") # WRONG!

In this example, to set credential_id, either set its value field directly or use CopyFrom:

b = Aws.S3.Bucket()
b.credential_id.value = "id-3"                          # set the scalar field directly
b.credential_id.CopyFrom(Credentials.Id(value="id-3"))  # or copy from an existing message

For more information on setting nested types, please see the Protobuf Python reference on embedded messages.

Working with Lists

Working with lists is similar to working with nested types, in that the list is already instantiated.

from ascend.protos.api.api_pb2 import Transform

t = Transform()
t.inputs.append(Transform.Input(uuid="fake-uuid")) # Append one value
t.inputs.extend([Transform.Input(uuid="fake-uuid2"), Transform.Input(uuid="fake-uuid3")]) # Append a whole list
len(t.inputs) # returns 3

For more information on working with lists, please see the Protobuf Python reference on repeated fields and repeated message fields.

Ascend SDK API Reference

In addition to the SDK Client, the SDK distribution contains four top-level modules for interacting with Ascend.

Module definitions

Ascend resource definition classes.

Classes

Component(id: str)

Components are the functional units in a Dataflow.
There are three main types of Components: ReadConnectors, Transforms, and WriteConnectors.

ComponentGroup(id: str, name: str, component_ids: List[str], description: str = None)

A component group

Create a ComponentGroup definition.

Parameters:

  • id: identifier that is unique within Dataflow
  • name: user-friendly name
  • description: brief description
  • component_ids: list of ids of components that make up the group
    Returns: a new ComponentGroup instance.
DataFeed(id: str, name: str, input_id: str, description: str = None, shared_with_all: bool = False, data_services_shared_with: List[str] = [], hidden_from_host_data_service: bool = False)

A data feed

Create a DataFeed definition.

Parameters:

  • id: identifier that is unique within Dataflow
  • name: user-friendly name
  • description: brief description
  • input_id: input component id
  • shared_with_all: if set to true, this data feed is shared with all data services,
    including the host data service
  • data_services_shared_with: a list of ids of data services that we want to share
    data with (not including the host data service)
  • hidden_from_host_data_service: if set to true, the host data service cannot subscribe
    to the contents of this data feed
    Returns: a new DataFeed instance.

Note - if shared_with_all is set to true, the values of data_services_shared_with and
hidden_from_host_data_service are ignored

DataFeedConnector(id: str, name: str, input_data_service_id: str, input_dataflow_id: str, input_data_feed_id: str, description: str = None)

A data feed connector

Create a DataFeedConnector definition.

Parameters:

  • id: identifier that is unique within Dataflow
  • name: user-friendly name
  • description: brief description
  • input_data_service_id: id of the DataService that the input DataFeed belongs to
  • input_dataflow_id: id of the Dataflow that the input DataFeed belongs to
  • input_data_feed_id: input DataFeed id
    Returns: a new DataFeedConnector instance.
DataService(id: str, name: str, description: str = None, dataflows: List[definitions.Dataflow] = [])

A DataService is the highest-level organizational structure in Ascend, and the primary
means of controlling access. It contains one or more Dataflows and can communicate with
other Data Services via Data Feeds.

Create a DataService definition.

Parameters:

  • id: DataService identifier
  • name: name
  • description: brief description
  • dataflows: list of dataflows
    Returns: a new DataService instance.
Dataflow(id: str, name: str, description: str = None, components: List[definitions.Component] = [], data_feed_connectors: List[definitions.DataFeedConnector] = [], data_feeds: List[definitions.DataFeed] = [], groups: List[definitions.ComponentGroup] = [])

A Dataflow is a representation of the operations being performed on data, expressed
as a dependency graph. Operations are expressed declaratively via components, and are
broadly categorized as read connectors (data ingestion), transforms (transformation),
and write connectors. Data feeds expose the results of transforms to other dataflows
and data services. Data feed connectors are subscriptions to data feeds. Component groups
offer users a convenient way to organize components that have a similar purpose.

Create a Dataflow definition.

Parameters

  • id: identifier for the Dataflow that is unique within its DataService
  • name: user-friendly name
  • description: brief description
  • components: list of components
  • data_feed_connectors: list of data feed connectors
  • data_feeds: list of data feeds
  • groups: list of component groups
    Returns: a new Dataflow instance.
Definition()

Base class for all definitions.

ReadConnector(id: str, name: str, container: ascend.protos.io.io_pb2.Container, description: str = None, pattern: ascend.protos.pattern.pattern_pb2.Pattern = None, update_periodical: ascend.protos.core.core_pb2.Periodical = None, assigned_priority: ascend.protos.component.component_pb2.Priority = None, last_manual_refresh_time: google.protobuf.timestamp_pb2.Timestamp = None, aggregation_limit: google.protobuf.wrappers_pb2.Int64Value = None, bytes: ascend.protos.component.component_pb2.FromBytes = None, records: ascend.protos.component.component_pb2.FromRecords = None)

A read connector

Create a ReadConnector definition.

Parameters:

  • id: identifier that is unique within Dataflow
  • name: user-friendly name
  • description: brief description
  • container (proto): a typed representation of the source of data
  • pattern (proto): defines the set of objects in the source that will be ingested
  • update_periodical: specifies source update frequency
  • assigned_priority: scheduling priority for the read connector
  • last_manual_refresh_time:
  • aggregation_limit:
  • bytes: defines the operator for parsing bytes read from the source (example - json
    or xsv parsing)
  • records: defines the schema for records read from a database or warehouse
    Returns: a new ReadConnector instance.

Note - a read connector can have either bytes or records defined, but not both.

Transform(id: str, name: str, input_ids: List[str], operator: ascend.protos.operator.operator_pb2.Operator, description: str = None, assigned_priority: ascend.protos.component.component_pb2.Priority = None)

A transform

Create a Transform definition.

Parameters:

  • id: identifier that is unique within Dataflow
  • name: user-friendly name
  • description: brief description
  • input_ids: list of input component ids
  • operator: transform operator definition - example - SQL/PySpark/Scala transform
  • assigned_priority: scheduling priority for transform
    Returns: a new Transform instance.
WriteConnector(id: str, name: str, input_id: str, container: ascend.protos.io.io_pb2.Container, description: str = None, assigned_priority: ascend.protos.component.component_pb2.Priority = None, bytes: ascend.protos.component.component_pb2.ToBytes = None, records: ascend.protos.component.component_pb2.ToRecords = None)

A write connector

Create a WriteConnector definition.

Parameters:

  • id: identifier that is unique within Dataflow
  • name: user-friendly name
  • description: brief description
  • input_id: input component id
  • container (proto): a typed representation of the destination of data
  • assigned_priority: scheduling priority of write connector
  • bytes: defines the operator for formatting bytes written to the object store destination
  • records: indicates that the destination is a structured record store like a database
    or a data warehouse
    Returns: a new WriteConnector instance.

Note - a write connector can have either bytes or records defined, but not both.

Module applier

Ascend resource applier classes - used for applying resource definitions.

Classes

DataServiceApplier(client: ascend.sdk.client.Client)

DataServiceApplier is a utility class that accepts a DataService definition and
'applies' it - ensuring that the DataService is created if it does not already exist,
deleting any existing Dataflows, components, and members of the DataService that
are not part of the supplied definition, and applying any configuration changes needed
to match the definition.

Creates a new DataServiceApplier.

Methods
apply(self, data_service: ascend.sdk.definitions.DataService, delete=True, dry_run=False)

Create or update the specified DataService.

Parameters:

  • data_service: DataService definition
  • delete (optional): If set to True (which is the default) - delete any Dataflow
    that is not part of data_service. At the Dataflow level, remove any components
    that are not part of the Dataflow definition.
  • dry_run (optional): If set to True (False by default) - skips any create or
    update operations
DataflowApplier(client: ascend.sdk.client.Client)

DataflowApplier is a utility class that accepts a Dataflow definition and 'applies'
it - ensuring that a Dataflow is created if it does not already exist and binding it to
the DataService identified by data_service_id, deleting any components and members of
the Dataflow that are not part of the supplied definition, and applying any configuration
changes needed to match the definition.

Creates a new DataflowApplier.

Methods
apply(self, data_service_id: str, dataflow: ascend.sdk.definitions.Dataflow, delete=True, dry_run=False)

Accepts a Dataflow definition, and ensures that it is created, bound to the DataService
identified by data_service_id, and has all of the constituent elements included in the
Dataflow definition - components (read connectors, transforms, write connectors), component
groups, data feeds, and data feed connectors. The specified DataService must already exist.

Parameters:

  • data_service_id: DataService id
  • dataflow: Dataflow definition
  • delete: if set to True (default=True) - delete any components, data feeds,
    data feed connectors, or groups not defined in dataflow
  • dry_run: If set to True (default=False) - skips any create or update operations
ComponentApplier.build(client: ascend.sdk.client.Client, data_service_id: str, dataflow_id: str)

ComponentApplier is a utility class that accepts a Component definition (ReadConnector, WriteConnector, Transform) and 'applies'
it - ensuring that the Component is created if it does not already exist and binding it to
the DataService identified by data_service_id and the Dataflow identified by
dataflow_id, creating or applying any configuration changes needed to match the definition.

Creates a new ComponentApplier that exposes the apply method.

Methods
apply(self, data_service_id: str, dataflow_id: str, component: Component)

Accepts a Component definition, and ensures that it is created, bound to the DataService
identified by data_service_id and the Dataflow identified by dataflow_id, and has
all of the properties in the Component definition. The specified DataService and Dataflow must already exist.

Parameters:

  • data_service_id: DataService id
  • dataflow_id: Dataflow id
  • component: Component Definition (one of ReadConnector, Transform, WriteConnector)

Module render

Utility functions to download Ascend resource definitions.

Functions

from ascend.sdk.render import download_data_service

download_data_service(client: ascend.sdk.client.Client, data_service_id: str, resource_base_path: str = '.')

Downloads a DataService and writes its definition to a file named {data_service_id}.py
under resource_base_path. Inline code for PySpark and SQL Transforms is written as separate
files to sub-folders - resource_base_path/{dataflow_id}/ with the file name derived from
the id of the component to which the code belongs. Raises ValueError if resource_base_path
does not exist.

Parameters:

  • client: SDK client
  • data_service_id: DataService id
  • resource_base_path: base path to which DataService definition and resource
    files are written
from ascend.sdk.render import download_dataflow

download_dataflow(client: ascend.sdk.client.Client, data_service_id: str, dataflow_id: str, resource_base_path: str = '.')

Downloads a Dataflow and writes its definition to a file named {dataflow_id}.py under
resource_base_path. Inline code for PySpark and SQL Transforms is written as separate
files to a sub-folder - resource_base_path/{dataflow_id}/ with the file names derived
from the id of the component to which the code belongs. Raises a ValueError if
resource_base_path does not exist.

Parameters:

  • client: SDK client
  • data_service_id: DataService id
  • dataflow_id: Dataflow id
  • resource_base_path: path to which Dataflow definition and resource files
    are written
from ascend.sdk.render import download_component

download_component(client: ascend.sdk.client.Client, data_service_id: str, dataflow_id: str, component_id: str, resource_base_path: str = '.')

Downloads a Component and writes its definition to a file named {component_id}.py under
resource_base_path. Inline code for PySpark and SQL Transforms is written as separate
files in the same folder with the file names derived from the id of the component to
which the code belongs. Raises a ValueError if resource_base_path does not exist.

Parameters:

  • client: SDK client
  • data_service_id: DataService id
  • dataflow_id: Dataflow id
  • component_id: Component id
  • resource_base_path: path to which Component definition and resource files are written

Classes

InlineCode(code: str, attribute_path: Tuple[str, ...], resource_path: str, base_path: str, base64_encoded: bool = False)

Represents the result of extraction of inline code from a component - such as
a sql statement or PySpark code. Contains all of the metadata needed to render
a component definition, with the inline code written to a separate file, and a reference
to this code stored in the component.

Parameters:

  • code: inline code
  • attribute_path: path of the attribute in the component definition that contains
    the inline code, represented as a tuple of path components. For instance, the
    path for sql code in a Transform is ("operator", "sql_query", "sql")
  • resource_path: file path to which inline code is written
  • base_path: base path of dataflow or data service resource definition
  • base64_encoded: if set to True, inline code is base64 encoded
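To illustrate how such a tuple path addresses a nested attribute, here is a small standalone helper. It is not part of the SDK (the render module resolves these paths internally), and the stand-in component below is hypothetical:

```python
from types import SimpleNamespace
from typing import Any, Tuple

def get_by_path(obj: Any, path: Tuple[str, ...]) -> Any:
    """Follow a tuple of attribute names down a nested object."""
    for name in path:
        obj = getattr(obj, name)
    return obj

# A stand-in for a Transform whose SQL lives at ("operator", "sql_query", "sql").
component = SimpleNamespace(
    operator=SimpleNamespace(
        sql_query=SimpleNamespace(sql="SELECT * FROM my_input")
    )
)

print(get_by_path(component, ("operator", "sql_query", "sql")))
# SELECT * FROM my_input
```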

Module builder

Utility functions to create Ascend resource definitions from the corresponding protobuf
objects. The protobuf objects can be obtained via the 'imperative' API accessible through
the SDK client -
client.get_data_service(<data_service_id>)
client.get_dataflow(<dataflow_id>)

Functions

data_service_from_proto(client: ascend.sdk.client.Client, proto: ascend.protos.api.api_pb2.DataService) ‑> ascend.sdk.definitions.DataService

Build a DataService definition from the API DataService protobuf representation.
This is a utility function that helps bridge the declarative and imperative forms
of the SDK by transforming a DataService proto message to a DataService definition
that can be 'applied'.

Parameters:

  • client: SDK client
  • proto: DataService protobuf object
    Returns: a DataService definition
dataflow_from_proto(client: ascend.sdk.client.Client, data_service_id: str, proto: ascend.protos.api.api_pb2.Dataflow) ‑> ascend.sdk.definitions.Dataflow

Build a Dataflow definition from the API Dataflow protobuf representation.

Parameters:

  • client: SDK client
  • data_service_id: DataService id
  • proto: Dataflow protobuf object
    Returns: a Dataflow definition
