SDK Definition Files

Using Dataflow / DataService Definitions

This section covers the higher-level API of the Ascend SDK. The SDK represents a pipeline as Python classes that can then be "applied" to Ascend to true up state. For example, a single Ascend Dataflow is represented as a Dataflow class, with a name, id, embedded transforms, etc. This definition can be tweaked and then applied back to Ascend as part of a build/deploy process.

There are a couple of ways to get started. You could write a Dataflow definition from scratch based on the API reference and the examples in this tutorial. However, the typical workflow for a new user is to first create DataServices and Dataflows in the Ascend UI, and then download/export those definitions to the local file system.

Download an existing Dataflow

To download a Dataflow:

from ascend.sdk.client import Client
from ascend.sdk.render import download_dataflow, TEMPLATES_V2

client = Client("myhost.ascend.io")
download_dataflow(client,
                  data_service_id="my_data_service",
                  dataflow_id="my_dataflow",
                  resource_base_path="./",
                  template_dir=TEMPLATES_V2)

This generates a Dataflow definition in a Python file named my_dataflow.py, and stores inline code from Transforms and ReadConnectors as separate files in a sub-folder named my_dataflow/ under resource_base_path.

Feel free to inspect and refactor the generated files. You'll notice that the Dataflow definition consists roughly of:

  1. A series of import statements to pull in the required SDK and protobuf classes.
  2. A Dataflow resource definition that includes component definitions, data feeds, data feed connectors, and component groups. Note how relationships between resources are expressed. For instance, dependencies between components are expressed via fields such as input_id (for WriteConnectors) and input_ids (for Transforms).
  3. Creation of an SDK Client instance.
  4. Creation of a DataflowApplier and application of the constructed Dataflow definition, following the pattern sketched below.

Once downloaded, this Python script can be run any number of times to achieve the desired end-state for the Dataflow.
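
For reference, the apply step at the end of the generated script follows roughly this pattern (a minimal sketch, assuming the applier module lives at ascend.sdk.applier; in the generated file, dataflow is the full definition rendered from your environment):

from ascend.sdk.client import Client
from ascend.sdk.applier import DataflowApplier
from ascend.sdk.definitions import Dataflow

# An empty Dataflow stands in for the generated definition here.
dataflow = Dataflow(id="my_dataflow", name="My Dataflow")

client = Client("myhost.ascend.io")
# Creates the Dataflow in "my_data_service" if needed and trues up its state.
DataflowApplier(client).apply("my_data_service", dataflow)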

The steps for downloading a DataService are similar:

from ascend.sdk.client import Client
from ascend.sdk.render import download_data_service, TEMPLATES_V2

client = Client("myhost.ascend.io")
download_data_service(client,
                      data_service_id="my_data_service",
                      resource_base_path="./",
                      template_dir=TEMPLATES_V2)

This generates a DataService definition in a Python file my_data_service.py under resource_base_path, creates a sub-folder for each Dataflow, and stores inline code for Transforms and ReadConnectors in these sub-folders.

Ascend SDK Modules Reference

There are four modules to use when interacting with Ascend through these definition-based APIs.

  • definitions: expresses the definitions of the key parts of Ascend, such as a Dataflow, DataService, etc.
  • applier: 'applies' definitions back to Ascend (typically as part of pushing or deploying to Ascend).
  • render: generates templates of the existing resources in an Ascend environment to bootstrap and facilitate development.
  • builder: utility functions to create Ascend definitions from protobuf API responses.

Module definitions

Ascend resource definition classes.

Classes

Component(id: str)

Components are the functional units in a Dataflow.
There are three main types of Components: ReadConnectors, Transforms, and WriteConnectors.

ComponentGroup(id: str, name: str, component_ids: List[str], description: str = None)

A component group

Create a ComponentGroup definition.

Parameters:

  • id: identifier that is unique within Dataflow
  • name: user-friendly name
  • description: brief description
  • component_ids: list of ids of components that make up the group
    Returns: a new ComponentGroup instance.
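
For illustration, a hypothetical group that collects two components by id might look like this (the ids are placeholders, not part of any real Dataflow):

from ascend.sdk.definitions import ComponentGroup

group = ComponentGroup(
    id="staging",
    name="Staging",
    description="Components that stage raw data",
    component_ids=["read_raw_events", "clean_events"],
)
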
DataFeed(id: str, name: str, input_id: str, description: str = None, shared_with_all: bool = False, data_services_shared_with: List[str] = [], hidden_from_host_data_service: bool = False)

A data feed

Create a DataFeed definition.

Parameters:

  • id: identifier that is unique within Dataflow
  • name: user-friendly name
  • description: brief description
  • input_id: input component id
  • shared_with_all: if set to true, this data feed is shared with all data services,
    including the host data service
  • data_services_shared_with: a list of ids of data services that we want to share
    data with (not including the host data service)
  • hidden_from_host_data_service: if set to true, the host data service cannot subscribe
    to the contents of this data feed
    Returns: a new DataFeed instance.

Note - if shared_with_all is set to true, the values of data_services_shared_with and
hidden_from_host_data_service are ignored
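
A hedged example, assuming a component with id clean_events and a data service with id analytics_service exist:

from ascend.sdk.definitions import DataFeed

# Publishes the output of the clean_events component to one other data service.
feed = DataFeed(
    id="clean_events_feed",
    name="Clean Events Feed",
    input_id="clean_events",
    data_services_shared_with=["analytics_service"],
)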

DataFeedConnector(id: str, name: str, input_data_service_id: str, input_dataflow_id: str, input_data_feed_id: str, description: str = None)

A data feed connector

Create a DataFeedConnector definition.

Parameters:

  • id: identifier that is unique within Dataflow
  • name: user-friendly name
  • description: brief description
  • input_data_service_id: id of the DataService that the input DataFeed belongs to
  • input_dataflow_id: id of the Dataflow that the input DataFeed belongs to
  • input_data_feed_id: input DataFeed id
    Returns: a new DataFeedConnector instance.
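
A sketch of subscribing to a feed published elsewhere (all ids here are hypothetical):

from ascend.sdk.definitions import DataFeedConnector

# Subscribes this Dataflow to a DataFeed owned by another DataService/Dataflow.
connector = DataFeedConnector(
    id="clean_events_sub",
    name="Clean Events Subscription",
    input_data_service_id="upstream_service",
    input_dataflow_id="upstream_dataflow",
    input_data_feed_id="clean_events_feed",
)
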
DataService(id: str, name: str, description: str = None, dataflows: List[definitions.Dataflow] = [])

A DataService is the highest-level organizational structure in Ascend, and the primary
means of controlling access. It contains one or more Dataflows and can communicate with
other Data Services via Data Feeds.

Create a DataService definition.

Parameters:

  • id: DataService identifier
  • name: name
  • description: brief description
  • dataflows: list of dataflows
    Returns: a new DataService instance.
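
A minimal sketch of a DataService wrapping a single (empty) Dataflow; ids and names are placeholders:

from ascend.sdk.definitions import DataService, Dataflow

data_service = DataService(
    id="my_data_service",
    name="My Data Service",
    description="Owns the event-processing dataflows",
    dataflows=[Dataflow(id="my_dataflow", name="My Dataflow")],
)
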
Dataflow(id: str, name: str, description: str = None, components: List[definitions.Component] = [], data_feed_connectors: List[definitions.DataFeedConnector] = [], data_feeds: List[definitions.DataFeed] = [], groups: List[definitions.ComponentGroup] = [])

A Dataflow is a representation of the operations being performed on data, expressed
as a dependency graph. Operations are expressed declaratively via components, and are
broadly categorized as read connectors (data ingestion), transforms (transformation),
and write connectors. Data feeds expose the results of transforms to other dataflows
and data services. Data feed connectors are subscriptions to data feeds. Component groups
offer users a convenient way to organize components that have a similar purpose.

Create a Dataflow definition.

Parameters

  • id: identifier for the Dataflow that is unique within its DataService
  • name: user-friendly name
  • description: brief description
  • components: list of components
  • data_feed_connectors: list of data feed connectors
  • data_feeds: list of data feeds
  • groups: list of component groups
    Returns: a new Dataflow instance.
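
A skeletal Dataflow wiring example (ids are placeholders; in practice the components list would contain the ReadConnector, Transform, and WriteConnector definitions that the group and data feed reference):

from ascend.sdk.definitions import ComponentGroup, DataFeed, Dataflow

dataflow = Dataflow(
    id="my_dataflow",
    name="My Dataflow",
    components=[],  # ReadConnector / Transform / WriteConnector definitions go here
    data_feeds=[
        # input_id must match the id of a component defined in this Dataflow.
        DataFeed(id="clean_events_feed", name="Clean Events Feed", input_id="clean_events"),
    ],
    groups=[
        ComponentGroup(id="staging", name="Staging", component_ids=["read_raw_events", "clean_events"]),
    ],
)
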
Definition()

Base class for all definitions.

ReadConnector(id: str, name: str, container: ascend.protos.io.io_pb2.Container, description: str = None, pattern: ascend.protos.pattern.pattern_pb2.Pattern = None, update_periodical: ascend.protos.core.core_pb2.Periodical = None, assigned_priority: ascend.protos.component.component_pb2.Priority = None, last_manual_refresh_time: google.protobuf.timestamp_pb2.Timestamp = None, aggregation_limit: google.protobuf.wrappers_pb2.Int64Value = None, bytes: ascend.protos.component.component_pb2.FromBytes = None, records: ascend.protos.component.component_pb2.FromRecords = None, compute_configurations: List[ascend.protos.expression.expression_pb2.StageComputeConfiguration] = None)

A read connector

Create a ReadConnector definition.

Parameters:

  • id: identifier that is unique within Dataflow
  • name: user-friendly name
  • description: brief description
  • container (proto): a typed representation of the source of data
  • pattern (proto): defines the set of objects in the source that will be ingested
  • update_periodical: specifies source update frequency
  • assigned_priority: scheduling priority for the read connector
  • last_manual_refresh_time:
  • aggregation_limit:
  • bytes: defines the operator for parsing bytes read from the source (example - json
    or xsv parsing)
  • records: defines the schema for records read from a database or warehouse
  • compute_configurations: custom configuration overrides by stage
    Returns: a new ReadConnector instance.

Note - a read connector can have either bytes or records defined, but not both.
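
A skeletal example of the wiring only; the Container contents are source-specific and intentionally left empty here, so treat this as a placeholder rather than a working connector:

from ascend.protos.io import io_pb2
from ascend.sdk.definitions import ReadConnector

read_connector = ReadConnector(
    id="read_raw_events",
    name="Read Raw Events",
    # Placeholder Container; populate it with the configuration for your
    # actual source (object store, database, etc.).
    container=io_pb2.Container(),
)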

Transform(id: str, name: str, input_ids: List[str], operator: ascend.protos.operator.operator_pb2.Operator, description: str = None, assigned_priority: ascend.protos.component.component_pb2.Priority = None)

A transform

Create a Transform definition.

Parameters:

  • id: identifier that is unique within Dataflow
  • name: user-friendly name
  • description: brief description
  • input_ids: list of input component ids
  • operator: transform operator definition (for example, a SQL, PySpark, or Scala transform)
  • assigned_priority: scheduling priority for transform
    Returns: a new Transform instance.
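
A hedged sketch of a SQL transform. The sql_query/sql nesting mirrors the inline-code attribute path noted in the render module below, but the exact operator shape and the SQL input-referencing convention are assumptions; a definition downloaded from an existing Transform is the authoritative reference:

from ascend.protos.operator import operator_pb2
from ascend.sdk.definitions import Transform

transform = Transform(
    id="clean_events",
    name="Clean Events",
    input_ids=["read_raw_events"],
    # The nested dict is converted to the corresponding protobuf message.
    operator=operator_pb2.Operator(sql_query={"sql": "SELECT * FROM input WHERE id IS NOT NULL"}),
)
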
WriteConnector(id: str, name: str, input_id: str, container: ascend.protos.io.io_pb2.Container, description: str = None, assigned_priority: ascend.protos.component.component_pb2.Priority = None, bytes: ascend.protos.component.component_pb2.ToBytes = None, records: ascend.protos.component.component_pb2.ToRecords = None, compute_configurations: List[ascend.protos.expression.expression_pb2.StageComputeConfiguration] = None)

A write connector

Create a WriteConnector definition.

Parameters:

  • id: identifier that is unique within Dataflow
  • name: user-friendly name
  • description: brief description
  • input_id: input component id
  • container (proto): a typed representation of the destination of data
  • assigned_priority: scheduling priority of write connector
  • bytes: defines the operator for formatting bytes written to the object store destination
  • records: indicates that the destination is a structured record store like a database
    or a data warehouse
  • compute_configurations: custom configuration overrides by stage
    Returns: a new WriteConnector instance.

Note - a write connector can have either bytes or records defined, but not both.
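
As with the ReadConnector above, a skeletal wiring example with an empty placeholder Container:

from ascend.protos.io import io_pb2
from ascend.sdk.definitions import WriteConnector

write_connector = WriteConnector(
    id="write_clean_events",
    name="Write Clean Events",
    input_id="clean_events",
    # Placeholder Container; fill in the destination-specific configuration.
    container=io_pb2.Container(),
)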

Module applier

Ascend resource applier classes - used for applying resource definitions.

Classes

DataServiceApplier(client: ascend.sdk.client.Client)

DataServiceApplier is a utility class that accepts a DataService definition and
'applies' it - ensuring that the DataService is created if it does not already exist,
deleting any existing Dataflows, components, and members of the DataService that
are not part of the supplied definition, and applying any configuration changes needed
to match the definition.

Creates a new DataServiceApplier.

Methods
apply(self, data_service: ascend.sdk.definitions.DataService, delete=True, dry_run=False)

Create or update the specified DataService.

Parameters:

  • data_service: DataService definition
  • delete (optional): If set to True (which is the default) - delete any Dataflow
    that is not part of data_service. At the Dataflow level, remove any components
    that are not part of the Dataflow definition.
  • dry_run (optional): If set to True (False by default) - skips any create or
    update operations
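
A minimal sketch (assuming the applier module lives at ascend.sdk.applier); dry_run=True previews the changes without modifying the environment, and delete=False preserves Dataflows that are absent from the definition:

from ascend.sdk.client import Client
from ascend.sdk.applier import DataServiceApplier
from ascend.sdk.definitions import DataService

client = Client("myhost.ascend.io")
data_service = DataService(id="my_data_service", name="My Data Service")

DataServiceApplier(client).apply(data_service, delete=False, dry_run=True)
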
DataflowApplier(client: ascend.sdk.client.Client)

DataflowApplier is a utility class that accepts a Dataflow definition and 'applies'
it - ensuring that a Dataflow is created if it does not already exist and binding it to
the DataService identified by data_service_id, deleting any components and members of
the Dataflow that are not part of the supplied definition, and applying any configuration
changes needed to match the definition.

Creates a new DataflowApplier.

Methods
apply(self, data_service_id: str, dataflow: ascend.sdk.definitions.Dataflow, delete=True, dry_run=False)

Accepts a Dataflow definition, and ensures that it is created, bound to the DataService
identified by data_service_id, and has all of the constituent elements included in the
Dataflow definition - components (read connectors, transforms, write connectors), component
groups, data feeds, and data feed connectors. The specified DataService must already exist.

Parameters:

  • data_service_id: DataService id
  • dataflow: Dataflow definition
  • delete: if set to True (default=True) - delete any components, data feeds,
    data feed connectors, or groups not defined in dataflow
  • dry_run: If set to True (default=False) - skips any create or update operations
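
For example (a sketch with placeholder ids; the DataService must already exist):

from ascend.sdk.client import Client
from ascend.sdk.applier import DataflowApplier
from ascend.sdk.definitions import Dataflow

client = Client("myhost.ascend.io")
dataflow = Dataflow(id="my_dataflow", name="My Dataflow")

# delete=False keeps components that exist in Ascend but are not in this definition.
DataflowApplier(client).apply("my_data_service", dataflow, delete=False)
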
ComponentApplier.build(client: ascend.sdk.client.Client, data_service_id: str, dataflow_id: str)

ComponentApplier is a utility class that accepts a Component definition (ReadConnector, WriteConnector, Transform) and 'applies'
it - ensuring that the Component is created if it does not already exist, binding it to
the DataService identified by data_service_id and the Dataflow identified by
dataflow_id, and creating or applying any configuration changes needed to match the definition.

Creates a new ComponentApplier that exposes the apply method.

Methods
apply(self, data_service_id: str, dataflow_id: str, component: Component)

Accepts a Component definition, and ensures that it is created, bound to the DataService
identified by data_service_id and the Dataflow identified by dataflow_id, and has
all of the properties in the Component definition. The specified DataService and Dataflow must already exist.

Parameters:

  • data_service_id: DataService id
  • dataflow_id: Dataflow id
  • component: Component Definition (one of ReadConnector, Transform, WriteConnector)

Module render

Utility functions to download Ascend resource definitions.

Functions

from ascend.sdk.render import download_data_service

download_data_service(client: ascend.sdk.client.Client, data_service_id: str, resource_base_path: str = '.')

Downloads a DataService and writes its definition to a file named {data_service_id}.py
under resource_base_path. Inline code for Transforms and ReadConnectors is written as separate
files to sub-folders - resource_base_path/{dataflow_id}/components/ with the file name derived from
the id of the component to which the code belongs. Creates resource_base_path if
resource_base_path does not exist.

Parameters:

  • client: SDK client
  • data_service_id: DataService id
  • resource_base_path: base path to which DataService definition and resource files are written
  • template_dir: directory that contains templates for rendering. Defaults to v1 template directory.
from ascend.sdk.render import download_dataflow

download_dataflow(client: ascend.sdk.client.Client, data_service_id: str, dataflow_id: str, resource_base_path: str = '.')

Downloads a Dataflow and writes its definition to a file named {dataflow_id}.py under
resource_base_path. Inline code for Transforms and ReadConnectors is written as separate
files to a sub-folder - resource_base_path/components/ with the file names derived
from the id of the component to which the code belongs. Creates resource_base_path if
resource_base_path does not exist.

Parameters:

  • client: SDK client
  • data_service_id: DataService id
  • dataflow_id: Dataflow id
  • resource_base_path: path to which Dataflow definition and resource files are written
  • template_dir: directory that contains templates for rendering. Defaults to v1 template directory.
from ascend.sdk.render import download_component

download_component(client: ascend.sdk.client.Client, data_service_id: str, dataflow_id: str, component_id: str, resource_base_path: str = '.')

Downloads a Component and writes its definition to a file named {component_id}.py under
resource_base_path. Inline code for PySpark and SQL Transforms is written as separate
files in the same folder with the file names derived from the id of the component to
which the code belongs. Creates resource_base_path if resource_base_path does not exist.

Parameters:

  • client: SDK client
  • data_service_id: DataService id
  • dataflow_id: Dataflow id
  • component_id: Component id
  • resource_base_path: path to which the Component definition and resource files are written
  • template_dir: directory that contains templates for rendering. Defaults to v1 template directory.
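
A usage sketch mirroring the download calls earlier in this section (the component id is a placeholder):

from ascend.sdk.client import Client
from ascend.sdk.render import download_component, TEMPLATES_V2

client = Client("myhost.ascend.io")
download_component(client,
                   data_service_id="my_data_service",
                   dataflow_id="my_dataflow",
                   component_id="my_transform",
                   resource_base_path="./",
                   template_dir=TEMPLATES_V2)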

Classes

InlineCode(code: str, attribute_path: Tuple[str, ...], resource_path: str, base_path: str, base64_encoded: bool = False)

Represents the result of extraction of inline code from a component - such as
a sql statement or PySpark code. Contains all of the metadata needed to render
a component definition, with the inline code written to a separate file, and a reference
to this code stored in the component.

Parameters:

  • code: inline code
  • attribute_path: path of the attribute in the component definition that contains
    the inline code, represented as a tuple of path components. For instance, the
    path for sql code in a Transform is ("operator", "sql_query", "sql")
  • resource_path: file path to which inline code is written
  • base_path: base path of dataflow or data service resource definition
  • base64_encoded: if set to True, inline code is base64 encoded

Module builder

Utility functions to create Ascend resource definitions from the corresponding protobuf
objects. The protobuf objects can be obtained via the 'imperative' API accessible through
the SDK client -
client.get_data_service(<data_service_id>)
client.get_dataflow(<dataflow_id>)

Functions

data_service_from_proto(client: ascend.sdk.client.Client, proto: ascend.protos.api.api_pb2.DataService) ‑> ascend.sdk.definitions.DataService

Build a DataService definition from the API DataService protobuf representation.
This is a utility function that helps bridge the declarative and imperative forms
of the SDK by transforming a DataService proto message to a DataService definition
that can be 'applied'.

Parameters:

  • client: SDK client
  • proto: DataService protobuf object
    Returns: a DataService definition
dataflow_from_proto(client: ascend.sdk.client.Client, data_service_id: str, proto: ascend.protos.api.api_pb2.Dataflow) ‑> ascend.sdk.definitions.Dataflow

Build a Dataflow definition from the API Dataflow protobuf representation.

Parameters:

  • client: SDK client
  • data_service_id: DataService id
  • proto: Dataflow protobuf object
    Returns: a Dataflow definition
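
Putting the pieces together, a hedged sketch of the declarative/imperative bridge (assuming the module path ascend.sdk.builder; depending on SDK version the proto may be returned directly or nested on the response object, so adjust accordingly):

from ascend.sdk.client import Client
from ascend.sdk.builder import data_service_from_proto
from ascend.sdk.applier import DataServiceApplier

client = Client("myhost.ascend.io")

# Fetch the current state via the imperative API and convert it to a definition.
ds_proto = client.get_data_service("my_data_service")
data_service = data_service_from_proto(client, ds_proto)

# Tweak the definition, then apply it back (dry_run=True previews the changes).
data_service.description = "Managed via the SDK"
DataServiceApplier(client).apply(data_service, dry_run=True)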