Ascend Developer Hub

Primary Use Cases

Zeppelin notebooks offer a web-based interactive development environment for writing code and accessing data. They are commonly used to support a wide range of workflows in data science, scientific computing, and machine learning. Zeppelin notebooks integrate naturally with Ascend, and there are two primary ways to use them to access data on the Ascend platform:

  • External data access via the Records API
  • External data access via the Bytes API

Records API

The Records API is provided via the Ascend Python SDK, a complete library of methods for externally accessing the various components and their respective data partitions within the Ascend environment. Please note: the Ascend Python SDK requires Python 3.

Leveraging the Python SDK, a user can navigate and access Dataflows within Ascend from a Zeppelin notebook, as well as stream records from any component within a Dataflow into the notebook for additional processing.

The Ascend SDK for Python, including documentation on installation, authorization, and usage, can be found here. With this SDK you can read data from Ascend Components and Data Feeds, and examine Dataflow metadata.
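
A common notebook pattern with a streaming records API is to drain the stream into a list of rows for further analysis (e.g. in a DataFrame). The sketch below shows only that batching step; the `fake_stream` generator and its record fields are illustrative stand-ins for the real SDK call, not part of the Ascend API.

```python
# Sketch: collecting streamed records into rows for further processing.
# In a real notebook, the record stream would come from the Ascend Python
# SDK; here a stub generator stands in for it.

def collect_records(stream, limit=None):
    """Drain an iterable of record dicts into a list, optionally capped at `limit`."""
    rows = []
    for i, record in enumerate(stream):
        if limit is not None and i >= limit:
            break
        rows.append(record)
    return rows

def fake_stream():
    """Stand-in for an SDK record stream (purely illustrative)."""
    for i in range(5):
        yield {"id": i, "value": i * 10}

rows = collect_records(fake_stream(), limit=3)
print(rows)  # the first three records from the stream
```

The `limit` guard is useful in notebooks: it lets you sample a large component without pulling every record into memory.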

Bytes API

The Bytes API is a high-speed, byte-level API for externally accessing data directly within Ascend via an S3-compatible interface. The data is formatted as Parquet, which makes this API most useful in environments with separate Spark infrastructure or applications that can natively ingest Parquet.

Leveraging the Bytes API, a user can access Parquet data directly from Ascend in a Zeppelin notebook with native Spark and work with it as Spark DataFrames. This gives users a more flexible (beyond SQL) interface and a high-bandwidth path for processing fragment data with custom code in a secure manner. For example, a Data Scientist can read data from Ascend into a Zeppelin notebook and perform further analysis via a DataFrame in Spark or even pandas. Other use cases for the Bytes API include:

  • Any external Spark infrastructure, such as Databricks Spark SQL, TensorFlow, or Delta Lake.
  • Any application that can natively work with Parquet-formatted data, such as Presto.
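
Because the Bytes API presents an S3-compatible interface, external Spark clusters typically point Hadoop's `s3a` connector at it. The sketch below assembles the relevant `s3a` settings and a component URI; the endpoint value and the bucket/path layout shown are illustrative assumptions, not Ascend's documented values — consult your Ascend environment for the actual endpoint and paths.

```python
# Sketch: Hadoop s3a settings for an S3-compatible endpoint, plus an
# s3a:// URI for a component. The endpoint host and the bucket/path
# layout are illustrative assumptions.

def s3a_options(endpoint, access_key, secret_key):
    """Hadoop configuration keys for an S3-compatible endpoint."""
    return {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # Path-style addressing is commonly required by S3-compatible stores.
        "fs.s3a.path.style.access": "true",
    }

def component_uri(data_service, dataflow, component):
    """Assumed layout: one bucket per Data Service, one prefix per Dataflow."""
    return f"s3a://{data_service}/{dataflow}/{component}"

# In a Spark notebook these options would be applied to the Hadoop
# configuration before reading, e.g. (not executed here):
#   for key, value in s3a_options(endpoint, key_id, secret).items():
#       spark.sparkContext._jsc.hadoopConfiguration().set(key, value)
#   df = spark.read.parquet(component_uri("my-service", "my-flow", "my-component"))
```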

Service Account Configuration

Before you can use either the Ascend Python SDK (for the Records API) or the Bytes API, you will need to create a Service Account. You'll use this Service Account in your notebook to securely access components and data within your Ascend environment. See the Service Accounts documentation for more information on creating and managing them. Once you have created your Service Account and its keys, you can proceed to the examples below.
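
A common way to keep Service Account keys out of notebook cells is to load them from environment variables. The variable names below are arbitrary assumptions for illustration, not a convention defined by Ascend.

```python
import os

# Sketch: load Service Account keys from the environment rather than
# hard-coding them in a notebook. The variable names are assumptions.

def load_service_account():
    """Return (access_id, secret_key) from the environment, or fail loudly."""
    access_id = os.environ.get("ASCEND_ACCESS_KEY_ID")
    secret_key = os.environ.get("ASCEND_SECRET_ACCESS_KEY")
    if not access_id or not secret_key:
        raise RuntimeError("Service Account keys are not set in the environment")
    return access_id, secret_key
```

Failing loudly when the keys are absent avoids silently passing empty credentials to the API and getting a less obvious authorization error later.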

Records API (Python SDK) Examples

Data Feed Example

Data Feeds - This example shows how to:

  • List available Data Feeds
  • Connect to a Data Feed
  • Stream Records
  • Connect to a Data Service
  • List the Dataflows in a Data Service
  • Get a Dataflow from a Data Service
  • List Components in a Dataflow
  • Get a Component from a Dataflow
  • Stream the Records Generated by a Component
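
The navigation steps above can be sketched as a small helper that walks from a Data Service down to a component's record stream. Every method name on the client here (`get_data_service`, `get_dataflow`, `get_component`, `stream_records`) is an illustrative assumption standing in for the real calls documented with the Ascend Python SDK.

```python
# Sketch of the navigation steps above. The client method names are
# illustrative assumptions; see the Ascend Python SDK docs for the real API.

def stream_component_records(client, service_id, dataflow_id, component_id, limit=10):
    service = client.get_data_service(service_id)      # connect to a Data Service
    dataflow = service.get_dataflow(dataflow_id)       # get a Dataflow from the service
    component = dataflow.get_component(component_id)   # get a Component from the Dataflow
    records = []
    for i, record in enumerate(component.stream_records()):  # stream its records
        if i >= limit:
            break
        records.append(record)
    return records
```

Taking the client as a parameter keeps the helper independent of how the SDK is constructed and authorized, which also makes it easy to exercise with a stub in tests.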

Bytes API Examples using PySpark

Accessing the Ascend Structured Data Lake from PySpark

Here is a demonstration of a notebook that uses Ascend's Structured Data Lake to access component data with PySpark.

Ascend Structured Data Lake from PySpark - This example shows how to:

  • Read data from a data feed
  • Read data from a component
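
A minimal read helper, assuming the `s3a` connector has already been configured with the Service Account keys: `spark.read.parquet` is the standard Spark call, while the example URI layout is an assumption, not Ascend's documented path scheme.

```python
# Sketch: read a component's Parquet data into a Spark DataFrame.
# Assumes the s3a endpoint and Service Account keys are already configured
# on the SparkSession's Hadoop configuration.

def read_component(spark, uri):
    """uri: an s3a:// path to a component's (or data feed's) Parquet data."""
    return spark.read.parquet(uri)

# Example (in a notebook with an active SparkSession; the URI is illustrative):
#   df = read_component(spark, "s3a://my-service/my-flow/my-component")
#   df.printSchema()
```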


