Allowing external data access into Ascend can be beneficial for a number of use cases, but it's important to understand the two primary means one can use to accomplish this:
- External data access via the Web API
- External data access via the Structured Data Lake (SDL) API
The Web API allows access to Ascend dataflow components and their associated data from most HTTP clients. We recommend using the Ascend Python SDK rather than coding directly against the API, whose endpoints are subject to change. Please note: the Ascend Python SDK requires Python 3.
A common use case for the Web API is a Python notebook such as Jupyter or Zeppelin: the notebook can navigate and access dataflows within Ascend, and stream records from any component within a dataflow into the notebook for additional processing. Over time, additional capabilities such as a Queryable Dataflows SDK and a CLI (via the Python SDK) will leverage the Web API.
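As a rough sketch of what notebook access over the Web API looks like, the snippet below streams records from a dataflow component over HTTP. The endpoint path, parameter names, and JSON response shape are illustrative assumptions, not the documented Ascend API surface; in practice the Ascend Python SDK wraps these details for you.

```python
import json
import urllib.request

def component_records_url(host, dataflow, component):
    # Hypothetical endpoint layout -- an assumption for illustration only;
    # the real paths are handled by the Ascend Python SDK.
    return f"{host}/api/v1/dataflows/{dataflow}/components/{component}/records"

def fetch_records(host, token, dataflow, component):
    # Authenticate with a bearer token (the auth scheme is also assumed)
    # and parse the response body as JSON records.
    req = urllib.request.Request(
        component_records_url(host, dataflow, component),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

From a notebook cell, `fetch_records("https://your-env.ascend.io", token, "my_dataflow", "my_component")` would then return records ready for further processing in, say, a Pandas DataFrame.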
The Ascend Structured Data Lake (SDL) provides direct access to data in Ascend via a high-speed, byte-level, S3-compatible API. The data is formatted as Parquet and is most useful in environments with separate Spark infrastructure or applications that can natively ingest Parquet.
A common use case for SDL is a Spark notebook such as Databricks, Jupyter, or Zeppelin that reads Parquet data directly from Ascend into native Spark DataFrames. This gives users an interface more flexible than SQL and a high-bandwidth path for securely processing fragment data with custom code. For example, a data scientist can read data from Ascend into a notebook and perform further analysis via a DataFrame in Spark or even Pandas.
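A minimal sketch of the Spark-notebook pattern follows. The SDL endpoint, credential values, and the bucket/path layout are assumptions for illustration; substitute the endpoint and access keys issued by your Ascend environment. The `fs.s3a.*` settings are the standard Hadoop S3A connector options Spark uses to talk to any S3-compatible store.

```python
def sdl_parquet_path(dataflow, component):
    # Assumed bucket layout: one bucket per dataflow, one prefix per
    # component. Check your SDL documentation for the actual layout.
    return f"s3a://{dataflow}/{component}/"

def sdl_spark_session(endpoint, access_key, secret_key):
    # pyspark is imported lazily so the sketch can be read without a
    # Spark installation; in a hosted notebook the session already exists.
    from pyspark.sql import SparkSession
    return (
        SparkSession.builder
        # Point the S3A connector at the SDL's S3-compatible endpoint.
        .config("spark.hadoop.fs.s3a.endpoint", endpoint)
        .config("spark.hadoop.fs.s3a.access.key", access_key)
        .config("spark.hadoop.fs.s3a.secret.key", secret_key)
        .getOrCreate()
    )

def read_component(spark, dataflow, component):
    # Read the component's Parquet fragments into a Spark DataFrame.
    return spark.read.parquet(sdl_parquet_path(dataflow, component))
```

Once loaded, the DataFrame can be analyzed with Spark transformations or converted with `toPandas()` for Pandas-based analysis.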
Other use cases include:
- Any external Spark infrastructure, such as Azure Databricks, Amazon EMR, or Google Dataproc
- Any application that can natively work with Parquet-formatted data, such as Presto
- Most S3-compatible clients, such as Amazon's
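For the S3-compatible-client case, the sketch below lists a component's Parquet fragments with boto3 by pointing the client's `endpoint_url` at the SDL. The endpoint, bucket, and prefix values are placeholders, and the access keys are assumed to come from your Ascend environment.

```python
def parquet_keys(keys):
    # Keep only Parquet fragment objects from a listing.
    return [k for k in keys if k.endswith(".parquet")]

def list_component_fragments(endpoint, access_key, secret_key, bucket, prefix):
    # boto3 is imported lazily so the sketch is readable without it installed.
    import boto3
    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint,  # the SDL's S3-compatible endpoint (placeholder)
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return parquet_keys([obj["Key"] for obj in resp.get("Contents", [])])
```

Any tool that speaks the S3 API can use the same `endpoint_url`-style override to read data from the SDL.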