External access to data in Ascend is useful for a number of use cases. There are two primary ways to accomplish it:
- External data access via the Records API
- External data access via the Bytes API
The Records API is provided through the Ascend Python SDK, a library of methods for externally accessing the components of the Ascend environment and their respective data partitions. Please note: the Ascend Python SDK is based on Python version 3.
The most common use case is a notebook such as Jupyter or Zeppelin, used to navigate and access dataflows within Ascend and to stream records from any component within a dataflow into the notebook for additional processing. Over time, additional capabilities such as a Queryable Dataflows SDK and a CLI (via the Python SDK) will become available and will likely use the Records API. More details on configuration and usage can be found in the Jupyter and Zeppelin sections of the documentation.
The Bytes API is a high-speed, byte-level API for externally accessing data directly within Ascend through an S3-compatible interface. The data is formatted as Parquet, which makes this API most useful in environments that have separate Spark infrastructure or applications that can natively ingest Parquet.
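As a rough illustration of what reading through an S3-compatible interface can look like from external Spark infrastructure, the sketch below configures Spark's standard Hadoop S3A connector and reads Parquet into a dataframe. The `fs.s3a.*` property names are standard Hadoop settings. The function names, the endpoint, the credentials, and the path are hypothetical placeholders and not values taken from Ascend, and the sketch assumes the `hadoop-aws` connector is available to Spark. Use the values described in the Jupyter and Zeppelin sections of the documentation for a real setup.

```python
from typing import Dict


def s3a_options(endpoint: str, access_key: str, secret_key: str) -> Dict[str, str]:
    """Build Spark settings for the Hadoop S3A connector.

    All three arguments are placeholders supplied by the caller; this
    sketch does not know the real Ascend endpoint or credential format.
    """
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Assumption: many S3-compatible services expect path-style addressing.
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def read_parquet_with_spark(endpoint: str, access_key: str, secret_key: str, path: str):
    """Read the Parquet data at `path` (an s3a:// URI) into a Spark DataFrame."""
    # Imported lazily so s3a_options() can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("bytes-api-sketch")
    for key, value in s3a_options(endpoint, access_key, secret_key).items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()
    return spark.read.parquet(path)
```

Keeping the settings in a plain dictionary makes it easy to reuse the same values in a notebook interpreter configuration instead of in code.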
The most common use case is a notebook such as Jupyter or Zeppelin that reads Parquet data directly from Ascend for native use with Spark dataframes. This gives users an interface that is more flexible than SQL, along with a secure, high-bandwidth path for processing fragment data with custom code. For example, a data scientist can read data from Ascend into a notebook and perform further analysis with a dataframe in Spark or even Pandas. More details on configuration and usage can be found in the Jupyter and Zeppelin sections of the documentation. Other use cases include:
- Any external Spark infrastructure, such as Databricks Spark SQL, TensorFlow, or Delta Lake.
- Any application that can natively work with Parquet-formatted data, such as Presto.
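For the notebook scenario in which data is analyzed with Pandas rather than Spark, a minimal sketch might look like the following. It relies on `pandas.read_parquet` with its `storage_options` argument, which is passed to the `s3fs` filesystem library, so it assumes `pandas`, `s3fs`, and a Parquet engine such as `pyarrow` are installed. The helper names, the endpoint URL, the credentials, and the path are hypothetical placeholders and not actual Ascend values.

```python
from typing import Any, Dict


def s3_storage_options(endpoint_url: str, access_key: str, secret_key: str) -> Dict[str, Any]:
    """Build the s3fs options that pandas forwards when reading s3:// paths.

    The endpoint and credentials are caller-supplied placeholders.
    """
    return {
        "key": access_key,
        "secret": secret_key,
        # Point the underlying S3 client at the S3-compatible endpoint.
        "client_kwargs": {"endpoint_url": endpoint_url},
    }


def read_parquet_with_pandas(path: str, endpoint_url: str, access_key: str, secret_key: str):
    """Read the Parquet data at `path` (an s3:// URI) into a pandas DataFrame."""
    # Imported lazily so s3_storage_options() can be used without pandas installed.
    import pandas as pd

    options = s3_storage_options(endpoint_url, access_key, secret_key)
    return pd.read_parquet(path, storage_options=options)
```

The returned dataframe can then be explored in the notebook like any other Pandas dataframe.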