Zeppelin File-Based Access
Zeppelin notebooks offer a web-based interactive development environment for coding and accessing data. They're generally used to support a wide range of workflows in data science, scientific computing, and machine learning. Integrating Ascend with a Zeppelin notebook gives the notebook a direct link to live data, with no intermediate storage step in between. Ascend's File-Based Access powers this integration.
Prerequisites
- A Python development environment with Zeppelin Notebooks installed
- An Ascend Access Key ID and Secret, from either a Service Account or a Developer Access Key
Reading Data to a Pandas DataFrame
Ensure your development environment is set up with Pandas, PyArrow, and s3fs.
Version Incompatibility in s3fs and pyarrow
There is currently a compatibility issue between s3fs 0.5 and PyArrow. We recommend installing s3fs 0.4.2 instead.
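Before running any read, it can help to confirm which s3fs version the notebook's Python environment actually has. The snippet below is a minimal sketch using only the standard library; the helper names `installed_s3fs_version` and `is_s3fs_0_5` are hypothetical, and the check flags only the 0.5 series named above, since nothing here says whether other releases are affected.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_s3fs_version():
    """Return the installed s3fs version string, or None if s3fs is missing."""
    try:
        return version("s3fs")
    except PackageNotFoundError:
        return None


def is_s3fs_0_5(version_string):
    """True only when the version belongs to the 0.5 series (e.g. '0.5.1')."""
    return version_string.split(".")[:2] == ["0", "5"]


found = installed_s3fs_version()
if found is None:
    print("s3fs is not installed in this environment")
elif is_s3fs_0_5(found):
    print(f"s3fs {found} is in the 0.5 series; consider installing s3fs 0.4.2")
else:
    print(f"s3fs {found} is installed")
```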
Follow the "Recipe" below to see the annotated code:
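As a rough illustration of this kind of read, the sketch below opens an S3-style filesystem with s3fs and loads the files under a path into a Pandas DataFrame through PyArrow. It rests on several assumptions: that File-Based Access is reachable as an S3-compatible endpoint, that the data is exposed as Parquet files (suggested by the PyArrow requirement, not stated outright), and that the endpoint URL and path come from your own Ascend environment. The function names, the environment variable names, and every placeholder value are hypothetical.

```python
def s3_client_options(access_key_id, secret, endpoint_url):
    """Build keyword arguments for s3fs.S3FileSystem.

    endpoint_url is an assumption: replace it with the File-Based Access
    endpoint of your Ascend environment.
    """
    return {
        "key": access_key_id,
        "secret": secret,
        "client_kwargs": {"endpoint_url": endpoint_url},
    }


def read_component_to_pandas(path, access_key_id, secret, endpoint_url):
    """Read the Parquet files under `path` into a Pandas DataFrame.

    `path` is given as "bucket/prefix" without a scheme. Assumes the data
    is stored as Parquet.
    """
    # Imported lazily so the helper above works without these packages.
    import pyarrow.parquet as pq
    import s3fs

    fs = s3fs.S3FileSystem(
        **s3_client_options(access_key_id, secret, endpoint_url)
    )
    return pq.read_table(path, filesystem=fs).to_pandas()


# Example call (all values are placeholders, shown as comments only):
#
#   import os
#   df = read_component_to_pandas(
#       "<bucket>/<path-to-component>",
#       os.environ["ASCEND_ACCESS_KEY_ID"],   # hypothetical variable name
#       os.environ["ASCEND_SECRET"],          # hypothetical variable name
#       "https://<your-file-based-access-endpoint>",
#   )
#   df.head()
```

In a Zeppelin note, this would typically run in a paragraph that starts with `%python`. Reading the Access Key ID and Secret from environment variables, rather than pasting them into the note, keeps the credentials out of the saved notebook.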
Reading Data to a PySpark DataFrame
Ensure your development environment is set up with PySpark.
Follow the "Recipe" below to see the annotated code:
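A comparable read in PySpark could look like the sketch below, which passes the credentials to Spark through the Hadoop S3A connector settings and then reads a path as Parquet. The assumptions are the same as for any S3-compatible source: the `hadoop-aws` connector is available to Spark, the data is stored as Parquet, and the endpoint URL and path come from your Ascend environment. Enabling path-style access is also an assumption that many S3-compatible endpoints need but that is not confirmed here. The function names and placeholder values are hypothetical.

```python
def s3a_settings(access_key_id, secret, endpoint_url):
    """Spark configuration entries for the Hadoop S3A connector."""
    return {
        "spark.hadoop.fs.s3a.access.key": access_key_id,
        "spark.hadoop.fs.s3a.secret.key": secret,
        # Assumption: replace with your File-Based Access endpoint.
        "spark.hadoop.fs.s3a.endpoint": endpoint_url,
        # Assumption: path-style addressing; drop this if not required.
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def read_component_to_spark(path, access_key_id, secret, endpoint_url):
    """Read the Parquet files under an s3a:// path into a Spark DataFrame."""
    # Imported lazily so the helper above works without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("ascend-file-based-access")
    for name, value in s3a_settings(
        access_key_id, secret, endpoint_url
    ).items():
        builder = builder.config(name, value)
    spark = builder.getOrCreate()
    return spark.read.parquet(path)


# Example call (all values are placeholders, shown as comments only):
#
#   df = read_component_to_spark(
#       "s3a://<bucket>/<path-to-component>",
#       "<access-key-id>",
#       "<secret>",
#       "https://<your-file-based-access-endpoint>",
#   )
#   df.show(5)
```

In a Zeppelin `%pyspark` paragraph a Spark session usually exists already, so `getOrCreate()` returns that session and the `spark.hadoop.*` entries set on the builder may not reach its Hadoop configuration. In that case, setting the same properties in the Spark interpreter settings before the interpreter starts is likely the more reliable route.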