Zeppelin File-Based Access


Zeppelin notebooks provide a web-based interactive environment for writing code and exploring data. They are commonly used in data science, scientific computing, and machine learning workflows. Integrating Ascend with a Zeppelin notebook gives the notebook direct access to live data, with no intermediate storage step. Ascend's File-Based Access powers this integration.

Prerequisites

Reading Data to a Pandas DataFrame

Ensure your development environment is set up with Pandas, PyArrow, and s3fs.

🚧

Version Incompatibility in s3fs and pyarrow

There is currently a compatibility issue between s3fs 0.5 and pyarrow. We recommend installing s3fs 0.4.2 instead.

Follow the "Recipe" below to see the annotated code:
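As a rough illustration of the Pandas path, the sketch below reads a component's records through an S3-style filesystem. It rests on assumptions not stated on this page: that File-Based Access is reachable as an S3-compatible endpoint, that the records are stored as Parquet, and that the path follows a bucket/data service/dataflow/component layout. The helper names `ascend_s3_path` and `read_component_to_pandas`, and every argument value you pass to them, are hypothetical placeholders.

```python
def ascend_s3_path(bucket, data_service, dataflow, component):
    """Join the pieces of a component path (layout is an assumption)."""
    return "/".join([bucket, data_service, dataflow, component])


def read_component_to_pandas(path, access_key, secret_key, endpoint_url):
    """Read the Parquet files under `path` into a Pandas DataFrame."""
    # Imported here so the path helper above can be used without
    # PyArrow and s3fs installed.
    import pyarrow.parquet as pq
    import s3fs

    # Point s3fs at the S3-compatible endpoint instead of AWS S3.
    fs = s3fs.S3FileSystem(
        key=access_key,
        secret=secret_key,
        client_kwargs={"endpoint_url": endpoint_url},
    )

    # Treat the prefix as one Parquet dataset and load it into memory.
    dataset = pq.ParquetDataset(path, filesystem=fs)
    return dataset.read().to_pandas()
```

In a notebook paragraph, a call might look like `df = read_component_to_pandas(ascend_s3_path(...), access_key, secret_key, endpoint_url)`, with the credentials and endpoint taken from your own Ascend environment.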

Reading Data to a PySpark DataFrame

Ensure your development environment is set up with PySpark.

Follow the "Recipe" below to see the annotated code:
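A minimal sketch of the PySpark path follows, under similar assumptions: File-Based Access is exposed as an S3-compatible endpoint, the records are Parquet, and the Hadoop S3A connector (`hadoop-aws`) is available to Spark. The helper names `s3a_options` and `read_component_to_spark` are hypothetical. The configuration keys are standard Hadoop S3A settings, but whether path-style access is needed for Ascend is a guess.

```python
def s3a_options(access_key, secret_key, endpoint):
    """Build Spark settings that point the S3A connector at a custom endpoint."""
    return {
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        # Assumption: many S3-compatible endpoints need path-style requests.
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def read_component_to_spark(path, options):
    """Read the Parquet files under an s3a:// path into a PySpark DataFrame."""
    # Imported here so s3a_options can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("ascend-file-based-access")
    for key, value in options.items():
        builder = builder.config(key, value)

    # getOrCreate() returns the existing session if one is already running.
    spark = builder.getOrCreate()
    return spark.read.parquet(path)
```

Zeppelin's Spark interpreter typically provides a ready-made `spark` session, in which case the settings may need to go into the interpreter configuration instead, since options passed to the builder are not guaranteed to take effect on a session that already exists.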