Pandas DataFrame Properties

When creating a Python Read Connector, you must choose a connector code interface. With the Pandas DataFrame interface, Ascend reads in Pandas DataFrames.

Prerequisites

  • A Custom Python Connection

Required Python Functions

The following table describes the functions available when creating a new Python Read Connector utilizing Pandas DataFrame interface. Create a Python Read Connector using the information below and these step-by-step instructions.

FunctionUseDescription
contextCreates the session that your code will work within. Passes in a string from the Python Connection.We recommend completing the session setup with context, e.g. create the database connection, the HTTP session, etc. User input credentials are only available through this function.
list_objectsCreates a list of all data fragments identified by the fingerprint value.Ascend runs the list_objects function every time the read connector refreshes and only processes data fragments that either:
- Have a name that does not already exist in the previous refresh, or
- Have a name that exists in the previous refresh but has a fingerprint.

Each dictionary has three key values:
name - A string value associated with the name of each partition

fingerprint - A uniquely identifiable string associated with each partition

is_prefix - A boolean that represents whether or not the current partition holds any child partitions
read_pandas_dataframeReads the data and returns a Pandas DataFrame.Pandas DataFrames are loaded into the Python processing memory.

🚧

Out of Memory Exception

Because Pandas DataFrames are loaded into processing memory, large amounts of data can result in an out of memory exception.

Recursive list_objects

Metadata is a Python dictionary that defines a partition. Metadata is used in both list_objects and read_bytes. To trigger the recursive behavior within in list_objects and create partitions, set is_prefix to True. If a previously created partition is not recalled when generating list_objects, all previous partition metadata will be deleted.

🚧

When constructing your Python code, list_objects must return the partition metadata for all the partitions you expect to be in the component.

Example Pandas DataFrame Code

The following code example describes reading a spreadsheet for Google Sheets.

# This example reads a spreadsheet from Google Sheets to explain the functions to implement

import pandas as pd
from typing import Dict, Any, Iterator


def context(credentials) -> Dict[str, Any]:
   """
     Sets up the context for reading and listing data from data source.
     This is where the Python Connection information will be passed through. 
     Avoid opening a database connection. 
     """
  service_account_info = json.loads(credentials)

  creds = service_account.Credentials.from_service_account_info(service_account_info, scopes=SCOPES)
  g_sheet = build('sheets', 'v4', credentials=creds)
  drive = build('drive', 'v3', credentials=creds)

  return {
      'g_sheet_client': g_sheet,
      'drive_client': drive,
  }

def list_objects(context: Dict[str, Any], metadata) -> Iterator[Dict[str, Any]]:

  yield {'name': 'example_id', 'fingerprint': 'fingerprint', 'is_prefix': False}


def read_pandas_dataframe(context: Dict[str, Any], metadata) -> pd.DataFrame:
  """
    # Returns a Pandas DataFrame.
    """
  data = [['Scott', 50], ['Jeff', 45], ['Thomas', 54], ['Ann', 34]]
  # Create the pandas DataFrame
  return pd.DataFrame(data, columns=['Name', 'Age'])

Parsers

Ascend natively parses Pandas DataFrames and does not require additional parser configurations.