Bytes Connector Code Properties

When creating a Python Read Connector, you must choose a connector code interface. With the Bytes Connector Code interface, Ascend reads in a stream of bytes.

Prerequisites

  • A Custom Python Connection

Required Python Functions

The following table describes the functions available when creating a new Python Read Connector using the Bytes Connector Code interface. Create a Python Read Connector using the information below and these step-by-step instructions.

Function: context
Use: Creates the session that your code will work within. Passes in a string from the Python Connection.
Description: We recommend completing the session setup in context, for example creating the database connection or the HTTP session. User-input credentials are only available through this function.

Function: list_objects
Use: Creates a list of all data fragments, each identified by a fingerprint value.
Description: Ascend runs the list_objects function every time the Read Connector refreshes and only processes data fragments that either:
- Have a name that did not exist in the previous refresh, OR
- Have a name that existed in the previous refresh but have a different fingerprint.

Each dictionary that list_objects yields has three keys:
- name - A string value associated with the name of each partition.
- fingerprint - A uniquely identifiable string associated with each partition.
- is_prefix - A boolean that represents whether or not the current partition holds any child partitions.

Function: read_bytes
Use: Processes and scans the raw data for a Read Connector. Generates the rows and columns for one individual data fragment.
Description: The read_bytes function runs once for every data partition from list_objects that needs to be processed (for example, read_bytes runs 10 times if list_objects yielded 10 data fragments).

The read_bytes function is not run multiple times for the same partition. Additionally, if read_bytes raises an uncaught exception, the entire Read Connector errors out.

It's acceptable to yield data in a format other than CSV, as long as the Parser is configured properly.
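The refresh rule for list_objects can be illustrated with a small standalone sketch. This is not Ascend's implementation; it only mirrors the rule as described, assuming the second condition means that the fingerprint differs from the one seen in the previous refresh. The helper name fragments_to_process and the sample file names are hypothetical.

```python
def fragments_to_process(previous, current):
    """Return the names of fragments that are new or whose fingerprint changed.

    `previous` and `current` are lists of dicts shaped like the ones that
    list_objects yields ('name', 'fingerprint', 'is_prefix').
    """
    # Map each previously seen name to its fingerprint.
    seen = {obj['name']: obj['fingerprint'] for obj in previous}
    return [
        obj['name']
        for obj in current
        if obj['name'] not in seen or seen[obj['name']] != obj['fingerprint']
    ]


previous = [
    {'name': 'a.csv', 'fingerprint': 'v1', 'is_prefix': False},
    {'name': 'b.csv', 'fingerprint': 'v1', 'is_prefix': False},
]
current = [
    {'name': 'a.csv', 'fingerprint': 'v1', 'is_prefix': False},  # unchanged
    {'name': 'b.csv', 'fingerprint': 'v2', 'is_prefix': False},  # fingerprint changed
    {'name': 'c.csv', 'fingerprint': 'v1', 'is_prefix': False},  # new name
]
print(fragments_to_process(previous, current))  # ['b.csv', 'c.csv']
```

In practice this means a fingerprint should change whenever the underlying data changes (a last-modified timestamp is one common choice), otherwise the fragment is skipped on refresh.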

👍

If you want line breaks to be interpreted within the stream of bytes, you must explicitly emit them with \n.
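As a rough illustration of yielding a non-CSV format with explicit line breaks, the sketch below emits newline-delimited JSON from read_bytes. It assumes the connector's Parser is configured for that format, that yielding strings is acceptable (as in the Google Sheets example on this page), and that the context holds a hypothetical 'records' dictionary keyed by fragment name.

```python
import json


def read_bytes(context, metadata):
    """Yield one JSON document per line for a single data fragment."""
    # Hypothetical: the context holds already-fetched records keyed by fragment name.
    for record in context['records'][metadata['name']]:
        # json.dumps does not append a line break, so add '\n' explicitly.
        yield json.dumps(record) + '\n'


# Local try-out with made-up data; Ascend would supply context and metadata itself.
context = {'records': {'users': [{'id': 1, 'name': 'Ada'}, {'id': 2, 'name': 'Lin'}]}}
chunks = list(read_bytes(context, {'name': 'users'}))
print(''.join(chunks), end='')
```

Without the trailing '\n', consecutive records would run together on one line and a line-oriented parser could not separate them.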

Recursive list_objects

Metadata is a Python dictionary that defines a partition. Metadata is used in both list_objects and read_bytes. To trigger the recursive behavior within list_objects and create partitions, set is_prefix to True. If a previously created partition is not returned when list_objects runs again, all previous partition metadata will be deleted.

🚧

When constructing your Python code, list_objects must return the partition metadata for all the partitions you expect to be in the component.
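A minimal sketch of a recursive list_objects follows. It rests on two assumptions that this page does not confirm: that the top-level call receives metadata=None, and that each entry yielded with is_prefix set to True is passed back to list_objects as metadata. The FOLDERS dictionary and the walk helper are hypothetical stand-ins for a real source and for the platform's own recursion.

```python
# Hypothetical in-memory "source": folder name -> {file name: last-modified stamp}.
FOLDERS = {
    '2023-01': {'2023-01/a.csv': 't1', '2023-01/b.csv': 't2'},
    '2023-02': {'2023-02/c.csv': 't3'},
}


def list_objects(context, metadata):
    if metadata is None:
        # Top level (assumed to be signalled by metadata=None): one prefix per folder.
        for folder in sorted(FOLDERS):
            yield {'name': folder, 'fingerprint': folder, 'is_prefix': True}
    else:
        # Called again for a prefix: yield the partitions under it.
        for name, stamp in sorted(FOLDERS[metadata['name']].items()):
            yield {'name': name, 'fingerprint': stamp, 'is_prefix': False}


def walk(context, metadata=None):
    """Stand-in for the platform's recursion, only for trying the sketch locally."""
    for obj in list_objects(context, metadata):
        if obj['is_prefix']:
            yield from walk(context, obj)
        else:
            yield obj


partitions = list(walk(context={}))
print([p['name'] for p in partitions])
# ['2023-01/a.csv', '2023-01/b.csv', '2023-02/c.csv']
```

Note that the sketch yields every folder and every file on each run, in line with the requirement that list_objects return metadata for all partitions expected in the component.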

Example Bytes Connector Code

The following code example shows how to read a spreadsheet from Google Sheets.

# This example reads a spreadsheet from Google Sheets to explain the functions to implement

import csv
import io
import json

from google.oauth2 import service_account
from googleapiclient.discovery import build

SPREADSHEET_ID = '1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms'
RANGE_NAME = 'Class Data!A2:E'

SCOPES = ['https://www.googleapis.com/auth/spreadsheets.readonly', 'https://www.googleapis.com/auth/drive.metadata.readonly']


def context(credentials):
  """
  Sets up the context for reading and listing data from data source.
  This is where the Python Connection information will be passed through.
  Avoid opening a database connection.
  """
  service_account_info = json.loads(credentials)

  creds = service_account.Credentials.from_service_account_info(service_account_info, scopes=SCOPES)
  g_sheet = build('sheets', 'v4', credentials=creds)
  drive = build('drive', 'v3', credentials=creds)

  return {
      'g_sheet_client': g_sheet,
      'drive_client': drive,
  }


def list_objects(context, metadata):
  """
    This custom read connector processes 1 google spreadsheet with the ID listed above.
    """
  fingerprint = context['drive_client'].files().get(fileId=SPREADSHEET_ID, fields='modifiedTime').execute()['modifiedTime']

  yield {'name': SPREADSHEET_ID, 'fingerprint': fingerprint, 'is_prefix': False}


def read_bytes(context, metadata):
  """
    Returns a byte stream that represents all data to return for a given ReadConnector configuration
    """
  sheet = context['g_sheet_client'].spreadsheets()
  result = sheet.values().get(spreadsheetId=metadata['name'], range=RANGE_NAME).execute()
  values = result.get('values') or []

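  for row in values:
    strbuf = io.StringIO()
    # Use '\n' as the line terminator so each yielded row ends with exactly one newline.
    w = csv.writer(strbuf, lineterminator='\n')
    w.writerow(row)
    yield strbuf.getvalue()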

Parsers

Some Python Read Connector parsers have additional required or optional properties. For more information, see Read Connector Parsers.