Bytes Connector Code Properties
When creating a Python Read Connector, you must choose a connector code interface. With the Bytes Connector Code interface, Ascend reads in a stream of bytes.
Prerequisites
- A Custom Python Connection
Required Python Functions
The following table describes the three functions you implement when creating a new Python Read Connector with the Bytes Connector Code interface. Create a Python Read Connector using the information below and these step-by-step instructions.
Function | Use | Description |
---|---|---|
context | Creates the session that your code will work within. Receives a string (the credentials) from the Python Connection. | We recommend completing all session setup in context, e.g. creating the database connection or the HTTP session. User-supplied credentials are only available through this function. |
list_objects | Yields a metadata dictionary for each data fragment (partition), each identified by its name and fingerprint value. | Ascend runs list_objects every time the read connector refreshes and only processes data fragments that either (1) have a name that did not exist in the previous refresh, or (2) have a name that existed in the previous refresh but now has a different fingerprint. Each dictionary has three keys: name, a string that names the partition; fingerprint, a string that uniquely identifies the current state of the partition; and is_prefix, a boolean that indicates whether the partition holds child partitions. |
read_bytes | Processes and scans the raw data for the Read Connector. Generates the rows and columns for one individual data fragment. | read_bytes runs once for every data fragment from list_objects that needs to be processed (e.g. read_bytes runs 10 times if list_objects yielded 10 data fragments that need processing). read_bytes is not run multiple times for the same partition. If read_bytes raises an uncaught exception, the entire read connector errors out. You may yield data in a format other than CSV, as long as the Parser is configured to match. |
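To see how the three functions fit together, here is a minimal skeleton, assuming a hypothetical in-memory source (FAKE_SOURCE) in place of a real API or database. The signatures mirror the Google Sheets example later on this page, and, like that example, read_bytes yields newline-terminated CSV text. The last lines call the functions by hand to imitate one refresh; this is a local approximation, not how Ascend invokes them internally.

```python
import csv
import io
from typing import Any, Dict, Iterator

# Hypothetical stand-in for a real API or database:
# partition name -> (version string, rows).
FAKE_SOURCE = {
    "orders_2024_01": ("v1", [["id", "amount"], ["1", "9.99"], ["2", "4.50"]]),
    "orders_2024_02": ("v3", [["id", "amount"], ["3", "12.00"]]),
}


def context(credentials: str) -> Dict[str, Any]:
    """Set up shared state. A real connector would parse `credentials`
    and open its HTTP session or database connection here."""
    return {"source": FAKE_SOURCE, "credentials": credentials}


def list_objects(context: Dict[str, Any], metadata: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one metadata dictionary per data fragment. The version string
    is used as the fingerprint, so a changed fragment gets a new fingerprint."""
    for name, (version, _rows) in context["source"].items():
        yield {"name": name, "fingerprint": version, "is_prefix": False}


def read_bytes(context: Dict[str, Any], metadata: Dict[str, Any]) -> Iterator[str]:
    """Yield one newline-terminated CSV line per row of a single fragment."""
    _version, rows = context["source"][metadata["name"]]
    for row in rows:
        buf = io.StringIO()
        csv.writer(buf, lineterminator="\n").writerow(row)
        yield buf.getvalue()


# Calling the functions by hand to imitate a single refresh:
ctx = context('{"api_key": "placeholder"}')
for meta in list_objects(ctx, {}):
    print(meta["name"], repr("".join(read_bytes(ctx, meta))))
```

Keeping all client setup in context means list_objects and read_bytes only read from the dictionary it returns, which matches the recommendation in the table.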
If you want new lines to be interpreted within the string of bytes, you must explicitly return new lines with \n.
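The effect of the newline rule can be seen by concatenating a few rows the way a byte stream would, with and without explicit "\n" characters. The rows below are made-up sample data. Note that Python's csv.writer ends rows with "\r\n" by default, so this sketch passes lineterminator="\n" to emit a bare "\n".

```python
import csv
import io

rows = [["name", "score"], ["ada", "10"], ["grace", "9"]]

# Without explicit newlines, the rows run together in the stream.
no_newlines = b"".join(",".join(r).encode("utf-8") for r in rows)

# With an explicit "\n" after each row, a line-based parser can split them.
with_newlines = b"".join((",".join(r) + "\n").encode("utf-8") for r in rows)

# The same result via the csv module, overriding its default "\r\n" terminator.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(rows)
via_csv = buf.getvalue().encode("utf-8")

print(no_newlines)
print(with_newlines)
```

The first stream comes out as the single line b"name,scoreada,10grace,9", which no parser can split back into the original three rows.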
Recursive list_objects
Metadata is a Python dictionary that defines a partition. Metadata is used in both list_objects and read_bytes. To trigger the recursive behavior within list_objects and create partitions that hold child partitions, set is_prefix to True. If list_objects does not yield a previously created partition again, that partition's metadata will be deleted.
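The sketch below shows one way a recursive list_objects could walk a nested folder structure. It rests on an assumption this page does not spell out: that the top-level call receives empty metadata and that each partition yielded with is_prefix=True is passed back in as metadata on a later call. TREE, _lookup, and expand are hypothetical names; expand is only a local test helper that imitates the assumed recursion.

```python
import hashlib
from typing import Any, Dict, Iterator, List, Optional

# Hypothetical folder tree: dict values are folders, str values are file contents.
TREE: Dict[str, Any] = {
    "2024": {
        "01": {"a.csv": "id\n1\n", "b.csv": "id\n2\n"},
        "02": {"c.csv": "id\n3\n"},
    },
    "readme.csv": "id\n0\n",
}


def _lookup(tree: Dict[str, Any], path: str) -> Any:
    """Walk the tree along a '/'-separated path; '' returns the root."""
    node: Any = tree
    for part in filter(None, path.split("/")):
        node = node[part]
    return node


def list_objects(context: Dict[str, Any], metadata: Optional[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    # Assumption: the top-level call passes empty (or None) metadata, and each
    # partition yielded with is_prefix=True comes back as `metadata` later.
    prefix = (metadata or {}).get("name", "")
    for key, child in _lookup(context["tree"], prefix).items():
        name = f"{prefix}/{key}" if prefix else key
        if isinstance(child, dict):
            # Folder: a prefix whose children are listed by a further call.
            # Placeholder fingerprint; this page does not say how prefixes use it.
            yield {"name": name, "fingerprint": name, "is_prefix": True}
        else:
            # File: a leaf partition; its content hash changes when the file does.
            digest = hashlib.sha256(child.encode("utf-8")).hexdigest()
            yield {"name": name, "fingerprint": digest, "is_prefix": False}


def expand(context: Dict[str, Any], metadata: Optional[Dict[str, Any]] = None) -> List[str]:
    """Hypothetical local helper: recurse into prefixes the way the platform
    is assumed to, and return the names of all leaf partitions."""
    leaves: List[str] = []
    for meta in list_objects(context, metadata):
        if meta["is_prefix"]:
            leaves.extend(expand(context, meta))
        else:
            leaves.append(meta["name"])
    return leaves


print(expand({"tree": TREE}))
```

Building each child's name from its parent's name keeps partition names unique across levels, which matters because refreshes compare partitions by name.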
When constructing your Python code, list_objects must return the partition metadata for all the partitions you expect to be in the component.
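The refresh rules on this page (process a partition when its name is new or its fingerprint changed; drop a partition that list_objects no longer yields) can be modeled as a small diff between two listings. plan_refresh is a hypothetical helper for reasoning about your list_objects output, not Ascend's actual implementation, and it assumes that only non-prefix partitions are read.

```python
from typing import Dict, List, Tuple


def plan_refresh(previous: Dict[str, str], current: List[dict]) -> Tuple[List[str], List[str]]:
    """Model of the refresh rules described above (not Ascend's code).

    previous: partition name -> fingerprint from the last refresh
    current:  metadata dictionaries yielded by list_objects on this refresh
    Returns (names read_bytes would run for, names whose metadata is deleted).
    """
    # Assumption: prefixes are only expanded, never read themselves.
    seen = {m["name"]: m["fingerprint"] for m in current if not m["is_prefix"]}
    to_process = [name for name, fp in seen.items() if previous.get(name) != fp]
    to_delete = [name for name in previous if name not in seen]
    return to_process, to_delete


previous = {"jan": "v1", "feb": "v1", "mar": "v1"}
current = [
    {"name": "jan", "fingerprint": "v1", "is_prefix": False},  # unchanged: skipped
    {"name": "feb", "fingerprint": "v2", "is_prefix": False},  # changed: re-read
    {"name": "apr", "fingerprint": "v1", "is_prefix": False},  # new: read
]
# "mar" is not yielded again, so it ends up in the deletion list.
print(plan_refresh(previous, current))
```

The model makes the practical consequence visible: a list_objects that yields only new items (for example, "files added since yesterday") would cause every older partition to be dropped on the next refresh.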
Example Bytes Connector Code
The following code example reads a spreadsheet from Google Sheets.
```python
# This example reads a spreadsheet from Google Sheets to explain the functions to implement
import csv
import io
import json

from google.oauth2 import service_account
from googleapiclient.discovery import build

SPREADSHEET_ID = '1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms'
RANGE_NAME = 'Class Data!A2:E'
SCOPES = [
    'https://www.googleapis.com/auth/spreadsheets.readonly',
    'https://www.googleapis.com/auth/drive.metadata.readonly',
]


def context(credentials):
    """
    Sets up the context for reading and listing data from the data source.
    The Python Connection credentials (a service account JSON string) are passed in here;
    build the API clients once so list_objects and read_bytes can reuse them.
    """
    service_account_info = json.loads(credentials)
    creds = service_account.Credentials.from_service_account_info(service_account_info, scopes=SCOPES)
    g_sheet = build('sheets', 'v4', credentials=creds)
    drive = build('drive', 'v3', credentials=creds)
    return {
        'g_sheet_client': g_sheet,
        'drive_client': drive,
    }


def list_objects(context, metadata):
    """
    This custom read connector processes one Google spreadsheet, identified by SPREADSHEET_ID.
    The file's modifiedTime is used as the fingerprint, so the sheet is re-read only after it changes.
    """
    fingerprint = context['drive_client'].files().get(fileId=SPREADSHEET_ID, fields='modifiedTime').execute()['modifiedTime']
    yield {'name': SPREADSHEET_ID, 'fingerprint': fingerprint, 'is_prefix': False}


def read_bytes(context, metadata):
    """
    Yields the data for one partition: each sheet row as a newline-terminated CSV line.
    """
    sheet = context['g_sheet_client'].spreadsheets()
    result = sheet.values().get(spreadsheetId=metadata['name'], range=RANGE_NAME).execute()
    values = result.get('values') or []
    for row in values:
        strbuf = io.StringIO()
        # lineterminator='\n' ends each row with an explicit newline (csv defaults to '\r\n').
        w = csv.writer(strbuf, lineterminator='\n')
        w.writerow(row)
        yield strbuf.getvalue()
```
Parsers
Some Python Read Connector parsers have additional required or optional properties. For more information, see Read Connector Parsers.