Ascend Developer Hub

Custom Read Connectors

Creating a Custom Read Connector

Prerequisites:

  • Python code that can read data and produce rows and columns
  • Access credentials
  • A data schema for the dataset
  • Business logic for how incremental data should be processed

Select Custom

Click the Custom icon on the Connectors panel

Custom Code and PIP Install

Ascend requires you to implement three Python functions in your custom read connector:

  • context: Creates the session that your code will work within.
  • list_objects: Creates a list of all data fragments, each identified by a fingerprint value.
  • read_bytes: Generates the rows and columns for one individual data fragment.


context function

We recommend doing all session setup here, e.g. creating the database connection, the HTTP session, etc. This is the only function in which user-input credentials are made available.

# context function creates the session for rest of the code to work with.

def context(credential):
    return {'credentials': credential}
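For a slightly fuller sketch, the context can bundle everything the later functions will need. The `api_token` credential key and the base URL below are illustrative assumptions, not part of Ascend's API:

```python
# Hypothetical context: build the pieces of an authenticated session once,
# up front. 'api_token' and the base URL are placeholders for illustration.
def context(credential):
    return {
        'base_url': 'https://api.example.com/v1',
        'headers': {'Authorization': f"Bearer {credential['api_token']}"},
    }
```

Because `list_objects` and `read_bytes` receive this context but not the credentials, anything derived from the credentials must be stashed here.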


list_objects function

Ascend runs the list_objects function every time the read connector refreshes, and only processes data fragments that either:

  1. Have a name that did not exist in the previous refresh.
    -- Or --
  2. Have a name that existed in the previous refresh but now has a different fingerprint.

# list_objects function creates a list of all data fragments identified by the fingerprint value

def list_objects(ctx, metadata):
    if metadata:
        obj = {'name': 'fragment1', 'fingerprint': 'md51', 'is_prefix': False}
        yield obj
        obj = {'name': 'fragment2', 'fingerprint': 'md52', 'is_prefix': False}
        yield obj
    else:
        obj = {'name': 'folder1', 'fingerprint': 'fmd51', 'is_prefix': True}
        yield obj
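In practice, fingerprints are often derived from a content hash or last-modified timestamp, so a fragment is only reprocessed when its data actually changes. The sketch below is hypothetical: the in-memory FILES dict stands in for a real listing call (S3, a REST API, etc.):

```python
import hashlib

# Illustrative stand-in for a real listing call; each entry maps a
# fragment name to its raw content.
FILES = {
    'orders_2024_01.csv': b'id,amount\n1,10\n',
    'orders_2024_02.csv': b'id,amount\n2,20\n',
}

def list_objects(ctx, metadata):
    # Fingerprint each fragment by a content hash so Ascend skips any
    # fragment whose name and fingerprint are unchanged since last refresh.
    for name, content in sorted(FILES.items()):
        yield {
            'name': name,
            'fingerprint': hashlib.md5(content).hexdigest(),
            'is_prefix': False,
        }
```

If a file's bytes change, its MD5 changes, and the fragment falls into case 2 above on the next refresh.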


read_bytes function

  1. The read_bytes function runs once for every data fragment that needs to be processed from the list_objects function (e.g. read_bytes will run 10 times if list_objects yielded 10 data fragments).
  2. The read_bytes function will not be run multiple times for the same fragment. Additionally, if read_bytes throws an uncaught exception, the entire read connector will error out.
  3. It's acceptable to yield data points in a format other than CSV, as long as the Parser is configured properly.

# read_bytes function generates the rows and columns for 1 individual data fragment

def read_bytes(ctx, metadata):
    yield ",".join([metadata['name'], metadata['fingerprint']])
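A more realistic sketch might emit one CSV line per record of the fragment; `fetch_records` below is a hypothetical helper standing in for the actual data read (database query, API call, file download, etc.):

```python
# Hypothetical helper: in a real connector this would fetch the fragment's
# records from the source system using the session stored in ctx.
def fetch_records(ctx, fragment_name):
    return [{'id': 1, 'value': 'a'}, {'id': 2, 'value': 'b'}]

def read_bytes(ctx, metadata):
    # Emit a header row, then one CSV line (as bytes) per record.
    yield b'id,value\n'
    for rec in fetch_records(ctx, metadata['name']):
        yield f"{rec['id']},{rec['value']}\n".encode('utf-8')
```

The yielded bytes are then handed to whichever Parser the connector is configured with (XSV in this sketch).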

Below is a screenshot where the 3 Python functions are implemented as code in the Ascend editor.

Ascend also supports installing any pip packages via the PIP PACKAGES TO INSTALL area as shown below. Please contact Ascend at [email protected] if you want to connect to private packages and libraries.

Test Locally

Testing the code locally can be extremely helpful and can expedite connector development. Please follow the instructions to download the wrapper in order to test the code locally.
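As a first sanity check before wiring up the wrapper, the three functions can be chained by hand in plain Python, in the same order Ascend would call them. This uses the minimal definitions from this page with a placeholder credential dict:

```python
# Minimal local smoke test: call the three functions in sequence.
def context(credential):
    return {'credentials': credential}

def list_objects(ctx, metadata):
    yield {'name': 'fragment1', 'fingerprint': 'md51', 'is_prefix': False}

def read_bytes(ctx, metadata):
    yield ",".join([metadata['name'], metadata['fingerprint']])

ctx = context({'user': 'placeholder'})
for fragment in list_objects(ctx, {'name': ''}):
    for chunk in read_bytes(ctx, fragment):
        print(chunk)  # prints: fragment1,md51
```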

Parsers & Schema

Data formats currently available are: Avro, Grok, JSON, Parquet and XSV. However, you can create your own parser functions or define a UDP (User Defined Parser) to process a file format.

For custom read connectors, you will need to create the schema with column names and data types, as well as specify default values, etc.
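Purely as an illustration of the information a schema captures (this is not Ascend's actual schema format), each column pairs a name with a data type and an optional default:

```python
# Illustrative only: a schema matching the hypothetical CSV fragments above.
# Column order matches the order of fields in each emitted row.
SCHEMA = [
    {'name': 'id', 'type': 'integer', 'default': 0},
    {'name': 'value', 'type': 'string', 'default': ''},
]
```

Whatever format the schema is declared in, it must agree with what read_bytes emits and with the configured Parser.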
