Read Connector

Create New Read Connector

Figure 1

At the top of the page is a highlighted box showing the AWS S3 connection, with an EDIT button that you can use to modify the connection.

CONNECTOR INFO (Figure 1)

  • Name (required): The name to identify this connector with.
  • Description (optional): Description of what data this connector will read.

Connector Configuration

Figure 2

  • BROWSE CONNECTION: Click this button (Figure 3) to explore resources and locate assets to ingest. It opens the S3 bucket in a modal dialog (Figure 2), where you can navigate to the assets you want to import. Select the assets you want to ingest and press confirm.
Figure 3

  • OBJECT PATTERN MATCHING (required): How the object pattern is compared against object names to identify eligible files:
    • Glob: Matches using wildcards such as * (any sequence of characters) and ? (any single character).
    • Match: Matches the pattern exactly, character for character.
    • Prefix: Matches objects whose names begin with the pattern.
    • Regex: Matches objects whose names match the pattern as a regular expression.
  • OBJECT PATTERN (required): The pattern used to identify eligible files. For example, to find all files whose names start with Flights_, use Prefix matching with the pattern Flights_.
  • PARSER: The format of the data to ingest. Currently available formats are JSON, ORC, Parquet, CSV, Text, and Avro.
Figure 4
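The four matching modes can be illustrated with a short Python sketch that decides whether a single S3 object key is eligible. This is an illustration under stated assumptions, not Ascend's implementation: the matches function and its mode names are hypothetical, Glob is approximated with Python's fnmatch module (where * also matches /), and Regex is assumed to require a match against the whole key.

```python
import fnmatch
import re

# Hypothetical sketch: decide whether an S3 object key is eligible under one
# of the four OBJECT PATTERN MATCHING modes. Not Ascend's implementation.


def matches(key: str, pattern: str, mode: str) -> bool:
    if mode == "glob":
        # Shell-style wildcards: * matches any run of characters (including /),
        # ? matches exactly one character. Case-sensitive on every platform.
        return fnmatch.fnmatchcase(key, pattern)
    if mode == "match":
        # Exact, character-for-character comparison.
        return key == pattern
    if mode == "prefix":
        # Eligible when the key starts with the pattern.
        return key.startswith(pattern)
    if mode == "regex":
        # Assumption: the regular expression must match the entire key.
        return re.fullmatch(pattern, key) is not None
    raise ValueError(f"unknown mode: {mode!r}")


keys = ["Flights_2019.csv", "Flights_2020.csv", "airports.csv"]
print([k for k in keys if matches(k, "Flights_", "prefix")])
# ['Flights_2019.csv', 'Flights_2020.csv']
```

Applied to the Flights_ example, Prefix matching keeps the two flight files and skips airports.csv, while Glob with the pattern *.csv would select all three.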

The following table lists the parser-specific configurations and whether each one is required, optional, or not applicable (N/A) for each data format.

| CONFIGURATIONS | Description | CSV | JSON | Parquet | ORC | Text | Avro |
|---|---|---|---|---|---|---|---|
| BUCKET | The S3 bucket to ingest assets from. | Required | Required | Required | Required | Required | Required |
| Enable Multiline values | In most CSV files, each record stands on its own line, with fields separated by commas. Some fields, however, span multiple lines (for example, song lyrics); fields that contain line breaks are enclosed in double quotes. This checkbox tells the parser that the ingested data may contain multi-line fields. | Optional | N/A | N/A | N/A | N/A | N/A |
| TIMESTAMP FORMAT | Format of timestamp-type fields in the file. | Optional | Optional | N/A | N/A | N/A | N/A |
| FIELD DELIMITER | Field delimiter to use if it is not a comma. | Optional | N/A | N/A | N/A | N/A | N/A |
| QUOTE CHARACTER | A single character used to quote values in which the separator can appear as part of the value. To turn off quoting, set it to an empty string, not null. | Optional | N/A | N/A | N/A | N/A | N/A |
| ESCAPE CHARACTER | A single character used for escaping quotes inside an already quoted value. | Optional | N/A | N/A | N/A | N/A | N/A |
| COMMENT PREFIX | A single character that marks lines to skip: lines beginning with this character are ignored. Disabled by default. | Optional | N/A | N/A | N/A | N/A | N/A |
| Files have header row | Tells the parser that the first line contains headers to be used as column names in the schema. | Optional | N/A | N/A | N/A | N/A | N/A |
| DATE FORMAT | Format of date-type fields in the file. | Optional | Optional | N/A | N/A | N/A | N/A |
| PATH DELIMITER | Delimiter to use, for example a newline (\n) for a line-delimited file. | Optional | Optional | Optional | Optional | Optional | Optional |
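Most of the CSV settings above have direct counterparts in general-purpose CSV readers. The sketch below uses Python's standard csv module to show their effect on a small sample; the read_csv helper, its parameter names, and the sample data are hypothetical, and Ascend's parser may treat edge cases differently. Python's reader always accepts line breaks inside quoted fields, so the sample simply shows the kind of data that Enable Multiline values is meant for. COMMENT PREFIX, DATE FORMAT, and TIMESTAMP FORMAT are not covered.

```python
import csv
import io

# Hypothetical sample: semicolon-delimited, with a header row, a quoted
# multi-line field, and quotes escaped with a backslash inside a quoted value.
RAW = (
    "title;artist;lyrics\n"
    'Song A;Band X;"first line\nsecond line"\n'
    'Song B;Band Y;"she said \\"hi\\""\n'
)


def read_csv(text, field_delimiter=",", quote_char='"', escape_char=None,
             has_header=True):
    """Parse CSV text with options analogous to the connector's CSV settings.

    Illustration built on Python's csv module, not Ascend's parser.
    """
    if quote_char == "":
        # Like the QUOTE CHARACTER setting: an empty string turns quoting off,
        # so quote marks are kept as ordinary characters.
        quoting, quote_char = csv.QUOTE_NONE, None
    else:
        quoting = csv.QUOTE_MINIMAL
    reader = csv.reader(
        io.StringIO(text),
        delimiter=field_delimiter,  # FIELD DELIMITER
        quotechar=quote_char,       # QUOTE CHARACTER
        quoting=quoting,
        escapechar=escape_char,     # ESCAPE CHARACTER
    )
    rows = list(reader)
    if not has_header:              # Files have header row
        return rows
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


records = read_csv(RAW, field_delimiter=";", escape_char="\\")
print(records[0]["lyrics"])  # prints the two lines of the multi-line field
print(records[1]["lyrics"])  # she said "hi"
```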

Figure 5

Generate Schema

When you click the GENERATE SCHEMA button, the parser generates a schema and populates a data preview, as shown in Figure 6.

  • Add schema column: Adds a custom column to the generated schema.
Figure 6
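The exact inference rules behind GENERATE SCHEMA are not described here. As a rough illustration only, the following Python sketch infers column types from a header and a few sample rows by picking the narrowest type that every non-empty value fits; the function names, the sample data, and the integer/double/string type names are hypothetical.

```python
# Hypothetical sketch of schema inference from a header and sample rows:
# each column gets the narrowest type that every non-empty value fits.
# Ascend's GENERATE SCHEMA logic and type names may differ.


def infer_type(values):
    non_empty = [v for v in values if v != ""]  # treat empty strings as nulls
    if not non_empty:
        return "string"
    for type_name, cast in (("integer", int), ("double", float)):
        try:
            for v in non_empty:
                cast(v)
            return type_name
        except ValueError:
            continue  # some value did not fit; try the next, wider type
    return "string"


def infer_schema(header, rows):
    # zip(*rows) turns the list of rows into one tuple of values per column.
    columns = list(zip(*rows)) if rows else [() for _ in header]
    return {name: infer_type(col) for name, col in zip(header, columns)}


schema = infer_schema(["id", "price", "carrier"],
                      [["1", "2.5", "AA"], ["2", "3", "UA"]])
# Counterpart of "Add schema column": append a custom column by hand.
schema["notes"] = "string"
print(schema)
# {'id': 'integer', 'price': 'double', 'carrier': 'string', 'notes': 'string'}
```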

Component Pausing

Figure 7

Set the status of the read connector to Running to make it active, or to Paused to stop it from running.

Refresh Schedule

The refresh schedule specifies how often Ascend checks the data location to see if there's new data. Ascend will automatically kick off the corresponding big data jobs once new or updated data is discovered.

Figure 8
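Conceptually, each scheduled check compares what is in the data location now with what was there at the previous check. The Python sketch below shows that idea with plain dictionaries mapping object keys to a version marker such as an S3 ETag; changed_objects, poll, and the injected list_objects and start_job callables are hypothetical stand-ins, and how Ascend actually detects new data is not documented here. In a real setup, list_objects might wrap a paginated S3 ListObjectsV2 call.

```python
import time
from typing import Callable, Dict, List

# Hypothetical sketch of a refresh loop: on each check, list the objects
# (key -> version marker such as an ETag) and start a job for anything new or
# changed since the previous check. Not Ascend's implementation.


def changed_objects(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    # A key counts as changed if it is new or its version marker differs.
    return sorted(k for k, tag in current.items() if previous.get(k) != tag)


def poll(list_objects: Callable[[], Dict[str, str]],
         start_job: Callable[[List[str]], None],
         interval_s: float, checks: int) -> None:
    seen: Dict[str, str] = {}
    for i in range(checks):
        current = list_objects()
        changed = changed_objects(seen, current)
        if changed:
            start_job(changed)  # kick off processing for the new/updated data
        seen = current
        if i < checks - 1:
            time.sleep(interval_s)  # wait for the next scheduled check


snapshots = iter([
    {"Flights_2019.csv": "v1"},
    {"Flights_2019.csv": "v1", "Flights_2020.csv": "v1"},
    {"Flights_2019.csv": "v2", "Flights_2020.csv": "v1"},
])
poll(lambda: next(snapshots), lambda keys: print("new or updated:", keys), 0, 3)
```

Deleted objects are ignored in this sketch; only additions and changes trigger work.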

Processing Priority (optional)

When resources are constrained, Processing Priority will be used to determine which components to schedule first.

Figure 9

Higher priority numbers are scheduled before lower ones. Increasing the priority on a component also causes all its upstream components to be prioritized higher. Negative priorities can be used to postpone work until excess capacity becomes available.
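The two rules above, higher numbers first and priority flowing to upstream components, can be sketched in Python as follows. The function names, the downstream mapping, and the slot-based notion of capacity are assumptions made for illustration, not Ascend's scheduler.

```python
import heapq
from typing import Dict, List

# Hypothetical sketch of Processing Priority: a component's effective priority
# is the highest of its own priority and that of anything downstream of it,
# so raising a component's priority also raises all of its upstream
# components. Assumes the pipeline is acyclic. Not Ascend's scheduler.


def effective_priorities(priority: Dict[str, int],
                         downstream: Dict[str, List[str]]) -> Dict[str, int]:
    memo: Dict[str, int] = {}

    def eff(c: str) -> int:
        if c not in memo:
            memo[c] = max([priority[c]] + [eff(d) for d in downstream.get(c, [])])
        return memo[c]

    return {c: eff(c) for c in priority}


def schedule(ready: List[str], eff: Dict[str, int], slots: int) -> List[str]:
    # Highest effective priority first; lower-priority (including negative)
    # work only gets the slots left over after higher-priority components.
    return heapq.nlargest(slots, ready, key=lambda c: eff[c])


priority = {"read": 0, "transform": 0, "report": 5, "backfill": -1}
downstream = {"read": ["transform"], "transform": ["report"]}
eff = effective_priorities(priority, downstream)
print(eff)                                     # read and transform inherit 5
print(schedule(["backfill", "read"], eff, 1))  # ['read']
```

With a single free slot, the negative-priority backfill waits; it runs only once there is spare capacity.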
