GCS Read Connector

Create a New Read Connector

After you have set up an Ascend Read Connection, continue by setting up the Read Connector in your dataflow.


Figure 1

  • NAME: Read connector name to identify the specific data being collected.
  • DESCRIPTION: Additional information about the connector.

Figure 2

  • BROWSE CONNECTION: Click this button to explore resources and locate assets to ingest. This gives you access to the GCS bucket, where you can navigate to the desired resource. Select the assets you want to ingest and press CONFIRM.

Figure 3

  • OBJECT PATTERN MATCHING: The matching strategy used to identify eligible files.
    • Glob: Matches object names against a glob pattern, using wildcards such as * and ?.
    • Match: Matches the pattern precisely, character for character.
    • Prefix: Matches objects whose names begin with the specified prefix.
    • Regex: Matches object names against a regular expression.
  • OBJECT PATTERN: The pattern used to identify eligible files.
  • PARSER: Data formats currently available are JSON, ORC, Parquet, CSV, Text, and Avro.
  • BUCKET: The GCS bucket the resource comes from.
  • PATH DELIMITER: The character that separates path segments in object names.
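
The four OBJECT PATTERN MATCHING modes can be illustrated with a minimal Python sketch. This is not Ascend's implementation: the `matches` and `eligible` helpers and the object names are hypothetical, and the exact semantics of the connector (for example, whether a glob `*` crosses the path delimiter, or whether a regex must match the whole name) may differ from what is assumed here.

```python
import fnmatch
import re

def matches(mode: str, pattern: str, object_name: str) -> bool:
    """Return True if object_name is eligible under the given mode (illustrative only)."""
    if mode == "glob":
        # Shell-style wildcards (*, ?, [seq]). Note: fnmatch's * also matches "/",
        # which may not be how the connector treats the path delimiter.
        return fnmatch.fnmatchcase(object_name, pattern)
    if mode == "match":
        # Exact, character-for-character comparison.
        return object_name == pattern
    if mode == "prefix":
        return object_name.startswith(pattern)
    if mode == "regex":
        # Assumption: the expression must match the entire object name.
        return re.fullmatch(pattern, object_name) is not None
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical object names in a bucket.
objects = [
    "sales/2021/jan.csv",
    "sales/2021/feb.csv",
    "sales/2022/jan.json",
    "logs/app.txt",
]

def eligible(mode: str, pattern: str) -> list:
    """List the objects that the given mode and pattern would select."""
    return [name for name in objects if matches(mode, pattern, name)]

print(eligible("glob", "sales/*/*.csv"))
print(eligible("match", "logs/app.txt"))
print(eligible("prefix", "sales/2021/"))
print(eligible("regex", r"sales/\d{4}/jan\..*"))
```

Prefix is the simplest choice when all files share a folder; glob or regex is useful when eligible files are spread across folders or must be filtered by extension.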

The following table describes the parser-specific configurations.

| Configuration | Description | CSV | JSON | Parquet | ORC | Text | Avro |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Enable Multiline values | In most CSV files, each record stands on its own line, with each field separated by a comma. Some files contain multi-line fields, for example the lyrics of a song; fields that include line breaks are enclosed in double quotes. This checkbox tells the parser that the ingested data can contain multi-line fields. | Optional | N/A | N/A | N/A | N/A | N/A |
| TIMESTAMP FORMAT | Format of timestamp-type fields in the file. | Optional | Optional | N/A | N/A | N/A | N/A |
| FIELD DELIMITER | Field delimiter to use if it is not a comma. | Optional | N/A | N/A | N/A | N/A | N/A |
| QUOTE CHARACTER | A single character used to quote values in which the separator can be part of the value. To turn off quoting, set this to an empty string rather than null. | Optional | N/A | N/A | N/A | N/A | N/A |
| ESCAPE CHARACTER | A single character used for escaping quotes inside an already quoted value. | Optional | N/A | N/A | N/A | N/A | N/A |
| COMMENT PREFIX | A single character; lines beginning with this character are skipped. Disabled by default. | Optional | N/A | N/A | N/A | N/A | N/A |
| Files have header row | Tells the parser that the first line contains the headers, which are used as column names in the schema. | Optional | N/A | N/A | N/A | N/A | N/A |
| DATE FORMAT | Format of date-type fields in the file. | Optional | Optional | N/A | N/A | N/A | N/A |
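
To make the CSV options more concrete, here is a small sketch that uses Python's standard `csv` module, not Ascend's parser. The `parse_csv` helper and the sample data are hypothetical; the sketch only shows how a field delimiter, quote character, escape character, comment prefix, header row, and a multi-line value interact in a typical CSV reader.

```python
import csv
import io

def parse_csv(text, delimiter=",", quote='"', escape=None,
              comment_prefix=None, has_header=True):
    """Parse CSV text into a list of dicts (illustrative helper, not Ascend's parser)."""
    lines = io.StringIO(text, newline="")
    if comment_prefix:
        # Simplification: drops any physical line starting with the prefix,
        # including a continuation line inside a quoted multi-line value.
        lines = (line for line in lines if not line.startswith(comment_prefix))
    reader = csv.reader(
        lines,
        delimiter=delimiter,
        quotechar=quote,
        escapechar=escape,
        doublequote=escape is None,  # with an escape character, quotes are escaped, not doubled
    )
    records = list(reader)
    if has_header:
        header, records = records[0], records[1:]
    else:
        header = [f"col{i}" for i in range(len(records[0]))]
    return [dict(zip(header, record)) for record in records]

# Semicolon-delimited sample with single-quote quoting, a backslash escape,
# a comment line, a header row, and one value that spans two lines.
raw = (
    "# exported by a hypothetical job\n"
    "title;lyrics\n"
    "'Song A';'first line\nsecond line'\n"
    "'Song B';'it\\'s short'\n"
)

rows = parse_csv(raw, delimiter=";", quote="'", escape="\\", comment_prefix="#")
for row in rows:
    print(row)
```

Because the quoted value for Song A contains a line break, the reader keeps it as a single field, which is the situation the Enable Multiline values checkbox describes.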

Generate Schema

When you click the GENERATE SCHEMA button, the parser creates a schema and populates a data preview, as shown in Figure 4.

  • Add schema column: Add a custom column to the generated schema.

Figure 4

Refresh Schedule

The refresh schedule specifies how often Ascend checks the data location to see if there's new data. Ascend will automatically kick off the corresponding big data jobs once new or updated data is discovered.


Figure 5
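
Conceptually, each refresh compares what is currently at the data location with what was seen on the previous check. The sketch below is one possible way to express that comparison and is not how Ascend is known to implement it: the `changed_objects` helper is hypothetical, and the fingerprints stand in for whatever signal (such as an update time or checksum) identifies a changed object.

```python
def changed_objects(previous: dict, current: dict) -> list:
    """Return names of objects that are new or whose fingerprint changed.

    Both arguments map object name -> fingerprint (hypothetical values).
    """
    return sorted(
        name
        for name, fingerprint in current.items()
        if previous.get(name) != fingerprint
    )

# Listing from the previous refresh and from the current one (made-up data).
previous_listing = {"sales/jan.csv": "v1", "sales/feb.csv": "v1"}
current_listing = {"sales/jan.csv": "v1", "sales/feb.csv": "v2", "sales/mar.csv": "v1"}

to_process = changed_objects(previous_listing, current_listing)
print(to_process)
```

Only the updated and newly added objects are returned, so unchanged data would not need to be reprocessed.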

Processing Priority (optional)


Figure 6

  • Assigned Priority: Determines which components are scheduled first. Components with higher priority numbers are scheduled before those with lower ones. Increasing the priority of a component also causes all of its upstream components to be prioritized higher. Negative priorities can be used to postpone work until excess capacity becomes available.
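
The upstream behavior can be pictured as each component taking the highest priority found among itself and everything downstream of it. The following sketch is one interpretation of that rule, not Ascend's scheduler: the `effective_priorities` helper, the component names, and the default priority of 0 are all assumptions made for illustration.

```python
def effective_priorities(assigned: dict, downstream: dict) -> dict:
    """Compute an illustrative effective priority for every component.

    assigned:   component -> assigned priority (unlisted components default to 0, an assumption)
    downstream: component -> list of components that directly consume its output
    """
    cache = {}

    def resolve(component):
        if component not in cache:
            own = assigned.get(component, 0)
            inherited = [resolve(child) for child in downstream.get(component, [])]
            # A component is at least as urgent as anything that depends on it.
            cache[component] = max([own, *inherited])
        return cache[component]

    for component in set(assigned) | set(downstream):
        resolve(component)
    return cache

# Hypothetical dataflow: read_gcs feeds clean and archive; clean feeds report.
downstream = {"read_gcs": ["clean", "archive"], "clean": ["report"]}
assigned = {"report": 10, "archive": -5}

priorities = effective_priorities(assigned, downstream)
print(priorities)
```

In this example, raising the priority of `report` also raises `clean` and `read_gcs`, while `archive` keeps its negative priority and would wait for spare capacity.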