After you have set up an Ascend Read Connection, the next step is to configure the Read Connector in your dataflow.
At the top is a highlighted box showing the new Kafka connection, with an EDIT button you can use to modify the connection.
CONNECTOR INFO (Figure 1)
- Name (required): The name used to identify this connector.
- Description (optional): Description of what data this connector will read.
BROWSE CONNECTION: Click this button (Figure 2) to explore resources and locate assets to ingest. A modal dialog lists the available Kafka topics; select the topic you want to ingest and press CONFIRM.
Starting Position: You can choose to read all messages from the beginning (Earliest) or only messages arriving after the last one (Latest).
Consumer Configs: Override the connection-level consumer configs for this specific connector.
- Single Topic: The topic to subscribe to.
- Multi-Topic: Provide a topic pattern to subscribe to all topics matching the regex.
Because the consumer group is shared at the Kafka connection level, two Read Connectors on the same connection should not consume from the same topics; otherwise they will commit offsets over each other.
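As a rough illustration of the settings above (not Ascend's actual implementation), the sketch below shows how they correspond to standard Apache Kafka consumer properties, and how a topic pattern is just a regex matched against topic names. The group id, pattern, and topic names are made-up placeholders:

```python
import re

# Hypothetical mapping of the connector settings onto standard Kafka
# consumer config keys (the values below are placeholders).
consumer_config = {
    "group.id": "ascend-connection-group",  # shared at the connection level
    "auto.offset.reset": "earliest",        # Starting Position: Earliest ("latest" = Latest)
    "enable.auto.commit": True,             # committed offsets are shared within the group
}

# Multi-Topic: a topic pattern is an ordinary regex; every matching topic
# is subscribed to.
topic_pattern = re.compile(r"^orders\..*")  # made-up pattern
available_topics = ["orders.us", "orders.eu", "payments.us"]
subscribed = [t for t in available_topics if topic_pattern.match(t)]
print(subscribed)  # the two "orders." topics match
```

Because all connectors on one connection share `group.id`, two of them consuming the same topic would look like one consumer group to the broker, which is why their offset commits would collide.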
- Key/Value Deserializer: A producer uses serializers to convert each message into a byte array before transmitting it to the broker; the consumer uses deserializers to convert the byte array back into an object.
- String Deserializer: Convert a byte array containing a string back to a string.
- Binary Deserializer (B64 Encoded): Convert a byte array containing a non-string format (such as Avro) to binary and then base64-encode it.
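The two deserialization modes can be sketched in a few lines of plain Python (the sample bytes are made up; this only illustrates the encoding behavior, not Ascend's internals):

```python
import base64

# Bytes carrying a UTF-8 string vs. bytes in an opaque binary format (e.g. Avro).
raw_string_bytes = "hello".encode("utf-8")
raw_binary_bytes = bytes([0x00, 0xFF, 0x10])

# String Deserializer: decode the byte array directly to a string.
as_string = raw_string_bytes.decode("utf-8")

# Binary Deserializer (B64 Encoded): keep the bytes opaque and base64-encode
# them so they can be carried in a string column and decoded downstream.
as_b64 = base64.b64encode(raw_binary_bytes).decode("ascii")
```

Base64 is lossless, so a downstream transform can recover the original bytes with `base64.b64decode(as_b64)`.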
Once you click the GENERATE SCHEMA button, the parser creates a schema and populates a data preview (Figure 3).
NOTE: Unlike some other connector types you may be familiar with, every Kafka connector has the same schema. Visit this for more information on the schema.
- Add schema column: Add a custom column to the generated schema
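For intuition only, a fixed, connector-independent schema of this kind typically exposes the standard Kafka record fields plus the deserialized key and value. The column names below are assumptions for illustration, not Ascend's documented schema (see the link above for the real one):

```python
# ASSUMED illustration of a fixed Kafka-record schema; the actual column
# names and types are defined by Ascend's schema documentation.
kafka_like_schema = [
    ("topic", "string"),       # topic the record was read from
    ("partition", "int"),      # partition within the topic
    ("offset", "long"),        # record offset within the partition
    ("timestamp", "timestamp"),
    ("key", "string"),         # per the chosen key deserializer
    ("value", "string"),       # per the chosen value deserializer
]
column_names = [name for name, _ in kafka_like_schema]
```

A custom "Add schema column" entry would simply append another `(name, type)` pair to a schema like this.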
When resources are constrained, Processing Priority will be used to determine which components to schedule first.
Update the status of the Read Connector by marking it Running to make it active, or Paused to stop it from running.
The refresh schedule specifies how often Ascend checks the data location to see if there's new data. Ascend will automatically kick off the corresponding big data jobs once new or updated data is discovered.
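Conceptually, each refresh tick boils down to a check like the one below (a sketch, not Ascend's implementation; the offset-based comparison is an assumption that fits a Kafka source):

```python
# Conceptual sketch: on each scheduled refresh, compare the source's latest
# position with the last ingested one and trigger a job only if there is
# new data to process.
def needs_refresh(last_ingested_offset: int, latest_source_offset: int) -> bool:
    """Return True when new records exist beyond what was already ingested."""
    return latest_source_offset > last_ingested_offset

print(needs_refresh(10, 12))  # new records exist
print(needs_refresh(10, 10))  # nothing new; no job is started
```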
Higher priority numbers are scheduled before lower ones. Increasing the priority on a component also causes all its upstream components to be prioritized higher. Negative priorities can be used to postpone work until excess capacity becomes available.
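The two priority rules above (higher numbers first, and upstream components inheriting their downstream's priority) can be sketched as follows; this is a conceptual illustration, not Ascend's scheduler:

```python
# Conceptual sketch of Processing Priority: higher numbers are scheduled
# first, and raising a component's priority also raises all its upstreams,
# so an upstream is never scheduled later than its downstream needs it.
def schedule_order(priorities: dict[str, int],
                   upstreams: dict[str, list[str]]) -> list[str]:
    effective = dict(priorities)
    changed = True
    while changed:  # propagate priorities upstream until stable
        changed = False
        for comp, ups in upstreams.items():
            for up in ups:
                if effective[up] < effective[comp]:
                    effective[up] = effective[comp]
                    changed = True
    # Sort descending: negative priorities naturally fall to the back,
    # i.e. they wait until excess capacity is available.
    return sorted(effective, key=lambda c: -effective[c])

# "transform" (priority 5) reads from "read" (priority 0), so "read"
# inherits priority 5 and is scheduled ahead of lower-priority work.
order = schedule_order({"read": 0, "transform": 5, "other": 1},
                       {"transform": ["read"]})
print(order)
```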