5.11.2023 Release Notes

These are the release notes for May 11, 2023.

:rocket: FEATURES :rocket:

  • Gen2 Environments will now scale down to zero compute nodes when Ascend Clusters are inactive.
    • Cluster pools must be configured to allow scaling to zero. A cluster is considered inactive, and can be terminated, once the cluster termination timeout is reached without a job being run.
  • Support partitioning by column in the BigQuery Read Connector and My Tables when the data source is a BigQuery View.
  • Add an option in Snowflake Data Plane configuration that allows configuring the warehouse used by Ascend to run metadata queries.
    • Metadata queries sent to this warehouse include the queries that power the records preview tab in components and the backend queries that gather Data Plane usage metrics.
    • By default, Ascend will use the warehouse defined in the connection used for the Data Plane for metadata queries. Now, you can also configure the warehouse used for metadata queries in the Data Plane configuration for each Data Service.
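
The fallback described in the last bullet can be sketched in a few lines. This is only an illustration of the documented behavior, not Ascend's implementation: the function name `resolve_metadata_warehouse`, its parameter names, and the warehouse names in the usage note are all hypothetical.

```python
from typing import Optional


def resolve_metadata_warehouse(connection_warehouse: str,
                               metadata_warehouse: Optional[str] = None) -> str:
    """Pick the warehouse that would run metadata queries (hypothetical helper).

    connection_warehouse: warehouse defined in the Data Plane connection.
    metadata_warehouse:   optional override from the Data Plane configuration
                          of a Data Service (assumed to be a plain string).
    """
    # Use the override only when it is set to a non-blank value.
    if metadata_warehouse and metadata_warehouse.strip():
        return metadata_warehouse.strip()
    # Default behavior: fall back to the connection's warehouse.
    return connection_warehouse
```

Under these assumptions, `resolve_metadata_warehouse("ETL_WH")` returns `"ETL_WH"`, while `resolve_metadata_warehouse("ETL_WH", "META_XS")` returns `"META_XS"`.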

:sparkles: ENHANCEMENTS :sparkles:

  • Enhance the Observe "Jobs Usage" dashboard by adding the ability to group usage by cluster name. This applies to usage collected from different Data Plane types. Usage with no cluster name defined is grouped under the cluster name "UNKNOWN".
  • Enable Ascend Cluster(s) to accept some types of work immediately after launch, by not waiting for executors to be ready during launch.
  • Improve performance of component execution and Ascend Control Plane/scheduler in Gen1 environments by disabling automatic collection of string statistics (does not affect Gen2 Data Plane environments).
  • Optimize Snowflake Data Plane warehouse usage queries:
    • Collect usage with a single "show warehouses" query per Snowflake account.
    • Avoid sending "show warehouses" usage queries to Snowflake accounts that have no active warehouses.
  • Add support for an optional infer_schema method in Snowpark transforms that allows overriding how schema is inferred.
    • The arguments to this function are the same as for the transform method, except that empty DataFrames corresponding to each input are passed in as arguments.
  • Add data version field to support triggering a full reprocessing of Snowflake Read and Write connectors.
  • Import all partitions separately in BigQuery My Tables if the source table is partitioned in BigQuery.
  • Jobs running on BigQuery are now tagged with labels. These labels include the Data Service, Dataflow, and Component associated with the query.
  • Modify list query for BigQuery My Tables to avoid Out Of Memory Error.
  • Schema generation runs faster in BigQuery Read Connectors when the source is a BigQuery View.
  • Improve usage metric reporting for Databricks Data Plane by excluding inactive clusters and SQL warehouses from usage metric polling queries.
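
The optional infer_schema method for Snowpark transforms, listed above, might look roughly like the sketch below. These notes only state that it takes the same arguments as transform, with empty DataFrames in place of each input, so everything else here is an assumption: the `(session, inputs)` signature, the list-of-column-names return value, and the `FakeFrame` class, which is a hypothetical stand-in for a Snowpark DataFrame so the example runs without a Snowflake session.

```python
from typing import Callable, Dict, List, Sequence, Tuple


class FakeFrame:
    """Hypothetical stand-in for a DataFrame: column names plus rows."""

    def __init__(self, columns: Sequence[str], rows: Sequence[Tuple] = ()):
        self.columns = list(columns)
        self.rows = [tuple(r) for r in rows]

    def with_column(self, name: str, fn: Callable[[Dict], object]) -> "FakeFrame":
        # Append a derived column computed from each existing row.
        new_rows = [r + (fn(dict(zip(self.columns, r))),) for r in self.rows]
        return FakeFrame(self.columns + [name], new_rows)

    def empty_like(self) -> "FakeFrame":
        # Same columns, zero rows: the shape of input that schema inference receives.
        return FakeFrame(self.columns)


def transform(session, inputs: List[FakeFrame]) -> FakeFrame:
    # Assumed signature: a session object and a list of input frames.
    orders = inputs[0]
    return orders.with_column("total", lambda r: r["qty"] * r["price"])


def infer_schema(session, inputs: List[FakeFrame]) -> List[str]:
    # Same arguments as transform, but every frame in `inputs` has zero rows.
    # The return type is a placeholder (column names); the notes do not specify it.
    return inputs[0].columns + ["total"]


orders = FakeFrame(["qty", "price"], [(2, 5.0), (1, 3.5)])
result = transform(None, [orders])                      # runs on real rows
declared = infer_schema(None, [orders.empty_like()])    # runs on empty inputs
```

The point of the override is that the schema can be declared without executing the transform logic, which helps when that logic is expensive or behaves differently on empty inputs.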

:wrench: BUGFIXES :wrench:

  • Modify schema inference behavior in Snowpark transform(s) so that input DataFrames are empty (rather than containing one dummy record). This aligns the behavior with existing types of transform components (i.e. PySpark Transforms).
  • Fix a bug where a Dataflow's component graph wouldn't display some Data Feed subscriber components.
  • Fix an Ascend Cluster error that occurred when configuring the cluster for local mode with no executors (when the maximum and minimum number of executors are both set to 0).
  • Fix an issue where a new Ascend Cluster could not be created because it had the same index as an existing cluster.
  • Mitigate the BigQuery SQL Transform error "Resources exceeded during query execution... Too many subqueries or query is too complex", encountered during batch partition query execution.