Free Trial Guided Tour: SparkSQL Transform

Introduction

If meteorologists wrote SQL, we figure it would look a little something like this. If you close your eyes, you can imagine a world where an ML algorithm is used (perhaps as a user-defined function or UDF) to even forecast the next day’s weather, as the result of a SQL query. We like meteorologists, so we don’t want to put them out of business, just make their jobs easier!

What This Demonstrates

We might not be predicting the future weather (yet!), but this demonstrates interesting features of SparkSQL available to you in Ascend.

  • A common-table-expression (CTE) to pull in a hardcoded list of meteorologists, and the days they cover for the weather report (John covers on Sunday, etc.);
  • An inline VALUES clause, which lets you create rows within SQL without even having to create an actual table;
  • The analytic function LAG with a window function, to look at the previous row’s values - and find compare it with that day’s values;
  • Some rolling totals that add up cumulative precipitation amounts by month, and by year, independently;
  • Standard JOIN syntax, combining data from an Ascend Transform, {{Weather_Transform}}, and the CTE on the fly;
  • The same number of records as the original input. Note that the aggregates were calculated without the need for a GROUP BY clause!

Instructions

  1. Right-click on “Weather Transform” > Choose “Create new Transform”
  2. Replace the NAME of the Transform with “Meteorology Analytics”
  3. Replace the DESCRIPTION with “**Shows meteorologist assignments, cumulative precipitation values, and previous day’s weather” for each day.**” (minus the quotes). Notice the description includes markdown tags (** to bold the text)!
  4. Replace the QUERY text with the SQL below
  5. Click the “Create” button
  6. Have fun checking out meteorology analytics and being a SQL-wielding weather reporter!
WITH Meteorologist_Assignments AS (
   SELECT * FROM VALUES
       (1,'John'),
       (2,'Jane'),
       (3,'Bob'),
       (4,'Bill'),
       (5,'Jill'),
       (6,'Sue'),
       (7,'Fred')
   AS t(day_of_week,meteorologist)
)
SELECT
   w.Date,   
   w.Weather,
   NVL(LAG(Weather, 1) OVER (ORDER BY Date),'Not Reported') AS Previous_Days_Weather,
   CASE
       WHEN w.Weather = NVL(LAG(Weather, 1) OVER (ORDER BY Date),'Not Reported') THEN False
       ELSE True
       END AS Weather_Changed,
   w.Precipitation as Days_Precipitation,
   SUM(w.precipitation) OVER (PARTITION BY DATE_TRUNC('MONTH',w.weather_date_ts) ORDER BY Date) AS Monthly_Cumulative_Precipitation,
   SUM(w.precipitation) OVER (PARTITION BY DATE_TRUNC('YEAR',w.weather_date_ts) ORDER BY Date) AS Yearly_Cumulative_Precipitation,
   ma.Meteorologist
FROM
   {{Weather_Transform}} as w
   INNER JOIN Meteorologist_Assignments AS ma
   ON DAYOFWEEK(w.weather_date_ts) = ma.day_of_week
WHERE
   w.Weather IS NOT NULL
ORDER BY
   Date