November 15-18

Seattle & online

2021 Summit video library

Building an Azure Data Science Environment in Azure Synapse Analytics Workspaces

Bradley Ball

In this session we will build an end-to-end data platform workload. We will use the new Azure Synapse Analytics Workspaces environments to access Azure Data Factory and the Integrated Hosted Runtime to load data into a Data Lake, Azure Synapse Analytics and PolyBase to ingest the data, Azure Synapse Spark Worker Pools and Python to generate additional data sets, and Power BI to visualize our results. We are using data provided by RetroSheets.org to create a dataset of play-by-play data and box scores for Major League Baseball games from 1922 to 2018. We examine how winning and losing seasons, home or away games, opponents, interleague play, and even temperature at opening pitch affect a team. We set out to answer the questions every baseball fan wants to know, such as: what is the best day of the week to see your favorite team win?