Today we have Spyros Sakellariadis, joining us again to continue our series of how to use Power BI to monitor real-time IoT events. Take it away Spyros!
By Spyros Sakellariadis
Who hasn’t wanted to have a personalized view of the vehicle traffic on the route to and from their office, for example? Let’s do it by getting data from a public data feed into Azure and Power BI! At the same time, this will serve as an example of how to get data from thousands of public data feeds and visualize them with Power BI.
In a previous post, I showed how to get data from IoT sensors into Azure and Power BI based upon a simple infrastructure that we published in an open source project called ConnectTheDots. The constraint in that scenario is that those sensors need to be ones which you can access and program, and probably own. There is lots of valuable information, however, in public data sources which it would be great to be able to munge and view.
To keep things simple, I’m going to use the same basic infrastructure to get the traffic data into Power BI, as I used in the previous post – with a twist. Instead of connecting the data sources (sensors in the previous blog, the WSDOT web service in this one) to a gateway running on a Raspberry Pi, we will run that same gateway code in Azure. Once we have pulled data from the WSDOT web service and sent it to an Azure Event hub, the rest is basically the same as we did last time. Easy as pie.
Acquiring the data
Since I live in the Seattle area, I am interested in knowing about the traffic in the state of Washington. I really, really care about the traffic on the two bridges separating Seattle and Bellevue (on highways I-90 and I-520), as I have to drive over them daily. I thought it might be a daunting task to collect that data, but fortunately the Washington State Department of Transportation provides data for 170 permanent traffic recorder locations in Washington State, including those two highways. Most are updated hourly, some as frequently as every 20 seconds. This data is accessed through a single gateway and API, as described on their website at http://wsdot.wa.gov/Traffic/api/. If you want to understand the traffic on the roadways, selecting the Traffic Flow link shows the function calls and fields retrieved, and the Help link shows how to call the service:
The site also contains a button to request the access code to include in the web service call. If you just paste that URL into your browser, you will get a batch of data similar to the following, refreshed every 20 seconds:
76, 4/29/2015 3:36:22 PM, Olympic, 005es13330, 005, Tacoma Ave, 133.3, 47.231150214,-122.441859377, 4, NB
267, 4/29/2015 3:36:22 PM, Olympic, 016es00965, 016,36th St., 9.65, 47.291790549, -122.564682487, 1, EB
282, 4/29/2015 3:36:22 PM, Olympic, 016es00418, 016, Jackson DS, 4.18, 47.258535035, -122.527057529, 1, EB
At this point, we have our data source. Now it is just a matter of getting it into Azure and Power BI.
Getting the data into Azure
To get the data from the WSDOT web service we wrote a simple Azure worker role that pulls data from the WSDOT URL shown above every 20 seconds, parses the data stream, and immediately sends it to an Azure instance of the ConnectTheDots gateway. Sample code, which you will need to modify, is available on the TrafficFlow branch of ConnectTheDots on GitHub.
In the TrafficFlow worker role we parsed the data as JSON and shortened the names provided by the WSDOT schema. The first row of the data shown above would be parsed and formatted as follows:
"LocationDescription": "Tacoma Ave ",
And here is an export to Excel of a few of the 5,742,180 records received in 3 weeks (yes, 5.7M records!):
Now we just have to get this data into Azure, but it is worth going on one last tangent before showing how to construct the Power BI dashboard. As I mentioned earlier, I am interested in the traffic on the I-520 bridge as I have to commute over it. We can take the data for the last few weeks and filter it for (Road Name = 520 AND Location Description = Midspan), map the Flow Value to a color, and voila, the eastbound traffic pattern emerges:
Note one thing, though, that will be relevant when we create the Power BI dashboard – the time reported by the WSDOT web service and forwarded to our Event Hub is UTC, so the above table is really showing traffic around 1pm Seattle time, not 8pm.
Creating the Power BI dashboard
Once our Azure worker role has sent the data to an Azure Event Hub, we can create a Stream Analytics job that queries the Event Hub with an Output to Power BI, just like we did in the IoT example from the previous blog. In this case, the query we have created is as follows:
WHEN Value <2 THEN 'Light'
WHEN Value <3 THEN 'Moderate'
WHEN Value <4 THEN 'Heavy'
DATETIMEFROMPARTS(DATEPART(year,TimeCreated),DATEPART(month,TimeCreated), DATEPART(day,TimeCreated), DATEPART(hour,TimeCreated)-7,DATEPART(minute,TimeCreated), DATEPART(second,TimeCreated),00) AS TimeCreatedPST
TIMESTAMP BY TimeCreated
The only tricky thing in the SQL query is to convert the UTC time into PST and adjust for daylight savings time. It would be more flexible to do this in the worker role in C#, where we could query the difference between UTC and PST and apply that instead of hard-coding 7 hours into the statement above, but we decided it was best to keep everything in UTC in the worker role to make it easier to combine with results from multiple time zones.
When creating the output to Power BI you need to specify a DATASET NAME, which will appear in the Dataset list on the Microsoft Power BI dashboard. I chose to call the database “WSDOTdata”, and the corresponding table “WSDOTtable”.
For detailed instructions on creating a Power BI dashboard, please refer to the previous blog. The essential steps to create a first report are repeated here:
1. Log in to http://app.powerbi.com, then create a dashboard by clicking “+” in the left menu and save. I created the Dashboard “TrafficFlow”
2. Select the Dataset, in my case “WSDOTdata”. You will see a blank dashboard with a field picker on the right. Given the way we formatted JSON after the Azure worker role pulled it from WSDOT, this is how it appeared in the field picker in Power BI:
3. Add fields Axis = timecreatedpst and Value=”∑ Max of value”.
4. Add Legend=trafficcategory.
5. On the same page, create three slicers, one for roadname, one for locationdescription, and one for direction. Save the report.
6. Using the slicers, select various roads, locations, and directions and pin each of them successively to the dashboard. For my first chart, I selected I-5 at 40th Ave W, going Northbound, and got the following chart:
This corresponds to almost equal ‘Light’ and ‘Moderate’ traffic, as you might expect for the early rush hour timeframe. My nemesis, I-520, is somewhat worse, showing mostly ‘Moderate’ traffic:
Here is a dashboard with four locations shown:
Now this is beginning to tell me something useful! From my office, if I want to leave now to catch the ferry at Mukilteo, I can either go west on I-520 (Moderate traffic) or I-90 (mostly Stop and Go), then north either on I-405 (also mostly Stop and Go) or on I-5 (mostly Moderate). Decision clear… I-520 to I-5. Mind you, if I wait an hour, I-520 will be jammed too, and I might as well wait till after dinner before heading out.
There are thousands of public data sources you can access, and getting that data into Power BI is really not that different from accessing sensor data. Find a few, and start working with them along the lines shown in this post to see how easy it is to extract and present valuable insights. Even better, start thinking about how you can do more with the insights from the data, for example, sending alerts or notifications based upon various criteria, or spawning actions such as dispatching a technician, placing an order, or changing the settings on a thermostat.
As I mentioned last time, for further reading, here are some useful links on the Power BI side: