Skip to main content

Building our report on the opioid crisis for the Gartner bakeoff

Headshot of article author Will Thompson

Introduction

You might have read in our previous blog post that Microsoft wa invited to take part in the Gartner Data and Analytics summit’s “Bakeoff” competition. We wanted to give you all a feel for the demo we presented at the bakeoff and describe some of the steps we went through in preparing the data and report for it.

First up, here’s a video showing the different parts of the solution we created:

You can interact with the report and download the .pbix file from the Data Stories Gallery here! Now, onto some of the steps for building the report.

Data preparation and modeling

Gartner provided the data we were to use for the demo. It came from a variety of sources (CDC, Medicare and Medicaid programs, and Census bureau) and in a variety of formats (CSV, Excel, etc.). Understanding, cleaning and prepping the data model was, as is so often the case in BI projects, the bulk of the work!

Drug overdose data

The drug overdoses information was in a common structure, where indicators (measures) were all included in one column and distinguished by the values of another column. Typically, Power BI data should have columns for each metric or measure you want to analyze. This is easy to rearrange in Power BI using the ‘Pivot’ transformation:

clip_image002

You’ll also notice that not every state reports the detail of which drugs were involved. This meant that in some visuals we could only show the overall values across all states, or pick certain drug types only for some states.

The drug overdoses information was also, unusually, reported for a 12-month rolling period. To help report on this we created a ‘LastDayOfMonth’ column based on the year and month columns. This made it easy to baseline everything at a low grain and use the built-in date hierarchies and time intelligence functions that Power BI provides.

clip_image004

Once loaded, we also created a measure that ensured we’d only report the 12-month-ending value and not aggregate over any time periods:

DODs =
CALCULATE (
    SUM ( Overdoses[Drug Overdose Deaths] ),
    LASTDATE ( Overdoses[LastDayOfMonth] )
)

This sums the Drug Overdose Deaths column filtered to just the last date for whichever period is being viewed in the visual. The SUM() is required because it operates on the [Drug Overdose Deaths] column but should always only include one date anyway.

At the visual layer, we added this LastDayOfMonth hierarchy to the columns

clip_image006

One of the advantages of this DAX formula is that it returns only the last value from the date hierarchy but allows aggregation over other attributes, e.g. State. The DOD value is correct even when drilling down through the date hierarchy, but the total row is calculated correctly at the bottom of the table, and other aggregations such as Region still work correctly.

clip_image008

During the demo we added a Quick Measure for the YoY% change:

Drug Overdose Deaths YoY% =
IF (
    ISFILTERED ( 'Overdoses'[LastDayOfMonth] ),
    ERROR ( "Time intelligence quick measures can only be grouped or filtered by the Power BI-provided date hierarchy or primary date column." ),
    VAR __PREV_YEAR =
        CALCULATE (
            [DODs],
            DATEADD ( 'Overdoses'[LastDayOfMonth].[Date], -1, YEAR )
        )
    RETURN
        DIVIDE ( [DODs] - __PREV_YEAR, __PREV_YEAR )
)

The ISFILTERED function here ensures that the measure is being filtered by the date hierarchy, creates a variable (__PREV_YEAR) for the drug overdose deaths a year before the current value, and divides the difference by the previous value to give percentage change. It’s a common pattern and a good example of using DAX’s VAR feature, which allows variables to be defined and reused across a calculation to aid readability.

Census data

The Census data presented another interesting structure:

clip_image010

The different values (population estimate and deaths) are over multiple columns this time, one for each year. To join this with other data we needed ‘Year’ to appear in a column of its own. Note that the population estimates are only available for 2016 and 2017, but deaths are available for 2015, too. To extract the year we applied the Unpivot transformation to the columns and used Split by Number of Characters to extract the year. This is the result:

clip_image011

And then pivoted the data:

clip_image012

This left some rows with blanks for the population estimate, which we managed through the reporting layer.

De-normalizing and joining data

We built several de-normalized tables to help with joining data from the various healthcare sources and the Bing News and Twitter data. This helped eliminate some unmatched rows such as the “YC” and “US” values in the Overdose table that correspond to New York City and the overall US, meaning we wouldn’t be double counting in any totals. In the Labelers table, we also removed the variants of “Inc.”, “LLC” etc. to ease readability in the report.

This let us use the semantic model by creating relationships across the tables rather than having to merge the queries together.

clip_image014

We also added some ‘helper’ tables that we used in slicers and demographic data for the Census and state representatives. With more time we had wanted to further de-normalize some of the facts out into other tables and set up relationships on more fields into the Bing and Twitter data. There were still several ‘dirty’ elements in the data such as products that only seemed valid for certain time periods, and labelers that seemed to have changed name at some point in time. We'd have needed more time to fully understand the structure of the data for these to really know how best to model it.

Integrating Bing News and Twitter data

Microsoft provides several Solution Templates that are pre-configured sets of technologies that address specific business requirements. You can find all the solution templates here: https://appsource.microsoft.com/en-us/marketplace/apps?product=pbi-solutiontemplates clip_image016

When you set one up, it takes you through a set of configuration steps that set up Azure SQL DB, Azure machine learning, Power BI Desktop files and other components. For the Bing News and Twitter templates, the resulting .PBIX file contained all the information we needed for the report pages. Out of the box, the report looked like this: clip_image018

The bottom right hand visuals already show some of the drug names and news articles we wanted to match up with the structured data. These were the only visuals we needed for the final bakeoff report.

Integrating this data with my existing PBIX file was easy – in the Query Editor we could select the queries and choose Copy:

clip_image019

And then paste them into the Query Editor in the other file. This copies all the dependent queries (which includes parameters for the database and server name). We had to then create a mapping table and relationships to the other data and add the right visuals to the report.

With more time we would have used more of the Sentiment Scoring capabilities in other parts of the report and fed more of the specific drug manufacturer and prescriber names into the Bing News template to pull articles that referred to them.

You can read more about data preparation and modelling in our documentation:

Visualizations and reports

Theme and layout

Beautiful layout and styling of a report is important to consider and easy to achieve in Power BI. Once we’d settled on a color scheme and rough layout for the report pages, we used 2 key features to apply them: themes and shapes.

The theme determines various styling options across visuals in your report. We used it to set the color of data points, the styling of the title of visuals, and other formatting properties such as data labels. We chose colors based on the Power BI and Gartner brands (yellow-gold and dark blue) that also happened to work well together. We chose a few shades of each and some greys to use as secondary colors. This is good best practice – pick some muted tones and one or two accent colors to highlight key information. The theme file itself then contains the default colors to use along with the various other format options:

clip_image021

Themes are defined using a JSON file, but there are a few tools from the Power BI Community that make it much easier to create them. For example, https://powerbi.tips/tools/report-theme-generator-v3/.

We’ve uploaded the theme file to the Themes Gallery here:

https://community.powerbi.com/t5/Themes-Gallery/Gartner-Bakeoff-Opioid-Crisis-Theme/m-p/377908#M93

So you can download it and reuse or edit it if you like it!

We also made extensive use of shapes on the canvas to create the blocks of color to separate the navigation area, titles and main report area. If you enable the Selection Pane you’ll see all the different elements on the pages. Some have quite a few shapes there! We just used the out-of-the-box line shape but gave the line itself 0pt weight – all we wanted was the background of the shape to block out areas:

clip_image023

State details report page tooltip

We were fortunate with the timing of our March release to include the report page tooltip feature that we used on the maps in my report and creating these is simple.

clip_image024

You create a page in your report to use as the tooltip; you can even use the pre-defined ‘Tooltip’ size to make something that’s a common size. We chose a Custom size as we wanted a little more space to fit the callouts and bar chart together: clip_image025

Then you set the page to be available as a tooltip in the Page Information card (you can see this in the image above). Finally, add the field that you want to link it on to the Tooltip area of the filter pane – in this case, State:

clip_image026

Natural language query with Q&A

Part of the script Gartner provided was to build two or three visuals that had been used in the initial dashboard. We started here, and tried out the specific questions through Q&A. These were the results basically without any further tuning! All we did was name the columns in the data appropriately and tag the state field as being geographic data.

image

You’ll also see that Q&A prompts you as you type with the fields and values that match your partial word. Ohio was identified as a state in several columns ([State], [State Name], [Entity Value], [User Location]) and as a partial match to the Prescriber name.

image

For demo purposes we left the Q&A model there, but we could have improved it by adding synonyms, normalizing some of the repeated state fields, etc.

You can learn more about building reports here:

Navigation and Storytelling

Navigation bookmarks

We used Power BI’s bookmarks feature to produce the callout and navigation controls on the left of the reports:

image

Bookmarks let you save the visibility state of any elements on your report and can be bound to a shape. Look at the Selection pane and you’ll see, for example:

imageThe selected shapes, textboxes and callouts correspond to a particular bookmark.

There are lots of items in the selection pane because we’ve layered a few visuals together for each of the two navigation ‘tabs’. There is a background shape, the text (both selected and deselected), and a transparent clickable object placed at the top of the Z order that is used to link to the bookmark. For example the ‘(0) BM1 – Click’ shape is just a transparent shape. e found the Link card in the formatting pane and set it to direct to the appropriate bookmark:

clip_image042

Storytelling with bookmarks

Once you’ve uncovered some insights and created a report, you’ll want to share and communicate with others. In this report we found some surprising data points about one labeler, Purdue Pharmaceutical. The data show that Purdue had the highest reimbursement amount through the Medicaid program by a significant amount. From investigating the Bing News articles that mention Purdue Pharmaceutical we saw that they were being sued by two states for mis-marketing the capabilities of OxyContin, their oxycodone-based painkiller. We added a textbox and two lines to call out those two states as they are significant outliers in the chart:

clip_image044

They only showed up as outliers when cross-filtering on Purdue’s data, so we stored that as a bookmark too, and used the Selection pane to control the visibility of the textbox and lines for the last bookmark.

You can learn more about bookmarks at Use bookmarks to share insights and build stories in Power BI

Real-time Twitter dashboard

We wanted to include some real time data in the demo and pulling tweets in to a dashboard is an easy way to monitor this for any keywords.

clip_image046

You first need to create a Streaming Dataset in Power BI to receive the tweets. From the workspace that you want to add the dataset to, click the Create button at the top right of the page and choose Streaming dataset:

clip_image048On the pane that appears choose API and click Next.

Give your dataset a name and set up the values that you’ll capture. We used:

  • datetime (DateTime)
  • tweet (Text)
  • username (Text)
  • name (Text)
  • countOfTweets (Number)
  • retweets (Number)
  • isPowerBI (Number)
  • timeMinute (DateTime)
  • timeHour (DateTime)
  • location (Text)

Make sure you enable the Historic Data Analysis option too, so that a Power BI dataset is created for your data to be stored in.

We used Microsoft Flow to monitor Twitter and pump data into Power BI. Once you sign in to Flow, click My Flows and choose Create from blank:

clip_image050

You’ll want to search for the Twitter connector:

clip_image052

And choose the Twitter – When a new tweet is posted trigger. You’ll have to enter search terms you want to monitor (in my case we used ‘opioid crisis’). Then you add an Action and choose the Power BI – Add rows to a dataset action:

clip_image054

Here’s what mine looked like:

clip_image056

We used some custom expressions for some fields to see if the tweet included the #PowerBI hashtag, and to extract certain parts of the time.

  • indexOf(triggerBody()?['TweetText'],'#PowerBI')
  • addMinutes(startOfHour(utcNow()), int(formatDateTime(utcNow(),'mm')))
  • startOfHour(utcNow())

Save your flow and you should see tweets start to flow through to Power BI. You can then use all the normal report creation tools to build visuals and pin them to your dashboard.

If you want to write any further measures against the model, you can use Power BI Desktop. Connect to the model in the Power BI service and use the Report Measures feature to add the calculation at the report layer. For example, we created:

lastTweet = LASTNONBLANK(RealTimeData[tweet], MAX(RealTimeData[datetime]))

to use in a visual that always displayed the most recent tweet.

You can learn more about real time dashboards in Power BI at Real-time streaming in Power BI

Download the report

You can interact with the report and download the .pbix file from the Data Stories Gallery here! Let us know if you find any other interesting insights in the comments below.