Skip to main content

Data Cleansing with R in Power BI

Headshot of article author Sharon Laivand

Do you want to detect outliers in your data? Want a predictive way to complete missing values in your data, or do you want to perform other advanced analytics scenarios as part of data cleansing?

Implementing advanced analytics data cleansing scenarios in Power BI is now easier than ever!

This is done by adding R steps in Power Query as part of the Power BI Desktop July Update.

Read this post if you want to understand how it works.

 

What’s the story?

Data analysts spend most of their time preparing data before analyzing and reporting insights. This stage is called data cleansing, and is usually composed of two parts.

The first part is ensuring that the data is technically correct. This means that the data is formatted uniformly, for example the data is formatted as decimal numbers, and organized in your preferred structures for analysis.

The second part of data cleansing, and sometimes the more challenging one, is making the data consistent. This means having all the data points in place, correcting outliers, and normalizing to uniform scales. The R language and toolset includes thousands of libraries that can help with data cleansing, so we have added R to our own data cleansing and transformation tool: Power Query. Now that R is supported in Power Query, it also can be used to make general advanced analytics tasks in the data cleansing stage. For example, R can be used to find clusters in the dataset.

 

How does this work?

In the Power BI Desktop, go to the query editor by selecting Edit Queries.

edit query

In the query editor, select the Transform tab.

image

In the right side of the Transform tab, select the new Run R Script button.

image

By clicking the R button, you can add your own R script as another Power Query step.

image
 

Go ahead and try it

An example of missing data completion with R in Power Query is explained in the R in Query Editor documentation page.

This example shows how the R ‘mice’ library completed randomly missing values in the SMI (Swiss Market Index) end of day values for the years of 1991-1998.

The visualization below shows via the red line how a spiky missing value was completed in a predictive manner.

image

The demo .pbix file is available for download, as is the demo dataset with missing values.

 

Licensing and Restrictions

The only prerequisite to adding R steps for Power Query in Power BI Desktop is installing a local R environment such as Microsoft R client or RStudio.

Applying the R steps in Power Query to data refresh scenarios will require the installation of the Power BI Personal Gateway to be the host for the R steps execution.

 

Give us feedback

Your feedback is critical! Tell us that you think by sending an email to rinpbifb@microsoft.com

Keep enjoying the PoweR in BI!