Skip to main content

Easier set up of datasets in Power BI

Headshot of article author Kay Unkroth

Connecting datasets to their data sources is arguably one of the most important tasks BI authors must accomplish after publishing a Power BI Desktop file to the Power BI service. It is perhaps also one of the most complex tasks as connections to on-premises data sources generally require a data gateway. Accordingly, improving the configuration experience for gateway connections is an important goal.

Ideally, your gateway infrastructure already covers all connectivity requirements. Yet, this is not always the case. You might have no enterprise gateways in your Power BI environment, or the existing gateways might be missing data source definitions, or they aren’t configured to allow cloud data sources while your dataset combines on-premises and cloud sources. The solutions are relatively straightforward. If you don’t have access to an enterprise gateway, perhaps install a personal gateway. If existing gateways are missing data source definitions, add them—or ask a gateway administrator to do so. If you must combine on-premises and cloud data, make sure you enable the checkbox Allow user’s cloud data sources to refresh through this gateway cluster in the gateway configuration.

The good news is that recent user interface (UI) changes for gateway connections can help guide Power BI users through these solution steps. For example, if there are no enterprise gateways available, you can click on the Install now button under Gateway connection in the datasets settings to deploy a personal gateway and unblock your refresh scenario, as the following screenshot illustrates.

Install Now

On the other hand, if you have enterprise gateways and are just missing some data source definitions, you can either add them directly or pass the data source information on to your gateway administrator to get them added, as mentioned earlier. As the screenshot below reveals, the Status column clearly indicates those gateways that meet the configuration requirements for a given dataset. For gateways that do not, you can use the Actions button to view the data sources and resolve the configuration issues. Of course, you only need one correctly configured gateway to establish the data connection. In this example, the correct choice would be Gtw1.

Gtw Status

If you wanted to pick Gtw2, for any reason, you must fix the identified configuration issue by adding the missing data source. Assuming you are the gateway administrator, you can click on Add to gateway for the desired data source, which takes you to the Add Data Source page for the selected gateway and prepopulates the configuration fields with the information obtained from the data source definition in the dataset. You only need to provide the authentication details to complete the configuration.

AddDS

When you click Add, Power BI creates the data source definition and takes you back to the dataset settings page, where you can now pick the desired gateway and click Apply, as in the following screenshot.

Pick Gtw

At this point, you might be wondering about the purpose of the Use a data gateway toggle switch. This switch is On and cannot be changed if Power BI detects a connection to an on-premises data source in the dataset because on-premises data sources always require a data gateway. On the other hand, if a dataset relies only on cloud sources, then the switch is Off by default but can be turned On. For performance reasons, it is a good idea to use the default and leave the switch Off, but if you require more control, such as over privacy settings, you can force the connection to go through a data gateway. Note that you must then add the data source definition to the desired gateway like an on-premises data source. Note also that there are some limitations. For example, organizational authentication is generally unavailable for cloud sources registered like an on-premises data source in a data gateway. You might have to use Basic or Windows authentication against the cloud source. For this reason and given that the cloud data would have to be transferred to the on-premises gateway and then back to Power BI in the cloud, it is best to refresh datasets against cloud sources without the involvement of a data gateway. So, leave the Use a data gateway toggle switch Off and provide the required credentials under Data source credentials on the dataset settings page as you normally do for cloud sources.

Even datasets that use both on-premises and cloud sources do not require the cloud sources to go through a data gateway because Power BI can retrieve on-premises and cloud data in separate queries—provided the dataset does not merge or append the data in the same query. So, even though the Use a data gateway toggle switch is On in this case because the dataset uses an on-premises data source, you don’t need to add the cloud sources to the gateway. Unfortunately, however, the Power BI UI cannot determine if you wanted to add the cloud sources or not and so suggests both on-premises and cloud sources, as in the following screenshot. A future UI update will de-emphasize the need to add cloud sources because they should only be added to a gateway in rare cases.

OPAndCloud

There still is, however, the one caveat that the dataset might merge or append on-premises and cloud data in the same query. Now a gateway must process the data before sending it to the Power BI dataset. One option is to add all data sources explicitly to the desired gateway, but what do you do if the cloud source uses an unsupported authentication method? In this case, enable the checkbox Allow user’s cloud data sources to refresh through this gateway cluster in the gateway configuration and then configure the credentials for the cloud source under Data source credentials. With cloud credentials provided, you can refresh the dataset using the Refresh now option or set a schedule to refresh the data periodically.

cloudcreds

And that’s it for a quick journey across the most recent improvements concerning the configuration of the gateway connection for a dataset. As you no doubt will notice, there is still room for more. Among other things, it should be easier to switch back and forth between the dataset settings and gateway management pages. The Data source credentials should identify more clearly where credentials are and are not required. As mentioned, cloud sources should be highlighted differently than on-premises sources because their gateway connection requirements are very different. And lastly, the Gateway connection section should also be displayed for datasets working in DirectQuery mode. So stay tuned for more improvements in future service releases.