
Power BI + Dremio = A Seamless BI Experience for Cloud Data Lakes

By Arun Ulagaratchagan

With the continuing rise of the public cloud, many companies are now centralizing their data in massive cloud data warehouses and data lakes powered by storage services, such as Azure Data Lake Storage (ADLS). The challenge is to deliver insights on these data warehouses and data lakes rapidly and cost-efficiently.

 

One common approach to delivering insights is to build rich data pipelines, yet copying data is difficult, expensive, time-consuming, and not always possible. These pipelines are also hard to build, secure, and maintain. Another approach is to deliver reports and analyses directly on cloud data lake storage. This approach can reduce complexity while accelerating the delivery of insights. Decision makers and knowledge workers don’t need to wait weeks or months to get access to a new or modified dataset. They can immediately interact with the data in a self-service fashion.

 

In partnership with Dremio, Power BI enables organizations to create reports and analyses directly on cloud data lakes. Dremio is a cloud data lake engine that executes SQL queries directly on ADLS. The data does not need to be moved or copied into a data warehouse. Power BI datasets in DirectQuery mode can consume the data through Dremio, as the following diagram depicts.

 

 

A variety of technical innovations in Dremio make it possible to support the high-concurrency, low-latency query patterns that are typical of Power BI DirectQuery workloads. These capabilities minimize both the amount of data that must be read from ADLS and the amount of processing required to respond to a query from Power BI, which results in fast query performance and low infrastructure costs.

 

In addition to delivering fast query performance and lowering infrastructure costs, Dremio also makes it easy for Power BI report authors to connect to their data sources. When previewing a physical or virtual dataset in Dremio, a user can simply click the Power BI button to start Power BI Desktop and automatically connect to the previewed dataset. As the following screenshot illustrates, you can find the Power BI button in the top-right corner of the Dremio user interface.

 

 

Note that report authors can also connect to Dremio from Power BI Desktop just like any other data source. The Dremio connector is available in the Get Data dialog under the Database category. For data at cloud scale, it is important to select DirectQuery mode so that the data is not imported into Power BI. When users interact with a report based on a DirectQuery connection to Dremio, Power BI sends SQL queries to Dremio, which Dremio then executes at interactive speed. In this way, Power BI and Dremio enable report authors to create interactive reports on live data in the lake without having to wait for data or cache refreshes.
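
To make the difference between importing data and querying it in place more concrete, here is a minimal sketch in Python. It is not the SQL that Power BI actually generates, and it does not talk to Dremio: an in-memory SQLite database stands in for the SQL engine, and the `sales` table, its columns, and the two helper functions are hypothetical names chosen for illustration. The point is only that an import-style read copies every row to the client, while a DirectQuery-style request sends an aggregate query and gets back just the summary rows a visual needs.

```python
import sqlite3

# Assumption: SQLite is only a stand-in for the SQL engine (Dremio in the article).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100.0), ("East", 50.0), ("West", 70.0), ("West", 30.0), ("West", 25.0)],
)


def import_mode_rows(connection):
    """Import-style access: copy every row out of the source."""
    return connection.execute("SELECT region, amount FROM sales").fetchall()


def direct_query_rows(connection):
    """DirectQuery-style access: the source aggregates and returns summary rows only."""
    return connection.execute(
        "SELECT region, SUM(amount) AS total "
        "FROM sales GROUP BY region ORDER BY region"
    ).fetchall()


imported = import_mode_rows(conn)
summarized = direct_query_rows(conn)

print(len(imported), "rows copied in import style")
print(len(summarized), "rows returned in DirectQuery style:", summarized)
```

With five rows the difference is trivial, but the same shape applies when the table holds billions of rows in the lake: the aggregate query returns one row per region regardless of how much data sits behind it.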

 

The most efficient way to share a report with others is to publish it to the Power BI service. After you publish a report, you must configure the data connection so that Power BI can query the data source. Currently, this requires deploying a Power BI Gateway so that the Power BI service can communicate with Dremio, but Power BI and Dremio are working on direct cloud-to-cloud connectivity that removes the need for a data gateway. Direct cloud-to-cloud connectivity between Power BI and Dremio is the next big milestone in enabling organizations to deliver insights on cloud data lake storage in a self-service fashion and with low infrastructure complexity. So, stay tuned for even more improvements and innovations from Power BI and Dremio.

 

For more information on how to use Power BI and Dremio, check out one of the following resources: