We are excited to announce support of Hybrid Tables in Power BI Premium, which enables us in turn to announce an equally exciting and groundbreaking enhancement to incremental refresh – incremental refresh augmented with real-time data!!! Now you can have blazing fast performance in import mode and the latest data changes in the data warehouse reflected in user reports without having to perform a dataset refresh. For technical details concerning incremental refresh, see the article Incremental refresh for datasets in Power BI in the product documentation.
Up until now, dataset creators sometimes had to make hard tradeoffs between query performance and data freshness. Import mode delivers best performance, yet data freshness poses challenges if new data arrives at a very high cadence. It is both resource intensive and inefficient to import data into a dataset in very short intervals. DirectQuery mode, on the other hand, delivers data freshness, yet at the expense of report performance. Because Power BI doesn’t import the data but translates the report queries into data source queries, the latest data changes are quicky picked up, but the query/response roundtrips between Power BI and the data source take time and slow down the reports.
As the following diagram illustrates, a hybrid table can help to strike the right balance between query performance and data freshness. A hybrid table is essentially a (large) table that has one or multiple import-mode partitions as well as another partition in DirectQuery mode. If the DirectQuery partition is sufficiently small in comparison to the import-mode partitions, the query/response roundtrips between Power BI and the data source should still be reasonably fast while access to the bulk of the data is already super-fast in import mode. Import and DirectQuery data is presented to users as a single table with business definitions and calculations.
Support of hybrid tables in Power BI Premium offers the following benefits:
- Blazing fast performance in import mode – The import-mode partitions deliver extremely fast query performance as the data imported during data refresh is readily available in local memory to answer client queries.
- Latest updates from the data warehouse always included – DirectQuery requests are sent to the data source, so the query results include the latest data updates. The performance depends on how long the data source takes to respond with the results, but the queries are faster if only a small slice of the data is queried, such as data updates that occurred after the last refresh time.
- Better resource utilization especially for very large datasets – With the latest data updates available in real-time, fewer data refreshes are needed to pick up latest changes. With fewer refresh cycles consuming your Premium capacity resources, more of these resources are available to deliver a good report performance and user experience. You no longer need to refresh your datasets at a very high cadence if the data in the data warehouse changes frequently!
- Incremental refresh policies with real time support – Incremental refresh applies a sophisticated partitioning scheme in import mode so that refreshes complete much, much faster than if you imported the entire data volume into a single table partition every time. Despite this complexity, however, it is relatively straightforward to configure an incremental refresh policy in Power BI Desktop. The report creator does not require any partitioning skills. And now, with hybrid tables, report creators can easily add a DirectQuery partition to get the latest data in real time as well. Just configure incremental refresh as usual and enable the Get the latest data in real time with DirectQuery (Premium only) checkbox!
The following screenshot shows how to configure an incremental refresh policy with real time in Power BI Desktop. Having published the dataset to a Premium Per User (PPU) workspace or a workspace on a Premium capacity, Power BI will automatically apply the refresh policy and partition the table as a hybrid table during data refresh. For details regarding configuration steps, see the article Incremental refresh for datasets in Power BI in the product documentation.
Although the configuration is as simple as enabling a single Get the latest data in real time with DirectQuery (Premium only) checkbox, note that the presence of a DirectQuery partition on a table limits the table to the features available in DirectQuery mode. Complex Power Query expressions may not fold, which means the DirectQuery partition cannot use these expressions. And calculated columns can also cause folding issues. Moreover, dimension-type tables with relationships to your hybrid table must be in dual mode to ensure good performance. For this reason, we recommend recreating your dataset in DirectQuery mode first before you configure an incremental refresh policy with real time. The dataset in DirectQuery mode reassures you that there are no folding or compatibility issues. Then convert the desired fact table into import mode and the dimension tables into dual mode to ensure good performance, and then configure your fact table as a hybrid table.
As mentioned earlier, don’t forget to perform at least one data refresh after you published your dataset to the Power BI service so that Power BI can apply the incremental refresh with real time policy. This is an important step because the DirectQuery partition does not exist in the dataset until Power BI applied the policy and partitioned the table. This only happens during scheduled or manual data refresh in the Power BI service.
As soon as the DirectQuery partition has been added, it is also a good idea to configure automatic report page refresh if desired in the Power BI service so that your report always shows the latest data updates from the data source. Automatic page refresh causes your report visuals to query for new data, at a predefined cadence. For details, refer to Automatic page refresh in Power BI in the product documentation.
Of course, you can also apply an incremental refresh with real time policy to a table programmatically by using the Analysis Services client libraries with XMLA endpoints in Power BI Premium. The following code snippet demonstrates how to do this. Just make sure you set the RefreshPolicyMode to Hybrid and IncrementalPeriodsOffset to -1 so that there is no lag or leading period to the rolling refresh window head, as hybrid policies only support refreshes of complete days. It is also a good idea to trigger a refresh along with the policy update so that the Power BI service can partition the table accordingly.
using System; using TOM = Microsoft.AnalysisServices.Tabular; namespace Hybrid_Tables { class Program { static void Main(string[] args) { using (var server = new TOM.Server()) { server.Connect("powerbi://api.powerbi.com/v1.0/myorg/AdventureWorksTarget"); TOM.Database database = server.Databases.FindByName("AdventureWorks"); if (database == null) { throw new ApplicationException("Database cannot be found!."); } if(database.CompatibilityLevel < 1565) { database.CompatibilityLevel = 1565; database.Update(); } TOM.Model model = database.Model; TOM.Table salesTable = model.Tables["Sales"]; // Apply a RefreshPolicy with Real-Time to the Sales table. TOM.RefreshPolicy hybridPolicy = new TOM.BasicRefreshPolicy { Mode = TOM.RefreshPolicyMode.Hybrid, IncrementalPeriodsOffset = -1, IncrementalGranularity = TOM.RefreshGranularityType.Day, RollingWindowGranularity = TOM.RefreshGranularityType.Month, IncrementalPeriods = 2, RollingWindowPeriods = 4, SourceExpression = "let\n" + " Source = Sql.Database(\"awsampledw.database.windows.net\", \"AdventureWorksDW\"),\n" + " dbo_FactInternetSales = Source{[Schema=\"dbo\",Item=\"FactInternetSales\"]}[Data],\n" + " #\"Filtered Rows\" = Table.SelectRows(dbo_FactInternetSales, each [Date] >= RangeStart and [Date] < RangeEnd)\n" + "in\n " + " #\"Filtered Rows\"" }; salesTable.RefreshPolicy = hybridPolicy; model.RequestRefresh(TOM.RefreshType.Full); model.SaveChanges(); } } } }
But wait, there is more! Hybrid tables support is also available without an incremental refresh policy! For example, you can partition a table programmatically via XMLA endpoints to apply any partitioning scheme you like. For example, you could add a DirectQuery partition to a table for old (and possibly very large) historic data. This is the opposite of the incremental refresh approach where newer data is fetched from the data warehouse. If your reports query newer data much more frequently than the historic data, it might make sense to keep the historic data in the data warehouse and import the new data. This is not a real-time scenario, yet it is a useful strategy for big data solutions where the historic data volume is just too large for import mode, but you still want it available in a report. A future blog post is planned to show in greater detail how to add a DirectQuery partition to a table programmatically, so stay tuned!
And that’s it for this exciting announcement and a whirlwind tour through hybrid tables in Power BI Premium. We hope that you can take advantage of this groundbreaking new capability in your enterprise BI solutions to deliver blazing fast report performance to your users while at the same time incorporating the latest data changes from the data warehouse side as they occur, without having to resort to a high cadence of resource-consuming data refresh cycles. And as always, please provide us with feedback as you try out this feature. We would love to hear from you!