Announcing general availability of automatic aggregations

Less than a year ago, we announced support for automatic aggregations in public preview. Today, we are excited to announce General Availability (GA), along with several important improvements. By using automatic aggregations in Power BI Premium, you can unlock cloud-scale data volumes for interactive analysis. The fundamental idea is to import data into the dataset only at the aggregated level while leaving the detail data at the source to minimize data duplication. In this way, most BI queries can benefit from Power BI’s blazing fast query performance in import mode, while occasional detail queries can still work transparently against the data source in DirectQuery mode. And given that most user-generated BI queries are aggregated in nature, you should enable automatic aggregations to deliver an excellent interactive user experience over big data in a very economical and effective way.

There are basically two options to add aggregations to a DirectQuery dataset. You can define aggregations manually by following the steps as described for User-defined aggregations in the product documentation. These aggregations are static and won’t adjust to changing query workloads unless you modify them. Alternatively, you can enable automatic aggregations with a single click on a toggle button on the Dataset Settings page, as shown in the following screenshot. Automatic aggregations are system-managed and dynamic. Power BI analyzes a query log that tracks all the BI queries for the dataset over a seven-day window to automatically create and adapt aggregations according to the actual usage patterns. We call this AI-driven process automatic aggregations training. So, you no longer require query-optimization skills to set up and benefit from aggregations. In comparison to pure DirectQuery reports without aggregations, you will see a significant boost in query performance thanks to the blazing-fast aggregations caches that Power BI maintains automatically for you on an ongoing basis.

While it is easy to enable automatic aggregations on the Dataset Settings page, we learned during the public preview period that customers want more insights and control over the automatic aggregations training process. For this reason, we are introducing the following improvement with the GA release:

Additional information about requirements and unsupported configurations in the user interface. For example, your dataset might use a data source enabled for single sign-on (SSO) in which case the UI will let you know that any tables that use this data source will be skipped. In another common situation, you might upload a new dataset, enable automatic aggregations, and expect query performance to improve immediately. However, Power BI did not yet have a chance to track query patterns because no one used the dataset yet. With an empty query log, Power BI cannot determine any useful aggregations, and additional information in the user interface will make that clear. The following screenshot illustrates this situation.
Ability to launch automatic aggregations training manually. By default, Power BI performs training as part of the first scheduled data refresh operation for your selected frequency (Day or Week), but you might want to launch it manually in some cases, too. For example, you might have started with an empty query log, then interacted with a report to generate some queries, and now you don’t want to wait until the next day to generate aggregations. To perform training immediately, click on Train and Refresh Now. Note that you are launching both, a training process to process the query log and create, update, or drop aggregations tables as needed as well as a refresh process to import the aggregations data into these tables.
More information in the refresh history. The same information displayed in the UI about requirements and unsupported configurations, you can also find in the data refresh history. For example, if you enable SSO for a data source later and a dataset with automatic aggregations happens to use this data source, you will find a corresponding warning about SSO-enabled data sources from now on in the refresh history. Training and refresh still succeed, but the query performance might be less optimal because automatic aggregations skip tables that use SSO-enabled data sources.
Incremental automatic aggregations training. Automatic aggregations training can put significant load on the data source as special queries are sent to the data source to determine aggregations to be included in the cache. The training operation has a 60-minute time limit, which might not be sufficient depending on the data source performance. If training is unable to process the entire query log within the time limit, a notification is logged in the refresh history. More importantly, Power BI now resumes training where it left off the next time training is launched. The training cycle completes and replaces the existing automatic aggregations when the entire query log is processed.
Tabular Object Model (TOM) and Tabular Model Scripting Language (TMSL) support coming soon. Advanced BI pros will be delighted to fine tune automatic aggregations beyond what the UI provides. For example, the UI always performs training and refresh together to ensure the generated aggregations tables do in fact get filled with data. But you can soon also use TOM and TMSL to launch training separately. For example, you might want to perform automatic aggregations training repeatedly without data refresh until the incremental training has indeed finished the entire query log, and then perform a single data refresh operation. You can also define the desired query coverage, set min and max rows limits for tables that should be included, and exclude entire tables if desired, as the following code snippet demonstrates. Note that you need new AS client libraries, which we are planning to release in the coming months.

TOM.Model model = database.Model;

model.AutomaticAggregationOptions = "{ \"queryCoverage\": 0.9, \"detailTableMinRows\": 100M }";


TOM.Table table = model.Tables[tableName];

table.ExcludeFromAutomaticAggregations = true;


model.SaveChanges();

With these improvements and APIs, we hope that you can take your big data reports and solutions to the next level by enabling automatic aggregations for every eligible DirectQuery dataset. In addition to Azure Synapse Analytics, Snowflake, and Google BigQuery, we are actively working on expanding the list of supported data sources so that even more DirectQuery datasets can take advantage of this exciting capability to boost query performance based on blazing-fast aggregations caches that Power BI maintains automatically for you. So, stay tuned for even more improvements and innovations in this area. For additional details, check out the Automatic aggregations overview in the product documentation.

Blog consacré à Microsoft Power BI