
Aggregations for petabyte-scale BI available in the Power BI service

By Christian Wade

Possibly the biggest scalability feature (literally) in the history of Microsoft BI is actually here! Aggregations, which build on composite models and were announced in the October Power BI Desktop feature summary, are now supported in the Power BI service. They work on both shared (Pro) and Premium capacity.

Here is a summary of reasons to use aggregations:

  • Query performance over big data – as users interact with visuals on Power BI reports, DAX queries are submitted to the Power BI dataset. Boost query speeds by caching data at the aggregated level, using a fraction of the resources required at the detail level. Unlock big data in a way that would otherwise be impossible.
  • Data refresh optimization – reduce cache sizes and refresh times by caching data at the aggregated level. Speed up the time to make data available for users.
  • Achieve balanced architectures – allow the Power BI in-memory cache to handle aggregated queries, which it does effectively. Limit queries sent to the data source in DirectQuery mode, helping stay within concurrency limits. Queries that do get through tend to be filtered, transactional-level queries, which data warehouses and big-data systems normally handle well.
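To make "caching data at the aggregated level" concrete, here is a minimal Python sketch. It is hypothetical and is not how the Power BI engine is implemented: the table contents, column names and the functions `build_aggregation` and `sum_amount` are invented for illustration. The sketch pre-summarizes detail rows to a (date, product) grain. A query is answered from that small table when all of its group-by columns are covered by the grain. Otherwise it falls back to scanning the detail rows, which here stand in for a DirectQuery call to the source.

```python
from collections import defaultdict

# Hypothetical detail-level fact rows: (date, product, customer, amount).
COLUMNS = ("date", "product", "customer")
DETAIL = [
    ("2018-10-01", "Bikes", "C1", 100.0),
    ("2018-10-01", "Bikes", "C2", 50.0),
    ("2018-10-01", "Helmets", "C1", 20.0),
    ("2018-10-02", "Bikes", "C3", 75.0),
]


def build_aggregation(rows, grain):
    """Pre-summarize rows to the given grain, keeping SUM(amount) per key."""
    idx = [COLUMNS.index(c) for c in grain]
    table = defaultdict(float)
    for row in rows:
        table[tuple(row[i] for i in idx)] += row[3]
    return grain, dict(table)


def sum_amount(group_by, agg):
    """Return (source, result) for SUM(amount) grouped by `group_by`."""
    grain, table = agg
    result = defaultdict(float)
    if set(group_by) <= set(grain):
        # Aggregation hit: roll up the small pre-summarized table.
        idx = [grain.index(c) for c in group_by]
        for key, total in table.items():
            result[tuple(key[i] for i in idx)] += total
        return "aggregation", dict(result)
    # Aggregation miss: scan the detail rows (stand-in for a DirectQuery call).
    idx = [COLUMNS.index(c) for c in group_by]
    for row in DETAIL:
        result[tuple(row[i] for i in idx)] += row[3]
    return "detail", dict(result)


agg = build_aggregation(DETAIL, ("date", "product"))
print(sum_amount(("product",), agg))   # answered from the aggregation
print(sum_amount(("customer",), agg))  # customer is below the grain, so detail
```

The design point is that a query by product or date never touches the detail rows. Only queries that need finer grain, such as by customer, reach the source. In Power BI, this matching between an aggregation table and its detail table is configured in the model rather than written by hand.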

Aggregations are in public preview. Here are the items that remain before we make the feature generally available.

  • Incremental refresh with aggregations/composite models is not supported in the Power BI service just yet. It will be soon. [Update: As of November 2018, aggregations combined with incremental refresh in the service works for SQL (Azure SQL, Azure SQL DW, SQL Server, APS), Oracle and Teradata data sources.]
  • Aggregations cannot currently be created in a dataset that also uses row-level security (RLS). We are actively working on allowing this.

Here is a recording of the trillion-row demo. It shows how aggregations enable “clicky clicky draggy droppy” data analysis over massive datasets. We’d like to remind everyone that you don’t actually need a trillion rows to make use of this feature! Any dataset that is expensive to refresh in terms of time, money, memory usage, etc. is likely to benefit greatly from aggregations.

This video is a deep dive into how to set up aggregations for both dimensional and big-data models.

The aggregations documentation is available here: