
Aggregations for petabyte-scale BI is generally available!

By Josh Caplan

Today, we are announcing general availability (GA) of one of the most significant scalability features in Microsoft BI: aggregations. Aggregations empower organizations to enable every user to perform their own analysis atop full-fidelity, petabyte-scale datasets. At the same time, those organizations now have better control over their BI architecture, striking a better balance between cost, performance, and data fidelity. In addition, we are announcing that aggregations can be used for enterprise-grade semantic models in large organizations with fine-grained security requirements by leveraging Power BI’s row-level security (RLS) feature.

Providing users a way to do analysis over large volumes of data has traditionally been challenging. Large volumes require extra compute power and specialized skills to effectively model and analyze. To unblock users, organizations have often relied on transforming big datasets into smaller, more manageable datasets, which can then be analyzed using traditional BI reporting and visualization tools. When data is transformed from big to small, the details of the data tend to get lost as dimensionality is traded away for size and performance. While insights are lost, management complexity and costs grow as development teams work to populate these multiple smaller datasets.

Aggregations enhance Power BI’s DirectQuery feature, which enables you to create a Power BI dataset directly over a data warehouse or data lake without needing to copy that data into Power BI. The semantic model within the Power BI dataset creates a user-friendly view of the data, enabling users to do their own analysis by simply dragging and dropping fields onto a report. Users are not required to understand how the data is stored, or to know query languages like T-SQL or Spark SQL. With DirectQuery alone, however, the performance of these queries depends on how quickly the underlying datastore can serve them. Aggregations take this a step further by allowing you to accelerate these queries using Power BI’s fast in-memory caching layer. With in-memory caching, you can accelerate common queries to sub-second speed, while taking pressure off the underlying data source so that it can serve up detailed data when required. Users can access all their data in one Power BI dataset.
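The idea of answering common queries from a small in-memory summary and falling back to the detailed source only when needed can be illustrated with a toy Python sketch. This is a conceptual illustration only, not how Power BI implements aggregations: the sample rows, the column names, and the `group_sum`, `build_aggregation`, and `query` helpers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical detail rows standing in for a large DirectQuery source.
DETAIL_ROWS = [
    {"year": 2019, "category": "Bikes", "customer": "A", "sales": 100.0},
    {"year": 2019, "category": "Bikes", "customer": "B", "sales": 50.0},
    {"year": 2019, "category": "Helmets", "customer": "A", "sales": 20.0},
    {"year": 2020, "category": "Bikes", "customer": "C", "sales": 70.0},
]

# Grain of the (assumed) in-memory aggregation table.
AGG_COLUMNS = ("year", "category")


def group_sum(rows, group_by, measure="sales"):
    """Sum `measure` over `rows`, grouped by the columns in `group_by`."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[col] for col in group_by)
        totals[key] += row[measure]
    return dict(totals)


def build_aggregation(rows):
    """Pre-summarize the detail rows at the AGG_COLUMNS grain."""
    agg = group_sum(rows, AGG_COLUMNS)
    return [dict(zip(AGG_COLUMNS, key), sales=total) for key, total in agg.items()]


AGG_ROWS = build_aggregation(DETAIL_ROWS)


def query(group_by):
    """Return (source, result), using the aggregation when it covers the query."""
    if set(group_by) <= set(AGG_COLUMNS):
        # Every requested column exists in the summary: answer from memory.
        return "aggregation", group_sum(AGG_ROWS, group_by)
    # Otherwise the query needs detail the summary lacks: scan the source.
    return "detail", group_sum(DETAIL_ROWS, group_by)


print(query(["year"]))      # served from the small summary table
print(query(["customer"]))  # falls through to the detail rows
```

In this sketch, a query grouped by year is served from the three-row summary, while a query grouped by customer has to scan the detail rows because the summary does not carry that column.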

Datasets with aggregations and RLS are supported by the Power BI service on both shared (Pro) and Premium capacities.

Be sure to watch this webcast, which provides a detailed discussion on how to work with aggregations and RLS, and a review of aggregations on big data:


The updated aggregations documentation is available at https://aka.ms/Aggregations