Power BI is an AI and BI platform that allows you to transform your data into actionable analytics. A key pillar of this platform is dataflows, our self-service data prep solution that helps you collect, clean, combine and enrich your data. To improve the scale of these analyses, we are turning on the enhanced compute engine by default for all new dataflows in all newly provisioned capacities, the next step in our roadmap for enhancing the speed and performance of your dataflows. This change is currently being rolled out, and we expect it to be complete by the end of October. The enhanced compute engine in Power BI Dataflows enables Power BI Premium subscribers to:
- Speed up refresh operations when computed entities or linked entities are involved
- Enable DirectQuery (DQ) connectivity over dataflows by leveraging the compute engine
- Achieve improved performance in the transformation steps of dataflows when entities are cached within the compute engine
How it works:
The enhanced compute engine improves performance for multiple scenarios by loading dataflow entity data into a SQL-based cache. Using SQL clustered columnstore indexes and other optimizations, we target up to a 20x improvement in query processing. In Premium, computed entities and DirectQuery connections against the dataflow can then be fulfilled by reading from the cache, instead of reading from storage and flat files as dataflows in Power BI Pro do.
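To make the idea concrete, here is a toy sketch of answering a computed entity from a SQL cache rather than from flat files. It uses Python's built-in `sqlite3` purely as a stand-in: it is not the engine Power BI uses, it has no columnstore indexes, and the `orders` entity, its rows, and the `computed_entity_totals` helper are all hypothetical names invented for illustration.

```python
import sqlite3

# Hypothetical entity rows, as they might arrive from an ingestion dataflow.
orders = [("north", 120.0), ("south", 80.0), ("north", 30.0)]

# In-memory SQLite database standing in for the SQL-based cache.
cache = sqlite3.connect(":memory:")
cache.execute("CREATE TABLE orders (region TEXT, amount REAL)")
cache.executemany("INSERT INTO orders VALUES (?, ?)", orders)


def computed_entity_totals(conn: sqlite3.Connection) -> dict:
    """A 'computed entity': a group-by answered from the cached table."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
    return dict(cursor.fetchall())


totals = computed_entity_totals(cache)
```

The point of the sketch is only the shape of the flow: the entity is loaded once into a queryable store, and downstream aggregations are pushed to that store as SQL instead of re-reading and re-parsing files.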
How to leverage the gains:
For new capacities that make use of dataflows, the engine will be enabled by default. To take full advantage of it, there are a few things you can do to ensure your dataflow workloads benefit from optimized performance. You can tune the performance of the workload through the capacity settings for dataflows. Below is an overview of each setting and some high-level guidance:
- The Compute Engine Memory (%) setting allows you to configure the percentage of memory allocated to the compute engine. The default value is 30%, meaning that the compute engine is permitted to utilize 30% of your dataflow memory. Adjusting this value can be useful when you have a lot of computed entities in your dataflows and need to do many complex computations.
- The Max Memory setting on your Premium capacity’s Dataflow workload sets the percentage of memory that the dataflows workload can utilize. As you scale up dataflows, you can raise this percentage from the default of 20% to the full 100%. At 100%, the Premium capacity can scale dynamically to allocate as much memory as possible to Dataflows. This can be helpful when you have complex data prep needs with large-scale data.
- The Container Size setting for Dataflows configures the virtual containers that are assigned to each entity dynamically, based on your data prep logic. Increasing this setting from the default value can help improve performance on slower refreshes, mitigate timeouts, or boost entity computation performance. To determine whether this change is needed, use the Power BI Premium Capacity Metrics app to analyze Dataflow workload performance.
- Make sure that you are building dataflows according to best practices and guidelines:
  - Separate your blocks of work into distinct dataflows, such as ingestion, transformation, enrichment, and consumption
  - Keep in mind that the enhanced compute engine is used wherever a computed entity is leveraged, such as in the transformation and consumption steps
And that’s it. Once you have configured these settings, you should see a performance improvement in any computed entity that performs complex operations, such as joins or group-by operations, in dataflows created from existing linked entities on the same capacity. You’ll also unlock DirectQuery capabilities if you need them.
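As a rough illustration of how the two percentage settings described above combine, here is a minimal arithmetic sketch. It assumes the compute engine share is taken from the dataflows workload's memory, as the setting description suggests; the `plan_memory` helper, the `DataflowMemoryPlan` type, and the 100 GB capacity figure are hypothetical and are not part of any Power BI API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataflowMemoryPlan:
    dataflow_gb: float        # memory available to the dataflows workload
    compute_engine_gb: float  # portion of that memory the compute engine may use


def plan_memory(capacity_gb: float,
                max_memory_pct: float = 20.0,
                compute_engine_pct: float = 30.0) -> DataflowMemoryPlan:
    """Estimate memory budgets from the Max Memory and Compute Engine Memory (%) settings.

    Defaults mirror the values quoted in this post (20% and 30%).
    """
    for name, pct in (("max_memory_pct", max_memory_pct),
                      ("compute_engine_pct", compute_engine_pct)):
        if not 0 <= pct <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {pct}")
    dataflow_gb = capacity_gb * max_memory_pct / 100
    compute_engine_gb = dataflow_gb * compute_engine_pct / 100
    return DataflowMemoryPlan(dataflow_gb, compute_engine_gb)


# Hypothetical 100 GB capacity with default settings.
default_plan = plan_memory(100)
# Same capacity with Max Memory raised to 100%.
scaled_plan = plan_memory(100, max_memory_pct=100)
```

Under these assumptions, the default settings leave the compute engine with 30% of 20% of capacity memory, which is why raising Max Memory is usually the first lever to consider when computed entities are memory-bound.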
If you do nothing:
For existing capacities: your dataflows continue to work as they do today. However, we strongly encourage you to consider enabling this feature, particularly if you are working with millions of rows of data.
For new Premium capacities: your dataflows will have this feature enabled. Do review the memory settings for the dataflows workload to better understand what levers you have to optimize performance.
What’s coming soon:
Our announcement of Power BI Premium Gen 2 continues our roadmap to increase the performance and scale of dataflows while making performance management easier, with automatic dataflows engine configuration and on-the-fly optimizations. As we plan this enhanced experience, we’re listening and eager to get your feedback to make it as enjoyable as possible. Have comments, feedback, or ideas for future improvements? We’d love to hear from you. You can vote on new features or upvote existing ideas here.
Next Steps:
- Create a dataflow
- Configure Power BI Premium Dataflow Workloads
- Developing complex dataflows
- Dataflows best practices and guidance