Skip to main content

Query parallelization helps to boost Power BI dataset performance in DirectQuery mode

Headshot of article author Kay Unkroth

We are excited to announce significant query processing improvements in the Power BI engine around query parallelization in DirectQuery mode, which can help to accelerate query response times, specifically for DAX and MDX expressions that generate multiple storage engine (SE) queries. If you know how DirectQuery works and how DAX and MDX queries fetch data through the storage engine, you might find the below details interesting. On the other hand, if you prefer to stay at a higher level, rest assured that Power BI uses reasonable defaults to achieve a good balance between query performance and data source load. If you notice that your DirectQuery reports happen to run faster, it might just be due to these query processing improvements.

The fundamental idea is to maximize query performance by parallelizing as many SE queries per DAX or MDX query as possible. This query parallelization reduces the impact of data source delays and network latencies on query performance. Query parallelization is enabled on Power BI Premium and Embedded SKUs with eight or more vCores. Power BI Pro and Power BI Premium and Embedded capacities with fewer vCores continue to process DAX and MDX queries sequentially.

Modern cloud-scale data warehouses can support a very high degree of concurrent connections and large numbers of parallel queries. Moreover, features such as automatic concurrency scaling enable these data warehouses to increase their compute power on an as-needed basis to meet the demand of even the highest peaks in query volume. For example, performance tests show that automatic concurrency scaling can help to improve overall query performance by up to approximately 80%. A query that once ran for over 3 minutes could now complete in about 33 seconds! And thanks to query parallelization on Power BI Premium, you can now take full advantage to accelerate your DirectQuery queries.

When optimizing query parallelization in DirectQuery mode, keep the following factors in mind, which influence the user experience (5) depicted in the below diagram:

  1. The number of Power Query evaluations, Power BI engine jobs, and active data source connections, which ultimately determine the number of simultaneous queries that Power BI can send to the data source.
  2. The total number of data-source queries in response to a user interaction. The visuals you place on a report canvas generate DAX queries that Power BI each translates into one or more storage engine queries, which, in DirectQuery mode, result in individual simultaneous or sequential data source queries.
  3. The response time of the data source for each individual query. Make sure that the achieved degree of query parallelization doesn’t overwhelm the data source as this could negatively impact DAX or MDX query performance.
  4. The Power BI lag time between a received response and the next sequential SE query. The max achievable degree of SE query parallelization also depends on the design of the DAX or MDX query. For example, a complex DAX expression might fetch some data first, process it, and then fetch some more based on the results. Make sure you optimize your DAX expressions with query parallelization in mind.

Note, however, that query parallelization is not unlimited. Power BI provides several safeguards to protect both the Power BI service and the data sources from overutilization:

  1. For starters, the maximum might be constrained if an on-prem data gateway is included in the communication path. For DirectQuery specifically, the MashupDQPoolContainerMaxCount setting determines the max number of simultaneous jobs, in other words, the max number of queries that the data gateway can run in parallel. For details, see Adjust gateway performance based on server CPU in the product documentation.
  2. In Power BI Desktop, you can configure a max number of simultaneous evaluations and a max number of concurrent jobs to constrain how many Power Query operations can be executed concurrently. For details, see Evaluation configuration settings in the product documentation.
  3. On a per-dataset basis, you can also set a DataSourceDefaultMaxConnections parameter in the dataset metadata to define how many simultaneous connections Power BI can open to each data source. For details, see DataSourceDefaultMaxConnections Property in the product documentation.
  4. In advanced datasets that define data sources directly in the metadata, you can also override the DataSourceDefaultMaxConnections parameter by setting a MaxConnections parameter for individual data sources. For details, see MaxConnections Property in the product documentation.
  5. Also, on a per-dataset basis, you can now define a max number of SE queries that a single DAX query can run in parallel. Set the Model.MaxParallelismPerQuery parameter in the dataset metadata to a non-zero value. For details, see MaxParallelismPerQuery Property in the product documentation.

And there you have it! A single Dax query cannot run more SE queries in parallel than the above 6 criteria allow. In other words, it is the minimum of the above 6 criteria that determines the max degree of SE query parallelism for a single Dax query.

Also note that—during the preview phase—the default value for the Model.MaxParallelismPerQuery parameter is 1, which effectively deactivates SE query parallelization for a single Dax query. If you want to enable SE query parallelization for individual DAX and MDX queries, refer to the following code snippet to configure the Model.MaxParallelismPerQuery parameter. This parameter can only be configured through TOM and TMSL.

using TOM = Microsoft.AnalysisServices.Tabular;

partial class Program
{
    static void Main(string[] args)
    {
        using (var server = new TOM.Server())
        {
            string workspaceUrl = "<URL to workspace>";
            string databaseName = "<database name>";

            // Connect to the dataset.
            server.Connect(workspaceUrl);

            TOM.Database database = server.Databases.FindByName(databaseName);
            if (database == null)
            {
                throw new ApplicationException("Database cannot be found!");
            }

            if (database.CompatibilityLevel < 1569)
            {
                database.CompatibilityLevel = 1569;
                database.Update();
            }

            TOM.Model model = database.Model;

            // The max number of parallel SE query for a single Dax query.
            model.MaxParallelismPerQuery = 10;
            model.SaveChanges();
        }

        Console.WriteLine("\nDone! Press any key to exit.");
        Console.Read();
    }
}

 

At General Availability (GA), we plan to change the default value for the Model.MaxParallelismPerQuery parameter to 0, for Automatic, to let Power BI choose the right degree of SE query parallelization based on the CPU configuration of the Premium SKUs, so you don’t have to. At that time, you no longer need to set this parameter manually to benefit from SE query parallelization. However, you will still be able to use the above code if you wish to control the configuration manually. For example, you could set the parameter back to 1 to disable SE query parallelization and achieve the sequential behavior for backward compatibility.

Power BI already processes simultaneous DAX queries in parallel, and with the new Model.MaxParallelismPerQuery parameter you can increase the query parallelization in Power BI even father to take full advantage of your cloud-scale-data-warehouse horsepower and accelerate your DirectQuery queries as best as possible. On the other hand, you can also constrain Power BI to sequential query execution by setting the max number of simultaneous jobs, the DataSourceDefaultMaxConnections in your dataset, or the MaxConnections parameter for your data source to 1 to accommodate data sources that prefer to answer a single query at a time. Whatever the query parallelization requirements, you can fine-tune the configuration of your datasets in Power BI to meet the requirements in your environment. So, please try out the new query parallelization capability and provide us with feedback if you want to help deliver additional enhancements to boost the performance of your BI workloads in DirectQuery mode. We are looking forward to hearing from you!