We have some exciting news to share with you today! We have completed the replica synchronization feature and finalized the APIs for Dataset Scale-Out, giving you more control over the scale-out configuration and replica synchronization behavior. Specifically, you no longer need to enable Scale-Out at the workspace level with a burdensome XMLA request. The XMLA command introduced in the initial public preview announcement is deprecated and will no longer work. Instead, you enable Scale-Out dataset by dataset using the Power BI REST API for datasets, which does not require XMLA. You also no longer need to synchronize read-only replicas manually, because automatic replica synchronization is now available and enabled by default. You can still disable automatic synchronization, as demonstrated later in this article, if you prefer to synchronize the read/write and read-only replicas of a dataset manually to maintain controlled refresh isolation. In fact, any datasets that you already have in a scale-out-enabled workspace remain scale-out enabled, with automatic synchronization disabled, so that they keep their existing behavior.
Let’s take a quick tour through the configuration of Dataset Scale-Out. For additional information, see the initial blog post Announcing the Public Preview of Power BI Dataset Scale-Out published in January, and also refer to Power BI Dataset Scale-Out in the product documentation for detailed step-by-step instructions to configure scale-out and test refresh isolation.
The following explanations assume that Dataset Scale-Out is enabled in your Power BI organization (which it is by default), that the workspace of your datasets resides on Premium Per User (PPU), Power BI Premium (A or P SKU), or Fabric capacity (F SKU), that your datasets are configured to use the large storage format as in the screenshot below, and that you have installed the Power BI Management cmdlets.
The URLs in the screenshots above indicate that this article’s workspace has the ID 50007062-0bed-46f4-a1b4-f24dcfb0912b and that the dataset name is AdventureWorks. The PowerShell sample script uses the Get-PowerBIDataset cmdlet to resolve the dataset name into the dataset ID. The workspace ID and the dataset ID are the key parameters of the Get Dataset In Group Power BI REST API, which returns the current dataset configuration. The Invoke-PowerBIRestMethod cmdlet provides a straightforward way to submit the corresponding GET request and retrieve the results in JSON format, including the queryScaleOutSettings properties, which represent the scale-out configuration. As the screenshot below reveals, dataset scale-out is currently disabled for this article’s dataset because the maxReadOnlyReplicas count is zero.
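As a rough illustration of that check, here is a minimal Python sketch. It only builds the relative request path for the Get Dataset In Group API and interprets a JSON response; it does not call the service. The dataset ID is a placeholder, the helper names are hypothetical, and the sample response is hand-written to mirror the state described above rather than captured from a real call.

```python
import json

# Workspace ID taken from this article; the dataset ID is a placeholder (assumption).
WORKSPACE_ID = "50007062-0bed-46f4-a1b4-f24dcfb0912b"
DATASET_ID = "00000000-0000-0000-0000-000000000000"


def dataset_path(workspace_id: str, dataset_id: str) -> str:
    """Relative path of the Get Dataset In Group request."""
    return f"groups/{workspace_id}/datasets/{dataset_id}"


def is_scale_out_enabled(response_text: str) -> bool:
    """Scale-out is enabled when maxReadOnlyReplicas is non-zero."""
    settings = json.loads(response_text).get("queryScaleOutSettings", {})
    return settings.get("maxReadOnlyReplicas", 0) != 0


# Hand-written sample response (illustrative only, not real service output).
sample_response = json.dumps({
    "id": DATASET_ID,
    "name": "AdventureWorks",
    "queryScaleOutSettings": {
        "autoSyncReadOnlyReplicas": True,
        "maxReadOnlyReplicas": 0,
    },
})

print(dataset_path(WORKSPACE_ID, DATASET_ID))
print(is_scale_out_enabled(sample_response))
```

The path returned by `dataset_path` is the kind of relative URL you would hand to Invoke-PowerBIRestMethod with a GET request.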
To enable scale-out for an individual dataset, you must set the maxReadOnlyReplicas parameter to a non-zero value. A value of -1 lets Power BI create as many read-only replicas as the Power BI capacity supports. You can also explicitly set the replica count to a value that is lower than the capacity maximum. Note, however, that Dataset Scale-Out is currently limited to a single read-only replica; we will remove this limitation at a later stage during the public preview. Setting maxReadOnlyReplicas to -1 is recommended. The following screenshot shows the PATCH request to set maxReadOnlyReplicas to -1. Submitting this REST API request enables scale-out for this dataset.
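To make the request body concrete, the following Python sketch builds the JSON payload for such a PATCH request. The function name and its validation rules are hypothetical conveniences; only the queryScaleOutSettings and maxReadOnlyReplicas property names come from the API described in this article.

```python
import json


def enable_scale_out_body(max_read_only_replicas: int = -1) -> str:
    """Build the PATCH body that enables scale-out for a dataset.

    -1 asks Power BI to create as many replicas as the capacity supports;
    a positive number sets an explicit upper bound.
    """
    if max_read_only_replicas == 0:
        raise ValueError("0 disables scale-out; use -1 or a positive count")
    if max_read_only_replicas < -1:
        raise ValueError("use -1 or a positive replica count")
    return json.dumps(
        {"queryScaleOutSettings": {"maxReadOnlyReplicas": max_read_only_replicas}}
    )


print(enable_scale_out_body())
```

The resulting string is what you would send as the body of the PATCH request against the dataset's URL, for example through the -Body parameter of Invoke-PowerBIRestMethod.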
In the screenshot above, it is also worth pointing out that the autoSyncReadOnlyReplicas parameter is set to true by default, so Power BI synchronizes the dataset replicas automatically. This is also a recent Dataset Scale-Out improvement. The initial public preview release always required you to sync the replicas manually, as documented in Sync a read-only dataset scale-out replica. We have now lifted this limitation. Yet, as mentioned at the beginning of this article, if you prefer to keep syncing manually by using the dataset syncStatus and sync REST APIs, you can disable auto sync by setting autoSyncReadOnlyReplicas to false, as in the following screenshot.
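For reference, a request body that toggles automatic synchronization could be assembled as in this small Python sketch. The helper name is hypothetical; the property names are the ones discussed above.

```python
import json


def auto_sync_body(enabled: bool) -> str:
    """Build the PATCH body that turns automatic replica sync on or off."""
    return json.dumps({"queryScaleOutSettings": {"autoSyncReadOnlyReplicas": enabled}})


# Disable auto sync to keep synchronizing replicas manually.
print(auto_sync_body(False))
# prints {"queryScaleOutSettings": {"autoSyncReadOnlyReplicas": false}}
```

Because the body carries only autoSyncReadOnlyReplicas, the sketch assumes the service leaves maxReadOnlyReplicas untouched; if in doubt, send both properties in the same request.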
Of course, you can also disable scale-out for this dataset again by setting the maxReadOnlyReplicas parameter back to 0. If you then check the sync status of a scale-out-disabled dataset, the response tells you that read-only replicas have been disabled, as in the following screenshot. As a side note, if you worked with the sync APIs in the initial preview release, you might notice that the URLs of the /syncStatus and /sync APIs have changed to /queryScaleOut/syncStatus and /queryScaleOut/sync. The initial /syncStatus and /sync APIs are deprecated and should no longer be used.
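If you have existing scripts that call the deprecated endpoints, the URL change is mechanical. The following Python sketch shows one possible way to rewrite an old path to its /queryScaleOut equivalent; the function name is hypothetical, and the example paths use short placeholder IDs.

```python
def migrate_sync_path(path: str) -> str:
    """Rewrite a deprecated /syncStatus or /sync path to the /queryScaleOut form.

    Paths that already use /queryScaleOut, or that end in something else,
    are returned unchanged.
    """
    for suffix in ("/syncStatus", "/sync"):
        already_new = path.endswith("/queryScaleOut" + suffix)
        if path.endswith(suffix) and not already_new:
            return path[: -len(suffix)] + "/queryScaleOut" + suffix
    return path


# Placeholder workspace and dataset IDs (assumptions for illustration).
old_status = "groups/WORKSPACE_ID/datasets/DATASET_ID/syncStatus"
old_sync = "groups/WORKSPACE_ID/datasets/DATASET_ID/sync"

print(migrate_sync_path(old_status))
print(migrate_sync_path(old_sync))
```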
And that’s it for a quick update to the public preview of Dataset Scale-Out. The next big rock is to increase the number of read-only replicas up to the maximum a given capacity size can possibly support. So, stay tuned and update the configuration of your scale-out-enabled datasets now, as demonstrated above. And as always, please provide us with feedback if you want to help deliver additional enhancements. We hope you are as excited about Dataset Scale-Out as we are. We are looking forward to hearing from you!