Power BI enables organizations to adopt a data-driven culture where every person can get value from data. With the massive amounts of self-service data generated in Power BI, our Power BI customers tell us about a number of emerging challenges:
- How to allow self-service but still govern data efficiently
- How to help users discover the right data to use
- How to reduce data duplication
We’re excited to announce that with this release we’ve made it easier for you to get the information you need from Power BI to address and overcome these challenges.
We are now releasing new Power BI Admin APIs, along with a .NET SDK, that enable administrators to build their own custom solutions based on Power BI metadata and lineage. The idea for this functionality came from looking at how to improve the efficiency and performance of the Power BI scan so that it could support large numbers of data assets while still conforming to organizations’ security practices.
Service Principal authentication for read-only scanner Admin APIs
To better support the security constraints of some organizations, we added service principal support for the scanner Admin APIs. Service principal is an authentication method that can be used to let an Azure AD application access Power BI APIs. With this authentication method, you no longer have to maintain a service account with an admin role. Rather, to allow your app to use the Admin APIs, you just have to give your approval once as part of the tenant settings configuration.
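For illustration, here is a minimal sketch of how an Azure AD application might obtain a token as a service principal, using the standard OAuth 2.0 client credentials flow against the Microsoft identity platform v2.0 token endpoint with the Power BI `.default` scope. It assumes the tenant setting described above has already been enabled for the app. The function names and the tenant, client ID, and secret values are hypothetical placeholders; in production you would more likely use a library such as MSAL and keep the secret in a vault.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"
POWER_BI_SCOPE = "https://analysis.windows.net/powerbi/api/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request for Power BI."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": POWER_BI_SCOPE,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL.format(tenant=tenant_id),
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def fetch_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    """Send the request and return the access token (performs a network call)."""
    with urllib.request.urlopen(build_token_request(tenant_id, client_id, client_secret)) as resp:
        return json.load(resp)["access_token"]
```

No service account or admin user is involved: the app authenticates with its own credentials, which is what removes the need to maintain a service account with an admin role.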
To get an idea of the value of service principal authentication, we can look at the case of Collibra. Collibra, a data intelligence company with industry-leading data catalog and governance tools, just announced the integration of Power BI with Collibra Data Catalog. With this integration, Power BI customers who use Collibra can enjoy the benefits of having the metadata and lineage info of their Power BI assets in the Catalog, empowering them to make better data-driven decisions, as well as to govern their data more effectively.
“We discovered the absolute necessity of the service principal in the early stages of our Power BI integration development,” Yulia Prylypko, a product manager at Collibra, told us. “Without service principal, we couldn’t have fulfilled our customers’ security requirements, but with this support we have unblocked customers so that they can start using Power BI in Collibra.”
To enable service principal access to read-only Admin APIs, read more here.
Asynchronous unified scanning APIs
In the past, to get a full scan of Power BI assets, data source metadata, and lineage, you had to call multiple APIs. Now we’ve released unified Async APIs that can get you all the required metadata and lineage information in an efficient, reliable way.
We learned that tenants can hold massive amounts of data, so, to avoid failures when returning the metadata and to improve scanning time, we implemented the APIs asynchronously. The APIs were designed with full-tenant scans in mind, and their server-side efficiency has improved dramatically. A full scan of a large tenant can now take just minutes or hours, instead of days or weeks as in the past, and the number of failures has gone down significantly.
Incremental scan
We understand that customers need a scheduled scan that gets the required information from Power BI and provides an up-to-date picture of what’s going on in the Power BI tenant. We can distinguish which of a customer’s workspaces change infrequently and stay the same most of the time. In these “static workspaces”, the data might get refreshed, but the associated metadata stays the same. With this understanding, we designed support for incremental scans, giving customers the flexibility to scan only the workspaces that have changed since they were last scanned. An incremental scan can reduce scanning time significantly and save resources, both for the customer and for the Power BI service. Note: an incremental scan can look back at most 30 days, and no earlier than the feature roll-out on December 10th.
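One way a scanning job might respect the 30-day limit is to check its stored checkpoint before each run and fall back to a full scan when the checkpoint is too old. The sketch below assumes timezone-aware datetimes and an ISO 8601 UTC timestamp format for modifiedSince; the function name is hypothetical, and the December 10th roll-out floor is not encoded because the year is not given here, so a caller would need to add that check as well.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Incremental scans can look back at most 30 days.
MAX_LOOKBACK = timedelta(days=30)


def modified_since_param(last_scan_start: datetime, now: datetime) -> Optional[str]:
    """Return a modifiedSince value, or None when a full scan is needed instead."""
    if last_scan_start.tzinfo is None or now.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    if now - last_scan_start > MAX_LOOKBACK:
        return None  # checkpoint too old: scan the whole tenant again
    # Assumed format: UTC, second precision, trailing "Z".
    return last_scan_start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Returning None rather than raising keeps the scheduling logic simple: the job either sends the returned value or omits modifiedSince and performs a full scan.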
Endorsement (Certified and Promoted) labels
To better support the discovery of high-quality data, the API now returns endorsement information, if any, for dataflows, datasets, and reports. This makes it easy to get a clear overall picture of endorsed content in your organization. Read more about the new endorsement capabilities here.
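As an example of that overall picture, the sketch below counts endorsements per artifact type in a single scan result. It assumes the result is parsed JSON with a top-level `workspaces` list, per-workspace `datasets`, `reports`, and `dataflows` lists, and an `endorsementDetails` object whose `endorsement` field holds "Certified" or "Promoted"; these field names are assumptions to verify against the API reference.

```python
from collections import Counter
from typing import Any, Dict

# Artifact types that carry endorsement information (per the text above).
ENDORSABLE = ("datasets", "reports", "dataflows")


def endorsement_summary(scan_result: Dict[str, Any]) -> Counter:
    """Count (artifact type, endorsement) pairs; unendorsed items count as "None"."""
    counts: Counter = Counter()
    for ws in scan_result.get("workspaces", []):
        for key in ENDORSABLE:
            for item in ws.get(key, []):
                # Assumed shape: {"endorsementDetails": {"endorsement": "Certified"}}
                details = item.get("endorsementDetails") or {}
                counts[(key, details.get("endorsement", "None"))] += 1
    return counts
```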
Sensitivity labels
If you use sensitivity labels in Power BI to protect your data, you might find it useful to extract and use this information in your customized scanning solution.
The new APIs return the sensitivity label ID for each labeled artifact. You can use it to create your own report to see how well your data is protected.
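A simple protection report could start from label coverage: the fraction of each artifact type that carries any sensitivity label. This sketch assumes each labeled artifact in the scan result has a `sensitivityLabel` object with a `labelId` field and that dashboards are listed under `dashboards`; both are assumptions, and mapping label IDs to display names is left out.

```python
from collections import Counter
from typing import Any, Dict

# Artifact collections assumed to appear in each workspace of a scan result.
LABELABLE = ("datasets", "reports", "dashboards", "dataflows")


def label_coverage(scan_result: Dict[str, Any]) -> Dict[str, float]:
    """Fraction of artifacts per type that have a sensitivity label ID."""
    labeled: Counter = Counter()
    total: Counter = Counter()
    for ws in scan_result.get("workspaces", []):
        for key in LABELABLE:
            for item in ws.get(key, []):
                total[key] += 1
                # Assumed shape: {"sensitivityLabel": {"labelId": "<guid>"}}
                if (item.get("sensitivityLabel") or {}).get("labelId"):
                    labeled[key] += 1
    # Only types that actually appear in the result are reported.
    return {key: labeled[key] / total[key] for key in total}
```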
Walkthrough: Scan using the new Admin APIs
Step 1: Before you start, decide which authentication method you’d like to use with the APIs. You can choose to use the Power BI service Admin delegated token as before, or to use the new Service Principal support for read-only Admin APIs.
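Whichever method you choose, the resulting access token is sent as a Bearer token in the Authorization header. The sketch below wraps that in a small helper using only the standard library; `build_request` and `make_send` are hypothetical names, and returning `(headers, parsed_json)` is a design choice made here so callers can read the location header the scanning APIs return.

```python
import json
import urllib.request
from typing import Any, Optional, Tuple


def build_request(token: str, method: str, url: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Attach the token as a Bearer credential and JSON-encode any body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def make_send(token: str):
    """Return a send(method, url, body) callable that performs the HTTP call."""
    def send(method: str, url: str, body: Optional[dict] = None) -> Tuple[Any, Any]:
        with urllib.request.urlopen(build_request(token, method, url, body)) as resp:
            payload = resp.read()
            # resp.headers is case-insensitive, so headers["Location"] works.
            return resp.headers, (json.loads(payload) if payload else None)
    return send
```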
Step 2: Perform a full scan.
Call workspaces/modified without modifiedSince to get the complete list of workspace IDs in the tenant. This retrieves all the workspaces in the tenant, including classic workspaces, personal workspaces, and new workspaces.
Divide the list into chunks of 100 workspaces at most.
For each chunk of 100 workspaces:
Call workspaces/getInfo to trigger a scan of these workspaces (at most 100). The response contains the scan ID, which you use in the next steps, and its location header contains the URI to call next. The getInfo call supports the following optional query-string parameters (both default to false):
- lineage=true to receive the lineage info for all the artifacts returned
- datasourceDetails=true to receive data source details for datasets and dataflows
Use the URI from the location header returned by workspaces/getInfo to poll workspaces/scanStatus/{scan_id} until the returned status is “Succeeded”, which means the scan result is ready. We recommend a polling interval of 30-60 seconds. The scanStatus response also carries a location header with the URI for the next call; use it only once the status is “Succeeded”.
Use the URI from the location header of that final scanStatus response to read the data with workspaces/scanResult/{scan_id}. The data contains the workspaces list, artifact information, and any additional metadata requested through the parameters passed to workspaces/getInfo.
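The full-scan steps above can be sketched as follows. This is a minimal sketch, not an SDK: `send` is a hypothetical transport callable taking `(method, url, json_body)` and returning `(response_headers, parsed_json)`, the request body shape `{"workspaces": [...]}`, the `id` field in the workspaces/modified response, and treating "Failed" as a terminal status are assumptions to check against the API reference. `sleep` is injectable so the polling loop can be tested without waiting.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple

BASE = "https://api.powerbi.com/v1.0/myorg/admin"
# Hypothetical transport: send(method, url, json_body) -> (headers, parsed_json)
Send = Callable[[str, str, Optional[dict]], Tuple[Dict[str, str], Any]]


def list_all_workspace_ids(send: Send) -> List[str]:
    # No modifiedSince: returns every workspace in the tenant.
    _, body = send("GET", f"{BASE}/workspaces/modified", None)
    return [ws["id"] for ws in body]


def scan_chunk(send: Send, workspace_ids: List[str], lineage: bool = True,
               datasource_details: bool = True, poll_seconds: float = 30,
               max_polls: int = 120, sleep=time.sleep) -> Any:
    if len(workspace_ids) > 100:
        raise ValueError("getInfo accepts at most 100 workspaces per call")
    query = f"lineage={str(lineage).lower()}&datasourceDetails={str(datasource_details).lower()}"
    # 1. Trigger the scan; the location header points at scanStatus/{scan_id}.
    headers, _ = send("POST", f"{BASE}/workspaces/getInfo?{query}",
                      {"workspaces": list(workspace_ids)})
    status_url = headers["Location"]
    # 2. Poll until Succeeded; only then is the scanResult location usable.
    for _ in range(max_polls):
        headers, status = send("GET", status_url, None)
        if status["status"] == "Succeeded":
            result_url = headers["Location"]
            break
        if status["status"] == "Failed":
            raise RuntimeError(f"scan {status.get('id')} failed")
        sleep(poll_seconds)
    else:
        raise TimeoutError("scan did not finish within max_polls")
    # 3. Read the metadata (and lineage, if requested).
    _, result = send("GET", result_url, None)
    return result


def full_scan(send: Send, **kwargs) -> List[Any]:
    ids = list_all_workspace_ids(send)
    return [scan_chunk(send, ids[i:i + 100], **kwargs) for i in range(0, len(ids), 100)]
```

Chunks are scanned one after another here for clarity; a production job might trigger several chunks before polling, subject to the service’s rate limits.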
Step 3: Perform incremental scan.
Now that you have all the workspaces and the metadata and lineage of their assets, we recommend running only incremental scans, each referencing the previous scan.
Call workspaces/modified with modifiedSince set to the start time of the last scan to get the workspaces that have changed and therefore need to be scanned again.
Split this list into chunks of up to 100 workspaces, and get the data for these changed workspaces using the three API calls (getInfo, scanStatus, and scanResult) described in Step 2.
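A sketch of the incremental loop follows. One detail worth getting right is recording the new checkpoint before querying for changes, so that anything modified while the scan runs is picked up next time. As before, `send` is a hypothetical `(method, url, json_body) -> (headers, parsed_json)` callable, `scan_chunk` stands for whatever performs the getInfo, scanStatus, and scanResult calls for one chunk, and the timestamp format is an assumption.

```python
from datetime import datetime, timezone
from typing import Callable, List, Optional
from urllib.parse import urlencode

BASE = "https://api.powerbi.com/v1.0/myorg/admin"


def changed_workspace_chunks(send, last_scan_start: datetime, chunk_size: int = 100) -> List[List[str]]:
    """Ask workspaces/modified for changes since the checkpoint; split into chunks."""
    since = last_scan_start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    _, body = send("GET", f"{BASE}/workspaces/modified?" + urlencode({"modifiedSince": since}), None)
    ids = [ws["id"] for ws in body]
    return [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]


def run_incremental(send, last_scan_start: datetime,
                    scan_chunk: Callable[[List[str]], object],
                    now: Optional[datetime] = None) -> datetime:
    """Scan changed workspaces and return the checkpoint for the next run."""
    # Capture the start time first, so changes made during this run are not missed.
    this_scan_start = now or datetime.now(timezone.utc)
    for chunk in changed_workspace_chunks(send, last_scan_start):
        scan_chunk(chunk)  # hypothetical: getInfo -> scanStatus -> scanResult
    return this_scan_start
```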
Wrapping things up
This is an exciting milestone for us and we are sure it is for you too! This release focused on enabling you to start building your homegrown scanning solutions based on the metadata and lineage of your tenant’s assets. We’ll continue working to enhance the Admin APIs with more information to address your needs.
API resources
Use the following links to locate resources that can help you get started using the new APIs: