Introduction
Over the past few weeks we have released several new features in Dataflows, allowing users to more seamlessly ingest, prepare, and refresh data that can be widely reused by others across the Power Platform and Azure. Here’s a recap of all the features added or improved over this period. At the end, we’ll also provide a bigger picture view on our priorities and what’s coming soon.
Premium Per User
The Premium features of Dataflows, including the enhanced compute engine and incremental refresh, are now available to all users with a Premium Per User (PPU) license, which recently became generally available. Many of our customers are seeing significant performance gains, along with the extra features, when using PPU licenses instead of Pro. To learn more about Premium Per User, please review the online documentation and the Power BI pricing page.
Dataflows Authoring
Today users can author dataflows in a variety of ways – including using Power Query Online, or attaching to a data lake folder with data stored in CDM format. We have two updates here as part of the bigger investments we are making in authoring improvements.
The first update is a terminology refresh: in response to customer feedback and user research, we are updating some terminology in Dataflows to be more intuitive and easier to work with. The terminology updates are listed below, and you’ll see them reflected in the authoring experience as well.
TERMINOLOGY UPDATES
Legacy term | Current term |
Entity, entities | Table, tables |
Field, fields; Attribute, attributes | Column, columns |
Note: These terminology updates aren’t applicable to any APIs or web services for Dataflows, so any integrations you might have built programmatically will continue to work seamlessly.
While a subtle change, this is aimed at quality of life and making dataflows simpler to understand and use. This small investment also builds on the many investments we have made inside of Power Query Online around step folding indicators, visual data prep with diagram view, enhanced warnings, and other experience and quality of life improvements. It is another precursor to bigger changes we have coming around making the Dataflows authoring experience faster, more intuitive, and more enjoyable – from connectivity, to defining ETL steps, to Save and Close.
SUPPORT TO READ THE LATEST CDM MANIFEST
The second update allows users to read the latest Common Data Model format, the manifest. A Common Data Model manifest object, together with the document that contains it (*.manifest.cdm.json), is an organizing document that acts as an entry point directory pointing to the items in the Common Data Model folder. The manifest object describes the list of tables, giving a detailed schema description for each table, a collection of data partition files for each table, a list of the known relationships between tables, and potentially other nested sub-manifest objects.
Those familiar with the model.json file and the format that describes a Common Data Model folder will recognize the manifest as a similar but expanded concept. In fact, the Common Data Model object model offers backward compatibility with the model.json format. With this update, make sure that the name of your document is *default.manifest.cdm.json* and your files are in CSV format to enjoy read support for both file formats with Dataflows – supporting the subset of features supported by both formats. Learn more about the previous CDM model.json here and the new CDM manifest here.
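To make the manifest structure and the two requirements above (document name and CSV partitions) more concrete, here is a minimal Python sketch. The sample manifest is made up for illustration, and the JSON field names (`entities`, `entityName`, `dataPartitions`, `location`, `relationships`, `subManifests`) reflect our reading of the public CDM manifest format, so treat them as assumptions and check them against the CDM documentation. The `summarize_manifest` helper is hypothetical and not part of any Dataflows API.

```python
import json

# Hypothetical manifest content for illustration only. Field names are
# assumed from the public CDM manifest format and may need adjusting.
SAMPLE_MANIFEST = """
{
  "manifestName": "default",
  "entities": [
    {
      "type": "LocalEntity",
      "entityName": "Customer",
      "entityPath": "Customer.cdm.json/Customer",
      "dataPartitions": [
        {"location": "Customer/part-0001.csv"},
        {"location": "Customer/part-0002.csv"}
      ]
    },
    {
      "type": "LocalEntity",
      "entityName": "Order",
      "entityPath": "Order.cdm.json/Order",
      "dataPartitions": [{"location": "Order/part-0001.parquet"}]
    }
  ],
  "relationships": [
    {
      "fromEntity": "Order.cdm.json/Order",
      "fromEntityAttribute": "customerId",
      "toEntity": "Customer.cdm.json/Customer",
      "toEntityAttribute": "customerId"
    }
  ],
  "subManifests": []
}
"""

# The document name the post asks for.
REQUIRED_FILE_NAME = "default.manifest.cdm.json"


def summarize_manifest(file_name, manifest_text):
    """List tables and partitions, and flag the two requirements from the post."""
    manifest = json.loads(manifest_text)
    tables = {}
    non_csv_partitions = []
    for entity in manifest.get("entities", []):
        locations = [p["location"] for p in entity.get("dataPartitions", [])]
        tables[entity["entityName"]] = locations
        # Collect every partition file that is not a CSV file.
        non_csv_partitions.extend(
            loc for loc in locations if not loc.lower().endswith(".csv")
        )
    return {
        "name_ok": file_name == REQUIRED_FILE_NAME,
        "tables": tables,
        "non_csv_partitions": non_csv_partitions,
        "relationship_count": len(manifest.get("relationships", [])),
        "sub_manifest_count": len(manifest.get("subManifests", [])),
    }


summary = summarize_manifest("default.manifest.cdm.json", SAMPLE_MANIFEST)
print(summary)
```

In this sample, the `Order` table would be flagged because its partition is not a CSV file. A check like this could run before attaching a folder, but it only inspects the manifest text and does not validate the schema documents the manifest points to.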
Dataflows Management
ENHANCED REFRESH METRICS
One of your biggest asks is to make Dataflows easier to manage and maintain. Improving the refresh history and stats is our first step towards better visibility into Dataflow performance. With the enhanced refresh history metrics, you can get better information about refreshes. Refresh History provides an overview of refreshes, including the type (on demand or scheduled), the duration, and the run status. To see details in the form of a CSV file, select the download icon on the far right of the refresh description’s row. The downloaded CSV includes the attributes described in the following table. Premium refreshes now provide more information based on dataflow capabilities like the enhanced compute engine and incremental refresh, which is especially useful for optimization and troubleshooting scenarios. Learn more about the Enhanced Refresh History here.
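As a rough illustration of how a downloaded refresh history CSV could be used for troubleshooting, the Python sketch below totals refresh duration per table and lists the tables with a failed run. The attribute list is not reproduced in this post, so the column names (`table_name`, `refresh_type`, `status`, `duration_seconds`), the status values, and the sample rows are all placeholders. Replace them with the headers and values in your actual export.

```python
import csv
import io
from collections import defaultdict

# Placeholder data: column names and values are assumptions, not the
# real export schema. Swap in the headers from your downloaded file.
SAMPLE_CSV = """table_name,refresh_type,status,duration_seconds
Customer,Scheduled,Completed,42
Order,Scheduled,Failed,130
Customer,OnDemand,Completed,38
"""


def summarize_refreshes(csv_text):
    """Return total duration per table and the tables with a non-completed run."""
    total_seconds = defaultdict(float)
    failed_tables = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        total_seconds[row["table_name"]] += float(row["duration_seconds"])
        if row["status"] != "Completed":
            failed_tables.append(row["table_name"])
    return dict(total_seconds), failed_tables


totals, failures = summarize_refreshes(SAMPLE_CSV)
print(totals)
print(failures)
```

For a real file, the same function can be called with the text read from disk, for example `summarize_refreshes(open(path, newline="").read())`, where `path` points to the downloaded CSV.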
ENHANCED WORKSPACE VIEWER ROLE
The ability to make dataflows more collaborative is highly requested. The following update is the first sneak peek at relaxing ownership requirements and permissions for key functions and allowing others to do more with Dataflows. Non-owners with at least the Viewer workspace role can now see Dataflow tables, their column names, and data types.
This feature, while subtle, is a precursor to better collaboration and editing experiences we have planned. Further updates we will make here will use workspace roles, so do get acquainted with them. Learn more about workspace roles here.
ADLS Gen 2 integration is Now Generally Available
In Power BI Dataflows, customers can build their own reusable data preparation pipelines. A key part of making this data reusable and accessible is our Azure Data Lake Storage integration. Today, we are making this integration Generally Available. This means customers can connect their Power BI dataflows to their own Azure Data Lake Storage at the tenant or workspace level. With this flexibility, we allow workspace admins to connect to an Azure subscription to bring their own ADLS Gen2 account. This will make it easier for departments to control and assign permissions as well as give flexibility to large organizations who may require multiple ADLS Gen2 accounts for different needs and purposes. There are many interesting extensibility use cases you can unlock with Dataflows, Azure, CDM, and Power Platform – and a few of our MVPs have been blogging about amazing scenarios using this feature that caught our eye:
- The BIccountant, a blog by MVP Imke Feldmann, describes how you can use this integration and the Power Query connectors for Azure Storage to read data from Dataflows in Excel. This is a great solution if you want to reuse data from Dataflows as a citizen data engineer or analyst, or encourage others in your organization to do the same. This is an awesome use case! Check out Imke’s blog and how-to walkthrough here.
- Datachant, a blog by MVP Gil Raviv, showcases how to build a data quality report using this feature – analyzing data in each of the snapshots, and automating data profiling using Power BI. We think this is a pretty cool use case – read more about how to build something like this for yourself here – Data Quality Automation with Power Query – DataChant.
New Dataflows Documentation
One of the top asks we have heard from the community is better documentation and guidance for key scenarios and features. Here’s some of our recently updated documentation:
- Dataflows Best Practices
- This article provides a list of curated best practices, with links to articles and other information that will help you understand and use dataflows to their full potential
- Understanding and Optimizing Dataflow Refreshes
- This article demystifies the refresh process and provides guidance and recommendations for common scenarios
- Using Azure Data Lake Gen 2
- This article explains how to configure the connection, the value proposition, details about how your data is stored, and more information about the CDM format
Coming Soon
Careful readers might have noticed a few teasers. Indeed, we have some exciting features that will be available in the coming months. Currently, we are laser focused on performance and reliability improvements, and delivering on YOUR top asks, including:
- Background Validation
- With this update, users will be able to Save a dataflow, leave the screen, and continue working. Validation will take place in the background so users can save the dataflow without friction.
- Multiple Contributors to Dataflows
- We’re working to allow non-owners to edit Dataflows, see the applied steps, and manage them – making Dataflows more collaborative so that teams can be productive.
- Notifications to Non-Owners
- Sending refresh failure emails to users or groups, just like Datasets
- Deployment Pipelines
- We’ll bring support for Deployment Pipelines to Dataflows, allowing you to deploy and update Dataflows across the lifecycle stages and make them more enterprise grade.
- Updated Guidance
- Advice for common scenarios and prescriptive how-to’s
- Webinar on Dataflows
- Samples
- Say you’re new to Power BI and want to try it out but don’t have any data. Or maybe you’d like to see Dataflows that illustrate some of the capabilities of Power BI. We’ll have you covered.
There is more to come, so please make sure to follow our release notes to track the latest roadmap updates. Are we missing something important to you? Please post ideas or vote for them so that we can know what is missing for your team.