Skip to main content

What’s new in Power Query & Dataflows – November 2020

Over the last few weeks, we have released lots of new Power Query & Dataflows features – Here’s a recap of all the features added or improved in this period:

  • Data Transformations and Query Editor enhancements:
    • Diagram View for authoring of queries
    • Schema view
    • Add Column from Examples
    • Web Connector – By Example Data Extraction
    • Fuzzy Matching:
      • Group By – Fuzzy Matching options
      • Cluster Values
    • Copy/Paste Queries between PQ Online and PQ Desktop
    • Copy data from Query Editor preview
    • Formula bar – Query Script mode
    • Multi-select support in Queries pane
    • Status Bar & Evaluation Performance Counters
  • New and Enhanced Connectors:
    • Parquet files
    • SAP HANA
    • SAP BusinessWarehouse
    • Snowflake
    • Impala
    • FHIR
    • SQL Server – Advanced Options in Get Data UX
  • Dataflows Management, Compute & APIs:
    • New automation possibilities for working with Dataflows programmatically
    • The Enhanced Compute engine is now generally available
    • New automation capabilities for working with Dataflows programmatically
  •  Documentation:
    • New Power Query & Dataflows documentation & best practices

You can continue reading each of the sections below for more details about these new capabilities.

Diagram View

With Power Query customers can easily create complex dependencies between data sources, queries and data transformations. While the Power Query Editor provides ways for users to list queries (“Queries” pane) and steps within them (“Applied Steps” pane), customers have demanded a more visual representation of all queries and their dependencies. In Power Query Desktop, the Query Dependencies view provides a read-only view allowing users to understand the dependencies between data sources and queries, or between multiple queries. However, feedback from customers has been that the Query Dependencies view felt short in a couple of aspects:

  • Providing only query-level visibility for dependencies, instead of step-level granularity.
  • Not allowing users to make edits (such as adding/deleting/editing) to their queries and steps within those queries.
  • Because the Query Dependencies view is a modal dialog experience, there is friction in transitioning between using the Query Dependencies view and authoring queries within the main Power Query Editor dialog, which is a highly iterative process.

Based on this feedback, we decided to invest on a new Diagram View experience that is more deeply integrated within the Power Query Editor experience; becoming a new pane within this dialog that users can interact with at any point in time during their Power Query authoring experience. Conceptually, this new Diagram View provides a graphical representation of the Queries & Steps panes together, with the ability to both get a high-level view of an entire graph of queries (e.g. all queries within a dataflow) as well as digging into query-level and step-level information in a highly visual interactive way.

We’re very happy to announce that the Diagram View is now available as a Public Preview feature within Power Query Online to author dataflows, which customers can enable from within the View tab in the ribbon.

Upon enabling this option, the new Diagram View pane appears on top of the main preview area. At the same time, the Queries & Steps panes are automatically collapsed (since the Diagram View now provides a more visual representation of queries and steps, as well as the same functional capabilities via context menu option on each of these object types).

Please note that, currently, Diagram View is in Preview and as such it provides limited capabilities compared to other Generally Available features – In particular, support for Accessibility features (keyboarding, high-contrast mode, screen readers support) is limited or not yet available. Providing an accessible feature set within the Diagram View is our top priority and area of investment within this feature, along with incorporating new capabilities and addressing feedback from Preview users.

Schema view

When working in the Power Query Editor with tables that have many columns, simple tasks can become incredibly cumbersome because even finding the right column by horizontally scrolling and parsing through all the data is inefficient.

The newly released Schema view displays your column information in a list that’s easy to parse and interact with, making it easier than ever to work on your table schema within Power Query.

In addition to an optimized column management experience, another key benefit of schema view is that transforms tend to yield results faster. These results are faster because this view only requires the columns information to be computed instead of a preview of the data rows. So even working with long-running queries with a few columns will benefit from using the Schema view.

You can turn on the new Schema view by selecting Schema view in the View tab within the ribbon. When you’re ready to work on your data again, you can switch back to the Data view from the same location.

Upon enabling the Schema view, the main preview area switches from Data view to displaying the list of columns in the table results, including their names, data types and several contextual operations, can also be accessed from the ribbon’s new “Schema tools” tab.

You can learn more about the Schema view in this documentation article: https://docs.microsoft.com/en-us/power-query/schema-view

By Example Data Extraction

One of the most exciting and differentiating capabilities within Power Query is the ability to extract and reshape data by example from a variety of sources, including existing tables in the Query Editor, Web pages, Text & CSV files.

This month we are enabling two of these capabilities in Power Query Online: “Add Column From Examples” and “Web By Example”.

Add Column From Examples

“Add Column From Examples” enables users to seamlessly extract and reshape data from existing columns in the Power Query Editor without having to think about the (one or multiple) operations to apply on top of those columns. Instead, using Add Column From Examples (available within the “Add Column” tab in the ribbon) brings users into an interactive experience where they can specify one or more sample output values that they would like to obtain. With just a few examples, Power Query is able to apply smart AI heuristics to infer the best combination of data transformations to get to that result.

You can learn more about Add Column From Examples in this documentation article: https://docs.microsoft.com/en-us/power-query/column-from-example

Web By Example

The Web Page connector in Power Query allows users to extract data formatted as HTML tables from a Web Page. Often however, data on Web pages is not in tidy tables that are easy to extract. Getting data from such pages can be challenging, even if the data is structured and consistent.

Web By Example makes it very easy to extract data from such Web pages, by allowing users to specify a few sample output values from a web page and letting Power Query figure out the right transformation steps in order to extract data from the web page.

Web By Example can be accessed from the “Choose Data” screen when using the Web Page connector.

Upon selecting this option, users will be taken into the Web By Example dialog where they can specify sample output values to extract from this page. Note that, unlike in Power Query Desktop where an inline preview of the web page is displayed, this inline preview is not available within Power Query Online, but users can open it in a different browser window/tab.

You can learn more about Web By Example in this documentation article: https://docs.microsoft.com/en-us/power-query/connectors/web/web-by-example

Fuzzy Matching Enhancements

Often when trying to get data into the right shape, users need to deal with reconciling values that are conceptually the same but represented slightly different in their text form. Examples of this include company names, people names, cities, etc. with different representations or, simply, misspellings.

Last year, we introduced Fuzzy Matching options within the Merge Queries transformation, allowing users to match similar values when joining tables.

We’re now extending those capabilities, based on customer feedback, to support Fuzzy Matching capabilities within two other data transformations:

Fuzzy Group By

In Power Query, you can group values in various rows into a single value by grouping the rows according to the values in one or more columns. Up until now, the only matching criteria for grouping values was an exact match.

We have recently added new Fuzzy Matching options to the Group By dialog, allowing users to match similar values when trying to perform a Group By operation. In the example below, grouping rows to calculate total sales by person, and matching similar Name values.

Which results in the following summarized table:

You can learn more about the new Fuzzy Matching capabilities within Group By in this documentation article: https://docs.microsoft.com/en-us/power-query/group-by#fuzzy-grouping

Cluster Values

In many cases when dealing with slightly similar values that need to be reconciled, users may not want to merge tables nor group rows within a single table, but rather simply have a new column in their tables with the canonical value that is common across multiple slightly different values – what is known as a “cluster” value.

The newly added “Cluster values” transform does exactly that. It can be found within the “Add Column” tab in the ribbon and provides Fuzzy Matching options to provide users with the most flexibility in clustering values, including the ability to obtain the similarity score between the input and canonical values for each row of data.

The output from this transformation is a new column with the clustered values and (optionally) another column with the similarity score between values.

Copy/Paste Queries between PQ Online and PQ Desktop

With Power Query being available across several products, including Power BI (Desktop & Dataflows in PowerBI.com), Excel, Power Apps (Dataflows), Dynamics 365 Customer Insights, Azure Data Factory, Power Automate and more, it’s very common for users to want to move their queries across products. While it is possible to copy-paste M code within the Advanced Editor, this can be complex when trying to copy multiple queries with dependencies.

In Power Query Desktop, it has been possible for some time to right-click (or CTRL+C) queries from the Queries pane, which would take care of copying the selected query (or multiple queries) and all of their upstream dependencies needed in order to the selected queries to work, then paste it into another Power Query Desktop instance (whether in Power BI, Excel or SQL Server Data Tools).

Recently we enabled the same capability within Power Query Online, make it extremely easy to move queries between Power Query Desktop and Online, or between multiple Power Query Online instances.

Copy data from Query Editor preview

An even more basic scenario than copy-pasting queries for users is to copy-paste data results from the Power Query Online Editor – whether it is copying an entire table, a column, a cell value, etc. – These options have all been recently added to Power Query Online; simply right-click on the desired element to copy it to the clipboard.

Multi-select support in Queries pane

It is now possible to perform actions over multiple queries at once by multi-selecting them (CTRL+Click or Shift+Click) in the Queries pane.

Multi-select actions include the ability to Copy, Paste, Delete or move queries to a query group.

Formula bar – Query Script mode

We have introduced a new mode in the Formula Bar, allowing users to switch from step-level script to the full query script. Users can toggle between the two modes from the View tab in the ribbon, or the new shortcut in the Status Bar.

Status Bar & Evaluation Performance Counters

We recently added a new status bar to the Power Query Online Editor dialog, enabling users to:

  • Get a summary view of any validation warnings in their queries.
  • See column & row counts for the currently selected query.
  • When query evaluations are in progress, see performance counters and progress indicators to better understand where time is spent during a query evaluation.
  • When Data Profiling is enabled, allow users to switch between “Top 1,000 rows” and “Entire dataset” for their data profiles.
  • Toggle between Step & Query script modes for the formula bar.
  • Switch between Data & Schema views

Parquet Files connector

The new Parquet Files connector allows users to seamlessly import data from Parquet files into Power Query Online. With this connector users can point at specific files, or use other connectors (such as Azure Data Lake Storage Gen2) to connect to a folder with multiple Parquet files and easily combine them together.

SAP HANA

A few weeks ago, the SAP HANA connector was released in Power Query Online. This connector allows customers to import data from SAP HANA Analytic and Calculation views from SAP HANA databases in the cloud and on-premises, via the On-premises data gateway, into Power Query.

Note that Advanced Options for this connector (such as Native Database Queries, Connection Timeout options, etc.) are supported currently at the M Engine level. Support for these options in the Get Data UX will light up later this year, in the interim they can be configured via the formula bar or Advanced Editor.

SAP BusinessWarehouse

Just like with SAP HANA, the existing SAP Business Warehouse connectors for Application and Message Server have been recently enabled in Power Query Online.

Snowflake

Recently we enabled the Snowflake connector in Power Query Online, enabling customers to import data from Snowflake Data Warehouse into Power Query Online.

Impala

The new Impala connector allows customers to import data from Impala clusters into Power Query Online, by either using a cloud-to-cloud connection or the On-premises data gateway for on-premises clusters.

FHIR

We recently added a new connector to Power Query Online that allows users to import data from FHIR. FHIR is a standard describing data formats and APIs for exchanging electronic health records. The FHIR connector has been available for a few months in Power BI Desktop and we’re very excited to bring now into Power Query Online. You can learn more about the FHIR connector in this documentation article: https://docs.microsoft.com/en-us/power-query/connectors/fhir/fhir

SQL Server – Advanced Options in Get Data UX

We have enhanced the SQL Server connector to include Get Data UX support for some of the most frequently used Advanced Options provided by this connector, including the ability to specify a native SQL statement, to include relationship columns, navigating using full hierarchy (e.g. including database schemas) and support for SQL Server failover.

These new options can be found under the “Advanced Options” section within the SQL Server connector dialog.

Workspace-level granularity to attach storage

We now allow workspace admins to connect to an Azure subscription to bring their own ADLS Gen2 account. This will make it easier for departments to control and assign permissions as well as give flexibility to large organizations who may require multiple ADLS Gen2 accounts for different needs and purposes. Read more here

The Enhanced Compute Engine is now generally available

The enhanced compute engine is now generally available. The enhanced compute engine improves performance for multiple scenarios by loading dataflow entity data into a SQL-based cache.  Using SQL clustered columnstore indices and other optimizations, we target up to a 20x improvement in query processing. Computed entities and DirectQuery connections against the dataflow in Premium can then be fulfilled by reading from the cache instead of reading from storage and flat files as Dataflows in Power BI Pro do. The enhanced compute engine in Power BI Dataflows enables Power BI Premium subscribers to:

  1. Speed up refresh operations when computed entities or linked entities are involved
  2. Enable DQ connectivity over dataflows leveraging the compute engine
  3. Achieve improved performance in the transformation steps of dataflows when entities are cached within the compute engine

Read more about the enhanced compute engine here.

Enhanced Dataflows API and Samples

The Dataflows API provides programmability and automation opportunities for your dataflows. With this enhanced API, users will be able to interact with their dataflows to perform popular transactional activities including refresh, create, delete, import, export, and much more. To help you get started, we are also providing some samples for popular use cases such as:

  1. Refresh dataflows
  2. Automatically refresh datasets when a dataflow completes

You can find the refresh APIs documented here and the Sample PowerShell script here.

New Power Query & Dataflows documentation

We have added over 50 new Power Query & Dataflows documentation articles, including getting started and quick how-to guides for common data transformations, connector reference articles and best practices for working with Power Query & Dataflows.

You can access them at https://docs.microsoft.com/en-us/power-query/

That’s all for this month! We hope that you find these features useful and continue sending us your feedback and suggestions so we can improve and add new capabilities to Power Query & Dataflows.