Over the past few weeks we have released several new features in Dataflows, allowing users to seamlessly ingest and prepare data that can be widely reused by other users across the Power Platform, including:
7 New Data Connectors:
- PDF Files
- Local Folder
- SharePoint Folder
- Google BigQuery
- HDInsight Spark
- Apache Spark
- Generic ODBC
40+ New Data Transformations:
- Combine Files UX
- Merge Queries – Visual join kind selection
- Additional Number/Date/DateTime/Duration transformations UX
- List transformations: Statistics, Sort, Keep/Remove/Reverse items
- Fill Up/Down
- Move Columns left/right/beginning/end
- Replace Errors
Other Power Query Authoring Enhancements:
- Data Profiling
- Query Parameters UX
- Function Authoring UX
- M Intellisense support in Advanced Query Editor & Formula Bar
- Select Related Tables as part of Get Data UX
You can continue reading the rest of this blog post for more details about each of these features.
New Data Connectors
This month we have added 7 new data connectors to Dataflows, including:
- PDF Files – This connector allows users to extract tables from PDF documents.
- Folder – Use this connector in order to ingest data from files in local (or network-based) file systems, or to query and analyze files’ metadata.
- SharePoint Folder – Similar scenarios as the previous item, but available on top of a SharePoint folder (either on-prem or SharePoint Online).
- Google BigQuery – Ingest data from Google BigQuery databases, transform, filter and reshape this data as part of your dataflow.
- HDInsight Spark – Read tables from HDInsight Spark databases.
- Apache Spark – Read tables from any Apache Spark distribution database, either on-prem or cloud-based.
- Generic ODBC – Plug in any ODBC driver, specify a connection string or DSN and connect from your dataflows to import and transform data from many different sources.
You can find all of these new connectors within the Power Query Online Get Data experience in Dataflows:
New Data Transformations
Combine Files
A very common use case within Data Preparation consists in combining data from multiple files within a folder in order to create a single logical view or table. For example, imagine being able to easily append files from a folder that have the same schema but contain rows of data for different time periods; with new files being dropped in the same location on a regular basis.
This scenario can now be easily accomplished using Dataflows, in combination with the Folder, SharePoint Folder and Azure Blobs connector, or on top of any other tabular result in Power Query that contains a column with Binary data.
Users can access the Combine Files experience directly from the Get Data preview for each of those connectors, or from the Power Query Editor by clicking the Combine icon in a Binary column’s header or under the Combine menu in the toolbar.
Upon entering the Combine Files dialog, users can preview and select the relevant data from each of the files, by using one of the files in the folder as the sample. Users can also decide to skip files with errors, generally caused by a different structure within that file that prevents the extraction logic from working.
After closing the Combine Files dialog, users will see multiple queries generated in the Power Query Editor, allowing further transformations to be applied in order to extract data from each of the files, as well as the final output query that combines all files within the folder using the specified transformation steps.
You can learn more about Combine Files in this documentation article: https://docs.microsoft.com/en-us/power-query/combinefiles
Merge Queries – Visual join kind selection
Merge Queries allows users to combine (join) two tables within Power Query. In addition to selecting which tables and columns to use for the join, users can also select the type of join they would like to apply:
We have recently improved the user experience for selecting the join type by displaying Venn diagrams that visually describe the semantics of each type.
Additional Number/Date/DateTime/Duration transformations UX
We’ve added UX support for several Number/Date/DateTime/Duration transformations via the toolbar. These operations can now be accessed from the Power Query Editor and make it very easy to work with queries that yield scalar values of any of these types.
- Number transformations:
- Standard: Add, Multiply, Subtract, Divide, Integer divide, Modulo, Percentage, Percent of
- Scientific: Absolute value, Power (Square, Cube and Custom), Square root, Exponent, Logarithm (Base-10 and Natural) and Factorial.
- Date transformations:
- Age
- Date only
- Year: Extract year, Start of Year, End of Year
- Month: Extract month (number), start of month, end of month, days in month, name of the month.
- Quarter: Quarter of year, Start of quarter, End of quarter.
- Week: Week of year, Week of month, Start of week, End of week.
- Day: Day, Day of week, Day of year, Start of day, End of day, Name of day.
- Time transformations:
- Time only
- Hour: Extract hour, Start of hour, End of hour
- Minute
- Second
- Duration transforms:
- Extract component: Days, Hours, Minutes, Seconds
- Calculate total: Years, Days, Hours, Minutes, Seconds
List Transforms: Statistics, Sort, Keep/Remove/Reverse items
A lot of new operations have been added to the Power Query Editor toolbar in order to transform List results, including:
- Reduce items:
- Remove top items
- Remove bottom items
- Remove alternate items
- Keep top items
- Keep bottom items
- Keep range of items
- Remove duplicates
- Reverse items
- Sort Items: Ascending and descending.
- Statistics:
- Sum
- Minimum
- Maximum
- Median
- Average
- Standard deviation
- Count values
- Count distinct values
Fill Up/Down
A common transformation needed by users when dealing with jagged tables is the ability to fill values to adjacent empty cells above or below a cell with a non-empty value. This can be accomplished with the Fill Up and Fill Down transformations in Power Query, which can now be accessed from the toolbar or right-click context menu.
Move Columns left/right/beginning/end
With the new Move Columns menu in the Power Query Editor toolbar and column context menu, it is extremely easy to reorder columns within a table.
Replace Errors
Another common transformation is dealing with cell-level errors within a table (which might have been caused by trying to change a column type over incompatible data values or similar operations). In addition to being able to remove error values, we’re now adding a new Replace Errors transformation that allows users to replace error values within a column with another valid value, such as a default value.
This operation can be accessed from the Transform Column menu in the Power Query Editor toolbar, or via the column context menu.
Users can then specify a value to replace errors with for the selected column.
Data Profiling
Power Query provides hundreds of data transformations that can be applied by users to clean and reshape data into the desired output. However, it is sometimes hard for users to clearly identify issues within their data such as errors, empty values or duplicates. Additionally, it’s also important for users to understand value distribution within a given column or other relevant statistics (such as average, median, min, max for a Number columns, or other type specific statistics).
This month, we’re bringing Data Profiling capabilities to Power Query Online that will help dataflows authors more easily understand their data and produce better data entities that can be reused by several other users within their organizations.
The new Data Profiling capabilities can be enabled from the Global Options dialog in the Power Query Editor toolbar.
From the Global Options dialog, users can configure which Data Profiling capabilities they would like to enable, including inline Column Quality and Value Distribution information in the Data preview, as well as the Column Profile Details pane. Users can also select whether they would like these data profiles to be calculated on top of the top 1,000 rows of data or over the entire dataset.
After enabling these features, they can be accessed from the main Power Query Editor surface, contextual to the table columns in the data preview.
The Data Profiling charts are interactive and allow users to take action directly from them via Smart Recommendations offered to users on hover – for example, to remove errors or duplicate values from the inline column quality details, or to exclude a specific value from the Value Distribution chart in the Column Profiles details pane.
Query Parameters
With the new Query Parameters feature, users can now easily define one or multiple parameters to be used in their dataflows. Users can define new parameters by using the “Manage Parameters” dialog in the Power Query Editor, accessible from the toolbar.
From the Manage Parameters dialog, users can create new parameters and specify metadata and settings for each parameter:
- Parameter Name.
- Parameter Description: This will be displayed next to the parameter name in downstream experiences, and helps the user who is specifying the parameter value to better understand the purpose and semantics of this parameter.
- Optional vs. Required: Users can specify whether a certain parameter is optional or a value for that parameter must be specified (required).
- Parameter Type: This field applies a Data Type restriction on the input value for the parameter. For instance, users can define a parameter of type Text, or Date/Time. Users can also specify Any type for more flexibility.
- Parameter Accepted Values: In addition to Data Type restrictions, users can apply further restrictions to the allowed values for a given parameter. For instance, users could specify that the Data Type for a parameter is Text and restrict the acceptable values to a static list of Text values. Users will then be able to pick one of these values when specifying the parameter value to use. Users can preconfigure Accepted Values as a static list, or any value of the expected type, or dynamically binding the list of Accepted Values for a parameter to the output of another query returning a list of values. This enables dynamic sets of options to be displayed to the user, maybe even based on their selection for another parameter. A typical example for this would be making a “City” parameter change the list of values to select from, based on another parameter that allows users to specify a “State”.
- Default Value: This setting allows the Parameter creator to specify what the default value or selection should be for the user specifying the parameter value.
- Current Value: This setting allows users to specify the value for this parameter in the current dataflow.
Once a parameter has been created, it can be accessed and modified from the Queries pane. Additionally, users can create and bind parameters to input fields from specific dialogs in Power Query, including connector dialogs (e.g. Server), Filter Rows dialog and more.
Function Authoring UX
With this update, we’re making it extremely easy to author functions in Power Query without writing a single line of code!
Users can now create a function based on an existing query by using the “Create Function” option in the query context menu.
This option brings up a dialog that allows users to provide a function name. If the original query was bound to any query parameters, those query parameters will become function input parameters, inheriting the same name, type, description and restrictions from the original query parameter.
After the function has been created, it becomes readily available for users to invoke it from their dataflow.
M Intellisense support in Advanced Query Editor, Blank Query & Formula Bar
One of the most frequently requested features from users has been Intellisense support when authoring M code in Power Query. Recently we enabled M Intellisense in the Advanced Query Editor (accessible via the right-click menu on queries within the Queries pane), Blank Query option in Get Data and the formula bar.
Select Related Tables as part of Get Data UX
A few weeks ago we released an update that lights up the “Select Related Tables” capability within the Navigator (or Choose data) stage of the Power Query experience. This option allows users to easily identify and select all tables that have relationships in the data source to the currently selected tables.
This new “Select Related Tables” option is available for those connectors that expose relationships information, such as relational databases (SQL Server, Oracle, MySQL, etc.), OData and other sources.
That’s all for this month! We hope that you find these feature valuable.
We’re looking forward to your feedback and suggestions that you might have in order to make Dataflows even better, please vote for them in our Power BI Ideas Forum’s Dataflows category.
Learn more:
- Power Query: www.powerquery.com
- Dataflows in Power BI: https://docs.microsoft.com/en-us/power-bi/service-dataflows-overview
- Dataflows in Power Apps: https://docs.microsoft.com/en-us/powerapps/maker/common-data-service/create-and-use-dataflows