DataFrames, PowerShell, and Excel
Microsoft is announcing the preview of a DataFrame type for .NET to make data exploration easy
The functions in the PowerShell cvtToDF
is a proof of concept allowing you to easily transform Powershell arrays into a DataFrame and then explore.
There are a couple of other functions Out-DataFrame
which formats it to more PowerShell readable output and Add-ToDF
that lets you manipulate data in a column.
Get the PowerShell
Grab the PowerShell to try out here PSDataFrame.
Test data
Here is the sample data, save in both a csv
and Excel
file for testing.
Region | Item | TotalSold | DateSold | Factor |
---|---|---|---|---|
West | drill | 29 | 12/2 | 2.1 |
South | lime | 19 | 12/21 | 2.1 |
West | nail | 57 | 12/23 | 2.1 |
West | melon | 1 | 12/12 | 2.1 |
North | saw | 88 | 12/22 | 2.1 |
South | avocado | 42 | 12/24 | 2.1 |
North | screws | 86 | 12/25 | 2.1 |
West | avocado | 7 | 12/27 | 2.1 |
East | avocado | 83 | 12/29 | 2.1 |
West | drill | 89 | 12/28 | 2.1 |
PowerShell CSV and DataFrames
Here, you dot source the PowerShell script and you can create a DataFrame
from CSV
data using the built-in Import-Csv
PowerShell function ConvertTo-DataFrame (Import-Csv .\testData.csv)
.
. .\cvtToDF.ps1
(ConvertTo-DataFrame (Import-Csv .\testData.csv)).GroupBy("Region").Sum("TotalSold").Sort("Region") | Out-DataFrame
Region TotalSold
------ ---------
East 83
North 174
South 61
West 183
ConvertTo-DataFrame
returns a DataFrame so you can then do things like GroupBy
, Sum
, and Sort
to get these results.
PowerShell Excel and DataFrames
Since we’re using PowerShell, we can pass any PowerShell array containing objects to ConvertTo-DataFrame
.
Here, we’re using Import-Excel
to read a spreadsheet to create the DataFrame.
Note: You can get the PowerShell Excel module from the PowerShell Gallery Install-Module ImportExcel
.
. .\cvtToDF.ps1
(ConvertTo-DataFrame (Import-Excel .\testData.xlsx)).GroupBy("Region").Sum("TotalSold").Sort("Region") | Out-DataFrame
Region TotalSold
------ ---------
East 83
North 174
South 61
West 183
As before, ConvertTo-DataFrame
returns a DataFrame so you can then use the GroupBy
, Sum
, and Sort
methods to get these results.
Perform a Computation
The DataFrame and DataFrameColumn classes expose a number of useful APIs. Add-ToDF
PowerShellizes the Add
method.
. .\cvtToDF.ps1
$df = ConvertTo-DataFrame (Import-Excel .\testData.csv)
Add-ToDF -targetDF $df -ColumnName TotalSold -Value 100
# C# syntax
# $df[TotalSold].Add(100, $false)
# Add-ToDF -targetDF $df -ColumnName TotalSold -Value 100 -Inplace
# C# syntax
# $df[TotalSold].Add(100, $true)
It adds 100 to all the values in the ColumnName
and returns them. If you use the -InPlace
switch, it also updates the values in the DataFrame.
129
119
157
101
188
142
186
107
183
189
Summary
The Microsoft DataFrame is a preview, and ConvertTo-DataFrame
a proof of concept. It’s an excellent playground to make data exploration easy.
Definitely give it a try.
Get the PowerShell
Grab the PowerShell to try out here PSDataFrame.