I'm stuck in the following situation:
I have to get information out of a CSV file. I imported the CSV using Import-Csv.
My raw data looks like this:
45227;01.10.2018 03:24:00;Xxxx Xxxx Xxxxx x XX xxxxxxxxxxxxxx Xxxxx xxx Xxxxxxxxxxxxxxxxxxx;;3;XXXX;XXXX;XXX@XX.com;;;3.7;;
where the column containing 3.7 is the value of interest ("Points").
Here comes my first problem --> Using Import-Csv, PowerShell will save this information in a [string] property. To avoid that I used the following line:

| Select @{Name="Points";Expression={[decimal]$_.Points}}
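For context, this is roughly how my full import looks (the file name here is just a placeholder; the real file is ';'-delimited and its header provides the Sender and Points columns):

# Rough sketch of the import - file name is a placeholder:
$Imported_CSV = Import-Csv -Path .\export.csv -Delimiter ';' |
    Select Sender, @{Name="Points";Expression={[decimal]$_.Points}}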
Now I'm getting a Selected.System.Management.Automation.PSCustomObject-typed object containing that property as a [decimal]. Now I wanted to sum up all the points that were used by the same e-mail address:
$Data[$Index].Points += (
  $Imported_CSV | where {$_.Sender -eq $Imported_CSV_Unique.Sender} |
    measure Points -sum
).Sum
This seemed to work just fine, but if I open up $Data[$Index] | gm I'm getting this:

Points NoteProperty double Points=71301.6000000006

The property changed to [double].
I dug a bit and I found out that PowerShell's GenericMeasureInfo.Sum property can only give back a Nullable<Double> instance as a property value.
It seems like I'm producing an overflow of [double], because the number being displayed is totally wrong. I want to stick to decimal or integer so I have an output like 71123.4 or something like that.

Is there any other approach for that, so I don't have to use (Measure-Object -sum).Sum?
Thanks in advance!
tl;dr:
If you need to control the specific numeric data type used for summing up numbers:

- Avoid Measure-Object, which invariably uses [double] calculations.
- Instead, use the LINQ Sum method (accessible in PSv3+) with a cast to the desired numeric type:
[Linq.Enumerable]::Sum(
  [decimal[]] @(
    $Imported_CSV | where {$_.Sender -eq $Imported_CSV_Unique.Sender}
  ).Points
)
Mathias R. Jessen's helpful answer shows you an elegant way to sum your Points column grouped by rows that share the same email address, and Theo's helpful answer improves on it by truly summing the points as [decimal] values.
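For illustration, here is a rough sketch of what such a grouped, [decimal]-based sum could look like (not necessarily identical to those answers; the Sender and Points columns are taken from your question):

# Sketch: group the rows by sender and sum each group's Points as [decimal] (PSv3+).
$Imported_CSV | Group-Object Sender | ForEach-Object {
  [pscustomobject] @{
    Sender = $_.Name
    Points = [Linq.Enumerable]::Sum([decimal[]] @($_.Group.Points))
  }
}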
Some general points about Measure-Object with -Sum and floating-point data types:
You correctly state:

The property [data type] changed to double [...] I found out that PowerShell's GenericMeasureInfo.Sum property can only give back a Nullable<Double> as property value.
Indeed: Measure-Object -Sum:

- uses [double] values to sum up the inputs.
- coerces the inputs to [double]s, if possible - even if they're not numbers.
- If an input cannot be converted to [double] (e.g., 'foo'), a non-terminating error is emitted, but summing continues with any remaining inputs.

The above implies that even strings are acceptable input to Measure-Object -Sum, because they'll be converted to [double] on demand during summation.
That means that you could use your Import-Csv command directly, as in the following example (which uses two [pscustomobject] instances to simulate Import-Csv's output):
PS> ([pscustomobject] @{ Points = '3.7' }, [pscustomobject] @{ Points = '1.2' } |
Measure-Object Points -Sum).Sum
4.9 # .Points property values were summed correctly.
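By contrast, a property value that cannot be converted to [double] - the string 'foo' below - triggers a non-terminating error, while the remaining inputs are still summed (a sketch of the behavior described above):

PS> ([pscustomobject] @{ Points = 'foo' }, [pscustomobject] @{ Points = '3.7' }, [pscustomobject] @{ Points = '1.2' } |
       Measure-Object Points -Sum).Sum
# A non-terminating error is reported for 'foo', but the other inputs still sum to:
4.9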
71301.6000000006 [...] It seems like I'm producing an overflow of "double"
Overflow would imply exceeding the maximum value that can be stored in a [double], which is (a) unlikely ([double]::MaxValue is 1.79769313486232E+308, i.e., greater than 10 to the power of 308) and (b) would produce a different symptom; e.g.:
PS> ([double]::MaxValue, [double]::MaxValue | Measure-Object -Sum).Sum
∞ # represents positive infinity
What you do get, however, is rounding errors due to the [double] type's internal binary representation, which cannot exactly represent all decimal fractions and can therefore lead to baffling calculation results; e.g.:
PS> 1.3 - 1.1 -eq 0.2
False # !! With [double]s, 1.3 - 1.1 is NOT exactly equal to 0.2
For more information, see https://floating-point-gui.de/
Using [decimal] values does solve this problem, but note that this comes at the expense of a smaller range (in effect, you get 28 decimal digits of precision - the absolute value of the max. number depends on where the decimal point is placed; as an integer, it is 79,228,162,514,264,337,593,543,950,335, i.e. close to 8 * 10^28).
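For instance, the [decimal] counterpart of the comparison shown above yields the expected result (a quick illustration, not from the original discussion):

PS> [decimal] 1.3 - [decimal] 1.1 -eq [decimal] 0.2
True # with [decimal]s, 1.3 - 1.1 IS exactly equal to 0.2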
If you do need the precision of [decimal]s, you must avoid Measure-Object and do your own summing. In the context of your original command, you could use the Sum LINQ method:
[Linq.Enumerable]::Sum(
  [decimal[]] @(
    $Imported_CSV | where {$_.Sender -eq $Imported_CSV_Unique.Sender}
  ).Points
)
The use of @(...) (the array subexpression operator) rather than just (...) around the pipeline command ensures that the overall command doesn't fail in case the pipeline happens to return no rows: @(...) turns the non-output into an empty array, for which .Sum() correctly returns 0.
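You can see that edge case in isolation (a minimal demonstration, independent of your CSV data):

PS> [Linq.Enumerable]::Sum([decimal[]] @())  # no input -> empty [decimal[]] array
0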
Without @(...), the [decimal[]] cast would result in $null, and PowerShell wouldn't be able to find the [decimal[]]-typed overload of the .Sum() method and would report an error, "Multiple ambiguous overloads found for "Sum" and the argument count: 1".

Also note that the above command invariably requires loading all matching CSV rows (represented as custom objects) into memory as a whole, whereas Measure-Object - as most cmdlets in the PowerShell pipeline - would process them one by one, which requires only a constant amount of memory (but is slower).
If loading all matching rows into memory at once is not an option, use the ForEach-Object (foreach) cmdlet, but note that this would only make sense if you substituted an actual Import-Csv call for the already-in-memory array $Imported_CSV:
# Replace $Imported_Csv with the original Import-Csv call to
# get memory-friendly one-by-one processing.
$Imported_CSV | where {$_.Sender -eq $Imported_CSV_Unique.Sender} |
foreach -Begin { [decimal] $sum = 0 } -Process { $sum += $_.Points } -End { $sum }
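For instance (a sketch only - the file name is a placeholder, and the delimiter/header details are assumptions based on your sample data):

# Hypothetical: stream the rows directly from the file instead of from memory.
Import-Csv -Path .\export.csv -Delimiter ';' |
  Where-Object { $_.Sender -eq $Imported_CSV_Unique.Sender } |
  ForEach-Object -Begin { [decimal] $sum = 0 } -Process { $sum += $_.Points } -End { $sum }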