powershellcsvsummeasure-object

Alternatives to (Measure-Object -sum).Sum


I'm stuck in the following situation: I have to get information out of a CSV file. I imported the CSV using Import-Csv.

My raw data looks like this:

45227;01.10.2018 03:24:00;Xxxx Xxxx Xxxxx x XX xxxxxxxxxxxxxx Xxxxx xxx Xxxxxxxxxxxxxxxxxxx;;3;XXXX;XXXX;XXX@XX.com;;;3.7;;

where the column containing 3.7 is the value of interest ("Points").

Here comes my first problem --> Using Import-Csv, powershell will save this information in a [string] property. To avoid that i used the following line:

| Select @{Name="Points";Expression={[decimal]$_.Points}}

Now i'm getting a Selected.System.Management.Automation.PSCustomObject-typed object containing that property as a [decimal]. Now i wanted to sum up all the points, that were used by the same e-mail address:

$Data[$Index].Points += (
  $Imported_CSV | where {$_.Sender -eq $Imported_CSV_Unique.Sender} | 
    measure Points -sum
).Sum

This seemed to work just fine, but if i open up $Data[$Index] | gm i'm getting this: Points NoteProperty double Points=71301.6000000006

The property changed to [double]. I dug a bit and i found out that Powershell's GenericMeasureInfo.Sum Property can only give back a Nullable<Double> instance as a property value.

It seems like i'm producing an overflow of [double], because the number being displayed is totally wrong. I want to stick to decimal or integer so i have an output like 71123.4 or something like that.

Is there any other approach for that, so i don't have to use (Measure-Object -sum).Sum ?

Thanks in advance!


Solution

  • tl;dr:

    If you need to control the specific numeric data type used for summing up numbers:

    [Linq.Enumerable]::Sum(
      [decimal[]] @(
        $Imported_CSV | where {$_.Sender -eq $Imported_CSV_Unique.Sender}
      ).Points
    )
    

    Mathias R. Jessen's helpful answer shows you an elegant way to sum your Points column grouped by rows that share the same email address and Theo's helpful answer improves on it by truly summing the points as [decimal] values.

    Some general points about Measure-Object with -Sum and floating-point data types:

    You correctly state:

    The property [data type] changed to double [...] i found out that Powershell's GenericMeasureInfo.Sum property can only give back a Nullable<Double> as property value.

    Indeed: Measure-Object -Sum:

    The above implies that even strings are acceptable input to Measure-Object -Sum, because they'll be converted to [double] on demand during summation. That means that you could use your Import-Csv command directly, as in the following example (which uses two [pscustomobject] instances to simulate Import-Csv's output):

    PS> ([pscustomobject] @{ Points = '3.7' }, [pscustomobject] @{ Points = '1.2' } |
          Measure-Object Points -Sum).Sum
    4.9  # .Points property values were summed correctly.
    

    71301.6000000006 [...] It seems like i'm producing an overflow of "double"

    Overflow would imply exceeding the maximum value that can be stored in a [double], which is (a) unlikely ([double]::MaxValue is 1.79769313486232E+308, i.e., greater than 10 to the power of 308) and (b) would produce a different symptom; e.g.:

    PS> ([double]::MaxValue, [double]::MaxValue | Measure-Object -Sum).Sum
    āˆž  # represents positive infinity
    

    What you do get, however, is rounding errors due to the [double] type's internal binary representation, which doesn't always have an exact decimal representation, which can lead to baffling calculation results; e.g.:

    PS> 1.3 - 1.1 -eq 0.2
    False # !! With [double]s, 1.3 - 1.1 is NOT exactly equal to 0.2
    

    For more information, see https://floating-point-gui.de/

    Using [decimal] values does solve this problem, but note that this comes at the expense of a smaller range (in effect, you get 28 decimal digits of precision - the absolute value of the max. number depends on where the decimal point is placed; as an integer, it is 79,228,162,514,264,337,593,543,950,335, i.e. close to 8 * 1028).

    If you do need the precision of [decimal]s, you must avoid Measure-Object and do your own summing.

    In the context of your original command, you could use the Sum LINQ method:

    [Linq.Enumerable]::Sum(
      [decimal[]] @(
        $Imported_CSV | where {$_.Sender -eq $Imported_CSV_Unique.Sender}
      ).Points
    )
    

    If loading all matching rows into memory at once is not an option, use the ForEach-Object (foreach) cmdlet, but note that this would only make sense if you substituted an actual Import-Csv call for the already-in-memory array $Imported_Csv:

    # Replace $Imported_Csv with the original Import-Csv call to 
    # get memory-friendly one-by-one processing.
    $Imported_CSV | where {$_.Sender -eq $Imported_CSV_Unique.Sender} |
      foreach -Begin { [decimal] $sum = 0 } -Process { $sum += $_.Points } -End { $sum }