ruby-on-rails ruby hash chartkick

Merging & Summing nested hashes in Ruby


What I'm trying to do is very similar to the question outlined in this post, with one additional problem: the nested values of my hashes need to be grouped by date, and the values for each date summed. The goal is to create a Multiple Series Graph in Chartkick.

The query, grabbing a month range for example:

arr = LineItem.includes(:order, :product)
              .where(orders: {order_date: Date.parse("Jan 1 2020")..Date.parse("Feb 1 2020")})
              .map { |line_item| { name: line_item.product.model_number, data: { line_item.order.order_date.strftime('%a %b %d, %Y') => line_item.order_quantity } } }

The output, an array of hashes:

 => [
{:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>2}}, 
{:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>5}}, 
{:name=>"FR-GP02", :data=>{"Tue Jan 21, 2020"=>1}}, 
{:name=>"FR-GP02", :data=>{"Tue Jan 21, 2020"=>3}}, 
{:name=>"FR-GP02", :data=>{"Wed Jan 22, 2020"=>1}}, 
{:name=>"FR-GP04", :data=>{"Mon Jan 20, 2020"=>2}}, 
{:name=>"FR-GP04", :data=>{"Tue Jan 21, 2020"=>4}}, 
{:name=>"FR-GP04", :data=>{"Tue Jan 21, 2020"=>3}}, 
{:name=>"FR-GP04", :data=>{"Tue Jan 21, 2020"=>6}}, 
{:name=>"FR-GP04", :data=>{"Wed Jan 22, 2020"=>3}}, 
{:name=>"FR-GP01", :data=>{"Tue Jan 21, 2020"=>5}}, 
{:name=>"FR-GP01", :data=>{"Thu Jan 23, 2020"=>3}}, 
{:name=>"FR-GP01", :data=>{"Thu Jan 23, 2020"=>1}}, 
...

My expected output, which should group by name, then group by date and sum the values:

 => [
{:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>7, "Tue Jan 21, 2020"=>4, "Wed Jan 22, 2020"=>1}}, 
{:name=>"FR-GP04", :data=>{"Mon Jan 20, 2020"=>2, "Tue Jan 21, 2020"=>13, "Wed Jan 22, 2020"=>3}}, 
{:name=>"FR-GP01", :data=>{"Tue Jan 21, 2020"=>5, "Thu Jan 23, 2020"=>4}}, 
...

However, after running this code:

arr.group_by {|h| h[:name]}.map { |k,v| { name: k, data: v.map {|h| h[:data]}.reduce(&:merge)}}

this is the output:

 => [
{:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>2, "Tue Jan 21, 2020"=>1, "Wed Jan 22, 2020"=>1}},
{:name=>"FR-GP04", :data=>{"Mon Jan 20, 2020"=>2, "Tue Jan 21, 2020"=>4, "Wed Jan 22, 2020"=>3}}, 
{:name=>"FR-GP01", :data=>{"Tue Jan 21, 2020"=>5, "Thu Jan 23, 2020"=>3}},
...

The output generated does group the name and data, but does not sum the quantities. I'm grouping it by day here as an example, but would also like the option of grouping it by week & month. In the past 8 hours of monkeying with this, I've also tried using Groupdate to no avail.
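For what it's worth, the missing sums can be reproduced in isolation. Hash#merge without a block keeps only one value per duplicate key, while the block form lets the caller combine both values. The snippet below is a minimal sketch on made-up rows shaped like the data above (the variable names `rows`, `plain`, `summed` and `result` are illustrative only), not necessarily the best fix for the real query.

```ruby
# Two single-entry hashes for the same date, shaped like the question's data.
a = { "Mon Jan 20, 2020" => 2 }
b = { "Mon Jan 20, 2020" => 5 }

# Without a block, the value from the hash passed as the argument wins.
plain = a.merge(b)
#=> {"Mon Jan 20, 2020"=>5}

# With a block, Ruby calls it for each duplicate key with both values.
summed = a.merge(b) { |_date, old, new| old + new }
#=> {"Mon Jan 20, 2020"=>7}

# The attempt from the question with a summing merge (sample rows are illustrative).
rows = [
  { name: "FR-GP02", data: { "Mon Jan 20, 2020" => 2 } },
  { name: "FR-GP02", data: { "Mon Jan 20, 2020" => 5 } },
  { name: "FR-GP02", data: { "Tue Jan 21, 2020" => 1 } }
]
result = rows.group_by { |h| h[:name] }.map do |name, group|
  { name: name,
    data: group.map { |h| h[:data] }
               .reduce { |acc, d| acc.merge(d) { |_date, x, y| x + y } } }
end
#=> [{:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>7, "Tue Jan 21, 2020"=>1}}]
```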


Solution

  • There are many ways to obtain the desired return value. Here are two. First I define arr.

    arr = [
      {:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>2}}, 
      {:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>5}}, 
      {:name=>"FR-GP02", :data=>{"Tue Jan 21, 2020"=>1}}, 
      {:name=>"FR-GP02", :data=>{"Tue Jan 21, 2020"=>3}}, 
      {:name=>"FR-GP02", :data=>{"Wed Jan 22, 2020"=>1}}, 
      {:name=>"FR-GP04", :data=>{"Mon Jan 20, 2020"=>2}}, 
      {:name=>"FR-GP04", :data=>{"Tue Jan 21, 2020"=>4}}, 
      {:name=>"FR-GP04", :data=>{"Tue Jan 21, 2020"=>3}}, 
      {:name=>"FR-GP04", :data=>{"Tue Jan 21, 2020"=>6}}, 
      {:name=>"FR-GP04", :data=>{"Wed Jan 22, 2020"=>3}}, 
      {:name=>"FR-GP01", :data=>{"Tue Jan 21, 2020"=>5}}, 
      {:name=>"FR-GP01", :data=>{"Thu Jan 23, 2020"=>3}}, 
      {:name=>"FR-GP01", :data=>{"Thu Jan 23, 2020"=>1}}]
    

    The first calculation employs the methods Enumerable#group_by and Hash#transform_values.

    arr.group_by { |h| h[:name] }
       .map do |k,v|
         { name: k,
           data: v.group_by do |h|
                   h[:data].keys.first
                 end.transform_values { |a| a.sum { |h| h[:data].values.first }}
         }
       end
      #=> [{:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>7,
      #                               "Tue Jan 21, 2020"=>4,
      #                               "Wed Jan 22, 2020"=>1}},
      #    {:name=>"FR-GP04", :data=>{"Mon Jan 20, 2020"=>2,
      #                               "Tue Jan 21, 2020"=>13,
      #                               "Wed Jan 22, 2020"=>3}},
      #    {:name=>"FR-GP01", :data=>{"Tue Jan 21, 2020"=>5,
      #                               "Thu Jan 23, 2020"=>4}}]
    

    Note:

    arr.group_by { |h| h[:name] }
      #=> {"FR-GP02"=>[{:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>2}},
      #                {:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>5}},
      #                {:name=>"FR-GP02", :data=>{"Tue Jan 21, 2020"=>1}},
      #                {:name=>"FR-GP02", :data=>{"Tue Jan 21, 2020"=>3}},
      #                {:name=>"FR-GP02", :data=>{"Wed Jan 22, 2020"=>1}}],
      #    "FR-GP04"=>[{:name=>"FR-GP04", :data=>{"Mon Jan 20, 2020"=>2}},
      #                {:name=>"FR-GP04", :data=>{"Tue Jan 21, 2020"=>4}},
      #                {:name=>"FR-GP04", :data=>{"Tue Jan 21, 2020"=>3}},
      #                {:name=>"FR-GP04", :data=>{"Tue Jan 21, 2020"=>6}},
      #                {:name=>"FR-GP04", :data=>{"Wed Jan 22, 2020"=>3}}],
      #    "FR-GP01"=>[{:name=>"FR-GP01", :data=>{"Tue Jan 21, 2020"=>5}},
      #                {:name=>"FR-GP01", :data=>{"Thu Jan 23, 2020"=>3}},
      #                {:name=>"FR-GP01", :data=>{"Thu Jan 23, 2020"=>1}}]}
    

    map's block variables initially equal the following:

    k = "FR-GP02"
    v = [{:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>2}},
         {:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>5}},
         {:name=>"FR-GP02", :data=>{"Tue Jan 21, 2020"=>1}},
         {:name=>"FR-GP02", :data=>{"Tue Jan 21, 2020"=>3}},
         {:name=>"FR-GP02", :data=>{"Wed Jan 22, 2020"=>1}}]
    

    Then the value of :data in the first hash being created is computed as follows:

    f = v.group_by do |h|
          h[:data].keys.first
        end
      #=> {"Mon Jan 20, 2020"=>[
      #      {:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>2}},
      #      {:name=>"FR-GP02", :data=>{"Mon Jan 20, 2020"=>5}}],
      #    "Tue Jan 21, 2020"=>[
      #      {:name=>"FR-GP02", :data=>{"Tue Jan 21, 2020"=>1}},
      #      {:name=>"FR-GP02", :data=>{"Tue Jan 21, 2020"=>3}}],
      #    "Wed Jan 22, 2020"=>[
      #      {:name=>"FR-GP02", :data=>{"Wed Jan 22, 2020"=>1}}]}
    

    and lastly,

    f.transform_values { |a| a.sum { |h| h[:data].values.first }}
      #=> {"Mon Jan 20, 2020"=>7, "Tue Jan 21, 2020"=>4, "Wed Jan 22, 2020"=>1}
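The question also mentions wanting weekly and monthly buckets. One possible way to get there, sketched here in plain Ruby, is to re-bucket each series' daily totals by a coarser key. The helper name `rebucket`, its `format:` keyword, the label formats and the choice of Sunday as the start of the week are all assumptions made for illustration; in a Rails app it may be simpler to group on the Date objects before they are formatted as strings.

```ruby
require "date"

# Hypothetical helper: re-buckets one series' daily totals into a coarser period.
# `format` defaults to the strftime format used in the question.
def rebucket(daily, period, format: "%a %b %d, %Y")
  daily.group_by do |date_str, _qty|
    date = Date.strptime(date_str, format)
    case period
    when :week  then (date - date.wday).strftime(format) # the Sunday starting that week
    when :month then date.strftime("%b %Y")
    else date_str                                        # leave daily keys untouched
    end
  end.transform_values { |pairs| pairs.sum { |_date, qty| qty } }
end

# Illustrative daily totals for a single series.
daily = { "Mon Jan 20, 2020" => 7, "Tue Jan 21, 2020" => 4, "Mon Feb 03, 2020" => 2 }

by_month = rebucket(daily, :month)
#=> {"Jan 2020"=>11, "Feb 2020"=>2}

by_week = rebucket(daily, :week)
#=> {"Sun Jan 19, 2020"=>11, "Sun Feb 02, 2020"=>2}
```

Applying it to every series would then be a matter of mapping over the result of the first calculation and replacing each :data hash with its re-bucketed version.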
    

    Here is a second way to obtain the desired result.

    arr.each_with_object(Hash.new(0)) do |g,h|
      d, n = g[:data].flatten
      h[[g[:name], d]] += n
    end.group_by { |(name, _),_| name }
       .map do |name,arr|
         { name: name, data: arr.each_with_object({}) { |((_,d),t),h| h[d] = t } }
       end
      #=> (as above)
    

    The steps are as follows.

    s = arr.each_with_object(Hash.new(0)) do |g,h|
      d, n = g[:data].flatten
      h[[g[:name], d]] += n
    end
      #=> {["FR-GP02", "Mon Jan 20, 2020"]=>7,
      #    ["FR-GP02", "Tue Jan 21, 2020"]=>4,
      #    ["FR-GP02", "Wed Jan 22, 2020"]=>1,
      #    ["FR-GP04", "Mon Jan 20, 2020"]=>2,
      #    ["FR-GP04", "Tue Jan 21, 2020"]=>13,
      #    ["FR-GP04", "Wed Jan 22, 2020"]=>3,
      #    ["FR-GP01", "Tue Jan 21, 2020"]=>5,
      #    ["FR-GP01", "Thu Jan 23, 2020"]=>4}
    

    This uses the form of Hash::new that takes an argument (the hash's default value; often, as here, zero) and no block. If a hash is defined as

    h = Hash.new(0)
    

    and, possibly after key-value pairs have been added, does not have a key k, h[k] returns the default value without adding k to the hash. This means that in the expression

    h[[g[:name], d]] += n
    

    which Ruby expands to h[[g[:name], d]] = h[[g[:name], d]] + n, the right-hand side reads the default value, zero, when h does not have the key [g[:name], d], so the key is stored with the value n. If h does have that key, its current value is increased by n.
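A short sketch of that behaviour on its own, using a throwaway hash named `counts` (the name and keys are illustrative only):

```ruby
counts = Hash.new(0)

# Reading a missing key returns the default but does not store the key.
missing_value = counts[:missing]    #=> 0
stored = counts.key?(:missing)      #=> false

# += reads the default (0) the first time, then the stored value afterwards.
counts[["FR-GP02", "Mon Jan 20, 2020"]] += 2
counts[["FR-GP02", "Mon Jan 20, 2020"]] += 5

counts
#=> {["FR-GP02", "Mon Jan 20, 2020"]=>7}
```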

    Continuing the calculation,

    t = s.group_by { |(name,_),_| name }
      #=> {"FR-GP02"=>[[["FR-GP02", "Mon Jan 20, 2020"], 7],
      #                [["FR-GP02", "Tue Jan 21, 2020"], 4],
      #                [["FR-GP02", "Wed Jan 22, 2020"], 1]],
      #    "FR-GP04"=>[[["FR-GP04", "Mon Jan 20, 2020"], 2],
      #                [["FR-GP04", "Tue Jan 21, 2020"], 13],
      #                [["FR-GP04", "Wed Jan 22, 2020"], 3]],
      #    "FR-GP01"=>[[["FR-GP01", "Tue Jan 21, 2020"], 5],
      #                [["FR-GP01", "Thu Jan 23, 2020"], 4]]}
    

    Lastly,

    t.map do |name,arr|
      { name: name, data: arr.each_with_object({}) { |((_,d),t),h| h[d] = t } }
    end
      #=> (as above)
    

    Here and earlier I've made good use of Ruby's powerful technique called Array decomposition. See also this article.
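For readers who have not met array decomposition before, here is a small stand-alone sketch. The sample arrays mirror the shape of the intermediate values shown above; the variable names are illustrative only.

```ruby
pair = [["FR-GP02", "Mon Jan 20, 2020"], 7]

# Parentheses on the left-hand side mirror the nesting of the array.
(name, date), total = pair
# name  #=> "FR-GP02"
# date  #=> "Mon Jan 20, 2020"
# total #=> 7

# The same pattern works in block parameters; _ marks a value that is ignored.
pairs = [[["FR-GP02", "Mon Jan 20, 2020"], 7],
         [["FR-GP02", "Tue Jan 21, 2020"], 4]]
data = pairs.each_with_object({}) { |((_, d), t), h| h[d] = t }
#=> {"Mon Jan 20, 2020"=>7, "Tue Jan 21, 2020"=>4}
```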