I'm trying to create a hash with one key for each file extension found in a directory. For every key I would like to store two values: the number of files with that extension and the total size of all the files with that extension.
Something similar to this:
{".md" => {"ext_reps" => 6, "ext_size_sum" => 2350}, ".txt" => {"ext_reps" => 3, "ext_size_sum" => 1300}}
But I'm stuck on this step:
hash = Hash.new{|hsh,key| hsh[key] = {}}
ext_reps = 0
ext_size_sum = 0
Dir.glob("/home/computer/Desktop/**/*.*").each do |file|
  hash[File.extname(file)].store "ext_reps", ext_reps
  hash[File.extname(file)].store "ext_size_sum", ext_size_sum
end
p hash
With this result:
{".md" => {"ext_reps" => 0, "ext_size_sum" => 0}, ".txt" => {"ext_reps" => 0, "ext_size_sum" => 0}}
And I can't find a way to increment ext_reps and ext_size_sum.
Thanks
Suppose the file name extensions and file sizes are as follows.
files = [{ ext: 'a', size: 10 },
         { ext: 'b', size: 20 },
         { ext: 'a', size: 30 },
         { ext: 'c', size: 40 },
         { ext: 'b', size: 50 },
         { ext: 'a', size: 60 }]
You can use Enumerable#group_by and Hash#transform_values.
files.group_by { |h| h[:ext] }.
  transform_values do |arr|
    { "ext_reps"=>arr.size, "ext_size_sum"=>arr.sum { |h| h[:size] } }
  end
#=> {"a"=>{"ext_reps"=>3, "ext_size_sum"=>100},
# "b"=>{"ext_reps"=>2, "ext_size_sum"=>70},
# "c"=>{"ext_reps"=>1, "ext_size_sum"=>40}}
Note that the first calculation is as follows.
files.group_by { |h| h[:ext] }
#=> {"a"=>[{:ext=>"a", :size=>10}, {:ext=>"a", :size=>30},
# {:ext=>"a", :size=>60}],
# "b"=>[{:ext=>"b", :size=>20}, {:ext=>"b", :size=>50}],
# "c"=>[{:ext=>"c", :size=>40}]}
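To connect this with the directory in the question, here is a minimal sketch that builds the same structure from real files. The method name extension_stats is my own invention, and I am assuming you want to skip directories whose names happen to contain a dot, which is why there is a File.file? filter.

```ruby
# Hypothetical helper (not from the question): summarizes the regular
# files under +dir+ by file name extension.
def extension_stats(dir)
  Dir.glob(File.join(dir, "**", "*.*")).
    select { |path| File.file?(path) }.        # assumption: ignore directories
    group_by { |path| File.extname(path) }.
    transform_values do |paths|
      { "ext_reps"     => paths.size,
        "ext_size_sum" => paths.sum { |path| File.size(path) } }
    end
end
```

You would call it as extension_stats("/home/computer/Desktop"). Sizes are in bytes, as reported by File.size.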
Another way is to use the forms of Hash#update (aka Hash#merge!) and Hash#merge that employ a block to compute the values of keys that are present in both hashes being merged. (Ruby does not consult that block when a key-value pair with key k is being merged into the hash being built (h) and h does not have a key k.)
See the docs for an explanation of the three parameters of the block that returns the values of common keys of hashes being merged.
files.each_with_object({}) do |g,h|
  h.update(g[:ext]=>{"ext_reps"=>1, "ext_size_sum"=>g[:size]}) do |_k,o,n|
    o.merge(n) { |_kk, oo, nn| oo + nn }
  end
end
#=> {"a"=>{"ext_reps"=>3, "ext_size_sum"=>100},
# "b"=>{"ext_reps"=>2, "ext_size_sum"=>70},
# "c"=>{"ext_reps"=>1, "ext_size_sum"=>40}}
I've chosen names for the common keys of the "outer" and "inner" hashes (_k and _kk, respectively) that begin with an underscore to signal to the reader that they are not used in the block calculation. This is common practice.
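To see the three block parameters in isolation, here is a small sketch of the "inner" merge on its own. The variable names (old_stats, new_stats, seen) are made up for illustration; seen just records what the block receives.

```ruby
old_stats = { "ext_reps" => 2, "ext_size_sum" => 40 }
new_stats = { "ext_reps" => 1, "ext_size_sum" => 60 }

seen = []
merged = old_stats.merge(new_stats) do |key, old_value, new_value|
  seen << [key, old_value, new_value]  # record the block arguments
  old_value + new_value                # value stored for the common key
end
# merged #=> {"ext_reps"=>3, "ext_size_sum"=>100}
# seen   #=> [["ext_reps", 2, 1], ["ext_size_sum", 40, 60]]

# With no common key the block is never consulted.
untouched = { "a" => 1 }.merge("b" => 2) { |_key, _old, _new| raise "not reached" }
# untouched #=> {"a"=>1, "b"=>2}
```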
Note that this approach avoids the creation of a temporary hash similar to that created by group_by and therefore tends to use less memory than the first approach.
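If you would rather stay close to the code in the question, one possible fix is to have the default block create both counters at zero and then increment them for each file. This is only a sketch using the sample files array from above; the variable names stats and entry are mine.

```ruby
files = [{ ext: 'a', size: 10 },
         { ext: 'b', size: 20 },
         { ext: 'a', size: 30 },
         { ext: 'c', size: 40 },
         { ext: 'b', size: 50 },
         { ext: 'a', size: 60 }]

# The default block runs the first time an extension is looked up,
# so every inner hash starts with both counters at zero.
stats = Hash.new { |hsh, key| hsh[key] = { "ext_reps" => 0, "ext_size_sum" => 0 } }

files.each do |file|
  entry = stats[file[:ext]]
  entry["ext_reps"]     += 1
  entry["ext_size_sum"] += file[:size]
end

stats
#=> {"a"=>{"ext_reps"=>3, "ext_size_sum"=>100},
#    "b"=>{"ext_reps"=>2, "ext_size_sum"=>70},
#    "c"=>{"ext_reps"=>1, "ext_size_sum"=>40}}
```

The code in the question always stored the same two outer variables (both 0), which is why the counts never changed. Here the counters live inside each inner hash and are updated in place.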