I have tried without success to generate using Rails query methods a list of all active providers with a count of active associated studies. Even providers with zero studies should be reported. My efforts with Rails query methods do not return the providers without associated active studies. Example:
result = Provider.left_joins(:studies)
.where("studies.active = true AND providers.active = true")
.group("providers.id")
.select("providers.id AS 'prov_id', COUNT('studies.id') AS 'study_count'")
.order("study_count ASC")
result.map { |row| [ row["prov_id"], row["study_count"] ] }
returns: [ [4,1], [1,2] ]
I can achieve my goal with raw SQL. I suspect this is more efficient than using functionally equivalent Rails query methods. However, I cannot test that as I do not have the functionally equivalent Rails code. Suggestions?
This works with sqlite3 and should work with postgresql and mysql:
result = ActiveRecord::Base.connection.execute <<-SQL
SELECT "providers"."id" AS 'prov_id',
COUNT(sid) AS 'study_count'
FROM "providers"
LEFT OUTER JOIN (SELECT "studies"."id" AS "sid",
"studies"."provider_id" AS "spid"
FROM "studies"
WHERE (studies.active = true))
ON "spid" = "providers"."id"
WHERE (providers.active = true)
GROUP BY "providers"."id"
ORDER BY study_count ASC
SQL
result.map { |row| [ row["prov_id"], row["study_count"] ] }
The SQL generates an array of [prov_id, study_count]
ordered by study_count:
[ [3,0], [5,0], [6,0], [4,1], [1,2] ]
Notice that providers without any associated studies are shown.
The models & schema:
class Provider < ApplicationRecord
has_many :studies
end
class Study < ApplicationRecord
belongs_to :provider
end
schema.rb:
create_table "providers", force: :cascade do |t|
t.string "name", null: false
t.boolean "active", default: true, null: false
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
end
create_table "studies", force: :cascade do |t|
t.integer "provider_id", null: false
t.boolean "active", default: true, null: false
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.index ["provider_id"], name: "index_studies_on_provider_id"
end
Tables
providers: studies:
| id | active | | id | provider_id | active |
| 1 | true | | 1 | 4 | false |
| 2 | false | | 2 | 4 | true |
| 3 | true | | 3 | 1 | true |
| 4 | true | | 4 | 1 | true |
| 5 | true | | 5 | 2 | true |
| 6 | true | | 6 | 1 | false |
Expected results
[ [3,0], [5,0], [6,0], [4,1], [1,2] ]
While we could take the obtuse approach of recreating your second raw SQL query, with the subquery, in rails based on your request this is entirely unnecessary. (Although I have provided those solutions below as well)
The issue with your first query is the inclusion of studies.active = true
without addressing NULL values.
A LEFT JOIN with a WHERE condition that does not allow for NULL is essentially an INNER JOIN.
To solve this you can simply allow for NULLS (I also cleaned up the implementation a bit to avoid all of the Strings you were using, further "cleanup" is shown in the subsequent code blocks)
Provider.left_joins(:studies)
.where(active: true, studies: {active: [true,nil]})
.group(:id)
.select("providers.id AS 'prov_id', COUNT('studies.id') AS 'study_count'")
.order("study_count ASC")
result.map { |row| [ row["prov_id"], row["study_count"] ] }
Additionally, the subsequent step of mapping can be cut out of the process by using the aggregation methods already built into rails (e.g. #count
)
For Example this will return a Hash
of {provider.id => study_count}
easily convertible to an Array
via #to_a
Provider.left_joins(:studies)
.where(active: true, studies: {active: [true,nil]})
.group(:id)
.order(Study.arel_table[:id].count.asc)
.count(Study.arel_table[:id])
Alternately you can skip the Hash
step using #pluck
.
Provider.left_joins(:studies)
.where(active: true, studies: {active: [true,nil]})
.group(:id)
.order(Study.arel_table[:id].count.asc)
.pluck(:id, Study.arel_table[:id].count)
Alternate options
These options are lower level access via Arel
, the underlying query assembler for rails, but they provide a similar approach to your current working query.
We can accomplish the desired query by assembling a subquery and an OUTER JOIN as follows:
studies_table = Study.arel_table
studies_table
.project(studies_table[:provider_id], studies_table[:id])
.where(studies_table[:active].eq(true))
.as('studies')
studies_join = Arel::Nodes::OuterJoin.new(studies_table,
studies_table.create_on(
studies_table[:provider_id].eq(Provider.arel_table[:id])))
Although we can also just add an additional condition to the join like so (which is a bit cleaner)
studies_table = Study.arel_table
studies_join = Arel::Nodes::OuterJoin.new(studies_table,
studies_table.create_on(
studies_table[:provider_id].eq(Provider.arel_table[:id]).and(
studies_table[:active].eq(true))
))
Then we simply pass that to the joins method
Provider
.joins(studies_join)
.where(active: true)
.group(:id)
.order("study_count ASC")
.pluck(:id,studies_table[:id].count.as('study_count'))
This will produce
SELECT providers.id,
COUNT(studies.id) AS 'study_count'
FROM providers
LEFT OUTER JOIN studies ON studies.provider_id = providers.id
AND studies.active = true
WHERE (providers.active = true)
GROUP BY providers.id
ORDER BY study_count ASC