sqlruby-on-rails

What is the Rails ActiveRecord code to generate this 'raw' SQL or it's functional equivalent?


I have tried without success to generate using Rails query methods a list of all active providers with a count of active associated studies. Even providers with zero studies should be reported. My efforts with Rails query methods do not return the providers without associated active studies. Example:

result = Provider.left_joins(:studies)
 .where("studies.active = true AND providers.active = true")
 .group("providers.id")
 .select("providers.id AS 'prov_id', COUNT('studies.id') AS 'study_count'")
 .order("study_count ASC")
result.map { |row| [ row["prov_id"], row["study_count"] ] }

returns: [ [4,1], [1,2] ]

I can achieve my goal with raw SQL. I suspect this is more efficient than using functionally equivalent Rails query methods. However, I cannot test that as I do not have the functionally equivalent Rails code. Suggestions?

This works with sqlite3 and should work with postgresql and mysql:

result = ActiveRecord::Base.connection.execute <<-SQL
  SELECT "providers"."id" AS 'prov_id',
          COUNT(sid) AS 'study_count'
  FROM "providers"
  LEFT OUTER JOIN (SELECT "studies"."id" AS "sid",
                          "studies"."provider_id" AS "spid"
                  FROM "studies"
                  WHERE (studies.active = true))
  ON "spid" = "providers"."id"
  WHERE (providers.active = true)
  GROUP BY "providers"."id"
  ORDER BY study_count ASC
SQL
result.map { |row| [ row["prov_id"], row["study_count"] ] }

The SQL generates an array of [prov_id, study_count] ordered by study_count:

[ [3,0], [5,0], [6,0], [4,1], [1,2] ]

Notice that providers without any associated studies are shown.

The models & schema:

class Provider < ApplicationRecord
  has_many :studies
end

class Study < ApplicationRecord
  belongs_to :provider
end

schema.rb:
  create_table "providers", force: :cascade do |t|
    t.string "name", null: false
    t.boolean "active", default: true, null: false
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
  end

  create_table "studies", force: :cascade do |t|
    t.integer "provider_id", null: false
    t.boolean "active", default: true, null: false
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
    t.index ["provider_id"], name: "index_studies_on_provider_id"
  end

Tables

providers:              studies:  
| id | active |         | id | provider_id | active |
|  1 |  true  |         |  1 |      4      |  false |
|  2 |  false |         |  2 |      4      |  true  |
|  3 |  true  |         |  3 |      1      |  true  |
|  4 |  true  |         |  4 |      1      |  true  |
|  5 |  true  |         |  5 |      2      |  true  |
|  6 |  true  |         |  6 |      1      |  false |

Expected results

[ [3,0], [5,0], [6,0], [4,1], [1,2] ]


Solution

  • While we could take the obtuse approach of recreating your second raw SQL query, with the subquery, in rails based on your request this is entirely unnecessary. (Although I have provided those solutions below as well)

    The issue with your first query is the inclusion of studies.active = true without addressing NULL values.

    A LEFT JOIN with a WHERE condition that does not allow for NULL is essentially an INNER JOIN.

    To solve this you can simply allow for NULLS (I also cleaned up the implementation a bit to avoid all of the Strings you were using, further "cleanup" is shown in the subsequent code blocks)

    Provider.left_joins(:studies)
     .where(active: true, studies: {active: [true,nil]})
     .group(:id)
     .select("providers.id AS 'prov_id', COUNT('studies.id') AS 'study_count'")
     .order("study_count ASC")
    result.map { |row| [ row["prov_id"], row["study_count"] ] }
    

    Additionally, the subsequent step of mapping can be cut out of the process by using the aggregation methods already built into rails (e.g. #count)

    For Example this will return a Hash of {provider.id => study_count} easily convertible to an Array via #to_a

    Provider.left_joins(:studies)
     .where(active: true, studies: {active: [true,nil]})
     .group(:id)
     .order(Study.arel_table[:id].count.asc)
     .count(Study.arel_table[:id])
    

    Alternately you can skip the Hash step using #pluck.

    Provider.left_joins(:studies)
     .where(active: true, studies: {active: [true,nil]})
     .group(:id)
     .order(Study.arel_table[:id].count.asc)
     .pluck(:id, Study.arel_table[:id].count)
    

    Alternate options

    These options are lower level access via Arel, the underlying query assembler for rails, but they provide a similar approach to your current working query.

    We can accomplish the desired query by assembling a subquery and an OUTER JOIN as follows:

    studies_table = Study.arel_table
    studies_table
      .project(studies_table[:provider_id], studies_table[:id])
      .where(studies_table[:active].eq(true))
      .as('studies') 
    
    studies_join = Arel::Nodes::OuterJoin.new(studies_table, 
               studies_table.create_on(
                 studies_table[:provider_id].eq(Provider.arel_table[:id])))
    

    Although we can also just add an additional condition to the join like so (which is a bit cleaner)

    studies_table = Study.arel_table
    
    studies_join = Arel::Nodes::OuterJoin.new(studies_table, 
               studies_table.create_on(
                 studies_table[:provider_id].eq(Provider.arel_table[:id]).and(
                   studies_table[:active].eq(true))
           ))
    

    Then we simply pass that to the joins method

    Provider
      .joins(studies_join)
       .where(active: true)
       .group(:id)
       .order("study_count ASC")
       .pluck(:id,studies_table[:id].count.as('study_count'))
    

    This will produce

    SELECT providers.id,
            COUNT(studies.id) AS 'study_count'
    FROM providers
    LEFT OUTER JOIN studies ON studies.provider_id = providers.id 
      AND studies.active = true
    WHERE (providers.active = true)
    GROUP BY providers.id
    ORDER BY study_count ASC