clojure cascading cascalog

Supplying a default value for left outer joins


What is the best way to specify a default value when doing an outer join in Cascalog for a field that could be null?

    (def example-query
      (<- [?id ?fname ?lname !days-active]
          (users :> ?id ?fname ?lname)
          (active :> ?fname ?lname !days-active)))

In this example, users and active are previously defined queries, and I'm just looking to correlate active-user information (?fname ?lname !days-active) with regular user information (?id ?fname ?lname).

When the join happens, if there is no corresponding value for !days-active, it should output 0 instead of nil, i.e.

392393 john smith 3
003030 jane doe 0

instead of

392393 john smith 3
003030 jane doe null

Updated Example

    (<- [!!database-id ?feature !!user-clicks !!engaged-users ?application-id ?active-users]
        (app-id-db-id-feature-clicks-engaged :> ?application-id !!database-id ?feature !!user-clicks !!engaged-users)
        (user-info :> ?application-id ?feature ?active-users))

Example output would look roughly like:

4234 search null null 222 5000
3232 profile 500 400 331 6000

What I'm interested in is changing the !!engaged-users and !!user-clicks fields to 0 instead of null. Would using multiple or predicates work?


Solution

  • I think what you want to do is add an or predicate:

    (def example-query
      (<- [?id ?fname ?lname ?active-days]
          (users :> ?id ?fname ?lname)
          (active :> ?fname ?lname !days-active)
          ;; ?active-days is !days-active, or 0 when !days-active is nil
          (or !days-active 0 :> ?active-days)))
    

    By the way, that's not an outer join; the ! prefix just stops Cascalog from filtering out nulls in the !days-active position.
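
For an actual left outer join, Cascalog uses ungrounding variables (the !! prefix), as in the updated example in the question. The following is only a sketch of how the default could then be applied, not a tested solution: `default-zero` and the in-memory `users` and `active` vectors are hypothetical stand-ins for the real queries, and ids are written as strings so the leading zeros survive. A named function is used because `or` is a macro in Clojure, and whether it can appear directly as a predicate may depend on the Cascalog version.

```clojure
;; Plain Clojure helper: returns x unless it is nil (or false), otherwise 0.
(defn default-zero [x]
  (or x 0))

(comment
  ;; REPL sketch; assumes Cascalog is on the classpath.
  (use 'cascalog.api)

  ;; Hypothetical in-memory stand-ins for the users and active queries.
  (def users  [["392393" "john" "smith"]
               ["003030" "jane" "doe"]])
  (def active [["john" "smith" 3]])

  (?<- (stdout)
       [?id ?fname ?lname ?active-days]
       (users ?id ?fname ?lname)
       ;; !! marks an ungrounding variable, so users without a match are kept.
       (active ?fname ?lname !!days-active)
       ;; Replace the nil produced by a missing match with 0.
       (default-zero !!days-active :> ?active-days)))
```

For the updated example, the same helper could be applied once per nullable field, for instance one predicate for !!user-clicks and another for !!engaged-users, each binding a new ? variable that replaces the !! variable in the output vector.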