What is the best way to specify a default value for a field that could be null when doing an outer join in Cascalog?
(def example-query
  (<- [?id ?fname ?lname !days-active]
      (users :> ?id ?fname ?lname)
      (active :> ?fname ?lname !days-active)))
In this example, users and active are previously defined queries, and I'm just looking to correlate active user information (?fname ?lname !days-active) with regular user information (?id ?fname ?lname).
When the join happens and there is no corresponding value for !days-active, I'd like it to output 0 instead of nil, i.e.
392393 john smith 3
003030 jane doe 0
instead of
392393 john smith 3
003030 jane doe null
Updated Example
(<- [!!database-id ?feature !!user-clicks !!engaged-users ?application-id ?active-users]
    (app-id-db-id-feature-clicks-engaged :> ?application-id !!database-id ?feature !!user-clicks !!engaged-users)
    (user-info :> ?application-id ?feature ?active-users))
Example output would look roughly like:
4234 search null null 222 5000
3232 profile 500 400 331 6000
With the filtering I'm interested in, I'd like the !!engaged-users and !!user-clicks fields to contain 0 instead of null. Would using multiple or predicates work?
I think what you want to do is add an or predicate:
(def example-query
  (<- [?id ?fname ?lname ?active-days]
      (users :> ?id ?fname ?lname)
      (active :> ?fname ?lname !days-active)
      ;; ?active-days carries the defaulted value, so it is the field to emit
      (or !days-active 0 :> ?active-days)))
That's not an outer join, by the way; it's just not filtering out null variables in the !days-active position.
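If you do want a real outer join (the !! variables from the updated example), the same defaulting idea can be sketched with a small helper function. This is only a rough sketch under some assumptions: or-zero is a hypothetical helper name, the users and active vectors are made-up in-memory stand-ins for the real queries, and it assumes a Cascalog version that accepts plain Clojure functions as operations.

```clojure
(use 'cascalog.api)

;; Hypothetical in-memory stand-ins for the real `users` and `active` queries.
(def users  [["392393" "john" "smith"]
             ["003030" "jane" "doe"]])
(def active [["john" "smith" 3]])

;; Hypothetical helper: pass the value through, or return 0 when it is nil.
(defn or-zero [x]
  (if (nil? x) 0 x))

;; !!days-active marks `active` as the outer-joined side, so users with no
;; matching row are kept with a nil in that position; or-zero then turns
;; that nil into 0 before the field is emitted.
(?<- (stdout)
     [?id ?fname ?lname ?days-active]
     (users :> ?id ?fname ?lname)
     (active :> ?fname ?lname !!days-active)
     (or-zero !!days-active :> ?days-active))
```

For the updated example, the same helper could be applied once per nullable field, e.g. (or-zero !!user-clicks :> ?user-clicks) and (or-zero !!engaged-users :> ?engaged-users), with the ? versions listed in the output fields.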