sqlapache-sparkapache-phoenix

Apache Phoenix SQL Join Limitation when using sub-queries


I have this query in Apache Phoenix SQL:

select WO.* from (
               select "nr_id", "txt_commrcial_label"
                 from "e_application" APP
                where "txt_commrcial_label" in ('a','b')  
                  and "nr_id" not in (select "nr_ap_id"
                                        from "e_workorder"
                                       where "nr_id" in ('888')) 
                  and "epochtimestampchanged" = (select max("epochtimestampchanged")
                                                   from "e_application" 
                                                  where "nr_id" = APP."nr_id") ) as APP2,

--

       (select Y.ID as WO_ID, Y."nr_id" as WO_nr_id, Y."nr_ap_id" as WO_nr_ap_id
          from ( select "nr_id", max("epochtimestampchanged") as max_epochtimestampchanged
                   from "e_workorder"
                  where CAST(TO_NUMBER("epochtimestampchanged") AS TIMESTAMP) < TO_TIMESTAMP('2020-10-21 19:22:20.0') 
               group by "nr_id" ) as X, "e_workorder" as Y
         where Y."nr_id" = X."nr_id"
           and Y."epochtimestampchanged" < X.max_epochtimestampchanged ) as WO

--

where APP2."nr_id" = WO.WO_nr_ap_id;

I get java language illegal ... blurb for this not overly complex statement. But I cannot see the reason here or in the manuals.

The individual queries work (imagine the ( and , are not there), but no joy when these 2 sub-queries merged to a JOIN.

Do I need to persist the results to tables and then JOIN? Or is there way around this? I have the impression this is too complex in terms of sub-queries.


Solution

  • For others to note, this is a big and a different SQL Approach is needed as per below which is a work-around with note from Cloudera:

    The best workaround is to explicitly define a join in the APP2 query. See the APP_MAX_TIMESTAMP table joined with the APP table, defining basically the same condition as in the original query (but using a table join instead of an inner select):

    The query that should work and should do the same as the original query:

    select
       WO.*
    from
       (
          select
             "nr_id",
             "txt_commrcial_label"
          from
             "e_application" APP
             LEFT JOIN (
                select
                   max("epochtimestampchanged") as max_app_timestamp,
                   "nr_id" as max_app_timestamp_nr_id
                from
                   "e_application"
                group by "nr_id"
             ) APP_MAX_TIMESTAMP
             ON APP_MAX_TIMESTAMP.max_app_timestamp_nr_id = APP."nr_id"
          where
             "txt_commrcial_label" in
             ( list
             )
             and "nr_id" not in
             (
                select
                   "nr_ap_id"
                from
                   "e_workorder"
                where
                   "nr_id" in
                   (
                      '888'
                   )
             )
             and "epochtimestampchanged" = max_app_timestamp
       )
       as APP2,
       (
          select
             Y.ID as WO_ID,
             Y."nr_id" as WO_nr_id,
             Y."nr_ap_id" as WO_nr_ap_id
          from
             (
                select
                   "nr_id",
                   max("epochtimestampchanged") as max_epochtimestampchanged
                from
                   "e_workorder"
                where
                   CAST(TO_NUMBER("epochtimestampchanged") AS TIMESTAMP) < TO_TIMESTAMP('2022-10-10 19:22:20.0')
                group by
                   "nr_id"
             )
             as X,
             "e_workorder" as Y
          where
             Y."nr_id" = X."nr_id"
             and Y."epochtimestampchanged" < X.max_epochtimestampchanged
       )
       as WO
    where
       APP2."nr_id" = WO.WO_nr_ap_id;