sqlexcelamazon-redshiftrecursive-queryanalytical

"SQL Error [XX000]: ERROR: Numeric column 2 precision and scales cannot be merged" when coding logic in REdhsift


Such code gives me an error of: "SQL Error [XX000]: ERROR: Numeric column 2 precision and scales cannot be merged"

WITH RECURSIVE build (PERIOD_START_DATE,PEQ,PREV_PEQ,repeated_patient, cur_rn, max_rn) as (
select 
PERIOD_START_DATE,round(PEQ,1), round(cast(PREV_PEQ as float),1), round(cast(repeated_patient as float),1), cur_rn, max_rn from (       
select 
            PERIOD_START_DATE,
            PEQ,
            cast(PREV_PEQ as float),
            cast(repeated_patient as float),
            ROW_NUMBER() OVER (ORDER BY period_start_date) AS rn
            , 2::int as cur_rn
            , count(1) over() as max_rn

from (select 
            PERIOD_START_DATE,
            round(1.0*SU_VALUE /6, 1)  PEQ,
            LAG(ROUND(1.0*SU_VALUE/6,1 ),1) OVER ( ORDER BY PERIOD_START_DATE ) PREV_PEQ, 
            cast(0 as float) as repeated_patient
      FROM testoss 
     ) where PEQ != PREV_PEQ  ) where rn=1
union all  
select  
            b.PERIOD_START_DATE,
            round(b.PEQ, 1),
            round(b.PREV_PEQ, 1),
 round(cast(case 
    when b.PREV_PEQ - nvl(t.repeated_patient,0) > b.PEQ then b.PEQ
    else b.PREV_PEQ - nvl(t.repeated_patient,0) 
  end as float), 1) as repeated_patient,
   b.cur_rn + 1 AS cur_rn,
   b.max_rn
  from build b join 
  (SELECT period_start_date, PEQ, PREV_PEQ, repeated_patient, lag(period_start_date) over(order by period_start_date) prev_period, 
      ROW_NUMBER() OVER (ORDER BY period_start_date) AS rn from (SELECT 
            PERIOD_START_DATE,
            round(1.0*SU_VALUE /6, 1)  PEQ,
            LAG(ROUND(1.0*SU_VALUE/6,1 ),1) OVER (ORDER BY PERIOD_START_DATE ) PREV_PEQ, 
            cast(0 as float) as repeated_patient
      FROM testoss 
     )) t ON t.prev_period = b.period_start_date
   WHERE t.rn = b.cur_rn AND b.cur_rn <= b.max_rn) 
   select   
            PERIOD_START_DATE,
            PEQ,
            PREV_PEQ,
            repeated_patient
            from build;

Given period date (PERIOD_START_DATE) and Standard units values (SU_VALUE) I need to code following requirements:

• Dosage: 6 tablets in month 1,2,13 and 14 of the treatment. No drug in 3rd and 4th year.

• # of Total Patient Equals who have taken the drug in a particular month are calculated by dividing standard units (SU) sold in that month by monthly dosage (considered as 6 tablets (SU))

• New patients taking drug in a particular month are calculated by subtracting the repeated patients number from the above number for that month. Repeated patients for a month are equivalent to new patients for the previous month. ○ We assume 100% compliance of new patients in taking the second dosage and hence are accounted for next 12 months (next step)

• Final Total Patient Equals are calculated by summing up new patients number for recent 12 months i.e. the present month (for which the TPE are calculated) and the previous 11 months.

• Total Patient Equals share is calculated by dividing 'Total Patient Equals calculated using above methodology' by 'Sum of Total Patient Equals of all MS products'

Those should be the results:

enter image description here

Above requirements and formulas in cells are here ->

https://docs.google.com/spreadsheets/d/1xvItLMT8-BcoVjNR-NDIYUTlmTuebP-crD7foSFAwOM/edit?gid=1133570729#gid=1133570729

Here would be the test data:

  CREATE TABLE TESTOSS
   (    PERIOD_START_DATE date, 
    SU_VALUE integer
   );
Insert into TESTOSS (PERIOD_START_DATE,SU_VALUE) values (to_date('01-SEP-17','DD-MON-YY'),69);
Insert into TESTOSS (PERIOD_START_DATE,SU_VALUE) values (to_date('01-OCT-17','DD-MON-YY'),263);
Insert into TESTOSS (PERIOD_START_DATE,SU_VALUE) values (to_date('01-NOV-17','DD-MON-YY'),684);
Insert into TESTOSS (PERIOD_START_DATE,SU_VALUE) values (to_date('01-DEC-17','DD-MON-YY'),938);
Insert into TESTOSS (PERIOD_START_DATE,SU_VALUE) values (to_date('01-JAN-18','DD-MON-YY'),1352);
Insert into TESTOSS (PERIOD_START_DATE,SU_VALUE) values (to_date('01-FEB-18','DD-MON-YY'),1174);
Insert into TESTOSS (PERIOD_START_DATE,SU_VALUE) values (to_date('01-MAR-18','DD-MON-YY'),1123);
Insert into TESTOSS (PERIOD_START_DATE,SU_VALUE) values (to_date('01-APR-18','DD-MON-YY'),1649);
Insert into TESTOSS (PERIOD_START_DATE,SU_VALUE) values (to_date('01-MAY-18','DD-MON-YY'),1402);
Insert into TESTOSS (PERIOD_START_DATE,SU_VALUE) values (to_date('01-JUN-18','DD-MON-YY'),1548);
Insert into TESTOSS (PERIOD_START_DATE,SU_VALUE) values (to_date('01-JUL-18','DD-MON-YY'),1448);

Could anybody please help me to resolve?

With Oracle it was resolved using MODEL clause Translate excel formulas into SQL query

WITH METRICS AS
( 
  SELECT 
         PERIOD_START_DATE,
         PEQ,
         PREV_PEQ,
         REPEATED_PATIENT,
         (PEQ - REPEATED_PATIENT) NEW_PATIENT
  FROM
  (  SELECT PERIOD_START_DATE,
            SU_VALUE /6  PEQ,
            LAG (ROUND( SU_VALUE /6),1,0) OVER ( ORDER BY PERIOD_START_DATE ) REV_PEQ,
            0 AS REPEATED_PATIENT
      FROM TESTOSS
      ORDER BY  PERIOD_START_DATE     
   )
   MODEL
      DIMENSION BY (ROW_NUMBER() OVER (ORDER BY PERIOD_START_DATE) RN)
      MEASURES (PRODUCT, PERIOD_START_DATE, PEQ, PREV_PEQ, REPEATED_PATIENT)

   RULES (
          REPEATED_PATIENT [ANY] =
          ( 
               CASE 
                    WHEN PREV_PEQ[CV(RN)]-NVL(REPEATED_PATIENT[CV(RN)-1],0) > PEQ[CV(RN)] THEN PEQ[CV(RN)] 
                    ELSE PREV_PEQ[CV(RN)]-NVL(REPEATED_PATIENT[CV(RN)-1],0) 
                END
            )
        )
)
SELECT 
        PERIOD_START_DATE,
        NEW_PATIENT,
        SUM(NEW_PATIENT) OVER(ORDER BY PERIOD_START_DATE RANGE BETWEEN INTERVAL '11' MONTH PRECEDING AND CURRENT ROW
        ) AS FINAL_PEQ
    FROM METRICS
    ORDER BY PERIOD_START_DATE

But there is no MODEL clause in Redhsift.

Similar case was translated with Redhsift in Translate excel formulas into Redshift SQL query


Solution

  • The error you are getting is due UNIONing values of differing types. "1.0*su_value /6" is of type NUMERIC of uncontrolled scale. You need to take care of your types. I think you want float as you cast to this in other places. This runs:

    WITH recursive build (period_start_date,peq,prev_peq,repeated_patient, cur_rn, max_rn) AS
    (
       SELECT period_start_date,
         round(peq,1),
         round(cast(prev_peq AS FLOAT),1),
         round(cast(repeated_patient AS FLOAT),1),
         cur_rn,
         max_rn
       FROM  (
            SELECT  period_start_date,
               peq,
               cast(prev_peq AS FLOAT),
               cast(repeated_patient AS FLOAT),
               row_number() over (ORDER BY period_start_date) AS rn ,
               2::INT              AS cur_rn ,
               count(1) over()           AS max_rn
            FROM  (
                  SELECT  period_start_date,
                     round(1.0*su_value /6, 1)::float                peq,
                     lag(round(1.0*su_value/6,1 ),1) over ( ORDER BY period_start_date )::float  prev_peq,
                     cast(0 AS FLOAT)                  AS repeated_patient
                  FROM  testoss 
                )
            WHERE  peq != prev_peq 
          )
       WHERE rn=1
       UNION ALL
       SELECT b.period_start_date,
         round(b.peq, 1),
         round(b.prev_peq, 1),
         round(cast(
         CASE
            WHEN b.prev_peq - nvl(t.repeated_patient,0) > b.peq THEN b.peq
            ELSE b.prev_peq - nvl(t.repeated_patient,0)
         END AS FLOAT), 1) AS repeated_patient,
         b.cur_rn + 1  AS cur_rn,
         b.max_rn
       FROM  build b
       join
         (
            SELECT  period_start_date,
               peq,
               prev_peq,
               repeated_patient,
               lag(period_start_date) over(ORDER BY period_start_date)  prev_period,
               row_number() over (ORDER BY period_start_date)    AS rn
            FROM  (
                  SELECT  period_start_date,
                     round(1.0*su_value /6, 1)::float                peq,
                     lag(round(1.0*su_value/6,1 ),1) over (ORDER BY period_start_date )::float  prev_peq,
                     cast(0 AS FLOAT)                  AS repeated_patient
                  FROM  testoss )
          ) t
       ON  t.prev_period = b.period_start_date
       WHERE t.rn = b.cur_rn
       AND  b.cur_rn <= b.max_rn
    )
    SELECT period_start_date,
       peq,
       prev_peq,
       repeated_patient
    FROM  build;
    

    Also please run your test cases before posting. You had syntactical errors.