sqldatabricks-sql

UNSUPPORTED_FEATURE.SET_OPERATION_ON_MAP_TYPE The feature is not supported: Cannot have MAP type columns which calls set operations (INTERSECT, E


How can I fix the following error? I am getting error when running query in DataBrick:

[UNSUPPORTED_FEATURE.SET_OPERATION_ON_MAP_TYPE] The feature is not supported: Cannot have MAP type columns in DataFrame which calls set operations (INTERSECT, EXCEPT, etc.), but the type of column ROptions is "MAP<STRING, STRUCT<ReplyText: STRING, TBFlag: INT, IgFlag: INT, InvalidFlag: INT, RScore: DECIMAL(16,13)>>". SQLSTATE: 0A000

    SELECT
    distinct  af.Name AS D_Name
    , af.Q_ID 
    , map_from_entries(collect_list((CAST(LTRIM(RTRIM(lower(ers.res.reply))) AS STRING)
                      ,named_struct('ReplyText', CASE WHEN af.Q_TYPE = 'A' AND af.DATA_TYPE = 'L' AND ers.res.reply = '1' THEN TRIM('VERY POOR')
                      WHEN af.Q_TYPE = 'A' AND af.D_TYPE = 'L' AND ers.res.reply = '2' THEN TRIM('POOR')
                      WHEN af.Q_TYPE = 'A' AND af.D_TYPE = 'L' AND ers.res.reply = '3' THEN TRIM('FAIR')
                      WHEN af.Q_TYPE = 'A' AND af.D_TYPE = 'L' AND ers.res.reply = '4' THEN TRIM('GOOD')
                      WHEN af.Q_TYPE = 'A' AND af.D_TYPE = 'L' AND ers.res.reply = '5' THEN TRIM('VERY GOOD')
                      WHEN af.Q_TYPE = 'L' THEN COALESCE(Lower(qrd.LABEL), CAST(LTRIM(RTRIM(lower(ers.res.reply))) AS STRING))
                      ELSE LTRIM(RTRIM(ers.res.reply) END
                        ,'TBFlag', CASE WHEN ers.res.reply BETWEEN CAST(af.R_MIN AS STRING) AND CAST(af.R_MAX AS STRING) THEN 1
                        WHEN af.Q_TYPE = 'A' AND af.D_TYPE = 'L' AND af.C_Q_FLAG = 'Y' AND CAST(ers.res.reply AS INT) = 5 THEN 1
                        WHEN af.Q_TYPE = 'A' AND af.D_TYPE = 'L' AND af.C_Q_FLAG = 'N' AND CAST(ers.res.reply AS INT) = 5 THEN 1
                        ELSE 0 END
                        ,'IgFlag', CASE WHEN sc.S_IGNORE_FLAG = 'Y' THEN 1 ELSE 0 END
                        ,'InvalidFlag', CASE WHEN af.SurveyItemPGAnalyticFlag = 1 THEN CASE WHEN ers.res.reply not between 1 and 5 THEN 1 ELSE 0 END
                        WHEN af.C_Flag = 1 THEN CASE WHEN qrd.CHOICE IS NULL THEN 1 ELSE 0 END 
                        ELSE 0 END
                        ,'RScore', CASE WHEN af.SCALE_ID IS NOT NULL AND sc.PRECISION IS NOT NULL THEN CAST(ers.res.reply AS DECIMAL(16, 13))
                        WHEN af.Q_TYPE = 'A' AND af.D_TYPE = 'L' THEN CAST(ROUND((CAST(ers.res.reply AS INT) - 1) * 100 / 4) AS DECIMAL(16, 13))
                        ELSE NULL END)))) AS ROptions
    FROM CTE ers
    INNER JOIN Fact af ON ers.D_Name = af.Name AND ers.res.Q_ID = af.Q_ID
    LEFT OUTER JOIN Range qrd ON af.DB_Name = qrd.D_Name AND af.Q_ID = qrd.Q_ID AND COALESCE(ers.res.reply,'') = CAST(qrd.CHOICE AS STRING)
    LEFT OUTER JOIN Sect sc ON af.Name = sc.D_Name AND COALESCE(af.SCALE_ID, 0) = COALESCE(sc.SCALE_ID, 0) AND COALESCE(ers.res.reply, '') = CAST(sc.SCALE_LABEL_ID AS STRING)
    GROUP BY af.ame, af.Q_ID

Solution

  • Do not apply distinct directly to Map Type columns; instead, use the logic below.

    First, create your map column in a subquery, and then write another query on top of it using sort_array(map_entries(details)) to convert your map column to sorted map entries. Next, apply distinct on it, and then write another query on top of it to convert the map entries back to a map.

    Example :

    # +----+--------+
    # |name| details|
    # +----+--------+
    # |   n|{m -> 1}|
    # |   n|{m -> 1}|
    # +----+--------+
    
    SELECT name
          ,map_from_entries(details_entries) as details
    FROM ( 
          SELECT DISTINCT name
                         ,sort_array(map_entries(details)) as details_entries
           FROM data_tbl)
    

    Ref : Spark SQL Select Distinct record from hive map data type column

    If you don't want to apply distinct on the map column, you can use the query below to get unique records.

    WITH BaseData AS (
        SELECT
            af.Name AS D_Name,
            af.Q_ID,
            map_from_entries(collect_list(
                (
                    CAST(LTRIM(RTRIM(lower(ers.res.reply))) AS STRING),
                    named_struct(
                        'ReplyText',
                        CASE 
                            WHEN af.Q_TYPE = 'A' AND af.DATA_TYPE = 'L' AND ers.res.reply = '1' THEN TRIM('VERY POOR')
                            WHEN af.Q_TYPE = 'A' AND af.DATA_TYPE = 'L' AND ers.res.reply = '2' THEN TRIM('POOR')
                            WHEN af.Q_TYPE = 'A' AND af.DATA_TYPE = 'L' AND ers.res.reply = '3' THEN TRIM('FAIR')
                            WHEN af.Q_TYPE = 'A' AND af.DATA_TYPE = 'L' AND ers.res.reply = '4' THEN TRIM('GOOD')
                            WHEN af.Q_TYPE = 'A' AND af.DATA_TYPE = 'L' AND ers.res.reply = '5' THEN TRIM('VERY GOOD')
                            WHEN af.Q_TYPE = 'L' THEN COALESCE(Lower(qrd.LABEL), CAST(LTRIM(RTRIM(lower(ers.res.reply))) AS STRING))
                            ELSE LTRIM(RTRIM(ers.res.reply)) 
                        END,
                        'TBFlag', 
                        CASE 
                            WHEN ers.res.reply BETWEEN CAST(af.R_MIN AS STRING) AND CAST(af.R_MAX AS STRING) THEN 1
                            WHEN af.Q_TYPE = 'A' AND af.DATA_TYPE = 'L' AND af.C_Q_FLAG = 'Y' AND CAST(ers.res.reply AS INT) = 5 THEN 1
                            WHEN af.Q_TYPE = 'A' AND af.DATA_TYPE = 'L' AND af.C_Q_FLAG = 'N' AND CAST(ers.res.reply AS INT) = 5 THEN 1
                            ELSE 0 
                        END,
                        'IgFlag', 
                        CASE 
                            WHEN sc.S_IGNORE_FLAG = 'Y' THEN 1 
                            ELSE 0 
                        END,
                        'InvalidFlag', 
                        CASE 
                            WHEN af.SurveyItemPGAnalyticFlag = 1 THEN 
                                CASE 
                                    WHEN ers.res.reply NOT BETWEEN 1 AND 5 THEN 1 
                                    ELSE 0 
                                END
                            WHEN af.C_Flag = 1 THEN 
                                CASE 
                                    WHEN qrd.CHOICE IS NULL THEN 1 
                                    ELSE 0 
                                END 
                            ELSE 0 
                        END,
                        'RScore', 
                        CASE 
                            WHEN af.SCALE_ID IS NOT NULL AND sc.PRECISION IS NOT NULL THEN CAST(ers.res.reply AS DECIMAL(16, 13))
                            WHEN af.Q_TYPE = 'A' AND af.DATA_TYPE = 'L' THEN CAST(ROUND((CAST(ers.res.reply AS INT) - 1) * 100 / 4) AS DECIMAL(16, 13))
                            ELSE NULL 
                        END
                    )
                )
            )) AS ROptions,
            ROW_NUMBER() OVER (PARTITION BY af.Name, af.Q_ID ORDER BY af.Name, af.Q_ID) AS rn
        FROM CTE ers
        INNER JOIN Fact af ON ers.D_Name = af.Name AND ers.res.Q_ID = af.Q_ID
        LEFT OUTER JOIN Range qrd ON af.DB_Name = qrd.D_Name AND af.Q_ID = qrd.Q_ID AND COALESCE(ers.res.reply, '') = CAST(qrd.CHOICE AS STRING)
        LEFT OUTER JOIN Sect sc ON af.Name = sc.D_Name AND COALESCE(af.SCALE_ID, 0) = COALESCE(sc.SCALE_ID, 0) AND COALESCE(ers.res.reply, '') = CAST(sc.SCALE_LABEL_ID AS STRING)
        GROUP BY af.Name, af.Q_ID
    )
    SELECT D_Name, Q_ID, ROptions
    FROM BaseData
    WHERE rn = 1