sqlamazon-redshiftapache-superset

Column reference "id" is ambiguous while filtering in a chart


I created a dataset in Superset SQL Lab by joining 2 tables on session_id.

SELECT *
FROM t1.performance as perf
INNER JOIN t1.test_session as session
ON perf.session_id = session.session_id

Now, to filter the these values in a chart, we regularly select a column name and value for that column. Doing this for the session_id column fails with:

Error: column reference "session_id" is ambiguous.

I tried to write custom query, using the psuedo column name created by Superset, "session_id__1", this fails with error:

column "session_id__1" does not exist in virtual_table, use the column name with table name ("perf.session_id").

But nothing seems to work.


Solution

  • Using select * is bad practice in SQL - always select the specific columns you require.

    This is the cause of the issue here, because * includes all the columns from performance and all the columns from test_session, including therefore two copies of session_id and your report tool has no way to tell them apart.

    Select it once and problem solved e.g.

    SELECT
      -- Common column only needs to come from one table, and doesn't require an alias
      perf.session_id
      -- Ambiguous column name needs to be aliased
      , perf.Col1 AS pCol1
      -- Unique column name - can remain unchanged
      , perf.Col2
      -- Ambiguous column name needs to be aliased
      , session.Col1 AS sCol1
      -- Unique column name - can remain unchanged
      , session.Col3
    FROM t1.performance AS perf
    INNER JOIN t1.test_session AS session
        ON perf.session_id = session.session_id;
    

    Personally I would use shorter aliases for conciseness e.g.

    SELECT
      p.session_id
      , p.Col1 AS pCol1
      , p.Col2
      , s.Col1 AS sCol1
      , s.Col3
    FROM t1.performance p
    INNER JOIN t1.test_session s
      ON p.session_id = s.session_id;