sql-servertableau-desktop

How to inset dummy data in a SQL Server SELECT statement based on table columns


I use a BI tool (Tableau) that has visual problems for my clients when certain filters are applied. Reason why is that some categorical values disappear if there is no data for them under these filters selection. The client needs that all combinations are present at all times.

In order to avoid I am trying to inject dummy values that prevents the filtering from happening:

SELECT * FROM TABLE_A
UNION
SELECT Null AS FIRST_COLUMN, Null AS SECOND_COLUMN, 'Peter' AS THIRD_COLUMN
UNION
SELECT Null AS FIRST_COLUMN, Null AS SECOND_COLUMN, 'Michael' AS THIRD_COLUMN
UNION
SELECT Null AS FIRST_COLUMN, Null AS SECOND_COLUMN, 'James' AS THIRD_COLUMN
...

In my real case I have more than 50 columns and from time to time more columns are going to be added.

I was wondering if there is a way in that I can set the values for the columns I am interested in and Null for everything else without the need to define every single column like in the example I provide.

I only have read permissions on the table


Solution

  • I solved my problem using Python to achieve it (detailed below):

    First let me detail the problem a little more with an example table

    Student Sport Score
    Mark Football 50
    Mark Golf 20
    Mark Basketball 70
    Peter Golf 30
    Peter Basketball 50

    In Tableau if you filter by sport and select Football will make Peter dissapear from the map since there is no row in the datasource combining Peter and Football, so resulting table after filtering will be this:

    Student Sport Score
    Mark Football 50

    But customer expects this:

    Student Sport Score
    Mark Football 50
    Peter Football 0

    So even if Sport its not in the visualization the effect of filtering by Football makes Peter dissapear.

    For my solution I take the data from a MSSQL Server put it inside a pandas dataframe. Then I use the following function:

    import pandas as pd
    
    def dummyGenerator(df: pd.DataFrame, distinct_fields: list, combined_fields: list, filter_selection: dict = {}):
        """
        dummyGenerator takes a pandas DataFrame and generates dummy values based on arguments         
        
        Args:
            df (pandas DataFrame) : Source data to input
            distinct_fields (list): Values to take as a distinct chunk. It will take the combinations present ONLY in the datasource of this fields
            combined_fields (list): Name of destination datasource
            filter_selection (dict): filter for specific values Example: {"field A": ["value1", "value2"], "field B": ["value1", "value2"]}
    
        Returns:
            resulting dataframe 
    
        """  
        
        #Distinct: Fields in which we want to take the default combinations available in original datasource
        distinct = df[distinct_fields].drop_duplicates().reset_index(drop=True)
    
        for field in filter_selection:
            distinct = distinct[distinct[field].isin(filter_selection[field])].reset_index(drop=True)
    
        #Get combinations combine all possibilities 
        combinations = []
    
        for field in combined_fields:
            combinations.append(df[field].unique())
    
        
        combined_df = pd.DataFrame(list(product(*combinations)), columns=combined_fields)
    
        result_df = pd.DataFrame()
    
        for column in df.columns:
            if column not in combined_df.columns:
                result_df[column] = None
            else:
                result_df[column] = combined_df[column]
    
    
        result = distinct.merge(result_df, how="cross", suffixes= ["", "_y"])
    
        return result[df.columns.to_list()]
    

    After that I concat the data to the original data source and then I refresh an existent Tableau's datasource with the following script (Note that tableau version refrence your Tableaus's desktop folder Example: 2022.1":

    def to_Tableau_DataSource(df: pd.DataFrame, project: str, datasource: str, tableau_version: str, site: str, server: str):
        """
        to_Tableau_DataSource takes a pandas DataFrame query and uses it to refresh a Tableau DataSource 
            
        It also requires an enviromental variable named ASRA containing the username and password in the format <user>@email.com:<password>
    
        Args:
            df (pandas DataFrame) : Source data to input
            project (str): Name of destination project
            datasource (str): Name of destination datasource
            tableau_version (str): Tableau Desktop version of hosting computer
    
        Returns:
            Nothing 
    
        """  
            
        home = str(Path.home()) + "\\Downloads\\"
    
        connection_string = os.getenv("ASRA") 
    
        u, p = connection_string.split(":")
    
        
        si = site
        s= server       
    
        filename = home + "output.xlsx"
        print(filename)
        df.to_excel(filename, index = False)
    
        filename.replace("/","\\")
    
        os.chdir(f"C:\\Program Files\\Tableau\\Tableau {tableau_version}\\bin\\")
        os.system(f'tableau refreshextract --server {s} --username {u} --password {p} --project "{project}" --site {si} --datasource "{datasource}" --original-file "{filename}"')
        os.remove(filename)