apache-spark, pyspark, apache-spark-sql, spark-jdbc

How to use PySpark to write to JDBC without column names


My question is really simple.

I'm using PySpark to export a Hive table to SQL Server.

I found that the column names were exported as data rows in SQL Server.

I just want to export the data without the column names.

[screenshot: column names appearing as rows in the SQL Server table]

I don't want these header rows in the table...

My PySpark code is here:

df.write.jdbc(
    "jdbc:sqlserver://10.8.12.10;instanceName=sql1",
    "table_name",
    "overwrite",
    {"user": "user_name", "password": "111111", "database": "Finance"},
)

Is there an option to skip column names?


Solution

  • I think the JDBC connector isn't actually what adds those header rows. The header is already present in your DataFrame; it's a known problem when reading data from a Hive table.

    If you're using SQL to load the data from Hive, you can try filtering out the header row with a condition of the form col != 'header value':

    # adapt the condition by verifying what appears in df.show()
    df = spark.sql("select * from my_table where sold_to_party != 'Sold-To Party'")
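
    Equivalently, if the DataFrame is already loaded, the same filter can be expressed with the DataFrame API. This is a minimal sketch, assuming the same sold_to_party column and 'Sold-To Party' header value as above:

    from pyspark.sql import functions as F

    # drop the stray header row; all other rows are kept as-is
    df = df.filter(F.col("sold_to_party") != "Sold-To Party")

    Either way, the fix happens before the write: once the header row is filtered out of the DataFrame, the df.write.jdbc(...) call above can stay unchanged.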