dataframe, scala, apache-spark, apache-spark-sql

How to merge two DataFrames based on matching rows using Spark and Scala


I have the two DataFrames below and need to merge them, keeping only the rows whose ID appears in both.

Dataframe 1

ID status
V1 Low
V2 Low
V3 Low

Dataframe 2

ID status
V1 High
V2 High
V6 High

Expected DataFrame:

ID status
V1 Low
V1 High
V2 Low
V2 High

Solution

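  • (I only know Java, not Scala, sorry.)
    Call your first dataset A and your second dataset B. A left semi join keeps only the rows of the left dataset that have a matching ID in the right one, so taking it in both directions and unioning the results gives the merged dataset: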

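    import org.apache.spark.sql.Column;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // Rows match when their IDs are equal
    Column joinClause = A.col("ID").equalTo(B.col("ID"));

    // left_semi returns only the left side's rows (and columns) that have a match
    Dataset<Row> A_with_B = A.join(B, joinClause, "left_semi")
    .union(
       B.join(A, joinClause, "left_semi")
    );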
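
Since the question asks for Scala, the same idea can be sketched as follows. This is a minimal sketch, not a tested production solution: the object name `MergeMatchingRows`, the helper `mergeMatching`, and the local `SparkSession` setup are hypothetical names introduced for illustration, and the sample data is copied from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical wrapper object, introduced only for this example.
object MergeMatchingRows {

  // Keeps the rows of each DataFrame whose ID also appears in the other one,
  // then stacks the two results. Assumes both inputs have the same columns
  // in the same order, because union matches columns by position.
  def mergeMatching(a: DataFrame, b: DataFrame): DataFrame = {
    val aMatched = a.join(b, Seq("ID"), "left_semi") // rows of a with an ID in b
    val bMatched = b.join(a, Seq("ID"), "left_semi") // rows of b with an ID in a
    aMatched.union(bMatched).orderBy("ID")
  }

  def main(args: Array[String]): Unit = {
    // Local session for trying the example out; adjust for a real cluster.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("merge-matching-rows")
      .getOrCreate()
    import spark.implicits._

    val df1 = Seq(("V1", "Low"), ("V2", "Low"), ("V3", "Low")).toDF("ID", "status")
    val df2 = Seq(("V1", "High"), ("V2", "High"), ("V6", "High")).toDF("ID", "status")

    mergeMatching(df1, df2).show()
    spark.stop()
  }
}
```

The `orderBy("ID")` groups rows with the same ID together, but the relative order of Low and High within one ID is not guaranteed; add `status` to the sort if that matters. Note also that `union` keeps duplicates, so a row present in both inputs would appear twice.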