Tags: python, pyspark, apache-spark-sql, data-quality, great-expectations

Great Expectations list total unique values


I ran the Great Expectations check expect_column_values_to_be_unique on one of my columns, and it produced the result below. There are 62 duplicates in total, but the output list contains only 20 elements. How can I retrieve all duplicate records in that column?

    df.expect_column_values_to_be_unique('A')

{
  "exception_info": null,
  "expectation_config": {
    "expectation_type": "expect_column_values_to_be_unique",
    "kwargs": {
      "column": "A",
      "result_format": "BASIC"
    },
    "meta": {}
  },
  "meta": {},
  "success": false,
  "result": {
    "element_count": 100,
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_count": 62,
    "unexpected_percent": 62.0,
    "unexpected_percent_nonmissing": 62.0,
    "partial_unexpected_list": [
      37,
      62,
      72,
      53,
      22,
      61,
      95,
      21,
      64,
      59,
      77,
      53,
      0,
      22,
      24,
      46,
      0,
      16,
      78,
      60
    ]
  }
}

Solution

  • The result above was produced with result_format set to BASIC, which returns only a partial_unexpected_list (20 values here) rather than every unexpected value. To get the full list of unexpected values, pass result_format="COMPLETE" for this Expectation instead. For example:

    df.expect_column_values_to_be_unique(column="A", result_format="COMPLETE")
    

    See the Great Expectations documentation on result_format for more detail.
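
Once the full list is available, the duplicates can be grouped by value. The following is only a minimal sketch under a few assumptions: that the validation result has been converted to a plain dict (for example via its to_json_dict() method, if your version provides it), and that COMPLETE places every offending value under result["unexpected_list"]. The helper name duplicate_values and the sample data are hypothetical and not part of Great Expectations.

```python
from collections import Counter


def duplicate_values(validation_result: dict) -> dict:
    """Map each duplicated value to the number of rows that carry it."""
    # Assumption: "unexpected_list" is present, i.e. the check was run
    # with result_format="COMPLETE" rather than BASIC.
    unexpected = validation_result["result"]["unexpected_list"]
    return dict(Counter(unexpected))


# Hypothetical, shortened result used purely for illustration.
sample_result = {
    "success": False,
    "result": {
        "unexpected_count": 6,
        "unexpected_list": [53, 22, 53, 0, 22, 0],
    },
}

duplicates = duplicate_values(sample_result)
print(duplicates)
```

With a real result, the sum of the counts should match unexpected_count (62 in the question), which is a quick way to confirm that nothing was truncated.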