excelvbaexcel-formuladata-cleaningopenrefine

How to delete least filled duplicate rows?


I have excel with Survey Results. There are multiple submissions(rows) by the same email. I want to find the most filled / most completed row and delete the rest of the duplicate rows with the same email address.

I have a table like this. Multiple Submissions from a single email.

Email Question 1 Question 2 Question 3
Email1 Answer Answer Answer
Email1 Answer Answer
Email2 Answer
Email2 Answer
Email2 Answer Answer

I need to delete the least filed duplicated rows. and get Results like bellow.

Email Question 1 Question 2 Question 3
Email1 Answer Answer Answer
Email2 Answer Answer

How can I do this using openrefine or Excel?


Solution

  • This is easily possible with Excel.

    1. Create a helper column containing the formula COUNTA(%RowRange%).

    2. Then sort this helper column from largest to smallest.

    3. Last you select the entire table, click "Remove Duplicates" and in the pop-up box unselect all fields except email.

    See these step-by-step screenshots:

    Helper column formula

    Remove duplicates

    enter image description here

    Note:

    In a case like this:

    Email Question 1 Question 2 Question 3
    Email2 Answer
    Email2 Answer Answer

    The answer to question 2 would be lost of course. If you need to consolidate such cases to result in one row like this:

    Email Question 1 Question 2 Question 3
    Email2 Answer Answer Answer

    That is also easily possible. Let me know if this is what you need or if the first solution is sufficient.