I have excel with Survey Results. There are multiple submissions(rows) by the same email. I want to find the most filled / most completed row and delete the rest of the duplicate rows with the same email address.
I have a table like this. Multiple Submissions from a single email.
Question 1 | Question 2 | Question 3 | |
---|---|---|---|
Email1 | Answer | Answer | Answer |
Email1 | Answer | Answer | |
Email2 | Answer | ||
Email2 | Answer | ||
Email2 | Answer | Answer |
I need to delete the least filed duplicated rows. and get Results like bellow.
Question 1 | Question 2 | Question 3 | |
---|---|---|---|
Email1 | Answer | Answer | Answer |
Email2 | Answer | Answer |
How can I do this using openrefine or Excel?
This is easily possible with Excel.
Create a helper column containing the formula COUNTA(%RowRange%)
.
Then sort this helper column from largest to smallest.
Last you select the entire table, click "Remove Duplicates" and in the pop-up box unselect all fields except email.
See these step-by-step screenshots:
In a case like this:
Question 1 | Question 2 | Question 3 | |
---|---|---|---|
Email2 | Answer | ||
Email2 | Answer | Answer |
The answer to question 2 would be lost of course. If you need to consolidate such cases to result in one row like this:
Question 1 | Question 2 | Question 3 | |
---|---|---|---|
Email2 | Answer | Answer | Answer |
That is also easily possible. Let me know if this is what you need or if the first solution is sufficient.