I have a GraphLab SFrame in which a few rows share the same value in the "uid" column.
+-------------------+----------------------+----------------------+--------------+---------------------+
| VIM Document Type | Vendor Number & Zone | Value <5000 or >5000 | Today Status | uid                 |
+-------------------+----------------------+----------------------+--------------+---------------------+
| PO_VR_GLB         | 1613407EMEAi         | Less than 5000       | 0            | 63068$#069          |
| PO_VR_GLB         | 249737LATIN AMERICA  | More than 5000       | 1            | 5789$#13            |
| PO_MN_GLB         | 1822317NORTH AMERICA | Less than 5000       | 1            | 12933036$#IN6532618 |
| PO_MN_GLB         | 1822317NORTH AMERICA | Less than 5000       | 1            | 12933022$#IN6590132 |
| PO_MN_GLB         | 1822317NORTH AMERICA | Less than 5000       | 1            | 12932349$#IN6636468 |
| PO_MN_GLB         | 1216902NORTH AMERICA | More than 5000       | 1            | 12952077$#203250    |
| PO_MN_GLB         | 1213709EMEAi         | Less than 5000       | 0            | 13012770$#MUML04184 |
| PO_MN_GLB         | 882843NORTH AMERICA  | More than 5000       | 1            | 12945049$#112370    |
| PO_MN_GLB         | 2131503ASIA PACIFIC  | More than 5000       | 1            | 13582330$#CI160118  |
| PO_MN_GLB         | 2131503ASIA PACIFIC  | More than 5000       | 1            | 13012770$#MUML04184 |
+-------------------+----------------------+----------------------+--------------+---------------------+
I want to keep every row whose uid is unique and, for each uid that appears more than once, only one of its rows. The retained row can be any row with Today Status = 1. There may be several rows with the same uid and the same status that differ in other fields; in that case, any one of them can be kept. I want to do this with GraphLab SFrames, but I can't figure out how to proceed.
You may use SFrame.unique(), which returns the distinct rows of the SFrame (rows that are identical in every column are collapsed into one):
sf = sf.unique()
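Because unique() compares entire rows, the two rows for uid 13012770$#MUML04184 in the data above (Today Status 0 and 1, different vendor) would both survive it. The plain-Python sketch below imitates that whole-row comparison to make the limitation concrete. It does not call GraphLab; a hypothetical list of dicts stands in for the SFrame, and `rows` and `unique_rows` are illustrative names.

```python
# Hypothetical stand-in for the SFrame: a list of dicts (subset of columns/rows).
rows = [
    {"Vendor Number & Zone": "1213709EMEAi", "Today Status": 0,
     "uid": "13012770$#MUML04184"},
    {"Vendor Number & Zone": "2131503ASIA PACIFIC", "Today Status": 1,
     "uid": "13012770$#MUML04184"},
    {"Vendor Number & Zone": "2131503ASIA PACIFIC", "Today Status": 1,
     "uid": "13582330$#CI160118"},
]


def unique_rows(rows):
    """Drop rows that are identical in every column (whole-row dedup).

    This version keeps the original order; the real SFrame.unique() may reorder rows.
    """
    seen = set()
    out = []
    for row in rows:
        # Build a hashable key from all (column, value) pairs of the row.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


deduped = unique_rows(rows)
# Both rows for uid 13012770$#MUML04184 remain, because they differ
# in "Vendor Number & Zone" and "Today Status".
print(len(deduped))  # 3
```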
Another way is to use the groupby() or join() methods, where you specify the column name(s) to operate on and work from there. You can read their documentation on turi.com for the various options.
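A groupby-style approach groups on uid and, within each group, keeps the row with the highest Today Status. In GraphLab this would presumably look something like `sf = sf.add_row_number('row_id')`, then `best = sf.groupby('uid', {'row_id': graphlab.aggregate.ARGMAX('Today Status', 'row_id')})`, then `sf = sf.filter_by(best['row_id'], 'row_id')`; please verify those calls against the turi.com docs before relying on them. The sketch below implements the same selection rule in plain Python on hypothetical data (only the uid and Today Status columns from the question), so the logic can be checked without GraphLab. The names `keep_one_per_uid`, `data` and `rows` are illustrative.

```python
# (uid, Today Status) pairs taken from the question's table, in row order.
data = [
    ("63068$#069", 0), ("5789$#13", 1), ("12933036$#IN6532618", 1),
    ("12933022$#IN6590132", 1), ("12932349$#IN6636468", 1),
    ("12952077$#203250", 1), ("13012770$#MUML04184", 0),
    ("12945049$#112370", 1), ("13582330$#CI160118", 1),
    ("13012770$#MUML04184", 1),
]
rows = [{"uid": u, "Today Status": s} for u, s in data]


def keep_one_per_uid(rows, key="uid", status="Today Status"):
    """Keep one row per uid, preferring the row with the highest status.

    Mirrors a groupby on `key` that picks, per group, the row with the
    maximum status; among ties, the first row seen wins.
    """
    best = {}  # uid -> index of the chosen row
    for i, row in enumerate(rows):
        j = best.get(row[key])
        if j is None or row[status] > rows[j][status]:
            best[row[key]] = i
    # Return the chosen rows in their original order.
    return [rows[i] for i in sorted(best.values())]


result = keep_one_per_uid(rows)
print(len(result))  # 9
```

If a uid has no row with Today Status 1, this keeps its first row; the question doesn't say what should happen in that case, so adjust as needed.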
Another way (the one I personally prefer) is to convert the SFrame to a pandas DataFrame, do the data operations in pandas, and then convert the DataFrame back to an SFrame. Which approach to use is up to you; I hope this helps.
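In pandas, one common idiom for this is to sort so that rows with Today Status 1 come first and then drop duplicate uids, roughly `df = sf.to_dataframe()`, `df = df.sort_values('Today Status', ascending=False).drop_duplicates('uid', keep='first')`, and `sf = graphlab.SFrame(df)`. The sketch below reproduces that sort-then-first-occurrence logic in plain Python so it can be tried without pandas; the data is the uid/Today Status subset of the question's table, and `sort_then_first` is an illustrative name.

```python
# (uid, Today Status) pairs taken from the question's table, in row order.
data = [
    ("63068$#069", 0), ("5789$#13", 1), ("12933036$#IN6532618", 1),
    ("12933022$#IN6590132", 1), ("12932349$#IN6636468", 1),
    ("12952077$#203250", 1), ("13012770$#MUML04184", 0),
    ("12945049$#112370", 1), ("13582330$#CI160118", 1),
    ("13012770$#MUML04184", 1),
]
rows = [{"uid": u, "Today Status": s} for u, s in data]


def sort_then_first(rows, key="uid", status="Today Status"):
    """Sort rows by status (descending), then keep the first row per uid."""
    # sorted() is stable (also with reverse=True), so rows with equal
    # status keep their original relative order.
    ordered = sorted(rows, key=lambda r: r[status], reverse=True)
    seen = set()
    out = []
    for row in ordered:
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out


kept = sort_then_first(rows)
print(len(kept))  # 9
```

This returns the rows ordered by status rather than in their original order, which is usually fine here since the question allows any matching row to be kept.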