I'm a beginner on python and I would like to get the first and last value of the column date always that the mac_address be the same, for example:
I've ordered my dataframe by mac_address, date with the next line:
df = df.sort_values(by=['mac_address', 'date'], ascending=(True, True))
And the data are:
router mac_address date
589455 15001391 00:00:34:1a:03:e8 2021-01-01 22:09:34
590067 17091211 00:00:34:1a:03:e8 2021-01-01 22:10:54
590136 17091236 00:00:34:1a:03:e8 2021-01-01 22:11:04
.....
.....
.....
635434 15001391 00:00:78:01:0d:11 2021-01-02 00:14:54
636479 17091211 00:00:78:01:0d:11 2021-01-02 00:16:17
949873 17091172 00:00:af:82:56:93 2021-01-02 11:26:39
950699 17091251 00:00:af:82:56:93 2021-01-02 11:27:59
950700 17091253 00:00:af:82:56:93 2021-01-02 11:28:59
950702 17091257 00:00:af:82:56:93 2021-01-02 11:29:59
950703 17091258 00:00:af:82:56:93 2021-01-02 11:30:59
619384 17091174 00:01:09:d2:09:e0 2021-01-01 23:34:32
365351 17091211 00:01:d2:7c:4e:32 2021-01-01 14:27:58
109858 17091236 00:02:75:86:4e:34 2021-01-01 05:50:47
110281 17091211 00:02:75:86:4e:34 2021-01-01 05:50:54
NOTE: the date column has the format "2021-01-01 05:50:54" and the differents mac address that appears are variable in the number of times
And I would like to have two outputs like this:
first output:
589455 15001391 00:00:34:1a:03:e8 2021-01-01 22:09:34
590136 17091236 00:00:34:1a:03:e8 2021-01-01 22:11:04
635434 15001391 00:00:78:01:0d:11 2021-01-02 00:14:54
636479 17091211 00:00:78:01:0d:11 2021-01-02 00:16:17
.....
.....
949873 17091172 00:00:af:82:56:93 2021-01-02 11:26:39
950703 17091258 00:00:af:82:56:93 2021-01-02 11:30:59
619384 17091174 00:01:09:d2:09:e0 2021-01-01 23:34:32
365351 17091211 00:01:d2:7c:4e:32 2021-01-01 14:27:58
second output: Just considered the data that have a first and the last value, and not be considered the mac_adress that only once appears
589455 15001391 00:00:34:1a:03:e8 22:09:34
590136 17091236 00:00:34:1a:03:e8 22:11:04
635434 15001391 00:00:78:01:0d:11 00:14:54
636479 17091211 00:00:78:01:0d:11 00:16:17
.....
.....
949873 17091172 00:00:af:82:56:93 11:26:39
950703 17091258 00:00:af:82:56:93 11:30:59
I don't know if I'm complicating or this task is easier than I see, but I've passed the last 48 hours without any favorable result. Can you help me please? Thanks a lot
As your data are already sorted by mac address and dates, you don't need to use groupby
.
df1 = df.loc[(df['mac_address'].ne(df['mac_address'].shift())) |
(df['mac_address'].ne(df['mac_address'].shift(-1)))]
First output:
>>> df1
router mac_address date
589455 15001391 00:00:34:1a:03:e8 2021-01-01 22:09:34
590136 17091236 00:00:34:1a:03:e8 2021-01-01 22:11:04
635434 15001391 00:00:78:01:0d:11 2021-01-02 00:14:54
636479 17091211 00:00:78:01:0d:11 2021-01-02 00:16:17
949873 17091172 00:00:af:82:56:93 2021-01-02 11:26:39
950703 17091258 00:00:af:82:56:93 2021-01-02 11:30:59
619384 17091174 00:01:09:d2:09:e0 2021-01-01 23:34:32
365351 17091211 00:01:d2:7c:4e:32 2021-01-01 14:27:58
109858 17091236 00:02:75:86:4e:34 2021-01-01 05:50:47
110281 17091211 00:02:75:86:4e:34 2021-01-01 05:50:54
Second output:
>>> df1.loc[df1.duplicated('mac_address', keep=False)]
router mac_address date
589455 15001391 00:00:34:1a:03:e8 2021-01-01 22:09:34
590136 17091236 00:00:34:1a:03:e8 2021-01-01 22:11:04
635434 15001391 00:00:78:01:0d:11 2021-01-02 00:14:54
636479 17091211 00:00:78:01:0d:11 2021-01-02 00:16:17
949873 17091172 00:00:af:82:56:93 2021-01-02 11:26:39
950703 17091258 00:00:af:82:56:93 2021-01-02 11:30:59
109858 17091236 00:02:75:86:4e:34 2021-01-01 05:50:47
110281 17091211 00:02:75:86:4e:34 2021-01-01 05:50:54