pythondataframeformatmultiple-columnsunique-values

Get the first and last value of a column of dataframe respect another column


I'm a beginner on python and I would like to get the first and last value of the column date always that the mac_address be the same, for example:

I've ordered my dataframe by mac_address, date with the next line:

df = df.sort_values(by=['mac_address', 'date'], ascending=(True, True)) 

And the data are:

         router        mac_address      date
589455  15001391    00:00:34:1a:03:e8   2021-01-01 22:09:34
590067  17091211    00:00:34:1a:03:e8   2021-01-01 22:10:54
590136  17091236    00:00:34:1a:03:e8   2021-01-01 22:11:04
.....
.....
.....
635434  15001391    00:00:78:01:0d:11   2021-01-02 00:14:54
636479  17091211    00:00:78:01:0d:11   2021-01-02 00:16:17
949873  17091172    00:00:af:82:56:93   2021-01-02 11:26:39
950699  17091251    00:00:af:82:56:93   2021-01-02 11:27:59
950700  17091253    00:00:af:82:56:93   2021-01-02 11:28:59
950702  17091257    00:00:af:82:56:93   2021-01-02 11:29:59
950703  17091258    00:00:af:82:56:93   2021-01-02 11:30:59
619384  17091174    00:01:09:d2:09:e0   2021-01-01 23:34:32
365351  17091211    00:01:d2:7c:4e:32   2021-01-01 14:27:58
109858  17091236    00:02:75:86:4e:34   2021-01-01 05:50:47
110281  17091211    00:02:75:86:4e:34   2021-01-01 05:50:54

NOTE: the date column has the format "2021-01-01 05:50:54" and the differents mac address that appears are variable in the number of times

And I would like to have two outputs like this:

first output:

    589455  15001391    00:00:34:1a:03:e8   2021-01-01 22:09:34
    590136  17091236    00:00:34:1a:03:e8   2021-01-01 22:11:04
    635434  15001391    00:00:78:01:0d:11   2021-01-02 00:14:54
    636479  17091211    00:00:78:01:0d:11   2021-01-02 00:16:17
    .....
    .....
    949873  17091172    00:00:af:82:56:93   2021-01-02 11:26:39
    950703  17091258    00:00:af:82:56:93   2021-01-02 11:30:59
    619384  17091174    00:01:09:d2:09:e0   2021-01-01 23:34:32
    365351  17091211    00:01:d2:7c:4e:32   2021-01-01 14:27:58

second output: Just considered the data that have a first and the last value, and not be considered the mac_adress that only once appears

    589455  15001391    00:00:34:1a:03:e8   22:09:34
    590136  17091236    00:00:34:1a:03:e8   22:11:04
    635434  15001391    00:00:78:01:0d:11   00:14:54
    636479  17091211    00:00:78:01:0d:11   00:16:17
    .....
    .....
    949873  17091172    00:00:af:82:56:93   11:26:39
    950703  17091258    00:00:af:82:56:93   11:30:59

I don't know if I'm complicating or this task is easier than I see, but I've passed the last 48 hours without any favorable result. Can you help me please? Thanks a lot


Solution

  • As your data are already sorted by mac address and dates, you don't need to use groupby.

    df1 = df.loc[(df['mac_address'].ne(df['mac_address'].shift())) | 
                 (df['mac_address'].ne(df['mac_address'].shift(-1)))]
    

    First output:

    >>> df1
              router        mac_address                 date
    589455  15001391  00:00:34:1a:03:e8  2021-01-01 22:09:34
    590136  17091236  00:00:34:1a:03:e8  2021-01-01 22:11:04
    635434  15001391  00:00:78:01:0d:11  2021-01-02 00:14:54
    636479  17091211  00:00:78:01:0d:11  2021-01-02 00:16:17
    949873  17091172  00:00:af:82:56:93  2021-01-02 11:26:39
    950703  17091258  00:00:af:82:56:93  2021-01-02 11:30:59
    619384  17091174  00:01:09:d2:09:e0  2021-01-01 23:34:32
    365351  17091211  00:01:d2:7c:4e:32  2021-01-01 14:27:58
    109858  17091236  00:02:75:86:4e:34  2021-01-01 05:50:47
    110281  17091211  00:02:75:86:4e:34  2021-01-01 05:50:54
    

    Second output:

    >>> df1.loc[df1.duplicated('mac_address', keep=False)]
              router        mac_address                 date
    589455  15001391  00:00:34:1a:03:e8  2021-01-01 22:09:34
    590136  17091236  00:00:34:1a:03:e8  2021-01-01 22:11:04
    635434  15001391  00:00:78:01:0d:11  2021-01-02 00:14:54
    636479  17091211  00:00:78:01:0d:11  2021-01-02 00:16:17
    949873  17091172  00:00:af:82:56:93  2021-01-02 11:26:39
    950703  17091258  00:00:af:82:56:93  2021-01-02 11:30:59
    109858  17091236  00:02:75:86:4e:34  2021-01-01 05:50:47
    110281  17091211  00:02:75:86:4e:34  2021-01-01 05:50:54