pythonregexdataframecharactercolumnheader

How do I remove some data frame columns using regular expressions filtering but keep others that contain some of the characters?


I need to remove columns from a pandas data frame with headers containing a specific string pattern (.1). The code I have so far does this but also removes columns with headers containing the pattern 11, which I want to keep:

DX_totals = DX_totals[DX_totals.columns.drop(list(DX_totals.filter(regex='[.|1]{2}')))]

How do I adjust the code to drop only columns with headers containing the pattern .1?

The data are in the format:

Well ID PlantFlow   PlantChrome DXRunTime   ME01    ME02    ME03    ME04    ME05    ME06    ... MJ22.1  MJ23.1  MJ24.1  MJ25.1  MJ26.1  MJ27.1  MJ28.1  MJ29.1  MJ30.1  DX
0   2021-01-01 00:01:00 91668344    5426653 22092   980729  1117150 103164  287075  2747259 1885657 ... -44.115395  -40.537468  0   -31.149002  -61.727837  0   0   -68.037201  -63.994675  22092
1   2021-01-02 00:00:00 92506192    5471052 22332   993835  1131376 0   0   2777229 0   ... -44.074005  -40.616493  0   -32.239822  -61.803848  0   0   -68.023262  -63.993423  22332
2   2021-01-03 00:00:00 93343920    5515476 22572   1006940 1145596 0   0   2807222 0   ... -43.943542  -40.857651  0   -31.181437  -61.927658  0   0   -68.01889   -63.997154  22572

The desired outcome would look like:

Well ID PlantFlow   PlantChrome DXRunTime   ME01    ME02    ME03    ME04    ME05    ME06    ME11  ...
0   2021-01-01 00:01:00 91668344    5426653 22092   980729  1117150 103164  287075  2747259 2748354 ...
1   2021-01-02 00:00:00 92506192    5471052 22332   993835  1131376 0   0   2777229 0   2777350 ...

Thank you!


Solution

  • I created a dummy df with some of your headers:

    df = pd.DataFrame({'ME01': ['dummy0'], 'ME04': ['dummy1'], 'ME05': ['dummy2'], 'MJ22.1': ['dummy3'], 'MJ24.1': ['dummy4']})
    

    Then changed the regex:

    df[df.columns.drop(list(df.filter(regex=r'\.1')))]
    

    output:

    enter image description here