How to get only filename without extension?


Suppose you have the following file paths and want the filename without its extension for each one:

                       relfilepath
0                  20210322636.pdf
12              factuur-f23622.pdf
14                ingram micro.pdf
19    upfront.nl domein - Copy.pdf
21           upfront.nl domein.pdf
Name: relfilepath, dtype: object

I came up with the following. However, for the first item the result becomes a number and is output as '20210322636.0'.

from pathlib import Path


for i, row in dffinalselection.iterrows():
    dffinalselection['xmlfilename'][i] = Path(dffinalselection['relfilepath'][i]).stem
    dffinalselection['xmlfilename'] = dffinalselection['xmlfilename'].astype(str)

This is wrong; it should be '20210322636'.

Please help!


Solution

  • If the column values are always a filename or file path, split each value from the right on '.' with the maxsplit parameter n=1 and take the first element of the result. Only the last extension is removed, so dots earlier in the name are kept.

    >>> df['relfilepath'].str.rsplit('.', n=1).str[0]
    
    0                  20210322636
    12              factuur-f23622
    14                ingram micro
    19    upfront.nl domein - Copy
    21           upfront.nl domein
    Name: relfilepath, dtype: object
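
  • As an alternative sketch that keeps the Path(...).stem idea from the question, the whole column can be built in one step with map instead of assigning row by row inside a loop. The sample frame below is reconstructed from the values shown in the question, and the names df and xmlfilename are only illustrative.

```python
from pathlib import Path

import pandas as pd

# Sample data rebuilt from the question (index labels included)
df = pd.DataFrame(
    {
        "relfilepath": [
            "20210322636.pdf",
            "factuur-f23622.pdf",
            "ingram micro.pdf",
            "upfront.nl domein - Copy.pdf",
            "upfront.nl domein.pdf",
        ]
    },
    index=[0, 12, 14, 19, 21],
)

# Create the new column in a single assignment; Path.stem returns a str
# and strips only the final suffix
df["xmlfilename"] = df["relfilepath"].map(lambda p: Path(p).stem)

stems = df["xmlfilename"].tolist()
print(df["xmlfilename"])
```

    For plain filenames like these, both approaches give the same values. They differ when the value contains directories: Path(...).stem drops the directory part, while str.rsplit keeps it.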