pandas.read_csv(): How to exclude specific separator combinations


I have a csv like:

file:

1;a;3;4
1;2;b;4
1;[a;b];3;4

Loading it with pd.read_csv(file, sep=';')

raises an error:

ParserError: Error tokenizing data. C error: Expected 4 fields in line 3, saw 5

because the ; inside [a;b] is treated as a separator. Is there a way to ignore ; when it appears inside [ ]?

Thanks

p.s. changing the file is impossible due to reasons


Solution

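  • You can use ;(?![^\[]*\]) as a regex separator. The negative lookahead makes it match only semicolons that are not inside brackets. Regex separators are handled by the Python parsing engine, hence engine='python':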

    pd.read_csv(filename, sep=r';(?![^\[]*\])', engine='python')
    

    demo:

    text = '''1;a;3;4
    1;2;b;4
    1;[a;b];3;4
    '''
    
    import io
    import pandas as pd
    
    pd.read_csv(io.StringIO(text), sep=r';(?![^\[]*\])', engine='python')
    

    output (the first row is used as the header because header=None was not passed):

       1      a  3  4
    0  1      2  b  4
    1  1  [a;b]  3  4
    

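To see what the separator does on its own, here is a minimal sketch that applies the same pattern with re.split and then reads the sample data with header=None, so the first row is kept as data. The names SEP, line, and text are only illustrative, and the sketch assumes brackets are never nested and that no line contains an unmatched ].

```python
import io
import re

import pandas as pd

# Match ';' only when the rest of the line does NOT reach a ']' before
# meeting a '[' (that is, the semicolon is not inside a bracket pair).
SEP = r';(?![^\[]*\])'

# The pattern on a single line, outside pandas.
line = '1;[a;b];3;4'
fields = re.split(SEP, line)
print(fields)  # ['1', '[a;b]', '3', '4']

# Same sample data as in the question; header=None keeps row 1 as data.
text = '1;a;3;4\n1;2;b;4\n1;[a;b];3;4\n'
df = pd.read_csv(io.StringIO(text), sep=SEP, engine='python', header=None)
print(df)
```

Because the lookahead only checks for a closing ] ahead without an opening [ in between, a stray ] later in a line would also suppress splitting on earlier semicolons, so it is worth checking the data for such cases.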