
Changing sys.stdin mode


How do I change the mode stdin is opened in? Specifically, we're piping CSV files to a Python script to clean up the data, but with vertical tabs in the data it seems to need to be in universal-newlines mode.

The problem data seems to be some \x0b (vertical tab) characters in the input stream. [Edit: it actually turns out to be lines ending only in \r.]

As printed by Python after opening one of the files with 'rU':

['P', 'B', '', '1 W Avene, #8\x0bMiami Beach, FL 33139']
['S', 'H', '\x0bElberon, NJ 07740', '9 E Avenue\x0bElberon, NJ 07740']
['C', 'W', 'E R A', '2 B 3rd Floor \x0bNew York NY 10023 ']
['D', 'M', '', '1 K Street, NW\x0bWashington, DC 20005']
['E', 'W', '', '5 P C Lane\x0bDenver, CO 80209-3311']

Solution

  • Your problem is that the CSV file you are reading uses CR (\r) newlines exclusively; it has nothing to do with the vertical tabs. Python 2.x opens stdin without universal newline support (so that binary files work correctly), so a file whose lines end only in \r looks like one long line.

    As a workaround, you can split the input on \r yourself, assuming it is small enough to read into memory in one go:

    import csv, sys
    reader = csv.reader(sys.stdin.read().split('\r'))
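On Python 3, a minimal sketch that avoids reading all of stdin into memory is to wrap the byte stream underneath stdin in io.TextIOWrapper with newline='', which splits lines on \r, \n or \r\n but leaves the endings untranslated, as the csv module expects. The UTF-8 encoding and the helper names universal_csv_reader and main are assumptions for illustration, not part of the answer above.

```python
import csv
import io
import sys


def universal_csv_reader(binary_stream, encoding="utf-8"):
    """Return a csv.reader over binary_stream that accepts \r, \n or \r\n endings.

    newline='' enables universal-newline line splitting without translating
    the endings; csv.reader then strips them itself. Vertical tabs (\x0b) are
    not treated as line breaks, so they stay inside their fields.
    The default encoding is an assumption; adjust it to match the input.
    """
    text = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    return csv.reader(text)


def main():
    # sys.stdin.buffer is the raw byte stream underneath sys.stdin (Python 3).
    for row in universal_csv_reader(sys.stdin.buffer):
        print(row)
```

Call main() from the script's `if __name__ == "__main__":` block and pipe a file in, e.g. `python clean.py < data.csv` (clean.py is a hypothetical name).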