I am trying to load data with numpy.loadtxt. The file I'm reading is encoded in cp1252. Is there a way to make numpy read it with the cp1252 encoding?
The following code
import numpy as np
n = 10
myfile = '/path/to/myfile'
mydata = np.loadtxt(myfile, skiprows=n)
gives:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 189: invalid start byte
The file contains metadata (first n rows) followed by a table of floats.
This problem only occurs on Ubuntu (12.04); on Windows the same code works fine. For this reason I suspect the problem is related to the encoding.
Opening the file as follows also works:
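The byte in the error message supports the encoding theory. A quick check, assuming Python 3 (which the `'utf-8' codec` wording suggests): 0xf6 is a normal letter in cp1252 but can never start a valid UTF-8 sequence, and the default encoding Python 3 uses for `open()` typically differs between Linux and Windows.

```python
import locale

# 0xf6 is the byte reported in the UnicodeDecodeError above.
raw = b"\xf6"

# In cp1252 this byte is the letter "ö" ...
print(raw.decode("cp1252"))  # ö

# ... but as UTF-8 it is an invalid start byte.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)

# Default text encoding used by open() when none is given
# (typically UTF-8 on Linux, cp1252 on Western-European Windows).
print(locale.getpreferredencoding(False))
```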
import codecs
data = codecs.open(myfile, encoding='cp1252')
datalines = data.readlines()
However, I'd like to use np.loadtxt to read the data directly into a numpy array.
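Since `np.loadtxt` also accepts a list of strings instead of a filename, lines that were already decoded as cp1252 can be handed to it directly, with the metadata skipped by slicing. A minimal sketch: the throwaway sample file, its contents and `n = 2` are hypothetical stand-ins for `myfile`, and the built-in `open()` (which in Python 3 takes the same `encoding` argument) is used in place of `codecs.open`.

```python
import os
import tempfile

import numpy as np

n = 2  # number of metadata rows in this hypothetical sample (the question uses 10)

# Hypothetical sample file: cp1252 metadata ("ö" is byte 0xf6) followed by floats.
fd, myfile = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write("Site: Köln\nOperator: Jörg\n1.0 2.0\n3.0 4.0\n".encode("cp1252"))

# Decode the whole file as cp1252 into a list of str lines.
with open(myfile, encoding="cp1252") as data:
    datalines = data.readlines()

# Pass only the numeric lines to loadtxt.
mydata = np.loadtxt(datalines[n:])
print(mydata)

os.remove(myfile)
```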
You have to open the file with the appropriate encoding first and then pass the resulting file object to np.loadtxt:
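import numpy as np
import codecs
n = 10
myfile = '/path/to/myfile'
# Decode the file as cp1252; loadtxt then reads already-decoded text
with codecs.open(myfile, encoding='cp1252') as filecp:
    mydata = np.loadtxt(filecp, skiprows=n)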
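On NumPy 1.14 or later, `np.loadtxt` has its own `encoding` parameter, so the file can be read by name without opening it manually. A sketch, again using a hypothetical throwaway sample file in place of `myfile`:

```python
import os
import tempfile

import numpy as np

n = 2  # number of metadata rows in this hypothetical sample file

# Hypothetical sample file: cp1252 metadata followed by a table of floats.
fd, myfile = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write("Site: Köln\nOperator: Jörg\n1.0 2.0\n3.0 4.0\n".encode("cp1252"))

# NumPy >= 1.14: let loadtxt decode the file as cp1252 itself.
mydata = np.loadtxt(myfile, skiprows=n, encoding="cp1252")
print(mydata)

os.remove(myfile)
```

On NumPy versions before 1.14 this parameter does not exist, so passing an already-decoded file object is the way to go there.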