python pandas numpy linear-regression numpy-ndarray

Numpy reshape issue: ValueError: cannot reshape array


I'm trying to implement linear regression on the California housing dataset, and I'm reading data as below:

data = pd.read_csv(r'C:\Users\California_Houses.csv',header=None)
print(data.shape)
output: (20640, 14)

print(data.head())

output:
  0              1           2          3             4   \
0  Median_House_Value  Median_Income  Median_Age  Tot_Rooms  Tot_Bedrooms   
1              452600         8.3252          41        880           129   
2              358500         8.3014          21       7099          1106   
3              352100         7.2574          52       1467           190   
4              341300         5.6431          52       1274           235   

           5           6         7          8                  9   \
0  Population  Households  Latitude  Longitude  Distance_to_coast   
1         322         126     37.88    -122.23   9263.04077285038   
2        2401        1138     37.86    -122.22   10225.7330715424   
3         496         177     37.85    -122.24   8259.08510932293   
4         558         219     37.85    -122.25    7768.0865708364   

                 10                    11                   12  \
0    Distance_to_LA  Distance_to_SanDiego  Distance_to_SanJose   
1    556529.1583418       735501.80698384     67432.5170008434   
2  554279.850068765      733236.884360166     65049.9085739663   
3  554610.717069378       733525.68293736     64867.2898334847   
4  555194.266086292      734095.290744033     65287.1384120522   

                         13  
0  Distance_to_SanFrancisco  
1          21250.2137667799  
2          20880.6003997074  
3          18811.4874496884  
4          18031.0475677266  

Since the column names are coming in as the first row of data, I tried to remove that row as below:

data = data.iloc[1:,:]

Then I tried to convert it to a NumPy ndarray and reshape it:

x = np.array(data.iloc[1:,1:]).reshape(data.shape[0],data.shape[1]-1)

ValueError                                Traceback (most recent call last)
Input In [16], in <cell line: 10>()
      8 print(data.all())
      9 #y = np.array(data.iloc[1:,0]).reshape(data.shape[0],1)
---> 10 x = np.array(data.iloc[1:,1:]).reshape(data.shape[0],data.shape[1]-1)
     11 #y = np.array(data.iloc[1:,0]).reshape(data.shape[0],1)
     12 print(x.shape)

ValueError: cannot reshape array of size 268320 into shape (20641,13)

I'm getting this error. Please help.


Solution

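  • The row slice and the reshape target disagree. data.iloc[1:,1:] skips one row, but the reshape still asks for data.shape[0] rows, so the array holds 268320 = 20640 × 13 values while the target shape (20641,13) needs 268333. Slice all rows instead, so the counts match:

    x = np.array(data.iloc[:,1:]).reshape(data.shape[0],data.shape[1]-1)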
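
As an alternative, here is a minimal sketch that lets pandas parse the first line as the header, so no row has to be dropped by hand and no reshape is needed. It assumes the first line of the CSV holds the column names, as the head() output in the question suggests. The small in-memory CSV and its values are made up for illustration and stand in for California_Houses.csv. The sketch also reproduces the mismatch from the question on that tiny sample.

```python
import io

import numpy as np
import pandas as pd

# Tiny stand-in for California_Houses.csv (illustrative values only).
csv_text = """Median_House_Value,Median_Income,Median_Age
452600,8.3252,41
358500,8.3014,21
352100,7.2574,52
"""

# Approach from the question: header=None keeps the column names as data row 0.
raw = pd.read_csv(io.StringIO(csv_text), header=None)

# Reproduce the mismatch: the slice drops a row, the target shape does not.
sliced = np.array(raw.iloc[1:, 1:])
target = (raw.shape[0], raw.shape[1] - 1)
try:
    sliced.reshape(target)
    reshape_failed = False
except ValueError:
    reshape_failed = True

# Alternative: treat the first line as the header.
data = pd.read_csv(io.StringIO(csv_text), header=0)
y = data.iloc[:, [0]].to_numpy()  # target column, shape (n_rows, 1)
x = data.iloc[:, 1:].to_numpy()   # features, shape (n_rows, n_cols - 1)

print(reshape_failed, x.shape, y.shape)
```

Selecting with a list ([0]) keeps y two-dimensional, and a DataFrame slice is already two-dimensional, so neither array needs an explicit reshape. With the header parsed separately, the columns are also read as numbers rather than strings.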