The following DataFrame contains integer values. I want to reshape it into a new DataFrame with a single column, where each row combines the values from every 3 consecutive rows of all columns of the original DataFrame.
import pandas as pd
data = pd.DataFrame({'column1': [123, 456, 789, 321, 654, 987, 1234, 45678],
                     'column2': [123, 456, 789, 321, 654, 987, 1234, 45678]})
data = data.astype(str)  # convert every value to a string
n = len(data) // 3  # number of complete 3-row groups (not used below)
# Build a new DataFrame: join the values of each 3-row block with spaces
X = pd.DataFrame({
    'vector': [' '.join(data.iloc[i:i+3, :].values.flatten()) for i in range(0, len(data), 3)]
})
Output:
vector
0 123 123 456 456 789 789
1 321 321 654 654 987 987
2 1234 1234 45678 45678
Now this DataFrame contains 'str' values. Is it possible to convert it back to 'int'? I want to pass it to an SVM algorithm as a NumPy array, but that raises an error because of the 'str' objects. I was unable to convert it back to 'int'. Is there an alternative way to do this?
You can attain the same result in a more idiomatic way by applying a concatenating function to every group formed after splitting the dataframe into groups of n=3 consecutive rows. No need to cast to str in the middle:
def concat(x):
    return pd.concat([x.T[c] for c in x.T]).to_list()
new = data.groupby(data.index // 3).apply(concat)
print(new)
gives
0 [123, 123, 456, 456, 789, 789]
1 [321, 321, 654, 654, 987, 987]
2 [1234, 1234, 45678, 45678]
dtype: object
In the resulting dataframe (actually a Series), the value is of the type returned by concat, in my example a list. For other types, convert appropriately, e.g. .to_numpy().
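Since the goal is a numeric NumPy array for an SVM, note that the last group above is shorter than the others (4 values instead of 6), so the lists cannot be stacked into a rectangular array as they are. Here is a minimal sketch of two possible ways around that, assuming the data is still the original integer DataFrame (no astype(str)). The names rows_per_group, X_full and X_padded are my own, and padding with zeros is an assumption that may or may not suit your features:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'column1': [123, 456, 789, 321, 654, 987, 1234, 45678],
                     'column2': [123, 456, 789, 321, 654, 987, 1234, 45678]})

rows_per_group = 3          # hypothetical name: rows combined into one vector
values = data.to_numpy()    # integer array of shape (8, 2)

# Option 1: keep only the complete groups and drop the leftover rows.
n_full = len(values) // rows_per_group
X_full = values[:n_full * rows_per_group].reshape(n_full, -1)

# Option 2: pad the missing rows with zeros so that every group is complete.
# Zero is an arbitrary fill value chosen for illustration.
pad_rows = (-len(values)) % rows_per_group
padded = np.pad(values, ((0, pad_rows), (0, 0)), constant_values=0)
X_padded = padded.reshape(-1, rows_per_group * values.shape[1])

print(X_full)
print(X_padded)
```

Both results are 2-D integer arrays with one row per group, which is the shape that estimators such as scikit-learn's SVC expect for their feature matrix. Whether dropping or padding the incomplete group is appropriate depends on what the rows mean in your data.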