I'm working on an anomaly detection model for CPU data and my current dataframes are structured like this (with 10k+ servers and 168 hours):
Server | Hour1 | Hour2 | Hour3 |
---|---|---|---|
server1 | val1.1 | val1.2 | val1.3 |
server2 | val2.1 | val2.2 | val2.3 |
I need it to be in a structure like this:
Server | Time | Value |
---|---|---|
server1 | 0 | value0 |
server1 | 1 | value1 |
server2 | 0 | value0 |
server2 | 1 | value1 |
The problem is there are 10k+ servers and 168 hourly values for each, so it's taking an eternity to iterate over. Is there a more efficient way to do this transformation?
My current attempt builds a new df with nested for loops like so:

```python
# First pass: repeat each server's row once per hour and stamp the hour index
for index, row in df.iterrows():
    for n in range(0, 167):
        new_df.loc[len(new_df.index)] = row
        new_df.iat[len(new_df.index) - 1, 2] = n

# Second pass: look up each (server, hour) value from the wide frame
for index, row in new_df.iterrows():
    for i, r in df.iterrows():
        new_df_ts = row[2]
        if row[0] == r[0]:
            new_df.iat[index, 3] = df.iat[i, 2 + new_df_ts]
```
Use `wide_to_long`:
```python
import pandas as pd
import io

df = pd.read_csv(io.StringIO("""
Server Hour1 Hour2 Hour3
server1 val1.1 val1.2 val1.3
server2 val2.1 val2.2 val2.3"""), sep=r"\s+")

df = (
    pd.wide_to_long(df, "Hour", "Server", "Time")
      .rename(columns={"Hour": "Value"})
      .reset_index()
)
```
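Here `"Hour"` is the stub name shared by the wide columns, `"Server"` is the identifier column (`i`), and `"Time"` is the new column (`j`) that receives each column's numeric suffix; the stub column holds the values and is renamed to `Value`.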
which results in:

```
    Server  Time   Value
0  server1     1  val1.1
1  server2     1  val2.1
2  server1     2  val1.2
3  server2     2  val2.2
4  server1     3  val1.3
5  server2     3  val2.3
```
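If you also want the rows grouped by server, as in your target layout, a small follow-up sketch (assuming the column names above):

```python
# Order rows by server first, then by hour, to match the target layout
df = df.sort_values(["Server", "Time"], ignore_index=True)
```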
Alternatively, use `melt` and then strip the `Hour` text out of the `Time` column.
```python
# value_name="Value" names the melted value column to match the target
df = df.melt("Server", var_name="Time", value_name="Value")
df["Time"] = df["Time"].str.strip("Hour").astype(int)
```
Then do whatever additional processing is required on the `Time` and `Value` columns, such as subtracting 1 from the time period. Use `df.replace` or `df["Value"].str.replace` to alter values if needed.
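For instance, a minimal sketch of that post-processing (the `str.replace` pattern is illustrative, assuming string values shaped like the placeholders above; real CPU readings would likely already be numeric):

```python
# Shift Time from 1-based to 0-based, as in the target layout
df["Time"] = df["Time"] - 1

# Illustrative only: turn placeholder strings like "val1.1" into floats
df["Value"] = df["Value"].str.replace("val", "", regex=False).astype(float)
```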