I need to read an excel file and loop over rows, retain value for a column Location
, anonymise the other values for the same column, and output the result for all combinations to seperate pdf files
Original DF
| Location | Patients | Deceased | Rank |
|-----------|----------|----------|------|
| Leicester | 2000 | 13 | 1 |
| Coventry | 2200 | 24 | 2 |
| Norwich | 3000 | 56 | 3 |
| Sheffield | 2800 | 65 | 4 |
| Luton | 1800 | 90 | 5 |
Desired DF #1
| Location | Patients | Deceased | Rank |
|-----------|----------|----------|------|
| Leicester | 2000 | 13 | 1 |
| ######## | 2200 | 24 | 2 |
| ######## | 3000 | 56 | 3 |
| ######## | 2800 | 65 | 4 |
| ######## | 1800 | 90 | 5 |
Desired DF #2
| Location | Patients | Deceased | Rank |
|-----------|----------|----------|------|
| ######## | 2000 | 13 | 1 |
| Coventry | 2200 | 24 | 2 |
| ######## | 3000 | 56 | 3 |
| ######## | 2800 | 65 | 4 |
| ######## | 1800 | 90 | 5 |
I want to write out each DF using pd.to_html
and then convert.
I'm finding it hard to figure out how to implement this in pandas or how best use iterrows
and faker
to achieve this or whether a replace()
would do in each iteration even.
The easiest way to do it, it's to play with the index to change one value by one value
import pandas as pd
import pandas as pd
# Create the fake data
data = {
'Location': ['Leicester', 'Coventry', 'Norwich', 'Sheffield', 'Luton'],
'Patients': [2000, 2200, 3000, 2800, 1800],
'Deceased': [13, 24, 56, 65, 90],
'Rank': [1, 2, 3, 4, 5]
}
# Create a DataFrame
df = pd.DataFrame(data)
# Step 2: Function to create a new DataFrame with anonymized Locations except for one row
def anonymize_locations(df, index_to_keep):
# Create a copy of the original DataFrame to avoid modifying it
df_copy = df.copy()
# Replace all Location values with '########' except for the index_to_keep
df_copy['Location'] = df_copy['Location'].apply(lambda x: '########')
df_copy.at[index_to_keep, 'Location'] = df.at[index_to_keep, 'Location']
return df_copy
# Step 3: Loop over each row and generate the desired DataFrames and save to html
for index in df.index:
anonymized_df = anonymize_locations(df, index)
file_name = f"output_{index + 1}.html"
anonymized_df.to_html(file_name)
print(f"Saved {file_name}")