Today I have a challenge at my school, it is convert a CSV file to a JSON file.
This CSV has a table structure (which means contains the information from an oracle table in this example). So i have to convert this CSV file into a hierarchical JSON.
My CSV example contain the next information:
TABLE_NAME,TIPO_CARGA,COLUMN_NAME,DATA_TYPE
CUENTAS,Full load,PERFIL,NVARCHAR2
CUENTAS,Full load,LLAMADA_INI,DATE
CUENTAS,Full load,LLAMADA_END,DATE
CUENTAS_PLAN,Delta,REGISTRO,NVARCHAR2
CUENTAS_PLAN,Delta,ULTIMA_VISITA,DATE
CUENTAS_PLAN,Delta,FECHA,DATE
CUENTAS_TACTICAS,Delta,CIUDAD,NVARCHAR2
CUENTAS_TACTICAS,Delta,SALIDA,NVARCHAR2
CUENTAS_TACTICAS,Delta,COMPLETADO,DATE
CUENTAS_MIEMBROS,Full load,NEGOCIO,NVARCHAR2
CUENTAS_MIEMBROS,Full load,CORE,NVARCHAR2
CUENTAS_MIEMBROS,Full load,DESC,NVARCHAR2
and I start with python using some libraries like pandas and also json, the next code is this:
import pandas as pd
import json
df = pd.read_csv("list_tablas_short.csv")
print(df.head())
thisisjson = df.to_json(orient='records')
print('CSV to JSON:\n', thisisjson)
thisisjson_dict = json.loads(thisisjson)
with open('data.json', 'w') as json_file:
json.dump(thisisjson_dict, json_file)
And the result is:
[
{
"TABLE_NAME;\"COLUMN_NAME\";\"DATA_TYPE\"": "CUENTAS;PERFIL;NVARCHAR2"
},
{
"TABLE_NAME;\"COLUMN_NAME\";\"DATA_TYPE\"": "CUENTAS;LLAMADA_INI;DATE"
},
{...}
]
This is nto the way i'm looking for.
The idea is get something like this:
{
"tablas": [
{
"nombre": "CUENTAS",
"load_type": "Full load"
"fields": [
{
"campo": "PERFIL",
"tipo": "NVARCHAR2"
},
{
"campo": "LLAMADA_INI",
"tipo": "DATE"
},
{
"campo": "LLAMADA_END",
"tipo": "DATE"
}
]
},
{
"nombre": "CUENTAS_PLAN",
"load_type": "Delta"
"fields": [
{
"campo": "REGISTRO",
"tipo": "NVARCHAR2"
},
{
"campo": "ULTIMA_VISITA",
"tipo": "DATE"
},
{
"campo": "FECHA",
"tipo": "DATE"
}
]
}
]
}
Can somebody help me with this?
Using a dictionary comprehension with groupby
and to_dict
:
import json
thisisjson_dict = {'tablas':
[{'nombre': n, 'load_type': l,
'fields': g.to_dict('records')}
for (n, l), g in df.rename(columns={'COLUMN_NAME':'campo',
'DATA_TYPE':'tipo'})
.set_index(['TABLE_NAME', 'TIPO_CARGA'])
.groupby(level=[0, 1])
]
}
out = json.dumps(thisisjson_dict, indent=2)
Variant:
thisisjson_dict = {'tablas':
[{'nombre': n, 'load_type': l,
'fields': g.to_dict('records')}
for (n, l), g in df.rename(columns={'COLUMN_NAME':'campo',
'DATA_TYPE':'tipo'})
.groupby(['TABLE_NAME', 'TIPO_CARGA'])
[['campo', 'tipo']]
]
}
out = json.dumps(thisisjson_dict, indent=2)
Output:
{
"tablas": [
{
"nombre": "CUENTAS",
"load_type": "Full load",
"fields": [
{
"campo": "PERFIL",
"tipo": "NVARCHAR2"
},
{
"campo": "LLAMADA_INI",
"tipo": "DATE"
},
{
"campo": "LLAMADA_END",
"tipo": "DATE"
}
]
},
{
"nombre": "CUENTAS_MIEMBROS",
"load_type": "Full load",
"fields": [
{
"campo": "NEGOCIO",
"tipo": "NVARCHAR2"
},
{
"campo": "CORE",
"tipo": "NVARCHAR2"
},
{
"campo": "DESC",
"tipo": "NVARCHAR2"
}
]
},
{
"nombre": "CUENTAS_PLAN",
"load_type": "Delta",
"fields": [
{
"campo": "REGISTRO",
"tipo": "NVARCHAR2"
},
{
"campo": "ULTIMA_VISITA",
"tipo": "DATE"
},
{
"campo": "FECHA",
"tipo": "DATE"
}
]
},
{
"nombre": "CUENTAS_TACTICAS",
"load_type": "Delta",
"fields": [
{
"campo": "CIUDAD",
"tipo": "NVARCHAR2"
},
{
"campo": "SALIDA",
"tipo": "NVARCHAR2"
},
{
"campo": "COMPLETADO",
"tipo": "DATE"
}
]
}
]
}