I have the following task: there is a plain text file (.txt) with no markup, structured as follows:
City name is Paris
It was build in 303 BD
NORTH PART
Size 56% of city
QUATERS
Quarter name Saint-Ouen
Size Big
Quarter name Saint-Denis
Size Medium
STREETS
Street name Napoleon's Av.
Number of houses 78
Is located in Saint-Ouen
ZipCode 001-020
Street name Republic st.
Number of houses 101
Is located in Saint-Ouen
ZipCode 031-039
CITIZENS
Hello! My name is Peter, i'm 29 years old. I live in Napoleon's Av. in 45 house, flat 5, room 111. I am office worker
Hello! My name is Helen, i'm 23 years old. I live in Napoleon's Av. in 45 house, flat 1, room 90. I am office worker and mother
Hello! My name is Pole, i'm 33 years old. I live in Republic st. in 100 house, flat 10, room 300. I am gardener, artist and freelancer
SOUTH PART
Size 44% of city
~same story~
So I want to deserialize this file into a dict and/or an SQL database for easier data manipulation. Each file has ~5000 lines for one city, and there are ~1000 cities in total.
I've created dataclasses to hold the data:
from dataclasses import dataclass, field
from enum import Enum

@dataclass
class CitizenData:
    name: str
    age: int
    street: str
    house: str  # str because it can be like "45B"
    flat: int
    room: int
    jobs: list[str] = field(default_factory=list)

@dataclass
class StreetData:
    name: str
    houses_located: int
    zip_code_min: int
    zip_code_max: int
    citizens: list[CitizenData] = field(default_factory=list)

class QuaterSize(Enum):
    SMALL = 1
    MEDIUM = 2
    BIG = 3

@dataclass
class QuarterData:
    name: str
    size: QuaterSize
    streets: list[StreetData] = field(default_factory=list)

@dataclass
class CityPartData:
    name: str
    procent_of_city_size: float
    quarters: list[QuarterData] = field(default_factory=list)

@dataclass
class City:
    name: str
    year_of_foundation: int
    city_parts: list[CityPartData] = field(default_factory=list)
And now I'm wondering: is there an elegant and Pythonic way to fill these classes with data from the file? I've tried to search for some kind of "formatted reading" (like f-strings but with data gathering instead of data presenting) and found nothing: only JSON and XML handling by dedicated libraries, and nothing for this type of problem. Slicing the lines by hand looks barbaric, but it is the only kind of solution I have in mind.
I want to keep this program clean and easy to maintain, but I lack the Python and architecture experience to be sure how to do it.
kind of "formatted reading" (like f-strings but with data gathering instead of data presenting)
There is a third-party library called parse, which is described as "parse() is the opposite of format()". str.format is the older way of doing what f-strings are used for today. Using part of your input as an example:
import parse
citizens = "Hello! My name is Peter, i'm 29 years old. I live in Napoleon's Av. in 45 house, flat 5, room 111. I am office worker\nHello! My name is Helen, i'm 23 years old. I live in Napoleon's Av. in 45 house, flat 1, room 90. I am office worker and mother\nHello! My name is Pole, i'm 33 years old. I live in Republic st. in 100 house, flat 10, room 300. I am gardener, artist and freelancer"
for citizen in citizens.splitlines(keepends=False):
    data = parse.parse("Hello! My name is {name}, i'm {age:d} years old. I live in {address}. I am {job}", citizen)
    print(data.named)  # data.named is dict
gives output
{'name': 'Peter', 'age': 29, 'address': "Napoleon's Av. in 45 house, flat 5, room 111", 'job': 'office worker'}
{'name': 'Helen', 'age': 23, 'address': "Napoleon's Av. in 45 house, flat 1, room 90", 'job': 'office worker and mother'}
{'name': 'Pole', 'age': 33, 'address': 'Republic st. in 100 house, flat 10, room 300', 'job': 'gardener, artist and freelancer'}
Note the :d after age, which tells parse to convert the matched text to an integer.
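If adding a dependency is not an option, roughly the same idea can be sketched with the standard library's re module, where named groups play the role of the {name} fields. This is only a sketch under assumptions: the pattern CITIZEN_RE and the helper parse_citizen are hypothetical names, the CitizenData class is copied from the question, and the regex assumes every citizen line follows exactly the wording shown in the sample. It also breaks the address into street, house, flat and room, and splits the jobs on commas and "and".

```python
import re
from dataclasses import dataclass, field


@dataclass
class CitizenData:
    name: str
    age: int
    street: str
    house: str  # kept as str because it can look like "45B"
    flat: int
    room: int
    jobs: list[str] = field(default_factory=list)


# Hypothetical pattern: assumes the exact sentence layout from the sample file.
CITIZEN_RE = re.compile(
    r"Hello! My name is (?P<name>.+?), i'm (?P<age>\d+) years old\. "
    r"I live in (?P<street>.+?) in (?P<house>\w+) house, "
    r"flat (?P<flat>\d+), room (?P<room>\d+)\. "
    r"I am (?P<jobs>.+)"
)

# Splits "a, b and c" (and "a, b, and c") into separate job names.
JOB_SEPARATOR_RE = re.compile(r",\s*(?:and\s+)?|\s+and\s+")


def parse_citizen(line: str) -> CitizenData:
    """Turn one line of the CITIZENS section into a CitizenData instance."""
    match = CITIZEN_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognised citizen line: {line!r}")
    jobs = [job for job in JOB_SEPARATOR_RE.split(match["jobs"]) if job]
    return CitizenData(
        name=match["name"],
        age=int(match["age"]),
        street=match["street"],
        house=match["house"],
        flat=int(match["flat"]),
        room=int(match["room"]),
        jobs=jobs,
    )
```

A line that does not fit the pattern raises ValueError instead of being silently skipped, which may be useful with ~1000 files where a few lines could deviate from the template. A job title that itself contains " and " would be split in two, so that rule may need adjusting for real data.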