python string file-io architecture python-dataclasses

A better way to read formatted data from .TXT file strings in Python


I have the following task: there is a plain text file (.TXT, no markup) with the following structure:

City name is Paris
It was build in 303 BD

 NORTH PART
 Size 56% of city
 QUATERS
 Quarter name Saint-Ouen
 Size Big
 
 Quarter name Saint-Denis
 Size Medium

 STREETS
 Street name  Napoleon's Av.
 Number of houses 78
 Is located in Saint-Ouen
 ZipCode 001-020

 Street name  Republic st.
 Number of houses 101
 Is located in Saint-Ouen
 ZipCode 031-039

 CITIZENS
 Hello! My name is Peter, i'm 29 years old. I live in Napoleon's Av. in 45 house, flat 5, room 111. I am office worker

 Hello! My name is Helen, i'm 23 years old. I live in Napoleon's Av. in 45 house, flat 1, room 90. I am office worker and mother

 Hello! My name is Pole, i'm 33 years old. I live in Republic st. in 100 house, flat 10, room 300. I am gardener, artist and freelancer

 SOUTH PART
 Size 44% of city
  ~same story~

So I want to deserialize this file into a dict and/or an SQL database for easier data manipulation. Each file has ~5000 lines for one city, and there are ~1000 cities in total.

I've created dataclasses for data storage:

from dataclasses import dataclass, field
from enum import Enum


@dataclass
class CitizenData:
   name: str
   age: int
   street: str
   house: str #because it can be like "45B"
   flat: int
   room: int
   jobs: list[str] = field(default_factory=list)


@dataclass
class StreetData:
   name: str
   houses_located: int
   zip_code_min: int
   zip_code_max: int
   citizens: list[CitizenData] = field(default_factory=list)


class QuaterSize(Enum):
   SMALL = 1
   MEDIUM = 2
   BIG = 3


@dataclass
class QuarterData:
   name: str
   size: QuaterSize
   streets: list[StreetData] = field(default_factory=list)


@dataclass
class CityPartData:
   name: str
   procent_of_city_size: float
   quarters: list[QuarterData] = field(default_factory=list)

@dataclass
class City:
   name: str
   year_of_foundation: int
   city_parts: list[CityPartData] = field(default_factory=list)
   

And now I'm wondering: is there an elegant and Pythonic way to fill those classes with data from the file? I've searched for some kind of "formatted reading" (like f-strings, but for gathering data instead of presenting it) and found nothing, only JSON and XML handling by dedicated libraries and nothing for this type of problem. Slicing the string lines looks barbaric, but it is the only kind of solution I have in mind.

I want to make this program clean and easy to maintain, but I lack the Python and architecture experience to be sure how to do it.


Solution

  • some kind of "formatted reading" (like f-strings, but for gathering data instead of presenting it)

    There is a third-party library called parse, which is described as

    parse() is the opposite of format()

    str.format is the older way of doing what f-strings are used for today. Using part of your input as an example:

    import parse
    citizens = "Hello! My name is Peter, i'm 29 years old. I live in Napoleon's Av. in 45 house, flat 5, room 111. I am office worker\nHello! My name is Helen, i'm 23 years old. I live in Napoleon's Av. in 45 house, flat 1, room 90. I am office worker and mother\nHello! My name is Pole, i'm 33 years old. I live in Republic st. in 100 house, flat 10, room 300. I am gardener, artist and freelancer"
    for citizen in citizens.splitlines(keepends=False):
        data = parse.parse("Hello! My name is {name}, i'm {age:d} years old. I live in {address}. I am {job}", citizen)
        print(data.named)  # data.named is dict
    

    gives the output

    {'name': 'Peter', 'age': 29, 'address': "Napoleon's Av. in 45 house, flat 5, room 111", 'job': 'office worker'}
    {'name': 'Helen', 'age': 23, 'address': "Napoleon's Av. in 45 house, flat 1, room 90", 'job': 'office worker and mother'}
    {'name': 'Pole', 'age': 33, 'address': 'Republic st. in 100 house, flat 10, room 300', 'job': 'gardener, artist and freelancer'}
    

    Note the :d after age, which tells parse to convert the matched text to an integer.
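
If adding a dependency is not an option, the same idea can be sketched with the standard library's re module and named groups. This is a swapped-in technique, not the parse library. The pattern and the parse_citizen helper below are hypothetical, written against the three sample citizen lines only, and they assume that jobs are separated by commas or by the word "and":

```python
import re
from dataclasses import dataclass, field


@dataclass
class CitizenData:
    name: str
    age: int
    street: str
    house: str  # kept as str because it can be like "45B"
    flat: int
    room: int
    jobs: list[str] = field(default_factory=list)


# Hypothetical pattern derived from the sample lines; real files may need tweaks.
CITIZEN_RE = re.compile(
    r"Hello! My name is (?P<name>.+?), i'm (?P<age>\d+) years old\. "
    r"I live in (?P<street>.+?) in (?P<house>\w+) house, "
    r"flat (?P<flat>\d+), room (?P<room>\d+)\. "
    r"I am (?P<jobs>.+)"
)


def parse_citizen(line: str) -> CitizenData:
    """Turn one 'Hello! My name is ...' line into a CitizenData instance."""
    match = CITIZEN_RE.fullmatch(line.strip())
    if match is None:
        # Fail loudly so malformed lines are noticed instead of silently skipped.
        raise ValueError(f"unrecognised citizen line: {line!r}")
    # Assumption: jobs are separated by commas or by the word "and".
    jobs = re.split(r",\s*|\s+and\s+", match["jobs"])
    return CitizenData(
        name=match["name"],
        age=int(match["age"]),
        street=match["street"],
        house=match["house"],
        flat=int(match["flat"]),
        room=int(match["room"]),
        jobs=jobs,
    )


if __name__ == "__main__":
    sample = (
        "Hello! My name is Pole, i'm 33 years old. I live in Republic st. "
        "in 100 house, flat 10, room 300. I am gardener, artist and freelancer"
    )
    print(parse_citizen(sample))
```

Raising on a non-matching line is a deliberate choice here: with ~1000 files it is probably safer to find out early which lines deviate from the expected wording. A job title that itself contains " and " would be split incorrectly under the stated assumption.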