
Python - creating multiple objects for a class


In Python, I need to create 43 instances of a class 'Student' with the attributes first_name, middle_name, last_name, and student_id by reading and parsing a file (Students.txt). The text file looks like this:

Last Name  Midle Name  First Name   Student ID  
----------------------------------------------
Howard                  Moe         howar1m     
Howard                  Curly       howar1c     
Fine                    Lary        fine1l      
Howard                  Shemp       howar1s     
Besser                  Joe         besse1j     
DeRita      Joe         Curly       derit1cj    
Tiure       Desilijic   Jaba        tiure1jd    
Tharen                  Bria        thare1b     
Tai         Besadii     Durga       tai1db      
Hego                    Damask      hego1d      
Lannister               Tyrion      lanni1t     
Stark                   Arya        stark1a     
Clegane                 Sandor      clega1s     
Targaryen               Daenerys    targa1d     
Bombadil                Tom         bomba1t     
Brandybuck              Meriadoc    brand1m     
Took                    Pregrin     took1p      
McCoy                   Leonard     mccoy1l     
Scott                   Montgomery  scott1m     
Crusher                 Wesley      crush1w     
Montoya                 Inigo       monto1i     
Rugen                   Tyrone      rugen1t     
Solo                    Han         solo1h      
Corey                   Carl        corey1c     
Flaumel                 Evelyn      flaum1e     
Taltos                  Vlad        talto1v     
e'Drien                 Morrolan    edrie1m     
Watson                  John        watso1j     
McCoy                   Ebenezar    mccoy1e     
Carpenter               Molly       carpe1m     
Graystone               Zoe         grays1z
Adama                   William     adama1w
Adama       Joseph      Leland      adama1l
Roslin                  Laura       rosli1l
Baltar                  Gaius       balta1g
Tigh                    Ellen       tigh1e
Tigh                    Saul        tigh1s
Cottle                  Sherman     cottl1s
Zarek                   Thomas      zarek1t
Murphy      James       Alexander   murph1a
Sobchak                 Walter      sobch1w
Dane                    Alexander   dane1a
Gruber                  Hans        grube1h
Biggs       John        Gil         biggs1gj

The Student class is:

class Student(object):
    def __init__(self, first_name, middle_name, last_name, student_id):
        self.__first_name = first_name
        self.__middle_name = middle_name
        self.__last_name = last_name
        self.__student_id = student_id

What would be the easiest way to read 'Students.txt' and create each instance of Student?


Solution

  • Step by step tutorial

    To read the file content, use io.open (in Python 3, the built-in open is the same function). Don't forget to specify the file encoding if any name contains accented characters.

    import io

    with io.open('Students.txt', mode="r", encoding="utf8") as fd:
        content = fd.read()
    

    Here, the whole content is read and stored in memory, which is fine because the amount of data is small. You can also iterate over the file line by line.
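
    As a minimal sketch of the line-by-line alternative (assuming Python 3): a file object is itself an iterator over its lines. The temporary file and its two sample rows are stand-ins for Students.txt, not part of the original data handling.

```python
import io
import os
import tempfile

# Hypothetical two-row sample standing in for Students.txt.
rows = [("Howard", "", "Moe", "howar1m"), ("Fine", "", "Lary", "fine1l")]
sample = "".join("".join(v.ljust(12) for v in row) + "\n" for row in rows)

path = os.path.join(tempfile.mkdtemp(), "Students.txt")
with io.open(path, mode="w", encoding="utf8") as fd:
    fd.write(sample)

# Iterating over the file yields one line at a time, so the whole
# content never has to be held in memory as a single string.
lines = []
with io.open(path, mode="r", encoding="utf8") as fd:
    for line in fd:
        lines.append(line.rstrip("\n"))  # drop only the line terminator

print(len(lines))
```

    For a file this small, both approaches behave the same; iterating mainly pays off for large inputs.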

    Then, you can split the content line by line with str.splitlines():

    lines = content.splitlines()
    # print(lines)
    

    You get something like:

    ['Last Name  Midle Name  First Name   Student ID  ',
     '----------------------------------------------',
     'Howard                  Moe         howar1m     ',
     'Howard                  Curly       howar1c     ',
     'Fine                    Lary        fine1l      ',
     'Howard                  Shemp       howar1s     ',
     'Besser                  Joe         besse1j     ',
     'DeRita      Joe         Curly       derit1cj    ',
     'Tiure       Desilijic   Jaba        tiure1jd    ',
     'Tharen                  Bria        thare1b     ']
    

    You have (nearly) fixed-length lines, so you can use slices to extract the fields.

    Here is what you can do for the header:

    header = lines.pop(0)
    fields = header[0:9], header[11:21], header[23:33], header[36:46]
    # print(fields)
    

    You get:

    ('Last Name', 'Midle Name', 'First Name', 'Student ID')
    

    You can drop the line of hyphens:

    lines.pop(0)
    

    For each line, you can extract values using slices too. Note that the slice indices differ slightly from the header's: each data column is 12 characters wide, which keeps longer names such as Brandybuck or Montgomery intact:

    for line in lines:
        record = line[0:12], line[12:24], line[24:36], line[36:48]
        # print(record)
    

    You'll get values with trailing spaces:

    ('Howard      ', '            ', 'Moe         ', 'howar1m     ')
    ('Howard      ', '            ', 'Curly       ', 'howar1c     ')
    ('Fine        ', '            ', 'Lary        ', 'fine1l      ')
    ('Howard      ', '            ', 'Shemp       ', 'howar1s     ')
    ('Besser      ', '            ', 'Joe         ', 'besse1j     ')
    ('DeRita      ', 'Joe         ', 'Curly       ', 'derit1cj    ')
    ('Tiure       ', 'Desilijic   ', 'Jaba        ', 'tiure1jd    ')
    ('Tharen      ', '            ', 'Bria        ', 'thare1b     ')
    

    To remove the surrounding whitespace, use the str.strip() method:

    for line in lines:
        record = line[0:12], line[12:24], line[24:36], line[36:48]
        record = [v.strip() for v in record]
        # print(record)
    

    You get:

    ['Howard', '', 'Moe', 'howar1m']
    ['Howard', '', 'Curly', 'howar1c']
    ['Fine', '', 'Lary', 'fine1l']
    ['Howard', '', 'Shemp', 'howar1s']
    ['Besser', '', 'Joe', 'besse1j']
    ['DeRita', 'Joe', 'Curly', 'derit1cj']
    ['Tiure', 'Desilijic', 'Jaba', 'tiure1jd']
    ['Tharen', '', 'Bria', 'thare1b']
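
    The slicing and stripping can also be wrapped in a small helper. This is only a sketch: the function name split_fixed_width and its width and count parameters are made up for illustration, and it assumes every data column is 12 characters wide, as in the sample file.

```python
def split_fixed_width(line, width=12, count=4):
    """Cut `line` into `count` columns of `width` characters and strip each one."""
    return [line[i * width:(i + 1) * width].strip() for i in range(count)]


# Sample rows rebuilt with ljust so the column widths are explicit.
full_row = "".join(v.ljust(12) for v in ("DeRita", "Joe", "Curly", "derit1cj"))
# A row without trailing padding, like the last lines of the file.
short_row = "Graystone".ljust(24) + "Zoe".ljust(12) + "grays1z"

print(split_fixed_width(full_row))
print(split_fixed_width(short_row))
```

    Slicing past the end of a string simply returns what is there, so rows without trailing padding need no special handling.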
    

    At this point, I recommend storing each record as a dict in a list, using the header fields as keys:

    records = []
    for line in lines:
        record = line[0:12], line[12:24], line[24:36], line[36:48]
        record = [v.strip() for v in record]
        records.append(dict(zip(fields, record)))
    

    You get:

    [{'First Name': 'Moe', 'Last Name': 'Howard', 'Midle Name': '', 'Student ID': 'howar1m'},
     {'First Name': 'Curly', 'Last Name': 'Howard', 'Midle Name': '', 'Student ID': 'howar1c'},
     {'First Name': 'Lary', 'Last Name': 'Fine', 'Midle Name': '', 'Student ID': 'fine1l'},
     {'First Name': 'Shemp', 'Last Name': 'Howard', 'Midle Name': '', 'Student ID': 'howar1s'},
     {'First Name': 'Joe', 'Last Name': 'Besser', 'Midle Name': '', 'Student ID': 'besse1j'},
     {'First Name': 'Curly', 'Last Name': 'DeRita', 'Midle Name': 'Joe', 'Student ID': 'derit1cj'},
     {'First Name': 'Jaba', 'Last Name': 'Tiure', 'Midle Name': 'Desilijic', 'Student ID': 'tiure1jd'},
     {'First Name': 'Bria', 'Last Name': 'Tharen', 'Midle Name': '', 'Student ID': 'thare1b'}]
    

    But you can also use a class:

    class Student(object):
        def __init__(self, first_name, middle_name, last_name, student_id):
            self.first_name = first_name
            self.middle_name = middle_name
            self.last_name = last_name
            self.student_id = student_id
        
        def __repr__(self):
            fmt = "<Student('{first_name}', '{middle_name}', '{last_name}', '{student_id}')>"
            return fmt.format(first_name=self.first_name, middle_name=self.middle_name, last_name=self.last_name, student_id=self.student_id)
            
    

    And construct a list of students. The file lists the last name first, whereas Student expects the first name first, so pass the values in the constructor's order:

    students = []
    for line in lines:
        record = line[0:12], line[12:24], line[24:36], line[36:48]
        last_name, middle_name, first_name, student_id = [v.strip() for v in record]
        students.append(Student(first_name, middle_name, last_name, student_id))
    

    You get:

    [<Student('Moe', '', 'Howard', 'howar1m')>,
     <Student('Curly', '', 'Howard', 'howar1c')>,
     <Student('Lary', '', 'Fine', 'fine1l')>,
     <Student('Shemp', '', 'Howard', 'howar1s')>,
     <Student('Joe', '', 'Besser', 'besse1j')>,
     <Student('Curly', 'Joe', 'DeRita', 'derit1cj')>,
     <Student('Jaba', 'Desilijic', 'Tiure', 'tiure1jd')>,
     <Student('Bria', '', 'Tharen', 'thare1b')>]
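
    Putting the steps together, here is one possible end-to-end sketch (assuming Python 3). The function name load_students, its width parameter, and the two-row temporary file are illustrative assumptions; the real Students.txt would be passed in instead. It assumes the file starts with a header line and a line of hyphens, and that each data column is 12 characters wide.

```python
import io
import os
import tempfile


class Student(object):
    def __init__(self, first_name, middle_name, last_name, student_id):
        self.first_name = first_name
        self.middle_name = middle_name
        self.last_name = last_name
        self.student_id = student_id


def load_students(path, width=12):
    """Read a fixed-width student file and return a list of Student objects."""
    students = []
    with io.open(path, mode="r", encoding="utf8") as fd:
        next(fd, None)  # skip the header line
        next(fd, None)  # skip the line of hyphens
        for line in fd:
            if not line.strip():
                continue  # ignore blank lines
            # File order is: last name, middle name, first name, student ID.
            last, middle, first, sid = [
                line[i:i + width].strip() for i in range(0, 4 * width, width)
            ]
            students.append(Student(first, middle, last, sid))
    return students


# Hypothetical sample file with the same layout as Students.txt.
rows = [("Howard", "", "Moe", "howar1m"), ("DeRita", "Joe", "Curly", "derit1cj")]
text = "Last Name  Midle Name  First Name   Student ID  \n" + "-" * 46 + "\n"
text += "".join("".join(v.ljust(12) for v in row) + "\n" for row in rows)

path = os.path.join(tempfile.mkdtemp(), "Students.txt")
with io.open(path, mode="w", encoding="utf8") as fd:
    fd.write(text)

students = load_students(path)
print(len(students))
```

    With the real file, students = load_students('Students.txt') would return one Student per data line.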