In python I need to create 43 instances of a class 'Student' that includes the variables first_name, middle_name, last_name, student_id by reading in a file (Students.txt) and parsing it. The text file appears like this:
Last Name Midle Name First Name Student ID
----------------------------------------------
Howard Moe howar1m
Howard Curly howar1c
Fine Lary fine1l
Howard Shemp howar1s
Besser Joe besse1j
DeRita Joe Curly derit1cj
Tiure Desilijic Jaba tiure1jd
Tharen Bria thare1b
Tai Besadii Durga tai1db
Hego Damask hego1d
Lannister Tyrion lanni1t
Stark Arya stark1a
Clegane Sandor clega1s
Targaryen Daenerys targa1d
Bombadil Tom bomba1t
Brandybuck Meriadoc brand1m
Took Pregrin took1p
McCoy Leonard mccoy1l
Scott Montgomery scott1m
Crusher Wesley crush1w
Montoya Inigo monto1i
Rugen Tyrone rugen1t
Solo Han solo1h
Corey Carl corey1c
Flaumel Evelyn flaum1e
Taltos Vlad talto1v
e'Drien Morrolan edrie1m
Watson John watso1j
McCoy Ebenezar mccoy1e
Carpenter Molly carpe1m
Graystone Zoe grays1z
Adama William adama1w
Adama Joseph Leland adama1l
Roslin Laura rosli1l
Baltar Gaius balta1g
Tigh Ellen tigh1e
Tigh Saul tigh1s
Cottle Sherman cottl1s
Zarek Thomas zarek1t
Murphy James Alexander murph1a
Sobchak Walter sobch1w
Dane Alexander dane1a
Gruber Hans grube1h
Biggs John Gil biggs1gj
The class student is:
class Student (object):
def __init__(self, first_name, middle_name, last_name, student_id):
self.__first_name = first_name
self.__middle_name = middle_name
self.__last_name = last_name
self.__student_id = student_id
What would be the easiest way to read into 'Students.txt' and create each instance of student?
To read the file content, use io.open
. Don't forget to specify the file encoding if any name has accentuated characters.
with io.open('students.txt', mode="r", encoding="utf8") as fd:
content = fd.read()
Here, you read the whole content and store it in memory (amount of data is small). You can also use an iterator.
Then, you can split the content line by line with str.splitlines()
:
lines = content.splitlines()
# print(lines)
You get something like:
['Last Name Midle Name First Name Student ID ',
'----------------------------------------------',
'Howard Moe howar1m ',
'Howard Curly howar1c ',
'Fine Lary fine1l ',
'Howard Shemp howar1s ',
'Besser Joe besse1j ',
'DeRita Joe Curly derit1cj ',
'Tiure Desilijic Jaba tiure1jd ',
'Tharen Bria thare1b ']
You have (nearly) fixed-length lines, so you can use slices to extract the fields.
Here is what you can do for the header:
header = lines.pop(0)
fields = header[0:8], header[11:21], header[23:33], header[36:46]
# print(fields)
You get:
('Last Nam', 'Midle Name', 'First Name', 'Student ID')
You can drop the line of hyphens:
lines.pop(0)
For each line, you can extract values using slices too. Note: slice indices are slightly different:
for line in lines:
record = line[0:8], line[12:21], line[23:34], line[36:46]
# print(record)
You'll get values with trailing space:
('Howard ', ' ', ' Moe ', 'howar1m ')
('Howard ', ' ', ' Curly ', 'howar1c ')
('Fine ', ' ', ' Lary ', 'fine1l ')
('Howard ', ' ', ' Shemp ', 'howar1s ')
('Besser ', ' ', ' Joe ', 'besse1j ')
('DeRita ', 'Joe ', ' Curly ', 'derit1cj ')
('Tiure ', 'Desilijic', ' Jaba ', 'tiure1jd ')
('Tharen ', ' ', ' Bria ', 'thare1b ')
To avoid trailing spaces, use str.strip()
function:
for line in lines:
record = line[0:8], line[12:21], line[23:34], line[36:46]
record = [v.strip() for v in record]
# print(record)
You get:
['Howard', '', 'Moe', 'howar1m']
['Howard', '', 'Curly', 'howar1c']
['Fine', '', 'Lary', 'fine1l']
['Howard', '', 'Shemp', 'howar1s']
['Besser', '', 'Joe', 'besse1j']
['DeRita', 'Joe', 'Curly', 'derit1cj']
['Tiure', 'Desilijic', 'Jaba', 'tiure1jd']
['Tharen', '', 'Bria', 'thare1b']
At this point, I recommend you to store your record as a dict
in a list:
records = []
for line in lines:
record = line[0:8], line[12:21], line[23:34], line[36:46]
record = [v.strip() for v in record]
records.append(dict(zip(header, record)))
You get:
[{'First Name': 'Moe', 'Last Nam': 'Howard', 'Midle Name': '', 'Student ID': 'howar1m'},
{'First Name': 'Curly', 'Last Nam': 'Howard', 'Midle Name': '', 'Student ID': 'howar1c'},
{'First Name': 'Lary', 'Last Nam': 'Fine', 'Midle Name': '', 'Student ID': 'fine1l'},
{'First Name': 'Shemp', 'Last Nam': 'Howard', 'Midle Name': '', 'Student ID': 'howar1s'},
{'First Name': 'Joe', 'Last Nam': 'Besser', 'Midle Name': '', 'Student ID': 'besse1j'},
{'First Name': 'Curly', 'Last Nam': 'DeRita', 'Midle Name': 'Joe', 'Student ID': 'derit1cj'},
{'First Name': 'Jaba', 'Last Nam': 'Tiure', 'Midle Name': 'Desilijic', 'Student ID': 'tiure1jd'},
{'First Name': 'Bria', 'Last Nam': 'Tharen', 'Midle Name': '', 'Student ID': 'thare1b'}]
But you can also use a class:
class Student(object):
def __init__(self, first_name, middle_name, last_name, student_id):
self.first_name = first_name
self.middle_name = middle_name
self.last_name = last_name
self.student_id = student_id
def __repr__(self):
fmt = "<Student('{first_name}', '{middle_name}', '{last_name}', '{student_id}')>"
return fmt.format(first_name=self.first_name, middle_name=self.middle_name, last_name=self.last_name, student_id=self.student_id)
And construct a list of students:
students = []
for line in lines:
record = line[0:8], line[12:21], line[23:34], line[36:46]
record = [v.strip() for v in record]
students.append(Student(*record))
You get:
[<Student('Howard', '', 'Moe', 'howar1m')>,
<Student('Howard', '', 'Curly', 'howar1c')>,
<Student('Fine', '', 'Lary', 'fine1l')>,
<Student('Howard', '', 'Shemp', 'howar1s')>,
<Student('Besser', '', 'Joe', 'besse1j')>,
<Student('DeRita', 'Joe', 'Curly', 'derit1cj')>,
<Student('Tiure', 'Desilijic', 'Jaba', 'tiure1jd')>,
<Student('Tharen', '', 'Bria', 'thare1b')>]