
Python working with complex data structure - random access

Looking for advice on the best approach.

I'm working with a text file that is colon delimited, with 4 columns:


This file maps a user to a printer for a specific type of document.

I need to read and potentially manipulate this file.

When reading the file I want to be able to answer different questions:

  1. List all printers for a specific user
  2. List all users that use a specific printer
  3. List all users that have a specific document
  4. Does "this" user exist in the file with "this" printer and "this" document?

So the access is somewhat random, i.e. there is no single query pattern.

My current attempt is with nested dictionaries:

mydict[user][printer] = [list of documents]
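
A minimal sketch of how that nested layout might be built and queried. The sample lines are made up, and the column order user:company:doctype:printer is an assumption. It shows why the structure feels awkward: the first question is a direct lookup, but the second has to scan every user.

```python
from collections import defaultdict

# Hypothetical sample lines; the column order user:company:doctype:printer is assumed.
SAMPLE = """\
user1:acme:INVOICE:printer1
user1:acme:PURCHASE:printer3
user2:acme:INVOICE:printer1
"""


def build_nested(lines):
    """Return mydict[user][printer] -> list of document types."""
    mydict = defaultdict(lambda: defaultdict(list))
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, _company, doctype, printer = line.split(":")
        mydict[user][printer].append(doctype)
    return mydict


mydict = build_nested(SAMPLE.splitlines())

# Question 1 (printers for a user) is a direct lookup on the outer key.
printers_for_user1 = sorted(mydict["user1"])

# Question 2 (users of a printer) has to walk over every user.
users_of_printer1 = sorted(
    user for user, printers in mydict.items() if "printer1" in printers
)
```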

I'm looking for a cleaner way to do this.

My current thinking is to use a dataclass and create an instance for every record. But how do I efficiently query these as per my examples above?
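
One possible shape for the dataclass idea, as a rough sketch rather than a recommendation: keep one frozen record per line and maintain a small lookup dict per column, so each of the four questions avoids a full scan. The names `Record` and `RecordIndex`, the sample lines, and the column order user:company:doctype:printer are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical sample lines; the column order user:company:doctype:printer is assumed.
SAMPLE = """\
user1:acme:INVOICE:printer1
user1:acme:PURCHASE:printer3
user2:acme:PURCHASE:printer1
"""


@dataclass(frozen=True)  # frozen makes records hashable, so they can live in sets
class Record:
    user: str
    company: str
    doctype: str
    printer: str


class RecordIndex:
    """Holds all records plus one lookup dict per queried column."""

    def __init__(self, records=()):
        self.by_user = defaultdict(set)
        self.by_printer = defaultdict(set)
        self.by_doctype = defaultdict(set)
        for record in records:
            self.add(record)

    def add(self, record):
        self.by_user[record.user].add(record)
        self.by_printer[record.printer].add(record)
        self.by_doctype[record.doctype].add(record)

    def printers_for(self, user):
        return {r.printer for r in self.by_user.get(user, ())}

    def users_of(self, printer):
        return {r.user for r in self.by_printer.get(printer, ())}

    def users_with(self, doctype):
        return {r.user for r in self.by_doctype.get(doctype, ())}

    def exists(self, user, printer, doctype):
        # Only this user's records are scanned, not the whole file.
        return any(
            r.printer == printer and r.doctype == doctype
            for r in self.by_user.get(user, ())
        )


def parse(lines):
    """Yield one Record per non-empty colon-delimited line."""
    for line in lines:
        line = line.strip()
        if line:
            yield Record(*line.split(":"))


idx = RecordIndex(parse(SAMPLE.splitlines()))
```

With this in place, `idx.printers_for("user1")` and `idx.exists("user1", "printer3", "PURCHASE")` answer the first and last questions. The trade-off is that every edit to the data has to keep all three dicts in sync.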

Thanks for reading, hope you can guide me.


  • pandas is made for such analyses.

    import pandas as pd  # pip install pandas
    # sep=":" because the file is colon-delimited (read_csv defaults to commas)
    df = pd.read_csv("path_to_your_file.txt", sep=":",
                     names=['User', 'Company', 'Doctype', 'Printer'])
    1. List all printers for a specific user
    >>> df[df.User == "user1"].Printer
    0    printer1
    1    printer2
    2    printer3
    3    printer4
    Name: Printer, dtype: object
    2. List all users that use a specific printer
    >>> df[df.Printer == "printer1"].User
    0    user1
    7    user2
    Name: User, dtype: object
    3. List all users that have a specific document
    >>> df[df.Doctype == "PURCHASE"].User
    2     user1
    6     user2
    10    user3
    Name: User, dtype: object
    4. Does "this" user exist in the file with "this" printer and "this" document? (In this case: Nope.)
    >>> df[(df.User == "user1") & (df.Doctype == "PURCHASE") & (df.Printer == "printer2")]
    Empty DataFrame
    Columns: [User, Company, Doctype, Printer]
    Index: []

    (Note the obligatory(!) parentheses around each condition, and the use of & rather than and, in the last example. The parentheses are needed because & binds more tightly than ==. Getting either wrong is a major source of errors for pandas beginners.)
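
To make that last point concrete, here is a small self-contained sketch. The in-memory sample data is made up, and it is read with `sep=":"` on the assumption that the real file is colon-delimited. It runs the existence check with `&`, shows that plain `and` raises an error, and uses `DataFrame.query` as an alternative spelling that does accept `and`.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real file.
SAMPLE = """\
user1:acme:INVOICE:printer1
user1:acme:PURCHASE:printer3
user2:acme:PURCHASE:printer1
"""

df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=":",
    names=["User", "Company", "Doctype", "Printer"],
)

# Element-wise AND: each comparison is parenthesised and joined with &.
mask = (df.User == "user1") & (df.Doctype == "PURCHASE") & (df.Printer == "printer3")
exists = bool(mask.any())

# Python's `and` asks for the truth value of a whole Series, which pandas refuses.
try:
    (df.User == "user1") and (df.Doctype == "PURCHASE")
    and_error = None
except ValueError as exc:
    and_error = type(exc).__name__

# DataFrame.query takes an expression string in which `and` is allowed.
via_query = df.query(
    'User == "user1" and Doctype == "PURCHASE" and Printer == "printer3"'
)
```

Here `exists` gives a plain yes/no answer to the fourth question, which is often handier than inspecting an empty DataFrame.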