pythonalgorithmtreedirectory-traversal

python tree directories unique names, tree algorithms


I'm working with terrible system which has implemented directory tree structure and support import of this structure in a "strange way", because import support only format child;parent. Import constraints:
- child items must have unique name
- if parent item is duplicated, child gets imported under first parent found

assume having following structure of directories to be created

root
|-- A
|   |-- aa
|   |   |-- cc
|   |   `-- dd
|   `-- bb
|       `-- ee
`-- B
    |-- aa
    |   |-- cc
    |   `-- dd
    `-- bb
        `-- FF

How would default import look, which does not work or produce wrong representation:

child;parent
root
A;root
aa;A
cc;aa
dd;aa
bb;A
ee;bb
B;root
aa;B   <-- duplicated child item does not work
cc;aa  <-- duplicated entry - system saves it under A instead of under B
dd;aa  <-- duplicated entry - system saves it under A instead of under B
bb;B
FF;bb <-- system saves it under A instead of under B

Wrong representation

root
|-- A
|   |-- aa
|   |   |-- cc
|   |   `-- dd
|   `-- bb
|       |-- FF
|       `-- ee
`-- B
    `-- aa
        |-- cc
        `-- dd

To tackle this problem I decided to rename every folder with unique string =id + additional changes (shorter name, etc to fit system requirements) and imported it into a system, then removed =id via database.
then import pairs looks like:

child;parent
root;
A==1;root=0
aa=2;A=1
cc=3;aa=2
dd=4;aa=2
bb=5;A=1
ee=7;bb=3
B=8;root=0
aa=9;B=8   
cc=10;aa=9 
dd=11;aa=9  
bb=12;B=8
FF=13;bb=12

And the structure is as desired

root=0
|-- A=1
|   |-- aa=2
|   |   |-- cc=3
|   |   `-- dd=4
|   `-- bb=5
|       `-- ee=7
`-- B=8
    |-- aa=9
    |   |-- cc=10
    |   `-- dd=11
    `-- bb=12
        `-- FF=13

However I need to work with need to work with original structure without renaming it.
I had an idea that I could keep the structure just in memory using tree data structure, but I got stuck in implementation.

I wanted to use os.walk('root') and treelib but I need help implementing this.

All tips highly appreciated. Thank you


Solution

  • I found the way how to have references for original and renamed directories

    I used the treelib because every node can contain tag, identifier, data, and extended treelib.Tree by method create_from_path.
    for this path: root/A/bb/cc/dd

    Then split the given path root/A/bb/cc/dd into portions and visiting every node + updating if already exists: using this template:
    treelib.Node(tag=unique_name, identifier=full_path, data=base_name)

    root -> Node(root=1, root, root)
    root/A -> Node(A=2, root/A, A)
    root/A/bb -> Node(bb=3, root/A/bb, bb)
    root/A/bb/cc-> Node(cc=4, root/A/bb/cc, cc)
    root/A/bb/cc/dd -> Node(dd=5, root/A/bb/cc/dd, dd)

    Then I could traverse through the tree and build desired paths like: root=1/A=2/bb=3/cc=4/dd=5