I'm working with terrible system which has implemented directory tree structure and support import of this structure in a "strange way", because
import support only format child;parent
.
Import constraints:
- child items must have unique name
- if parent item is duplicated, child gets imported under first parent found
assume having following structure of directories to be created
root
|-- A
| |-- aa
| | |-- cc
| | `-- dd
| `-- bb
| `-- ee
`-- B
|-- aa
| |-- cc
| `-- dd
`-- bb
`-- FF
How would default import look, which does not work or produce wrong representation:
child;parent
root
A;root
aa;A
cc;aa
dd;aa
bb;A
ee;bb
B;root
aa;B <-- duplicated child item does not work
cc;aa <-- duplicated entry - system saves it under A instead of under B
dd;aa <-- duplicated entry - system saves it under A instead of under B
bb;B
FF;bb <-- system saves it under A instead of under B
Wrong representation
root
|-- A
| |-- aa
| | |-- cc
| | `-- dd
| `-- bb
| |-- FF
| `-- ee
`-- B
`-- aa
|-- cc
`-- dd
To tackle this problem I decided to rename every folder with unique string =id
+ additional changes (shorter name, etc to fit system requirements)
and imported it into a system, then removed =id
via database.
then import pairs looks like:
child;parent
root;
A==1;root=0
aa=2;A=1
cc=3;aa=2
dd=4;aa=2
bb=5;A=1
ee=7;bb=3
B=8;root=0
aa=9;B=8
cc=10;aa=9
dd=11;aa=9
bb=12;B=8
FF=13;bb=12
And the structure is as desired
root=0
|-- A=1
| |-- aa=2
| | |-- cc=3
| | `-- dd=4
| `-- bb=5
| `-- ee=7
`-- B=8
|-- aa=9
| |-- cc=10
| `-- dd=11
`-- bb=12
`-- FF=13
However I need to work with need to work with original structure without renaming it.
I had an idea that I could keep the structure just in memory using tree data structure, but I got stuck in implementation.
I wanted to use os.walk('root')
and treelib
but I need help implementing this.
All tips highly appreciated. Thank you
I found the way how to have references for original and renamed directories
I used the treelib
because every node
can contain tag, identifier, data
,
and extended treelib.Tree
by method create_from_path
.
for this path: root/A/bb/cc/dd
Then split the given path root/A/bb/cc/dd
into portions and visiting every node + updating if already exists:
using this template:
treelib.Node(tag=unique_name, identifier=full_path, data=base_name)
root
-> Node(root=1, root, root)
root/A
-> Node(A=2, root/A, A)
root/A/bb
-> Node(bb=3, root/A/bb, bb)
root/A/bb/cc
-> Node(cc=4, root/A/bb/cc, cc)
root/A/bb/cc/dd
-> Node(dd=5, root/A/bb/cc/dd, dd)
Then I could traverse through the tree and build desired paths like:
root=1/A=2/bb=3/cc=4/dd=5