How to organise DataFrame columns

I am trying to organise DataFrame columns based on the specific rules, but I don't know the way.

For example, I have a DataFrame related to chemistry as shown below. Each row shows the number of chemical bonds in a chemical compound.

   OH  HO  CaO  OCa  OO  NaMg  MgNa
0   2   3    2    0   1     1     1
1   0   2    3    4   5     2     0
2   1   2    3    0   0     0     0

In chemistry, OH (Oxygen-Hydrogen) bond is equal to HO (Hydrogen-Oxygen) bond and CaO (Calcium-Oxygen) bond is equal to OCa (Oxygen-Calcium) bond in the meaning. Thus, I'd like to organise the DataFrame as shown below.

   OH  CaO  OO  NaMg 
0   5    2   1     2
1   2    7   9     2
2   3    3   0     0

I’m struggling because:

there are a variety of chemical bonds in my real DataFrame, so it is impossible to organise the information one by one (The number of columns is more than 3,000 and I don't know which kinds of chemical bonds exist and are duplicates.)
the number of letters depends on each element symbol and some symbols include lowercase (e.g. Hydrogen: H (one letter and only uppercase), Calcium: Ca (Two letters and uppercase & lowercase)

I looked for the same question online and wrote codes by myself, but I was not able to find the way. I would like to know the codes which solve my problem.

Solution

You can use str.findall to extract individual element ~~and use frozenset~~ and sort individual elements to reorganize the pairs. Using frozenset is not a good solution because for OO, the second will be lost.

Now you can group by this sets and apply sum:

# Modified from https://www.johndcook.com/blog/2016/02/04/regular-expression-to-match-a-chemical-element/
pat = r'(A[cglmrstu]|B[aehikr]?|C[adeflmnorsu]?|D[bsy]|E[rsu]|F[elmr]?|G[ade]|H[efgos]?|I[nr]?|Kr?|L[airuv]|M[dgnot]|N[abdeiop]?|Os?|P[abdmortu]?|R[abefghnu]|S[bcegimnr]?|T[abcehilm]|U(?:u[opst])?|V|W|Xe|Yb?|Z[nr])'

grp = df.columns.str.findall(pat).map(lambda x: tuple(sorted(x))))
out = df.groupby(grp, axis=1).sum().rename(columns=''.join)

Output:

>>> out
   CaO  HO  MgNa  OO
0    2   5     2   1
1    7   2     2   5
2    3   3     0   0