I'm performing some MD simulations involving systems of millions of atoms.
I have written some code to generate a file which is just a listing of XYZ atom coordinates. Now I need to generate bonds between the atoms. If two atoms are within a certain distance of each other, that is considered a bond.
Example XYZ file:
1 0 0
2 0 0
7 0 0
10 0 0
9 0 0
So I've got five atoms. If my distance threshold is 2 units, then my bond listing will be:
1 2
3 5
4 5
(where the numbers correspond to the index of the coordinate in the XYZ file).
The naive approach to generate this list is just:
for i = 1:numAtoms
for j = i+1:numAtoms
if distance(atom[i], atom[j]) < 2
bonds.push [i, j]
However, this quickly hits an algorithmic limit and is slow even in highly optimized C for millions of atoms, at least for as frequently as I'll be doing this process.
The only experience I have with space-partitioning data structures is with kd-trees when I wrote a photon mapper once, so I don't really know what the best solution to this problem is. I'm sure there's probably something out there that's optimal for this though.
I should also mention that my simulation box is periodic, meaning that an atom at (0.5, 0, 0) will bond to an atom at (boxWidth - 0.5, 0, 0) with a distance threshold such as 2.
Simple solutions are the first to try. They are quick to code, and easy to test. If that does not give you the required performance, then you can try something trickier.
So, you can seriously prune the search space by assigning grid co-ordinates to your atoms. Nothing technical. Like a poor-man's octree...
All you need to do is have a grid size of 2, and search all atoms within the local grid and each neighbouring grid. Obviously the grid is 3D. An atom's grid location is nothing more complex than dividing each its co-ordinates by the grid size.
Obviously you do a preliminary pass and create a list (or vector) of atoms belonging to each cell. You can store the lists in a map
indexed by the 3D grid co-ordinate. Then for each atom, you can just lookup the local lists and do the bond tests.
Also, don't use square-root for your distance. Operate with distance squared instead. This will save a bucket-load of processing.