I have read many tutorials and tried a number of MinHash LSH implementations, but none of them can generate a similarity matrix; they only return the items whose similarity exceeds the threshold. How can I generate the full matrix? My intention is to use the LSH results for clustering.
The whole point of LSH is to avoid computing all pairwise distances, because that does not scale.
If you then put the data into a distance matrix, you get all the scalability problems again!
Instead, consider an algorithm like DBSCAN clustering. It doesn't need a distance matrix, only each point's neighbors within distance epsilon, which is the kind of query LSH is built to answer.
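To make this concrete, here is a minimal, standard-library-only sketch of the idea: MinHash with banding proposes candidate pairs, exact Jaccard distance is checked only on those candidates, and a plain DBSCAN runs on the resulting neighbor lists. Everything in it is an illustrative assumption rather than any particular library's API: the helper names (`make_hashes`, `signature`, `lsh_candidates`, `eps_neighbors`, `dbscan`), the toy documents, and the parameters (128 hashes, 2 rows per band, `eps=0.3`, `min_samples=2`), which are tuned for this tiny example and not for real data.

```python
import random
import zlib
from collections import defaultdict

PRIME = (1 << 61) - 1  # Mersenne prime used by the affine hash family


def make_hashes(num_hashes, seed=0):
    # Random (a, b) pairs defining h(x) = (a*x + b) mod PRIME.
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(num_hashes)]


def signature(tokens, hashes):
    # MinHash signature: the minimum hash value per hash function.
    base = [zlib.crc32(t.encode("utf-8")) for t in tokens]
    return [min((a * x + b) % PRIME for x in base) for a, b in hashes]


def lsh_candidates(signatures, rows_per_band):
    # Items that share at least one band bucket become candidates of each other.
    buckets = defaultdict(set)
    for key, sig in signatures.items():
        for start in range(0, len(sig), rows_per_band):
            buckets[(start, tuple(sig[start:start + rows_per_band]))].add(key)
    candidates = defaultdict(set)
    for members in buckets.values():
        for key in members:
            candidates[key] |= members  # includes the item itself
    return candidates


def jaccard_distance(a, b):
    return 1.0 - len(a & b) / len(a | b)


def eps_neighbors(sets, candidates, eps):
    # Exact distances are computed only for LSH candidates, never for all pairs.
    return {k: {c for c in candidates[k] if jaccard_distance(sets[k], sets[c]) <= eps}
            for k in sets}


def dbscan(neighbors, min_samples):
    # Plain DBSCAN driven by precomputed neighbor sets; label -1 means noise.
    labels, cluster = {}, -1
    for point in neighbors:
        if point in labels:
            continue
        if len(neighbors[point]) < min_samples:
            labels[point] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[point] = cluster
        frontier = list(neighbors[point])
        while frontier:
            q = frontier.pop()
            if labels.get(q, -1) != -1:
                continue  # already assigned to a cluster
            labels[q] = cluster
            if len(neighbors[q]) >= min_samples:  # only core points expand
                frontier.extend(neighbors[q])
    return labels


# Toy data (made up for illustration).
docs = {
    "a1": "the quick brown fox jumps over the lazy dog",
    "a2": "the quick brown fox jumps over the lazy cat",
    "a3": "the quick brown fox leaps over the lazy dog",
    "b1": "minhash lsh finds near duplicate documents fast",
    "b2": "minhash lsh finds near duplicate documents quickly",
    "b3": "minhash lsh finds similar duplicate documents fast",
    "c1": "completely unrelated sentence about gardening tomatoes",
}
sets = {k: set(v.split()) for k, v in docs.items()}
hashes = make_hashes(128)
signatures = {k: signature(s, hashes) for k, s in sets.items()}
neighbors = eps_neighbors(sets, lsh_candidates(signatures, rows_per_band=2), eps=0.3)
labels = dbscan(neighbors, min_samples=2)
print(labels)
```

On this toy data the `a*` and `b*` documents should end up in two separate clusters and `c1` should be labelled noise (`-1`). Note that LSH is approximate: a true neighbor that never shares a bucket is simply missed, so the number of hashes and rows per band trade recall against the number of candidate pairs that need an exact check.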