python pandas database astropy cross-match

How to cross-match two dataframes by (Cartesian) coordinates in Python?


I have two astronomical catalogues containing galaxies with their respective sky coordinates (ra, dec). I handle the catalogues as data frames. The catalogues come from different observational surveys, and some galaxies appear in both. I want to cross-match these galaxies and put them in a new catalogue. How can I do this with Python? I thought there would be an easy way with numpy, pandas, astropy or another package, but I couldn't find a solution. Thanks.


Solution

  • After a lot of research, the easiest way I have found is to use a package called astroML (see its tutorial). The notebooks I have used it in are called cross_math_data_and_colour_cuts_.ipynb and PS_data_cleaning_and_processing.ipynb.

    # if you are using Google Colab, first run the line "!pip install astroml"
    import numpy as np
    import pandas as pd
    from astroML.crossmatch import crossmatch_angular
    
    df_1 = pd.read_csv('catalog_1.csv')
    df_2 = pd.read_csv('catalog_2.csv')
    
    # crossmatch catalogs
    max_radius = 1. / 3600  # 1 arcsec
    # note, that for the below to work the first 2 columns of the catalogs should be ra, dec
    # also, df_1 should be the longer of the 2 catalogs, else there will be index errors
    dist, ind = crossmatch_angular(df_1.values, df_2.values, max_radius)
    match = ~np.isinf(dist)
    # THE DESIRED SOLUTION IS THEN:
    df_crossed = df_1[match]
    
    
    # ALTERNATIVELY:
    # ind contains, for each row of the first catalog, the index of the cross-matched galaxy in the second catalog;
    # when there is no match, the ind value is the length of the second catalog
    # so if you need to pull values from the second catalog into the first, do:
    df_1['new_var'] = [df_2.old_var.iloc[i] if i < len(df_2) else -999 for i in ind]
    # that way, whenever there is a match, 'new_var' will contain the correct value from 'old_var',
    # and whenever there is no match it will contain -999 as a flag
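
To make the logic of an angular cross-match more concrete, here is a minimal brute-force sketch using only numpy and pandas. It is not astroML's implementation: the helper names (radec_to_unit_vectors, brute_force_crossmatch), the column names 'ra' and 'dec' (in degrees) and the toy catalogs are all assumptions made for illustration. It converts the sky coordinates to Cartesian unit vectors, finds the nearest neighbour in the second catalog for each row of the first, and flags rows without a match using the same convention as described above (infinite distance, index equal to the length of the second catalog).

```python
import numpy as np
import pandas as pd


def radec_to_unit_vectors(ra_deg, dec_deg):
    """Convert RA/Dec in degrees to 3D Cartesian unit vectors, shape (N, 3)."""
    ra = np.radians(np.asarray(ra_deg, dtype=float))
    dec = np.radians(np.asarray(dec_deg, dtype=float))
    return np.column_stack([np.cos(dec) * np.cos(ra),
                            np.cos(dec) * np.sin(ra),
                            np.sin(dec)])


def brute_force_crossmatch(cat_1, cat_2, max_radius_deg):
    """Nearest neighbour in cat_2 for each row of cat_1 (hypothetical helper).

    Returns (sep_deg, idx): the angular separation in degrees and the
    positional index into cat_2. Rows with no neighbour within
    max_radius_deg get sep_deg = inf and idx = len(cat_2).
    """
    v1 = radec_to_unit_vectors(cat_1['ra'], cat_1['dec'])
    v2 = radec_to_unit_vectors(cat_2['ra'], cat_2['dec'])
    # Chord length between unit vectors: |v1 - v2| = 2 * sin(theta / 2)
    chord = np.linalg.norm(v1[:, None, :] - v2[None, :, :], axis=2)
    idx = chord.argmin(axis=1)
    nearest = chord[np.arange(len(v1)), idx]
    sep_deg = np.degrees(2 * np.arcsin(np.clip(nearest / 2, 0, 1)))
    no_match = sep_deg > max_radius_deg
    sep_deg[no_match] = np.inf
    idx[no_match] = len(cat_2)
    return sep_deg, idx


# Toy catalogs (made-up values, assumed column names 'ra' and 'dec' in degrees)
cat_1 = pd.DataFrame({'ra': [10.0, 150.0, 200.0], 'dec': [-5.0, 2.0, 45.0]})
cat_2 = pd.DataFrame({'ra': [150.0001, 10.0], 'dec': [2.0001, -5.0],
                      'r_mag': [18.2, 20.1]})

sep_deg, idx = brute_force_crossmatch(cat_1, cat_2, max_radius_deg=1.0 / 3600)
matched = ~np.isinf(sep_deg)

# Keep the matched rows of cat_1 and attach a column from cat_2
crossed = cat_1[matched].copy()
crossed['r_mag'] = cat_2['r_mag'].to_numpy()[idx[matched]]
print(crossed)
```

The pairwise distance array needs memory proportional to len(cat_1) * len(cat_2), so this sketch is only reasonable for small catalogs; for large ones a tree-based nearest-neighbour search scales much better.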
    

    If both dataframes contain not only coordinates but also matching IDs of the sources, one can easily cross-match with the pandas .merge() function. Let's say df_1 has the columns 'ID', 'ra', 'dec', 'object_class' and df_2 has 'ID', 'ra', 'dec', 'r_mag'; then we can cross-match with

    df_crossed = pd.merge(df_1, df_2, on='ID')
    

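    By default this performs an inner join, keeping only the IDs present in both dataframes (see the pandas merge documentation for more details). Because 'ra' and 'dec' appear in both dataframes but are not merge keys, pandas adds suffixes to them, so the resulting df_crossed will have the columns 'ID', 'ra_x', 'dec_x', 'object_class', 'ra_y', 'dec_y', 'r_mag'.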
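
As a small illustration of how a merge on 'ID' treats columns that exist in both dataframes, and of the how and indicator options of pd.merge, here is a sketch with made-up toy data (the values and the '_cat1'/'_cat2' suffixes are assumptions, not taken from real catalogs):

```python
import pandas as pd

# Toy catalogs with the column layout described above (values are made up)
df_1 = pd.DataFrame({'ID': [1, 2, 3],
                     'ra': [10.0, 150.0, 200.0],
                     'dec': [-5.0, 2.0, 45.0],
                     'object_class': ['galaxy', 'galaxy', 'star']})
df_2 = pd.DataFrame({'ID': [2, 3, 4],
                     'ra': [150.0, 200.0, 310.0],
                     'dec': [2.0, 45.0, -30.0],
                     'r_mag': [18.2, 20.1, 19.5]})

# Inner join on 'ID': only IDs 2 and 3 survive; overlapping non-key
# columns get the default suffixes '_x' (left) and '_y' (right)
inner = pd.merge(df_1, df_2, on='ID')
print(list(inner.columns))

# Custom suffixes make it clearer which catalog a coordinate came from
custom = pd.merge(df_1, df_2, on='ID', suffixes=('_cat1', '_cat2'))
print(list(custom.columns))

# Left join keeps every row of df_1; '_merge' records whether a match was found
left = pd.merge(df_1, df_2[['ID', 'r_mag']], on='ID', how='left', indicator=True)
print(left)
```

Selecting only the needed columns of df_2 before merging, as in the last call, avoids the duplicated coordinate columns altogether.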

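    You can also easily cross-match on multiple columns, e.g. on 'ID', 'ra', 'dec', which requires the coordinates to be exactly equal in both dataframes and yields the columns 'ID', 'ra', 'dec', 'object_class', 'r_mag':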

    df_crossed = pd.merge(df_1, df_2, on=['ID', 'ra', 'dec'])
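
One thing to keep in mind with a multi-column merge is that floating-point coordinates have to agree exactly. The following sketch uses made-up toy data in which one 'ra' value differs very slightly between the two dataframes:

```python
import pandas as pd

# Toy catalogs (made-up values); the 'ra' of ID 2 differs by 1e-7 degrees
df_1 = pd.DataFrame({'ID': [1, 2],
                     'ra': [10.0, 150.0],
                     'dec': [-5.0, 2.0],
                     'object_class': ['galaxy', 'star']})
df_2 = pd.DataFrame({'ID': [1, 2],
                     'ra': [10.0, 150.0000001],
                     'dec': [-5.0, 2.0],
                     'r_mag': [18.2, 20.1]})

df_crossed = pd.merge(df_1, df_2, on=['ID', 'ra', 'dec'])
print(list(df_crossed.columns))  # ['ID', 'ra', 'dec', 'object_class', 'r_mag']
print(len(df_crossed))           # 1: ID 2 is dropped, its 'ra' values are not identical
```

If the coordinates come from different surveys they will rarely be bit-for-bit identical, so merging on 'ID' alone, or matching within an angular radius, is usually the safer choice.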