
Python3 match, reverse match and dedupe


The intention of the code below is to process the two collections of ticker dictionaries (d1 and d2) and add the symbol value of each entry to the pairs list if it contains the item in cur, but not if it contains either item in the curpair list.

Matching the value against cur works, but I can't figure out how to do the reverse match against the items in curpair. A secondary issue is that the code seems to create duplicates, likely because of the additional for loop that compares against the items in curpair. Either way, I'm not sure whether there's a way to dedupe in-line or if that needs to be a separate routine.

I suspect there is a way to do all of this, and simplify the code at the same time, with a list comprehension, but maybe not. My attempts to understand list comprehensions have only confirmed that my Python experience is far too brief to make sense of them yet :)

Grateful for any insights.

cur='EUR'
curpair=['BUSD', 'USDT']

def get_pairs(tickers):
    pairs = []
    for entry in tickers:
        if cur in entry['symbol']:
            for cp in curpair:
                if cp not in entry['symbol']:
                    pairs.append(entry['symbol'])
    return pairs

# d1 and d2 @ https://pastebin.com/NfNAeqD4
spot_pairs_list = get_pairs(d1)
margin_pairs_list = get_pairs(d2)
print(f"from d1: {spot_pairs_list}")
print(f"from d2: {margin_pairs_list}")

Output:

from d1: ['BTCEUR', 'BTCEUR', 'ETHEUR', 'ETHEUR', 'BNBEUR', 'BNBEUR', 'XRPEUR', 'XRPEUR', 'EURBUSD', 'EURUSDT', 'SXPEUR', 'SXPEUR', 'LINKEUR', 'LINKEUR', 'DOTEUR', 'DOTEUR', 'LTCEUR', 'LTCEUR', 'ADAEUR', 'ADAEUR', 'BCHEUR', 'BCHEUR', 'YFIEUR', 'YFIEUR', 'XLMEUR', 'XLMEUR', 'GRTEUR', 'GRTEUR', 'EOSEUR', 'EOSEUR', 'DOGEEUR', 'DOGEEUR', 'EGLDEUR', 'EGLDEUR', 'AVAXEUR', 'AVAXEUR', 'UNIEUR', 'UNIEUR', 'CHZEUR', 'CHZEUR', 'ENJEUR', 'ENJEUR', 'MATICEUR', 'MATICEUR', 'LUNAEUR', 'LUNAEUR', 'THETAEUR', 'THETAEUR', 'BTTEUR', 'BTTEUR', 'HOTEUR', 'HOTEUR', 'WINEUR', 'WINEUR', 'VETEUR', 'VETEUR', 'WRXEUR', 'WRXEUR', 'TRXEUR', 'TRXEUR', 'SHIBEUR', 'SHIBEUR', 'ETCEUR', 'ETCEUR', 'SOLEUR', 'SOLEUR', 'ICPEUR', 'ICPEUR']
from d2: ['ADAEUR', 'ADAEUR', 'BCHEUR', 'BCHEUR', 'BNBEUR', 'BNBEUR', 'BTCEUR', 'BTCEUR', 'DOTEUR', 'DOTEUR', 'ETHEUR', 'ETHEUR', 'EURBUSD', 'EURUSDT', 'LINKEUR', 'LINKEUR', 'LTCEUR', 'LTCEUR', 'SXPEUR', 'SXPEUR', 'XLMEUR', 'XLMEUR', 'XRPEUR', 'XRPEUR', 'YFIEUR', 'YFIEUR']

Solution

  • The problem with duplicate values can easily be solved by using a set instead of a list (keep in mind that a set does not preserve the original order).

    As for the other problem, this loop isn't doing the right thing:

    for cp in curpair:
        if cp not in entry['symbol']:
            pairs.append(entry['symbol'])
    

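    This appends the symbol once for every element of curpair that is missing from it. For example, if the first cp is not in symbol, the symbol is accepted even if the second element is in it. That is also where the duplicates come from: a symbol such as 'BTCEUR' contains neither 'BUSD' nor 'USDT', so it is appended twice, while 'EURBUSD' and 'EURUSDT' are each appended only once. But it seems that you want to include only symbols that contain none of the elements in curpair.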

    In other words, you only want to append if cp in symbol is False for all cp.

    This can indeed be done easily with a list comprehension combined with any():

    def get_pairs(tickers):
        pairs = set() # set instead of list
        
        for entry in tickers:
            symbol = entry['symbol']
            if cur in symbol and not any([cp in symbol for cp in curpair]):
                pairs.add(symbol) # note it's 'add' for sets, not append
        
        return pairs
    

    [cp in symbol for cp in curpair] is the same as this (deliberately verbose) loop:

    cp_check = []
    
    for cp in curpair:
        if cp in symbol:
            cp_check.append(True)
        else:
            cp_check.append(False)
    

    So you get a list of True and False values. any() returns True if any of the list elements are True, i.e., it does the opposite of what you want. Hence we need to negate its result with not, which gives True only when all of the list elements are False, exactly what we need.
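If you also want the in-line dedupe asked about in the question, and want to keep the original order, the whole function can be collapsed into a single comprehension. This is a minimal sketch of one way to do it, not the only one. It assumes the tickers are a list of dicts with a 'symbol' key (the real d1/d2 are only linked, so sample_tickers below is hypothetical data), passes cur and curpair as parameters instead of using globals, and swaps not any(cp in ...) for the equivalent all(cp not in ...). dict.fromkeys() is used instead of a set because it drops repeats while preserving insertion order (guaranteed since Python 3.7).

```python
# Hypothetical sample data: assumes a list of dicts that each have a
# 'symbol' key, shaped like the entries in the question's output.
sample_tickers = [
    {'symbol': 'BTCEUR'},
    {'symbol': 'ETHEUR'},
    {'symbol': 'EURBUSD'},
    {'symbol': 'EURUSDT'},
    {'symbol': 'BTCUSDT'},
    {'symbol': 'BTCEUR'},  # repeated on purpose to show deduplication
]


def get_pairs(tickers, cur='EUR', curpair=('BUSD', 'USDT')):
    """Return symbols that contain cur and none of curpair, without repeats.

    dict.fromkeys() keeps only the first occurrence of each symbol and
    preserves insertion order, so the result follows the input order.
    """
    matches = (
        entry['symbol']
        for entry in tickers
        # all(cp not in s ...) is equivalent to not any(cp in s ...)
        if cur in entry['symbol']
        and all(cp not in entry['symbol'] for cp in curpair)
    )
    return list(dict.fromkeys(matches))


print(get_pairs(sample_tickers))  # ['BTCEUR', 'ETHEUR']
```

Passing a generator expression to any() or all(), rather than a list, also lets them stop as soon as the result is decided instead of building the whole list first.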