
Function returning same variable separated by a comma


I don't understand the point of this function returning the same variable twice:

def construct_shingles(doc,k,h):
    #print 'before -> ',doc,len(doc)
    doc = doc.lower()
    doc = ''.join(doc.split(' '))
    #print 'after -> ',doc,len(doc)
    shingles = {}
    for i in xrange(len(doc)):
        substr = ''.join(doc[i:i+k])
        if len(substr) == k and substr not in shingles:
            shingles[substr] = 1

    if not h:
        return doc,shingles.keys()

    ret = tuple(shingles_hashed(shingles))

    return ret,ret
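For readers on Python 3, the `if not h` branch of construct_shingles can be sketched as follows. This is a minimal port under stated assumptions, not the repository's code: `range` replaces `xrange`, the hypothetical helper `char_shingles` stops the window early instead of filtering short substrings, and it returns the shingles as a list in first-seen order, whereas `shingles.keys()` in Python 2 returns them in arbitrary dictionary order.

```python
def char_shingles(doc, k):
    """Hypothetical helper: return (cleaned doc, distinct length-k substrings).

    Mirrors the h=False branch of construct_shingles: lowercase the text,
    delete spaces, then slide a window of width k across it.
    """
    doc = doc.lower()
    doc = "".join(doc.split(" "))  # remove spaces, as the original does
    # The original skips windows shorter than k (len(substr) == k);
    # stopping the range at len(doc) - k + 1 never produces them at all.
    windows = (doc[i:i + k] for i in range(len(doc) - k + 1))
    # dict.fromkeys drops duplicates while keeping first-seen order (Python 3.7+).
    return doc, list(dict.fromkeys(windows))


doc, shingles = char_shingles("i know you", 3)
print(doc)       # iknowyou
print(shingles)  # ['ikn', 'kno', 'now', 'owy', 'wyo', 'you']
```

An 8-character cleaned document with k = 3 yields 8 - 3 + 1 = 6 windows, all distinct here.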

It seems redundant, but there must be a good reason for it; I just don't see it. Perhaps it is because there are two return statements? If 'h' is true, does the function execute both return statements? The calling function and the hashing helper look like:

def construct_set_shingles(docs,k,h=False):
    shingles = []
    for i in xrange(len(docs)):
        doc = docs[i]
        doc,sh = construct_shingles(doc,k,h)
        docs[i] = doc
        shingles.append(sh)
    return docs,shingles

and

def shingles_hashed(shingles):
    global len_buckets
    global hash_table
    shingles_hashed = []
    for substr in shingles:
        key = hash(substr)
        shingles_hashed.append(key)
        hash_table[key].append(substr)
    return shingles_hashed
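shingles_hashed depends on globals that the question does not show. `len_buckets` is declared but never used in this function, and `hash_table[key].append(substr)` only works if a missing key yields an empty list. A minimal Python 3 sketch, assuming `hash_table` is a `collections.defaultdict(list)` (the repository may initialise it differently):

```python
from collections import defaultdict

# Assumption: the table maps each hash value to the list of shingles that
# produced it, so a key seen for the first time starts as an empty list.
hash_table = defaultdict(list)


def shingles_hashed(shingles):
    """Hash each shingle, record it in hash_table, and return the hashes."""
    hashed = []  # renamed so the local does not shadow the function name
    for substr in shingles:
        key = hash(substr)
        hashed.append(key)
        hash_table[key].append(substr)  # a plain dict would raise KeyError here
    return hashed


keys = shingles_hashed(["ikn", "kno"])
print(hash_table[keys[0]])
```

No `global` statement is needed because the function only mutates `hash_table` and never rebinds it. Note that in Python 3, `hash()` of a `str` is salted per process unless `PYTHONHASHSEED` is set, so the keys differ between runs even though they are consistent within one run.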

The data set and function call look like:

k = 3 #shingle length (characters per shingle)

d0 = "i know you"
d1 = "i think i met you"
d2 = "i did that"
d3 = "i did it"
d4 = "she says she knows you"
d5 = "know you personally"
d6 = "i think i know you"
d7 = "i know you personally"

docs = [d0,d1,d2,d3,d4,d5,d6,d7]
docsChange,shingles = construct_set_shingles(docs[:],k)
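construct_set_shingles overwrites `docs[i]` in place, which is why the call passes the copy `docs[:]`. The sketch below is a Python 3 port for the `h=False` case only; construct_shingles is reimplemented compactly so the block runs on its own, and it returns `list(shingles)` in place of Python 2's `shingles.keys()`.

```python
def construct_shingles(doc, k, h):
    # Compact Python 3 stand-in covering only the h=False branch.
    if h:
        raise NotImplementedError("hashing branch omitted in this sketch")
    doc = "".join(doc.lower().split(" "))
    shingles = {}
    for i in range(len(doc)):
        substr = doc[i:i + k]
        if len(substr) == k and substr not in shingles:
            shingles[substr] = 1
    return doc, list(shingles)


def construct_set_shingles(docs, k, h=False):
    shingles = []
    for i in range(len(docs)):
        doc, sh = construct_shingles(docs[i], k, h)
        docs[i] = doc  # mutates whatever list was passed in
        shingles.append(sh)
    return docs, shingles


docs = ["i know you", "know you personally"]
docsChange, shingles = construct_set_shingles(docs[:], 3)
print(docs)        # ['i know you', 'know you personally']  (original untouched)
print(docsChange)  # ['iknowyou', 'knowyoupersonally']
```

Calling `construct_set_shingles(docs, 3)` without the slice would replace the original strings in `docs` with their space-free versions.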

The GitHub location: lsh/LHS


Solution

  • Your guess is correct. As for why it uses return ret,ret: that statement deliberately returns a pair of equal values rather than a single value, so that both return statements give the caller two values to unpack.

    This is a matter of coding style rather than of the algorithm, since the same effect can be achieved with other syntax. However, this form is advantageous in some cases. For example, if we write

    def func(x, y, z):
        ...
        return ret
    
    a = func(x, y, z)
    b = func(x, y, z)
    

    then func is executed twice. But if we write:

    def func(x, y, z):
        ...
        return ret, ret
    
    a, b = func(x, y, z)
    

    then func is executed only once, and its result is assigned to both a and b.

    As for your particular case, a function executes at most one return statement per call: the first return it reaches ends the call.

    If h is false, the function runs until the line return doc,shingles.keys() and returns there; doc and sh in construct_set_shingles then receive doc and shingles.keys(), respectively.

    Otherwise, the first return statement is skipped and the second one is executed, so doc and sh both receive the same value, namely tuple(shingles_hashed(shingles)).
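The point about the two exits can be checked directly. The hypothetical function `pair_or_single` below is not from the repository; it only mimics construct_shingles' two return statements, with a made-up 3-element tuple standing in for `tuple(shingles_hashed(shingles))`, to show why the caller's `doc,sh = construct_shingles(doc,k,h)` needs a 2-tuple from both branches.

```python
def pair_or_single(h, pair=True):
    """Hypothetical stand-in for construct_shingles' two exits."""
    if not h:
        return "doc", ["sh1", "sh2"]  # first exit: always a 2-tuple
    ret = (101, 202, 303)  # made-up stand-in for tuple(shingles_hashed(shingles))
    if pair:
        return ret, ret    # second exit as written: a 2-tuple
    return ret             # second exit if it returned a single value


doc, sh = pair_or_single(h=False)
print(doc, sh)    # doc ['sh1', 'sh2']

doc, sh = pair_or_single(h=True)
print(doc is sh)  # True: both names refer to the same tuple object

try:
    doc, sh = pair_or_single(h=True, pair=False)
except ValueError as err:
    # A 3-element tuple cannot be unpacked into two names.
    print("unpacking failed:", err)
```

With a bare `return ret`, unpacking raises ValueError unless the hash tuple happens to have exactly two elements, in which case it silently binds the two hashes to doc and sh. If the goal were only to bind one result to two names, chained assignment such as `a = b = func(x, y, z)` also calls func just once.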