Tags: python, django, postgresql, search, unicode

Django Search Query to Match Names With and Without Accents in PostgreSQL


I'm using Django with a PostgreSQL database. In my database, I have a Users table with a name column that stores names with special characters, such as accents and the letter "Ñ". When performing a search, I want the results to include entries with and without accents. For example, if I have users with the names "lúlú" and "lulu", I would like both to appear in the search results, regardless of whether the search term includes accents.

Here's my current code for the search functionality:

import json
import operator

from django.http import HttpResponse

# Filter and Codec come from the asker's project; their imports are not shown.


def autocomplete(request):
    result = []
    try:
        if request.user.is_staff and request.user.is_active:
            columns = request.GET.get("column_name").split(",")
            value = request.GET.get("column_value")
            # One case-insensitive "contains" lookup per requested column.
            columns = [
                ("{}__icontains".format(column), request.GET.get("term"))
                for column in columns
            ]
            filters = request.GET.get("filters", [])
            if filters:
                filters = filters.split(",")
                filters = [tuple(x.split("=")) for x in filters]
            queryset = Filter(
                app_label=request.GET.get("app_label"),
                model_name=request.GET.get("model"),
            ).filter_by_list(columns, operator.or_, filters)
            for q in queryset:
                result.append(
                    {"obj": q.to_json(), "label": str(q), "value": q.to_json()[value]}
                )
    except AttributeError:
        # Raised when "column_name" is missing, since None has no split().
        pass
    return HttpResponse(json.dumps(result, cls=Codec), content_type="application/json")

How can I modify my search to ignore accents and special characters so that a search for "lulu" also matches "lúlú" and vice versa? Are there any recommendations for handling accent-insensitive searches in Django with PostgreSQL?


Solution

  • I believe you would have to write a helper function yourself. Here is a script that should point you in the right direction:

    words = [
        "Águeda",
        "Ålborg",
        "León",
        "Cancún",
        "Coimbra",
        "São Paulo",
        "Jönköping",
        "München",
        "Nykøbing Mors",
    ]
    
    char_map = {
        "a": ["ã"],
        "A": ["Å", "Á"],
        "o": ["ó", "ö", "ø"],
        "u": ["ü", "ú"]
    }
    
    
    def find_chr(value):
        """
        Look up the unaccented counterpart of a character
        in char_map. Returns the original value if no
        mapping is found.
        """
        for key, chr_list in char_map.items():
            if value in chr_list:
                return key

        return value


    def mutate(value):
        """
        Build the unaccented form of a word and report
        whether it differs from the original. To improve
        performance, plain ASCII letters skip the map lookup.
        """
        word = ""

        for char in value:  # "char" avoids shadowing the built-in chr()
            if "a" <= char <= "z" or "A" <= char <= "Z":
                word += char
            else:
                word += find_chr(char)

        has_mutation = word != value

        return word, has_mutation


    for w in words:
        search_array = [w]

        mutation, has_mutation = mutate(w)
        if has_mutation:
            search_array.append(mutation)

        print(search_array)
    
    

    The resulting search_array holds the original word plus its unaccented form (when the two differ), and it is the list you would pass to the __in lookup.
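
As a possible alternative to maintaining char_map by hand, the same search array can be sketched with the standard library's unicodedata module, which decomposes most accented letters into a base letter plus a combining mark. The function names strip_accents and search_variants are hypothetical, not part of the answer above. Letters with no Unicode decomposition, such as "ø", pass through unchanged and would still need a manual mapping.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining diacritical marks removed."""
    # NFKD splits "ú" into "u" followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep every other character as-is.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_variants(term):
    """Return the term plus its accent-free form, without duplicates."""
    variants = [term]
    plain = strip_accents(term)
    if plain != term:
        variants.append(plain)
    return variants


print(search_variants("lúlú"))
print(search_variants("lulu"))
```

Note that this only normalizes the search term on the Python side: searching for "lúlú" will also try "lulu", but searching for "lulu" will not find a stored "lúlú". For matching in both directions the stored column has to be normalized too, and it is probably worth looking at PostgreSQL's unaccent extension together with the unaccent lookup in django.contrib.postgres for that.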