
Speed up string assignments to list in Cython


I used re to split a long string (with a capturing group, so the delimiters are kept) and put the obtained substrings into an output list based on conditions specified by a dict. I found a significant speed improvement using Cython compared to Python, but would expect a further improvement. A simplified version of the code in temp_test.pyx is:

cimport cython

import re


@cython.wraparound(False)
@cython.boundscheck(False)
def cleave(str long_str, str rules, dict term_char):
    """
    Cleaves a long string into substrings.

    """
    cdef:
        object compiler = re.compile(rules)
        Py_ssize_t i, ns
        Py_ssize_t nk = 0
        Py_ssize_t nq = <Py_ssize_t> len(long_str)
        int p, t
        list split_str = compiler.split(long_str)
        list substrs = nq * [None]
        str s

    ns = <Py_ssize_t> len(split_str)
    for i in range(ns):
        if split_str[i] is not None:
            s = split_str[i]
            p = len(s)
            if p == 1 and s in term_char:
                t = term_char[s]
                if t == 0:
                    substrs[nk] = split_str[i - 1]
                else:
                    substrs[nk] = split_str[i] + split_str[i + 1]
                nk += 1

    return substrs[:nk]

with the test code:

from temp_test import cleave
from typing import Dict

def test_cleave():
    long_str: str = r"ABCEDFRFR"

    rules: str = r"([BD])"
    term_char: Dict[str, int] = {"B": 0, "D": 1}

    subs = cleave(long_str, rules, term_char)
    print(subs)

After using cython -a to annotate temp_test.pyx, I found that nearly all lines were highlighted. Since cleave will be called a large number of times, and the function performs many assignments of strings from one list (i.e., split_str) to another (i.e., substrs), I looked at the generated C code and found that checks like the following always appear:

if (unlikely(__pyx_v_split_str == Py_None)) {
    PyErr_SetString(PyExc_TypeError, "object of type 'NoneType' has no len()");
    __PYX_ERR(0, 24, __pyx_L1_error)
  }
  __pyx_t_5 = __Pyx_PyList_GET_SIZE(__pyx_v_split_str); if (unlikely(__pyx_t_5 == ((Py_ssize_t)-1))) __PYX_ERR(0, 24, __pyx_L1_error)
  __pyx_v_ns = ((Py_ssize_t)__pyx_t_5);

1. I think disabling those checks could improve the performance. How can I achieve this?
2. I found that @cython.wraparound(False) was highlighted too, with lots of checks (too long to show here). Is this because I check whether the key is in the dict, i.e., s in term_char? Is there a better alternative?

I often use memoryviews, malloc, calloc, etc. for numeric operations and calculations, but I'm not familiar with string operations in Cython; any suggestions are very much appreciated.


Solution

  • There are a few small things I can see that are likely hurting performance:

    if p == 1 and s in term_char:
        t = term_char[s]
    

    Here you're effectively looking up term_char[s] twice (once for the in test and once for the subscript). Maybe:

    if p == 1:
        t = term_char.get(s, None)
        if t is None:
            continue
    

    or something like that.


    int p, t
    

    Typing t is probably a pessimization. Essentially the only thing you do with t is to get it out of a dict and compare it with 0.

    If you just left t untyped, getting it out of the dict is nearly free (a tiny bit of reference counting), and comparing it with 0 is almost certainly cheap (since CPython caches small integers, all int(0)s are usually the same object).

    Typing it as int means you get a Python object out of a dict, then do a type-check and extract the integer value. The comparison with 0 might be slightly quicker, but overall it's likely slower.

    (Typing p is probably fine).
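
    Putting the two previous points together, the relevant part of the loop might look like this (a sketch; t deliberately has no cdef declaration):

    p = len(s)                      # p can stay a C int
    if p == 1:
        t = term_char.get(s, None)  # t untyped: no unboxing to a C int
        if t is not None:
            if t == 0:              # cheap: 0 is a cached small int
                substrs[nk] = split_str[i - 1]
            else:
                substrs[nk] = split_str[i] + split_str[i + 1]
            nk += 1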


    In contrast, what might be worthwhile is to rewrite

    substrs[nk] = split_str[i] + split_str[i + 1]
    

    as

    substrs[nk] = s + <str>(split_str[i + 1])
    

    (i.e. let Cython know both operands are strings; using s also saves re-indexing the list). This lets it go down a slightly accelerated string concatenation route rather than generic addition.


    if (unlikely(__pyx_v_split_str == Py_None)) {
    

    can be eliminated with @cython.nonecheck(False). Obviously you need to satisfy yourself that that's appropriate (i.e. that split_str can never actually be None here).
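
    For example, in the decorator form (the same directive can also be set file-wide with a # cython: nonecheck=False comment at the top of the .pyx):

    @cython.nonecheck(False)
    @cython.wraparound(False)
    @cython.boundscheck(False)
    def cleave(str long_str, str rules, dict term_char):
        ...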


    I suspect all of the above will only give marginal gains though, and you shouldn't expect any big speedup from this kind of micro-optimization.
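
    Putting it all together, a revised temp_test.pyx might look something like this (an untested sketch; it keeps the original code's assumption that a matched delimiter never ends up as the last element of split_str, so split_str[i + 1] stays in range):

    cimport cython

    import re


    @cython.nonecheck(False)
    @cython.wraparound(False)
    @cython.boundscheck(False)
    def cleave(str long_str, str rules, dict term_char):
        """
        Cleaves a long string into substrings.
        """
        cdef:
            object compiler = re.compile(rules)
            Py_ssize_t i, ns
            Py_ssize_t nk = 0
            int p
            list split_str = compiler.split(long_str)
            list substrs = len(long_str) * [None]
            str s

        ns = len(split_str)
        for i in range(ns):
            if split_str[i] is not None:
                s = <str>(split_str[i])
                p = len(s)
                if p == 1:
                    # single dict lookup; t stays an untyped Python object
                    t = term_char.get(s, None)
                    if t is None:
                        continue
                    if t == 0:
                        substrs[nk] = split_str[i - 1]
                    else:
                        # both operands known to be str: faster concatenation
                        substrs[nk] = s + <str>(split_str[i + 1])
                    nk += 1

        return substrs[:nk]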