numpy, xtensor

xtensor equivalent of numpy a[a>3] = 1


Title says it - what is the xtensor equivalent of numpy's

# set all elements > 3 to 1
sometensor[sometensor > 3] = 1 

?

It looks like xt::filter works:

xt::filter(sometensor, sometensor > 3) = 1

But it also looks like the numpy version is much faster. I've built xtensor with xsimd, but it doesn't seem to help in this case. Is there a better, more SIMD-friendly way to do it?

EDIT

I found xt::filtration, which is indeed faster (by about 3x), but still slower than numpy (by about 10x)...

SOLUTION (thx Tom!)

a = xt::where(a > 0.5, 1.0, a);

Is the fastest of all - about 10x faster than filtration, so it looks like it gets SIMD-vectorized!
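As a sanity check (not from the original post), the in-place masked assignment and the `where` form really do compute the same result; a quick NumPy comparison on a small hypothetical array:

```python
import numpy as np

# Hypothetical data just to check that the two formulations agree.
a = np.random.random(10)

masked = a.copy()
masked[masked > 0.5] = 1.0            # in-place masked assignment

selected = np.where(a > 0.5, 1.0, a)  # out-of-place select

# Both set every element > 0.5 to 1.0 and leave the rest untouched.
assert np.array_equal(masked, selected)
```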


Solution

  • xt::filter appears to be a view, and views are (currently) not super efficient in xtensor. I would use xt::where. It might result in a temporary though, which may or may not also be the case in NumPy. Since I don't know the details of the temporary, let's at least do some timing:

    1. NumPy indexing:

    import numpy as np 
    from datetime import datetime
    
    a = np.random.random([1000000])
    start = datetime.now()
    a[a > 0.5] = 1.0
    stop = datetime.now()
    print((stop - start).microseconds)
    

    On my system around 5000 microseconds.

    2. NumPy where

    import numpy as np 
    from datetime import datetime
    
    a = np.random.random([1000000])
    start = datetime.now()
    a = np.where(a > 0.5, 1.0, a)
    stop = datetime.now()
    print((stop - start).microseconds)
    

    On my system about 2500 microseconds.
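    As an aside on the timing method (my addition, not part of the original measurements): `(stop - start).microseconds` only returns the microsecond *component* of the timedelta, so it wraps around for runs longer than one second. `time.perf_counter` avoids that pitfall; a sketch of the same measurement:

    ```python
    import time
    import numpy as np

    a = np.random.random(1_000_000)

    start = time.perf_counter()
    a = np.where(a > 0.5, 1.0, a)
    stop = time.perf_counter()

    # perf_counter returns seconds as a float; convert to microseconds.
    print(f"{(stop - start) * 1e6:.0f} microseconds")
    ```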

    3. xtensor where

    #include <iostream>
    #include <chrono>
    #include <xtensor.hpp>
    
    using namespace std;
    
    int main() 
    {
        xt::xtensor<double, 1> a = xt::random::rand<double>({1000000});
    
        auto start = std::chrono::high_resolution_clock::now();    
        a = xt::where(a > 0.5, 1.0, a);
        auto stop = std::chrono::high_resolution_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
        cout << duration.count() << endl;
    }
    

    On my system between 2500 and 5000 microseconds (much more distributed than for NumPy) with xsimd, and about twice as long without xsimd.

    4. xtensor filter

    #include <iostream>
    #include <chrono>
    #include <xtensor.hpp>
    
    using namespace std;
    
    int main() 
    {
        xt::xtensor<double, 1> a = xt::random::rand<double>({1000000});
    
        auto start = std::chrono::high_resolution_clock::now();    
        xt::filter(a, a > 0.5) = 1.0;
        auto stop = std::chrono::high_resolution_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
        cout << duration.count() << endl;
    }
    

    On my system about 30000 microseconds with and without xsimd.

    Compilation

    I use

    cmake_minimum_required(VERSION 3.1)
    
    project(Run)
    
    set(CMAKE_BUILD_TYPE Release)
    
    find_package(xtensor REQUIRED)
    find_package(xsimd REQUIRED)
    add_executable(${PROJECT_NAME} main.cpp)
    target_link_libraries(${PROJECT_NAME} xtensor xtensor::optimize xtensor::use_xsimd)
    

    Without xsimd I omit the last line.

    Rosetta / native

    I am running on a Mac with an M1 chip. The timings listed above are under Rosetta (i.e. x86). For a native build the timings are:

    1. 4500 microseconds.
    2. 1500 microseconds.
    3. 2000 microseconds with and without xsimd (I think xsimd simply doesn't yet work on that chip!).
    4. 15000 microseconds.