Title says it - what is the xtensor equivalent of numpy's
# set all elements > 3 to 1
sometensor[sometensor > 3] = 1
?
It looks like xt::filter
works:
xt::filter(sometensor, sometensor > 3) = 1
But it also looks like the numpy version is much faster. I've build xtensor with xsimd, but it doesn't seem to help in this case. Is there a better, more simd-ish way to do it?
EDIT
I found filtration
, which indeed is faster (by about 3x), but still slower than numpy (by about 10x)...
SOLUTION (thx Tom!)
a = xt::where(a > 0.5, 1.0, a);
Is the fastest of all - about 10x faster than filtration
, so it looks like it's simd-d!
xt::filter
appears be a view, which are (currently) not super efficient in xtensor. I would use xt::where
. It might result in a temporary though, which may of may not be the case in NumPy. Since I don't know the details on the the temporary let's do at least some timing:
import numpy as np
from datetime import datetime
a = np.random.random([1000000])
start = datetime.now()
a[a > 0.5] = 1.0
stop = datetime.now()
print((stop - start).microseconds)
On my system around 5000 microseconds.
import numpy as np
from datetime import datetime
a = np.random.random([1000000])
start = datetime.now()
a = np.where(a > 0.5, 1.0, a)
stop = datetime.now()
print((stop - start).microseconds)
On my system about 2500 microseconds.
#include <iostream>
#include <chrono>
#include <xtensor.hpp>
using namespace std;
int main()
{
xt::xtensor<double, 1> a = xt::random::rand<double>({1000000});
auto start = std::chrono::high_resolution_clock::now();
a = xt::where(a > 0.5, 1.0, a);
auto stop = std::chrono::high_resolution_clock::now();
auto duration = duration_cast<std::chrono::microseconds>(stop - start);
cout << duration.count() << endl;
}
On my system between 2500 and 5000 microseconds (much more distributed than for NumPy) with xsimd, and about twice as long without xsimd.
#include <iostream>
#include <chrono>
#include <xtensor.hpp>
using namespace std;
int main()
{
xt::xtensor<double, 1> a = xt::random::rand<double>({1000000});
auto start = std::chrono::high_resolution_clock::now();
xt::filter(a, a > 0.5) = 1.0;
auto stop = std::chrono::high_resolution_clock::now();
auto duration = duration_cast<std::chrono::microseconds>(stop - start);
cout << duration.count() << endl;
}
On my system about 30000 microsconds with and without xsimd.
I use
cmake_minimum_required(VERSION 3.1)
project(Run)
set(CMAKE_BUILD_TYPE Release)
find_package(xtensor REQUIRED)
find_package(xsimd REQUIRED)
add_executable(${PROJECT_NAME} main.cpp)
target_link_libraries(${PROJECT_NAME} xtensor xtensor::optimize xtensor::use_xsimd)
without xsimd I omit the last line.
I am running Mac's M1. The timings listed are on Rosetta (i.e. x86
). For native build the timings are: