I am working to implement a code which was written in MATLAB into C++.
In MATLAB you can slice an Array with another array, like A(B), which results in a new array of the elements of A at the indexes specified by the values of the element in B.
I would like to do a similar thing in C++ using vectors. These vectors are of size 10000-40000 elements of type double.
I want to be able to slice these vectors using another vector of type int containing the indexes to be sliced.
For example, I have a vector v = <1.0, 3.0, 5.0, 2.0, 8.0> and a vector w = <0, 3, 2>. I want to slice v using w such that the outcome of the slice is a new vector (since the old vector must remain unchanged) x = <1.0, 2.0, 5.0>.
I came up with a function to do this:
template<typename T>
std::vector<T> slice(std::vector<T>& v, std::vector<int>& id) {
std::vector<T> tmp;
tmp.reserve(id.size());
for (auto& i : id) {
tmp.emplace_back(v[i]);
}
return tmp;
}
I was wondering if there was potentially a more efficient way to do such a task. Speed is the key here since this slice function will be in a for-loop which has approximately 300000 iterations. I heard the boost library might contain some valid solutions, but I have not had experience yet with it.
I used the chrono library to measure the time it takes to call this slice function, where the vector to be sliced was length 37520 and the vector containing the indexes was size 1550. For a single call of this function, the time elapsed = 0.0004284s. However, over ~300000 for-loop iterations, the total elapsed time was 134s.
Any advice would be much appreicated!
emplace_back
has some overhead as it involves some internal accounting inside std::vector
. Try this instead:
template<typename T>
std::vector<T> slice(const std::vector<T>& v, const std::vector<int>& id) {
std::vector<T> tmp;
tmp.resize (id.size ());
size_t n = 0;
for (auto i : id) {
tmp [n++] = v [i];
}
return tmp;
}
Also, I removed an unnecessary dereference in your inner loop.
Edit: I thought about this some more, and inspired by @jack's answer, I think that the inner loop (which is the one that counts) can be optimised further. The idea is to put everything used by the loop in local variables, which gives the compiler the best chance to optimise the code. So try this, see what timings you get. Make sure that you test a Release / optimised build:
template<typename T>
std::vector<T> slice(const std::vector<T>& v, const std::vector<int>& id) {
size_t id_size = id.size ();
std::vector<T> tmp (id_size);
T *tmp_data = tmp.data ();
const int *id_data = id.data ();
const T* v_data = v.data ();
for (size_t i = 0; i < id_size; ++i) {
tmp_data [i] = v_data [id_data [i]];
}
return tmp;
}