I need to evaluate my model's performance with limited training data. I am randomly selecting a fraction p of the original training data; assume p is 0.2 in this case. Here are the initial lines of code:
import random

p = p * 100
data_samples = int((data.shape[0] * p) / 100)  # data.shape = (100, 50, 50, 3); int() so range() accepts it
# for randomly selecting data
random.seed(1234)
filter_indices = [random.randrange(0, data.shape[0]) for _ in range(data_samples)]
That gives me filter_indices, a list of random indices between 0 and the total data size.
Now I want to pick the samples of 'data' at those indices while keeping all the remaining dimensions. How can I do that effectively and efficiently?
You can use NumPy's integer array indexing to use your generated list of indices directly as an index. When you index only the first axis, the trailing dimensions are automatically carried over into the result. Smaller example:
import numpy as np
# Your data goes here
data = np.arange(90).reshape(10, 3, 3)
N = data.shape[0]
p = 0.2
# Generating random indices
n_samples = int(N * p)
np.random.seed(0)
filter_indices = np.random.choice(N, size=n_samples)
# Indexing magic:
out = data[filter_indices]
Note that above I've used NumPy's own random module via np.random.choice to streamline your code a little.
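As an aside, np.random.choice samples with replacement by default (just like your randrange loop), so the same index can be drawn twice. If you need distinct samples, you can pass replace=False:

filter_indices = np.random.choice(N, size=n_samples, replace=False)  # no duplicate indices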
Results:
>>> filter_indices
array([5, 0])
>>> out
array([[[45, 46, 47],
        [48, 49, 50],
        [51, 52, 53]],

       [[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]]])
>>> out.shape
(2, 3, 3)
out is exactly the two shape-(3, 3) subarrays of data at indices 5 and 0, so the result has shape (2, 3, 3) instead of (10, 3, 3).
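Applied to your actual case, a minimal sketch looks like this (I'm using a zero-filled stand-in for your real (100, 50, 50, 3) array); it yields 20 samples, each of shape (50, 50, 3):

import numpy as np

# Stand-in for your real array; your data has shape (100, 50, 50, 3)
data = np.zeros((100, 50, 50, 3))
N = data.shape[0]
p = 0.2

np.random.seed(1234)
filter_indices = np.random.choice(N, size=int(N * p))

out = data[filter_indices]
print(out.shape)  # (20, 50, 50, 3)

The indexing itself is a single vectorized operation, so it stays efficient even for large arrays.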