I'm currently filtering a Java List to keep approximately a percentage of its elements using a random approach:
import java.util.List;
import java.util.Random;
public class Main {
public static void main(String[] args) {
List<String> list = List.of("1", "2", "3", "4", "5", "6", "7", "8", "9", "10");
// Percentage of elements to retain (between 0 and 100)
int percentageToRetain = 30; // Change this value to adjust the percentage
Random random = new Random();
List<String> result = list.stream()
.filter(s -> random.nextInt(100) < percentageToRetain)
.toList();
System.out.println(result);
}
}
While this works for approximating the percentage, each run produces different results, and the actual percentage retained can vary significantly from my target, especially with smaller lists.
For example: When targeting 30% of a 10-element list, I might get 2 elements (20%) on one run and 4 elements (40%) on another.
My question: How can I modify my code to consistently retain exactly the specified percentage of elements from a List? Ideally, I'd like a solution that:
Keeps exactly percentageToRetain percent of the elements (rounding on the lower as needed)
Still provides randomness in which specific elements are selected
Works efficiently for both small and large lists
How to handle this kind of "exact percentage sampling" problem?
@Marce Puente correctly points outin the question comments that retaining an exact percentage of elements is impossible, at least for smaller list sizes.
the idea of “exact percentage” is already wrong, imagine such a list: [ 10, 5, 1 ] and you want to keep it to *50%, what would you put, an item and a half, no, you have to set a tolerance, which will be determined by the number of item's.
For larger input sizes the rounding error decreases.
Another problem, as the busybee pointed out, is that discarding elements with a chance is not guranteed to return a stable amount of elements.
To illustrate that, here are 10 sets of random numbers between 0 and 100 (I noted the number of elements that would be retained given percentageToRetain = 30
in brackets):
0 64 93 11 77 13 82 87 8 68 (4 retains)
77 60 39 97 64 46 33 42 87 79 (0 retains)
26 91 0 32 49 31 15 7 2 7 (5 retains)
95 90 38 10 70 37 32 17 82 98 (2 retains)
11 43 96 58 10 40 30 4 40 54 (3 retains)
2 66 86 87 17 88 92 64 10 79 (4 retains)
3 86 75 51 30 37 2 96 49 89 (2 retains)
9 10 23 84 52 28 32 14 12 33 (5 retains)
53 91 54 66 28 61 67 35 45 70 (1 retains)
3 77 79 24 37 60 6 80 8 39 (4 retains)
int numberOfElementsToRetain = (int) (source.size() * percentageToRetain / 100.0);
numberOfElementsToRetain
elements to a new list & return that.Thus we get:
public static <E> List<E> retainNumberOfElementsRandomly2(List<E> source, int percentageToRetain){
int numberOfElementsToRetain = (int) (source.size() * percentageToRetain / 100.0);
ArrayList<E> defcopy = new ArrayList<>(source);
Random r = new Random();
Collections.shuffle(defcopy, r);
List<E> result = new ArrayList<>(numberOfElementsToRetain);
for(int i = 0; i < numberOfElementsToRetain; i++){
result.add(defcopy.get(i));
}
return result;
}