I understand that median of medians algorithm(I will denote as MoM) is a high constant factor O(N) algorithm. It finds the medians of k-groups(usually 5) and uses them as the next iteration's sets to find medians of. The pivot after finding this will be between 3/10n and 7/10n of the original set, where n is the number of iterations it took to find the one median base case.
I keep getting a segmentation fault when I run this code for MoM, but I'm not sure why. I've debugged it and believe that the issue lies with the fact that I'm calling medianOfMedian(medians, 0, medians.size()-1, medians.size()/2);
. However, I thought that this was logically sound since we were supposed to recursively find the median by calling itself. Perhaps my base case isn't correct? In a tutorial by YogiBearian on youtube(a stanford professor, link: https://www.youtube.com/watch?v=YU1HfMiJzwg ), he did not state any extra base case to take care of the O(N/5) operation of recursion in MoM.
Note: Per suggestions, I have added a base case and used .at() function by vectors.
static const int GROUP_SIZE = 5;
/* Helper function for m of m. This function divides the array into chunks of 5
* and finds the median of each group and puts it into a vector to return.
* The last group will be sorted and the median will be found despite its uneven size.
*/
vector<int> findMedians(vector<int>& vec, int start, int end){
vector<int> medians;
for(int i = start; i <= end; i+= GROUP_SIZE){
std::sort(vec.begin()+i, min(vec.begin()+i+GROUP_SIZE, vec.end()));
medians.push_back(vec.at(min(i + (GROUP_SIZE/2), (i + end)/2)));
}
return medians;
}
/* Job is to partition the array into chunks of 5(subject to change via const)
* And then find the median of them. Do this recursively using select as well.
*/
int medianOfMedian(vector<int>& vec, int start, int end, int k){
/* Acquire the medians of the 5-groups */
vector<int> medians = findMedians(vec, start, end);
/* Find the median of this */
int pivotVal;
if(medians.size() == 1)
pivotVal = medians.at(0);
else
pivotVal = medianOfMedian(medians, 0, medians.size()-1, medians.size()/2);
/* Stealing a page from select() ... */
int pivot = partitionHelper(vec, pivotVal, start, end);
cout << "After pivoting with the value " << pivot << " we get : " << endl;
for(int i = start; i < end; i++){
cout << vec.at(i) << ", ";
}
cout << "\n\n" << endl;
usleep(10000);
int length = pivot - start + 1;
if(k < length){
return medianOfMedian(vec, k, start, pivot-1);
}
else if(k == length){
return vec[k];
}
else{
return medianOfMedian(vec, k-length, pivot+1, end);
}
}
Here are some unit tests that I wrote for these 2 functions. Hopefully they help.
vector<int> initialize(int size, int mod){
int arr[size];
for(int i = 0; i < size; i++){
arr[i] = rand() % mod;
}
vector<int> vec(arr, arr+size);
return vec;
}
/* Unit test for findMedians */
void testFindMedians(){
const int SIZE = 36;
const int MOD = 20;
vector<int> vec = initialize(SIZE, MOD);
for(int i = 0; i < SIZE; i++){
cout << vec[i] << ", ";
}
cout << "\n\n" << endl;
vector<int> medians = findMedians(vec, 0, SIZE-1);
cout << "The 5-sorted version: " << endl;
for(int i = 0; i < SIZE; i++){
cout << vec[i] << ", ";
}
cout << "\n\n" << endl;
cout << "The medians extracted: " << endl;
for(int i = 0; i < medians.size(); i++){
cout << medians[i] << ", ";
}
cout << "\n\n" << endl;
}
/* Unit test for medianOfMedian */
void testMedianOfMedian(){
const int SIZE = 30;
const int MOD = 70;
vector<int> vec = initialize(SIZE, MOD);
cout << "Given array : " << endl;
for(int i = 0; i < SIZE; i++){
cout << vec[i] << ", ";
}
cout << "\n\n" << endl;
int median = medianOfMedian(vec, 0, vec.size()-1, vec.size()/2);
cout << "\n\nThe median is : " << median << endl;
cout << "As opposed to sorting and then showing the median... : " << endl;
std::sort(vec.begin(), vec.end());
cout << "sorted array : " << endl;
for(int i = 0; i < SIZE; i++){
if(i == SIZE/2)
cout << "**";
cout << vec[i] << ", ";
}
cout << "Median : " << vec[SIZE/2] << endl;
}
Given array :
7, 49, 23, 48, 20, 62, 44, 8, 43, 29, 20, 65, 42, 62, 7, 33, 37, 39, 60, 52, 53, 19, 29, 7, 50, 3, 69, 58, 56, 65,
After pivoting with the value 5 we get :
23, 29, 39, 42, 43,
After pivoting with the value 0 we get :
39,
Segmentation Fault: 11
It seems all right and dandy until the segmentation fault. I'm confident that my partition function works as well(was one of the implementations for the leetcode question).
Disclaimer: This is not a homework problem, but rather my own curiosity about the algorithm after I used quickSelect in a leetcode problem set.
Please let me know if my question proposed requires more elaboration for MVCE, thanks!
EDIT: I figured out that the recursion partition scheme is wrong in my code. As Pradhan has pointed out - I somehow have empty vectors which lead to the start and end being 0 and -1 respectively, causing me to have segmentation fault from an infinite loop of calling it. Still trying to figure this part out.
MoM
always calls itself (to compute pivot
), and thus exhibits infinite recursion. This violates the "prime directive" of recursive algorithms: at some point, the problem is "small" enough to not need a recursive call.