r function apply nmf

Customized Kim feature selection function


The function extractFeatures from the NMF package can select features using the following method, in which only the features that fulfill both of the following criteria are retained:

score greater than \hat{\mu} + 3 \hat{\sigma}, where \hat{\mu} and \hat{\sigma} are the median and the median absolute deviation (MAD) of the scores respectively;

the maximum contribution to a basis component is greater than the median of all contributions (i.e. of all elements of W).

How can I write a function in R that applies only the first criterion to a data matrix?

Kim H and Park H (2007). "Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis." Bioinformatics (Oxford, England), 23(12), pp. 1495-502. ISSN 1460-2059.


Solution

  • Given a vector scores, the condition for each score can be checked as follows:

    scores <- rnorm(5)
    scores > (median(scores) + 3 * mad(scores))
    # [1] FALSE FALSE FALSE FALSE FALSE
    

    There is no need to look for a separate MAD function, since mad from the stats package does exactly that. To select the corresponding columns from a matrix M, you can simply write

    M[, scores > (median(scores) + 3 * mad(scores))]
    

    And if you prefer a function for that, then you may use

    featureCriterion <- function(M, scores)
      M[, scores > (median(scores) + 3 * mad(scores))]
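
A slightly more defensive variant of the function above might look like the following. This is only a sketch: the name select_by_score and the n_mad parameter are hypothetical, and it assumes there is one score per column of M. Two details are worth knowing. First, stats::mad multiplies the raw median absolute deviation by 1.4826 by default (pass constant = 1 to get the unscaled value); whether that matches the convention intended in the original method is not checked here. Second, M[, keep] collapses to a plain vector when exactly one column passes, which drop = FALSE avoids.

```r
# Hypothetical helper: keep the columns of M whose score exceeds
# median(scores) + n_mad * mad(scores).
select_by_score <- function(M, scores, n_mad = 3) {
  # Assumption: one score per column of M
  stopifnot(is.matrix(M), length(scores) == ncol(M))
  threshold <- median(scores) + n_mad * mad(scores)
  keep <- scores > threshold
  # drop = FALSE keeps the result a matrix even if only one column passes
  M[, keep, drop = FALSE]
}

# Made-up example data: six columns, one with a clearly outlying score
set.seed(1)
M <- matrix(rnorm(30), nrow = 5, ncol = 6)
scores <- c(0.10, 0.12, 0.11, 0.09, 0.95, 0.10)
selected <- select_by_score(M, scores)
dim(selected)
```

With these made-up scores only the fifth column lies above the threshold, so the result is a 5 x 1 matrix rather than a vector.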