
Is it possible for different elements of a factor to share the same level?

I have searched for this many times, but the results were not what I was looking for.

A sample dataset is provided below:

year = c(1991,1996,2001,2006,2011,2016,2021)

factor(year, levels = c(1991,1996,2001,2011,2016,2021))

The result was:

[1] 1991 1996 2001 <NA> 2011 2016 2021
Levels: 1991 1996 2001 2011 2016 2021

I want 2006 to have the same level as 2001, so my desired outcome is:

[1] 1991 1996 2001 2006 2011 2016 2021
Levels: 1991 1996 2001 2011 2016 2021

Is it possible to make 2006 share the level of 2001 without changing the original contents of the vector year?


  • If you dig into the source code of factor, I think you will find the answer (which I believe is "No" to your question):

    > factor
    function (x = character(), levels, labels = levels, exclude = NA,
        ordered = is.ordered(x), nmax = NA)
    {
        if (is.null(x))
            x <- character()
        nx <- names(x)
        if (missing(levels)) {
            y <- unique(x, nmax = nmax)
            ind <- order(y)
            levels <- unique(as.character(y)[ind])
        }
        if (!is.character(x))
            x <- as.character(x)
        levels <- levels[is.na(match(levels, exclude))]
        f <- match(x, levels)
        if (!is.null(nx))
            names(f) <- nx
        if (missing(labels)) {
            levels(f) <- as.character(levels)
        }
        else {
            nlab <- length(labels)
            if (nlab == length(levels)) {
                nlevs <- unique(xlevs <- as.character(labels))
                at <- attributes(f)
                at$levels <- nlevs
                f <- match(xlevs, nlevs)[f]
                attributes(f) <- at
            }
            else if (nlab == 1L)
                levels(f) <- paste0(labels, seq_along(levels))
            else stop(gettextf("invalid 'labels'; length %d should be 1 or %d",
                nlab, length(levels)), domain = NA)
        }
        class(f) <- c(if (ordered) "ordered", "factor")
        f
    }
    <bytecode: 0x00000186f0fe3640>
    <environment: namespace:base>

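    As we can see, levels is either derived from unique(x, nmax = nmax) when the levels argument is not supplied, or taken from the supplied levels as levels[is.na(match(levels, exclude))]. Each element of x then gets its code from match(x, levels), so an element is always displayed as the level it matches, and a value that is absent from levels (here 2006) becomes NA. That means that, through the levels argument alone, two distinct x values cannot share a single level, and an element cannot print as 2006 while belonging to the level 2001.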
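
    The matching step described above can be reproduced by hand. The sketch below is only an illustration, not part of the original answer: the names lev, codes and merged are made up for it, and the second half assumes an R version whose factor() contains the unique(xlevs <- as.character(labels)) branch shown in the source above, so that duplicated labels are merged. It leaves year untouched and builds a separate factor in which 2006 shares the level 2001, although that element then prints as 2001 rather than 2006.

    ```r
    year <- c(1991, 1996, 2001, 2006, 2011, 2016, 2021)
    lev  <- c(1991, 1996, 2001, 2011, 2016, 2021)   # hypothetical name; 2006 is left out

    # factor() converts x to character and calls match(x, levels),
    # so a value that is absent from levels gets the code NA.
    codes <- match(as.character(year), as.character(lev))
    print(codes)

    # Possible workaround (assumption: duplicated labels are merged, as in the
    # labels branch of the source above): keep 2006 as a level, but label it 2001.
    merged <- factor(year,
                     levels = c(1991, 1996, 2001, 2006, 2011, 2016, 2021),
                     labels = c(1991, 1996, 2001, 2001, 2011, 2016, 2021))
    print(merged)
    print(as.integer(merged))   # elements 3 and 4 carry the same integer code
    print(year)                 # the original vector is unchanged
    ```

    Whether this is acceptable depends on what you need: the grouping is what you asked for, but the information that the fourth element was 2006 lives only in year, not in the factor.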