Tags: python, rotation, scoring, factor-analysis

Factor order differs between loadings and scoring (with oblimin rotation), how is this possible?


I am running an exploratory factor analysis on a set of survey questions with the factor_analyzer package in Python. The result yields 8 factors, each with a clear set of variables showing the highest loadings.

In order to name the factors correctly and validate them, I wanted to analyse the correlation between the answers to questions with high loadings on a factor and the factor scores across all respondents.

However, when I analyse these results, the factors seem to switch. For example, the first factor, which has high loadings on the 'achievement' questions, appears in the scoring results as the second factor: it is that second factor that correlates highly with the 'achievement' questions across respondents. Moreover, the variables with high loadings on the first factor show the lowest correlation with that factor's scores. See the code below:

import pandas as pd
from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer(rotation = 'oblimin',
                    n_factors = 8)

fa.fit(test_data)
# loadings_ is an attribute, not a method
data_loadings = pd.DataFrame(fa.loadings_, index = test_data.columns)
data_transformed = pd.DataFrame(fa.transform(test_data), index = test_data.index)

Here is the visual outcome of the factor loadings, and here the visual outcome of the correlation matrix. You can see that the (sorted) variables with the highest loadings on factor [0] differ from the variables with the highest correlation with factor [0].
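For reference, the question-by-factor correlation matrix behind that second visual can be computed as below. This is a minimal sketch with synthetic stand-ins for `test_data` and `data_transformed` (the survey data is not available here); the shapes and names are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-ins: 100 respondents, 6 questions, 2 factor-score columns
test_data = pd.DataFrame(rng.normal(size=(100, 6)),
                         columns=[f"q{i}" for i in range(6)])
data_transformed = pd.DataFrame(rng.normal(size=(100, 2)))

# Correlation of every question with every factor score:
# rows = questions, columns = factors
corr = pd.DataFrame(
    {f: test_data.corrwith(data_transformed[f])
     for f in data_transformed.columns}
)
```

Sorting each column of `corr` by absolute value then shows which questions track which factor's scores.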

Does anyone know how this is possible? Does it have to do with the rotation, or with how the factor indices are assigned?


Solution

  • It seems like this is an issue with the FactorAnalyzer package.

    When an oblique rotation changes the variance order of the factors, FactorAnalyzer reorders them so that the first factor has the greatest variance. However, the structure matrix is assigned from the loadings before this reordering, so the two orders can differ.

    This can be very surprising for the user: if the factors have already been interpreted and named, the resulting factor scores may not match that interpretation at all, and may correlate with different items than the ones with the strongest loadings.

    I've submitted a pull request to change this in the package.
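Until such a fix is released, one workaround is to re-align the score columns with the loadings yourself: correlate each item with each returned score column, then match every loadings column to the score column whose item correlations agree with it best. The sketch below uses entirely synthetic data (identity loadings, a known hidden permutation) just to demonstrate the matching logic; nothing here is FactorAnalyzer's own API.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n, k = 200, 3

# Hypothetical stand-in: true factor scores, and a permuted copy that
# mimics fa.transform() returning columns in a different order than
# fa.loadings_
scores = pd.DataFrame(rng.normal(size=(n, k)))
perm = [2, 0, 1]                       # the hidden mismatch
data_transformed = scores.iloc[:, perm]
data_transformed.columns = range(k)

# Toy loadings whose column order matches `scores`: item i loads on factor i
loadings = pd.DataFrame(np.eye(k))

# Items built so that item j correlates strongly with scores column j
test_data = scores + 0.1 * pd.DataFrame(rng.normal(size=(n, k)))

# 1. correlate each item with each returned score column
item_score_corr = pd.DataFrame(
    {f: test_data.corrwith(data_transformed[f])
     for f in data_transformed.columns}
)

# 2. for each loadings column, pick the score column whose item
#    correlations agree with it most strongly (absolute values, so
#    sign flips of a factor do not break the matching)
match = (np.abs(loadings.T.values)
         @ np.abs(item_score_corr.values)).argmax(axis=1)

# 3. reorder the score columns to follow the loadings order
scores_reordered = data_transformed.iloc[:, match]
scores_reordered.columns = loadings.columns
```

With the toy permutation above, `match` recovers the column order so that `scores_reordered` lines up with the loadings again. On real data it is worth eyeballing the matched correlations before trusting the permutation.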