Factor Analysis

Exploratory factor analysis is a statistical approach that can be used to analyze interrelationships among a large number of variables and to explain these variables in terms of a smaller number of common underlying dimensions. This involves finding a way of condensing the information contained in some of the original variables into a smaller set of implicit variables (called factors) with a minimum loss of information.

For example, suppose you would like to test the observation that customer satisfaction is based on product knowledge, communication skills and people skills. You develop a new questionnaire about customer satisfaction with 30 questions: 10 concerning product knowledge, 10 concerning communication skills and 10 concerning people skills. Before using the questionnaire on your sample, you pretest it on a group of people similar to those who will be completing your survey.

You perform a factor analysis to see whether these three factors really exist. If they do, you will be able to create three separate scales by summing the items on each dimension.

Factor analysis is based on a correlation table. If there are k items in the study (e.g. k questions in the above example) then the correlation table has k × k entries of the form r_ij, where each r_ij is the correlation coefficient between item i and item j. The main diagonal consists of entries with value 1.
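
As a rough illustration of such a table, the following Python sketch builds the k × k matrix of Pearson correlation coefficients from a small set of responses. The function name `correlation_matrix` and the response data are made up for this illustration and are not part of the Real Statistics software. The sketch assumes every item has some variation in its scores, since a constant item would lead to a division by zero.

```python
from math import sqrt

def correlation_matrix(data):
    """Return the k x k Pearson correlation matrix for a list of respondent rows.

    `data` is a list of n rows, each holding k item scores (hypothetical layout).
    """
    n = len(data)
    k = len(data[0])
    # Mean score of each item (column)
    means = [sum(row[j] for row in data) / n for j in range(k)]
    # Deviations from the mean, stored item by item
    devs = [[row[j] - means[j] for row in data] for j in range(k)]
    # Square root of each item's sum of squared deviations
    norms = [sqrt(sum(d * d for d in col)) for col in devs]
    return [
        [sum(a * b for a, b in zip(devs[i], devs[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]

# Made-up responses: 5 respondents (rows), 3 items (columns) on a 1-5 scale
responses = [
    [4, 5, 2],
    [3, 4, 1],
    [5, 5, 3],
    [2, 3, 4],
    [1, 2, 5],
]

R = correlation_matrix(responses)
for row in R:
    print([round(r, 3) for r in row])
```

The resulting matrix is symmetric (r_ij = r_ji) and has 1 in every position on the main diagonal, since each item is perfectly correlated with itself.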

Closely related to factor analysis is principal component analysis, which creates a picture of the relationships between the variables useful in identifying common factors.

Factor analysis is based on various concepts from linear algebra, in particular eigenvalues, eigenvectors, orthogonal matrices and the spectral theorem. We review these concepts before explaining how principal component analysis and factor analysis work.
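
To make the role of eigenvalues and eigenvectors a little more concrete, here is a minimal Python sketch that approximates the largest eigenvalue of a small correlation matrix, together with its eigenvector. It uses power iteration only because that method is short enough to show in full; it is not claimed to be the method used elsewhere on this site. The matrix `R` and the function names `mat_vec` and `dominant_eigenpair` are hypothetical. The sketch also assumes the starting vector is not orthogonal to the eigenvector being sought.

```python
from math import sqrt

def mat_vec(A, v):
    """Multiply the square matrix A (list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dominant_eigenpair(A, iterations=200):
    """Approximate the largest eigenvalue of a symmetric matrix and a unit
    eigenvector for it, using power iteration."""
    v = [1.0] + [0.0] * (len(A) - 1)  # arbitrary starting vector (assumption)
    for _ in range(iterations):
        w = mat_vec(A, v)
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # rescale to unit length at every step
    # For a unit vector v, the Rayleigh quotient v^T A v estimates the eigenvalue
    eigenvalue = sum(x * y for x, y in zip(v, mat_vec(A, v)))
    return eigenvalue, v

# Made-up correlation matrix: three items, each pair correlated at 0.5
R = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]

eigenvalue, eigenvector = dominant_eigenpair(R)
print(round(eigenvalue, 6))
print([round(x, 6) for x in eigenvector])
print(round(eigenvalue / len(R), 6))  # share of the total variance
```

Since the eigenvalues of a k × k correlation matrix sum to k (the trace of the matrix), dividing an eigenvalue by k gives the proportion of the total variance associated with the corresponding eigenvector.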


To illustrate Factor Analysis we will use an example. Click here for a complete description of this example.

38 Responses to Factor Analysis

  1. CampSawi says:

    I want to know the reliability of the responses to 3 statements, and to determine which ones to reject, retain or revise. Can factor analysis be a tool for this? How?

  2. Andrew says:

    I need help with a Factor Analysis of a small table of variances in Commodities/Prices and Volumes. Can anyone help me? I have never done one before in Excel.

  3. Bentabet says:

    Dr Charles!
    What method should be used when you have categorical variables?
    How do you compute a polychoric correlation matrix with a Likert scale?
    Thank you.

    • Charles says:

      I have described how to deal with categorical variables in the context of regression, but I don’t know how much sense this makes in the context of factor analysis. More important, though, is how to deal with ordinal data (such as Likert scales). Here polychoric correlation may be used.
      I plan to add a webpage to the Real Statistics website explaining how to calculate the polychoric correlation coefficient (and from this you can create matrices of these coefficients). This should be available in the next couple of days.

  4. anna gaber says:

    Excellent and very useful

  5. Dear Dr, good morning. How can I perform a discriminant analysis with Real Statistics? Or does the package not include that tool?

    Many thanks.

  6. ighofose akpomejevwe says:

    I am very grateful. Please, what do you mean by underlying dimensions?

  7. Charan says:

    I have data from 44 people for a pilot study (23 questions). When I run a reliability test (Cronbach’s alpha) the value is 0.856, but when I do the validity test using the principal component method or any other method it gives a value of 0.56.

    Is it a bad score? And could you please advise whether there is a way to improve this score in the actual research, based on the pilot study?

    • Charles says:


      Generally, 0.56 is not considered to be a great score, while 0.856 is considered to be a very good score.

      One of the reasons that you do factor analysis is to identify the underlying concepts being studied. If, for example, you identify 3 such underlying concepts (i.e. the factors), you would map the original 23 questions into the 3 factors. You would then calculate three values for Cronbach’s alpha, one for the questions corresponding to factor 1 and separate scores for the questions corresponding to factors 2 and 3. You would usually expect the three separate scores to be higher than the one score based on all the questions.


  8. Lola says:

    When discussing findings from a factor analysis in a narrative-style report, do I need to report any statistics other than the Cronbach’s alpha score?
    E.g. I am stating that:
    I have found that out of the 8 scale variables used to measure x there are 3 themes a, b, c (Cronbach’s alpha = xxx)

    • Charles says:

      This really depends on the specific research and why you are using Factor Analysis. Once you have used factor analysis to identify the three themes, you can calculate Cronbach’s alpha for each of the three themes to determine the reliability of a questionnaire. You can also use the factor loadings to do all sorts of analyses (regression, ANOVA, etc.) if these are appropriate for your research.

      • Lola says:

        Yes, I will be following on by using these newly identified factors for further analysis.

        But my guidelines state that I should back up my discussion with statistical evidence, yet this is supposed to be written as a commercial piece of research where the reader is not expected to be a stats expert. So how do I demonstrate the accuracy of my factor analysis in this case?

        • Charles says:

          Demonstrating the accuracy of your factor analysis won’t necessarily be easy to do to an audience that has no knowledge of statistics. Various indicators are given on the website, but it may be a challenge explaining these to a non-technical audience, especially since even for experts this is not a clear-cut thing.

      • Cronbach’s alpha has some limitations, so it might be worth computing Guttman’s Lambda if you find that your Cronbach’s alphas are short of the .7 you’re looking for. Cronbach’s alpha tends to underestimate reliability. It would be worth reporting the findings of Guttman’s Lambda (2).

  9. kumar says:

    Actually, the number of respondents is 700, and the number of attributes is 10.

  10. kumar says:

    I have asked respondents to rank attributes, say A, B, C, D.
    Rank 1 is for first preference, rank 2 for second preference, and so on. So each respondent gives 4 ranks, e.g. A, C, B, D as 2, 3, 1, 4.
    Now, using these data, can I run a factor analysis to find the underlying dimensions of these attributes?

  11. Nha says:

    Thank you so much. It’s very helpful

  12. Philip says:


    Thanks for creating this very informative website. I have a question about factor analysis. I have developed an open-ended survey. Once I get the answers, I would group them according to common ideas. Would I then rank them in a number order to run a factor analysis?

    • Charles says:

      Sorry Philip, but you haven’t given me enough information to be able to answer your question.

      • Philip says:

        The survey deals with the declining participation rate of African-Americans in baseball. It is very open-ended, with some surveys being actual interviews. Would these answers be grouped by common words/themes and then numbered to create a factor analysis similar to your example?

        • Charles says:

          If you can figure out how to give a value to the open-ended questions, then I would imagine that you could use Factor Analysis.

  13. Sofian says:

    Hi Charles,

    I am looking at the possibility of grouping vehicle makes (the variable) by their depreciation % on a monthly basis. This is based on my assumption that the depreciation % of certain makes might have the same structure, so that they can be grouped together. Can factor analysis look at the time series at the same time?
    In this case, what is the right data structure that I should employ in order to do the analysis?
    For example:
    Should I put the make in the columns, and the model and the month of depreciation valuation in the rows, with the depreciation percentage inside the table?

    • Charles says:

      Unfortunately, based on the brief description that you have given, I don’t understand the problem that you are trying to solve or how factor analysis can be used. Can you provide some additional details?

  14. AP says:

    Do you have any examples of forecasting/predicting future events using factor analysis?

  15. CJW says:

    Thank you so much for making this website, great information and great statistical tools!

  16. Brian says:

    Wow, this series was really helpful for me, too. I was away from my SPSS program and needed to run an analysis with Excel. Wasn’t sure how, until I came across this page. Thank you.

  17. Bob says:

    Could you please explain how I can calculate the reliability alpha from the outcome of a Likert-scale questionnaire? The Likert scale ranges from 1 to 5 and there are four sample companies involved. I would highly appreciate your response.

  18. hamed says:

    First, thanks for your amazing site. I’m so lucky to have found it. It really helped me. I introduced it to my friends and they liked it too.
    Second, the above link for “linear algebra background” doesn’t work. Please make it work.

    • Charles says:

      Thanks for bringing this to my attention. I have now made the correction. I am very pleased that the site has been useful to you.
