**Exploratory factor analysis** is a statistical approach that can be used to analyze interrelationships among a large number of variables and to explain these variables in terms of a smaller number of common underlying dimensions. This involves finding a way of condensing the information contained in some of the original variables into a smaller set of implicit variables (called factors) with a minimum loss of information.

For example, suppose you would like to test the hypothesis that customer satisfaction is based on product knowledge, communication skills and people skills. You develop a new questionnaire about customer satisfaction with 30 questions: 10 concerning product knowledge, 10 concerning communication skills and 10 concerning people skills. Before using the questionnaire on your sample, you pretest it on a group of people similar to those who will be completing your survey.

You perform a factor analysis to see whether these three factors really exist. If they do, you will be able to create three separate scales by summing the items on each dimension.
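The summing step can be sketched as follows. This sketch makes two assumptions. First, the 30 answers are stored in questionnaire order: the 10 product-knowledge items, then the 10 communication-skills items, then the 10 people-skills items. Second, the factor analysis has already confirmed the three factors. The names `SCALES` and `scale_scores` are hypothetical and are not part of Real Statistics.

```python
# Hypothetical item order: items 0-9 = product knowledge,
# 10-19 = communication skills, 20-29 = people skills.
SCALES = {
    "product_knowledge": range(0, 10),
    "communication_skills": range(10, 20),
    "people_skills": range(20, 30),
}


def scale_scores(responses):
    """Sum one respondent's 30 item responses within each of the three scales."""
    if len(responses) != 30:
        raise ValueError("expected 30 item responses")
    return {name: sum(responses[i] for i in items) for name, items in SCALES.items()}


# A respondent who answered 4 on every product-knowledge item,
# 3 on every communication item and 5 on every people-skills item.
example = [4] * 10 + [3] * 10 + [5] * 10
print(scale_scores(example))  # {'product_knowledge': 40, 'communication_skills': 30, 'people_skills': 50}
```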

Factor analysis is based on a correlation table. If there are $k$ items in the study (e.g. $k$ questions in the above example), then the correlation table has $k \times k$ entries of the form $r_{ij}$, where each $r_{ij}$ is the correlation coefficient between item $i$ and item $j$. The main diagonal consists of entries with value 1.
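The correlation table can be built directly from raw responses. The sketch below uses NumPy's `np.corrcoef` on a small, made-up data set of 6 respondents by 4 items. It illustrates the definition only and is not the Real Statistics implementation.

```python
import numpy as np

# Hypothetical data: 6 respondents (rows) x 4 items (columns), e.g. 1-5 Likert answers.
data = np.array([
    [4, 5, 2, 3],
    [3, 4, 2, 2],
    [5, 5, 1, 4],
    [2, 3, 3, 2],
    [4, 4, 2, 3],
    [1, 2, 4, 1],
], dtype=float)

# rowvar=False: each column is a variable (item), each row an observation.
R = np.corrcoef(data, rowvar=False)
k = data.shape[1]

print(R.shape)                        # (4, 4): a k x k table of r_ij values
print(np.allclose(np.diag(R), 1.0))   # main diagonal is all 1s
print(np.allclose(R, R.T))            # r_ij == r_ji

# Entry r_ij is the Pearson correlation of column i with column j, e.g. r_01:
x, y = data[:, 0], data[:, 1]
r01 = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2)
)
print(np.isclose(r01, R[0, 1]))       # True
```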

Closely related to factor analysis is **principal component analysis**, which creates a picture of the relationships between the variables that is useful for identifying common factors.
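The following is a minimal sketch of principal component analysis on a correlation matrix, using NumPy's `np.linalg.eigh`. The 4 × 4 matrix `R` is hypothetical. It was chosen so that items 1 and 2 form one correlated pair and items 3 and 4 form another. `principal_components` is an illustrative helper name and is not a Real Statistics function.

```python
import numpy as np


def principal_components(R):
    """PCA of a correlation matrix R.

    Returns eigenvalues (largest first), component loadings (eigenvectors
    scaled by the square roots of their eigenvalues) and the proportion of
    total variance each component explains.
    """
    eigvals, eigvecs = np.linalg.eigh(R)    # eigh is for symmetric matrices; returns ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort so the largest eigenvalue comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Correlation matrices are positive semidefinite; clip tiny negative round-off.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    explained = eigvals / eigvals.sum()     # eigenvalues sum to k = trace(R)
    return eigvals, loadings, explained


# Hypothetical correlations: items 1 and 2 correlate 0.6, items 3 and 4 correlate 0.5.
R = np.array([
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.5],
    [0.1, 0.1, 0.5, 1.0],
])

eigvals, loadings, explained = principal_components(R)
print(np.round(eigvals, 3))
print(np.round(explained, 3))
```

When all components are kept, `loadings @ loadings.T` reproduces `R` exactly. Retaining only the first few columns gives the condensed, lower-dimensional summary described at the start of this page.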

Factor analysis is based on various concepts from Linear Algebra, in particular eigenvalues, eigenvectors, orthogonal matrices and the spectral theorem. We review these concepts first before explaining how principal component analysis and factor analysis work.
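For reference, the spectral theorem gives the following decomposition when applied to a correlation matrix $R$, which is symmetric. These are standard results, stated here without proof. $C$ holds the eigenvectors of $R$ as its columns, and $D$ holds the corresponding eigenvalues on its diagonal. Because $R$ is positive semidefinite, every $\lambda_i \ge 0$. Because its diagonal consists of 1s, the eigenvalues sum to $k$.

```latex
\begin{aligned}
R &= C D C^{T}, \qquad C^{T} C = C C^{T} = I, \qquad D = \operatorname{diag}(\lambda_1, \ldots, \lambda_k) \\
\lambda_1 + \cdots + \lambda_k &= \operatorname{tr}(R) = k \\
R &= L L^{T}, \qquad L = C D^{1/2}
\end{aligned}
```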

Topics:

- Linear Algebra Background
- Principal Component Analysis (PCA)
- Basic Concepts of Factor Analysis
- Factor Extraction
- Determining the Number of Factors to Retain
- Rotation
- Factor Scores
- Validity of Correlation Matrix and Sample Size
- Principal Axis Method of Factor Extraction
- Real Statistics Functions and Data Analysis Tools

To illustrate Factor Analysis we will use an **example**. Click here for a complete description of this example.

I want to know the reliability of the responses to 3 statements and determine which ones to reject, retain or revise. Can factor analysis be a tool for this? How?

You may want to consider Cronbach’s alpha.

Charles

I need help with a factor analysis of a small table of variances in Commodities/Prices and Volumes. Can anyone help me? I have never done one before in Excel.

Thx

Andrew

Dr Charles!

What method should I use when I have categorical variables?

How do I compute a polychoric correlation matrix with Likert-scale data?

Thank you.

Bentabet,

I have described how to deal with categorical variables in the context of regression, but I don’t know how much sense this will make in the context of factor analysis. More important, though, is how to deal with ordinal data (such as Likert scales). Here polychoric correlation may be used.

I plan to add a webpage to the Real Statistics website explaining how to calculate the polychoric correlation coefficient (and from this you can create matrices of these coefficients). This should be available in the next couple of days.

Charles

Excellent and very useful

Dear Dr, good morning. How can I do a discriminant analysis with Real Statistics? Or is that tool not in the package?

Many thanks

Gerardo,

Sorry, but this capability is not yet in the package.

Charles

OK Dr, thanks

I am very grateful. Please, what do you mean by underlying dimensions?

Each hidden factor is considered to be a dimension.

I have data from 44 people for a pilot study (23 questions). When I run a reliability test (Cronbach’s alpha), the value is 0.856, but when I do the validity test using the principal component method or any other method, it gives a value of 0.56.

Is this a bad score? And could you please advise whether there is a way to improve this score in the actual research, based on the pilot study?

Charan,

Generally, 0.56 is not considered to be a great score, while 0.856 is considered to be a very good score.

One of the reasons for doing factor analysis is to identify the underlying concepts being studied. If, for example, you identify 3 such underlying concepts (i.e. the factors), you would map the original 23 questions onto the 3 factors. You would then calculate three values for Cronbach’s alpha: one for the questions corresponding to factor 1 and separate values for the questions corresponding to factors 2 and 3. You would usually expect these three separate scores to be higher than the single score based on all the questions.

Charles
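The per-factor procedure described above can be sketched in Python/NumPy as follows. The sketch applies the standard Cronbach's alpha formula, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)$, separately to each group of items. Here $s_i^2$ are the item variances and $s_T^2$ is the variance of the summed score. The data, the item-to-factor mapping and the helper names are all hypothetical.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for ONE scale: rows = respondents, columns = items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # sample variance of the summed scale score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)


def alpha_per_factor(data, factor_items):
    """Compute alpha separately for the columns assigned to each factor."""
    data = np.asarray(data, dtype=float)
    return {name: cronbach_alpha(data[:, cols]) for name, cols in factor_items.items()}


# Hypothetical responses: 5 respondents x 6 items; items 0-2 belong to F1, items 3-5 to F2.
data = np.array([
    [4, 5, 4, 2, 1, 2],
    [3, 3, 4, 4, 4, 5],
    [5, 5, 5, 1, 2, 1],
    [2, 2, 3, 5, 4, 4],
    [4, 4, 3, 3, 3, 3],
])
factor_items = {"F1": [0, 1, 2], "F2": [3, 4, 5]}

print(alpha_per_factor(data, factor_items))
print(cronbach_alpha(data))  # single alpha over all 6 items, for comparison
```

In this made-up data the two groups of items move in opposite directions. As a result, the single alpha over all six items comes out far lower than the alpha for either factor.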

When discussing findings from a factor analysis in a report written in a narrative style, do I need to report any statistics other than the Cronbach’s alpha score?

E.g. I am stating that:

I have found that, of the 8 scale variables used to measure x, there are 3 themes a, b, c (Cronbach’s alpha = xxx).

Lola,

This really depends on the specific research and why you are using factor analysis. Once you have used factor analysis to identify the three themes, you can calculate Cronbach’s alpha for each of the three themes to determine the reliability of a questionnaire. You can also use the factor loadings to do all sorts of analyses (regression, ANOVA, etc.) if these are appropriate for your research.

Charles

Yes I will be following on with using these newly identified factors for further analysis.

But my guidelines state that I should back up my discussion with statistical evidence, and this is supposed to be written as a commercial piece of research whose readers are not expected to be statistics experts. So how do I demonstrate the accuracy of my factor analysis in this case?

Lola,

Demonstrating the accuracy of your factor analysis won’t necessarily be easy to do to an audience that has no knowledge of statistics. Various indicators are given on the website, but it may be a challenge explaining these to a non-technical audience, especially since even for experts this is not a clear-cut thing.

Charles

Cronbach’s alpha has some limitations, so it might be worth calculating Guttman’s Lambda if your Cronbach’s alpha values fall short of the .7 you’re looking for. Cronbach’s alpha tends to underestimate reliability. It would be worth reporting the findings of Guttman’s Lambda 2.

Thanks Christopher,

I will be adding Guttman’s Lambda in one of the next releases of the software.

Charles
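To compare the two coefficients, here is a minimal sketch that computes Guttman's lambda 2 alongside Cronbach's alpha, both from the item covariance matrix $S$. Here $\sigma_T^2$ is the sum of all entries of $S$, which is the variance of the total score. Then $\lambda_1 = 1 - \operatorname{tr}(S)/\sigma_T^2$, alpha equals $\frac{k}{k-1}\lambda_1$, and $\lambda_2 = \lambda_1 + \sqrt{\tfrac{k}{k-1}\sum_{i \ne j} s_{ij}^2}\,/\,\sigma_T^2$. Lambda 2 is never smaller than alpha, which is consistent with the remark above that alpha tends to underestimate reliability. The data are made up, and `alpha_and_lambda2` is a hypothetical name.

```python
import numpy as np


def alpha_and_lambda2(items):
    """Return (Cronbach's alpha, Guttman's lambda 2) for one scale.

    items: rows = respondents, columns = items.
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    S = np.cov(items, rowvar=False)                  # k x k sample covariance matrix
    total_var = S.sum()                              # variance of the summed score
    lambda1 = 1 - np.trace(S) / total_var
    alpha = k / (k - 1) * lambda1                    # Cronbach's alpha (Guttman's lambda 3)
    off_diag_sq = (S ** 2).sum() - (np.diag(S) ** 2).sum()  # sum of squared covariances, i != j
    lambda2 = lambda1 + np.sqrt(k / (k - 1) * off_diag_sq) / total_var
    return alpha, lambda2


# Hypothetical responses: 6 respondents x 4 items of one scale.
items = np.array([
    [3, 4, 3, 5],
    [2, 2, 3, 3],
    [4, 5, 4, 4],
    [1, 2, 2, 1],
    [5, 4, 5, 5],
    [3, 3, 2, 4],
])

alpha, lambda2 = alpha_and_lambda2(items)
print(f"alpha = {alpha:.3f}, lambda 2 = {lambda2:.3f}")
```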

Actually, the number of respondents is 700, and the number of attributes is 10.

Kumar,

Are you saying that there are 700 respondents and 10 different attributes?

Charles

I have asked respondents to rank attributes, say A, B, C, D.

Rank 1 is for the first preference, rank 2 for the second preference, and so on. So each respondent gives 4 ranks; for example, A, C, B, D ranked as 2, 3, 1, 4.

Now, using these data, can I run factor analysis to find the underlying dimensions of these attributes?

You can certainly run Factor Analysis in this case, as long as the assumptions are met.

Charles

Thank you so much. It’s very helpful

Charles,

Thanks for creating this very informative website. I have a question about factor analysis. I have developed an open-ended survey. Once I get the answers, I would group them according to common ideas. Would I then rank them in numerical order to run a factor analysis?

Sorry Philip, but you haven’t given me enough information to be able to answer your question.

Charles

The survey deals with the declining participation rate of African-Americans in baseball. It is very open-ended, with some surveys being actual interviews. Would these answers be grouped by common words/themes and then numbered to create a factor analysis similar to your example?

Philip,

If you can figure out how to give a value to the open-ended questions, then I would imagine that you could use Factor Analysis.

Charles

Hi Charles,

I am looking at the possibility of grouping vehicle makes (the variable) by their monthly depreciation %. This is based on my assumption that the depreciation % of certain makes might have the same structure, so they can be grouped together. Can factor analysis handle the time series at the same time?

In this case, what is the right data structure to use for the analysis?

For example:

Should I put the make in the columns, and the model and the month of depreciation valuation in the rows, with the depreciation percentage inside the table?

Sofian,

Unfortunately, based on the brief description that you have given, I don’t understand the problem that you are trying to solve or how factor analysis can be used. Can you provide some additional details?

Charles

Do you have any examples of forecasting/predicting future events using factor analysis?

Sorry, but I don’t have such an example.

Charles

Thank you so much for making this website, great information and great statistical tools!

Wow, this series was really helpful for me, too. I was away from my SPSS program and needed to run an analysis with Excel. Wasn’t sure how, until I came across this page. Thank you.

Could you please explain how I can calculate the reliability (Cronbach’s alpha) from the outcome of a Likert-scale questionnaire? The Likert scale ranges from 1 to 5, and there are four sample companies involved. I would highly appreciate your response.

Bob,

I provide such an example in Example 4 of http://www.real-statistics.com/reliability/cronbachs-alpha/.

Charles

Hi,

First, thanks for your amazing site. I’m so lucky to have found it. It has really helped me. I introduced it to my friends and they liked it too.

Second, the above link for “linear algebra background” doesn’t work. Please fix it.

http://www.real-statistics.com/multivariate-statistics/factor-analysis/linear-algebra%E2%80%A6actor-analysis/

Hamed,

Thanks for bringing this to my attention. I have now made the correction. I am very pleased that the site has been useful to you.

Charles