# Basic Concepts of Factor Analysis

In this model we again consider k independent variables x1, …, xk and observed data for each of these variables. Our objective is to identify m factors y1, …, ym, preferably with m ≤ k as small as possible, which explain the observed data more succinctly.

Definition 1: Let X = [xi] be a random k × 1 column vector where each xi represents an observable trait, and let μ = [μi] be the k × 1 column vector of the population means. Thus E[xi] = μi. Let Y = [yi] be an m × 1 vector of unobserved common factors where m ≤ k. These factors play a role similar to the principal components in Principal Component Analysis.

We next suppose that each xi can be represented as a linear combination of the factors as follows:

$$x_i = \beta_{i0} + \sum_{j=1}^m \beta_{ij} y_j + \varepsilon_i$$

where the εi are the components which are not explained by the linear relationship. We further assume that the mean of each εi is 0 and that the factors are independent with mean 0 and variance 1. We can consider the above equations to be a series of regression equations.

The coefficient βij is called the loading of the ith variable on the jth factor. The term εi is called the specific factor for the ith variable. Let β = [βij] be the k × m matrix of factor loadings and let ε = [εi] be the k × 1 column vector of specific factors.

Define the communality of variable xi to be $\varphi_i = \sum_{j=1}^m \beta_{ij}^2$, and let $\phi_i$ = var(εi) (the specific variance) and $\sigma_i^2$ = var(xi).
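As a small numerical illustration of these definitions, the sketch below computes communalities from a loading matrix using NumPy. The loading matrix `beta` and the specific variances `specific_var` are made-up values chosen only for illustration; they do not come from any data set discussed here.

```python
import numpy as np

# Hypothetical k x m loading matrix (k = 4 variables, m = 2 factors).
beta = np.array([
    [0.8, 0.1],
    [0.7, 0.3],
    [0.2, 0.9],
    [0.1, 0.6],
])

# Communality of each variable: the row-wise sum of squared loadings.
communality = (beta ** 2).sum(axis=1)

# Hypothetical specific variances var(eps_i), one per variable.
specific_var = np.array([0.35, 0.42, 0.15, 0.63])

# With uncorrelated, unit-variance factors and uncorrelated specific factors,
# var(x_i) is the communality plus the specific variance.
total_var = communality + specific_var

print(communality)
print(total_var)
```

In this made-up example each communality is the share of the variable's variance attributable to the common factors, since the total variances all come out equal to 1.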

Observation: Since μi = E[xi] = E[βi0 + $\sum_{j=1}^m \beta_{ij}$ yj + εi] = E[βi0] + $\sum_{j=1}^m \beta_{ij}$ E[yj] + E[εi] = βi0 + 0 + 0 = βi0, it follows that the intercept term βi0 = μi, and so the regression equations can be expressed as

$$x_i = \mu_i + \sum_{j=1}^m \beta_{ij} y_j + \varepsilon_i$$

or equivalently

$$X = \mu + \beta Y + \varepsilon$$

From the assumptions stated above it also follows that:

E[xi] = μi for all i
E[εi] = 0 for all i (the specific factors are presumed to be random with mean 0)

cov(yi, yj) = 0 if i ≠ j
cov(εi, εj) = 0 if i ≠ j
cov(yi, εj) = 0 for all i, j

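From Property A of Correlation Advanced and Property 3 of Basic Concepts of Correlation, we get the following:

$$\operatorname{var}(x_i) = \sum_{j=1}^m \beta_{ij}^2 + \operatorname{var}(\varepsilon_i)$$

$$\operatorname{cov}(x_i, y_j) = \beta_{ij}$$

$$\operatorname{cov}(x_i, x_j) = \sum_{h=1}^m \beta_{ih}\beta_{jh} \quad \text{for } i \ne j$$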

From these equivalences it follows that the population covariance matrix Σ for X has the form

$$\Sigma = \beta\beta^T + \phi$$

where $\phi$ is the k × k diagonal matrix with $\phi_i$ = var(εi) in the ith position on the diagonal.
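The structure Σ = ββᵀ + ϕ can be checked numerically by simulating from the model. The following is a minimal sketch, not a definitive implementation: the loadings, specific variances, and means are made-up values, and normal distributions are assumed for the factors and specific factors purely for convenience (the model itself only requires the stated means, variances, and zero covariances).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model parameters (k = 4 variables, m = 2 factors).
beta = np.array([
    [0.8, 0.1],
    [0.7, 0.3],
    [0.2, 0.9],
    [0.1, 0.6],
])
phi = np.diag([0.35, 0.42, 0.15, 0.63])   # specific variances on the diagonal
mu = np.array([1.0, -2.0, 0.5, 3.0])      # population means

# Covariance matrix implied by the model.
sigma = beta @ beta.T + phi

# Simulate X = mu + beta Y + eps, with one observation per row.
n = 200_000
Y = rng.standard_normal((n, 2))                           # factors: mean 0, variance 1
eps = rng.standard_normal((n, 4)) * np.sqrt(np.diag(phi)) # specific factors
X = mu + Y @ beta.T + eps

# Sample covariance of X, and sample cross-covariance cov(x_i, y_j).
S = np.cov(X, rowvar=False)
cross = (X - X.mean(axis=0)).T @ (Y - Y.mean(axis=0)) / (n - 1)

max_err_sigma = np.abs(S - sigma).max()
max_err_beta = np.abs(cross - beta).max()
print(max_err_sigma, max_err_beta)
```

With a large simulated sample, both discrepancies should be small, which reflects the facts that cov(xi, yj) = βij and that the covariance matrix of X is ββᵀ + ϕ.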

Observation: Let λ1 ≥ … ≥ λk be the eigenvalues of Σ with corresponding unit eigenvectors γ1, …, γk, where each eigenvector γj = [γij] is a k × 1 column vector. Now define the k × k matrix β = [βij] such that βij = $\sqrt{\lambda_j}$γij for all 1 ≤ i, j ≤ k. As observed in Linear Algebra Background, all the eigenvalues of Σ are non-negative, and so the βij are well defined (see Property 8 of Positive Definite Matrices). By Theorem 1 of Linear Algebra Background (Spectral Decomposition Theorem), it follows that

$$\Sigma = \beta\beta^T$$

As usual, we will approximate the population covariance matrix Σ by the sample covariance matrix S (for a given random sample). Using the above logic, it follows that

$$S = LL^T$$

where λ1 ≥ … ≥ λk are the eigenvalues of S (a slight abuse of notation since these are not the same as the eigenvalues of Σ) with corresponding unit eigenvectors C1, …, Ck, where Cj = [cij], and L = [bij] is the k × k matrix such that bij = $\sqrt{\lambda_j}$cij.
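The sample decomposition S = LLᵀ can be reproduced with NumPy's symmetric eigensolver. This is a sketch under stated assumptions: the data matrix is randomly generated (it is not a data set from this site), and `np.linalg.eigh` returns eigenvalues in ascending order, so they are re-sorted to match the convention λ1 ≥ … ≥ λk.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sample: 50 observations of k = 3 correlated variables.
mix = np.array([
    [1.0, 0.5, 0.2],
    [0.0, 1.0, 0.3],
    [0.0, 0.0, 1.0],
])
data = rng.normal(size=(50, 3)) @ mix

# Sample covariance matrix (variables are in columns).
S = np.cov(data, rowvar=False)

# Eigen-decomposition of the symmetric matrix S; sort eigenvalues descending.
lam, C = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]
lam, C = lam[order], C[:, order]

# b_ij = sqrt(lambda_j) * c_ij: scale column j of C by sqrt(lambda_j).
# Clipping guards against tiny negative eigenvalues from round-off.
L = C * np.sqrt(np.clip(lam, 0.0, None))

# Row sums of squared loadings give the sample variances (diagonal of S);
# column sums give the eigenvalues, since the eigenvectors have unit length.
row_ss = (L ** 2).sum(axis=1)
col_ss = (L ** 2).sum(axis=0)

print(np.allclose(L @ L.T, S))
```

When all k columns of L are retained, the decomposition is exact, so the sum of squared loadings in row i reproduces the sample variance of xi.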

As we saw previously

or equivalently

The sample versions of these are

We have also seen previously that

The sample version is therefore

and so
Similarly

### 4 Responses to Basic Concepts of Factor Analysis

1. Piyush says:

Hi Charles,

Thanks for the wonderful material. I’ve gone through the Factor Analysis concepts multiple times. However, I am not able to reconcile the following concepts. In PCA, we are trying to express y as a linear combination of x. Here x is known and the betas and y are unknown. By trying to maximize the variance of y subject to some constraints we are able to solve for the betas. Once we know beta we can calculate y. However, in Factor Analysis we are doing the converse, i.e. trying to express x as a linear combination of y. Yet here too the solution we get is the same as that in PCA. That is, the decomposition of the covariance matrix of X gives us the betas. How can the matrix beta be the same in both situations?

Would appreciate if you could provide some explanation.

Regards,

• Charles says:

Piyush,
I am sorry, but I have not had the time to study your comment in any detail. Can you give me a specific example where the two beta matrices are the same?
Charles

Hi, I understand the analogy that bij, the loading, is a piece of information that resides in the reduced model (I’ve read only PCA by now), but I don’t understand where cov(xi,yj) and cov(xi,xj) come from
(I thought i and j are different dimensions, namely k and m)
…and I do not understand what I have to substitute into them to verify the relationship, say, on the population version.

Can you please provide an illustration? Otherwise the site is a great experience thus far and I’m sure it’s going to be that way until I consume it (what I can).

• Charles says:

Regarding the first type of covariance, i and j take any value from 1 to k. Thus there are k × k different versions of cov(xi,xj).

Regarding the second type of covariance, i takes any value from 1 to k and j takes any value from 1 to m. Thus there are k × m different versions of cov(xi,yj).

Charles