In this model we again consider *k* independent variables *x*_{1}, …, *x*_{k} and observed data for each of these variables. Our objective is to identify *m* factors *y*_{1}, …, *y*_{m}, preferably with *m* ≤ *k* as small as possible, which explain the observed data more succinctly.

**Definition 1**: Let *X* = [*x*_{i}] be a random *k* × 1 column vector where each *x*_{i} represents an observable trait, and let *μ* = [*μ*_{i}] be the *k* × 1 column vector of the population means. Thus *E*[*x*_{i}] = *μ*_{i}. Let *Y* = [*y*_{i}] be an *m* × 1 vector of unobserved **common factors** where *m* ≤ *k*. These factors play a role similar to the principal components in Principal Component Analysis.

We next suppose that each *x*_{i} can be represented as a linear combination of the factors as follows:

*x*_{i} = *β*_{i0} + *β*_{i1}*y*_{1} + ⋯ + *β*_{im}*y*_{m} + *ε*_{i}

where the *ε*_{i} are the components which are not explained by the linear relationship. We further assume that each *ε*_{i} has mean 0 and that the factors are independent with mean 0 and variance 1. We can consider the above equations to be a series of regression equations.
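This model is easy to simulate. The sketch below draws factors and specific factors with the stated moments and checks that the observed means recover the intercepts; the loadings, means, and noise scale are made-up illustrative values, not from the text.

```python
import numpy as np

# A minimal sketch of the factor model x_i = beta_i0 + sum_j beta_ij * y_j + eps_i.
# All numeric values here are hypothetical, chosen only for illustration.
rng = np.random.default_rng(0)

k, m, n = 4, 2, 100_000                       # observed variables, factors, sample size
mu = np.array([10.0, 20.0, 30.0, 40.0])       # intercepts beta_i0 (equal to the means mu_i)
beta = np.array([[0.9, 0.1],
                 [0.8, 0.3],
                 [0.2, 0.7],
                 [0.1, 0.9]])                 # k x m loading matrix

y = rng.standard_normal((n, m))               # factors: mean 0, variance 1, independent
eps = 0.3 * rng.standard_normal((n, k))       # specific factors: mean 0

x = mu + y @ beta.T + eps                     # n x k matrix of simulated observations

print(x.mean(axis=0))                         # close to mu, since E[y_j] = E[eps_i] = 0
```

Because the factors and specific factors all have mean 0, the sample means converge to *μ*, matching the observation below that the intercept *β*_{i0} equals *μ*_{i}.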

The coefficient *β*_{ij} is called the **loading** of the *i*th variable on the *j*th factor. The term *ε*_{i} is called the **specific factor** for the *i*th variable. Let *β* = [*β*_{ij}] be the *k* × *m* **matrix of loading factors** and let *ε* = [*ε*_{i}] be the *k* × 1 column **vector of specific factors**.

Define the **communality** of variable *x*_{i} to be *φ*_{i} = *β*_{i1}² + ⋯ + *β*_{im}², and let *ϕ*_{i} = var(*ε*_{i}) and *σ*_{i}² = var(*x*_{i}).

**Observation**: Since *μ*_{i} = *E*[*x*_{i}] = *E*[*β*_{i0} + *β*_{i1}*y*_{1} + ⋯ + *β*_{im}*y*_{m} + *ε*_{i}] = *β*_{i0} + *β*_{i1}*E*[*y*_{1}] + ⋯ + *β*_{im}*E*[*y*_{m}] + *E*[*ε*_{i}] = *β*_{i0} + 0 + 0 = *β*_{i0}, it follows that the intercept term *β*_{i0} = *μ*_{i}, and so the regression equations can be expressed as

*x*_{i} − *μ*_{i} = *β*_{i1}*y*_{1} + ⋯ + *β*_{im}*y*_{m} + *ε*_{i}

From the assumptions stated above it also follows that:

*E*[*x*_{i}] = *μ*_{i} for all *i*

*E*[*ε*_{i}] = 0 for all *i* (the specific factors are presumed to be random with mean 0)

cov(*y*_{i}, *y*_{j}) = 0 if *i* ≠ *j*

cov(*ε*_{i}, *ε*_{j}) = 0 if *i* ≠ *j*

cov(*y*_{i}, *ε*_{j}) = 0 for all *i*, *j*
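These zero-mean and zero-covariance assumptions can be checked empirically on simulated factors and specific factors; the dimensions and noise level below are illustrative choices, not values from the text.

```python
import numpy as np

# Empirical check of the model assumptions: independent unit-variance factors,
# mean-zero specific factors, and zero cross-covariances. Sizes are made up.
rng = np.random.default_rng(3)
n, m, k = 100_000, 2, 4

y = rng.standard_normal((n, m))          # factors: mean 0, variance 1, independent
eps = 0.5 * rng.standard_normal((n, k))  # specific factors: mean 0

cov_yy = np.cov(y, rowvar=False)         # should be close to the 2 x 2 identity
cross = y.T @ eps / (n - 1)              # estimates of cov(y_i, eps_j): near 0

print(np.round(cov_yy, 2))
print(np.abs(cross).max())               # small sampling noise only
```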

From Property A of Correlation Advanced and Property 3 of Basic Concepts of Correlation, we get the following:

cov(*x*_{i}, *y*_{j}) = *β*_{ij}

cov(*x*_{i}, *x*_{j}) = *β*_{i1}*β*_{j1} + ⋯ + *β*_{im}*β*_{jm} for *i* ≠ *j*

var(*x*_{i}) = *β*_{i1}² + ⋯ + *β*_{im}² + *ϕ*_{i} = *φ*_{i} + *ϕ*_{i}

From these equivalences it follows that the population covariance matrix *Σ* for *X* has the form

*Σ* = *ββ*^{T} + *Φ*

where *Φ* is the *k* × *k* diagonal matrix with *ϕ*_{i} in the *i*th position on the diagonal.
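As a numerical sanity check, the identity *Σ* = *ββ*^{T} + *Φ* and the variance split var(*x*_{i}) = *φ*_{i} + *ϕ*_{i} can be verified with a hypothetical loading matrix; none of the numbers below come from the text.

```python
import numpy as np

# Verify Sigma = beta beta^T + Phi and var(x_i) = communality + specific variance
# for a made-up 4 x 2 loading matrix.
beta = np.array([[0.9, 0.1],
                 [0.8, 0.3],
                 [0.2, 0.7],
                 [0.1, 0.9]])                  # k x m loadings (illustrative)
phi = np.array([0.10, 0.15, 0.20, 0.10])       # specific variances var(eps_i)

sigma = beta @ beta.T + np.diag(phi)           # population covariance matrix
communality = (beta ** 2).sum(axis=1)          # phi_i = beta_i1^2 + ... + beta_im^2

# The diagonal of Sigma is exactly communality + specific variance.
print(np.diag(sigma))                          # [0.92 0.88 0.73 0.92]
print(communality + phi)                       # the same values
```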

**Observation**: Let *λ*_{1} ≥ … ≥ *λ*_{k} be the eigenvalues of *Σ* with corresponding unit eigenvectors *γ*_{1}, …, *γ*_{k}, where each eigenvector *γ*_{j} = [*γ*_{ij}] is a *k* × 1 column vector with *i*th entry *γ*_{ij}. Now define the *k* × *k* matrix *β* = [*β*_{ij}] such that *β*_{ij} = √*λ*_{j} · *γ*_{ij} for all 1 ≤ *i*, *j* ≤ *k*. As observed in Linear Algebra Background, all the eigenvalues of *Σ* are non-negative, and so the *β*_{ij} are well defined (see Property 8 of Positive Definite Matrices). By Theorem 1 of Linear Algebra Background (Spectral Decomposition Theorem), it follows that

*Σ* = *ββ*^{T}

As usual, we will approximate the population covariance matrix *Σ* by the sample covariance matrix *S* (for a given random sample). Using the above logic, it follows that

*S* = *LL*^{T}

where *λ*_{1} ≥ … ≥ *λ*_{k} are the eigenvalues of *S* (a slight abuse of notation since these are not the same as the eigenvalues of *Σ*) with corresponding unit eigenvectors *C*_{1}, …, *C*_{k}, and *L* = [*b*_{ij}] is the *k* × *k* matrix such that *b*_{ij} = √*λ*_{j} · *c*_{ij}. As we saw previously

var(*x*_{i}) = *β*_{i1}² + ⋯ + *β*_{im}² + *ϕ*_{i}  and  cov(*x*_{i}, *x*_{j}) = *β*_{i1}*β*_{j1} + ⋯ + *β*_{im}*β*_{jm}

The sample versions of these are

*s*_{i}² = *b*_{i1}² + ⋯ + *b*_{ik}²  and  *s*_{ij} = *b*_{i1}*b*_{j1} + ⋯ + *b*_{ik}*b*_{jk}

We have also seen previously that

cov(*x*_{i}, *y*_{j}) = *β*_{ij}

The sample version is therefore

cov(*x*_{i}, *y*_{j}) = *b*_{ij}
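The sample decomposition *S* = *LL*^{T} can be sketched with NumPy. The data below are random and purely illustrative: we form a sample covariance matrix, build *L* from its eigenvalues and unit eigenvectors, and confirm that the full *k*-column *L* reconstructs *S* exactly.

```python
import numpy as np

# Build L = [b_ij] with b_ij = sqrt(lambda_j) * c_ij from the eigenvalues and
# unit eigenvectors of a sample covariance matrix S, and confirm S = L L^T.
rng = np.random.default_rng(2)
x = rng.standard_normal((500, 4)) @ rng.standard_normal((4, 4))  # arbitrary data

s = np.cov(x, rowvar=False)            # k x k sample covariance matrix
lam, c = np.linalg.eigh(s)             # eigh returns eigenvalues in ascending order
lam, c = lam[::-1], c[:, ::-1]         # reorder so lambda_1 >= ... >= lambda_k
lam = np.clip(lam, 0.0, None)          # guard against tiny negative round-off
L = c * np.sqrt(lam)                   # column j of L is sqrt(lambda_j) * C_j

print(np.allclose(L @ L.T, s))         # True: the full k-column L is exact
```

Keeping only the first *m* columns of *L* gives the *m*-factor approximation *S* ≈ *L*_{m}*L*_{m}^{T}, which is the sense in which the factors explain the data more succinctly.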

Hi Charles,

Thanks for the wonderful material. I've gone through the Factor Analysis concepts multiple times. However, I am not able to reconcile the following concepts. In PCA, we are trying to express y as a linear combination of x. Here x is known and the betas and y are unknown. By trying to maximize the variance of y subject to some constraints we are able to solve for the betas. Once we know beta we can calculate y. However, in Factor Analysis, we are doing the converse, i.e. trying to express x as a linear combination of y. However, here too the solution we get is the same as that in PCA. That is, the decomposition of the covariance matrix of X gives us the betas. How can the matrix beta be the same in both situations?

Would appreciate if you could provide some explanation.

Regards,

Piyush,

I am sorry, but I have not had the time to study your comment in any detail. Can you give me a specific example where the two beta matrices are the same?

Charles

Hi, I understand the analogy that b_ij, the loading, is a piece of information that resides in the reduced model (I've read only PCA by now), but I don't understand where cov(xi,yj) and cov(xi,xj) come from

(I thought i and j are different dimensions, namely k and m)

…and I do not understand what I have to substitute into them to verify the relationship, say on the population version.

Can you please provide an illustration? Otherwise the site is a great experience thus far and I'm sure it's going to be that way until I consume it (what I can).

Regarding the first type of covariance, i and j take any value from 1 to k. Thus there are k x k different versions of cov(xi,xj).

Regarding the second type of covariance, i takes any value from 1 to k and j takes any value from 1 to m. Thus there are k x m different versions of cov(xi,yj).

Charles