Observation: By the fundamental theorem of algebra any nth degree polynomial p(x) has exactly n roots, i.e. n values for which p(x) = 0. In fact, if r1, …, rn are the n roots, then the polynomial can be expressed as an · (x – ri).
For example, the 2nd degree polynomial x2 – 1 has two roots, namely 1 and -1, and in fact this polynomial is equivalent to (x – 1)(x + 1). The 2nd degree polynomial x2 + 2x + 1 also has two roots, namely -1 and -1, and so can be expressed as (x + 1)(x + 1) or (x + 1)2.
But what about the polynomial x2 + 1? It too has two roots, namely and –. Unfortunately, as we know the square root of a negative number is not one of the numbers we ordinarily deal with, namely the decimal numbers, also called the real numbers. In order to capture the roots of all polynomials, even those with only real coefficients, such as x2 + 1, we have to deal with what are called complex numbers. These are numbers of the form a + bi where a and b are real numbers and i is an abbreviation for . These include the real numbers (i.e. those where b = 0), but also include non-real numbers, called imaginary numbers (namely those where b ≠ 0).
This discussion is relevant to eigenvalues. By definition, λ is an eigenvalue of the n × n matrix A if det(A – λI) = 0. Since det(A – λI) = 0 can be expressed as an nth degree polynomial, as described in Eigenvalues and Eigenvectors, it has n roots, i.e. n values for which det(A – λI) = 0. This means that every n × n matrix has n eigenvalues, but unfortunately it is not necessarily the case that all of the eigenvalues are real numbers. Fortunately, in the cases we are most interested in, namely symmetric matrices (i.e. those where A = AT), it turns out that all the eigenvalues are real. We look at this case next, and then we won’t need to consider imaginary numbers any further.
Property 1: All the eigenvalues of a symmetric (real) matrix are real
We only consider matrices all of whose elements are real numbers.
Observation: Before we proceed with the proof of this property, we quickly state a few properties of complex numbers.
If z = a + bi is a complex number, then z′ = a – bi is called the conjugate of z and |z| = is called the absolute value of z. Clearly, when z is a real number (i.e. b = 0) its conjugate is simply z, but when z is an imaginary number, then its conjugate is distinct from z. Addition, subtraction and multiplication are performed as for real numbers (noting that i times i = -1). If we consider vectors which contain complex numbers as elements, then the conjugate X′ of vector X is a vector with the same shape as X consisting of the conjugates of the elements in X.
First note that if X = [xi] , then X′ ∙ X = |xi|2 ≥ 0, i.e. the dot product of a vector and its conjugate is a non-negative real number. If X ≠ 0, then clearly the dot product is a positive real number.
In general, if z is a root of the polynomial p(x), then the conjugate of z is also a root of p(x). Thus if λ is an eigenvalue of an n × n matrix A, then the conjugate λ′ is also an eigenvalue. If X is an eigenvector which corresponds to λ (i.e. AX = λX), then it is easy to show that X′ is an eigenvector which corresponds to λ′ (i.e. AX′ = λ′X′).
Proof (Property 1): Let λ be an eigenvalue of the n × n matrix A. By definition, this means that det(A – λI) = 0. Thus there are an infinite number of non-trivial solutions to the equation (A – λI)X = 0, which can be found using Gaussian Elimination (extended to complex numbers if necessary). Now let X be one such solution, i.e. X is an eigenvector corresponding to λ. Using the observations made above and the fact that since A is symmetric, we have
Thus, λ(X′ ∙ X) = λ′(X′ ∙ X), but since X′ ∙ X is a positive real number we conclude that λ = λ′, which can only happen if λ is a real number.
Property 2: If A is a symmetric matrix and X and Y are eigenvectors associated with distinct eigenvalues of A, then X and Y are orthogonal.
Proof: Let c be the eigenvalue associated with X and d be the eigenvalue associated with Y, with c ≠ d.
Using the above observation
But since c ≠ d, it follows that X ∙ Y = 0.
Definition 1: A square matrix A is orthogonally diagonalizable if there exist an orthogonal matrix P and a diagonal matrix D such that A = PDP-1. Note that since P is an orthogonal matrix, by Property 6 of Orthogonal Vectors and Matrices, P-1 = PT, and so equivalently A = PDPT.
If A = PDPT is an n × n matrix where is the diagonal matrix whose main diagonal consists of the n eigenvalues of A and P is the n × n matrix whose columns are the n unit eigenvectors corresponding to these eigenvalues, then we call PDPT a spectral decomposition of A.
Property 3: If A is orthogonally diagonalizable, then A is symmetric.
Proof: Suppose that A = PDPT. It follows that
since diagonal matrices are symmetric and so DT = D. This proves that AT = A, and so A is symmetric.
Observation: We next show the converse of Property 3. In fact we show that any symmetric matrix has a spectral decomposition. First we show that this is true for a special case.
Property 4: Let A be a n × n matrix for which all the eigenvalues are distinct and let C be the n × n matrix whose columns are the n unit eigenvectors C1, …, Cn corresponding to the n eigenvalues λ1, …, λn of A and let D be the n × n diagonal matrix whose main diagonal consists of λ1, …, λn. Then A has a spectral decomposition A = CDCT.
Proof: By Property 2, C1, …, Cn are mutually orthogonal. By Property 3 of Orthogonal Vectors and Matrices, is an orthogonal matrix and so CCT = I. Now the jth column AC is ACj = λj Cj. Thus AC = CD, and so A = AI = ACCT = CDCT.
Observation: Note we didn’t specify that the matrix in Property 4 needs be symmetric. By Property 3 and 4, we see that if all the eigenvalues of a square matrix are distinct then the matrix must be symmetric.
Observation: Unfortunately not all symmetric matrices have distinct eigenvalues, as can be seen from the diagonal matrix with 1, 1, 2 on the main diagonal. As we know from Property 1 of Determinants and Linear Equations, the eigenvalues of this matrix are the values on the main diagonal, namely 1, 1 and 2, which are clearly not distinct. Since all diagonal matrices are symmetric, this provides a simple counterexample.
It still turns out that any symmetric matrix has a spectral decomposition (with distinct orthonormal eigenvectors), as can be seen from the Spectral Decomposition Theorem (see Spectral Decomposition.