Mathematics Prelims

June 26, 2009

The Characteristic Polynomial

Filed under: Algebra,Linear Algebra — cjohnson @ 7:18 pm

Last time we defined the eigenvalues and eigenvectors of a matrix, but didn’t really discuss how to actually calculate the eigenvalues or eigenvectors; we said that if it so happened that your matrix was similar to a diagonal matrix, the diagonal entries of that diagonal matrix were the eigenvalues, and the columns of the change-of-basis matrix were the eigenvectors.  Now we’re going to discuss how to find the eigenvalues using the matrix’s characteristic polynomial.

Notice that if \lambda is an eigenvalue of A with associated eigenvector v we have the following.

Av = \lambda v

\implies Av - \lambda v = 0

\implies (A - \lambda I) v = 0

Of course, v = 0 satisfies this equation, but that’s a trivial solution.  For a non-trivial solution to exist, the matrix A - \lambda I must be singular, and so \text{det}(A - \lambda I) = 0.  Thus if \lambda is an eigenvalue of A, we must have \text{det}(A - \lambda I) = 0.

Now suppose that \omega is such that \text{det}(A - \omega I) = 0.  Then there is a non-trivial solution to (A - \omega I) u = 0, so Au = \omega u, and \omega is an eigenvalue.  We’ve shown that \lambda is an eigenvalue of A if and only if \text{det}(A - \lambda I) = 0.  Furthermore, \text{det}(A - \lambda I) is a polynomial in \lambda (this is obvious if A is 1 \times 1, and inductively we can show that this is true for n \times n matrices).  This means that with the characteristic polynomial, the problem of finding eigenvalues is reduced to finding the roots of a polynomial.

As an example, suppose

\displaystyle A = \left[ \begin{array}{ccc} 1 & -1 & 2 \\ 2 & 2 & -3 \\ 3 & 5 & 7 \end{array} \right]

Then the characteristic polynomial is

\displaystyle \text{det} \left( \left[ \begin{array}{ccc} 1 - \lambda & -1 & 2 \\ 2 & 2 - \lambda & -3 \\ 3 & 5 & 7 - \lambda \end{array} \right] \right) = -\lambda^3 + 10\lambda^2 - 34 \lambda + 60

\displaystyle = -(\lambda-6) \, (\lambda^2 - 4 \lambda + 10)

\displaystyle = -(\lambda - 6) \, (\lambda - (2 - i \sqrt{6})) \, (\lambda - (2 + i \sqrt{6}))

So we see that the eigenvalues are 6, 2 - i \sqrt{6}, and 2 + i \sqrt{6} (notice the last two are complex conjugates of one another).
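As a quick numerical sanity check (a sketch in Python with numpy, which isn’t part of the original discussion), we can confirm that the roots of this characteristic polynomial agree with the eigenvalues numpy computes directly:

```python
import numpy as np

# The example matrix from above.
A = np.array([[1., -1., 2.],
              [2., 2., -3.],
              [3., 5., 7.]])

# Coefficients of det(A - lambda*I) = -lambda^3 + 10 lambda^2 - 34 lambda + 60,
# highest degree first.
char_poly = [-1., 10., -34., 60.]

roots = np.roots(char_poly)    # roots of the characteristic polynomial
eigs = np.linalg.eigvals(A)    # eigenvalues computed directly

# Sort both lists (by real part, then imaginary part) so we can compare them.
key = lambda z: (z.real, z.imag)
roots_sorted = sorted(roots, key=key)
eigs_sorted = sorted(eigs, key=key)
```

Both lists come out as 2 - i\sqrt{6}, 2 + i\sqrt{6}, 6, up to floating point error.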

Now, once we’ve found the eigenvalues, the next step is to find the eigenvectors.  Since

Av = \lambda v

\implies (A - \lambda I) v = 0

what we want is to find the nullspace of A - \lambda I, since these are all the vectors that A - \lambda I will take to zero.  In our particular example, for \lambda = 6,

(A - \lambda I) v = 0

\implies \left( \left[ \begin{array}{ccc} 1 & -1 & 2 \\ 2 & 2 & -3 \\ 3 & 5 & 7 \end{array} \right] - \left[ \begin{array}{ccc} 6 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 6 \end{array} \right] \right) \left[ \begin{array}{c} v_1 \\ v_2 \\ v_3 \end{array} \right] = 0

\implies \left[ \begin{array}{ccc} -5 & -1 & 2 \\ 2 & -4 & -3 \\ 3 & 5 & 1 \end{array} \right] \left[ \begin{array}{c} v_1 \\ v_2 \\ v_3 \end{array} \right] = 0

Now we take the row-reduced echelon form of this matrix, since it shares the same null space:

\implies \left[ \begin{array}{ccc} 1 & 0 & -\frac{1}{2} \\ 0 & 1 & \frac{1}{2} \\ 0 & 0 & 0 \end{array} \right] \left[ \begin{array}{c} v_1 \\ v_2 \\ v_3 \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right]

This tells us that

\text{NS}(A - 6I) = \left\{ \left[ \begin{array}{c} \frac{1}{2} v_3 \\ -\frac{1}{2} v_3 \\ v_3 \end{array} \right] : v_3 \in \mathbb{C} \right\}

So the eigenvectors associated with the eigenvalue \lambda = 6 are the non-zero multiples of \left[ 0.5, \, -0.5, \, 1 \right]^T.  We’d repeat the above process with \lambda = 2 \pm i \sqrt{6} to find the other eigenvectors.
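To double-check the null space computation, here’s a short numpy sketch (again, not part of the original post) verifying that the vector we found really is an eigenvector for \lambda = 6:

```python
import numpy as np

A = np.array([[1., -1., 2.],
              [2., 2., -3.],
              [3., 5., 7.]])

# The basis vector for NS(A - 6I) found above.
v = np.array([0.5, -0.5, 1.0])

# Av should equal 6v, i.e. (A - 6I)v = 0.
print(A @ v)   # [ 3. -3.  6.], which is 6 * v
```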

June 25, 2009

Eigenvalues and Eigenvectors

Filed under: Algebra,Linear Algebra — cjohnson @ 2:39 pm

Let’s suppose that A is an n \times n matrix which is similar to a diagonal matrix, \text{diag}(\lambda_1, \lambda_2, ..., \lambda_n).  This means there is an invertible (change-of-basis) matrix P such that

\displaystyle A = P \text{diag}(\lambda_1, ..., \lambda_n) P^{-1}

Now since P is a change of basis matrix, each of its columns gives the coordinates to a basis vector of some basis.  Let’s call that basis \beta and let \beta_1 through \beta_n be the elements of that basis.  Now, if we take the above equation and multiply by P on the right, notice that

AP = P \text{diag}(\lambda_1, ..., \lambda_n)

\implies (AP)_{*i} = (P \text{diag}(\lambda_1, ..., \lambda_n))_{*i} = \lambda_i P_{*i}

That is, the i-th column of AP is equal to the i-th column of P \text{diag}(\lambda_1, ..., \lambda_n), which is just \lambda_i times the i-th column of P.  Since the i-th column of AP is just A applied to the i-th column of P, though, we have

AP_{*i} = \lambda_i P_{*i}

This means that when we plug in the i-th column of P to the linear transformation represented by A, we get back a multiple of that column.  Calling the linear transformation \tau, we have that

\tau(\beta_i) = \lambda_i \beta_i.

Non-zero vectors such as \beta_i whose image under \tau is a scalar multiple of the vector itself are called eigenvectors of \tau.  That multiple, the \lambda_i above, is called an eigenvalue of \tau.  These eigenvectors and eigenvalues are associated with a particular linear transformation, so when we talk about the eigenvectors and eigenvalues of a matrix, we really mean the eigenvectors and eigenvalues of the transformation represented by that matrix.  Notice that this means that eigenvalues are independent of the chosen basis; since similar matrices represent the same transformation just with respect to different bases, similar matrices have the same eigenvalues.

We assumed that A was similar to a diagonal matrix above, but this isn’t always true.  If A is similar to a diagonal matrix, say A = P D P^{-1}, then as we’ve just shown, the columns of P are eigenvectors of A.  Since these form the columns of a non-singular matrix, the eigenvectors of A form a basis for the vector space.  Conversely, if the eigenvectors of A form a basis, let’s take those basis vectors as the columns of P.

\displaystyle P^{-1} A P

\displaystyle = \left[ \begin{array}{c|c|c} \beta_1 & ... & \beta_n \end{array} \right]^{-1} A \left[ \begin{array}{c|c|c} \beta_1 & ... & \beta_n \end{array} \right]

\displaystyle = \left[ \begin{array}{c|c|c} \beta_1 & ... & \beta_n \end{array} \right]^{-1} \left[ \begin{array}{c|c|c} A \beta_1 & ... & A \beta_n \end{array} \right]

\displaystyle = \left[ \begin{array}{c|c|c} \beta_1 & \cdots & \beta_n \end{array} \right]^{-1} \left[ \begin{array}{c|c|c} \lambda_1 \beta_1 & \cdots & \lambda_n \beta_n \end{array} \right]

\displaystyle = \left[ \begin{array}{c|c|c} \beta_1 & \cdots & \beta_n \end{array} \right]^{-1} \left( \left[ \begin{array}{c|c|c} \beta_1 & \cdots & \beta_n \end{array} \right] \left[ \begin{array}{cccc} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \lambda_n \end{array} \right] \right)

\displaystyle = \text{diag}(\lambda_1, ..., \lambda_n)

So a matrix is diagonalizable (similar to a diagonal matrix) if and only if its eigenvectors form a basis for the vector space.
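We can illustrate this numerically.  In the sketch below (numpy, with an example matrix of my own rather than one from the post), we take a symmetric matrix, which is always diagonalizable, and check that P^{-1} A P really is \text{diag}(\lambda_1, ..., \lambda_n) when the columns of P are eigenvectors:

```python
import numpy as np

# A symmetric matrix, which is guaranteed to be diagonalizable.
A = np.array([[2., 1.],
              [1., 2.]])

# np.linalg.eig returns the eigenvalues and a matrix P whose
# columns are the corresponding eigenvectors.
lams, P = np.linalg.eig(A)

# The eigenvectors form a basis, so P is invertible and
# P^{-1} A P is the diagonal matrix of eigenvalues.
D = np.linalg.inv(P) @ A @ P
```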

Definition of Similarity Using Linear Transformations

Filed under: Algebra,Linear Algebra — cjohnson @ 12:24 pm

Let’s suppose we have a linear transformation \tau on \mathbb{R}^3 which performs the following:

\displaystyle \left[ \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \right] \mapsto \left[ \begin{array}{c} 1 \\ 2 \\ 3 \end{array} \right]

\displaystyle \left[ \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \right] \mapsto \left[ \begin{array}{c} 4 \\ 5 \\ 6 \end{array} \right]

\displaystyle \left[ \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right] \mapsto \left[ \begin{array}{c} 5 \\ 7 \\ 0 \end{array} \right]

Now, the matrix representation of this transformation with respect to the standard basis is clearly

\displaystyle \left[ \begin{array}{ccc} 1 & 4 & 5 \\ 2 & 5 & 7 \\ 3 & 6 & 0 \end{array} \right]

But suppose we were to use a different basis for \mathbb{R}^3, like say \zeta = \{ (2, 1, 0)^T, \, (1, 0, 1)^T, \, (3, 0, -1)^T \}.  We see that our transformation maps these basis vectors as follows:

\displaystyle \left[ \begin{array}{c} 2 \\ 1 \\ 0 \end{array} \right] \mapsto \left[ \begin{array}{c} 6 \\ 9 \\ 12 \end{array} \right]

\displaystyle \left[ \begin{array}{c} 1 \\ 0 \\ 1 \end{array} \right] \mapsto \left[ \begin{array}{c} 6 \\ 9 \\ 3 \end{array} \right]

\displaystyle \left[ \begin{array}{c} 3 \\ 0 \\ -1 \end{array} \right] \mapsto \left[ \begin{array}{c} -2 \\ -1 \\ 9 \end{array} \right]

Notice that the vectors on both the left and the right above are given in coordinates with respect to the standard basis.  We’d like to see what the matrix representing \tau looks like with respect to the \zeta basis, so let’s convert the vectors on the right to \zeta-coordinates.  Recalling that a change of basis is simply a system of equations where the columns of the coefficient matrix are the coordinates of the basis vectors (and we use the inverse of this matrix to go the other way), we have that

\displaystyle \left[ \begin{array}{c} 6 \\ 9 \\ 12 \end{array} \right] = \left[ \begin{array}{c} 9 \\ 6 \\ -6 \end{array} \right]_{\zeta}

\displaystyle \left[ \begin{array}{c} 6 \\ 9 \\ 3 \end{array} \right] = \left[ \begin{array}{c} 9 \\ -\frac{3}{4} \\ -\frac{15}{4} \end{array} \right]_{\zeta}

\displaystyle \left[ \begin{array}{c} -2 \\ -1 \\ 9 \end{array} \right] = \left[ \begin{array}{c} -1 \\ \frac{27}{4} \\ -\frac{9}{4} \end{array} \right]_{\zeta}

So with respect to our \zeta basis, the representation of \tau is

\displaystyle \left[ \begin{array}{ccc} 9 & 9 & -1 \\ 6 & -\frac{3}{4} & \frac{27}{4} \\ -6 & -\frac{15}{4} & -\frac{9}{4} \end{array} \right]

We will denote this matrix as _\zeta[\tau]_\zeta where the right-most subscript means that inputs are in \zeta-coordinates, and the left-most subscript means the outputs are in \zeta-coordinates as well.  Note that we could calculate _\zeta[\tau]_\zeta by going from \zeta coordinates to standard coordinates, using the earlier matrix, then going back to \zeta-coordinates.  That is,

\displaystyle _\zeta[\tau]_\zeta = _\zeta[I]_e \, _e[\tau]_e \, _e[I]_\zeta

where _e[I]_\zeta refers to the \zeta-to-standard basis change of basis matrix.  For notational convenience, define the following

A := _\zeta[\tau]_\zeta

B := _e[\tau]_e

P := _e[I]_\zeta

Then the above becomes

A = P^{-1} \, B \, P

Any matrices A and B for which there exists a P satisfying this equation are called similar matrices.  Note that two matrices are similar if and only if they represent the same linear transformation, but with respect to different bases.  Also note that every non-singular matrix is a change of basis matrix.  Similarity forms an equivalence relation on the set of square matrices / the set of linear transformations from a finite dimensional vector space to itself.
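To tie this back to the example: the sketch below (numpy; the matrices, though not the code, come from the post) forms P from the \zeta basis vectors and computes P^{-1} B P.  Its first column should be \tau(2,1,0)^T = (6,9,12)^T expressed in \zeta-coordinates, namely (9, 6, -6)^T:

```python
import numpy as np

# Standard-basis representation of tau, from the start of the post.
B = np.array([[1., 4., 5.],
              [2., 5., 7.],
              [3., 6., 0.]])

# Columns of P are the zeta basis vectors: P is the zeta-to-standard
# change of basis matrix, _e[I]_zeta.
P = np.array([[2., 1., 3.],
              [1., 0., 0.],
              [0., 1., -1.]])

# The representation of tau with respect to the zeta basis.
A = np.linalg.inv(P) @ B @ P
```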

June 18, 2009

Linear Transformations and Matrix Representations

Filed under: Algebra,Linear Algebra — cjohnson @ 2:22 pm

A common theme in mathematics is (or seems to be) looking at sets with a particular structure, and then looking at functions between those sets which preserve that structure.  In groups we have homomorphisms; in topological spaces we have continuous maps; in general categories we have morphisms.  In the particular case of vector spaces, though, there are two particular “structures” we want to preserve: vector addition and scalar multiplication.  The maps which preserve these are what we refer to as linear transformations.

Specifically, suppose V and W are vector spaces over the field \mathcal{F}.  A function \tau : V \to W is called a linear transformation if for all scalars \alpha, \beta \in \mathcal{F} and for all vectors u, v \in V we have

\displaystyle \tau(\alpha u + \beta v) = \alpha \tau(u) + \beta \tau(v)

Note that because of this linearity, a linear transformation is completely determined by how it maps the basis vectors of the domain.  Suppose that \beta = \{ \beta_1, \beta_2, ..., \beta_n \} is a basis for V.  Let u be any vector in V with u = a_1 \beta_1 + ... + a_n \beta_n.  We then have

\tau(u) = \tau(a_1 \beta_1 + ... + a_n \beta_n) = a_1 \tau(\beta_1) + ... + a_n \tau(\beta_n).

So if we know each \tau(\beta_i), we can figure out where any other vector will be sent by \tau.  This does not mean that \tau(\beta_1), ..., \tau(\beta_n) is necessarily a basis for the range, \tau(V).  It could be that \tau(\beta_1) = \tau(\beta_2), in which case these vectors are linearly dependent and can’t both be in the basis.  We do have that \tau(\beta_1), ..., \tau(\beta_n) span \tau(V), however, so as long as they’re linearly independent they’ll form a basis.

The main thing we want to notice about linear transformations for right now is that if both V and W are finite dimensional, then a linear transformation \tau : V \to W can be represented as a matrix.  Suppose that V is n-dimensional with the \beta basis mentioned above, and that W is m-dimensional with basis \omega = \{ \omega_1, ..., \omega_m \}.  Note that the properties of matrix multiplication tell us that any m \times n matrix defines a linear transformation from \mathcal{F}_{n \times 1} to \mathcal{F}_{m \times 1}, which we identify with V and W through coordinates:

A(\alpha u + \gamma v) = A(\alpha u) + A( \gamma v) = \alpha A u + \gamma A v

Now suppose \tau : V \to W is any other linear transformation.  Suppose that the coordinate vector of \tau(\beta_i) with respect to the \omega basis is

\displaystyle \tau(\beta_i) = \left[ \begin{array}{c} a_{1i} \\ a_{2i} \\ \vdots \\ a_{mi} \end{array} \right]_\omega

Now let u = b_1 \beta_1 + ... + b_n \beta_n.  We then have

\displaystyle \tau(u)

\displaystyle \, = b_1 \tau(\beta_1) + b_2 \tau(\beta_2) + ... + b_n \tau(\beta_n)

\displaystyle \, = b_1 \left[ \begin{array}{c} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{array} \right] + b_2 \left[ \begin{array}{c} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{array} \right] + ... + b_n \left[ \begin{array}{c} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{array} \right]

\displaystyle \, = \left[ \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{array} \right] \left[ \begin{array}{c} b_1 \\ b_2 \\ \vdots \\ b_n \end{array} \right]

Thus a linear transformation between finite dimensional vector spaces can be represented as a matrix.  Notice that the entries of our matrix depend on our particular chosen bases: if one basis were altered, the matrix would change, even though the transformation is the same.  We will denote the matrix representing \tau with respect to the \omega and \beta bases as _\omega[ \tau ]_\beta.
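As a concrete illustration (my own example, not one from the post), consider differentiation on polynomials of degree at most two, with basis \{1, x, x^2\} for both domain and codomain.  Recording the images of the basis vectors as columns gives the matrix representation:

```python
import numpy as np

# tau = d/dx on polynomials of degree <= 2.  With basis {1, x, x^2},
# tau(1) = 0, tau(x) = 1, tau(x^2) = 2x, and their coordinate
# vectors form the columns of the representing matrix.
D = np.array([[0., 1., 0.],
              [0., 0., 2.],
              [0., 0., 0.]])

# p(x) = 3 + 5x + 7x^2 has coordinate vector (3, 5, 7);
# its derivative 5 + 14x has coordinates (5, 14, 0).
p = np.array([3., 5., 7.])
print(D @ p)   # [ 5. 14.  0.]
```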

June 17, 2009

The Laplace/Cofactor Expansion

Filed under: Algebra,Linear Algebra — cjohnson @ 3:22 pm

We’ve yet to describe a way to calculate determinants in any easy way; we’ve seen some nice properties, but still have to resort to writing a non-elementary matrix as a product of elementary matrices in order to calculate its determinant.  What we want to do now is describe a recursive procedure for calculating a determinant by looking at determinants of submatrices.  Let’s first agree to call the submatrix of A with the i-th row and j-th column deleted the i,j minor of A, which we’ll denote \text{M}_{i,j}(A).  So, supposing

\displaystyle A = \left[ \begin{array}{ccc} 2 & 3 & 5 \\ 7 & 9 & 11 \\ 13 & 17 & 19 \end{array}\right]

if we delete the first row and second column we have

\displaystyle \text{M}_{1,2}(A) = \left[ \begin{array}{cc} 7 & 11 \\ 13 & 19 \end{array} \right]

Let’s also note that if the first row of A is all zeros except for the first entry, the determinant of A is simply the determinant of \text{M}_{1,1}(A) multiplied by that first entry.  That is,

\displaystyle \det \left[ \begin{array}{c|ccc} a_{11} & 0 & \cdots & 0 \\ \hline a_{21} & & & \\ \vdots & & \text{M}_{1,1}(A) & \\ a_{n1} & & & \end{array} \right] = a_{11} \det\left(\text{M}_{1,1}(A)\right)

To see this, first note that we can zero out the entries in the first column below the a_{11} by performing a sequence of elementary row operations that don’t change the determinant.  Now we clearly have

\displaystyle \det \left[ \begin{array}{c|ccc} a_{11} & 0 & \cdots & 0 \\ \hline a_{21} & & & \\ \vdots & & \text{M}_{1,1}(A) & \\ a_{n1} & & & \end{array} \right] = a_{11} \det \left[ \begin{array}{c|ccc} 1 & 0 & \cdots & 0 \\ \hline 0 & & & \\ \vdots & & \text{M}_{1,1}(A) & \\ 0 & & & \end{array} \right]

Supposing we can write \text{M}_{1,1}(A) as a product of elementary matrices, \text{M}_{1,1}(A) = E_1 \, E_2 \, ... \, E_m, to calculate its determinant, we can then obtain the matrix above by looking at the product

\displaystyle \left[ \begin{array}{cc} 1 & 0 \\ 0 & E_1 \end{array} \right] \, \left[ \begin{array}{cc} 1 & 0 \\ 0 & E_2 \end{array} \right] \, \cdots \, \left[ \begin{array}{cc} 1 & 0 \\ 0 & E_m \end{array} \right]

Each of these is then an elementary matrix whose determinant is the same as the determinant of the associated E_i matrix, so we have our result.

Now, applying the linearity we discussed last time,

\displaystyle \det \left[ \begin{array}{ccc} a_{11} & \cdots & a_{1n} \\ & R_2 & \\ & \vdots & \\ & R_n & \end{array} \right] = \det \left[ \begin{array}{ccccc} a_{11} & 0 & \cdots & 0 & 0 \\ & & R_2 & & \\ & & \vdots & & \\ & & R_n & & \end{array} \right] + \det \left[ \begin{array}{ccccc} 0 & a_{12} & 0 & \cdots & 0 \\ & & R_2 & & \\ & & \vdots & & \\ & & R_n & & \end{array} \right] + ... + \det \left[ \begin{array}{ccccc} 0 & 0 & \cdots & 0 & a_{1n} \\ & & R_2 & & \\ & & \vdots & & \\ & & R_n & & \end{array} \right]

Notice that we can can zero out the elements in the first column below a_{11} giving us

\displaystyle \det \left[ \begin{array}{ccccc} a_{11} & 0 & \cdots & 0 & 0 \\ & & R_2 & & \\ & & \vdots & & \\ & & R_n & & \end{array} \right] = a_{11} \det \left( \text{M}_{11}(A) \right)

In general, for the j-th column, we want to do a series of column swaps bringing the j-th column to the front of the matrix, but keeping the other columns in order.  (For this reason a single column swap won’t work, since that permutes the remaining columns.)  Each time we swap columns, the determinant is multiplied by -1.  If we move the j-th column to the left by swapping with column j-1, then with column j-2, and so on, we perform j-1 swaps.  Thus

\det \left[ \begin{array}{ccccc} \cdots & 0 & a_{1j} & 0 & \cdots \\ & & R_2 & & \\ & & \vdots & & \\ & & R_n & & \end{array} \right] = (-1)^{j-1} \det \left[ \begin{array}{c|ccc} a_{1j} & 0 & \cdots & 0 \\ \hline 0 & & & \\ \vdots & & \text{M}_{1,j}(A) & \\ 0 & & & \end{array} \right] = (-1)^{j-1} a_{1j} \det \left( \text{M}_{1,j}(A) \right)

So in total we have

\displaystyle \det(A) = \sum_{j=1}^n (-1)^{j-1} a_{1,j} \det \left( \text{M}_{1,j}(A) \right)

We could also perform this expansion along another row, but we’d have to perform some row swaps to move that row to the top first.  Supposing we decide to expand along the i-th row, we’ll perform i-1 swaps, each time multiplying the determinant by -1.  We’d then multiply our result above by (-1)^{i-1}.  Distributing that across our sum, we note that

(-1)^{i-1} \, (-1)^{j-1} = (-1)^{i + j - 2} = (-1)^{i + j} \, (-1)^{-2} = (-1)^{i + j}

So, if we expand along the i-th row, our formula becomes

\displaystyle \det(A) = \sum_{j=1}^n (-1)^{i+j} a_{i,j} \det \left( \text{M}_{i,j}(A) \right)

Each term of the sum with the a_{i,j} factor removed is called a cofactor; the i,j-th cofactor, which we’ll denote \text{cof}_{i,j}(A), is

\displaystyle \text{cof}_{i,j}(A) = (-1)^{i+j} \det \left( \text{M}_{i,j}(A) \right)

The procedure we’ve just described is known as the Laplace expansion (or cofactor expansion) for the determinant, and so now we have a more efficient way of calculating determinants.  (Notice that we could expand along a column instead of a row: just repeat the above procedure on the transpose of the matrix.)
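The recursive procedure translates directly into code.  Here’s a sketch in Python with numpy (expanding along the first row, as derived above), run on the example matrix from the start of the post:

```python
import numpy as np

def minor(A, i, j):
    """The i,j minor: A with row i and column j deleted."""
    return np.delete(np.delete(A, i, axis=0), j, axis=1)

def cofactor_det(A):
    """Determinant via Laplace expansion along the first row."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    # (-1)**j for 0-indexed j matches (-1)**(j-1) in the 1-indexed formula.
    return sum((-1) ** j * A[0, j] * cofactor_det(minor(A, 0, j))
               for j in range(n))

A = np.array([[2, 3, 5],
              [7, 9, 11],
              [13, 17, 19]])
print(cofactor_det(A))   # 8
```

This runs in O(n!) time, so it’s only sensible for small matrices; for anything larger, np.linalg.det (which works through an LU factorization) is the practical choice.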

June 16, 2009

Determinants Are Linear in Rows and Columns

Filed under: Algebra,Linear Algebra — cjohnson @ 8:55 pm

One easy consequence of our definition of determinant from last time is that any singular matrix must have determinant zero. Suppose A is a singular n \times n matrix and that P is the matrix which puts A into row reduced form. Then we have

\displaystyle \det(A)

\displaystyle = \det(P \, P^{-1} \, A)

\displaystyle = \det(P) \, \det(P^{-1}) \, \det(A)

\displaystyle = \det(P) \, \det(A) \, \det(P^{-1})

\displaystyle = \det(P \, A) \det(P^{-1})

If A is singular, once we put it in row reduced form it must have a row of zeros. We can then break PA up into a product of elementary matrices, one of which will be of the form I_{(0R_i)}. We know that this matrix has determinant zero, so the product is zero, and thus \det(A) = 0. Likewise, if \det(A) = 0, we can write A as a product of elementary matrices, one of which must be I_{(0R_i)}, so A is singular. Now we know that a matrix is singular if and only if its determinant is zero.

Suppose now that the first row of A can be written as \alpha + \beta for some vectors \alpha and \beta. We wish to show that

\displaystyle \det(A) = \det \left[ \begin{array}{c} \alpha + \beta \\ R_2 \\ \vdots \\ R_n \end{array} \right] = \det \left[ \begin{array}{c} \alpha \\ R_2 \\ \vdots \\ R_n \end{array} \right] + \det \left[ \begin{array}{c} \beta \\ R_2 \\ \vdots \\ R_n \end{array} \right]

First suppose that the rows R_2 through R_n form a linearly dependent set. Then our matrix is singular, so it has determinant zero. The determinants on the right in the above equation are zero too, so we have our result.

Suppose now that R_2 through R_n are linearly independent. We can then extend these to a basis for \mathcal{F}_{1 \times n} by adding a vector, call it \zeta. Then there exist scalars a_i, b_i for i = 1, ..., n such that

\alpha = a_1 \zeta + a_2 R_2 + ... + a_n R_n

\beta = b_1 \zeta + b_2 R_2 + ... + b_n R_n

Some simple manipulations from last time give us the following.

\displaystyle \det \left[ \begin{array}{c} \alpha \\ R_2 \\ \vdots \\ R_n \end{array} \right]

\displaystyle = \det \left[ \begin{array}{c} a_1 \zeta + a_2 R_2 + ... + a_n R_n \\ R_2 \\ \vdots \\ R_n \end{array} \right]

\displaystyle = \det \left[ \begin{array}{c} a_1 \zeta \\ R_2 \\ \vdots \\ R_n \end{array} \right]

\displaystyle = a_1 \det \left[ \begin{array}{c} \zeta \\ R_2 \\ \vdots \\ R_n \end{array} \right]

And likewise,

\displaystyle \det \left[ \begin{array}{c} \beta \\ R_2 \\ \vdots \\ R_n \end{array} \right] = b_1 \det \left[ \begin{array}{c}\zeta \\ R_2 \\ \vdots \\ R_n \end{array} \right]

Now we combine these results,

\displaystyle \det \left[ \begin{array}{c} \alpha + \beta \\ R_2 \\ \vdots \\ R_n \end{array} \right]

\displaystyle = \det \left[ \begin{array}{c} (a_1 + b_1) \zeta + (a_2 + b_2) R_2 + ... + (a_n + b_n) R_n \\ R_2 \\ \vdots \\ R_n \end{array} \right]

\displaystyle = (a_1 + b_1) \det \left[ \begin{array}{c} \zeta \\ R_2 \\ \vdots \\ R_n \end{array} \right]

\displaystyle = a_1 \det \left[ \begin{array}{c} \zeta \\ R_2 \\ \vdots \\ R_n \end{array} \right] + b_1 \det \left[ \begin{array}{c} \zeta \\ R_2 \\ \vdots \\ R_n \end{array} \right]

\displaystyle = \det \left[ \begin{array}{c} \alpha \\ R_2 \\ \vdots \\ R_n \end{array} \right] + \det \left[ \begin{array}{c} \beta \\ R_2 \\ \vdots \\ R_n \end{array} \right]

Since swapping two rows only multiplies the determinant by -1, and that sign change appears on every determinant in the equation, this result holds for any row.  A similar argument shows the result also holds for columns.
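A quick numerical spot-check of row-linearity (a numpy sketch with random data, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = rng.random(3), rng.random(3)   # two candidate first rows
R = rng.random((2, 3))                       # the fixed rows R_2 and R_3

def with_top(row):
    """Stack `row` on top of the fixed rows R_2, R_3."""
    return np.vstack([row, R])

lhs = np.linalg.det(with_top(alpha + beta))
rhs = np.linalg.det(with_top(alpha)) + np.linalg.det(with_top(beta))
```

Note this is linearity in a single row with the other rows held fixed; it is emphatically not the claim that \det(A + B) = \det(A) + \det(B), which is false in general.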

June 13, 2009


Filed under: Algebra,Linear Algebra — cjohnson @ 9:10 pm

I remember that when I took linear algebra, I had learned determinants in a very “algorithmic” sort of way; a determinant to me was a function defined on square matrices by a particular recursive procedure.  In Charles Cullen’s Matrices and Linear Transformations, however, he defines a determinant not by a rule, but by two properties which completely characterize the determinant.  A determinant, according to Cullen, is an \mathcal{F}_{n \times n} \to \mathcal{F} function which satisfies the following.

  1. For any two n \times n matrices A and B, \det(AB) = \det(A) \det(B)
  2. The determinant of \text{diag}(k, 1, 1, ..., 1) is k.

That’s it.  That’s all you need to define the determinant.  Of course, now we have to worry about whether a function with these properties even exists or not, and if so, is that function unique?  Before answering either of those questions, though, we need to establish that every n \times n matrix can be written as a product of elementary matrices (where by an elementary matrix we mean one which results in an elementary row (or column) operation when multiplied on the left (or right), including zeroing out a row or column).  To see this, recall that for every matrix A there exist non-singular matrices P and Q such that

\displaystyle PAQ = \left[ \begin{array}{cc} I_r & 0 \\ 0 & 0 \end{array} \right]

where r = \text{rank}(A).  In the above, P represents a sequence of elementary row operations, and Q gives a sequence of elementary column operations; P and Q are the products of elementary matrices.  Since these are non-singular we have

\displaystyle A = P^{-1} \left[ \begin{array}{cc} I_r & 0 \\ 0 & 0 \end{array} \right] Q^{-1}

The middle matrix is clearly a product of matrices of the form \text{diag}(1, ..., 1, 0, 1, ..., 1).  If we agree to also call such matrices elementary, then every square matrix is a product of elementary matrices.

Supposing we have a \det function satisfying the properties given above, we can calculate the determinant of an elementary matrix pretty easily.  In the following, I_{(kR_i)} means we multiply the i-th row by k in the identity matrix; I_{(R_i \leftrightarrow R_j)} means we swap rows i and j; I_{(kR_i + R_j)} means we add k times row i to row j.

First we’ll look at scalar multiples of a particular row:

\displaystyle \det\left( I_{(kR_i)} \right)

\displaystyle \, = \det\left( I_{(R_i \leftrightarrow R_1)} \, I_{(kR_1)} \, I_{(R_i \leftrightarrow R_1)} \right)

\displaystyle \, = \det \left( I_{(R_i \leftrightarrow R_1)} \right) \, \det \left( I_{(kR_1)} \right) \, \det \left( I_{(R_i \leftrightarrow R_1)} \right)

\displaystyle \, = \det \left( I_{(kR_1)} \right) \, \det \left( I_{(R_i \leftrightarrow R_1)} \right) \, \det \left( I_{(R_i \leftrightarrow R_1)} \right) (these are scalars in a field, so they commute)

\displaystyle \, = \det \left( I_{(kR_1)} \right) \, \det \left( I_{(R_i \leftrightarrow R_1)} \, I_{(R_i \leftrightarrow R_1)} \right)

\displaystyle \, = \det \left( I_{(kR_1)} \right) \, \det \left( I \right)

\displaystyle \, = \det \left( I_{(kR_1)} \right)

\displaystyle \, = k

Using similar identities we can show \det \left( I_{(R_i \leftrightarrow R_j)} \right) = -1 and \det \left( I_{kR_i + R_j} \right) = 1.  Now we know the determinants for all elementary matrices.  Since every square matrix is a product of elementary matrices, and we can split the determinant up across a product of matrices, we can even calculate the determinant of any square matrix based solely on the two properties given earlier.  (Note this isn’t necessarily an efficient way to calculate the determinant, just a possible way.)
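These three values are easy to confirm numerically (a numpy sketch of my own, not part of the post):

```python
import numpy as np

I = np.eye(3)
k = 5.0

scale = I.copy(); scale[1, 1] = k    # I_(k R_2): multiply row 2 by k
swap = I[[1, 0, 2]]                  # I_(R_1 <-> R_2): swap rows 1 and 2
shear = I.copy(); shear[2, 0] = k    # I_(k R_1 + R_3): add k * row 1 to row 3

assert np.isclose(np.linalg.det(scale), k)     # det = k
assert np.isclose(np.linalg.det(swap), -1.0)   # det = -1
assert np.isclose(np.linalg.det(shear), 1.0)   # det = 1
```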

At this point it shouldn’t seem too surprising that the two properties above are all we need to define a determinant: every square matrix is a product of elementary matrices, and we can “massage” elementary matrices into a form whose determinant we can calculate.  In coming posts we’ll see how to expand this to define the Laplace / cofactor expansion for the determinant; the relationship between determinants and inverses; and Cramer’s rule, which tells us how to compute a single coordinate in the solution vector to a system of equations.

Change of Basis

Filed under: Algebra,Linear Algebra — cjohnson @ 4:19 pm

In any non-trivial vector space there will be several possible bases we could pick.  In \mathbb{R}^3, for instance, we could use \{ [1, 0, 0]^T, [0, 1, 0]^T, [0, 0, 1]^T \} or \{ [1, 2, 0]^T, [3, 0, 1]^T, [4, 2, -1]^T \}.  This first basis is known as the standard basis, and in general for an n-dimensional vector space over \mathcal{F}, we’ll refer to \{ [1, 0, 0, ..., 0]^T, [0, 1, 0, ..., 0]^T, ..., [0, ..., 0, 1]^T \} as the standard basis, and will let e_i denote the vector with a 1 in the i-th position, and zeros elsewhere.

When we write down a vector like [1, 2, 3]^T we implicitly mean that these are the coordinates to use with the vectors in the standard basis; these are the coefficients we multiply the e_i basis vectors by to describe our vector.

\displaystyle \left[ \begin{array}{c} 1 \\ 2 \\ 3 \end{array} \right] = 1 \left[ \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \right]+ 2 \left[ \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \right] + 3 \left[ \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right]

But how would we find the appropriate coordinates if we were to use \{ [1, 2, 0]^T, [3, 0, 1]^T, [4, 2, -1]^T \} as our basis?  Let’s suppose our coefficients are \alpha_1, \alpha_2, \alpha_3.  Then what we want to do is find the values of the \alpha_i such that

\displaystyle \alpha_1 \left[ \begin{array}{c} 1 \\ 2 \\ 0 \end{array} \right] + \alpha_2 \left[\begin{array}{c}3 \\ 0 \\ 1\end{array}\right] + \alpha_3\left[\begin{array}{c}4 \\ 2 \\ -1\end{array}\right] = \left[ \begin{array}{c} 1 \\ 2 \\ 3 \end{array}\right]

So what we have is a system of equations:

\displaystyle \left[ \begin{array}{ccc} 1 & 3 & 4 \\ 2 & 0 & 2 \\ 0 & 1 & -1 \end{array} \right] \left[ \begin{array}{c} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{array} \right]= \left[ \begin{array}{c} 1 \\ 2 \\ 3 \end{array} \right]

Notice that since the columns of this matrix form a basis for \mathbb{R}^3, the matrix is invertible, and so we can easily solve for the \alpha_i.

\displaystyle \left[ \begin{array}{c} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{array} \right] = \left[ \begin{array}{ccc} 1 & 3 & 4 \\ 2 & 0 & 2 \\ 0 & 1 & -1 \end{array} \right]^{-1} \left[ \begin{array}{c} 1 \\ 2 \\ 3 \end{array} \right] = \left[ \begin{array}{c} 5/2 \\ 3/2 \\ -3/2\end{array} \right]
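In code this is just a linear solve.  The sketch below (numpy, not part of the original post) reproduces the coordinates computed above:

```python
import numpy as np

# Columns are the basis vectors [1,2,0], [3,0,1], [4,2,-1].
M = np.array([[1., 3., 4.],
              [2., 0., 2.],
              [0., 1., -1.]])

# Solve M alpha = (1, 2, 3) for the coordinates alpha; in practice
# np.linalg.solve is preferred over explicitly forming the inverse.
alpha = np.linalg.solve(M, [1., 2., 3.])
print(alpha)   # [ 2.5  1.5 -1.5], i.e. (5/2, 3/2, -3/2)
```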

In general, if we have a basis A = \{ a_1, ..., a_n \}, we will write

\displaystyle \left[ \begin{array}{c} \alpha_1 \\ \vdots \\ \alpha_n \end{array} \right]_A

to mean that the \alpha_i are the coefficients of the a_i vectors in our A basis, and will let _AI be the matrix that converts coordinates with respect to the standard basis to coordinates in our A basis.  Likewise, I_A will be the matrix which converts coordinates from the A basis to coordinates with respect to the standard basis.

Generalizing the argument above, we see that we can calculate I_A by simply using our basis vectors as its columns.  For _AI we take the inverse of this matrix.

\displaystyle I_A = \left[ \, \left. a_1 \, \right| \, \left. a_2 \, \right| \, \left. \cdots \, \right| \, a_n \, \right]

\displaystyle _AI = \left[ \, \left. a_1 \, \right| \, \left. a_2 \, \right| \, \left. \cdots \, \right| \, a_n \, \right]^{-1}

Now suppose we have two bases, A = \{ a_1, ..., a_n \} and B = \{ b_1, ..., b_n \}.  The matrix which will take coordinates with respect to the A basis and convert them into coordinates for the B basis is denoted _BI_A.  One way to calculate _BI_A is to convert our A-coordinates into standard coordinates, and then convert those into B-coordinates:

\displaystyle _BI_A = _BI \, I_A

Alternatively, we could note that

\displaystyle [ a_i ]_B = _BI_A [ a_i ]_A

\displaystyle \, = _BI_A \left[ \begin{array}{c} 0 \\ \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{array} \right]_A (with the 1 in the i-th spot)

\displaystyle \, = \left(_BI_A\right)_{*i} (recall M_{*i} is the notation I use for the i-th column of M)

So the i-th column of _BI_A is simply the i-th vector from our A basis, but in B-coordinates.
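Here’s a small numpy sketch (with example bases of my own, not ones from the post) of composing the two change of basis matrices and checking that the A-coordinates and the resulting B-coordinates describe the same vector:

```python
import numpy as np

# Two bases for R^2, stored as columns: these are I_A and I_B in the
# notation above (basis-to-standard change of basis matrices).
QA = np.array([[1., 1.],
               [0., 1.]])
QB = np.array([[2., 0.],
               [1., 1.]])

# _B I_A = (_B I)(I_A) = QB^{-1} QA.
BIA = np.linalg.inv(QB) @ QA

x = np.array([3., -2.])   # some vector, in A-coordinates
y = BIA @ x               # the same vector, in B-coordinates

# Both coordinate vectors describe the same point in R^2.
assert np.allclose(QA @ x, QB @ y)
```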

June 11, 2009

The Rank-Nullity Theorem

Filed under: Algebra,Linear Algebra — cjohnson @ 10:02 pm

In the last post we defined the column and row space of a matrix as the span of the columns (in the case of the column space) or rows (for the row space) of the matrix.  There’s a third important subspace associated with a matrix, called the null space: the set of all vectors which A maps to zero.  That is, if we think of an m \times n matrix A as a function from \mathcal{F}_{n \times 1} to \mathcal{F}_{m \times 1}, then the null space of A, \text{NS}(A), is simply the kernel of the map.

\displaystyle \text{NS}(A) := \{ X \in \mathcal{F}_{n \times 1} : AX = 0 \}

The dimension of the null space is sometimes called the nullity of the matrix.  There’s an important relationship between the column space and the null space which we’ll now state and prove: if A is an m \times n matrix, then \dim(\text{CS}(A)) + \dim(\text{NS}(A)) = n.
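This relationship is easy to check numerically. A sketch using NumPy on a hypothetical 3 \times 5 matrix: the rank gives \dim(\text{CS}(A)), and the nullity can be counted from the singular value decomposition (the null space dimension is the number of columns minus the number of non-negligible singular values):

```python
import numpy as np

# A hypothetical 3x5 matrix with n = 5 columns; the second row is
# twice the first, so the rank is 2.
A = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
              [2.0, 4.0, 6.0, 8.0, 10.0],
              [0.0, 1.0, 0.0, 1.0, 0.0]])

rank = np.linalg.matrix_rank(A)  # dim CS(A)

# Nullity via the SVD: n minus the count of non-negligible singular values.
s = np.linalg.svd(A, compute_uv=False)
nullity = A.shape[1] - np.count_nonzero(s > 1e-10)

# The rank-nullity theorem: dim CS(A) + dim NS(A) = n.
assert rank + nullity == A.shape[1]
```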

We begin by assuming \{ v_1, ..., v_t \} is a basis for \text{NS}(A).  Since \text{NS}(A) is a subspace of \mathcal{F}_{n \times 1}, we may extend \{ v_1, ..., v_t \} to a basis for all of \mathcal{F}_{n \times 1} by adding r := n - t properly chosen vectors.  Call these vectors u_1, ..., u_r, so that \{ v_1, ..., v_t, u_1, ..., u_r \} is a basis for \mathcal{F}_{n \times 1}.  Now let w \in \mathcal{F}_{n \times 1} be arbitrary; since we have a basis, we may write

\displaystyle w = \alpha_1 v_1 + ... + \alpha_t v_t + \beta_1 u_1 + ... + \beta_r u_r

Now we multiply w on the left by A.  Since Aw \in \text{CS}(A) we have

\displaystyle Aw = A( \alpha_1 v_1 + ... + \alpha_t v_t + \beta_1 u_1 + ... + \beta_r u_r)

\displaystyle = \alpha_1 A v_1 + ... + \alpha_t A v_t + \beta_1 A u_1 + ... + \beta_r A u_r

\displaystyle = \beta_1 A u_1 + ... + \beta_r A u_r

where the last step follows from the fact each of v_1, ..., v_t are mapped to zero by A.

Since our choice of w was arbitrary, \{ Au_1, ..., Au_r \} spans \text{CS}(A).  If we can now show that \{ Au_1, ..., Au_r \} are linearly independent we’ll have that \dim(\text{CS}(A)) = r and have proven our theorem.

So suppose \gamma_1, ..., \gamma_r are such that

\displaystyle \gamma_1 A u_1 + ... + \gamma_r A u_r = 0

\displaystyle \implies A ( \gamma_1 u_1 + ... + \gamma_r u_r) = 0

\displaystyle \implies \gamma_1 u_1 + ... + \gamma_r u_r \in \text{NS}(A)

\displaystyle \implies \gamma_1 u_1 + ... + \gamma_r u_r = \delta_1 v_1 + ... + \delta_t v_t

\displaystyle \implies \gamma_1 u_1 + ... + \gamma_r u_r - \delta_1 v_1 - ... - \delta_t v_t = 0

Now since \{ v_1, ..., v_t, u_1, ..., u_r \} is a basis for \mathcal{F}_{n \times 1}, we have that each \gamma_i = \delta_j = 0, so \{ Au_1, ..., Au_r \} is a linearly independent set which spans \text{CS}(A).  Thus \dim(\text{CS}(A)) = r where r = n - t.  Since t = \dim(\text{NS}(A)), we have our result.

Notice that if Ax = 0, clearly PAx = 0, so that \text{NS}(A) \subseteq \text{NS}(PA).  If P is non-singular, however, it has a trivial null space and so \text{NS}(A) = \text{NS}(PA).  Combining this with the above theorem we have that if P is non-singular,

\displaystyle \dim(\text{CS}(A)) = n - \dim(\text{NS}(A))

\displaystyle = n - \dim(\text{NS}(PA))

\displaystyle = \dim(\text{CS}(PA))

Likewise, \displaystyle \dim(\text{RS}(A)) = \dim(\text{RS}(AQ)) if Q is non-singular.  (Apply the previous argument with \text{CS}(Q^T A^T).)

Picking up from last time, we have that if P and Q are non-singular matrices such that \displaystyle PAQ = \left[ \begin{array}{cc} I_r & 0 \\ 0 & 0 \end{array} \right], then

\displaystyle \dim(\text{CS}(A)) = \dim(\text{CS}(AQ))

\displaystyle = \dim(\text{CS}(PAQ))

\displaystyle = r

\displaystyle = \dim(\text{RS}(PAQ))

\displaystyle = \dim(\text{RS}(PA))

\displaystyle = \dim(\text{RS}(A))

So the dimension of the column space equals the dimension of the row space.  This common value is called the rank of A, and is denoted \text{rank}(A).  For this reason the theorem we proved above is known as the rank-nullity theorem.
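These rank facts can be illustrated numerically. A sketch with a hypothetical random matrix (a random square matrix is non-singular with probability 1, so the seeded P and Q below serve as generic non-singular matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
P = rng.standard_normal((4, 4))   # generically non-singular
Q = rng.standard_normal((6, 6))   # generically non-singular

r = np.linalg.matrix_rank
assert r(A) == r(A.T)       # dim CS(A) = dim RS(A): the rank
assert r(P @ A) == r(A)     # left-multiplying by non-singular P preserves rank
assert r(A @ Q) == r(A)     # right-multiplying by non-singular Q preserves rank
```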

Column Space and Row Space

Filed under: Algebra,Linear Algebra — cjohnson @ 8:06 pm

Suppose we have a system of equations with m equations and n unknowns.  How can we tell if this system has a solution?  Writing the system as Ax = b, we see that the left-hand side gives us a particular linear combination of the columns of A.  That is, the system Ax = b can be thought of as

\displaystyle x_1 \left[ \begin{array}{c} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{array} \right] + x_2 \left[ \begin{array}{c} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{array} \right] + ... + x_n \left[\begin{array}{c} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{array} \right] = \left[ \begin{array}{c} b_1 \\ b_2 \\ \vdots \\ b_m \end{array} \right]

And so we see that the system has a solution if and only if b can be expressed as a linear combination of the columns of A.  In fact, taking the set of all possible b vectors for which Ax = b has a solution, we have a vector space called the column space of A, which we’ll normally denote \text{CS}(A).  There are a couple of ways we can think of the column space:

  1. The set of all linear combinations of the columns of A
  2. The set of all vectors b such that Ax = b has a solution
  3. The set of all Ax for any n \times 1 vector x

These are all equivalent, but sometimes it may be more helpful to think of the column space in one way than another.
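The second characterization gives a practical solvability test: Ax = b has a solution exactly when appending b to A as an extra column does not increase the rank, i.e. when b already lies in \text{CS}(A). A sketch with a hypothetical rank-one matrix:

```python
import numpy as np

# A has rank 1: its second column is twice the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])

b_good = np.array([1.0, 2.0, 3.0])  # a multiple of the first column
b_bad = np.array([1.0, 0.0, 0.0])   # not in CS(A)

def solvable(A, b):
    # Ax = b has a solution iff b is in CS(A), i.e. iff appending
    # b as a column does not raise the rank.
    augmented = np.column_stack([A, b])
    return np.linalg.matrix_rank(augmented) == np.linalg.matrix_rank(A)

assert solvable(A, b_good)
assert not solvable(A, b_bad)
```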

A similar concept is that of the row space, which is the set of all linear combinations of the rows of A and denoted \text{RS}(A).  Notice that \text{RS}(A) = \text{CS}(A^T).

Now suppose that we multiply A by a matrix Q on the right.  How might the column space of AQ differ from A?  Using the third characterization of a column space above, and assuming Q is n \times p, we have that

\displaystyle \text{CS}(AQ) = \{ (AQ)X : X \in \mathcal{F}_{p \times 1} \}

\displaystyle = \{ A(QX) : X \in \mathcal{F}_{p \times 1} \}

\displaystyle \subseteq \{ AY : Y \in \mathcal{F}_{n \times 1} \}

So the column space of AQ is contained in the column space of A, and in particular \dim(\text{CS}(AQ)) \leq \dim(\text{CS}(A)).  The above is telling us that linear combinations of the columns of AQ are actually linear combinations of the columns of A.  One way to see this is to realize that each column of AQ is itself a linear combination of the columns of A.  Letting A_{*i} denote the i-th column of A, we can write AQ as follows.

\displaystyle AQ = \left[ \left. \sum_{i=1}^n q_{i,1} A_{*i} \, \right| \, \left. \sum_{i=1}^n q_{i,2} A_{*i} \, \right| \, ... \, \left| \, \sum_{i=1}^n q_{i,p} A_{*i} \right. \right]

Now if we take a linear combination of these columns we have

\displaystyle c_1 \left( \sum_{i=1}^n q_{i,1} A_{*i} \right) + ... + c_p \left( \sum_{i=1}^n q_{i,p} A_{*i} \right)

\displaystyle = \sum_{j=1}^p c_j \left( \sum_{i=1}^n q_{i,j} A_{*i} \right)

\displaystyle = \sum_{j=1}^p \sum_{i=1}^n c_j q_{i,j} A_{*i}

\displaystyle = \sum_{i=1}^n \left( \sum_{j=1}^p c_j q_{i,j} \right) A_{*i}

Our choice of the c_j values is arbitrary — they can be whatever we’d like them to be — but the q_{i,j} are fixed since they’re entries in Q.  This restricts the possible linear combinations of the columns, so \text{CS}(AQ) \subseteq \text{CS}(A).  Repeating the same argument but using rows instead of columns, or just using the above argument with \text{RS}(A) = \text{CS}(A^T), we have that \text{RS}(PA) \subseteq \text{RS}(A).  Notice that if Q is non-singular, applying the same containment to the product of AQ and Q^{-1} gives \text{CS}(A) = \text{CS}((AQ)Q^{-1}) \subseteq \text{CS}(AQ).  Combined with the above, we have equality when Q (or, for row spaces, P) is non-singular.
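The column-by-column description of AQ derived above is easy to verify directly. A sketch with hypothetical random matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
Q = rng.standard_normal((3, 2))

AQ = A @ Q

# The j-th column of AQ is sum_i q_{ij} A_{*i}: a linear combination
# of the columns of A with the fixed weights taken from Q.
for j in range(Q.shape[1]):
    combo = sum(Q[i, j] * A[:, i] for i in range(A.shape[1]))
    assert np.allclose(AQ[:, j], combo)

# Hence CS(AQ) is contained in CS(A), so the ranks satisfy:
assert np.linalg.matrix_rank(AQ) <= np.linalg.matrix_rank(A)
```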

Now what we’d like to show is that the dimension of the row space and the dimension of the column space are actually equal for any matrix A.  Simply pick P to be the nonsingular matrix so that PA is A in row-reduced echelon form, and pick Q to be the nonsingular matrix so that AQ is A in column-reduced echelon form.  Notice then

PAQ = \left[ \begin{array}{cc} I_r & 0 \\ 0 & 0 \end{array} \right]

where I_r is the r \times r identity matrix for some r.  Since matrix multiplication is associative, PAQ = (PA)Q.  Putting the matrix into row-reduced echelon form via PA moves all the non-zero rows to the top, with each such row having a leading one and the rest of that column zeroed out.  Multiplying by Q then puts the matrix into column-reduced form, which moves all the non-zero columns to the left and zeros out everything else in the rows with leading ones.  With this matrix we clearly have \dim(\text{CS}(PAQ)) = \dim(\text{RS}(PAQ)) = r.  We can show that when P and Q are both nonsingular, \dim(\text{CS}(A)) = \dim(\text{CS}(PAQ)), using the rank-nullity theorem, which we’ve yet to prove.
