Mathematics Prelims

June 11, 2009

Column Space and Row Space

Filed under: Algebra, Linear Algebra — cjohnson @ 8:06 pm

Suppose we have a system of equations with m equations and n unknowns.  How can we tell if this system has a solution?  Writing the system as Ax = b, we see that the left-hand side gives us a particular linear combination of the columns of A.  That is, the system Ax = b can be thought of as

\displaystyle x_1 \left[ \begin{array}{c} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{array} \right] + x_2 \left[ \begin{array}{c} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{array} \right] + ... + x_n \left[\begin{array}{c} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{array} \right] = \left[ \begin{array}{c} b_1 \\ b_2 \\ \vdots \\ b_m \end{array} \right]

And so we see that the system has a solution if and only if b can be expressed as a linear combination of the columns of A.  In fact, taking the set of all possible b vectors for which Ax = b has a solution, we have a vector space called the column space of A, which we’ll normally denote \text{CS}(A).  There are a couple of ways we can think of the column space:

  1. The set of all linear combinations of the columns of A
  2. The set of all vectors b such that Ax = b has a solution
  3. The set of all Ax for any n \times 1 vector x

These are all equivalent, but sometimes it may be more helpful to think of the column space in one way than another.

A similar concept is that of the row space, which is the set of all linear combinations of the rows of A and denoted \text{RS}(A).  Notice that \text{RS}(A) = \text{CS}(A^T).

Now suppose that we multiply A by a matrix Q on the right.  How might the column space of AQ differ from A?  Using the third characterization of a column space above, and assuming Q is n \times p, we have that

\displaystyle \text{CS}(AQ) = \{ (AQ)X : X \in \mathcal{F}_{p \times 1} \}

\displaystyle = \{ A(QX) : X \in \mathcal{F}_{p \times 1} \}

\displaystyle \subseteq \{ AY : Y \in \mathcal{F}_{n \times 1} \}

So the column space of AQ is contained in the column space of A, and in particular \dim(\text{CS}(AQ)) \leq \dim(\text{CS}(A)).  The above is telling us that linear combinations of the columns of AQ are actually linear combinations of the columns of A.  One way to see this is to realize that each column of AQ is actually a linear combination of combination of columns of A.  Letting A_{*i} denote the i-th column of A, we can write AQ as follows.

\displaystyle AQ = \left[ \left. \sum_{i=1}^n q_{i,1} A_{*i} \, \right| \, \left. \sum_{i=1}^n q_{i,2} A_{*i} \, \right| \, ... \, \left| \, \sum_{i=1}^n q_{i,p} A_{*i} \right. \right]

Now if we take a linear combination of these columns we have

\displaystyle c_1 \left( \sum_{i=1}^n q_{i,1} A_{*i} \right) + ... + c_p \left( \sum_{i=1}^n q_{i,p} A_{*i} \right)

\displaystyle = \sum_{j=1}^p c_j \left( \sum_{i=1}^n q_{i,j} A_{*i} \right)

\displaystyle = \sum_{j=1}^p \sum_{i=1}^n c_j q_{i,j} A_{*i}

\displaystyle = \sum_{i=1}^n \left( \sum_{j=1}^p c_j q_{i,j} \right) A_{*i}

Our choice of the c_j values is arbitrary — they can be whatever we’d like them to be — but the q_{i,j} are fixed since they’re entries in Q.  This restricts the possible linear combinations of the rows, so \text{CS}(AQ) \subseteq \text{CS}(A).  Repeating the same argument but using rows instead of columns, or just using the above argument with \text{RS}(A) = \text{CS}(A^T), we have that \text{RS}(PA) \subseteq \text{RS}(A).  Notice if Q (or P) is non-singular, then we have CS(A) \subseteq CS((AQ)Q^{-1}).  Combined with the above we have equality when P or Q is non-singular.

Now what we’d like to show is that the dimension of the row space and the dimension of the column space are actually equal for any matrix A.  Simply pick P to be the nonsingular matrix so that PA is A in row-reduced echelon form, and pick Q to be the nonsingular matrix so that AQ is A in column-reduced echelon form.  Notice then

PAQ = \left[ \begin{array}{cc} I_r & 0 \\ 0 & 0 \end{array} \right]

where I_r is the r \times r identity matrix for some r.  Since matrix multiplication is associative, PAQ = (PA)Q.  Putting the matrix into row-reduced echelon form with the PA moves all the non-zero rows to the top with each row having a leading one and the rest of that column zeroed out.  When we multiply this by Q we put the matrix into column-reduced form which moves all the non-zero columns to the left and zeros out everything on in the rows with leading ones.  With this matrix we clearly have \dim(\text{CS}(PAQ)) = \dim(\text{RS}(PAQ)) = r.  We can show that when P and Q are both nonsingular that \dim(\text{CS}(A)) = \dim(\text{CS}(PAQ)), using the rank-nullity theorem which we’ve yet to prove.

June 10, 2009

Every Vector Space Has a Basis

Filed under: Algebra, Linear Algebra — cjohnson @ 8:12 pm
Tags: , , , ,

I just realized a few hours ago that the algebra prelim is six weeks from today, and went into “that’s way closer than I realized”-freakout mode.  Given that, I’d like to start posting on algebra.  In particular, for the next little bit I want to post on linear algebra since that’s what I’ve been studying the most recently.  A nice place to start might be showing that every vector space has a basis, since this is a fact that I remember being used throughout my linear algebra classes but don’t seem to remember ever being shown (presumably to avoid talking about the axiom of choice and Zorn’s lemma).  This proof is adopted from the one that appears in an appendix to Larry Grove’s Algebra.

Recall that Zorn’s lemma says that in a partially ordered set in which every chain has an upper bound, there exists a maximal element.  We will use this to show that there must be a maximal linearly independent subset in a vector space, and argue that this set is a basis.

Let V be any vector space and define the set L as the collection of all linearly independent subsets of V.

\displaystyle L := \{ S \subset V : S \text{ is a linearly independent set } \}

Note that as L is a collection of sets, it is partially ordered by \subseteq.  Now let C be any chain in L.  Each element of this chain, being a linearly independent set of vectors in V, forms the basis for some subspace of V.  Furthermore, if C_1 and C_2 are in the chain with C_1 \subseteq C_2, the subspace of V generated by C_1 is contained in the subspace generated by C_2.

Define a set U as follows:

\displaystyle U := \bigcup_{S \in C} S

Defining U in this way, we have a collection of vectors in V in which every subset is linearly independent, and so U \in L.  This means that C has an upper bound.  Since our choice of the chain C was arbitrary, this means every chain in L has an upper bound.  By Zorn’s lemma this means L has a maximal set.  Let M be such a maximal set (recall Zorn’s lemma says that L will have at least one maximal set, and not that the maximal set is unique).

Since M \in L, M is a linearly independent set of vectors in V.  This means that the span of M is some subspace of V.  Suppose this subspace is proper.  Then there exists some vector, call it v, which can not be represented as a linear combination of the elements in M.  This implies that M \cup \{v\} is a linearly independent set.  However, as M \subsetneq (M \cup \{v\}) this contradicts the maximality of M.  We must conclude that V equals the span of M, and so M forms a basis for V.

Extreme Points, Part 1 — Geometric Intuition

Filed under: Linear Programming — cjohnson @ 4:51 pm

Recall from earlier that in a linear program we have a set of constraints, Ax \leq b, each of which defines a half-space.  The intersection of all these half-spaces, along with our standing assumption x \geq 0 defines our feasible region: the region in space where satisfying all of our constraints.  In general, the intersection of a finite number of half-spaces is called a polyhedral set, and if the set is bounded set call it a polytope.

One thing to notice about polyhedral sets is that they’re convex.  To see this, suppose that our polyhedral set is defined by Ax \leq b.  Now let x_0 and x_1 be two points satisfying these constraints (in each of the corresponding half-spaces).  Let \lambda \in [0, 1].  We have the following.

\displaystyle A(\lambda x_0 + (1 - \lambda) x_1) = A (\lambda x_0) + A((1 - \lambda) x_1)

\displaystyle \, \, \, = \lambda A x_0 + (1 - \lambda) A x_1

\displaystyle \, \, \, \leq \lambda b + (1 - \lambda) b

\displaystyle \, \, \, = (\lambda + 1 - \lambda) b

\displaystyle \, \, \, = b

So A(\lambda x_0 + (1 - \lambda) x_1) \leq b and the set is convex.

Now, if we had only a few (two or three) variables, then we could explicitly graph the polyhedral set and we’d notice that it has some sharp “corners” when two lines (or planes) that form the boundary of a half-space intersect.  These points are of interest to us because if an optimal solution exists it must occur at one of these corner points.

As an example, consider the following linear program:

Maximize \displaystyle \frac{1}{2} x_1 + x_2

Subject to the following:

\displaystyle x_2 \leq 3

\displaystyle x_1 + x_2 \leq 5

\displaystyle x_1, x_2 \geq 0

Graphing these we have

Extreme Points

The extreme points for our feasible region are the five corners above, each of which is the intersection of two of the lines determined by our constraints.  Now let’s consider all of the points where our objective function would be equal to 2.  These are the points satisfying the equation \frac{1}{2} x_1 + x_2 = 2 — all such points lie on a line, and in particular are the points on the red line below.

z=2Note that if we were to consider the points where our objective function is 3 (the points lying on the line \frac{1}{2} x_1 + x_2 = 3), we’d have a similar picture, but with our line simply shifted up one unit:

z=3Remember that our goal is to maximize our objective function while satisfying all of the constraints / staying within the feasible region.  Geometrically, then, what we should do is move this line up as far as we can — as our objective increases when we shift the line up — and stay in the feasible region.  Each of these lines is called an objective contour.  Looking at the set of all possible values the objective function could take on we have something similar to the following (I say “something similar” because the picture would just be solid red if we looked at all the contours).

contoursFrom this picture we see that the optimal solution occurs at the top-right extreme point.  Since an extreme point is the intersection of two lines, we can simply calculate the intersection point of those two lines to see to find the precise point where optimality is obtained (these are called binding constraints since they’re satisfied by equality).

Those lines are x_2 = 3 and x_1 + x_2 = 5, so optimality occurs at (x_1, x_2) = (2, 3) and our objective function takes on the value 4.

This geometric argument is nice and intuitive, but falls apart if we have more than three variables (since we’re in at least four dimensional space where such pictures can’t be drawn).  In the next post we’ll see how to characterize the extreme points of the feasible region algebraically.  Later we’ll see how to traverse the extreme points of the feasible region, using that algebraic description, in such a way that we always move to a “closer to optimal” extreme point, until we finally get there (or determine no such optimal point exists).

June 9, 2009

An Quick Aside on Convexity

Filed under: General, Linear Programming — cjohnson @ 9:19 pm

I’ve been meaning to write a post on the extreme points of a feasible set for the last few days, and in that post I need to talk about convexity.  Originally I was going to just define convex set and convex combination inside of that post, but decided it might be better for those to be defined in their own post.  It’s been busy at school the last couple of days, but I’m still hoping to get a couple more LP posts written this week.

Intuitively a convex set is one which doesn’t have any “inward bumps.”  In the image below, the region on the left is convex while the region on the right is not.Convex Sets

To say the set doesn’t have any “inward bumps” is rather vague, though.  A more precise notion is to say that for any two points in the set, the line segment connecting those points lies entirely in the set.  We see the region on the left has this property, while the region on the right does not: pick two points in opposite “legs,” and the line segment joining them will not lie completely within the region.  Of course, when we start talking about higher dimensional spaces simply drawing the line segment doesn’t really work.  What we’ll do instead is pick tow points, say x_0 and x_1 and take a convex combination of these points.  If all such convex combinations lie in the set, the set is convex.

By a convex combination of the points x_0 and x_1, we mean the point \lambda x_0 + (1 - \lambda) x_1 where \lambda \in [0, 1].  (Recall that we’re dealing with points inside of a vector space, so by adding two points we’re adding the associated vectors.)  One important thing to note about convex sets is that if X is convex, then for any finite collection of points x_1, x_2, ..., x_n \in X we have that

\displaystyle \sum_{i=1}^n \lambda_i x_i \in X

where

\displaystyle \sum_{i=1}^n \lambda_i = 1

\displaystyle \lambda_1, ..., \lambda_n \in [0, 1]

We will show this inductively where our base case, n = 2, follows immediately from our definition of convexity.  Now assume for the result holds for n.  We will show the result holds for n+1:  Let x_1, ..., x_{n+1} \in X and \lambda_1, ..., \lambda_{n+1} \in [0, 1] with \sum_{i=1}^{n+1} \lambda_i = 1.  We begin by rewriting our sum:

\displaystyle \sum_{i=1}^{n+1} \lambda_i x_i = \lambda_1 x_1 + ... + \lambda_n x_n + \lambda_{n+1} x_{n+1}

\displaystyle = \frac{1 - \lambda_{n+1}}{1 - \lambda_{n+1}} \left( \lambda_1 x_1 + ... \lambda_n x_n \right) + \lambda_{n+1} x_{n+1}

\displaystyle = (1 - \lambda_{n+1}) \left( \frac{\lambda_1}{1 - \lambda_{n+1}} x_1 + ... + \frac{\lambda_n}{1 - \lambda_{n+1}} x_n \right) + \lambda_{n+1} x_{n+1}

Notice that \sum_{i=1}^{n+1} \lambda_i = 1 implies \sum_{i=1}^n \lambda_i = 1 - \lambda_{n+1}, so the sum of the coefficients in the expression inside parenthesis is one.  Since each of these x_i points is in our convex set X, we have a convex combination and thus the sum is in the set — call this point \bar{x}.  Now we have (1 - \lambda_{n+1}) \bar{x} + \lambda_{n+1} x_{n+1} is a convex combination of points inside X, and so we have our result.

June 5, 2009

LP Part 3: The Feasible Region

Filed under: Linear Programming — cjohnson @ 3:18 pm

Suppose we want to consider all of the points in the plane that satisfy the inequality y \leq x + 1.  What does such a set look like? Well, one way to visualize it is to draw the line y = x + 1 and then take everything under or on the line.  This is an example of a half-space: a “line” that cuts the space in two, and then we take everything on one side of that “line.”  I write “line” because in higher dimensions this isn’t actually a line.  In three dimensions, for instance, it’s a plane.  In four dimensions it’s a hyperplane.  We’ll use the word hyperplane to just mean a linear equality regardless of dimension.  Recall that a linear equality is simply an equation of the form c_1 x_1 + c_2 x_2 + ... + c_n x_n = b; no exponents, no logs or square roots, just constant multiples added up.  And so a half-space is everything on one side of a hyperplane.

In a linear program we tend to have a few of these linear inequalities  in the form of our constraints.  When we actually write out Ax \leq b, we see that each row in A corresponds to a particular inequality and so each constraint defines a half-space.  We’re only concerned with points that satisfy all of the constraints, however, and thus want our optimal solution to occur within the intersection of these half-spaces.  This intersection of all the half-spaces defined by our constraints is called the feasible region of the program.

As far as the feasible region is concerned, there are four cases we need to be aware of when it comes time to actually solve a linear program: the feasible region is empty; the feasible region is not empty, but is bounded; the region is not bounded, but the objective function is;  neither the feasible region nor the objection function are bounded.

The first case occurs when our constraints “cancel each other out.”  For instance if we had the constraints 3x_1 + 2x_2 - 5x_3 \geq 3 and 3x_1 + 2x_2 + 5x_3 \leq 2.  The first constraint puts us on one half of a hyperplane, and then the second tries to get us back on the other side.  Since both of these inequalities can’t hold at the same time, there are no points inside our feasible region.  That’s a simple example of an infeasible LP.  Another, slightly more complicated example occurs if our constraints are

\left[ \begin{array}{cc} 1 & -1 \\ -1 & -1 \\ 0 & 1 \end{array} \right] \left[ \begin{array}{c} x_1 \\ x_2 \end{array} \right] \geq \left[ \begin{array}{c} -1 \\ -2 \\ 2 \end{array}\right]

Multiplying this out, we have the following.

An Infeasible Region

An Infeasible Region

In the above picture, areas that satisfy an inequality are shaded in blue, and areas satisfying two equalities are in a dark blue (the shading is additive).  Notice that no area is in a blue any darker than the areas satisfying two inequalities; no area satisfies all three inequalities.  Since no point satisfies all three inequalities, our feasible region is empty.

We’ll say the feasible region is bounded if we can place a circle of finite radius around it, encompassing the entire feasible region.  If the feasible region is bounded, then our objective must be bounded.  As we will see later, our objective is minimized or maximized only at “corners” of the feasible region.  Such a corner occurs only when some number of our constraints are satisfied by equality (the particular number depending on the dimension of the problem).  In two dimensions, for instance, corners occur when two lines cross, and we’re only on a line if we satisfy a constraint with equality.  In three dimensions we’d need three planes to intersect.  Since we only concern ourselves with problems which have a finite number of constraints, we only have a finite number of these corners (this number may be huge, though, so just enumerating all possible corners isn’t a reasonable strategy).

If our feasible region is unbounded, however, our objective function may or may not be bounded.  It could be that we can make the objective arbitrarily large by picking numbers that are farther and farther out, but it could also be that picking huge numbers isn’t particularly helpful.  For example, suppose our only restrictions were x_1, x_2 \geq 0, so we can pick any pair of points in the first quadrant.  If our goal is to maximize x_1 + x_2, then we can make this value as big as we want by picking big enough numbers.  If our goal is to minimize x_1 + x_2, however, then the best we can do is to pick x_1 = x_2 = 0.

For a slightly less trivial example, suppose we want to maximize x_2 - x_1 subject to the constraints x_1 \geq 0, x_2 \geq 0, x_2 \leq 1.  Now our feasible region is an infinite strip, a portion of which is given below.

Unbounded Feasible Region, Bounded Objective

Despite being able to pick very large numbers, our maximum occurs at the point x_1 = 0, x_2 = 2.  If we were to increase x_1 any our value would drop since we’re subtracting x_1, and we’ve already made x_2 as large as we can.  So we have a bounded objective function, even though our feasible region is unbounded.  We will generally refer to a linear program as being bounded if it’s objective function is bounded, and unbounded if it’s objective function is unbounded.  Note that for the LP to be unbounded, the feasible region must be unbounded.  If the LP is bounded, however, the feasible region may be either bounded or unbounded.

In the next post we’ll look at the extreme points of the feasible region, which are the “corners” alluded to earlier, and give an argument for why the maximum/minimum must occur at an extreme point.

June 4, 2009

Linear Programming, Part II – Slack and Surplus

Filed under: Linear Programming — cjohnson @ 6:57 pm

Recall from last time that a linear program is an optimization problem of the form

Maximize c^T x

Subject to Ax \leq b and x \geq 0

Where the system of constraints Ax \leq b is just a convenient way to write something of the form

\displaystyle a_{11} x_1 + a_{12} x_2 + ... + a_{1n} x_n \leq b_1

\displaystyle a_{21} x_1 + a_{22} x_2 + ... + a_{2n} x_n \leq b_2

\displaystyle \vdots

\displaystyle a_{m1} x_1 + a_{m2} x_2 + ... + a_{mn} x_n \leq b_m

Notice that our use of maximization, instead of minimization, is immaterial as any maximization problem can be turned into an equivalent minimization problem and vice versa.  The problem of maximizing f(x) is equivalent (has precisely the same solutions) as the problem of minimizing -f(x).

Likewise, our use of \leq inequalities instead of \geq is more a matter of taste and convenience than necessity: f(x) \leq b is the same thing as -f(x) \geq -b.  What we would like to do, though, is get rid of those inequalities altogheter.  If we can turn these constraints into a system of equations of the form Ax = b, then we’ll be able to apply techniques from linear algebra to help us manipulate the LP (linear program).  Later we’ll see how to cleverly use elementary row operations to traverse the set of potential optimal solutions.  But first we have to lose those nasty little inequalities.

So here’s the idea: say we have x \leq 5.  We can turn this into an equation by inserting a non-negative variable, y, which equals 5 - x.  That is, y takes up the slack left by x, and x \leq 5 becomes x + y = 5.  Algebraically, if we set y = 5 - x, we have x + y = x + (5 - x) = 5 and that’s why we have equality.

If our inequality goes the other way, say x \geq 7, we’ll just subtract off the surplus: set y = x - 7 (note that this means y \geq 0) and so x \geq 7 becomes x - y = 7.  So in the system of inequalities above we can get a system of equations by tacking on a slack, or taking out the surplus, for each inequality.  The earlier system then becomes

\displaystyle a_{11} x_1 + a_{12} x_2 + ... + a_{1n} x_n + s_1 = b_1

\displaystyle a_{21} x_1 + a_{22} x_2 + ... + a_{2n} x_n + s_2 = b_2

\displaystyle \vdots

\displaystyle a_{m1} x_1 + a_{m2} x_2 + ... + a_{mn} x_n + s_m = b_m

Where s_i is the slack (surplus) we add (subtract) for the i-th constraint.  Now our matrices will of course have to change a little bit to take these new variables into account.  Originally, A was an m \times n matrix, x was a n-vector, and b was a m-vector.  For our system of equations we’ll let \hat{A} = [ A, \, I ] and \hat{x} = [ x, \, s_1, \, ..., \, s_m ]^T where I is the m \times m identity matrix; note that b doesn’t change.  Note too that we append the identity because we have only slack variables; if some of those had been surplus variables, the corresponding entries we append would be negative.

So by throwing in an extra variable for each inequality constraint we have a system of equations which is amenable to the techniques of linear algebra.  If it were to happen that the resultant matrix was invertible (which requires it to be square — an equal number of equations and unknowns) then we could just multiply by the inverse and find the solution.  This doesn’t work for our general problem though since there will normally be infinitely many possible solutions.  Our goal will be to weed out the best solution from those possibilities.  In the next post we’ll look at this space of possible solutions.

June 3, 2009

An Introduction to Linear Programming, Part I

Filed under: Linear Programming — cjohnson @ 8:42 pm

In a first semester calculus class you learn how to optimize a differentiable function by looking at what happens where the derivative is zero (and the endpoints if you’re concerned with a closed interval).  The idea being what comes up must come down: if the function goes up and then goes down, a maximum must occur somewhere between (and similarly for a minimization problem), so you look for the precise point that “up” turns to “down.”  What I want to talk about in the next few posts is how to maximize (or minimize) a linear function in the presence of linear constraints.  (Note that as it’s taught in calculus, such optimization problems have no additional constraints.)

That is, we’ll look at functions of the form c^T x where c and x are vectors in \mathbb{R}^n.  The c^T x is the dot product of these two vectors, and is just a convenient way of writing c_1 x_1 + c_2 x_2 + ... + c_n x_n.  So the x_i are our variables and the c_i are constants normally referred to as cost coefficients.

An example of such a function is the cost of, say, building a house.  Let’s suppose x_1 is the amount of lumber used which costs $2 per pound (I have no idea how much lumber costs, so you’ll have to excuse me if that’s a completely ridiculous value); x_2 is the amount of brick which we’ll assume is $3 per pound; and let’s say x_3 is the amount of roofing we use and it costs $5 per pound.  Here our cost for building the house would be 2x_1 + 3x_2 + 5x_3, and so we could just say c = [2,\, 3,\,5]^T and x = [x_1,\, x_2,\, x_3]^T.  This function, c^T x, we will refer to as our objective function, and our goal will be to make this as small (or large) as possible given certain restrictions (linear constraints).

Now, by a linear constraint we mean that there are limits on the resources we use, and we can express these limits in the same way we expressed the cost.  In our house building example, for instance, we may not be able to use any more than 2000lb of brick, so we’d have a constraint x_2 \leq 2000.  Similarly let’s assume we must use less than 1500lb of lumber and 300lb of roofing materials: x_1 \leq 1500, and x_3 \leq 300.  If we further assume that our house must weigh at least 2100lb (for whatever reason) then we’d have the constraint x_1 + x_2 + x_3 \geq 2100.  Just to make things more complicated, let’s assume we have to use at least 50lb of roofing materials: x_3 \geq 50.  (I know this is a goofy example with silly constraints, but it still serves the purpose of showing us what a linear programming problem is.  The last two constraints are there only so we don’t have a trivial solution.)

Now our linear program is simply the following:

Minimize 2x_1 + 3x_2 + 5x_5

Subject to the following:

x_1 \leq 1500

x_2 \leq 2000

x_3 \leq 300

x_1 + x_2 + x_3 \geq 2100

x_3 \geq 50

x_1, x_2, x_3 \geq 0

This last constraint simply says that we can’t have a negative amount of brick, lumber, or roof.

So, our problem is that we want to minimize the cost of building a house when we’re faced with certain restrictions on amounts of material we can use.  (Now you should see why the last two constraints are needed to prevent a trivial solution: if you want to minimize the cost and there’s nothing telling you can’t, just use zero brick, zero lumber, and zero roofing!)

It is generally convenient to rewrite this set of constraints as a matrix.  In our particular problem we could set up the matrix

A = \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & -1 & -1 \\ 0 & 0 & -1 \end{array} \right]

Then if we let b = [1500,\, 2000,\, 300,\, -2100,\, -50], our constraints could be expressed simply as

Ax \leq b

x\geq 0

Where we will adopt the convention that if \alpha and \beta are vectors of the same size, \alpha \leq \beta means that each component of \alpha is less than or equal to the corresponding component of \beta; \alpha_i \leq \beta_i for all i.  (Notice that in order to write our problem like this we had to multiply the last two inequalities by -1 so that we’d have \leq instead of the \geq we started with.)

This is all linear programming means: maximize (or minimize) the function c^T x under the condition that Ax \leq b and x \geq 0.  The term linear programming, by the way, is due to George Dantzig who studied such problems while trying to optimize schedules for the US Air Force in the 1940s.  The Air Force referred to their schedules as “programs,” and so such problems came to be known as linear programs.

In the next post we’ll discuss slack and surplus variables which allow us to turn the inequalities above into equalities, which will be necessary for us to solve linear programs algorithmically.

March 18, 2009

The Fundamental Group, Part II – The Basepoint

Filed under: Topology — cjohnson @ 2:07 pm

We constructed the fundamental group by looking at the homotopy classes of loops with a fixed basepoint.  The question that naturally arises is how does the fundamental group change if we move the basepoint.  To begin let’s suppose we’re in a path connected space X and we have two basepoints we’re concerned about, say x_0 and x_1.  Since X is path connected, there exists some path f from x_0 to x_1.  We can use this to convert x_1 loops into x_0 loops.

What we’ll do is take a loop, say g, based at x_1 and use our path to get from x_0 to x_1, follow the loop, then take the reverse path back to x_0.  This is illustrated in the image below (adapted from an image in Hatcher’s book).

Connected Loops

This new path f * g * \overline{f} is a loop based at x_0.  Now, if two x_1-loops, say g_1 and g_2 are homotopic, then so are the associated x_0-loops: g_1 \simeq g_2 \implies f * g_1 * \overline{f} \simeq f * g_2 * \overline{f}.

If we now construct a map \beta_f : \pi_1(X, x_1) \to \pi_1(X, x_0) by setting \beta_f [g] = [f * g * \overline{f}] we see that the map will be well-defined since our construction of the x_0-loops respects homotopy.  But what happens when we concatenate two x_1-loops?

Suppose g and h are loops based at x_1.  We want to see what this map does to the homotopy class of g * h.

\displaystyle \beta_f([g] \cdot [h]) = \beta_f[g * h]

\displaystyle = [f * g * h * \overline{f} ]

\displaystyle = [f * g * \overline{f} * f * h * \overline{f} ]

\displaystyle = [f * g * \overline{f}] \cdot [f * h * \overline{f}]

\displaystyle = \beta_f [g] \cdot \beta_f [h]

So our \beta_f map is in fact a homomorphism.  Notice that f * g * \overline{f} \simeq f * h * \overline{f} implies g \simeq h, which means that our map is injective.

To show the map is surjective, let h be an x_0-loop, then \overline{f} * h * f is an x_1-loop.  Applying \beta_f we have \beta_f[\overline{f} * h * f] = [f * \overline{f} * h * f * \overline{f}] = [h], and so our map is surjective.  Together with injectivity above we have shown that in a path connected space the fundamental group of \pi_1(X, x_0) and \pi_1(X, x_1) are isomorphic.  We then simply write \pi_1(X) since the choice of basepoint is basically irrelevant.

If our space is not path connected, but x_0 and x_1 are in the same path component, the above gives us that \pi_1(X, x_0) \cong \pi_1(X, x_1).

An Aside on Path Connectedness

Filed under: Topology — cjohnson @ 9:31 am

Long ago we talked about topological spaces being connected, which meant we couldn’t partition the space into two disjoint non-empty open sets.  There is a stronger notion that we’ll need when working with the fundamental group called path connectedness.  We say that a space X is path connected if for every pair of points x, y \in X there exists a path in the space, f : [0, 1] \to X, such that f(0) = x and f(1) = y; we can connect every pair of points with a path.

If a space is not path connected, it can be decomposed into a collection of disjoint subsets with are path connected.  These subsets are called the path components of the space.  More specifically, path connectedness is an equivalence relation on the space and the path components are the equivalence classes.  Clearly every point is path connected to itself (reflexivity); if x and y are connected by path f, then the reverse path \overline{f} connects y and x (symmetry); and the concatenation operation from last time gives us transitivity.

We said that path connectedness was a stronger condition than connectedness.  This means that path connectedness implies regular ol’ connectedness.  To see this, suppose that X is path connected.  Let u, v \in X be connected by path f.  Let U be an open set containing u, and V an open set containing v.  Now consider the preimages f^{-1}(U) and f^{-1}(V).  Since f is a path it is continuous, so these are two open subsets of [0, 1].  We must have f^{-1}(U) \cup f^{-1}(V) = [0, 1] since f(0) = u and f(1) = v.  We know that [0, 1] is connected, however.  This implies that U \cup V must be connected (otherwise their preimages would be disjoint open sets covering [0, 1]).  Since U and V were arbitrary, we can generalize to get that X is connected.

Notice that this also means that a continuous path must stay inside of one path component.

March 17, 2009

The Fundamental Group, Part I – Connecting Paths

Filed under: Topology — cjohnson @ 4:27 pm

Imagine that you have two paths, f : [0, 1] \to X and g : [0, 1] \to X.  If it should happen that the terminal point of f is the initial point of g (that is, if f(1) = g(0)), then we can basically concatenate f and g into a new path h.  To do this formally we need to define h : [0, 1] \to X, which means we have to reparameterize f and g a little bit.  We’ll define h so that it behaves as f on [0, 1/2] and as g on [1/2, 1] (notice this will be well-defined since we’re assuming the terminal point of f is the initial point of g).

\displaystyle h(t) = \left\{ \begin{array}{ll} f(2t) &: t \in [0, 1/2] \\ g(2t - 1) &: t \in [1/2, 1] \end{array} \right.

We will let the asterisk denote this concatenation operation: f * g = h.

In the case that f and g are loops with the same basepoint, we’re just gluing the two loops together.  In the image below the basepoint is represented by the red dot.  We’re taking these two loops and gluing them together so that we first wind around the first loop (the one on the left) and then around the second loop.

Concatenated Loops

Note that the red dot is the same basepoint for both loops, even though in our picture it doesn’t stay aligned (just for graphical convenience).

If f, g and h are loops (with the same basepoint) and f \simeq g, we can show that f * h \simeq g * h.  Letting \{ f_t : [0, 1] \to X \} be the homotopy taking f to g, we construct the homotopy \{ h_t : [0, 1] \to X \} by defining h_t = (f_t * h).  This always behaves as h on the “second half” of the loop, and deforms to g on the first half.  Of course, this also can be used to show that h * f \simeq h * g.

We can now use * to help us form an operation on the homotopy classes of loops.  We will simply define {}[f] \cdot [g] as {}[f * g].  This operation is clearly associative, since our concatenation is associative. The identity element of this operation is the homotopy class of the constant function.  Keeping in mind that these are loops, the constant function is the function that maps everything to the basepoint.  Let c denote this map.  Then we have that {}[f] \cdot [c] = [f * c] = [f], where the last equality follows from the fact that with f * c we traverse f twice as fast as normal, then just sit at the basepoint.  This is homotopic to f, and so f and f * c live in the same homotopy class.

Finally, our operation is invertible.  To define the inverse operation we will take a loop f and will let \overline{f} denote the reverse loop: \overline{f}(t) = f(1 - t).  This reverse loop will have the same shape as the initial loop, but traverse the loop in the opposite direction.  Now notice that f * \overline{f} can be compressed into the constant map by constructing a homotopy that traverses most of f, then goes ahead and starts coming back through \overline{f}.  This homotopy slowly reels f * \overline{f} back to the basepoint, which eventually turns it into the constant map.  Since f * \overline{f} \simeq c (c being the constant basepoint map), we have that {}[f*\overline{f}] = [c], and similarly {}[\overline{f}*f] = [c].

So, our operation is associtiative, has an identity, and is invertible, so we have a group of the homotopy classes of loops for a fixed basepoint.  This is known as the fundamental group and is denoted \pi_1(X, x_0), where X is the space and x_0 is the basepoint.  In the next post we’ll look at what happens when the basepoint changes.

« Previous PageNext Page »

Blog at WordPress.com.