Mathematics Prelims

June 10, 2009

Extreme Points, Part 1 — Geometric Intuition

Filed under: Linear Programming — cjohnson @ 4:51 pm

Recall from earlier that in a linear program we have a set of constraints, Ax \leq b, each of which defines a half-space.  The intersection of all these half-spaces, along with our standing assumption x \geq 0 defines our feasible region: the region in space where satisfying all of our constraints.  In general, the intersection of a finite number of half-spaces is called a polyhedral set, and if the set is bounded set call it a polytope.

One thing to notice about polyhedral sets is that they’re convex.  To see this, suppose that our polyhedral set is defined by Ax \leq b.  Now let x_0 and x_1 be two points satisfying these constraints (in each of the corresponding half-spaces).  Let \lambda \in [0, 1].  We have the following.

\displaystyle A(\lambda x_0 + (1 - \lambda) x_1) = A (\lambda x_0) + A((1 - \lambda) x_1)

\displaystyle \, \, \, = \lambda A x_0 + (1 - \lambda) A x_1

\displaystyle \, \, \, \leq \lambda b + (1 - \lambda) b

\displaystyle \, \, \, = (\lambda + 1 - \lambda) b

\displaystyle \, \, \, = b

So A(\lambda x_0 + (1 - \lambda) x_1) \leq b and the set is convex.

Now, if we had only a few (two or three) variables, then we could explicitly graph the polyhedral set and we’d notice that it has some sharp “corners” when two lines (or planes) that form the boundary of a half-space intersect.  These points are of interest to us because if an optimal solution exists it must occur at one of these corner points.

As an example, consider the following linear program:

Maximize \displaystyle \frac{1}{2} x_1 + x_2

Subject to the following:

\displaystyle x_2 \leq 3

\displaystyle x_1 + x_2 \leq 5

\displaystyle x_1, x_2 \geq 0

Graphing these we have

Extreme Points

The extreme points for our feasible region are the five corners above, each of which is the intersection of two of the lines determined by our constraints.  Now let’s consider all of the points where our objective function would be equal to 2.  These are the points satisfying the equation \frac{1}{2} x_1 + x_2 = 2 — all such points lie on a line, and in particular are the points on the red line below.

z=2Note that if we were to consider the points where our objective function is 3 (the points lying on the line \frac{1}{2} x_1 + x_2 = 3), we’d have a similar picture, but with our line simply shifted up one unit:

z=3Remember that our goal is to maximize our objective function while satisfying all of the constraints / staying within the feasible region.  Geometrically, then, what we should do is move this line up as far as we can — as our objective increases when we shift the line up — and stay in the feasible region.  Each of these lines is called an objective contour.  Looking at the set of all possible values the objective function could take on we have something similar to the following (I say “something similar” because the picture would just be solid red if we looked at all the contours).

contoursFrom this picture we see that the optimal solution occurs at the top-right extreme point.  Since an extreme point is the intersection of two lines, we can simply calculate the intersection point of those two lines to see to find the precise point where optimality is obtained (these are called binding constraints since they’re satisfied by equality).

Those lines are x_2 = 3 and x_1 + x_2 = 5, so optimality occurs at (x_1, x_2) = (2, 3) and our objective function takes on the value 4.

This geometric argument is nice and intuitive, but falls apart if we have more than three variables (since we’re in at least four dimensional space where such pictures can’t be drawn).  In the next post we’ll see how to characterize the extreme points of the feasible region algebraically.  Later we’ll see how to traverse the extreme points of the feasible region, using that algebraic description, in such a way that we always move to a “closer to optimal” extreme point, until we finally get there (or determine no such optimal point exists).

June 9, 2009

An Quick Aside on Convexity

Filed under: General, Linear Programming — cjohnson @ 9:19 pm

I’ve been meaning to write a post on the extreme points of a feasible set for the last few days, and in that post I need to talk about convexity.  Originally I was going to just define convex set and convex combination inside of that post, but decided it might be better for those to be defined in their own post.  It’s been busy at school the last couple of days, but I’m still hoping to get a couple more LP posts written this week.

Intuitively a convex set is one which doesn’t have any “inward bumps.”  In the image below, the region on the left is convex while the region on the right is not.Convex Sets

To say the set doesn’t have any “inward bumps” is rather vague, though.  A more precise notion is to say that for any two points in the set, the line segment connecting those points lies entirely in the set.  We see the region on the left has this property, while the region on the right does not: pick two points in opposite “legs,” and the line segment joining them will not lie completely within the region.  Of course, when we start talking about higher dimensional spaces simply drawing the line segment doesn’t really work.  What we’ll do instead is pick tow points, say x_0 and x_1 and take a convex combination of these points.  If all such convex combinations lie in the set, the set is convex.

By a convex combination of the points x_0 and x_1, we mean the point \lambda x_0 + (1 - \lambda) x_1 where \lambda \in [0, 1].  (Recall that we’re dealing with points inside of a vector space, so by adding two points we’re adding the associated vectors.)  One important thing to note about convex sets is that if X is convex, then for any finite collection of points x_1, x_2, ..., x_n \in X we have that

\displaystyle \sum_{i=1}^n \lambda_i x_i \in X

where

\displaystyle \sum_{i=1}^n \lambda_i = 1

\displaystyle \lambda_1, ..., \lambda_n \in [0, 1]

We will show this inductively where our base case, n = 2, follows immediately from our definition of convexity.  Now assume for the result holds for n.  We will show the result holds for n+1:  Let x_1, ..., x_{n+1} \in X and \lambda_1, ..., \lambda_{n+1} \in [0, 1] with \sum_{i=1}^{n+1} \lambda_i = 1.  We begin by rewriting our sum:

\displaystyle \sum_{i=1}^{n+1} \lambda_i x_i = \lambda_1 x_1 + ... + \lambda_n x_n + \lambda_{n+1} x_{n+1}

\displaystyle = \frac{1 - \lambda_{n+1}}{1 - \lambda_{n+1}} \left( \lambda_1 x_1 + ... \lambda_n x_n \right) + \lambda_{n+1} x_{n+1}

\displaystyle = (1 - \lambda_{n+1}) \left( \frac{\lambda_1}{1 - \lambda_{n+1}} x_1 + ... + \frac{\lambda_n}{1 - \lambda_{n+1}} x_n \right) + \lambda_{n+1} x_{n+1}

Notice that \sum_{i=1}^{n+1} \lambda_i = 1 implies \sum_{i=1}^n \lambda_i = 1 - \lambda_{n+1}, so the sum of the coefficients in the expression inside parenthesis is one.  Since each of these x_i points is in our convex set X, we have a convex combination and thus the sum is in the set — call this point \bar{x}.  Now we have (1 - \lambda_{n+1}) \bar{x} + \lambda_{n+1} x_{n+1} is a convex combination of points inside X, and so we have our result.

June 5, 2009

LP Part 3: The Feasible Region

Filed under: Linear Programming — cjohnson @ 3:18 pm

Suppose we want to consider all of the points in the plane that satisfy the inequality y \leq x + 1.  What does such a set look like? Well, one way to visualize it is to draw the line y = x + 1 and then take everything under or on the line.  This is an example of a half-space: a “line” that cuts the space in two, and then we take everything on one side of that “line.”  I write “line” because in higher dimensions this isn’t actually a line.  In three dimensions, for instance, it’s a plane.  In four dimensions it’s a hyperplane.  We’ll use the word hyperplane to just mean a linear equality regardless of dimension.  Recall that a linear equality is simply an equation of the form c_1 x_1 + c_2 x_2 + ... + c_n x_n = b; no exponents, no logs or square roots, just constant multiples added up.  And so a half-space is everything on one side of a hyperplane.

In a linear program we tend to have a few of these linear inequalities  in the form of our constraints.  When we actually write out Ax \leq b, we see that each row in A corresponds to a particular inequality and so each constraint defines a half-space.  We’re only concerned with points that satisfy all of the constraints, however, and thus want our optimal solution to occur within the intersection of these half-spaces.  This intersection of all the half-spaces defined by our constraints is called the feasible region of the program.

As far as the feasible region is concerned, there are four cases we need to be aware of when it comes time to actually solve a linear program: the feasible region is empty; the feasible region is not empty, but is bounded; the region is not bounded, but the objective function is;  neither the feasible region nor the objection function are bounded.

The first case occurs when our constraints “cancel each other out.”  For instance if we had the constraints 3x_1 + 2x_2 - 5x_3 \geq 3 and 3x_1 + 2x_2 + 5x_3 \leq 2.  The first constraint puts us on one half of a hyperplane, and then the second tries to get us back on the other side.  Since both of these inequalities can’t hold at the same time, there are no points inside our feasible region.  That’s a simple example of an infeasible LP.  Another, slightly more complicated example occurs if our constraints are

\left[ \begin{array}{cc} 1 & -1 \\ -1 & -1 \\ 0 & 1 \end{array} \right] \left[ \begin{array}{c} x_1 \\ x_2 \end{array} \right] \geq \left[ \begin{array}{c} -1 \\ -2 \\ 2 \end{array}\right]

Multiplying this out, we have the following.

An Infeasible Region

An Infeasible Region

In the above picture, areas that satisfy an inequality are shaded in blue, and areas satisfying two equalities are in a dark blue (the shading is additive).  Notice that no area is in a blue any darker than the areas satisfying two inequalities; no area satisfies all three inequalities.  Since no point satisfies all three inequalities, our feasible region is empty.

We’ll say the feasible region is bounded if we can place a circle of finite radius around it, encompassing the entire feasible region.  If the feasible region is bounded, then our objective must be bounded.  As we will see later, our objective is minimized or maximized only at “corners” of the feasible region.  Such a corner occurs only when some number of our constraints are satisfied by equality (the particular number depending on the dimension of the problem).  In two dimensions, for instance, corners occur when two lines cross, and we’re only on a line if we satisfy a constraint with equality.  In three dimensions we’d need three planes to intersect.  Since we only concern ourselves with problems which have a finite number of constraints, we only have a finite number of these corners (this number may be huge, though, so just enumerating all possible corners isn’t a reasonable strategy).

If our feasible region is unbounded, however, our objective function may or may not be bounded.  It could be that we can make the objective arbitrarily large by picking numbers that are farther and farther out, but it could also be that picking huge numbers isn’t particularly helpful.  For example, suppose our only restrictions were x_1, x_2 \geq 0, so we can pick any pair of points in the first quadrant.  If our goal is to maximize x_1 + x_2, then we can make this value as big as we want by picking big enough numbers.  If our goal is to minimize x_1 + x_2, however, then the best we can do is to pick x_1 = x_2 = 0.

For a slightly less trivial example, suppose we want to maximize x_2 - x_1 subject to the constraints x_1 \geq 0, x_2 \geq 0, x_2 \leq 1.  Now our feasible region is an infinite strip, a portion of which is given below.

Unbounded Feasible Region, Bounded Objective

Despite being able to pick very large numbers, our maximum occurs at the point x_1 = 0, x_2 = 2.  If we were to increase x_1 any our value would drop since we’re subtracting x_1, and we’ve already made x_2 as large as we can.  So we have a bounded objective function, even though our feasible region is unbounded.  We will generally refer to a linear program as being bounded if it’s objective function is bounded, and unbounded if it’s objective function is unbounded.  Note that for the LP to be unbounded, the feasible region must be unbounded.  If the LP is bounded, however, the feasible region may be either bounded or unbounded.

In the next post we’ll look at the extreme points of the feasible region, which are the “corners” alluded to earlier, and give an argument for why the maximum/minimum must occur at an extreme point.

June 4, 2009

Linear Programming, Part II – Slack and Surplus

Filed under: Linear Programming — cjohnson @ 6:57 pm

Recall from last time that a linear program is an optimization problem of the form

Maximize c^T x

Subject to Ax \leq b and x \geq 0

Where the system of constraints Ax \leq b is just a convenient way to write something of the form

\displaystyle a_{11} x_1 + a_{12} x_2 + ... + a_{1n} x_n \leq b_1

\displaystyle a_{21} x_1 + a_{22} x_2 + ... + a_{2n} x_n \leq b_2

\displaystyle \vdots

\displaystyle a_{m1} x_1 + a_{m2} x_2 + ... + a_{mn} x_n \leq b_m

Notice that our use of maximization, instead of minimization, is immaterial as any maximization problem can be turned into an equivalent minimization problem and vice versa.  The problem of maximizing f(x) is equivalent (has precisely the same solutions) as the problem of minimizing -f(x).

Likewise, our use of \leq inequalities instead of \geq is more a matter of taste and convenience than necessity: f(x) \leq b is the same thing as -f(x) \geq -b.  What we would like to do, though, is get rid of those inequalities altogheter.  If we can turn these constraints into a system of equations of the form Ax = b, then we’ll be able to apply techniques from linear algebra to help us manipulate the LP (linear program).  Later we’ll see how to cleverly use elementary row operations to traverse the set of potential optimal solutions.  But first we have to lose those nasty little inequalities.

So here’s the idea: say we have x \leq 5.  We can turn this into an equation by inserting a non-negative variable, y, which equals 5 - x.  That is, y takes up the slack left by x, and x \leq 5 becomes x + y = 5.  Algebraically, if we set y = 5 - x, we have x + y = x + (5 - x) = 5 and that’s why we have equality.

If our inequality goes the other way, say x \geq 7, we’ll just subtract off the surplus: set y = x - 7 (note that this means y \geq 0) and so x \geq 7 becomes x - y = 7.  So in the system of inequalities above we can get a system of equations by tacking on a slack, or taking out the surplus, for each inequality.  The earlier system then becomes

\displaystyle a_{11} x_1 + a_{12} x_2 + ... + a_{1n} x_n + s_1 = b_1

\displaystyle a_{21} x_1 + a_{22} x_2 + ... + a_{2n} x_n + s_2 = b_2

\displaystyle \vdots

\displaystyle a_{m1} x_1 + a_{m2} x_2 + ... + a_{mn} x_n + s_m = b_m

Where s_i is the slack (surplus) we add (subtract) for the i-th constraint.  Now our matrices will of course have to change a little bit to take these new variables into account.  Originally, A was an m \times n matrix, x was a n-vector, and b was a m-vector.  For our system of equations we’ll let \hat{A} = [ A, \, I ] and \hat{x} = [ x, \, s_1, \, ..., \, s_m ]^T where I is the m \times m identity matrix; note that b doesn’t change.  Note too that we append the identity because we have only slack variables; if some of those had been surplus variables, the corresponding entries we append would be negative.

So by throwing in an extra variable for each inequality constraint we have a system of equations which is amenable to the techniques of linear algebra.  If it were to happen that the resultant matrix was invertible (which requires it to be square — an equal number of equations and unknowns) then we could just multiply by the inverse and find the solution.  This doesn’t work for our general problem though since there will normally be infinitely many possible solutions.  Our goal will be to weed out the best solution from those possibilities.  In the next post we’ll look at this space of possible solutions.

June 3, 2009

An Introduction to Linear Programming, Part I

Filed under: Linear Programming — cjohnson @ 8:42 pm

In a first semester calculus class you learn how to optimize a differentiable function by looking at what happens where the derivative is zero (and the endpoints if you’re concerned with a closed interval).  The idea being what comes up must come down: if the function goes up and then goes down, a maximum must occur somewhere between (and similarly for a minimization problem), so you look for the precise point that “up” turns to “down.”  What I want to talk about in the next few posts is how to maximize (or minimize) a linear function in the presence of linear constraints.  (Note that as it’s taught in calculus, such optimization problems have no additional constraints.)

That is, we’ll look at functions of the form c^T x where c and x are vectors in \mathbb{R}^n.  The c^T x is the dot product of these two vectors, and is just a convenient way of writing c_1 x_1 + c_2 x_2 + ... + c_n x_n.  So the x_i are our variables and the c_i are constants normally referred to as cost coefficients.

An example of such a function is the cost of, say, building a house.  Let’s suppose x_1 is the amount of lumber used which costs $2 per pound (I have no idea how much lumber costs, so you’ll have to excuse me if that’s a completely ridiculous value); x_2 is the amount of brick which we’ll assume is $3 per pound; and let’s say x_3 is the amount of roofing we use and it costs $5 per pound.  Here our cost for building the house would be 2x_1 + 3x_2 + 5x_3, and so we could just say c = [2,\, 3,\,5]^T and x = [x_1,\, x_2,\, x_3]^T.  This function, c^T x, we will refer to as our objective function, and our goal will be to make this as small (or large) as possible given certain restrictions (linear constraints).

Now, by a linear constraint we mean that there are limits on the resources we use, and we can express these limits in the same way we expressed the cost.  In our house building example, for instance, we may not be able to use any more than 2000lb of brick, so we’d have a constraint x_2 \leq 2000.  Similarly let’s assume we must use less than 1500lb of lumber and 300lb of roofing materials: x_1 \leq 1500, and x_3 \leq 300.  If we further assume that our house must weigh at least 2100lb (for whatever reason) then we’d have the constraint x_1 + x_2 + x_3 \geq 2100.  Just to make things more complicated, let’s assume we have to use at least 50lb of roofing materials: x_3 \geq 50.  (I know this is a goofy example with silly constraints, but it still serves the purpose of showing us what a linear programming problem is.  The last two constraints are there only so we don’t have a trivial solution.)

Now our linear program is simply the following:

Minimize 2x_1 + 3x_2 + 5x_5

Subject to the following:

x_1 \leq 1500

x_2 \leq 2000

x_3 \leq 300

x_1 + x_2 + x_3 \geq 2100

x_3 \geq 50

x_1, x_2, x_3 \geq 0

This last constraint simply says that we can’t have a negative amount of brick, lumber, or roof.

So, our problem is that we want to minimize the cost of building a house when we’re faced with certain restrictions on amounts of material we can use.  (Now you should see why the last two constraints are needed to prevent a trivial solution: if you want to minimize the cost and there’s nothing telling you can’t, just use zero brick, zero lumber, and zero roofing!)

It is generally convenient to rewrite this set of constraints as a matrix.  In our particular problem we could set up the matrix

A = \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & -1 & -1 \\ 0 & 0 & -1 \end{array} \right]

Then if we let b = [1500,\, 2000,\, 300,\, -2100,\, -50], our constraints could be expressed simply as

Ax \leq b

x\geq 0

Where we will adopt the convention that if \alpha and \beta are vectors of the same size, \alpha \leq \beta means that each component of \alpha is less than or equal to the corresponding component of \beta; \alpha_i \leq \beta_i for all i.  (Notice that in order to write our problem like this we had to multiply the last two inequalities by -1 so that we’d have \leq instead of the \geq we started with.)

This is all linear programming means: maximize (or minimize) the function c^T x under the condition that Ax \leq b and x \geq 0.  The term linear programming, by the way, is due to George Dantzig who studied such problems while trying to optimize schedules for the US Air Force in the 1940s.  The Air Force referred to their schedules as “programs,” and so such problems came to be known as linear programs.

In the next post we’ll discuss slack and surplus variables which allow us to turn the inequalities above into equalities, which will be necessary for us to solve linear programs algorithmically.

Blog at WordPress.com.