
Section 36.2 Classification and linear programming

Linear programming offers a very different, geometric approach to classification. Think of observations with \(m\) explanatory variables as points in \(\mathbb R^m\text{,}\) which are colored (say, red or blue) according to their category. Our goal is to find a plane (or surface) that separates the points of different colors as cleanly as possible.

To help with visualization, consider the case \(m=2\) when the explanatory variables \((x_1, x_2)\) are more conveniently labeled \((x, y)\text{.}\) We are looking for a line \(y = ax+b\) such that the points of one category are above it, and the points of the other category are below it. If such a line exists, it is not unique, so we want to choose it to maximize the width of separation. That is, we look for the maximal \(w\ge 0\) such that \(y_k\ge ax_k + b + w\) in the first set (above the line) and \(y_k\le ax_k + b - w\) in the second set (below the line). This is a linear programming problem with three variables \(a, b, w\text{,}\) the objective function \(w\text{,}\) and the linear constraints stated above.
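As a sketch of the Matlab code to come (the examples below apply the same approach to concrete data), suppose the points of the first group are stored in column vectors xt, yt and those of the second group in xb, yb; these names are placeholders here. Moving all the unknowns to the left-hand side of each constraint, the problem takes the form \(A\mathbf x \le \mathbf b\) with \(\mathbf x = (a, b, w)\text{:}\)

% Sketch: assumes column vectors xt, yt (first group) and xb, yb (second group)
% Variables are [a; b; w]; linprog minimizes, so we minimize -w to maximize w
A = [xt xt.^0 xt.^0; -xb -xb.^0 xb.^0; 0 0 -1];
b = [yt; -yb; 0];
opt = linprog([0; 0; -1], A, b);   % opt = [a; b; w]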

The same idea applies to separation by a curve (for example, a polynomial).

Find and plot the best parabola separating the points \((2,3)\text{,}\) \((4,6)\text{,}\) and \((7,4)\) (the “top” group) from \((3,4)\) and \((5,5)\) (the “bottom” group).

Solution

We maximize \(w\) subject to constraints \(y_k\ge ax_k^2 + bx_k + c + w\) in the first group and \(y_k\le ax_k^2 + bx_k + c - w\) in the second. Rewrite the constraints as

\begin{equation*} ax_k^2 + bx_k + c + w \le y_k \end{equation*}

in the first group and

\begin{equation*} -ax_k^2 - bx_k - c + w \le -y_k \end{equation*}

in the second. Also, \(w\ge 0\) is required by the geometric meaning of \(w\) as the “width” of separation. Given the vectors xt, yt describing the first group and xb, yb for the second, we can form a system of linear inequalities \(A\mathbf x\le \mathbf b\) in the unknowns \(\mathbf x = (a, b, c, w)\) as below.

xt = [2; 4; 7];   % "top" group
yt = [3; 6; 4];
xb = [3; 5];      % "bottom" group
yb = [4; 5];
% Rows of A encode the constraints above; the last row encodes -w <= 0
A = [xt.^2 xt xt.^0 xt.^0; -xb.^2 -xb -xb.^0 xb.^0; 0 0 0 -1];
b = [yt; -yb; 0];
opt = linprog([0; 0; 0; -1], A, b);   % maximize w by minimizing -w

t = linspace(min([xt; xb]), max([xt; xb]), 1000);
y = opt(1)*t.^2 + opt(2)*t + opt(3);   % the separating parabola
plot(xt, yt, 'b*', xb, yb, 'r*', t, y, 'k')

Instead of including \(-w \le 0\) as one of the inequalities in the system \(A\mathbf x\le \mathbf b\text{,}\) we can impose this constraint separately as a lower bound in linprog.

A = [xt.^2 xt xt.^0 xt.^0; -xb.^2 -xb -xb.^0 xb.^0];
b = [yt; -yb];
% Lower bounds: a, b, c are free, while w >= 0
opt = linprog([0; 0; 0; -1], A, b, [], [], [-Inf; -Inf; -Inf; 0]);

The result is the same. With either version of the code, if the data points are changed so that there is no parabola that separates them completely, we get the message “No feasible solution found. Linprog stopped because no point satisfies the constraints.”
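Rather than relying on the displayed message, one can also detect infeasibility in code by requesting the exitflag output of linprog, which equals 1 when an optimal solution is found. A minimal sketch, using the matrices from the first version above:

[opt, ~, exitflag] = linprog([0; 0; 0; -1], A, b);
if exitflag ~= 1
    disp('No parabola separates the two groups completely.')
end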

Complete separation of the categories may be impossible (which makes the linear programming problem infeasible, as Matlab would report). Perhaps some data points have incorrect measurement values, or are mislabeled by category, or perhaps (quite likely) one simply cannot predict the outcome with certainty, based on the explanatory variables we have. In such a case we can still look for a separating line or curve, but the separation will be incomplete, with some points left on the “wrong” side.

We look for a line that minimizes the sum of penalties, the penalty being the amount by which a point protrudes into the wrong side of the line. Formally, we minimize \(\sum p_k\) where \(p_k\ge 0\) are the penalties, subject to constraints \(y_k\ge ax_k + b - p_k\) for points in the “top” category and \(y_k \le ax_k + b + p_k\) for points in the “bottom” category. This is still a linear programming problem, but there are more variables now: \(a, b, p_1, \dots, p_n\text{,}\) and the objective function is \(\sum p_k\text{.}\)

Find and plot the best line that separates (probably not completely) 8 random points from 7 other random points, which are generated as follows:

xt = randn(8, 1);       % "top" group
yt = randn(8, 1) + 1;   % shifted up by 1, so the groups overlap only partially
xb = randn(7, 1);       % "bottom" group
yb = randn(7, 1);
Solution

We minimize \(\sum p_k\) subject to constraints \(y_k\ge ax_k + b - p_k\) in the first group and \(y_k\le ax_k + b + p_k\) in the second. Rewrite the constraints as

\begin{equation*} ax_k + b - p_k \le y_k \end{equation*}

in the first group and

\begin{equation*} -ax_k - b - p_k \le -y_k \end{equation*}

in the second. We also require \(p_k \ge 0\text{.}\) Given the vectors xt, yt describing the first group and xb, yb for the second, we can form a system of linear inequalities \(A\mathbf x\le \mathbf b\) and separately impose the nonnegativity requirement on \(p_k\) (though not on the coefficients \(a, b\) of the line).

A = [xt xt.^0; -xb -xb.^0];   % columns for the variables a, b
n = numel(xt) + numel(xb);    % one penalty for each point
A = [A -eye(n)];              % subtract p_k from the k-th inequality
b = [yt; -yb];
opt = linprog([0; 0; ones(n, 1)], A, b, [], [], [-Inf; -Inf; zeros(n, 1)]);

t = linspace(min([xt; xb]), max([xt; xb]), 1000);
y = opt(1)*t + opt(2);   % the separating line
plot(xt, yt, 'b*', xb, yb, 'r*', t, y, 'k')

The penalties enter the matrix through A = [A -eye(n)] because we subtract \(p_1\) from the first inequality, \(p_2\) from the second, and so on. The penalties make it possible for all the constraints to be satisfied, but their sum \(p_1+\cdots+p_n\) needs to be minimized. In this example, the variables are \(a, b, p_1, \dots, p_n\) and accordingly, the coefficients of the objective function are \(0, 0, 1, \dots, 1\text{.}\)
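As a quick check (a sketch using the variables computed in the solution above), the optimal penalties occupy the tail of opt, in the order in which the constraints were listed: the “top” group first, then the “bottom” group. Any point with a positive penalty ended up on the wrong side of the line.

p = opt(3:end);           % penalties: top group first, then bottom group
wrong = find(p > 1e-8);   % points that required a positive penalty
fprintf('%d of %d points lie on the wrong side.\n', numel(wrong), n)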