Convex Polytopes
$$ \newcommand{\span}{\operatorname{span}} \newcommand{\aff}{\operatorname{aff}} \newcommand{\conv}{\operatorname{conv}} \renewcommand{\vec}[1]{\mathbf{#1}} $$

10 January 2022


Convex polytopes, or simply polytopes, are analogues of (convex) polygons in higher dimensions. There are two obvious ways to generalise a polygon, depending on which observation you start from:

- A polygon is the convex hull of finitely many points (its vertices).
- A polygon is a bounded intersection of finitely many halfplanes (one per edge).

Both definitions are reasonable; amazingly they are also equivalent. We will prove this fundamental result in the next chapter.

Human intuition encounters a huge barrier in visualising even a 4-dimensional polytope, despite the clear picture we all have in \(\Real^2\) and \(\Real^3\). So what is the point of studying something we cannot see? Simply because polytopes arise naturally from a linear system of inequalities

\[A \vec{x} \leq \vec{b}.\]

Systems of this kind model various theoretical and practical problems. The central question is, of course, to understand the structure of the solution space. But this space, whenever bounded, turns out to be a polytope: each row of the system encodes a halfspace!
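For a concrete instance, the square \([-1,1]^2\) is exactly the solution set of the four-row system

\[\begin{pmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \\ 0 & -1 \end{pmatrix} \vec{x} \leq \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix},\]

one row (i.e. one halfspace) per edge.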

The study of polytopes is a dance between geometry and algebra. We draw geometric ideas and mechanisms from 2D and 3D, but implement them via purely algebraic arguments. To get off to a good start, we first develop some linear algebra that we shall use constantly.

Affine Spaces

Intuitive but impractical definitions

Recall that \(V \subseteq \Real^d\) is a linear space if it is closed under scaling and addition (thus also under linear combinations). That is, for all \(\vec{v}, \vec{v}' \in V\) and \(\lambda\in \Real\), the vectors \(\lambda \vec{v}\) and \(\vec{v} + \vec{v}'\) again lie in \(V\). This condition forces the space to extend “flat” through the origin.

For our purpose, however, we want to drop the restriction of “passing through the origin” while maintaining flatness:

Definition. \(U \subseteq \Real^d\) is an affine space if \(U - \vec{u}^* := \set{ \vec{u}-\vec{u}^* : \vec{u} \in U}\) is a linear space for some \(\vec{u}^* \in U\). In a word, it is a shifted linear space.

Actually the “for some” implies “for all”: given \(U\) and \(\vec{u}^*\) as specified above, we claim that \(U - \widetilde{\vec{u}}^* = V := U - \vec{u}^*\) for every vector \(\widetilde{\vec{u}}^* \in U\). This is another testament to the flatness of the space: it looks identical wherever we stand on it. For one direction, we represent any element of the LHS as

\[\vec{u} - \widetilde{\vec{u}}^* = (\vec{u}-\vec{u}^*) - (\widetilde{\vec{u}}^*-\vec{u}^*)\]

which belongs to \(V\) since it is a linear combination of two vectors in \(V\). The converse direction is an exercise; it gives you hands-on experience of the principle that “adding \(n\) elements and then subtracting \(n-1\) elements leaves you inside the affine space”. The same principle resurfaces in Lemma 1.
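For a concrete example, consider the line \(U = \set{(x,y)^\T \in \Real^2 : x + y = 1}\). Subtracting \(\vec{u}^* = (1,0)^\T\) (or any other point of \(U\), by the claim above) yields the linear space \(\set{(x,y)^\T : x + y = 0}\), so \(U\) is an affine space of dimension 1.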

Due to the uniqueness of the underlying linear space, we naturally transfer some concepts to affine spaces accordingly:

- The dimension of an affine space is the dimension of its underlying linear space.
- The affine span \(\aff(S)\) of a set \(S \subseteq \Real^d\) is the inclusion-smallest affine space containing \(S\).
- Points are affinely independent if none of them lies in the affine span of the others.

These definitions are mentally appealing but practically unwieldy, so below we shall build some handles in more explicit forms.

Going practical and technical

In almost every practical use, an affine space is constructed by one of two approaches:

- as the solution set \(\set{ \vec{x} \in \Real^n : A\vec{x} = \vec{b} }\) of a consistent linear system;
- as the affine span \(\aff\set{\vec{s}_1, \dots, \vec{s}_n}\) of finitely many points.

You are encouraged to verify that both indeed produce affine spaces. Also, as an aside, each of the two schemes can represent every affine space.

In the first approach, the underlying linear space is exactly the kernel of \(A\), hence the constructed affine space has dimension \(n-\rank(A)\); that is easy!
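As a quick numerical illustration, here is a minimal numpy sketch (the system \(A, \vec{b}\) is an arbitrary choice of ours) that computes the dimension of the affine space cut out by a consistent system:

```python
import numpy as np

# A (hypothetical) consistent system Ax = b in R^3;
# its solution set is the affine space U = {x : Ax = b}.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])

# A particular solution certifies that U is nonempty.
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(A @ x_star, b)

# dim(U) = n - rank(A), since ker(A) is the underlying linear space.
n = A.shape[1]
print(n - np.linalg.matrix_rank(A))  # prints 1: U is a line in R^3
```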

The second approach poses a greater challenge in computing the dimension. To this end, we need an algebraic characterisation of affine spans.

Lemma 1. \(\aff \set{\vec{s}_1, \dots, \vec{s}_n} = \set{ \sum_{i=1}^n \lambda_i \vec{s}_i : \sum_{i=1}^n \lambda_i = 1 }\).

Proof. Denote for convenience \(S := \set{\vec{s}_1, \dots, \vec{s}_n}\). Fix an arbitrary \(j \in [n]\) and first convince yourself that \(\aff(S) = \vec{s}_j + \span(S - \vec{s}_j)\) (why?). It then suffices to show

\[\vec{s}_j + \span(S - \vec{s}_j) = \set{ \sum_{i=1}^n \lambda_i \vec{s}_i : \sum_{i=1}^n \lambda_i = 1 }.\]

For the ⊆ inclusion, we observe that

\[\vec{s}_j + \sum_{i=1}^n \beta_i (\vec{s}_i - \vec{s}_j) = \sum_{i \neq j} \beta_i \vec{s}_i + \left( 1 - \sum_{i \neq j} \beta_i \right) \vec{s}_j.\]

The coefficients clearly sum up to 1. The converse inclusion ⊇ follows by running the same computation backwards: given \(\sum_{i=1}^n \lambda_i \vec{s}_i\) with \(\sum_{i=1}^n \lambda_i = 1\), take \(\beta_i := \lambda_i\) for all \(i \neq j\). ∎
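For instance, for two distinct points the lemma yields \(\aff\set{\vec{s}_1, \vec{s}_2} = \set{ \lambda \vec{s}_1 + (1-\lambda) \vec{s}_2 : \lambda \in \Real }\), the whole line through them; requiring additionally \(\lambda \in [0,1]\) would instead give the segment \(\conv\set{\vec{s}_1, \vec{s}_2}\).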

Lemma 2. Suppose \(S := \set{\vec{s}_1, \dots, \vec{s}_n}\). We append a one to each vector to obtain an augmented matrix \(S^+ := \begin{pmatrix} \vec{s}_1 & \cdots & \vec{s}_n \\ 1 & \cdots & 1 \end{pmatrix}\). Then \(\dim(\aff(S)) = \rank(S^+) - 1\).

Proof. Let \(r := \rank(S^+)\). Without loss of generality we assume the first \(r\) columns form a basis of the column space of \(S^+\), so for every \(j \in [n]\) we may decompose

\[\begin{pmatrix} \vec{s}_j \\ 1 \end{pmatrix} = \sum_{i=1}^r \lambda_{ij} \begin{pmatrix} \vec{s}_i \\ 1 \end{pmatrix}.\]

So \(1 = \sum_{i=1}^r \lambda_{ij}\). By Lemma 1 we see \(\vec{s}_j \in \aff(\vec{s}_1, \dots, \vec{s}_r)\) for all \(j \in [n]\). Hence \(\aff(S) = \aff(\vec{s}_1, \dots, \vec{s}_r)\). (Why?)

Now its dimension is just \(\dim(\span(\vec{s}_2 - \vec{s}_1, \dots, \vec{s}_r - \vec{s}_1))\) by definitions. But this equals

\[\rank \begin{pmatrix} \vec{s}_2 - \vec{s}_1 & \cdots & \vec{s}_r - \vec{s}_1 \\ 0 & \cdots & 0 \end{pmatrix} = \rank \begin{pmatrix} \vec{s}_1 & \cdots & \vec{s}_r \\ 1 & \cdots & 1 \end{pmatrix} - 1 = r-1\]

For the middle equality, subtract the column \(\begin{pmatrix} \vec{s}_1 \\ 1 \end{pmatrix}\) from every other column of the right-hand matrix; this preserves the rank, and the remaining column \(\begin{pmatrix} \vec{s}_1 \\ 1 \end{pmatrix}\) is independent of the difference columns thanks to its non-zero last entry. The proof is complete. ∎

Remark. Our choice of “one” is rather arbitrary. You may append any fixed non-zero number to each column, since we may always rescale the last row in our argument.
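Lemma 2 is easy to put to work numerically. Below is a minimal numpy sketch (the helper aff_dim and the sample points are our own choices):

```python
import numpy as np

def aff_dim(S):
    """dim(aff(S)) via Lemma 2: append a row of ones to the matrix
    whose columns are the points, then subtract 1 from the rank."""
    S_plus = np.vstack([S, np.ones((1, S.shape[1]))])
    return np.linalg.matrix_rank(S_plus) - 1

# Three collinear points in R^3 (as columns): their affine span is a
# line, even though they linearly span a plane.
S = np.array([[1.0, 2.0, 3.0],
              [1.0, 2.0, 3.0],
              [1.0, 1.0, 1.0]])
print(aff_dim(S))                # 1
print(np.linalg.matrix_rank(S))  # 2; here dim(aff(S)) = rank(S) - 1
```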

Corollary 3.

- \(\dim(\aff(S)) \leq \vert S \vert - 1\), with equality if and only if the points of \(S\) are affinely independent.
- \(\rank(S) - 1 \leq \dim(\aff(S)) \leq \rank(S)\), where we read \(S\) also as the matrix whose columns are the \(\vec{s}_i\).

Proof. For the first claim, recall \(\rank(S^+) \leq \vert S \vert\). The equality part is simply checking definitions. For the second claim, observe \(\rank(S^+) - 1 \leq \rank(S) \leq \rank(S^+)\). ∎

V-Polytopes and H-Polytopes

As we have mentioned, there are two equally compelling ways to define a polytope: either via “vertices” or via “halfspaces”:

Definition. A V-polytope is the convex hull of a finite point set in \(\Real^d\). An H-polytope is a bounded intersection of finitely many (closed) halfspaces in \(\Real^d\). We define the dimension of a polytope \(P\) as \(\dim(P) := \dim(\aff(P))\).
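For example, the triangle with vertices \((0,0)\), \((1,0)\), \((0,1)\) is at the same time a V-polytope, namely \(\conv\set{(0,0)^\T, (1,0)^\T, (0,1)^\T}\), and an H-polytope, namely \(\set{ \vec{x} \in \Real^2 : x_1 \geq 0,\ x_2 \geq 0,\ x_1 + x_2 \leq 1 }\).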

Next we proceed to define “faces” which capture the boundary structure of a polytope.

Definition. Let \(P \subseteq \Real^d\) be a polytope. \(F \subseteq \Real^d\) is a face of \(P\) if there exists a supporting hyperplane \(H: \vec{h}^\T \vec{x} = h_0\) such that

- \(\vec{h}^\T \vec{x} \leq h_0\) for all \(\vec{x} \in P\);
- \(F = P \cap H\).

We define the dimension of the face as \(\dim(F) := \dim(\aff(F))\).
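For example, take the square \(P = [-1,1]^2\). The hyperplane \(x_1 = 1\) supports the edge \(\set{1} \times [-1,1]\), a face of dimension 1, while \(x_1 + x_2 = 2\) supports the single vertex \((1,1)^\T\), a face of dimension 0.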

It is not even clear from the definition if \(P\) has finitely many faces. It does in 2D and 3D and certainly “should” in higher dimensions. But we have to wait one more chapter for a rigorous proof.

Note that both \(\emptyset\) and \(P\) are faces of \(P\), which we call trivial faces. The former is supported by the “contradicting hyperplane” \(\vec{0}^\T \vec{x} = 1\). The latter is supported by the degenerate “hyperplane” \(\vec{0}^\T \vec{x} = 0\), which passes through every point of \(\Real^d\).

Conventionally, 0-dimensional faces are called vertices and \((\dim(P)-1)\)-dimensional faces are called facets.

Up to now we have abused the name “dimension” multiple times. A summary should help clarify:

| Object | Definition of dimension |
| --- | --- |
| linear space | size of a basis |
| affine space | dimension of the underlying linear space |
| polytope / face | dimension of its affine span |

For the moment let us take the equivalence of V- and H-polytopes for granted, and illustrate what each representation is good at. This also serves as a motivation for proving the equivalence.

Lemma 4.

- The intersection of two H-polytopes is an H-polytope.
- The Minkowski sum \(P + Q := \set{ \vec{p} + \vec{q} : \vec{p} \in P, \vec{q} \in Q }\) of two V-polytopes \(P = \conv(A)\) and \(Q = \conv(B)\) is the V-polytope \(\conv(A+B)\).

Proof. The first claim is immediate from the definition of H-polytope: simply stack the two systems of inequalities into one. The second claim amounts to the fact that \(\conv(A) + \conv(B) = \conv(A+B)\). The ⊇ inclusion is easy: \(A + B \subseteq \conv(A) + \conv(B)\), and the latter set is convex. To see the converse, we take an arbitrary \(\vec{x} \in \conv(A) + \conv(B)\) and break it into convex combinations:

\[\begin{align*} \vec{x} &= \sum_{i=1}^n \alpha_i \vec{a}_i + \sum_{j=1}^m \beta_j \vec{b}_j \\ &= \sum_{i=1}^n \sum_{j=1}^m \alpha_i \beta_j (\vec{a}_i + \vec{b}_j) ~\in \conv(A+B) \end{align*}\]

since \(\sum_{i=1}^n \alpha_i = \sum_{j=1}^m \beta_j = 1\), whence also \(\sum_{i=1}^n \sum_{j=1}^m \alpha_i \beta_j = 1\). ∎
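The identity \(\conv(A) + \conv(B) = \conv(A+B)\) also suggests an algorithm: to compute a Minkowski sum of V-polytopes, take the convex hull of all pairwise sums. Here is a minimal sketch using numpy and scipy (the two point sets are our own toy choices):

```python
import numpy as np
from scipy.spatial import ConvexHull

# Vertex sets of two V-polytopes in the plane: a triangle and a square.
A = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
B = np.array([[1.0, 1.0], [-1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]])

# conv(A) + conv(B) = conv(A + B): hull of all |A|*|B| pairwise sums.
pairwise = (A[:, None, :] + B[None, :, :]).reshape(-1, 2)
hull = ConvexHull(pairwise)
print(pairwise[hull.vertices])  # vertices of the Minkowski sum
```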

Important Classes of Polytopes

Simplices

A \(k\)-simplex is the convex hull of \(k+1\) affinely independent points. It is a V-polytope of dimension \(k\) (why?). Since every V-polytope of dimension \(k\) has at least \(k+1\) defining points (Corollary 3), a \(k\)-simplex is indeed the simplest one among all candidates, hence its name.

Hypercubes and crosspolytopes

A \(k\)-hypercube is defined as \([-1,1]^k\), which is a \(k\)-dimensional H-polytope with \(2^k\) vertices and \(2k\) facets.

Exercise. Prove:

- The \(k\)-hypercube has exactly \(2^k\) vertices, namely the points of \(\set{-1,1}^k\).
- The \(k\)-hypercube has exactly \(2k\) facets, one for each defining inequality \(\pm x_i \leq 1\).

A \(k\)-crosspolytope is defined as \(\set{ \vec{x} \in \Real^k : \sum_{i=1}^k \lvert x_i \rvert \leq 1 }\), the unit ball of the 1-norm. What is its dimension? What are its vertices and facets? (In fact, crosspolytopes are “polar duals” of hypercubes, meaning that the \(i\)-dimensional faces of the \(k\)-crosspolytope correspond one-to-one to the \((k-i-1)\)-dimensional faces of the \(k\)-hypercube.)
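For instance, the 2-crosspolytope is the square with vertices \(\pm\vec{e}_1, \pm\vec{e}_2\), and the 3-crosspolytope is the octahedron.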

Simplicial polytopes

A simplicial polytope is a polytope all of whose proper faces are simplices. Simplices themselves are of course simplicial, but there are many other examples, for instance the crosspolytopes. On the other hand, hypercubes of dimension at least 3 are not simplicial: their facets are again hypercubes.

In a later chapter we will spend much time counting the faces of a polytope, especially answering how #facets depends on #vertices in a given dimension. Simplicial polytopes are important because they make very efficient use of their vertices (recall what we discussed for a simplex!) while being far less restrictive than simplices (a \(k\)-simplex has only \(k+1\) vertices, a constant once the dimension is fixed).

Cyclic polytopes

A cyclic polytope is a special simplicial polytope constructed as follows. Let \(\Phi(t) := (t,t^2, \dots, t^d)^\T\) be the moment curve in \(\Real^d\). Choose \(n\) distinct real numbers \(t_1, \dots, t_n\). The V-polytope \(P := \conv (\Phi(t_1), \dots, \Phi(t_n))\) is called a cyclic polytope.

As you could imagine, the moment curve is quite non-linear, so cyclic polytopes should have an edgy boundary. Indeed they do: a cyclic polytope in fact maximises #facets among all \(d\)-dimensional polytopes with \(n\) vertices. The beautiful theory awaits shortly ahead.
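As a small experiment backing this up (a sketch with numpy/scipy; the choices \(d = 4\), \(n = 8\) and \(t_i = i\) are ours), we can build a cyclic polytope and count its facets. Since cyclic polytopes are simplicial, qhull's simplicial facets are exactly the facets:

```python
import numpy as np
from scipy.spatial import ConvexHull

d, n = 4, 8
t = np.arange(1.0, n + 1.0)

# Rows are the points Phi(t_i) = (t_i, t_i^2, ..., t_i^d).
points = t[:, None] ** np.arange(1, d + 1)[None, :]

# Cyclic polytopes are simplicial, so each entry of `simplices`
# is one facet of the hull.
hull = ConvexHull(points)
print(len(hull.simplices))  # the facet count (20 for this choice)
```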
