Favorite Theorems

Happy New Year y’all! I’m looking forward to writing some real posts soon – this one’s just for fun. (Maybe they all are?) My first effort at catching up on my blog reading was to read the last eight posts from f(t). Kate gives us a post on her favorite theorem, including the proof. It turns out to be Cantor’s diagonal proof of the nondenumerability of the reals. I really enjoyed it and got excited to catalogue some of mine.

Here’s a top-10 list. It’s a little capricious. But I’ll stand behind each of these theorems being amazing. They span a lot of levels, from early elementary school thru graduate level stuff. Partly, especially for the lower-level ones, I’m making a case that these guys deserve more respect than they typically get. Below the list I put some comments on each. I got kinda technical toward the end, and I feel a bit self-conscious about that. I’ll be excited if you read any of this but definitely feel free to stop, and then give me sh*t about it, when I get too abstruse.

I. The commutativity of multiplication. (I’m not kidding. I think this is dope. More below.)
II. a÷b (division) = a/b (fraction)
III. The Pythagorean theorem.
IV. Every rational number has a terminating or repeating decimal expansion.
V. If a is a root of a polynomial p(x), then x-a divides p(x).
VI. The fundamental theorem of calculus.
VII. Every prime of the form 4n+1 is a sum of squares in exactly one way.
VIII. Every element of SOn is a composition of rotations in orthogonal planes.
IX. A function C->C that is analytic in a neighborhood has a convergent Taylor expansion equal to itself in that neighborhood.
X. The fundamental theorem of Galois theory.

I. The commutativity of multiplication
Most students and teachers I know treat this as so trivial and obvious that I must be some sort of ridiculous person to dignify it with the word “theorem.” But seriously now. When they told you what 5×7 was, depending slightly on your teacher, it was something like “five 7’s.” (This is the language used by my first grade teacher Judy Lazrus. She was amazing, but this isn’t especially an example of that.) 7+7+7+7+7. That means 7×5 is 5+5+5+5+5+5+5. Is it obvious that these are going to come out the same?

NO, it’s NOT. Not even to me, and I’ve had a lot of time to get used to it. Certainly not to a six year old. There really is a theorem in here. If you’ve worked with kids who are first learning about multiplication, you know this.

A formal, utterly rigorous proof would begin with something like the Peano axioms and build up addition and then multiplication in a formal way from scratch. This is cool, but I have a different proof in mind. It’s less formal; less “rigorous” by professional standards; however, it’s the one that gives this theorem such import to me, it’s totally convincing, and it’s accessible to a six-year-old. You probably already know it; perhaps it’s how you convinced yourself multiplication really is commutative way back when; but I think it deserves special attention.

Represent 5×7 as an array – five rows of 7 dots:
* * * * * * *
* * * * * * *
* * * * * * *
* * * * * * *
* * * * * * *
Now interchange the roles of the rows and columns. Instead of seeing the array as a stack of rows, one on top of the other, see it as a line of columns, side by side. Each column is formed with one dot from each row, so the size of each column equals the number of rows, while the number of dots in each row becomes the number of columns. 5 rows of 7 becomes 7 columns of 5. So 5×7 = 7×5. The same argument could be used with any pair of natural numbers: mxn = nxm.

Why am I making a big deal of this? Two reasons. One is that learning multiplication’s commutativity is something that happens very early in a typical math education, and this is a chance for students who are just beginning their mathematical journey to have a taste of real mathematics – an elegant, powerful argument giving a totally convincing proof of an unexpected result. The other is that the particular method of proof here (interchanging rows and columns) is one with great power, whose usefulness extends well beyond high school. For example, this is the idea behind the proof of Burnside’s formula in group theory, or the proof in number theory that the number of partitions of n with at most k parts equals the number with each part at most k, and many other advanced results. I love the idea of putting an idea of this vast power and scope into the hands of tiny children.

II. a÷b (division) = a/b (fraction)
I can’t back this one up quite as well – I don’t even know how to state this claim in precise mathematical language, and I can’t give you anything I would consider a proof. But I know there’s something surprising in here because I’ve watched a lot of kids be surprised by it.

When I learned division in school, I learned that 10÷2 means the number of 2’s that are needed to add up to 10. 10/2 means what you get if you accumulate ten halves. These are definitely not the same thing. In the former case, I am imagining stacking up 2’s till I get to 10. The answer, five, is counting boxes of 2. I’m thinking about whole things, and the 5 is counting boxes of more than 1 whole thing. There’s certainly no fraction anywhere in sight – everything is whole. In the latter, the 5 is constructed from gluing together 10 halves. When I imagine it I can still see the glue. Such a difference in visual images – why is the number coming out the same?

I think there are a lot of ways to get convinced, and unlike my first theorem I’m not attached to a specific argument. What I feel sure of is that as kids come to learn about division, and fractions, this identity deserves a lot of thought, and offers something to get excited about.

III. The Pythagorean theorem
You know it, y’all, this thing is amazing. I think the Pythagorean theorem is for many of us one of the great missed opportunities of our math education. Because most adults can recite it like a mantra but only a very few people (one is my former roommate Thierry – big ups to your middle school math teacher, Thierry!) recall it with any of the awe it deserves.

This is the earliest example I can think of in the curriculum of a really surprising connection between algebra and geometry. Somehow right-anglyness is precisely captured by sum-of-squaresiness and vice versa. Somehow “right triangle” is a geometrical picture, and “a^2+b^2=c^2” an algebraic description, of precisely the same underlying mathematical structure. I think that’s amazing. I’m still not over it.

I have three favorite proofs:
a) Euclid’s. Not very accessible to the kiddies but has the virtue of actually partitioning the square on the hypotenuse into two parts equal in area to the squares on the legs.
b) The one where you put four copies of the triangle inside a square of side length a+b and if you arrange them one way the remaining area is c^2 and if you arrange them another way it’s a^2+b^2. Here’s a picture.
c) The one where you draw an altitude from the hypotenuse, like this, and then work from the similarity of the three triangles.

There are a zillion other proofs. There are books and books. One I’m very excited about, by Bob & Ellen Kaplan (who wrote Out of the Labyrinth, which I was also very excited about), is called Hidden Harmonies: The Lives and Times of the Pythagorean Theorem and is due out in August. Part of what I love about this theorem is the amazing diversity in the ways it is proven.

Okay now I have to speed through the rest:

IV. Every rational number has a terminating or repeating decimal expansion.
The awesome Larry Zimmerman says that a good math problem “has a future.” Meaning that the ideas in the problem have a reach that extends to other problems, possibly to other whole domains of mathematics. My excitement about the proof of commutativity of multiplication, above, has to do with this. Likewise, this theorem. First of all, it’s a handy characterization of rationals, and looks ahead in a natural way to interesting calculus-laden questions having to do with convergence of series, especially geometric series. But maybe even more important to me is the proof, which involves the division algorithm and the pigeonhole principle. Both of which spin off great showers of mathematical consequences…

V. If a is a root of a polynomial p(x), then x-a divides p(x).
One of which is this! The division algorithm generalizes in a natural way from the natural numbers…
b divided by a has a remainder less than a
…to polynomials (techically, polynomial rings over a field):
p(x) divided by q(x) has a remainder which is a polynomial of degree less than q.
Because of this, we get this awesome connection between the behavior of a polynomial as a function (for what x is p(x) equal to zero?) and its behavior as an object to be factored (what other polynomials divide p(x)?)

The converse is less startling to me – if x-a is a factor of p(x), then when x=a, x-a=0, so p(x)=0 at x=a. The interesting part is the result I stated – all we have to know is p(a)=0 to know x-a is a factor. And as I mentioned, this comes from the division algorithm:
p(x) / (x-a) has a remainder of degree lower than x-a – i.e. a constant.
So p(x) = (some polynomial)*(x-a) + const.
If p(a) = 0, then evaluating this equation at a gives
0 = (some polynomial, evaluated at a)*0 + const.
The constant must be 0, i.e. the remainder must be 0, i.e. x-a is a factor of p(x). Boom!

VI. The fundamental theorem of calculus
Oh my goodness. Don’t even get me started. How am I trying to do this quickly? If the pythagorean theorem felt like it was connecting disparate realms (the geometry of a right angle and the algebra of a^2+b^2=c^2), the fundamental theorem of calculus is connecting disparate planets. If f tells you about g’s speed, then g tells you about f’s… area? WHAT??? I need to move on because I’m going to wax poetical all night long if I don’t stop. This was the theorem that sealed the deal between me and math.

VII. Every prime of the form 4n+1 is a sum of squares in exactly one way.
VIII. Every element of SOn is a composition of rotations in orthogonal planes.
IX. A function C->C that is analytic in a neighborhood has a convergent Taylor expansion equal to itself in that neighborhood.

I have a thing for when you’re confused about some question about really familiar, concrete objects and you eventually answer it by considering it in the broader context of really fanciful, crazy objects. All three of these theorems represent for me the wierd power of complex numbers and other crazy abstractions to answer questions that are about real numbers. (In the first case, natural numbers.)

There are proofs of Fermat’s theorem about primes of the form 4n+1 being sums of squares that don’t involve complex numbers, but the proof I relate to the most certainly does. I just wrote it out but then I realized that it is kind of technical so I’m putting it at the end as an appendix. I don’t know if the theorem about SOn has proofs that don’t involve complex numbers but the proof I know certainly and powerfully does. A sketch of it is at the end too.

The Taylor expansion is a slightly different but related story for me. It used to confuse me that functions like sin x and e^x are equal to their Taylor expansion everywhere, but then there is that funky example: f(x)=e^-(1/x^2) if x isn’t 0 and f(x)=0 if x is 0
This function is continuous and differentiable for all real x. At x=0, it and all its derivatives are 0. So its Taylor expansion at 0 is identically 0. But the function itself is totally not 0 at all. What the hell?

Once again, broadening out into the complex plane explains everything. The function f(x) = e^-(1/x^2) isn’t defined at x=0. When you’re just looking at the real line, it looks like you can plug the hole with f(0)=0 and everything will be smooth and nice. But if you look at the whole complex plane, x=0 is an essential singularity of f. f’s behavior in the neighborhood of 0 is actually completely bananas. They fooled us by only showing us a tiny, well-behaved slice when they showed us the real line. If a function is really continuous and differentiable in a whole complex neighborhood, it will equal its Taylor expansion there. Whew!

X. The fundamental theorem of Galois theory
I’m not even going to state this one. I could use avoiding the technicalities as an excuse, but the real reason is that I’m avoiding spoilers. The F. T. of G. T. is the punchline of an amazing mathematical journey. The way I learned it kind of killed it. The teacher stated it, showed us examples, and then spent a few classes proving it. Then, it was applied to the problem it had originally come into being to deal with (the insolubility of the quintic). I had been told the answer before I had asked the question, as it were. I feel a sense of loss that I never got to see it shimmering in the distance as I searched. This spring I’m teaching an informal course on group theory, hopefully culminating in Galois theory, for some teacher friends. I’m gonna do my darndest to bring out the fundamental theorem in an organic, motivated and natural way. I’ll let you know how it goes.


Proof that an odd prime p is a sum of 2 squares if and only if p has the form 4n+1.

Let p be a prime. If p=a^2+b^2 then p factors in the ring of Gauss integers – it equals (a+bi)(a-bi). Likewise, the only way for p to factor in the ring of Gauss integers is (a+bi)(a-bi). (This is because if (a+bi) is a factor of p, say p=(a+bi)(c+di), then its conjugate (a-bi) is too, and (a+bi)*(a-bi) is a real factor of p^2, so it is 1, p or p^2. But if it were 1 or p^2, the factorization p=(a+bi)(c+di) would be trivial – in the first case, a+bi would be a unit, in the second case, c+di would. So if (a+bi) is really a nontrivial factor of p, then p=(a+bi)(a-bi).) So p is a sum of squares if and only if p stops being prime when you venture out into the ring of Gauss integers.

I.e. p is a sum of squares if and only if (p) is not a prime ideal of Z[x]/(x^2+1).
I.e. p is a sum of squares if and only if Z[x]/(x^2+1,p) is not an integral domain.
I.e. p is a sum of squares if and only if Fp[x]/(x^2+1) is not an integral domain.
I.e. p is a sum of squares if and only if x^2+1 factors in Fp[x].
I.e. p is a sum of squares if and only if -1 is a square in the field Fp.

Now the multiplicative group of Fp is cyclic and order p-1, and (if p>2) -1 has order 2 in this group, so if g is a generator for the group, -1=g^[(p-1)/2]. -1 is a square if and only if the exponent is even, i.e. if p-1 is a multiple of 4.

So if p is a prime >2, it is a sum of squares if and only if p-1 is a multiple of 4, in other words if p has the form 4n+1.

The uniqueness of the representation as a sum of two squares follows from the fact that Z[x]/(x^2+1) is a unique factorization domain, which comes from the fact that it has a Euclidean division algorithm (big ups to the division algorithm again!). Thus p only factors one way: p=a^2+b^2=(a+bi)(a-bi). If p=c^2+d^2 too, then p would =(c+di)(c-di) too, but this contradicts unique factorization.

Outline of proof that any element of SOn is a composition of rotations in orthogonal planes:
Take an element P of SOn and regard it as having complex entries, even though they’re really real. Then P is unitary, so by the spectral theorem for unitary operators, it has an orthonormal basis of eigenvectors (which may be complex). The eigenvalues all have absolute value 1, their product is 1, and the complex ones come in conjugate pairs because P’s entries, and hence the coefficients of its characteristic polynomial, are real. Consider the subspace A of C^n spanned by a pair of eigenvectors associated to a pair of conjugate eigenvalues. Then consider the subspace A’ of A consisting of strictly real vectors (i.e. A’ is A intersect R^n). A’ is a plane, and the restriction of the action of P to A’ is a rotation. (This step takes some doing.) Other similar subspaces constructed as A’ was are all orthogonal to it since the basis of eigenvectors is orthonormal. If there are real eigenvalues, they are all 1 or pairs of -1s since the product of all eigenvalues is det P=1. Every pair of -1s gives a rotation thru pi in yet a new orthogonal plane, and the 1’s represent fixed subspaces. Thus the net effect of P is a composition of orthogonal rotations.

One corollary is the amazing fact that the composition of two rotations in 3-space about completely different axes, is a rotation! This is a corollary because in 3-space there isn’t room for 2 orthogonal planes – that would require 4 dimensions. So every element of SO3 is a rotation in a single plane. Since SO3, as a group, is closed, composing 2 rotations must give a new element of SO3 which must also be a rotation.


14 thoughts on “Favorite Theorems

  1. I made it all the way to VII before I got lost! Do I get a 70?! I pass! Woo!

    Incorporating a classroom proof for I. would require third grade teachers who don’t pee their pants at the mention of the word “proof.” I know they’re out there. Just frightfully thin on the ground.

    I’m always amazed at how II comes as a serious shock to kids in Algebra 2.

    Now I need you to package up some of your enthusiasm and sell it on Amazon so I can buy some and take a hit on Wednesday mornings.

  2. I’m calling II a definition.

    Now, VII, I didn’t know that, and it is cool. But aren’t there a bunch of similar true statements?

    I still like prime factorization being unique (down to order). Way underrated.


    1. Re: VII – yes, definitely. In a way it’s just the solution of one very specific problem in diophantine analysis: for what p does x^2 + y^2 = p have a solution? You could just as well ask for what p does x^2+5y^2=p have a solution, or any number of similar or not-so-similar problems. Historically I think it was an early discovery in this family – I know it’s attributed to Fermat. It stands out for me because the result feels simple and relevant to me – I care about primes and squares, and don’t typically see them as all too related, so this relationship strikes me as deep and unexpected.

      You know, uniqueness of factorization was on my first draft of the list and at the last second I replaced it with the one about rational numbers having repeating or terminating decimals. I love both those theorems as expressions of the power of the division algorithm.

  3. A quite delectable one I discovered recently is an enumeration of the positive rationals in which each rational occurs exactly once, already expressed in lowest terms. I wish I could find the original paper–the authors really knew how to tell a story. They presented the method initially as the solution to an apparently unrelated problem.

    Another favorite of mine is Urysohn’s Lemma in point-set topology.

    1. I guess the enumeration was more clever than just doing the usual diagonal thing and skipping the fractions not in lowest terms? (I.e. 1/1, 1/2, 2/1, 1/3, 3/1, 1/4, 2/3, 3/2, 4/1, 1/5, 5/1, …?) If you find it I’d love the reference.

      I’ve never studied Urysohn’s lemma but its name rang a bell and I realized that I’d seen a painting about it by artist-mathematician Lun-Yi Tsai, who creates these awesome paintings of mathematical ideas.

    1. Yes, that enumeration is gorgeous! I can’t decide if my favorite part is the connection to the counting problem, or the (shockingly simple) proof that every reduced fraction occurs exactly once in the tree.

    1. Mmmm… good point.

      I think this might be because inductive proofs often feel… less explanatory, to me, somehow? Like, proving that the nth triangular number is n(n+1)/2 can be done via induction, with a straightforward computation:
      Base case: 1(2)/2 = 1
      Induction step: (n-1)(n-1+1)/2 + n = n(n+1)/2
      (I vaguely remember doing this when I first learned what induction was.)
      But this proof doesn’t make me feel it. I’d prefer a more geometric argument (like cutting an n x (n+1) array of dots in half along a diagonal) or a more combinatorial one (count the number of handshakes possible among n+1 people in 2 ways: first person shakes n hands, next shakes n-1, …, so total shakes = n+(n-1)+…+1 = nth triangular number; or, all n+1 folks shake n hands, but this counts each shake two times, so total shakes = n(n+1)/2). I feel like I get more insight out of these arguments.

      But this is fickle taste. Much props to induction’s power and scope, which can’t be denied. So, let me nominate another great theorem, which is proved by induction:

      The Spectral Theorem from linear algebra.

  4. I tried to find a way to explain to kids why the order in which two numbers are multiplied does not affect the result, and I found only that way good enough 😀

    Other approach that I thought, but resigned to elaborate, was that:

    5 x 3 = 3 + 3 + 3 + 3 + 3
    3 x 5 = 5 + 5 + 5

    as both the number of terms as the value of each term are different, it’s not evident if the results are equal or not

    but if one number is subtracted from the other and the difference is zero, then they are equal

    we can use

    3 + 3 + 3 + 3 + 3 – 5 = 2 + 2 + 2 + 2 + 2
    5 + 5 + 5 – 3 = 4 + 4 + 4

    in the first case:

    3 + 3 + 3 + 3 + 3 – 5 – 5 – 5 = 0
    so they are equal

    But I think the rows and columns argument is much better.

    By the way, very nice this post! Thanks for it.

  5. Concerning II. The easiest way I can think of to make this equality clear is to show that 1/n simply signifies a quantity wich taken n times equals 1. Then for every natural number k, k=k*1 = ((1/n)*n)*k = (1/n)*nk. Thus k contains nk such quantities, so k times more than 1. Thus 1/n is k-times smaller part of k, than it was of 1. So, to get n-th part of k we must multiply 1/n by k and we get k/n. It’s just an exposition of the fact, that the same fraction of greater number is itself greater by the same coefficient (and vice versa). So, for example 5-th part of 7 is 7/5 (7 times greater than 5-th part of 1, which is 1/5).

    The other way to show that m:n is the same as m/n might go like this. For case when mn. Sorry for my English.

  6. The other way to show that m:n is the same as m/n might go like this. For case when m<n. Let a=m and b=n-m. We have 'a' quantities and want do distribute them into 'a+b' equal quantities. 'a+b' is greater than 'a', so clearly resulting quantities will be smaller than this at hand. What we have to do is to take from each initial quantity (there are 'a' of them) some part of it (the same from each), such that the sum of this parts divided by 'b' (number of "spots" we must "fill in"), would be equal to what's left from some our initial quantity after we took from it aforementioned part.

  7. So, let ‘k’ be ‘1/b’ of that supposed part. Then from our initial quantities we take sum of parts which equals ‘a*(b*k)’ and divide it between ‘b’ spots. So every spot is a quantity ‘a*k’ and – from assumption – equals every quantity which has left after we’ve substracted from initial quantities aforementioned parts. Before we did that every such quantity was greater by b*k, so it was a*k + b*k = (a+b)k. Our new quantities equal a*k. So “new” to “old” is: ak/(a+b)k = a/a+b = m/n.

    In very similar way we could show that equality holds for m>n. Sorry for my English and many post (i looks like there is some limitation concerning number of signs in one post).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s