Math ed bloggers love Star Wars. This post is extremely long, and involves a fair amount of math, so in the hopes of keeping you reading, I promise a Star Wars reference toward the end. Also, you can still get the point if you skip the math, though that would be sad.

The historical research project I gave myself this spring in order to prep my group theory class (which is over now – why am I still at it?) has had me working slowly through two more watershed documents in the history of math:

*Disquisitiones Arithmeticae*

by Carl Friedrich Gauss

(in particular, “Section VII: Equations Defining Sections of a Circle”)

and

*Mémoire sur les conditions de résolubilité des équations par radicaux*

by Evariste Galois

I’m not done with either, but already I’ve been struck with something I wanted to share. Mainly it’s just some cool math, but there’s a pedagogically relevant idea in here too -

Take-home lesson: The first time a problem is solved the solution uses only simple, pre-existing ideas. The arguments and solution methods are ugly and specific. Only later do new, more difficult ideas get applied, which allow the arguments and solution methods to become elegant and general.

The ugliness and specificity of the arguments and solution methods, and the desire to clean them up and generalize them, are thus a natural motivation for the new ideas.

This is just one historical object lesson in why “build the machinery, then apply it” is a pedagogically unnatural order. Professors delight in using the heavy artillery of modern math to give three-sentence proofs of theorems once considered difficult. (I’ve recently taken courses in algebra, topology, and complex analysis, with three different professors, and deep into each course, the professor gleefully showcased the power of the tools we’d developed by tossing off a quick proof of the fundamental theorem of algebra.) Now, this is a very fun thing to do. But if the goal is to make math accessible, then this is not the natural order.

The natural order is to try to answer a question *first*. Maybe we answer it, maybe we don’t. But the desire for and the development of the new machinery come most naturally from direct, hands-on experience with the limitations of the old machinery. And that means using it to try to answer questions.

I’m not saying anything new here. But I just want to show you a really striking example from Gauss. (Didn’t you always want to see some original Gauss? No? Okay, well…)

* * * * *

I am reading a 1966 translation of the *Disquisitiones* by Arthur A. Clarke which I have from the library. An original Latin copy is online here. I don’t read Latin but maybe you do.

I’m focusing on the last section in the book, but at one point Gauss makes use of a result he proved much earlier:

Article 42.

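If the coefficients $A, B, C, \ldots, N$; $a, b, c, \ldots, n$ of two functions of the form

$$(P) \quad x^m + Ax^{m-1} + Bx^{m-2} + Cx^{m-3} + \cdots + N$$

$$(Q) \quad x^\mu + ax^{\mu-1} + bx^{\mu-2} + \cdots + n$$

are all rational and not all integers, and if the product of $(P)$ and $(Q)$ is

$$x^{m+\mu} + \mathfrak{A}x^{m+\mu-1} + \mathfrak{B}x^{m+\mu-2} + \text{etc.} + \mathfrak{Z}$$

then not all the coefficients $\mathfrak{A}, \mathfrak{B}, \ldots, \mathfrak{Z}$ can be integers.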

Note that even the statement of Gauss’ proposition here would be cleaned up by modern language. Gauss doesn’t even have the word “polynomial.” The word “monic” (i.e., leading coefficient 1) would also have been handy. In modern language he could have said, “The product of two rational monic polynomials is not an integer polynomial if any of their coefficients are not integers.”
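To make the claim concrete, here is a minimal sketch that multiplies two monic rational polynomials and looks for non-integer coefficients in the product. The particular polynomials and the helper name `poly_mul` are my own illustrative choices, not anything from Gauss or Clarke.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, highest degree first."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# Illustrative (assumed) monic rational polynomials:
# P = x^2 + (1/2)x + 3   and   Q = x + 1/2
P = [Fraction(1), Fraction(1, 2), Fraction(3)]
Q = [Fraction(1), Fraction(1, 2)]

product = poly_mul(P, Q)
non_integer = [c for c in product if c.denominator != 1]

print(product)      # coefficients of PQ, highest degree first
print(non_integer)  # the coefficients that fail to be integers
```

In this example the $x^2$ coefficient of the product happens to come out as an integer, but the $x$ coefficient and the constant term do not, which is all the proposition promises.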

But this is not the most dramatic difference between Gauss’ statement (and proof – just give me a sec) and the “modern version.” On page 400 of Michael Artin’s *Algebra* textbook (which I can’t stop talking about only because it is where I learned like everything I know), we find:

(3.3)

Theorem. Gauss’s Lemma: A product of primitive polynomials in $\mathbb{Z}[x]$ is primitive.

The sense in which this lemma is Gauss’s is precisely the sense in which it is really talking about the contents of Article 42 from *Disquisitiones* which I quoted above.

*Huh?*

First of all, what’s $\mathbb{Z}[x]$? Secondly, what’s a primitive polynomial? Third and most important, what does this have to do with the above? Clearly they both have something to do with multiplying polynomials, but…

Okay. $\mathbb{Z}[x]$ is just the name for the set of polynomials with integer coefficients. (Apologies to those of you who know this already.) So a polynomial in $\mathbb{Z}[x]$ is really just a polynomial with integer coefficients. This notation was developed long after Gauss.

More substantively, a “primitive polynomial” is an integer polynomial whose coefficients have gcd equal to 1, i.e. a polynomial from which you can’t factor out a nontrivial integer factor. E.g. $4x^2 + 5x + 6$ is primitive, but $4x^2 + 6x + 8$ is not because you can take out a 2. This idea is from after Gauss as well.

So, “Gauss’s Lemma” is saying that if you multiply two polynomials each of whose coefficients do not have a common factor, you will not get a common factor among all the coefficients in the product.
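Here is a small sketch of that statement in action. The function names (`content`, `is_primitive`, `poly_mul`) and the sample polynomials are assumptions of mine for illustration; checking one product is of course not a proof.

```python
from functools import reduce
from math import gcd

def content(poly):
    """gcd of all the integer coefficients of a polynomial."""
    return reduce(gcd, (abs(c) for c in poly), 0)

def is_primitive(poly):
    """An integer polynomial is primitive when its coefficients have gcd 1."""
    return content(poly) == 1

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, highest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

f = [4, 5, 6]   # 4x^2 + 5x + 6, primitive
g = [3, 0, 2]   # 3x^2 + 2, primitive
h = [4, 6, 8]   # 4x^2 + 6x + 8, not primitive (common factor 2)

fg = poly_mul(f, g)
print(is_primitive(f), is_primitive(g), is_primitive(h))
print(fg, is_primitive(fg))
```

Note that $f$ and $g$ each have pairs of coefficients sharing a factor; what matters is only that no single prime divides all of them at once.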

What does this have to do with the result Gauss actually stated?

That’s an exercise for you, if you feel like it. (Me too actually. I feel confident that the result Artin states has Gauss’s actual result as a consequence; less sure of the converse. What do you think?) (Hint, if you want: take Gauss’s monic, rational polynomials and clear fractions by multiplying each by the lcm of the denominators of its coefficients. In this way replace his original polynomials with integer polynomials. Will they be primitive?)

Meanwhile, what I really wanted to show you are the two proofs. Original proof: ugly, long, specific, but containing only elementary ideas. Modern proof: cute, elegant, general, but involving more advanced ideas.

Here is a very close paraphrase of Gauss’ original proof of his original claim. Remember, $P$ and $Q$ are monic polynomials with rational coefficients, not all of which are integers, and the goal is to prove that $PQ$‘s coefficients are not all integers.

Demonstration. Put all the coefficients of $P$ and $Q$ in lowest terms. At least one coefficient is a noninteger; say without loss of generality that it is in $P$. (If not, just switch the roles of $P$ and $Q$.) This coefficient is a fraction with a denominator divisible by *some* prime, say $p$. Find the term in $P$ among all the terms in $P$ whose coefficient’s denominator is divisible by the *highest* power of $p$. If there is more than one such term, pick the one with the highest degree. Call it $Gx^g$, and let the highest power of $p$ that divides the denominator of $G$ be $p^t$. ($t \geq 1$ since $p$ was chosen to divide the denominator of some coefficient in $P$ at least once.) The key fact about the choice of $Gx^g$ is, in Gauss’s words, that its “denominator involves *higher* powers of $p$ than the denominators of all fractional coefficients that precede it, and *no lower* powers than the denominators of all succeeding fractional coefficients.”

Gauss now divides $Q$ by $p$ to guarantee that at least one term in it (at the very least, the leading term) has a fractional coefficient with a denominator divisible by $p$, so that he can play the same game and choose the term $\Gamma x^\gamma$ of $Q/p$ with $\Gamma$ having a denominator divisible by $p$ more times than any preceding fractional coefficient and at least as many times as each subsequent coefficient. Let the highest power of $p$ dividing the denominator of $\Gamma$ be $p^\tau$. (Having divided the whole of $Q$ by $p$ guarantees that $\tau \geq 1$, just like $t$.)

I’ll quote Gauss word-for-word for the next step:

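“Let those terms in $(P)$ which precede $Gx^g$ be ${}'Gx^{g+1}$, ${}''Gx^{g+2}$, etc. and those which follow be $G'x^{g-1}$, $G''x^{g-2}$, etc.; in like manner the terms which precede $\Gamma x^\gamma$ will be ${}'\Gamma x^{\gamma+1}$, ${}''\Gamma x^{\gamma+2}$, etc. and the terms which follow will be $\Gamma' x^{\gamma-1}$, $\Gamma'' x^{\gamma-2}$, etc. It is clear that in the product of $(P)$ and $(Q)/p$ the coefficient of the term $x^{g+\gamma}$ will

$$= G\Gamma + {}'G\Gamma' + {}''G\Gamma'' + \text{etc.} + {}'\Gamma G' + {}''\Gamma G'' + \text{etc.}$$

“The term $G\Gamma$ will be a fraction, and if it is expressed in lowest terms, it will involve $t + \tau$ powers of $p$ in the denominator. If any of the other terms is a fraction, lower powers of $p$ will appear in the denominators because each of them will be the product of two factors, one of them involving no more than $t$ powers of $p$, the other involving fewer than $\tau$ such powers; or one of them involving no more than $\tau$ powers of $p$, the other involving fewer than $t$ such powers. Thus $G\Gamma$ will be of the form $\frac{e}{f p^{t+\tau}}$, the others of the form $\frac{e'}{f' p^{t+\tau-\delta}}$ where $\delta$ is positive and $e, f, f'$ are free of the factor $p$, and the sum will

$$= \frac{ef' + e'f p^{\delta}}{ff' p^{t+\tau}}$$

The numerator is not divisible by $p$ and so there is no reduction that can produce powers of $p$ lower than $t + \tau$.”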

(This is on pp. 25-6 of the Clarke translation.)

This argument guarantees that the coefficient of $x^{g+\gamma}$ in $PQ/p$, expressed in lowest terms, has a denominator divisible by $p^{t+\tau}$. Thus the coefficient of the same term in $PQ$ has a denominator divisible by $p^{t+\tau-1}$. Since $t$ and $\tau$ are each at least 1, this means the denominator of this term is divisible by $p$ at least once, and so a fraction. Q.E.D.
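If you want to watch the bookkeeping happen on a specific case, here is a rough sketch that follows Gauss's recipe step by step. The example polynomials, the prime 2, and the helper names (`denom_power`, `pick_term`, `poly_mul`) are all assumptions of mine for illustration, and coefficient lists are written highest degree first so that "preceding" terms come earlier in the list.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists, highest degree first."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def denom_power(c, p):
    """How many times the prime p divides the denominator of c (lowest terms)."""
    d, k = c.denominator, 0
    while d % p == 0:
        d //= p
        k += 1
    return k

def pick_term(poly, p):
    """Gauss's choice: highest power of p in a denominator, ties going to the
    highest-degree term. Returns (position in list, that power)."""
    powers = [denom_power(c, p) for c in poly]
    best = max(powers)
    return powers.index(best), best  # index() finds the first (highest-degree) match

p = 2
P = [Fraction(1), Fraction(1, 2), Fraction(3)]   # x^2 + (1/2)x + 3
Q = [Fraction(1), Fraction(1, 2)]                # x + 1/2
Q_over_p = [c / p for c in Q]                    # (1/2)x + 1/4

i, t = pick_term(P, p)             # the term G x^g and its power t
j, tau = pick_term(Q_over_p, p)    # the term Gamma x^gamma and its power tau

# Positions add when multiplying, so i + j locates the x^(g + gamma) term.
coef_scaled = poly_mul(P, Q_over_p)[i + j]   # coefficient in PQ/p
coef = poly_mul(P, Q)[i + j]                 # same coefficient in PQ

print(t, tau, coef_scaled, coef)
print(denom_power(coef_scaled, p), denom_power(coef, p))
```

Here $t = 1$ and $\tau = 2$, the singled-out coefficient of $PQ/p$ has exactly $t + \tau = 3$ factors of 2 in its denominator, and the corresponding coefficient of $PQ$ keeps $t + \tau - 1 = 2$ of them, so it cannot be an integer.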

Like I said – *nasty*, right? But the concepts involved are just fractions and divisibility. Compare a modern proof of “Gauss’ Lemma” (the statement I quoted above from Artin – a product of primitive integer polynomials is primitive).

Proof. Let the polynomials be $f$ and $g$. Pick any prime number $p$, and reduce everything mod $p$. $f$ and $g$ are primitive so they each have at least one coefficient not divisible by $p$. Thus $f \not\equiv 0 \pmod{p}$ and $g \not\equiv 0 \pmod{p}$. By looking at the leading terms of $f$ and $g$ mod $p$ we see that the product $fg$ must be nonzero mod $p$ as well. This implies that $fg$ contains at least one coefficient not divisible by $p$. Since this argument works for any prime $p$, it follows that there is no prime dividing every coefficient in $fg$, which means that it is primitive. Q.E.D. [1]

Clean and quick. *If* you’re familiar with the concepts involved, it’s way easier to follow than Gauss’s original. *But,* you have to first digest a) the idea of reducing everything mod $p$; b) the fact that this operation is compatible with all the normal polynomial operations; and c) the crucial fact that because $p$ is prime, the product of two coefficients that are not $0 \bmod p$ will also be nonzero mod $p$.
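Those three ingredients can be poked at directly. What follows is only a sketch under my own assumptions: the sample primitive polynomials, the handful of primes tried, and the helper names `reduce_mod` and `poly_mul` are all illustrative, and trying four primes is a sanity check rather than the proof.

```python
def poly_mul(a, b):
    """Multiply integer polynomials given as coefficient lists, highest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def reduce_mod(poly, p):
    """(a) Reduce every coefficient mod p, then drop leading zeros."""
    r = [c % p for c in poly]
    while r and r[0] == 0:
        r.pop(0)
    return r

f = [4, 5, 6]   # 4x^2 + 5x + 6, primitive
g = [3, 0, 2]   # 3x^2 + 2, primitive
fg = poly_mul(f, g)

results = {}
for p in (2, 3, 5, 7):
    f_bar, g_bar = reduce_mod(f, p), reduce_mod(g, p)   # nonzero because f, g are primitive
    direct = reduce_mod(fg, p)
    # (b) reducing then multiplying agrees with multiplying then reducing
    via_bars = reduce_mod(poly_mul(f_bar, g_bar), p)
    # (c) the product of the two leading coefficients is nonzero mod the prime p
    lead = (f_bar[0] * g_bar[0]) % p
    results[p] = (direct == via_bars, lead != 0, direct[0] == lead)

print(results)
```

For each prime tried, the leading coefficient of the reduced product is exactly the product of the two reduced leading coefficients, so the product survives reduction and that prime cannot divide every coefficient of $fg$.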

Now Gauss actually had access to all of these ideas. In fact it was in the *Disquisitiones Arithmeticae* itself that the world was introduced to the notation “$a \equiv b \pmod{m}$.” So in a way it’s even more striking that he didn’t think to use them here when they would have cleaned up so much.

What bugged me out and made me excited to share this with you was the realization that these two proofs are *essentially the same proof*.

*What?*

I’m not gonna spell it out, because what’s the fun in that? But here’s a hint: that term $Gx^g$ that Gauss singled out in his polynomial $P$? Think about what would happen to that term (in comparison with all the terms before it) if you a) multiplied the whole polynomial by the lcm of the denominators to clear out all the fractions and yield a primitive integer polynomial, and then b) reduced everything mod $p$.

(If you are into this sort of thing, I found it to be an awesome exercise, that gave me a much deeper understanding of both proofs, to flesh out the equivalence, so I recommend that.)

* * * * *

What’s the pedagogical big picture here?

I see this as a case study in the value of approaching a problem with unsophisticated tools before learning sophisticated tools for it. To begin with, this historical anecdote seems to indicate that this is the *natural flow*. I mean, everybody always says Gauss was *the greatest mathematician of all time*, and even *he* didn’t think to use reduction mod $p$ on this problem, even though he was developing this tool on the surrounding pages of the very same book.

In more detail, why is this more pedagogically natural than “build the (sophisticated) machine, then apply it”?

First of all, the machine is inscrutable before it is applied. Think about being handed all the tiny parts of a sophisticated robot, along with assembly instructions, but given no sense of how the whole thing is supposed to function once it’s put together. And then trying to follow the instructions. This is what it’s like to learn sophisticated math ideas machinery-first, application-later. I felt this way this spring in learning the idea of Brouwer degree in my topology class. Now that everything is put together, I have a strong desire to go back to the beginning and do the whole thing again knowing what the end goal is. The ideas felt so airy and insubstantial the first time through. I never felt grounded.

Secondly, the quick solution that is powered by the sophisticated tools *loses something* if it’s not coupled with some experience working on the same problem with less sophisticated tools. The aesthetic delight that professors take in the short and elegant solution of the erstwhile-difficult problem comes from an intimacy with this difficulty that the student *skips* if she just learns the power tools and then zaps it. Likewise, if the goal is to gain insight into the problem, the short, turbo-powered solution often feels very illuminating to someone (say, the professor) who knows the long, ugly solution, but like pure magic, and therefore not illuminating at all, to someone (say, a student) who doesn’t know any other way. There is something tenuous and spindly about knowing a high-powered solution *only*.

Here I can cite my own experience with Gauss’s Lemma, the subject of this post. I remember reading the proof in Artin a year ago and being satisfied at the time, but I also remember being unable to recall this proof (even though it’s so simple! maybe *because* it’s so simple!) several months later. You read it, it works, it’s like *poof! done!* It’s almost like a sharp thin needle that passes right through your brain without leaving any effect. (Eeew… sorry that was gross.) The process of working through Gauss’ original proof, and then working through how the proofs are so closely related, has made my understanding of Artin’s proof far deeper and my appreciation of its elegance far stronger. Before, all I saw was a cute short argument that made something true. I now see in it *the mess that it is elegantly cleaning up*.

I’ve had a different form of the same experience as I fight my way through Galois’ paper. (I am working through the translation found in Appendix I of Harold Edwards’ 1984 book *Galois Theory*. This is a great way to do it because if at any point you are totally lost about what Galois means, you can usually dig through the book and find out what Edwards thinks he means.) I previously learned a modern treatment of Galois theory (essentially the one found in Nathan Jacobson’s *Basic Algebra I* – what a ridiculous title from the point of view of a high school teacher!). When I learned it, I “followed” everything but I *knew* my understanding was not where I wanted it to be. Here the words “spindly” and “tenuous” come to mind again. The arguments were built one on top of another till I was looking at a tall machine with a lot of firepower at the very top but supported by a series of moving parts I didn’t have a lot of faith in.

An easy mark for Ewoks, and I knew it.

This version of Galois theory was all based on concepts like fields, automorphisms, vector spaces, separable and normal extensions, of which Galois himself had access to *none*. The process of fighting through Galois’ original development of his theory and trying to understand how it is related to what I learned before has been slowly filling out and reinforcing the lower regions of this structure for me. Coupling the sophisticated with the less sophisticated approach has given the entire edifice some solidity.

Thirdly, and this is what I feel like I hear folks (Shawn Cornally, Dan Meyer, Alison Blank, etc.) talk about a lot, but it bears repeating, is this:

If you attack a problem with the tools you have, and either you can’t solve it, or you can solve it but your solution is messy and ugly, like Gauss’s solution above (if I may), *then you have a reason to want better tools.* Furthermore, the *way* in which your tools are failing you, or in which they are being inefficient, may be a hint to you for how the better tools need to look.

Just as an example, think about how awesome reduction mod $p$ is going to seem if you are already fighting (as Gauss did) with a whole bunch of adding stuff up some of which is divisible by $p$ and some of which is not. What if you could treat everything divisible by $p$ as *zero* and then summarily forget about it? How convenient would that be?

I want to bring this back to the K-12 level so let me give one other illustration. A major goal of 7th-9th grade math education in NY (and elsewhere) is getting kids to be able to solve all manner of single-variable linear equations. The basic tool here is “doing the same thing to both sides.” (As in, dividing both sides of the equation by 2, or subtracting 2x from both sides…) For the kids this is a powerful and sophisticated tool, one that takes a lot of work to fully understand, because it involves the extremely deep idea that you can change an equation without changing the information it is giving you.

There is no reason to bring out this tool in order to have the kiddies solve $x + 7 = 10$. It’s even unnatural for solving something like $4x - 21 = 55$. Both of these problems are susceptible to much less abstract methods, such as “working backwards.” The “both sides” tool is not naturally motivated until the variable appears on both sides of the equation. I used to let students solve equations like these whatever way made sense to them, but then try to impose on them the understanding that what they had “really done” was to add 21 to both sides and then divide both sides by 4, *so that later* when I gave them equations with variables on both sides, they’d be ready. This was weak because I was working against the natural pedagogical flow. They didn’t need the new tool yet because I hadn’t given them problems that brought them to the limitations of the old tool. Instead, I just tried to force them to reimagine what they’d already been doing in a way that felt unnatural to them. Please, if a student answers your question and can give you any mathematically sound reason, no matter how elementary, accept it! If you would like them to do something fancier, try to give them a problem that forces them to.
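For what it's worth, "working backwards" is simple enough to write down as a tiny procedure: list what was done to the unknown, then undo those steps in reverse order starting from the result. This is only a sketch; the function names and the right-hand side 55 in the second equation are assumed values of mine for illustration.

```python
from fractions import Fraction

# Each operation and the operation that undoes it.
INVERSE = {"add": "subtract", "subtract": "add",
           "multiply": "divide", "divide": "multiply"}

def do_step(op, amount, value):
    """Apply a single arithmetic step to value."""
    if op == "add":
        return value + amount
    if op == "subtract":
        return value - amount
    if op == "multiply":
        return value * amount
    return value / amount  # "divide"

def work_backwards(steps, result):
    """steps lists what was done to x, in order; undo them last-to-first."""
    value = Fraction(result)
    for op, amount in reversed(steps):
        value = do_step(INVERSE[op], amount, value)
    return value

# x + 7 = 10: "something plus 7 made 10"
print(work_backwards([("add", 7)], 10))

# 4x - 21 = 55 (55 is an assumed value): "times 4, then minus 21, made 55"
print(work_backwards([("multiply", 4), ("subtract", 21)], 55))
```

Notice the procedure has nothing to say once $x$ shows up on both sides, since there is no longer a single chain of steps leading from $x$ to a known number. That is exactly the point at which the "both sides" tool starts to earn its keep.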

Basically this whole post adds up to an excuse to show you some cool historical math and a plea for due respect to be given to unsophisticated solutions. There is no rush to the big fancy general tools (except the rush imposed by our various dark overlords). They are learned better, and appreciated better, if students, teachers, mathematicians first get to try out the tools we already have on the problems the fancy tools will eventually help us answer. It worked for Gauss.

[1] This is the substance of the proof given in Artin but I actually edited it a bit to make it (hopefully) more accessible. Artin talks about the ring homomorphism $\mathbb{Z}[x] \to \mathbb{F}_p[x]$ and the images of $f$ and $g$ (he calls them $\bar{f}$ and $\bar{g}$) under this homomorphism.

ADDENDUM 8/10/11:

I recently bumped into a beautiful quote from Hermann Weyl that I had read before (in Bob and Ellen Kaplan’s *Out of the Labyrinth*, p. 157) and forgotten. It is entirely germane.

Beautiful general concepts do not drop out of the sky. To begin with, there are definite, concrete problems, with all their undivided complexity, and these must be conquered by individuals relying on brute force. Only then come the axiomatizers and systematizers and conclude that instead of straining to break in the door and bloodying one’s hands one should have first constructed a magic key of such and such a shape and then the door would have opened quietly, as if by itself. But they can construct the key only because the successful breakthrough enables them to study the lock front and back, from the outside and from the inside. Before we can generalize, formalize and axiomatize there must be mathematical substance.

Very nice!

A caution though. I’m not sure if even your revised method would be productive; that is, allowing students to plow through x + 7 = 10 etc. through brute force and then transitioning straight to 3x + 4 = x – 6. Some would need all the material repeated again — that is, start by solving just one step, then two step, etc. This is doable in an efficient way if Phase 1 is fairly short but also runs the risk that students will cling to brute force methods. I’ve seen it happen before.

An alternate approach is to start with the algebraic form but give a preview of what is to come before starting anything in an attempt to convince students to work things out the slow way. This isn’t as philosophically satisfying or pure (and some students will remain unconvinced), but it can mitigate some problems if the other method is logistically impossible.

Hmmm… the danger of clinging to the brute force methods… yes I’ve seen that too. I can’t base this claim on much in the way of field testing, but I feel sure that what I’m recommending here can be made to work. The problem to push it to the next level isn’t 3x+4 = x – 6, though, it’s 3x+4 = 4x. And, ideally, some sort of physical model of the problem, so that people can reason concretely about it. (Taking a leaf from a mentor of mine, Maurice Page, I used to use plastic cups and poker chips.)

Anyway, my real point here is that I think in my own teaching of this topic in Algebra I, by insisting that the kids do something to both sides from an early point, I was asking them to divorce equation solving from their own reasoning processes a bit. This past year as a teacher trainer I’ve seen a fair amount of this as well. “Solve x + 7 = 10” is a concrete question that can be thought about sensibly and answered easily. If I just pose the question and take any mathematically valid answer, I’m encouraging the kids to see the question as logical and whatever the natural contours of their reasoning as authoritative. If on the other hand I force them to subtract 7 from both sides before they see the point, I’m divorcing the activity from their reasoning. This is what I’m recommending against.

I have faith that an alternative that doesn’t do this can be made to work, but I have only limited experience with this. I like your idea of giving a preview of the hard problems for which the advanced method will be needed; but I would never again want to try to convince students they “have to” or “should” subtract 7 from both sides in order to solve x + 7 = 10. I just feel like this is sending the wrong message. The only context I could see doing this is, once we develop “doing to both sides” as a method for harder problems, going back to x +7 = 10 as a check to see that it really works on problems we already understand well.

I haven’t finished reading the whole post, but I thought I should share a resolution to your first question about the equivalence of Gauss and Artin’s statements of the lemma.

Artin’s version (product of primitive integral polynomials is primitive) is equivalent to Gauss’s version. That Artin’s version implies Gauss’s version seems relatively straightforward. Here’s a proof of the converse in the form “The failure of Artin’s version implies the failure of Gauss’s version” (which is simply proof by contrapositive).

proof: (rough and dirty) Suppose f, g are integral primitive polynomials and that f*g is not primitive. Then some integer k>1 divides f*g; fg/k is integral. Well, then let f/k be a polynomial with rational coefficients, which must be non-integral since f is primitive. (f/k)*g=fg/k is integral, contradicting Gauss’s version of Gauss’s lemma.

Thanks! I thought of this, but Gauss’s version of his lemma requires that the polynomials both be monic. There’s no guarantee that either f/k or g is monic. (In the proof, Gauss uses the monicness assumption to guarantee the existence of a fraction with denominator divisible by p at least once in Q/p – “the coefficient of the first term, 1/p, for example.” – p. 25)

In fact, without the requirement that the polynomials be monic, or some other requirement to rule out what follows, the claim is false: take (x/2 + 1/2) times (2x + 2) for example. I.e. take two integer polynomials f and g and multiply f/k with kg for an integer k chosen so that not all coefficients in f/k are integers.

My doubt about Gauss’ version -> Artin’s version comes from the fact that it seems to me that monicness might be an unnecessarily strong condition to rule out this possibility.

Ahh, I see. Tricky. I’m inclined to agree with you, although I should give it some more thought.

1) the two statements are logically equivalent since they’re both true. 2) I don’t see any obvious way to get Artin’s from Gauss’s though I definitely see the other direction, so I think it’s fair to call Artin’s version a generalization.

>The “both sides” tool is not naturally motivated until the variable appears on both sides of the equation.

I use the ‘both sides tool’ so automatically at this point, I have no idea what would naturally motivate it for a student. But for me it’s sure easier than working backwards when negative numbers are involved.

Hi Ben,

Thanks for this wonderful essay. I’ve had similar ideas as I develop my abstract algebra course to follow one of the original proofs of the Fundamental Theorem of Algebra.

What gives me great joy with this approach is exactly what you point to – how struggling with the simple, low level tools really gives you a familiarity with the material. And, as a bonus, the high level proof hits you like a wonderful punchline. You get it! The approach that you criticize is as if you wander in to the room just in time for the punchline, without having heard the joke itself.

One of my favorite quotes illustrates this very well: (from http://www.simonsingh.net/Andrew_Wiles.html)

“You enter the first room of the mansion and it’s completely dark. You stumble around bumping into the furniture but gradually you learn where each piece of furniture is. Finally, after six months or so, you find the light switch, you turn it on, and suddenly it’s all illuminated. You can see exactly where you were. Then you move into the next room and spend another six months in the dark. So each of these breakthroughs, while sometimes they’re momentary, sometimes over a period of a day or two, they are the culmination of, and couldn’t exist without, the many months of stumbling around in the dark that precede them.”

Cheers!

Japheth

That’s a beautiful quote from Wiles. I’d heard it up till “…another six months in the dark” but the part after that is new to me. I love that he’s explicitly saying that the stumbling around is what makes the breakthroughs possible.