Teaching proof writing

I’m at BEAM 7 (formerly SPMPS) right now. I just taught a week-long, 18 hour course on number theory to 11 awesome middle schoolers. I’ve done this twice before, in 2013 and 2014. (Back then it was 20 hrs, and I totally sorely missed those last two!) The main objective of the course is some version of Euclid’s proof of the infinitude of the primes. In the past, what I’ve gotten them to do is to find the proof and convince themselves of its soundness in a classroom conversation. I actually wrote a post 4 years ago in which I recounted how (part of) the climactic conversation went.

This year, about halfway through, I found myself with an additional goal: I wanted them to write down proofs of the main result and the needed lemmas, in their own words, in a way a mathematician would recognize as complete and correct.

I think this happened halfway through the week because until then I had never allowed myself to fully acknowledge how separate a skill this is from constructing a proof and defending its soundness in a classroom conversation.

At any rate, this was my first exercise in teaching students how to workshop a written proof since the days before I really understood what I was about as an educator, and I found a structure that worked on this occasion, so I wanted to share it.

Let me begin with a sample of the final product. This particular proof is for the critical lemma that natural numbers have prime factors.

Theorem: All natural numbers greater than 1 have at least one prime factor.

Proof: Let N be any natural number > 1. The factors of N will continue descending as you keep factoring non-trivially. Therefore, the factoring of the natural number will stop at some point, since the number is finite.

If the reader believes that the factoring will stop, it has to stop at a prime number since the factoring cannot stop at a composite because a composite will break into more factors.

Since the factors of N factorize down to prime numbers, that prime is also a factor of N because if N has factor Y and Y has a prime factor, that prime factor is also a factor of N. (If a\mid b and b\mid c then a\mid c.)

There was a lot of back and forth between them, and between me and them, to produce this, but all the language came from them, except for three suggestions I made, quite late in the game:

1) I suggested the “Let N be…” sentence.
2) I suggested the “Therefore” in the first paragraph.
3) I suggested the “because” in the last paragraph. (Previously, it was 2 separate sentences.)
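
For readers who like to see an argument run, here is a minimal Python sketch of the descent in the students' proof. It is an illustration, not something from the class: the function names are hypothetical, and the choice to always step down to the largest non-trivial factor is an assumption made only to keep the chain easy to read. Any non-trivial factor would do.

```python
import math


def nontrivial_factor(n):
    """Return a factor f of n with 1 < f < n, or None if there is none."""
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return n // d  # the cofactor of the smallest divisor d
    return None  # nothing divides n non-trivially, so n is prime


def prime_factor_by_descent(n):
    """Keep factoring non-trivially until the factoring stops."""
    if n <= 1:
        raise ValueError("the lemma is about natural numbers greater than 1")
    chain = [n]
    while True:
        f = nontrivial_factor(chain[-1])
        if f is None:
            # The factoring stopped, so it stopped at a prime.
            return chain[-1], chain
        # f is smaller than the previous entry, so the chain descends
        # and cannot go on forever.
        chain.append(f)


prime, chain = prime_factor_by_descent(360)
print(chain)  # [360, 180, 90, 45, 15, 5]
print(prime)  # 5
```

Each entry in the chain divides the one before it, so the last paragraph of the proof (if a\mid b and b\mid c then a\mid c) is what guarantees the final prime divides the original N.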

Here’s how this was done.

First, they had to have the conversation where the proof itself was developed. This post isn’t especially about that part, so I’ll be brief. I asked them if a number could be so big none of its factors were prime. They said, no, this can’t happen. I asked them how they knew. They took a few minutes to hash it out for themselves and their argument basically amounted to, “well, even if you factor it into composite numbers, these themselves will have prime factors, so QED.” I then expressed that because of my training, I was aware of some possibilities they might not have considered, so I planned on honoring my dissatisfaction until they had convinced me they were right. I proceeded to press them on how they knew they would eventually find prime factors. It took a long time but they eventually generated the substance of the proof above. (More on how I structure this kind of conversation in a future post.)

I asked them to write it down and they essentially produced only the following two sentences:

1. The factoring of the natural number will stop at a certain point, since the number is finite.
2. If X (natural) has a factor Y, and Y has a prime factor, that prime factor is also a factor of X.

This was the end product of a class period. Between this one and the next was when it clicked for me that I wanted proof writing to be a significant goal. It was clear that they had all the parts of the argument in mind, at least collectively if not individually. But many of the ideas and all of the connective tissue were missing from their class-wide written attempt. On the one hand, given how much work they had already put in, I felt I owed it to them to help them produce a complete, written proof that would stand up to time and be legible to people outside the class. On the other, I was wary of inserting myself too much into the process lest I steal any of their sense of ownership over the finished product. How to scaffold the next steps in a way that gave them a way forward, and led to something that would pass muster outside the class, but left ownership in their hands?

Here’s what I tried, which at least on this occasion totally worked. (Quite slowly, fyi.)

I began with a little inspirational speech about proof writing:

Proof writing is the power to force somebody to believe you, who doesn’t want to.

The point of this speech was to introduce a character into the story: The Reader. The important facts about The Reader are:

(1) They are ornery and skeptical. They do not want to believe you. They will take any excuse you give them to stop listening to you and dismiss what you are saying.

(2) If you are writing something down that you talked about earlier, your reader was not in the room when you talked about it.

Having introduced this character, I reread their proof to them and exposed what The Reader would be thinking. I also wrote it down on the board for them to refer to:

1. The factoring of the natural number will stop at a certain point, since the number is finite.

(a) What does finiteness of the number have to do with the conclusion that the factoring will stop? (b) Why do you believe the numbers at which the factoring stops will be prime?

2. If X (natural) has a factor Y, and Y has a prime factor, that prime factor is also a factor of X.

What does this have to do with anything?

(I don’t have a photo of the board at this stage. I did do The Reader’s voice in a different color.)

Then I let them work as a whole class. I had the students run the conversation completely and decide when they were ready to present their work to The Reader again. In one or two more iterations of this, they came up with all of the sentences in the proof quoted above except for “Let N be…” and minus the “Therefore” and “because” mentioned before. They started to work on deciding an order for the sentences. At this point it seemed clear to me they knew the proof was theirs, so I told them I (not as The Reader but as myself) had a suggestion and asked if I could make it. They said yes, and I suggested which sentence to put first. I also suggested the connecting words and gave my thinking about them. They liked all the suggestions.

This is how it was done. From the first time I gave the reader’s feedback to the complete proof was about 2 hours of hard work.

Let me highlight what for me was the key innovation:

It’s that the feedback was not in the teacher’s (my) voice, but instead in the voice of a character we were all imagining, which acted according to well-defined rules. (Don’t believe the proof unless forced to; and don’t consider any information about what the students are trying to communicate that is not found in the written proof itself.) This meant that at some point I could start to ask, “what do you think The Reader is going to say?” I was trying to avoid the sense that I was lifting the work of writing the proof from them with my feedback, and this mode of feedback seemed to support making progress with the proof while avoiding this outcome.


As you may have guessed, the opening phrase of the sentence “If the reader believes…” in the final proof is an artifact of the framing in terms of The Reader. Actually, at the end, the kids had an impulse to remove this phrase in order to professionalize the sentence. I encouraged them to keep it because I think it frames the logical context of the sentence so beautifully. (I also think it is adorable.)


Uhm sayin

Dan Meyer’s most recent post is about how in order to motivate proof you need doubt.

This is something I was repeatedly and inchoately hollering about five years ago.

As usual I’m grateful for Dan’s cultivated ability to land the point cleanly and actionably. Looking at my writing from 5 years ago – it’s some of my best stuff! totally follow those links! – but it’s long and heady, and not easy to extract the action plan. So, thanks Dan, for giving this point (which I really care about) wings.

I have one thing to add to Dan’s post! Nothing I haven’t said before but let’s see if I can make it pithy so it can fly too.

Dan writes that an approach to proof that cultivates doubt has several advantages:

  1. It motivates proof
  2. It lowers the threshold for participation in the proof act
  3. It allows students to familiarize themselves with the vocabulary of proof and the act of proving
  4. It makes proving easier

I think it makes proving not only easier but way, way easier, and I have something to say about how.

Legitimate uncertainty and the internal compass for rigor

Anybody who has ever tried to teach proof knows that the work of novice provers on problems of the form “prove X” is often spectacularly, shockingly illogical. The intermediate steps don’t follow from the givens, don’t imply the desired conclusion, and don’t relate to each other.

I believe this happens for an extremely simple reason. And it’s not that the kids are dumb.

It happens because the students’ work is unrelated to their own sense of the truth! You told them to prove X given Y. To them, X and Y look about equally true. Especially since the problem setup literally informed them that both are true. Everything else in sight looks about equally true too.

There is no gradient of confidence anywhere. Thus they have no purchase on the geography of the truth. They are in a flat, featureless wilderness where all the directions look the same, and they have no compass. So they wander in haphazard zigzags! What the eff else can they do??

The situation is utterly different if there is any legitimate uncertainty in the room. Legitimate uncertainty is an amazing, magical, powerful force in a math classroom. When you don’t know and really want to know, directions of inquiry automatically get magnetized for you along gradients of confidence. You naturally take stock of what you know and use it to probe what you don’t know.

I call this the internal compass for rigor.

Everybody’s got one. The thing that distinguishes experienced provers is that we have spent a lot of time sensitizing ours and using it to guide us around the landscape of the truth, to the point where we can even feel it giving us a validity readout on logical arguments relating to things we already believe more or less completely. (This is why “prove X” is a productive type of exercise for a strong college math major or a graduate student, and why mathematicians agree that the twin prime conjecture hasn’t been proven yet even though everybody believes it.)

But novice provers don’t know how to feel that subtle tug yet. If you say “prove X” you are settling the truth question for them, and thereby severing their access to their internal compass for rigor.

Fortunately, the internal compass is capable of a much more powerful pull, and that’s when it’s actually giving you a readout on what to believe. Everybody can and does feel this pull. As soon as there’s something you don’t know and want to know, you feel it.

This means that often it’s enough merely to generate some legitimate mathematical uncertainty in the students, and some curiosity about it, and then just watch and wait. With maybe a couple judicious and well-thought-out hints at the ready if needed. But if the students resolve this legitimate uncertainty for themselves, well, then, they have probably more or less proven something. All you have to do is interview them about why they believe what they’ve concluded and you will hear something that sounds very much like a proof.

Wherein This Blog Serves Its Original Function

The original inspiration for starting this blog was the following:

I read research articles and other writing on math education (and education more generally) when I can. I had been fantasizing (back in fall 2009) about keeping an annotated bibliography of articles I read, to defeat the feeling that I couldn’t remember what was in them a few months later. However, this is one of those virtuous side projects that I never seemed to get to. I had also met Kate Nowak and Jesse Johnson at a conference that summer, and due to Kate’s inspiration, Jesse had started blogging. The two ideas came together and clicked: I could keep my annotated bibliography as a blog, and then it would be more exciting and motivating.

That’s how I started, but while I’ve occasionally engaged in lengthy explication and analysis of a single piece of writing, this blog has never really been an annotated bibliography. EXCEPT FOR RIGHT THIS VERY SECOND. HA! Take THAT, Mr. Things-Never-Go-According-To-Plan Monster!

“Opportunities to Learn Reasoning and Proof in High School Mathematics Textbooks”, by Denisse R. Thompson, Sharon L. Senk, and Gwendolyn J. Johnson, published in the Journal for Research in Mathematics Education, Vol. 43 No. 3, May 2012, pp. 253-295

The authors looked at HS level textbooks from six series (Key Curriculum Press; Core Plus; UCSMP; and divisions of the major publishers Holt, Glencoe, and Prentice-Hall) and analyzed the lessons and problem sets from the point of view of “what are the opportunities to learn about proof?” To keep the project manageable they just looked at Alg. 1, Alg. 2 and Precalc books and focused on the lessons on exponents, logarithms and polynomials.

They cast the net wide, looking for any “proof-related reasoning,” not just actual proofs. For lessons, they were looking for any justification of stated results: either an actual proof, or a specific example that illustrated the method of the general argument, or an opportunity for students to fill in the argument. For exercise sets, they looked at problems that asked students to make or investigate a conjecture or evaluate an argument or find a mistake in an argument in addition to asking students to actually develop an argument.

In spite of this wide net, they found that:

* In the exposition, proof-related reasoning is common but lack of justification is equally common: across the textbook series, 40% of the mathematical assertions about the chosen topics were made without any form of justification;

* In the exercises, proof-related reasoning was exceedingly rare: across the textbook series, less than 6% of exercises involved any proof-related reasoning. Only 3% involved actually making or evaluating an argument.

* Core Plus had the greatest percentage of exercises with opportunities for students to develop an argument (7.5%), and also to engage in proof-related reasoning more generally (14.7%). Glencoe had the least (1.7% and 3.5% respectively). Key Curriculum Press had the greatest percentage of exercises with opportunities for students to make a conjecture (6.0%). Holt had the least (1.2%).

The authors conclude that mainstream curricular materials do not reflect the pride of place given to reasoning and proof in the education research literature and in curricular mandates.

“Expert and Novice Approaches to Reading Mathematical Proofs”, by Matthew Inglis and Lara Alcock, published in the Journal for Research in Mathematics Education, Vol. 43 No. 4, July 2012, pp. 358-390

The authors had groups of undergraduates and research mathematicians read several short, student-work-style proofs of elementary theorems, and decide if the proofs were valid. They taped the participants’ eye movements to see where their attention was directed.

They found:

* The mathematicians did not have uniform agreement on the validity of the proofs. Some of the proofs had a clear mistake and then the mathematicians did agree, but others were more ambiguous. (The proofs that were used are in an appendix in the article so you can have a look for yourself if you have JSTOR or whatever.) The authors are interested in using this result to challenge the conventional wisdom that mathematicians have a strong shared standard for judging proofs. I am sympathetic to the project of recognizing the way that proof reading depends on context, but found this argument a little irritating. The proofs used by the authors look like student work: the sequence of ideas isn’t being communicated clearly. So it wasn’t just the validity of a sequence of ideas that the participants evaluated, it was also the success of an imperfect attempt to communicate that sequence. Maybe this distinction is ultimately unsupportable, but I think it has to be acknowledged in order to give the idea that mathematicians have high levels of agreement about proofs its due. Nobody who espouses this really thinks that mathematicians are likely to agree on what counts as clear communication. Somehow the sequence of ideas has to be separated from the attempt to communicate it if this idea is to be legitimately tested.

* The undergraduates spent a higher percentage of the time looking at the formulas in the proofs and a lower percentage of time looking at the text, as compared with the mathematicians. The authors argue that this is not fully explained by the hypothesis that the students had more trouble processing the formulas, since the undergrads spent only slightly more time total on them. The mathematicians spent substantially more time on the text. The authors speculate that the students were not paying as much attention to the logic of the arguments, and that this pattern accounts for some of the notorious difficulty that students have in determining the validity of proofs.

* The mathematicians moved their focus back and forth between consecutive lines of the proofs more frequently than the undergrads did. The authors suggest that the mathematicians were doing this to try to infer the “implicit warrant” that justified the 2nd line from the 1st.

The authors are also interested in arguing that mathematicians’ introspective descriptions of their proof-validation behavior are not reliable. Their evidence is that previous research (Weber, 2008: “How mathematicians determine if an argument is a valid proof”, JRME 39, pp. 431-459) based on introspective descriptions of mathematicians found that mathematicians begin by reading quickly through a proof to get the overall structure, before going into the details; however, none of the mathematicians in the present study did this according to their eye data. One of them stated that she does this in her informal debrief after the study, but her eye data didn’t indicate that she did it here. Again I’m sympathetic to the project of shaking up conventional wisdom, and there is lots of research in other fields to suggest that experts are not generally expert at describing their expert behavior, and I think it’s great when we (mathematicians or anyone else) have it pointed out to us that we aren’t right about everything. But I don’t feel the authors have quite got the smoking gun they claim to have. As they acknowledge in the study, the proofs they used are all really short. These aren’t the proofs to test the quick-read-thru hypothesis on.

The authors conclude by suggesting that when attempting to teach students how to read proofs, it might be useful to explicitly teach them to mimic the major difference found between novices and experts in the study: in particular, the idea is to teach them to ask themselves if a “warrant” is required to get from one line to the next, to try to come up with one if it is, and then to evaluate it. This idea seems interesting to me, especially in any class where students are expected to read a text containing proofs. (The authors are also calling for research that tests the efficacy of this idea.)

The authors also suggest ways that proof-writing could be changed to make it easier for non-experts to determine validity. They suggest (a) reducing the amount of symbolism to prevent students being distracted by it, and (b) making the between-line warrants more explicit. These ideas strike me as ridiculous. Texts already differ dramatically with respect to (a) and (b), there is no systemic platform from which to influence proof-writing anyway, and in any case as the authors rightly note, there are also costs to both, so the sweet spot in terms of text / symbolism balance isn’t at all clear and neither is the implicit / explicit balance. Maybe I’m being mean.

Angle Sum Formulas: Request for Ideas

One of the student teachers I supervise is planning a lesson introducing the sine and cosine angle sum formulas. I wanted to give him some advice on how to make the lesson better – in particular, along the axes of motivation and justification – and realized that, never having taught precalculus, I barely had any! Especially re: justification. I basically understand these formulas as corollaries of the geometry of multiplication of complex numbers.[1] I have seen elementary proofs, but I remember them as feeling complicated and not that illuminating.

So: how do you teach the trig angle sum formulas? And in particular:

* How do you make them seem needed? (I offered my young acolyte the idea of asking the kids to find sin 30, sin 45, sin 60, sin 75 and sin 90 – with the intention of having them be slightly bothered by the fact that they can do all but sin 75.)

* Do you state the formulas or do you set something up to have the kids conjecture them? If the latter, how do you do it? How does it fly?

* How do you justify them? Do you do a rigorous derivation? Do you do something to make them seem intuitively reasonable? What do you do and how does it fly?

* Do you do them before or after complex numbers, and do you connect the two? If so, how do you do it and how does it fly?

Any thoughts would be much appreciated.
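
For reference, here is the computation that the sin 75 question is fishing for, written out as a worked equation. It assumes the standard sine angle sum formula, \sin(\alpha+\beta)=\sin\alpha\cos\beta+\cos\alpha\sin\beta, which is not stated anywhere above, and splits 75 as 45 + 30 (angles in degrees).

```latex
\begin{aligned}
\sin 75^\circ &= \sin(45^\circ + 30^\circ) \\
              &= \sin 45^\circ \cos 30^\circ + \cos 45^\circ \sin 30^\circ \\
              &= \frac{\sqrt{2}}{2}\cdot\frac{\sqrt{3}}{2} + \frac{\sqrt{2}}{2}\cdot\frac{1}{2} \\
              &= \frac{\sqrt{6}+\sqrt{2}}{4} \approx 0.9659
\end{aligned}
```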

Addendum 3/20/11:

Thanks to John Abreu, who sent me the following in an email –

Please find attached a Word document with the proofs of the trig angle sum formulas. After opening the document you’ll see a sequence of 14 figures, the conclusions are obtained comparing the two of them in yellow. Also, I left the document in “crude” format so it’ll be easier for you to decide the format before posting.

I must say that the proofs/method is not mine, but I can’t remember where I learned them.

with an attachment containing the following figures –

As far as I can tell, the proof is valid for any pair of angles with an acute sum.


[1]Let z_1,z_2 be two complex numbers on the unit circle, at angles \theta_1,\theta_2 from the positive real axis. Then z_1=\cos{\theta_1}+\imath\sin{\theta_1} and z_2=\cos{\theta_2}+\imath\sin{\theta_2}, so by sheer algebra, z_1z_2=(\cos{\theta_1}\cos{\theta_2}-\sin{\theta_1}\sin{\theta_2})+\imath(\cos{\theta_1}\sin{\theta_2}+\sin{\theta_1}\cos{\theta_2}). On the other hand, the awesome thing about multiplication of complex numbers is that the angles add – the product z_1z_2 will be at an angle of \theta_1+\theta_2 from the positive real axis; thus it is equal to \cos{(\theta_1+\theta_2)} + \imath\sin{(\theta_1+\theta_2)}. This is QED for both formulas if you believe me about the awesome thing. Of course it usually gets proven the other way – first the trig formulas, then use this to prove angles add when you multiply. But I think of the fact about multiplication of complex numbers as more essential and fundamental, and the sum formulas as byproducts.
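
A quick numerical sanity check of the footnote, as a minimal Python sketch. The function name and the sample angles are hypothetical. Python's built-in complex multiplication does the "sheer algebra", and the comparison against the cosine and sine of the summed angle is the "awesome thing". A check like this is evidence, not a proof.

```python
import math


def product_on_unit_circle(theta1, theta2):
    """Multiply the unit complex numbers at angles theta1 and theta2 (radians)."""
    z1 = complex(math.cos(theta1), math.sin(theta1))
    z2 = complex(math.cos(theta2), math.sin(theta2))
    return z1 * z2


theta1, theta2 = math.radians(45), math.radians(30)
z = product_on_unit_circle(theta1, theta2)

# Real part should match cos(theta1 + theta2), imaginary part sin(theta1 + theta2).
print(math.isclose(z.real, math.cos(theta1 + theta2), abs_tol=1e-12))
print(math.isclose(z.imag, math.sin(theta1 + theta2), abs_tol=1e-12))
```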

Creating Balance III / Miscellany

The Creating Balance in an Unjust World conference is back! I went a year and a half ago and it was awesome. Math education and social justice, what more could you want?

If you’re in NYC and you’re around this weekend, it’s happening right now! I’m going to try to make it to Session 3 this afternoon. It’s at Long Island University, corner of Flatbush and DeKalb in Brooklyn, right off the DeKalb stop on the Q train. I heard from one of the organizers that you can show up and register at the conference. I’m not 100% sure how that works given that it’s already begun, but I am sure you can still go.

* * * * *

I’ve just had a very intense week.

I want to get some thoughts down. I’m going to try very hard to resist my natural inclinations to a) try to work them into an overall narrative, and b) take forever doing it. Let’s see how I do.

(Ed. note: apparently not very well.)

I. Last spring I wrote

20*20 is 400; how does taking away 2 from one of the factors and 3 from the other affect the product? We get kids thinking hard about this and it would support the most contrivance-free explanation for why (neg)(neg)=(pos) that I have ever seen.

Without going into contextual details, let me just say that if you try to use this to actually develop the multiplication rules in a 1-hour lesson, all that will happen is that you will be dragging kids through the biggest, clunkiest, hardest-to-swallow, easiest-to-lose-the-forest-for-the-trees, totally-mathematically-correct-but-come-now model for signed number multiplication that you have ever seen (and this includes the hot and cold cubes). This idea makes sense for building intuition about signed numbers slowly, before they’re an actual object of study. It does not make any sense at all for teaching a one-off lesson explicitly about them. (Yes, the hard way. I totally knew this five months ago – what was I thinking?)
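
For concreteness, here is the arithmetic behind the 20*20 idea as a worked equation. This is just the distributive law applied to the numbers in the quote; it is an illustration, not a record of the lesson.

```latex
\begin{aligned}
(20-2)(20-3) &= 20\cdot 20 - 2\cdot 20 - 3\cdot 20 + (-2)(-3) \\
             &= 400 - 40 - 60 + (-2)(-3) \\
18\cdot 17 = 306 &= 300 + (-2)(-3)
\end{aligned}
```

Taking away 40 and 60 overshoots by exactly 6, so (-2)(-3) has to be +6 for the books to balance.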

II. I gave a workshop Wednesday night, for about 35 experienced teachers, entitled “Why Linear Algebra Is Awesome.” The idea was to reinterpret the Fibonacci recurrence as a linear transformation and use linear algebra to get a closed form for the Fibonacci numbers. Again, without going into details –

I gave a problem set to make participants notice that the transformation we were working with was linear. I used those PCMI-style tricks like giving two problems in a row that have the same answer for a mathematically significant reason. This worked totally well. Here is the problem set:

Oops I guess I failed to avoid going into details. Anyway, the question was about how to follow this up. I went over 1-4 with everyone (actually, I had individual participants come up to the front for #3 and 4) at which point the only thing I really needed out of this – the linearity of the transformation – had been noticed by pretty much the whole room. One participant had gotten to #9 where you prove it, and I had her go over her proof.

I think this was valueless for the group as a whole. The proof was just a straight computation. You kind of have to do it yourself to feel it at all. It was such a striking difference watching people work on the problem set and have all these lightbulbs go off, vs. listening to somebody prove the thing they’d noticed. It almost seemed like people didn’t see the connection between what they’d noticed and what just got proved. I told them to take 5 minutes and discuss this connection with their table, but I got the feeling that this instruction was actually further disorienting for some participants.

I’m trying to put the experience into language so I get the lesson from it.

It’s like, there was something uninspired and disconnected about watching somebody formally prove the result, and then afterward trying to find the connection between the proof and the observation. Now that I write this down, clearly that was backward. If I wanted the proof (which was really just a boring calculation) to mean anything, especially if I wanted it to be at all engaging to watch somebody else do the proof, we needed to be in suspense about whether the result was true; either because we legitimately weren’t sure, or because we were pretty sure but a lot was riding on it.

This is adding up to: next time I do it, feel no need to prove the linearity. Let them observe it from the problem set and articulate it, but if there is no sense of uncertainty about it, this is enough. Later in the workshop, when we use it to derive a closed form for the Fibonacci numbers, now a lot is riding on it. If it feels right, we could take that moment to make sure it’s true.
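
Since the problem set itself isn't reproduced here, the following is a minimal Python sketch of what "the transformation is linear" and "closed form" could look like. Everything in it is an assumption: the map is taken to be (a, b) -> (b, a + b), the usual way to read the Fibonacci recurrence as a transformation of pairs, the helper names are hypothetical, and the closed form is Binet's formula, built from the two eigenvalues of that map.

```python
import math


def step(v):
    """One application of the (assumed) Fibonacci map: (a, b) -> (b, a + b)."""
    a, b = v
    return (b, a + b)


def add(u, v):
    return (u[0] + v[0], u[1] + v[1])


def scale(c, v):
    return (c * v[0], c * v[1])


def fib_iterated(n):
    """F_n by applying the map n times to (F_0, F_1) = (0, 1)."""
    v = (0, 1)
    for _ in range(n):
        v = step(v)
    return v[0]


def fib_closed_form(n):
    """Binet's formula; floating point, so only trustworthy for moderate n."""
    sqrt5 = math.sqrt(5)
    phi = (1 + sqrt5) / 2  # eigenvalues of the map
    psi = (1 - sqrt5) / 2
    return round((phi ** n - psi ** n) / sqrt5)


u, v = (2, 5), (7, -1)
print(step(add(u, v)) == add(step(u), step(v)))  # additivity on one example
print(step(scale(3, u)) == scale(3, step(u)))    # homogeneity on one example
print(fib_iterated(10), fib_closed_form(10))     # 55 55
```

Checking linearity on a couple of pairs is the "noticing" stage, not a proof. The agreement between the two Fibonacci functions is where linearity actually earns its keep.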

III. As I work on my teacher class, something that’s impressing itself upon me for the first time is that definitions are just as important as proofs. What I mean by this is two things:

a) It makes sense to put a real lot of thought into motivating a course’s key definitions,

and maybe even more importantly,

b) Students of math need practice in creating definitions. You know I think that creating proofs is an underdeveloped skill for most students of math; it strikes me that creating definitions might be even more underdeveloped.

Definitions are one of the most overtly creative products of mathematical work, but they also solve problems. Not in quite the same sense that theorems do – they don’t answer precisely stated questions. But they answer an important question nonetheless – what do we really mean? And to really test a definition, you have to try to prove theorems with it. If it helps you prove theorems, and if the picture that emerges when you prove them matches the image you had when you started trying to make the definition, then it is a “good” definition. (This got clear for me by reading Stephen Maurer’s totally entertaining 1980 article The King Chicken Theorems.)

Anyway this adds up to an activity to put students through that I’ve never explicitly thought about before, but now find myself building up to with my teacher class:

a) Pose a definitional problem. Do a lot of work to make the class understand that we have an important idea at hand for which we lack a good definition.

b) Make them try to create a definition.

c) If they come up with something at all workable, have them try to use it to prove something they already believe true. I’ve often talked in the past about how trying to prove something you already believe true is very difficult, and that will be a problem here. However, unlike in the cases I had in mind (e.g. a typical Geometry “proof exercise”), this situation has the necessary element of suspense: does our definition work?

If they don’t come up with something workable, maybe give them a not entirely precise definition to try out.

d) Refine the definition based on the experience trying to use it to prove something.

I’ll let you know how it goes. I’m excited about it because it mirrors the process that advances mathematics as a discipline. But I expect to have a much better sense of its usefulness once I’ve given it an honest whirl.

Honor your Dissatisfaction

Two things I forgot to say last night.

I. The reason I’m excited about the idea of having my class use its own self-made definitions to try to prove things is not just, or even primarily, because it will help them realize the inadequacies in their definitions. Although it will do that for sure. Even more than that, it seems to me the perfect way to support them in coming up with better definitions. This is what happened to Cauchy: he defined the limit verbally and a little vaguely, but then when he actually tried to use his definition to prove things, he started writing down precise inequalities. He didn’t have a teacher around to point out that this meant he should probably revise his definition, but my class does.

II. Yesterday when I asked my class to try to make a precise definition for what it means to converge, or for something to have a limit, some of them who took real analysis long ago began accessing this knowledge in an incomplete way. They started to talk about \epsilon and \delta, but in vague, uncertain terms. It looked as though others might possibly accept the half-remembered, vague formulations because they seemed like they might be the “this is supposed to be the answer” answer. I had to prevent this. (The danger would have been even greater if these participants had correctly and confidently remembered the definition.) I stepped into the conversation to say, yes, that thing you’re half-remembering is my objective, but what’s going to make you understand it so you never forget it again is to fight till you’re satisfied we’ve captured the meaning of convergence. You can either fight with the definition you half-remember or you can fight to build a new definition, but you have to go through your dissatisfaction to get there. You have to air all this dissatisfaction.

Afterward, I thought of a better language. I’ll give this to them next time.

Honor your dissatisfaction.

Dissatisfaction is the engine that created analysis. This content, more than any other content, is both confusing and pointless if you bury your dissatisfaction rather than allowing it to thrive and be answered. The primary virtue of the tools of analysis is that they are satisfying. Only if you bring forth your dissatisfaction will this content have a chance to show you its value. So. Honor your dissatisfaction. It is the engine that will move us forward.

The History of Algebra, part II: Unsophisticated vs. Sophisticated Tools

Math ed bloggers love Star Wars. This post is extremely long, and involves a fair amount of math, so in the hopes of keeping you reading, I promise a Star Wars reference toward the end. Also, you can still get the point if you skip the math, though that would be sad.

The historical research project I gave myself this spring in order to prep my group theory class (which is over now – why am I still at it?) has had me working slowly through two more watershed documents in the history of math:

Disquisitiones Arithmeticae
by Carl Friedrich Gauss
(in particular, “Section VII: Equations Defining Sections of a Circle”)


Mémoire sur les conditions de résolubilité des équations par radicaux
by Evariste Galois

I’m not done with either, but already I’ve been struck with something I wanted to share. Mainly it’s just some cool math, but there’s a pedagogically relevant idea in here too –

Take-home lesson: The first time a problem is solved the solution uses only simple, pre-existing ideas. The arguments and solution methods are ugly and specific. Only later do new, more difficult ideas get applied, which allow the arguments and solution methods to become elegant and general.

The ugliness and specificity of the arguments and solution methods, and the desire to clean them up and generalize them, are thus a natural motivation for the new ideas.

This is just one historical object lesson in why “build the machinery, then apply it” is a pedagogically unnatural order. Professors delight in using the heavy artillery of modern math to give three-sentence proofs of theorems once considered difficult. (I’ve recently taken courses in algebra, topology, and complex analysis, with three different professors, and deep into each course, the professor gleefully showcased the power of the tools we’d developed by tossing off a quick proof of the fundamental theorem of algebra.) Now, this is a very fun thing to do. But if the goal is to make math accessible, then this is not the natural order.

The natural order is to try to answer a question first. Maybe we answer it, maybe we don’t. But the desire for and the development of the new machinery come most naturally from direct, hands-on experience with the limitations of the old machinery. And that means using it to try to answer questions.

I’m not saying anything new here. But I just want to show you a really striking example from Gauss. (Didn’t you always want to see some original Gauss? No? Okay, well…)

* * * * *

I am reading a 1966 translation of the Disquisitiones by Arthur A. Clarke which I have from the library. An original Latin copy is online here. I don’t read Latin but maybe you do.

I’m focusing on the last section in the book, but at one point Gauss makes use of a result he proved much earlier:

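Article 42. If the coefficients A, B, C, \dots, N; a, b, c, \dots, n of two functions of the form

P = x^m + Ax^{m-1} + Bx^{m-2} + Cx^{m-3} + \dots + N

Q = x^{\mu} + ax^{\mu-1} + bx^{\mu-2} + cx^{\mu-3} + \dots + n

are all rational and not all integers, and if the product of P and Q is

x^{m+\mu} + \mathfrak{A}x^{m+\mu-1} + \mathfrak{B}x^{m+\mu-2} + \dots + \mathfrak{Z}

then not all the coefficients \mathfrak{A}, \mathfrak{B}, \dots, \mathfrak{Z} can be integers.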

Note that even the statement of Gauss’ proposition here would be cleaned up by modern language. Gauss doesn’t even have the word “polynomial.” The word “monic” (i.e., leading coefficient 1) would also have been handy. In modern language he could have said, “The product of two rational monic polynomials is not an integer polynomial if any of their coefficients are not integers.”

But this is not the most dramatic difference between Gauss’ statement (and proof – just give me a sec) and the “modern version.” On page 400 of Michael Artin’s Algebra textbook (which I can’t stop talking about only because it is where I learned like everything I know), we find:

(3.3) Theorem. Gauss’s Lemma: A product of primitive polynomials in \mathbb{Z}[x] is primitive.

The sense in which this lemma is Gauss’s is precisely the sense in which it is really talking about the contents of Article 42 from Disquisitiones which I quoted above.


First of all, what’s \mathbb{Z}[x]? Secondly, what’s a primitive polynomial? Third and most important, what does this have to do with the above? Clearly they both have something to do with multiplying polynomials, but…

Okay. \mathbb{Z}[x] is just the name for the set of polynomials with integer coefficients. (Apologies to those of you who know this already.) So a polynomial in \mathbb{Z}[x] is really just a polynomial with integer coefficients. This notation was developed long after Gauss.

More substantively, a “primitive polynomial” is an integer polynomial whose coefficients have gcd equal to 1. I.e. a polynomial from which you can’t factor out a nontrivial integer factor. E.g. 4x^2+4x+1 is primitive, but 4x^2+4x+2 is not because you can take out a 2. This idea is from after Gauss as well.

So, “Gauss’s Lemma” is saying that if you multiply two polynomials each of whose coefficients do not have a common factor, you will not get a common factor among all the coefficients in the product.
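For readers who like to poke at definitions, here is a minimal sketch of "primitive" and of the lemma's claim on one example. The helper names (`content`, `is_primitive`, `multiply`) and the polynomial 2x + 3 are assumptions made for illustration; polynomials are stored as coefficient lists with the constant term first.

```python
from functools import reduce
from math import gcd

def content(poly):
    """gcd of the integer coefficients of a polynomial."""
    return reduce(gcd, (abs(c) for c in poly))

def is_primitive(poly):
    """A polynomial is primitive when its coefficients share no factor but 1."""
    return content(poly) == 1

def multiply(p, q):
    """Multiply two polynomials given as coefficient lists."""
    product = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            product[i + j] += a * b
    return product

p = [1, 4, 4]   # 4x^2 + 4x + 1, primitive
q = [2, 4, 4]   # 4x^2 + 4x + 2, not primitive (you can take out a 2)
r = [3, 2]      # 2x + 3, primitive

product = multiply(p, r)   # 8x^3 + 20x^2 + 14x + 3
print(is_primitive(p), is_primitive(q), is_primitive(product))
```

One example proves nothing, of course; the point is only to make the statement concrete before asking why it should always hold.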

What does this have to do with the result Gauss actually stated?

That’s an exercise for you, if you feel like it. (Me too actually. I feel confident that the result Artin states has Gauss’s actual result as a consequence; less sure of the converse. What do you think?) (Hint, if you want: take Gauss’s monic, rational polynomials and clear fractions by multiplying each by the lcm of the denominators of its coefficients. In this way replace his original polynomials with integer polynomials. Will they be primitive?)

Meanwhile, what I really wanted to show you are the two proofs. Original proof: ugly, long, specific, but containing only elementary ideas. Modern proof: cute, elegant, general, but involving more advanced ideas.

Here is a very close paraphrase of Gauss’ original proof of his original claim. Remember, P and Q are monic polynomials with rational coefficients, not all of which are integers, and the goal is to prove that PQ’s coefficients are not all integers.

Demonstration. Put all the coefficients of P and Q in lowest terms. At least one coefficient is a noninteger; say without loss of generality that it is in P. (If not, just switch the roles of P and Q.) This coefficient is a fraction with a denominator divisible by some prime, say p. Among all the terms in P, find the one whose coefficient’s denominator is divisible by the highest power of p. If there is more than one such term, pick the one with the highest degree. Call it Gx^g, and let the highest power of p that divides the denominator of G be p^t. (t \geq 1 since p was chosen to divide the denominator of some coefficient in P at least once.) The key fact about the choice of Gx^g is, in Gauss’s words, that its “denominator involves higher powers of p than the denominators of all fractional coefficients that precede it, and no lower powers than the denominators of all succeeding fractional coefficients.”

Gauss now divides Q by p to guarantee that at least one term in it (at the very least, the leading term) has a fractional coefficient with a denominator divisible by p, so that he can play the same game and choose the term \Gamma x^{\gamma} of Q/p with \Gamma having a denominator divisible by p more times than any preceding fractional coefficient and at least as many times as each subsequent coefficient. Let the highest power of p dividing the denominator of \Gamma be p^{\tau}. (Having divided the whole of Q by p guarantees that \tau \geq 1, just like t.)

I’ll quote Gauss word-for-word for the next step:

“Let those terms in P which precede Gx^g be 'Gx^{g+1}, ''Gx^{g+2}, etc. and those which follow be G'x^{g-1}, G''x^{g-2}, etc.; in like manner the terms which precede \Gamma x^{\gamma} will be '\Gamma x^{\gamma+1}, ''\Gamma x^{\gamma+2}, etc. and the terms which follow will be \Gamma'x^{\gamma-1}, \Gamma''x^{\gamma-2}, etc. It is clear that in the product of P and Q/p the coefficient of the term x^{g+\gamma} will

= G\Gamma + 'G\Gamma' + ''G\Gamma'' + etc.

+ '\Gamma G' + ''\Gamma G'' + etc.

“The term G\Gamma will be a fraction, and if it is expressed in lowest terms, it will involve t+\tau powers of p in the denominator. If any of the other terms is a fraction, lower powers of p will appear in the denominators because each of them will be the product of two factors, one of them involving no more than t powers of p, the other involving fewer than \tau such powers; or one of them involving no more than \tau powers of p, the other involving fewer than t such powers. Thus G\Gamma will be of the form e/(fp^{t+\tau}), the others of the form e'/(f'p^{t+\tau-\delta}) where \delta is positive and e, f, f' are free of the factor p, and the sum will

= \frac{ef' + e'fp^{\delta}}{ff'p^{t+\tau}}

The numerator is not divisible by p and so there is no reduction that can produce powers of p lower than t+\tau.”

(This is on pp. 25-6 of the Clarke translation.)

This argument guarantees that the coefficient of x^{g+\gamma} in PQ/p, expressed in lowest terms, has a denominator divisible by p^{t+\tau}. Thus the coefficient of the same term in PQ has a denominator divisible by p^{t+\tau-1}. Since t and \tau are each at least 1, t+\tau-1 \geq 1, so this denominator is divisible by p at least once and the coefficient is a fraction, not an integer. Q.E.D.
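To watch Gauss's bookkeeping on a concrete case, here is a sketch with p = 2, P = x^2 + x/2 + 3/4 and Q = x + 1. The example polynomials and the helper names are assumptions chosen for illustration, not anything from the Disquisitiones. Coefficient lists run from the constant term up.

```python
from fractions import Fraction

def denominator_power(frac, p):
    """Exponent of the prime p in the denominator of a fraction in lowest terms."""
    power, d = 0, frac.denominator
    while d % p == 0:
        d //= p
        power += 1
    return power

def pick_term(poly, p):
    """Return (degree, power) for the term whose denominator holds the highest
    power of p, breaking ties in favor of the highest degree."""
    candidates = ((deg, denominator_power(c, p)) for deg, c in enumerate(poly))
    return max(candidates, key=lambda pair: (pair[1], pair[0]))

def multiply(a, b):
    """Multiply two polynomials with Fraction coefficients."""
    product = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            product[i + j] += x * y
    return product

p = 2
P = [Fraction(3, 4), Fraction(1, 2), Fraction(1)]   # x^2 + x/2 + 3/4
Q = [Fraction(1), Fraction(1)]                      # x + 1
Q_over_p = [c / p for c in Q]                       # x/2 + 1/2

g, t = pick_term(P, p)                # the term G x^g and its power p^t
gamma, tau = pick_term(Q_over_p, p)   # the term Gamma x^gamma and its power p^tau

coefficient = multiply(P, Q_over_p)[g + gamma]
print(g, t, gamma, tau, coefficient, denominator_power(coefficient, p))
```

Here the singled-out terms are 3/4 in P (so t = 2) and x/2 in Q/p (so \tau = 1), and the coefficient of x^{g+\gamma} in the product has a denominator carrying exactly t+\tau = 3 powers of 2, as the demonstration predicts.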

Like I said – nasty, right? But the concepts involved are just fractions and divisibility. Compare a modern proof of “Gauss’ Lemma” (the statement I quoted above from Artin – a product of primitive integer polynomials is primitive).

Proof. Let the polynomials be P and Q. Pick any prime number p, and reduce everything mod p. P and Q are primitive so they each have at least one coefficient not divisible by p. Thus P \not\equiv 0 \mod{p} and Q \not\equiv 0 \mod{p}. The leading terms of P and Q mod p have coefficients that are nonzero mod p, and since p is prime the product of those coefficients is nonzero mod p too, so the product PQ must be nonzero mod p as well. This implies that PQ contains at least one coefficient not divisible by p. Since this argument works for any prime p, it follows that there is no prime dividing every coefficient in PQ, which means that it is primitive. Q.E.D. [1]

Clean and quick. If you’re familiar with the concepts involved, it’s way easier to follow than Gauss’s original. But, you have to first digest a) the idea of reducing everything mod p; b) the fact that this operation is compatible with all the normal polynomial operations; and c) the crucial fact that because p is prime, the product of two coefficients that are not \equiv 0 \mod{p} will also be nonzero mod p.
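Points (a) through (c) can be tried out directly. What follows is a rough sketch under my own assumptions: the example polynomials 4x^2 + 4x + 1 and 2x + 3, the prime 3, and the function names are all invented for illustration. Coefficient lists run from the constant term up.

```python
def reduce_mod(poly, p):
    """Reduce every coefficient mod p and drop terms that vanish at the top."""
    reduced = [c % p for c in poly]
    while reduced and reduced[-1] == 0:
        reduced.pop()
    return reduced

def multiply(a, b):
    """Multiply two integer polynomials given as coefficient lists."""
    if not a or not b:
        return []
    product = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            product[i + j] += x * y
    return product

P = [1, 4, 4]   # 4x^2 + 4x + 1
Q = [3, 2]      # 2x + 3
p = 3

P_mod = reduce_mod(P, p)   # x^2 + x + 1
Q_mod = reduce_mod(Q, p)   # 2x (the constant term 3 becomes 0)

# (b) Reducing after multiplying agrees with multiplying the reductions.
reduce_then_multiply = reduce_mod(multiply(P_mod, Q_mod), p)
multiply_then_reduce = reduce_mod(multiply(P, Q), p)

# (c) The leading coefficients are nonzero mod p, and so is their product.
leading_product = (P_mod[-1] * Q_mod[-1]) % p
print(reduce_then_multiply, multiply_then_reduce, leading_product)
```

The last line is where primality matters: mod 4, say, the leading coefficients 2 and 2 would multiply to 0.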

Now Gauss actually had access to all of these ideas. In fact it was in the Disquisitiones Arithmeticae itself that the world was introduced to the notation “a \equiv b \mod{p}.” So in a way it’s even more striking that he didn’t think to use them here when they would have cleaned up so much.

What bugged me out and made me excited to share this with you was the realization that these two proofs are essentially the same proof.


I’m not gonna spell it out, because what’s the fun in that? But here’s a hint: that term Gx^g that Gauss singled out in his polynomial P? Think about what would happen to that term (in comparison with all the terms before it) if you a) multiplied the whole polynomial by the lcm of the denominators to clear out all the fractions and yield a primitive integer polynomial, and then b) reduced everything mod p.

(If you are into this sort of thing, I found it to be an awesome exercise, that gave me a much deeper understanding of both proofs, to flesh out the equivalence, so I recommend that.)

* * * * *

What’s the pedagogical big picture here?

I see this as a case study in the value of approaching a problem with unsophisticated tools before learning sophisticated tools for it. To begin with, this historical anecdote seems to indicate that this is the natural flow. I mean, everybody always says Gauss was the greatest mathematician of all time, and even he didn’t think to use reduction mod p on this problem, even though he was developing this tool on the surrounding pages of the very same book.

In more detail, why is this more pedagogically natural than “build the (sophisticated) machine, then apply it”?

First of all, the machine is inscrutable before it is applied. Think about being handed all the tiny parts of a sophisticated robot, along with assembly instructions, but given no sense of how the whole thing is supposed to function once it’s put together. And then trying to follow the instructions. This is what it’s like to learn sophisticated math ideas machinery-first, application-later. I felt this way this spring in learning the idea of Brouwer degree in my topology class. Now that everything is put together, I have a strong desire to go back to the beginning and do the whole thing again knowing what the end goal is. The ideas felt so airy and insubstantial the first time through. I never felt grounded.

Secondly, the quick solution that is powered by the sophisticated tools loses something if it’s not coupled with some experience working on the same problem with less sophisticated tools. The aesthetic delight that professors take in the short and elegant solution of the erstwhile-difficult problem comes from an intimacy with this difficulty that the student skips if she just learns the power tools and then zaps it. Likewise, if the goal is to gain insight into the problem, the short, turbo-powered solution often feels very illuminating to someone (say, the professor) who knows the long, ugly solution, but like pure magic, and therefore not illuminating at all, to someone (say, a student) who doesn’t know any other way. There is something tenuous and spindly about knowing a high-powered solution only.

Here I can cite my own experience with Gauss’s Lemma, the subject of this post. I remember reading the proof in Artin a year ago and being satisfied at the time, but I also remember being unable to recall this proof (even though it’s so simple! maybe because it’s so simple!) several months later. You read it, it works, it’s like poof! done! It’s almost like a sharp thin needle that passes right through your brain without leaving any effect. (Eeew… sorry that was gross.) The process of working through Gauss’ original proof, and then working through how the proofs are so closely related, has made my understanding of Artin’s proof far deeper and my appreciation of its elegance far stronger. Before, all I saw was a cute short argument that made something true. I now see in it the mess that it is elegantly cleaning up.

I’ve had a different form of the same experience as I fight my way through Galois’ paper. (I am working through the translation found in Appendix I of Harold Edwards’ 1984 book Galois Theory. This is a great way to do it because if at any point you are totally lost about what Galois means, you can usually dig through the book and find out what Edwards thinks he means.) I previously learned a modern treatment of Galois theory (essentially the one found in Nathan Jacobson’s Basic Algebra I – what a ridiculous title from the point of view of a high school teacher!). When I learned it, I “followed” everything but I knew my understanding was not where I wanted it to be. Here the words “spindly” and “tenuous” come to mind again. The arguments were built one on top of another till I was looking at a tall machine with a lot of firepower at the very top but supported by a series of moving parts I didn’t have a lot of faith in.

An easy mark for Ewoks, and I knew it.

This version of Galois theory was all based on concepts like fields, automorphisms, vector spaces, separable and normal extensions, of which Galois himself had access to none. The process of fighting through Galois’ original development of his theory and trying to understand how it is related to what I learned before has been slowly filling out and reinforcing the lower regions of this structure for me. Coupling the sophisticated with the less sophisticated approach has given the entire edifice some solidity.

Thirdly, and I feel like I hear folks (Shawn Cornally, Dan Meyer, Alison Blank, etc.) talk about this a lot, but it bears repeating:

If you attack a problem with the tools you have, and either you can’t solve it, or you can solve it but your solution is messy and ugly, like Gauss’s solution above (if I may), then you have a reason to want better tools. Furthermore, the way in which your tools are failing you, or in which they are being inefficient, may be a hint to you for how the better tools need to look.

Just as an example, think about how awesome reduction mod p is going to seem if you are already fighting (as Gauss did) with a whole bunch of adding stuff up some of which is divisible by p and some of which is not. What if you could treat everything divisible by p as zero and then summarily forget about it? How convenient would that be?

I want to bring this back to the K-12 level so let me give one other illustration. A major goal of 7th-9th grade math education in NY (and elsewhere) is getting kids to be able to solve all manner of single-variable linear equations. The basic tool here is “doing the same thing to both sides.” (As in, dividing both sides of the equation by 2, or subtracting 2x from both sides…) For the kids this is a powerful and sophisticated tool, one that takes a lot of work to fully understand, because it involves the extremely deep idea that you can change an equation without changing the information it is giving you.

There is no reason to bring out this tool in order to have the kiddies solve x+7=10. It’s even unnatural for solving 4x-21=55. Both of these problems are susceptible to much less abstract methods, such as “working backwards.” The “both sides” tool is not naturally motivated until the variable appears on both sides of the equation. I used to let students solve 4x-21=55 whatever way made sense to them, but then try to impose on them the understanding that what they had “really done” was to add 21 to both sides and then divide both sides by 4, so that later when I gave them equations with variables on both sides, they’d be ready. This was weak because I was working against the natural pedagogical flow. They didn’t need the new tool yet because I hadn’t given them problems that brought them to the limitations of the old tool. Instead, I just tried to force them to reimagine what they’d already been doing in a way that felt unnatural to them. Please, if a student answers your question and can give you any mathematically sound reason, no matter how elementary, accept it! If you would like them to do something fancier, try to give them a problem that forces them to.

Basically this whole post adds up to an excuse to show you some cool historical math and a plea for due respect to be given to unsophisticated solutions. There is no rush to the big fancy general tools (except the rush imposed by our various dark overlords). They are learned better, and appreciated better, if students, teachers, mathematicians first get to try out the tools we already have on the problems the fancy tools will eventually help us answer. It worked for Gauss.

[1] This is the substance of the proof given in Artin but I actually edited it a bit to make it (hopefully) more accessible. Artin talks about the ring homomorphism \mathbb{Z}[x] \longrightarrow \mathbb{F}_p[x] and the images of P and Q (he calls them f and g) under this homomorphism.

ADDENDUM 8/10/11:

I recently bumped into a beautiful quote from Hermann Weyl that I had read before (in Bob and Ellen Kaplan’s Out of the Labyrinth, p. 157) and forgotten. It is entirely germane.

Beautiful general concepts do not drop out of the sky. To begin with, there are definite, concrete problems, with all their undivided complexity, and these must be conquered by individuals relying on brute force. Only then come the axiomatizers and systematizers and conclude that instead of straining to break in the door and bloodying one’s hands one should have first constructed a magic key of such and such a shape and then the door would have opened quietly, as if by itself. But they can construct the key only because the successful breakthrough enables them to study the lock front and back, from the outside and from the inside. Before we can generalize, formalize and axiomatize there must be mathematical substance.

Five Proofs of the Irrationality of Root 5

Recently I’ve had the pleasure of teaching a series of Math and Dinner workshops for the New York Math Circle. The series was on the unique factorization of the integers, aka “the fundamental theorem of arithmetic.” I opened the first session with a problem set intended to get the participants (mostly math teachers) to see how heavily their knowledge about numbers depends on this theorem. One of the problems was to prove the irrationality of root 5. The problem served its purpose during the workshop; participants came up with one proof dependent on unique factorization, and I showed them another as well. But to my surprise, when we went to dinner after the workshop was over, participants showed me three more proofs. Two of them were totally independent of unique factorization. The third involved it, but not in any way I would have expected.

The experience was just a mathematical delight – and I’ll share the proofs in a moment – but it got me thinking about teaching too (doesn’t everything?). I’ve repeatedly made the case (along, I suppose, with NCTM and everyone else) that proof needs to be a much more central part of math education than it is, at every level. This is just an elaboration on that theme. How powerful and illuminating it is to see, and consider simultaneously, multiple proofs of the same result; how each proof shines light on the result from a different angle; how different proofs of the same result may generalize differently and show that several large principles might be at play in one tiny case. How in school I’ve rarely given students two different arguments for a major result, and never asked them to compare the arguments to each other. What a missed opportunity that is.

The closest I came was that when I used to teach Algebra I, I would make three different cases that a^0 should be defined as 1: to fit the pattern of multiplication and division; to fit the equations we derived for exponential growth; and because an empty product ought to be the identity since multiplication by it ought to have no effect. However, I never asked kids to entertain the three arguments simultaneously, or to ask if they shed different light on the conclusion, or if they generalized in different ways.
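Two of those three arguments are easy to put side by side in a few lines. This is only an illustrative sketch with base 2 chosen arbitrarily; note that Python's `math.prod` returning 1 for an empty list is itself a convention that mirrors the third argument, not independent evidence for it.

```python
import math

# Argument 1: each step down in the exponent divides by the base, so the
# pattern 2^3, 2^2, 2^1 continues to 2^0 = 2^1 / 2.
powers = [2**3, 2**2, 2**1]
next_in_pattern = powers[-1] // 2

# Argument 3: multiplying by a product of no factors should change nothing,
# so the empty product ought to be the multiplicative identity.
empty_product = math.prod([])

print(next_in_pattern, empty_product, 2**0)
```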

Proof number one: the Euclid-esque one

This is the proof given by a participant in answer to the problem I posed.

If root 5 were rational it could be written as a fraction a/b in lowest terms, i.e. in which a and b are integers and do not have a common factor other than 1. Then it would be so that

\left(\frac{a}{b}\right)^2 = 5
and thus that

\frac{a^2}{b^2} = 5


a^2 = 5b^2

This would imply that a^2 is a multiple of 5. Since 5 is prime, this implies a is a multiple of 5. Thus a = 5c for some integer c, and

5b^2 = a^2 = 25c^2

Dividing by 5, this means

b^2 = 5c^2

So b^2 is a multiple of 5, and, just as it did for a, this means b is a multiple of 5. But a and b were presumed to lack a common factor other than 1, so this is a contradiction, and the fraction a/b for \sqrt{5} must fail to exist.

The covert reliance on unique prime factorization came out when I pushed the participants to justify the step “if a^2 is a multiple of 5, so is a” and they told me it was because a^2’s prime factors are exactly a’s duplicated; thus if 5 is not among a’s prime factors it cannot be among a^2’s. This reliance can be removed from the proof by finding another way to argue this point, but this is how the participants did it.
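One such other way (not the participants' argument, just a possibility) is to check remainders: every integer is congruent to 0, 1, 2, 3 or 4 mod 5, and the remainder of a^2 depends only on the remainder of a. A quick sketch, with variable names of my own choosing:

```python
# Square each possible remainder mod 5 and record the remainder of the square.
squares_mod_5 = {r: (r * r) % 5 for r in range(5)}

# The remainders of a for which a^2 comes out as a multiple of 5.
remainders_with_square_divisible = [r for r, s in squares_mod_5.items() if s == 0]

print(squares_mod_5)
print(remainders_with_square_divisible)
```

Only remainder 0 squares to a multiple of 5, so if a^2 is a multiple of 5 then a is too, with no appeal to unique factorization.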

Proof number two: straight from unique factorization

Once we saw the secret reliance on unique factorization in the above proof, I offered this simpler proof (lifted more or less verbatim from Hardy and Wright’s compendiously awesome An Introduction to the Theory of Numbers):

If a/b is the square root of 5, then a^2/b^2 is 5. If a and b are integers, then you can prime factorize the numerator and denominator of this fraction. The whole denominator needs to cancel because the quotient is an integer. Thus all b^2’s prime factors are found among a^2’s. But prime factorizations for a^2 and b^2 contain exactly the same primes as the prime factorizations of a and b (only twice as many of each prime). So all b’s prime factors are found among a’s, and a/b is an integer. This is a contradiction since 5 is not the square of an integer. This proof generalizes in a natural way to show that it is not possible for any integer that is not a square to have a rational square root.
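The "only twice as many of each prime" step is easy to see on an example. Here is a small sketch using trial division; the function name and the sample number 360 are assumptions for illustration.

```python
def prime_factorization(n):
    """Return {prime: exponent} for an integer n >= 2, by trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

a = 360
factors_of_a = prime_factorization(a)            # 2^3 * 3^2 * 5
factors_of_a_squared = prime_factorization(a * a)

# Squaring keeps the same primes and doubles every exponent.
doubled = {prime: 2 * exp for prime, exp in factors_of_a.items()}
print(factors_of_a, factors_of_a_squared, factors_of_a_squared == doubled)
```

The code can only confirm the pattern number by number; that the doubled list is the only factorization a^2 has is exactly what unique factorization supplies.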

That’s as much as happened during the workshop. Everything worked perfectly in terms of my pedagogical intention to have the participants recognize how much they needed unique factorization to know what they know about numbers. This turns out to have been sheer luck. At dinner afterwards…

Proof number three: less than

This one was shown to me by Japheth Wood, a math professor at Bard College, who helped to organize the series.

Suppose a/b is the positive square root of 5 and as in proof 1 suppose a and b are positive integers and the fraction is in lowest terms. This means b is the smallest possible denominator for a fraction equal to root 5.

Now, 5 is between 4 and 9 which means that a/b is between 2 and 3. Multiplying by b we find that a is between 2b and 3b, and subtracting 2b we find that a-2b is between 0 and b. In other words, a-2b is a positive integer less than b.

You may be able to see where this is headed. The assumption that a/b is a square root of 5 is going to lead to a representation of root 5 as a fraction with a-2b in the denominator, which is impossible because a/b was assumed to be in lowest terms, so that b was the lowest possible denominator. How this happens is just some algebraic tricksiness:

\sqrt{5} = \frac{a}{b} = \frac{(a/b)(a/b-2)}{a/b-2}

This last expression was gotten by multiplying a/b by 1 in the form of (a/b-2)/(a/b-2). But

\frac{(a/b)(a/b-2)}{a/b-2} = \frac{(a/b)^2 - 2(a/b)}{a/b-2} = \frac{5-2(a/b)}{a/b-2}

since (a/b)^2 is 5, that’s the whole point of a square root. And now we can multiply numerator and denominator by b to find

\frac{5-2(a/b)}{a/b-2} = \frac{5b-2a}{a-2b}

and poof! All this was equal to root 5; but now we have represented root 5 as a fraction with a denominator lower than the lowest possible denominator for such a fraction. This is a contradiction so our representation as a fraction in the first place was impossible.
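The algebra can be double-checked numerically through the identity (5b-2a)^2 - 5(a-2b)^2 = -(a^2 - 5b^2), which holds for all integers a and b. So if a^2 = 5b^2 ever held, the new numerator and denominator would satisfy the same equation. A small sketch, where the names `descend` and `defect` are invented for this illustration:

```python
def descend(a, b):
    """Numerator and denominator of the new fraction (5b - 2a)/(a - 2b)."""
    return 5 * b - 2 * a, a - 2 * b

def defect(a, b):
    """a^2 - 5b^2, which is zero exactly when (a/b)^2 = 5."""
    return a * a - 5 * b * b

# Check the identity over a grid of integer pairs.
identity_holds = all(
    defect(*descend(a, b)) == -defect(a, b)
    for a in range(-30, 31)
    for b in range(-30, 31)
)

# A near miss: 9/4 squares to 81/16, just over 5, and descends to 2/1.
print(identity_holds, descend(9, 4), defect(9, 4), defect(2, 1))
```

The near miss 9/4 shows the mechanism at work on a fraction that almost squares to 5: the denominator drops from 4 to 1 and the "defect" only changes sign.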

Proof number four: way less than

This one was passed on to me from a participant named Barry. (Don’t know the last name. Barry if you happen to read this, claim your credit!)

\sqrt{5}-2 is a positive real number between 0 and 1. Now, consider what happens when you raise it to powers.

On the one hand, since it is positive and less than 1, it will get arbitrarily small. (I.e. given any positive number, if you raise \sqrt{5}-2 to a high enough power, it will be less than that number.)

But also, it will have the form (integer) + (integer)*\sqrt{5}.
For example:

\left(\sqrt{5}-2\right)^2 = 5 - 2\cdot 2\cdot\sqrt{5} + 4 = 9-4\sqrt{5}


\left(\sqrt{5}-2\right)^3 = \left(\sqrt{5}-2\right)\left(9-4\sqrt{5}\right)

= 9\sqrt{5} - 18 - 20 + 8\sqrt{5} = -38+17\sqrt{5}

This is happening because every pair of \sqrt{5}’s in the product makes an integer. Thus, \sqrt{5}-2 to any power must have the form m + n\sqrt{5}, where m and n are integers.

All this is true of the actual square root of 5, whether or not it is rational.

Now suppose root 5 could be written as a fraction a/b with a and b integers. (This time we don’t have to assume lowest terms!) Then any power of \sqrt{5}-2 would have the form m + n\frac{a}{b} = \frac{mb+na}{b}.

The numerator of this fraction is an integer. The denominator is b. This means the smallest positive number it can be is 1/b (or, if for some crazy reason you decided to use a negative b, then -1/b). Thus \sqrt{5}-2 to any power would be greater than or equal to 1/b. But we already know it is capable of getting arbitrarily small by taking a high enough power, since \sqrt{5}-2 is positive and less than 1. This contradiction proves that the square root of 5 can’t be a fraction.
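Both halves of the argument can be watched in action. Multiplying m + n\sqrt{5} by \sqrt{5}-2 gives (-2m+5n) + (m-2n)\sqrt{5}, so the integer pairs can be generated exactly. This is a minimal sketch; the function name is hypothetical, and the decimal values are floating-point approximations only.

```python
import math

def powers_of_sqrt5_minus_2(count):
    """Return [(m, n), ...] with (sqrt(5) - 2)**k = m + n*sqrt(5) for k = 1..count."""
    pairs = []
    m, n = -2, 1   # (sqrt(5) - 2)**1
    for _ in range(count):
        pairs.append((m, n))
        # Multiply m + n*sqrt(5) by (-2 + sqrt(5)).
        m, n = -2 * m + 5 * n, m - 2 * n
    return pairs

pairs = powers_of_sqrt5_minus_2(8)
for k, (m, n) in enumerate(pairs, start=1):
    print(k, (m, n), m + n * math.sqrt(5))
```

The integers m and n grow quickly while the value m + n\sqrt{5} shrinks toward 0, which is the tension the proof exploits: with \sqrt{5} = a/b, that value would be stuck at 1/b or above.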

Proof number five: unique factorization bonanza

This proof was relayed to me by Ted Diament. It’s more technical than the others; sorry about that. It’s using equipment much more powerful than is needed for the task. It is extremely cute though.

This one relies extra much on unique factorization. In fact, not only unique factorization of the integers, but unique factorization of the set of integer polynomials! Like integers, integer polynomials have factorizations into prime (irreducible) polynomials that are unique except for sign. (This fact follows from Gauss’ Lemma.) For example,

6x^2 - 18x+12

factors into

2 \cdot 3 \cdot (x-1)(x-2)

with each factor irreducible in the sense that it can’t be factored further (except in pointless ways like 3=-1\cdot-3). This factorization is unique in the sense that if you try to factor the original polynomial a different way, you will end up with the same set of factors, except possibly for some pairs of negative signs.

Anyway. Suppose root 5 were rational and equal to a fraction a/b in lowest terms. Then a^2 would equal 5b^2, and so the integer polynomial

b^2x^2 - a^2

would equal

b^2x^2 - 5b^2

Now the first of these factors into

(bx-a)(bx+a)

while the second one factors into

b^2(x^2-5)

But since they are equal, this violates the unique factorization of integer polynomials. Things would be fine if we could keep factoring both sides till they were identical (up to sign), but this isn’t possible: since a/b was in lowest terms, a and b lack a common factor, so that bx-a and bx+a are irreducible. Also, x^2 - 5 is clearly irreducible since 5 is not an integer square. So the two factorizations are irrevocably distinct. Since integer polynomials only factor one way (up to sign), it must be that root 5 wasn’t rational after all.

I hope you enjoyed these. When they were shown to me I was just delighted by how different they all feel. I’d never seen anything except the first two, which feel very number-theory-ish to me. The last two feel much more algebraish. Proof number four even has a whiff of calculus what with all that “arbitrarily small” business. And all that is going on in the single fact of root 5’s irrationality!

Addendum, 3/25/13

Since for some reason this post continues to get a fair amount of traffic, I’ve got one more to add! This one occurred to me the other day as a purely algebraic reformulation of proof number four above (“way less than”). It’s higher-tech than any of the above, so apologies for that. I am adding this note in a hurry so I am not going to try to gloss the advanced concepts, so apologies for that too.

The ring \mathbb{Z}[\sqrt{5}] is a finitely generated module over \mathbb{Z}. In fact, it is generated by 1 and \sqrt{5}. This is just another way of saying that everything that can be produced from integers and \sqrt{5} out of +,-,\times has the form a+b\sqrt{5} with a,b integers. This is a slight generalization of what was noted above in proof 4.

Meanwhile, if a is any rational number that is not an integer, the ring \mathbb{Z}[a] can never be finitely generated as a \mathbb{Z}-module, because it contains arbitrarily large denominators since a has some nontrivial denominator and \mathbb{Z}[a] contains all a’s powers. Again, this is essentially what was noted in proof 4. Putting these two paragraphs together, it follows that \sqrt{5} can’t be rational.
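The "arbitrarily large denominators" claim is easy to see on an example. A tiny sketch, with a = 3/2 picked arbitrarily:

```python
from fractions import Fraction

a = Fraction(3, 2)   # any rational number that is not an integer would do

# Denominators of a, a^2, ..., a^6 in lowest terms.
denominators = [(a ** k).denominator for k in range(1, 7)]
print(denominators)
```

Finitely many generators would have a common denominator, and integer combinations of them could never produce a denominator larger than that, whereas these denominators grow without bound.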

Ultimately the reason \mathbb{Z}[\sqrt{5}] is finitely generated as a \mathbb{Z}-module is that \sqrt{5} is an algebraic integer; thus this argument shows that an algebraic integer can never be rational unless it is an actual ordinary integer.

Pattern Breaking

A while ago I put out a call for problems and situations in which the initial data suggests some pattern that is not actually what’s going on. The idea was, most students both fail to appreciate the need for proof in math and also are hamstrung in their ability to actually create proofs, by the fact that they’re used to patterns in math class always working. They see the pattern work twice and that’s good enough for them. We exacerbate this situation whenever we treat one or two examples as sufficient grounds for believing something, without a conversation about why they’re working. (Which is an exceedingly common practice that I too have been guilty of; but people, we need to stop, immediately and completely.) This goes on for 10 years and then they get to geometry class, where suddenly we ask them to start “writing proofs,” which they can’t do and don’t see the point of, since up until this point a few examples have been treated as sufficient grounds to believe. (I’ve written in much more detail about these issues here and here.)

So, I wanted to create a repository of problems and mathematical situations that could be used to freak students out and snap them out of the “it works twice it must be true” stupor, by suggesting a pattern that actually fails. I proposed a few; JD2718 suggested a few; and the real jackpot came from James Tanton, who devoted an issue of his newsletter to such situations back in 2006. Here is a pdf of the newsletter. Thank you Sue VanHattum for forwarding me this. Also, although I’d heard about James Tanton’s awesomeness from Bob and Ellen Kaplan, I hadn’t actually seen his website before today and it is clearly fresh, so check it out.

Here are my favorites from the repository as it stands now:

*The points on a circle problem. I described it in the same post as the call for problems. Here is Tanton’s visual:
[Figure: Points on a Circle]
This is still the best one.
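For anyone who wants to see the numbers, here is a minimal sketch. It assumes this is the classic version of the problem: n points on a circle, every pair joined by a chord, no three chords meeting at one interior point, and we count regions. The function name `circle_regions` is mine, and the closed formula it uses is the standard one for that setup.

```python
from math import comb

def circle_regions(n: int) -> int:
    """Regions formed inside a circle by all chords between n points,
    assuming no three chords pass through the same interior point."""
    return 1 + comb(n, 2) + comb(n, 4)

# Compare the true count with the tempting guess 2^(n-1).
for n in range(1, 8):
    print(n, circle_regions(n), 2 ** (n - 1))
```

The two columns agree for n = 1 through 5 and part ways at n = 6.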

*JD2718’s powers of eleven:
[Figure: Rows of the Pascal’s Triangle?]
I like this one as a counterpoint because the pattern is not completely wrong.
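A small sketch of why it is "not completely wrong", assuming the pattern in question is that the rows of Pascal’s triangle, read as digits, spell out the powers of 11. The helper names are hypothetical.

```python
from math import comb

def row_as_digits(n: int) -> str:
    """Row n of Pascal's triangle with its entries simply written side by side."""
    return "".join(str(comb(n, k)) for k in range(n + 1))

def row_with_carrying(n: int) -> int:
    """Row n treated as base-10 place values, so entries of 10 or more carry over."""
    return sum(comb(n, k) * 10 ** (n - k) for k in range(n + 1))

for n in range(7):
    print(n, 11 ** n, row_as_digits(n), row_with_carrying(n))
```

Reading the entries naively stops matching at row 5, but the version with carrying keeps agreeing with 11^n, since it is just the expansion of (10+1)^n.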

*Tanton’s powers of 3 idea:

n    3^n      number of digits
1    3        1
2    9        1
3    27       2
4    81       2
5    243      3
6    729      3
7    2,187    4
8    6,561    4

Could you use this pattern to predict the number of digits in a high power of 3?
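Here is a quick sketch for finding where the pattern gives out. The guess being tested is my reading of the table: that 3^n has n/2 digits, rounded up. The function names are made up for illustration.

```python
def digits_of_power_of_3(n: int) -> int:
    """Actual number of decimal digits of 3^n (exact integer arithmetic)."""
    return len(str(3 ** n))

def guessed_digits(n: int) -> int:
    """The pattern suggested by the table: n/2, rounded up."""
    return (n + 1) // 2

def first_break(limit: int = 100):
    """Smallest n up to limit where the guess disagrees with the truth."""
    for n in range(1, limit + 1):
        if digits_of_power_of_3(n) != guessed_digits(n):
            return n
    return None

print(first_break())
```

The pattern survives a surprisingly long time before 3^21, 3^22 and 3^23 all land on the same digit count.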

All three of the above examples are awesome not only because the obvious pattern fails but because the real story is within the scope of many high school classes. Most of what follows does not have this virtue; but it’s still exciting to me because the patterns that fail, and the questions you could ask about them, are still accessible.

Various gems from Tanton:

*6th roots:

n    6th root of n
2    1.12246…
3    1.20093…
4    1.25992…
5    1.30766…
6    1.34800…
7    1.38308…

Look at the pattern in the first number after the decimal point: 1, 2, 2, 3, 3, 3, … According to Tanton (I haven’t done the calculation myself), the pattern continues for a while: four 4’s, five 5’s, six 6’s.
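Since I haven’t done the calculation by hand, here is a sketch that does it. To avoid floating-point doubts it uses only integers: the first decimal digit of the 6th root of n comes from the largest k with k^6 ≤ n·10^6. The function names are hypothetical.

```python
from itertools import groupby

def first_decimal_digit_of_sixth_root(n: int) -> int:
    """First digit after the decimal point of n^(1/6), for 2 <= n <= 63."""
    k = 10
    # Largest k with k^6 <= n * 10^6, i.e. k = floor(10 * n^(1/6)).
    while (k + 1) ** 6 <= n * 10 ** 6:
        k += 1
    return k % 10

def run_lengths(start: int = 2, stop: int = 63):
    """(digit, how many consecutive n share it) for n = start..stop."""
    digits = [first_decimal_digit_of_sixth_root(n) for n in range(start, stop + 1)]
    return [(d, len(list(group))) for d, group in groupby(digits)]

print(run_lengths())
```

By this count the runs go one 1, two 2’s, three 3’s, four 4’s, five 5’s, and then eight 6’s (n = 17 through 24), so the break seems to arrive at the 6’s rather than after them.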

*Factors of factorials:
1! has 1 factor
2! has 2 factors (1 and 2)
3! has 4 factors (1, 2, 3 and 6)
4! has 8 factors (1, 2, 3, 4, 6, 8, 12, 24)
5! has 16 factors (1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 24, 30, 40, 60, 120)
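A short sketch for checking how long the doubling lasts, using plain trial division. `count_divisors` is a name I am introducing here.

```python
from math import factorial

def count_divisors(n: int) -> int:
    """Number of positive divisors of n, found by trial division up to sqrt(n)."""
    count = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            # d and n // d are both divisors; count once if they coincide.
            count += 1 if d * d == n else 2
        d += 1
    return count

for k in range(1, 8):
    print(f"{k}! has {count_divisors(factorial(k))} factors")
```

The powers of 2 stop at 6!, which has 30 factors rather than 32.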

*First Fibonacci sneakiness:
How many ways can n be represented as a sum of 1’s, 3’s and 5’s, distinguishing different orders?
1 – only one way: 1=1
2 – only one way: 2=1+1
3 – two ways: 3=1+1+1=3
4 – three ways: 4=1+1+1+1=1+3=3+1
5 – five ways: 5=1+1+1+1+1=1+1+3=1+3+1=3+1+1=5
6 – eight ways: 6=1+1+1+1+1+1=1+1+1+3=1+1+3+1=1+3+1+1=3+1+1+1=1+5=5+1=3+3
(Challenge question: how many ways can n be represented as a sum of any odd natural numbers, distinguishing different orders? Does this fix the Fibonacci pattern? Can you prove or disprove it?)
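For teachers who want to check the counts before posing this, here is a minimal sketch. It counts ordered sums by asking what the last term can be. `count_compositions` is a hypothetical helper, and the second call is only numerical evidence for the challenge question, not a proof.

```python
def count_compositions(n: int, parts) -> int:
    """Number of ordered ways to write n as a sum of terms drawn from parts."""
    ways = [1] + [0] * n  # ways[0] = 1: the empty sum
    for total in range(1, n + 1):
        # The last term is some allowed part p; the rest sums to total - p.
        ways[total] = sum(ways[total - p] for p in parts if p <= total)
    return ways[n]

ones_threes_fives = [count_compositions(n, (1, 3, 5)) for n in range(1, 9)]
any_odd_parts = [count_compositions(n, range(1, n + 1, 2)) for n in range(1, 9)]
print(ones_threes_fives)
print(any_odd_parts)
```

With only 1’s, 3’s and 5’s the count for 7 comes out to 12, not 13. Allowing every odd number gives Fibonacci numbers as far as this checks.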

*Second Fibonacci sneakiness:
For n≥5, how many ways can you write n as a sum of natural numbers including at least one 5, this time disregarding order?
5 – only one way: 5=5
6 – only one way: 6=1+5
7 – two ways: 7=1+1+5=2+5
8 – three ways: 8=1+1+1+5=1+2+5=3+5
9 – five ways: 9=1+1+1+1+5=1+1+2+5=2+2+5=1+3+5=4+5
(Tanton claims that 10 can be so written 8 ways, but that the pattern breaks after this. I only find that 10 can be written 7 ways. Don’t such representations of n correspond in a bijective way with partitions of n-5? And p(10-5)=p(5)=7, right? In any case, with a class of kids who know the Fibonacci sequence, 1, 1, 2, 3, 5 should be enough to get them stuck on it.)
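Here is a brute-force sketch to settle the count for 10, under my reading of the problem (unordered sums of natural numbers that use a 5 at least once). The generator and function names are my own.

```python
def partitions(n: int, max_part=None):
    """Yield each partition of n as a non-increasing tuple of positive integers."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def count_with_a_five(n: int) -> int:
    """Number of partitions of n that include at least one 5."""
    return sum(1 for p in partitions(n) if 5 in p)

def count_partitions(n: int) -> int:
    """Total number of partitions of n."""
    return sum(1 for _ in partitions(n))

for n in range(5, 11):
    print(n, count_with_a_five(n), count_partitions(n - 5))
```

The two columns match, and the count for 10 is 7, which agrees with the bijection argument.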

Funny business with primes:

These ones are easy to come up with because the primes just don’t behave. They are also somewhat harder to make into workable lessons because testing primality can be hard for students, even with numbers in the 2-digit range.

*I made a brainstorm for a lesson based on trying to find formulas for primes here. The punchline is Euler’s beautiful n^2 – n + 41 (Tanton also gives a version of it), which produces primes for n=1,…,40 but fails (why?) for n=41.
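A minimal sketch for checking Euler’s formula by trial division. Both function names are hypothetical.

```python
def is_prime(n: int) -> bool:
    """Trial division; fine for the small numbers involved here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def euler(n: int) -> int:
    """Euler's polynomial n^2 - n + 41."""
    return n * n - n + 41

first_failure = next(n for n in range(1, 100) if not is_prime(euler(n)))
print(first_failure, euler(first_failure))
```

Printing the value at the first failure is a decent hint toward the "why?".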

*In the pdf linked above, Tanton gives several “prime-generating formulas.” None of them fail before you reach numbers too large for students to test primality, so I don’t think they could be used easily in school. However, two of them are just too cute not to mention:
31, 331, 3331, 33331, 333331, and 3333331 are all prime!
3!-2!+1! = 5
4!-3!+2!-1! = 19
5!-4!+3!-2!+1! = 101
6!-5!+4!-3!+2!-1! = 619
7!-6!+5!-4!+3!-2!+1! = 4,421
All prime!
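Students can’t test numbers this size by hand, but a computer can. This sketch looks for the first composite in each of the two lists. The function names are mine, and the primality test is plain trial division.

```python
from math import factorial

def is_prime(n: int) -> bool:
    """Trial division up to sqrt(n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def threes_then_one(k: int) -> int:
    """The number written as k 3's followed by a 1, e.g. k=2 gives 331."""
    return int("3" * k + "1")

def alternating_factorial(n: int) -> int:
    """n! - (n-1)! + (n-2)! - ... ending at 1!, with n! taken positive."""
    return sum((-1) ** (n - i) * factorial(i) for i in range(1, n + 1))

def first_composite(indexed_values):
    """First (index, value) pair whose value is not prime, or None."""
    for index, value in indexed_values:
        if not is_prime(value):
            return index, value
    return None

print(first_composite((k, threes_then_one(k)) for k in range(1, 12)))
print(first_composite((n, alternating_factorial(n)) for n in range(3, 12)))
```

The first list lasts one step past what is shown above and then fails at 333,333,331, which is divisible by 17. The second fails at the 9! row, where 326,981 = 79 × 4,139.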

Lastly, JD2718 gives an idea that is inspiring my creativity in this vein:

*Even numbers like 10 and 32 have an even number of factors, while odd numbers like 49 and 81 have an odd number of factors. Right??

I like this because it actually gives us a template for making up many more such fake patterns: it doesn’t have to start at the beginning and then break on the way up, as all of the above do; maybe you just need to pick the numbers carefully. (E.g., 33, 69, 90… multiples of 3 must have all their digits be multiples of 3, right?)
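When hand-picking examples like these it helps to know exactly where the counterexamples sit, so you can steer around them at first. Here is a small sketch for both fake patterns; all the names are hypothetical.

```python
def count_divisors(n: int) -> int:
    """Number of positive divisors of n."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

# Even numbers that break the "even numbers have an even number of factors" claim.
even_with_odd_count = [n for n in range(2, 101, 2) if count_divisors(n) % 2 == 1]
# Odd numbers that break the "odd numbers have an odd number of factors" claim.
odd_with_even_count = [n for n in range(1, 100, 2) if count_divisors(n) % 2 == 0]

def all_digits_multiples_of_3(n: int) -> bool:
    """True if every decimal digit of n is 0, 3, 6 or 9."""
    return all(int(ch) % 3 == 0 for ch in str(n))

# Multiples of 3 that break the "all digits are multiples of 3" claim.
digit_breakers = [n for n in range(3, 100, 3) if not all_digits_multiples_of_3(n)]

print(even_with_odd_count)
print(odd_with_even_count[:5])
print(digit_breakers[:5])
```

The even counterexamples up to 100 are exactly the even perfect squares, which is a good thread for students to pull on.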

* * * * *

A while back I mentioned that the awesome Catherine Twomey Fosnot was working on a book about algebra with mathematician Bill Jacob. It’s out. I just ordered it. (I tried to get a friend who went to NCTM to pick me up a copy at the publisher’s booth, but it was sold out before she got there.) It seems to be focused on the early grades, like the rest of the Young Mathematicians at Work series, which is a little disappointing to me since I was hoping for something directly applicable to Algebra I. But I’m sure it’s very thought-provoking and look forward to its arrival.

What’s In the Way of Making Students Prove, part I

In response to my last post, Kate Nowak raised the most important set of issues I can think of:

I will absolutely stipulate to all of this:

This is why insisting on too much formality too early is bad for people who are learning how to prove… If someone is insisting on formality from you when you don’t have any reason to doubt something less carefully argued, you will get the idea that proof has nothing to do with what makes sense to you, what you find convincing. But you can’t produce a proof without being guided by this.

All of this adds up to the case I’ve made before, that saying “prove that such-and-such is true” is the wrong problem… The “contract” that says we are supposed to give them a chance to work on “proof” as opposed to something else. If they also have to figure out what is even true, that could feel like we’re asking them to do more than just prove something. The problem is that they will never learn how to prove something if we don’t ask them for more.

But I need some SERIOUS training in how exactly one goes about teaching that way. I wasn’t taught that way and none of my colleagues teaches that way. Sometimes I feel like I get close, because I make the kids investigate and measure and conjecture (today, for example: median of a trapezoid), but then I stop before asking them to prove it. Or I do say something like “Why should that have to be true? Can we come up with some kind of explanation?” But they have no idea how to even start and it feels unfair and scary to ask them to. It would not occur to them to draw a picture and extend the legs and think about similar triangles, in a zillion years.

Thank you Kate for getting this conversation started. There are 2 very big questions here:

1) Why don’t they know how to start?
2) Why does it feel unfair and scary to ask them to?

These questions are bigger than me, but here are my 2 cents. I’m posting my thoughts on question (1) now because I wanted to get this up. I’ll post on question (2) tomorrow or Friday.

I have 3 thoughts to offer about why they don’t know how to start.

a) Inexperience on the students’ part.
b) Failure of the question to hook into the natural processes students need to use to actually prove things.
c) The vicious cycle.

(a) A big reason they don’t know how to start is that in most cases, no one has asked them for this type of thing before (at least not with any follow-through – more in (c)). If they’re in high school, they’ve had a long time to formulate an idea of what is going to be asked of them in math class, and typically this isn’t it. The content for which they’re being asked to seek justification has 10 years of increases in sophistication since kindergarten, but most students’ development in terms of the creative act of seeking justification is still at the kindergarten level. The later the first occasion when the students are asked to find justification, the harder it will be for them.

People who have been consistently made to justify their mathematical beliefs for a long time know how to start. I know this from private tutoring. If I have sufficient time with a kid, I have the luxury of requiring them to find a reason for every piece of content they learn in school. (I acknowledge readily that this is a luxury, and I only have it if the parents and student have made a sufficient investment of time in tutoring.) The task quickly ceases to be disorienting when it’s required every time they learn any fact. In a recent comment JD2718 wrote about the need to “play math” (meaning, in context, seeking explanations and counterexamples for patterns in numbers) well before a proof course. This is the same point. Seeking explanations and counterexamples is the main activity of research mathematicians. You can’t go from 0 to 60 on this practice; you have to start out slow and ease into it.

Consequently I think it’s totally essential that justification infuse math learning from K on up. (As I said elsewhere, I think that the commutativity of multiplication should be treated as a major theorem needing a thoughtful proof.) This is going to require some major PD for elementary teachers; I actually would love to run some of this PD. Anyway, the fact that this is not the current state of the art adds up to big problems for high school teachers who try to do something more authentic and creative with proof after at least 9 years of schooling during which a typical student has never or hardly ever been asked to come up with a reason to back up their belief. (At least, not with follow-through – again, more in (c).) Why would it ever occur to them to extend the legs? Think about similar triangles? The act of coming up with a proof is essentially creative. You don’t just get creative on the spot in a domain in which you’ve never created before.

(b) Another variable is the way the question is posed and the expectation for what will count as an answer. Especially when you’re first learning about proof, a request for justification has to hook into some natural processes in order for you to respond to it effectively. If you’re an experienced prover, you can hook a problem into these processes intentionally, but when you’re starting out, you can’t. The problem has to be posed just-so to make your natural processes connect to it.

Here’s what I’m getting at: seeking an explanation for something that puzzles you is a natural act. Accounting for your belief is another one. Outside of math class, if you assert a belief and somebody says, “why do you think so?” you’ll probably answer the question fluently. Coherent or not, you won’t be disoriented by the question – you’ll have something to say. And if something is actually puzzling you, you are actually slightly irritated until you’ve made the whole thing resolve itself in one way or another. (Some of us are in the habit of shrugging and going “well, that’s just a mystery of life.” This is one way to make it resolve. Even so, I don’t believe this is anyone’s first response to puzzlement, at least in non-academic contexts, unless there is a lot of weed involved.)

The act of mathematical proof is supposed to plug into these two natural processes. They are what give you both motivation for sticking with the question, and direction in searching for an answer. But in math class, justification questions often fail to hook into these processes. The big example is what I keep harping on: if the problem is “prove X,” and you’re new to this game, you already know X because the question implicitly told you it. Your “this is bothering me” process is missed entirely since there’s no uncertainty anywhere, and your “why do you think so?” process can only be honestly answered by “you told me to prove it,” which doesn’t count as a proof. It’s better if you didn’t start out knowing the right answer, but even here, the question can fail to connect to these processes. For example, suppose you are measuring something, and you notice a pattern, and you make a conjecture about it. If you’ve never seen a pattern in math class hold for 5 cases and fail later, then all you have to do is notice the pattern and you’re satisfied. Again, nothing is bothering you because you don’t have any uncertainty. Meanwhile, the honest answer to “why do you think so?” is “it worked 5 times.” This is one reason why it’s important for teachers at all levels not to act like something is established fact once it has been noticed as a pattern, and why I’m still hoping people contribute to my earlier call for problems that involve a pattern holding for the first few cases and failing later. (Eventually if I get enough stuff I’ll put the brainstorm in a single post. Thanks to all who have contributed so far.)

The question is most effective if it taps these processes. One way to do it: pose a problem – not a proof problem, just a figure-it-out problem – that the students don’t already know a process for, but that’s easy enough for them to find a solution. Then pose another problem that can be solved the same way. Then another one. Pretty soon the students have developed an algorithm. Now, the question “how do you know that works?” taps the “why do you think so?” process. The students do think so, for some sort of mathematical reasons, because they themselves devised the algorithm, in response to the original problem. They can come up with a good answer. I had a great time doing this very thing with a tutoring client two weeks ago. She’d just learned how to FOIL. I posed to her some simple factoring problems, such as x^2+5x+6. They were all reducible monic quadratics. She came up with one of the standard methods, totally on her own: “This really isn’t that hard. You just think of all the numbers that multiply to the last number, and you see which ones add to the middle number.” I asked her why that would work and she didn’t miss a beat, since the whole thing was her idea in the first place. She was a total Clever Hans when I started working with her two years ago. Yes, I’m proud of both of us. I botched the very next move, though – more in the forthcoming followup post.
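Her method is concrete enough to write down as a procedure. This is only a sketch of it, with a function name I made up, and extended to negative factor pairs, which her description didn’t mention.

```python
def factor_monic_quadratic(b: int, c: int):
    """Find integers (p, q) with p*q == c and p+q == b, so that
    x^2 + b*x + c = (x + p)(x + q). Returns None if no integer pair works."""
    if c == 0:
        return tuple(sorted((0, b)))  # x^2 + b*x = x(x + b)
    # "Think of all the numbers that multiply to the last number..."
    for p in range(1, abs(c) + 1):
        if c % p == 0:
            for candidate in (p, -p):
                q = c // candidate
                # "...and you see which ones add to the middle number."
                if candidate + q == b:
                    return tuple(sorted((candidate, q)))
    return None

print(factor_monic_quadratic(5, 6))   # x^2 + 5x + 6
print(factor_monic_quadratic(1, -6))  # x^2 + x - 6
print(factor_monic_quadratic(0, 1))   # x^2 + 1
```

The reason it works is the one she could give on the spot: expanding (x+p)(x+q) gives x^2 + (p+q)x + pq.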

To tap the other process you have to generate some sort of cognitive dissonance for the students. Ideally, I would like my students to experience cognitive dissonance the minute they see a pattern that is not yet explained – why (the hell) is this happening? In my experience it helps to have this attitude myself. (E.g. sometimes I act kind of paranoid when something happens 2 times without being proven, and increasingly agitated if it happens a 3rd time.) But this is only cultivated over time. To generate cognitive dissonance in students who don’t already care about justification, they have to see something happen that contradicts their intuition. JD2718 recently made a suggestion that strikes me as along these lines (I’ve never tried this one myself):

“9 has 3 factors (1, 3, 9). 6 has four (1, 2, 3, 6). So, even numbers have even numbers of factors, and odd numbers have odd numbers of factors. Right?”

This conclusion would appeal to the intuition of just about every student I’ve taught. I’d probably have to prod them to even get them to test another example because they’d already be convinced. If I do get them to try another one, I’ll make sure the first one I ask about fits the pattern (say 10, or 25) – so they’ll be even more convinced. In that context, the first counterexample they come across is going to bother them. Then the “what (the hell) is going on here?” process is engaged.

At any rate, these two processes, or the like, are needed to orient students as they try to prove things. I think a big part of why students flop when we ask them to justify is that the question fails to hook students into these processes.

(c) One final thought about why they have a hard time proving: there’s a vicious cycle at work here. They already do have a hard time. That means we all know that if we try to get them to do it, it’s going to take a really long time, or they’re going to fail miserably, or both. In that context, it’s very hard to take this on. It’s even harder to follow through and really make sure it happens, and not to cop out in some way (for example, as in the article I wrote about last time). But this just exacerbates the situation I described in (a) above. Often the students in front of us are so inexperienced not because nobody has ever contemplated trying to get them to prove anything before but because many people have contemplated it and then opted not to, or tried it and then given up. Ultimately the task felt unfair in some way. This is the perfect segue into question (2), which I’ll engage in a followup post in the next day or two.