Good Brawls and Honoring Kids’ Dissatisfaction

I was just reading some old correspondence with a friend, J, who periodically writes me regarding a math question he and his son are pondering together. The exchange was pretty juicy: how many ways can an even number be decomposed as a sum of primes? But actually, the juiciest thing we got into was this:

Is 1 a prime number?

It was kind of a fight! Since Wikipedia and I agreed on this point (it’s not prime), J acknowledged we must know something he didn’t. But regardless, he kind of wasn’t having it.

Point 1: This is awesome.

Nothing could be better mathematician training than a fight about math. Proofs are called “arguments” for a reason.

When I went to Bob and Ellen Kaplan’s math circle training in 2009, I was heading to do a practice math circle with some high schoolers and Bob asked me, “what question are you opening with?” I said, “does .9999…=1?” He smiled with knowing anticipation and said, “oooh, that one always starts a brawl.”

Well, it wasn’t quite the bloodbath Bob led me to expect, but the kids were totally divided. One kid knew the “proof” where you go


x = 0.9999…

Multiplying by 10,

10x = 9.9999…

Subtracting,

9x = 9

so x = 1

and the other kids had that same sort of feeling like, “he knows something we don’t know,” but they weren’t convinced, and with only a minimal amount of coaxing, they weren’t shy about it. The resulting conversation was the stuff of real growth: everybody in the room was contending with, and thereby pushing, the limits of their understanding. Even the boy who “knew the right answer” began to realize he didn’t have the whole story, as he found himself struggling to be articulate in the face of his classmates’ doubt.

Now this could have gone a completely different way. It’s common for “0.999… = 1” to be treated as a fact and the above as a proof. Similarly, since the Wikipedia entry on prime numbers says, “… a natural number greater than 1 that has no positive divisors…,” we could just leave it at that.

But in both situations, this would be to dishonor everyone’s dissatisfaction. It is so vital that we honor it. Everybody, school-aged through grown-up, is constantly walking away from math thinking “I don’t get it.” This is a useless perspective. Never let them say they don’t get it. What they should be thinking is that they don’t buy it.

And they shouldn’t! If it wasn’t already clear that I think the above “proof” that 0.999…=1 is bullsh*t, let me make it clear. I think that argument, presented as proof, is dishonest.

I mean, if you understand real analysis, I have no beef with it. But at the level where this conversation is usually happening, this is not a proof, are you kidding me?? THE LEFT SIDE IS AN INFINITE SERIES. That means to make this argument sound, you have to deal with everything that is involved with understanding infinite series! But you just kinda slipped that in the back door, and nobody said anything because they are not used to honoring their dissatisfaction. As I have pointed out in the past, if you ignore all the series convergence issues, the exact same argument proves that …999.0=-1:


x = …9999.0

Dividing by 10,

x/10 = …999.9

Subtracting,

.9x = -.9

so x = -1

If you smell a rat, good! My point is that that same rat is smelling up the other proof too. We need to have some respect for kids’ minds when they look at you funny when you tell them 0.999…=1. They should be looking at you funny!
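If you want to poke at the asymmetry yourself, here’s a quick Python sketch (my illustration, nothing more): the partial sums behind 0.999… close in on 1, while the partial sums behind …999.0 run off to infinity.

```python
# Partial sums of 9/10 + 9/100 + 9/1000 + ... -- the series behind 0.999...
forward = [sum(9 * 10**-k for k in range(1, n + 1)) for n in range(1, 8)]

# Partial sums of 9 + 90 + 900 + ... -- the series behind ...999.0
backward = [sum(9 * 10**k for k in range(n)) for n in range(1, 8)]

print(forward)   # 0.9, 0.99, 0.999, ... creeping up on 1
print(backward)  # 9, 99, 999, ... heading toward no real number at all
```

The first list gets within any tolerance of 1; the second gets within any tolerance of nothing. That difference is exactly what the phony proof sweeps under the rug.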

Same thing with why 1 is not a prime. If a student feels like 1 should be prime, that deserves some frickin respect! Because they are behaving like a mathematician! Definitions don’t get dropped down from the sky; they take their form by mathematicians arguing about them. And they get tweaked as our understanding evolves. People were still arguing about whether 1 was prime as late as the 19th century. Today, no number theorist thinks 1 is prime; however, in the 20th century we discovered a connection between primes and valuations, which has led to the idea in algebraic number theory that in addition to the ordinary primes there is an “infinite” prime, corresponding to the ordinary absolute value just as each ordinary prime corresponds to a p-adic absolute value. Now for goodness sakes, I hope you don’t buy this! With study, I have gained some sense of the utility of the idea, but I’m not entirely sold myself.

To summarize, point 2: Change “I don’t get it” to “I don’t buy it”.

Now I think this change is a good idea for everyone learning mathematics, at any level but especially in school, and I think we should teach kids to change their thinking in this way regardless of what they’re working on. But there is something special to me about these two questions (is 0.999…=1? Is 1 prime?) that bring this idea to the foreground. They’re like custom-made to start a fight. If you raise these questions with students and you are intellectually honest with them and encourage them to be honest with you, you are guaranteed to find that many of them will not buy the “right answers.” What is special about these questions?

I think it’s that the “right answers” are determined by considerations that are coming from parts of math way beyond the level where the conversation is happening. As noted above, the “full story” on 0.999…=1, in fact, the full story on the left side even having meaning, involves real analysis. We tend to slip infinite decimals sideways into the grade-school/middle-school curriculum without comment, kind of like, “oh, you know, kids, 0.3333… is just like 0.3 or 0.33 but with more 3’s!” Students are uncomfortable with this, but we just squoosh their discomfort by ignoring it and acting perfectly comfortable ourselves, and eventually they get used to the idea and forget that they were ever uncomfortable.

Meanwhile, the full story on whether 1 is prime involves the full story on what a prime is. As above, that’s a story that even at the level of PhD study I don’t feel I fully have yet. The more I learn the more convinced I am that it would be wrong to say 1 is prime; but the learning is the point. If you tell them “a prime is a number whose only divisors are 1 and itself,” well, then, 1 is prime! Changing the definition to “exactly 2 factors” can feel like a contrivance to kick out 1 unfairly. It’s not until you get into heavier stuff (e.g. if 1 is prime, then prime factorizations aren’t unique) that it begins to feel wrong to lump 1 in with the others.

I highlight this because it means that trying to wrap up these questions with pat answers, like the phony proof above that 0.999…=1, is dishonest. Serious questions are being swept under the rug. The flip side is that really honoring students’ dissatisfaction is a way into this heavier stuff! It’s a win-win. I would love to have a big catalogue of questions like these: 3- to 6-word questions you could pose at the K-8 level but you still feel like you’re learning something about in grad school. Got any more for me?

All this puts me in mind of a beautiful 15-minute digression I witnessed about 2 years ago in the middle of Jesse Johnson’s class regarding the question is zero even or odd? It wasn’t on the lesson plan, but when it came up, Jesse gave it the full floor, and let me tell you it was gorgeous. A lot of kids wanted the answer to be that 0 is neither even nor odd; but a handful of kids, led by a particularly intrepid, diminutive boy, grew convinced that it is even. Watching him struggle to form his thoughts into an articulate point for others, and watching them contend with those thoughts, was like watching brains grow bigger visibly in real time.

Honor your dissatisfaction. Honor their dissatisfaction. Math was made for an honest fight.

p.s. Obliquely relevant: Teach the Controversy (Dan Meyer)


Elementary Mathematics from an Advanced Standpoint

Another of the many reasons I’m in grad school. I benefit as a teacher from understanding the content I teach in way more depth than I teach it. (I think everybody does, but it’s easiest to talk about myself.)

This does a number of things for me. The simplest is that it makes the content more exciting to me. Something that previously seemed routine can become pregnant with significance if I know where it’s going, and there’s a corresponding twinkle that shows up in my eye the whole time my students are dealing with it. A second benefit is that it gives me both tools and inspiration to find more different ways of explaining things. A third is that it helps me see (and therefore develop lessons aimed at) connections between different ideas.

So, this post is a catalogue of some insights that I’ve had about K-12 math that I’ve been led to by PhD study. The title of the post is a reference to Felix Klein’s classic books of the same name. The catalogue is mostly for my own benefit, and I don’t have all that much time, so I’m going to try to suppress the impulse to fully explain some of the more esoteric vocabulary, but I never want to write something here that requires expert knowledge to avoid being useless, so I’ll try to be both clear and pithy. (Wish me luck.)

Elementary level: Multiplication is function composition.

I’m developing the opinion that it’s important for especially middle and high school teachers to have this language. The upshot is that in addition to the usual models of multiplication as (a) repeated addition and (b) arrays and area of rectangles (and, if you’re lucky, (c) double number lines), multiplication is also the net effect of doing two things in a row, such as stretching (and possibly reversing) a number line.

The big thing I want to say here is that understanding this is key to understanding multiplication of signed numbers. I would go so far as to wager that anybody who feels they know intuitively why -\cdot -=+ understands it on some level, consciously or not.

When somebody asks me why a negative times a negative is a positive, I have often had the inclination to answer with, “well, what’s the opposite of the opposite of something?” (I have seen many teachers use metaphors with the same upshot.) The problem is that if you understand multiplication only as repeated addition and as the area of rectangles, I’ve changed the subject with this answer. It is a complete non sequitur. It’s probably clear why it has to do with negatives, but why does it have to do with multiplication?

On the other hand, if on any level you realize that one meaning of 2\times 3 is “double then triple”, then it’s natural for (-2)\times(-3) to mean “double and oppositize, then triple and oppositize.” But for this you had to be able to see multiplication as “do something then do something else.”
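In code, the “do something, then do something else” reading is literally function composition. A toy Python sketch (names are mine, purely illustrative):

```python
def times(a):
    """The map 'multiply by a' on the number line: a stretch by |a|,
    plus a flip across 0 when a is negative."""
    return lambda x: a * x

def compose(f, g):
    """The net effect of doing g first, then f."""
    return lambda x: f(g(x))

net = compose(times(-3), times(-2))   # double-and-flip, then triple-and-flip

# Two flips cancel: the net effect is a plain stretch by 6.
assert all(net(x) == times(6)(x) for x in range(-5, 6))
```

The assert at the end is the whole point: (-2)\times(-3)=+6 because flipping twice is not flipping at all.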

Algebra I and Algebra II: Substitution is calculation inside a coordinate ring.

I just realized this today, and that’s what inspired this blog post. So far, I’m not sure the benefit of this one to my teaching beyond the twinkle it will bring to my eye, though perhaps that will become clear later. It’s certainly helping me understand something about algebraic geometry. The basic idea is this: say you’re finding the intersections of some graphs like y=3x+5 and 2x+y=30. You’re like, “alright, substitute using the fact that y=3x+5. 2x+(3x+5)=30, so 5x+5=30…” and you solve that to find x=5, for an intersection point of (5,20). A way to look at what you’re doing when you make the substitution y=3x+5 is that you’re working in a special algebraic system determined by the line y=3x+5, in particular the (tautological) fact that on this line, y is exactly three x plus five. In this system, polynomials in x or y alone work the usual way, but polynomials in x and y both can often be simplified using the relation y=3x+5 connecting x and y. This algebraic system is called “the coordinate ring of the line y=3x+5.”

I can’t tell if it will even seem that I’ve said anything at all here. The point, for me, is just a subtle shift in perspective. I imagine myself sitting on the line y=3x+5; then this line determines an algebraic system (the coordinate ring) which, as long as I’m on the line, is the right system; and when I substitute 3x+5 for y, what I’m doing is using the rules of that system.
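To make “the rules of that system” concrete, here’s a small Python sketch (entirely my own illustration): represent a polynomial in x and y as a dictionary of coefficients, and let the relation y = 3x + 5 rewrite every occurrence of y. Reducing 2x + y − 30 this way recovers 5x − 25 = 0, i.e. x = 5.

```python
from math import comb

def reduce_mod_line(poly, a=3, b=5):
    """Rewrite a polynomial in x and y using the relation y = a*x + b.

    poly maps (i, j) -> coefficient of x**i * y**j; the result maps
    i -> coefficient of x**i, since every y has been substituted away.
    """
    out = {}
    for (i, j), c in poly.items():
        # Expand (a*x + b)**j with the binomial theorem.
        for k in range(j + 1):
            out[i + k] = out.get(i + k, 0) + c * comb(j, k) * a**k * b**(j - k)
    return out

# 2x + y - 30 = 0, reduced in the coordinate ring of the line y = 3x + 5:
reduced = reduce_mod_line({(1, 0): 2, (0, 1): 1, (0, 0): -30})
# reduced is 5x - 25, so x = 5, and then y = 3*5 + 5 = 20.
x = -reduced[0] / reduced[1]
print(reduced, x)
```

The function is just mechanized substitution, but the framing is the point: reduce_mod_line is “calculating in the coordinate ring” of the line.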

Calculus: The chain rule is the functoriality of the derivative.

“Functoriality” is a word from category theory which I will avoid defining. The point is really about the chain rule. The main ways the derivative is presented in a first-year calculus class are as speed, or rate of change, on the one hand (like, you’re always thinking of the independent variable as time, whatever it really is), and the slope of the tangent line of a graph, on the other. There is a third way to look at it, which I learned from differential geometry. If you look at a function as a mapping from the real line to itself, then the derivative describes the factor by which it stretches small intervals. For example, f(x)=x^2 has a derivative of 6 at x=3. What this is saying is that very small intervals around x=3 get mapped to intervals that are about 6 times as long. (To illustrate: the interval [3,3.01] gets mapped to [9,9.0601], about 6 times as long.)
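You can watch the stretch factor settle down to the derivative numerically; a throwaway Python sketch (my illustration):

```python
def stretch_factor(f, x, h):
    """Factor by which f stretches the tiny interval [x, x + h]."""
    return (f(x + h) - f(x)) / h

f = lambda x: x**2
for h in (0.01, 0.001, 0.0001, 0.00001):
    print(h, stretch_factor(f, 3, h))   # approaches f'(3) = 6 as h shrinks
```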

Seen in this way, the strange formula [f(g(x))]'=f'(g(x))\cdot g'(x) for the chain rule becomes the only sensible way it could be. The function f(g(x)) is the net effect of doing g to x and then doing f to the answer g(x). If I want to know how much this function stretches intervals, well, when g is applied to x they are stretched by a factor of g'(x). Then when f is applied to g(x) they are stretched by a factor of f'(g(x)). (Note it is clear why you evaluate f' at g(x): that is the number to which f got applied.) So you stretched first by a factor of g'(x) and then by a factor of f'(g(x)); net effect, f'(g(x))\cdot g'(x), just like the formula says.
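The multiplication of stretches is easy to check numerically too. Another quick sketch (my own choices: g(x)=x^3 and f(u)=u^2, so f(g(x))=x^6):

```python
def stretch(f, x, h=1e-6):
    """Approximate stretch factor of f on the tiny interval [x, x + h]."""
    return (f(x + h) - f(x)) / h

g = lambda x: x**3            # g'(2) = 12
f = lambda u: u**2            # f'(g(2)) = f'(8) = 16
fg = lambda x: f(g(x))        # x**6, whose derivative at x = 2 is 192

print(stretch(g, 2))          # about 12
print(stretch(f, g(2)))       # about 16
print(stretch(fg, 2))         # about 192 = 12 * 16, as the chain rule says
```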

(As an aside, for the sake of being thematic, note the role here of the fact that the multiplication comes from the composition of the two stretches – multiplication is function composition. When I say “the derivative is functorial” what I really mean is that it turns composition of functions into composition of stretches.)

Calculus: The intermediate value theorem is f(\text{connected})=\text{connected}. The extreme value theorem is f(\text{compact})=\text{compact}.

This is a good example of what I was talking about at the beginning about the twinkle in my eye, and connections between ideas. When I used to teach AP calculus, the extreme value theorem and the intermediate value theorem were things I had trouble connecting to the rest of the curriculum. They were these miscellaneous, intuitively obvious factoids about continuous functions that were stuck into the course in awkward places. They both had the same clunky hypothesis, “if f is a function that is continuous on a closed interval [a,b]…” I didn’t do much with them, because I didn’t care about them.

I started to see a bigger picture about three years ago, in a course for calculus teachers taught by the irrepressible Larry Zimmerman. He referred to that clunky hypothesis as something to the effect of “a lilting refrain.” I was also left with the image of a golden thread weaving through the fabric of calculus but I’m not sure if he said that. The point is, he made a big deal about that hypothesis, making me notice how thematic it is.

Last year when I taught a course on algebra and analysis, having benefited from this education, I made these theorems important goals of the course. But something further clicked into place this fall, when I started to need to draw on point-set topology knowledge as I studied differential geometry. Two fundamental concepts in topology are compactness and connectedness. They have technical definitions for which you can follow the links. Intuitively, connectedness is what it sounds like (all one piece), and compactness means (very loosely) that a set “ends, and reaches everywhere it heads toward.” (A closed interval is compact. The whole real line is not compact because it doesn’t end. An open interval is not compact because it wants to include its endpoints but it doesn’t. A professor of mine described compactness as, “everything that should happen [in the set] does happen.”)

Two basic theorems of point-set topology are that under a continuous mapping, the image of any connected set is connected and the image of any compact set is compact. These theorems are very general: they are true in the setting of any map between any two topological spaces. (They could be multidimensional, curved or twisted, or even more exotic…) What I realized is that the intermediate value theorem is just the theorem about connectedness specialized to the real line, and the extreme value theorem is just the theorem about compactness. What is a compact, connected subset of \mathbb{R}? It is precisely a closed interval. Under a continuous function, the image must therefore be compact and connected. Therefore, it must attain a maximum and minimum, because if not, the image either “doesn’t end” or “doesn’t reach its ends,” either of which would make it noncompact. And, for any two values hit by the image, it must hit every value between them; any missing value would disconnect it. So, “if f is a function that is continuous on a closed interval [a,b]…”
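Incidentally, the intermediate value theorem is also the license behind an algorithm every calculus student can run: bisection. A sketch (the function and target are arbitrary choices of mine):

```python
def bisect(f, a, b, target, tol=1e-10):
    """Find c between a and b with f(c) ~= target, assuming f is continuous
    and target lies between f(a) and f(b) -- exactly what the IVT guarantees."""
    lo, hi = (a, b) if f(a) <= f(b) else (b, a)   # orient so f(lo) <= target <= f(hi)
    while abs(hi - lo) > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# f(x) = x**3 - x is continuous on [1, 2], with f(1) = 0 and f(2) = 6,
# so it must hit 3 somewhere in between.
c = bisect(lambda x: x**3 - x, 1, 2, 3)
print(c, c**3 - c)
```

Notice where the clunky hypothesis earns its keep: continuity on the closed interval is what entitles the loop to assume the target can’t slip through a gap.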


So I’m teaching this course this year. It’s for the math faculty of a high school. It’s called:

MA600 Algebra and Analysis with Connections to the K-12 Curriculum

I am unspeakably excited, and want to do the best job possible.

The class: 7 teachers, deeply committed to kids, serious, not real talkative, rightly protective about their time, which is in short supply, but eager to get sh*t done.

The content: Basically, all of mathematics, seen as a unified whole.

It’s met twice. The second class was last Thursday. I need to get my thoughts sorted out here. I’m expecting this to help me visualize the next moves more clearly, just by doing it, but I’d love your thoughts too.

I didn’t really know anything about the mathematical background of the group when I wrote the syllabus, so for the first class I gave them a getting-to-know-you problem set with a wide range of problems and just let them work the whole time. Magically the experience of watching folks work on the problems and then later looking at what they did on paper gave me just enough information to plan the direction of the class’ first unit. We’re beginning with analysis. My first goal: the \epsilon\delta definition of the limit. (I.e., the definition of the limit, for the snobby among you.) My second: the completeness axiom.

The plan: generate the need to define the limit by working with 2 everyday concepts that are actually limits. Namely, infinite decimals, and instantaneous speed. My hope is that by pressing on these concepts, we’ll see that in spite of our familiarity with them, we don’t actually understand them unless we have a precise way to talk about limits. Then, develop the definition out of the need to fully understand the familiar. Then, develop the completeness axiom out of the desire to make sure infinite decimals have a limit.

Here’s what we did:

I opened class with a problem set designed to get them thinking about the meaning of decimals in particular, and various other contexts for the idea of limits. I shamelessly bit the format from PCMI. The problems span a wide range of skills and I didn’t leave enough time to do them all, so people could attack problems appropriate to their skill level. This is now my favorite way to differentiate problem sets, apropos of a) using it in some NYMC workshops last year, and b) hearing about how wonderful it was for everyone at PCMI.

Then, since we are all just getting to know each other, I did a short presentation on the mindset I wanted us to be in:

(Scribd did not handle slide 6 very well, which is too bad because I was proud of that slide. This is my first PowerPoint presentation ever. Actually I did it in Keynote.)

Then, we got to business. I put this up:

I asked them to talk about it with their tables. (I had them in 2 pairs and a group of three, in three tables in a horseshoe shape in front of the board. I like this and think I’ll keep it. Easy transitions from pair/group to whole-class; tables feel separate enough so you don’t feel like your conversation with your partner is in front of everybody; but everyone’s close enough so we can all talk. On day 1 I put us all around one table, for a sense of collegiality and common purpose, but it was too close; you couldn’t discreetly check in with your neighbor, for example.)

There was a widespread sense of mathematical discomfort, and rightly so. Infinite decimals enter most people’s math educations with no attention to the fact that they actually violate everything you’ve learned about math up to that point. You don’t get the full story until analysis, but unless you really get intimate with and own that content, you probably don’t connect what you learn there to what your teacher introduced without comment somewhere between 3rd and 7th grade, as though it weren’t a mind-boggling idea. “When you expand 1/3 as a decimal, the 3’s just keep going.” Or, “3.14159… It never ends or repeats.” Um, excuse me? It NEVER ENDS?

So it’s no surprise everybody has an underdeveloped idea of infinite decimals, and therefore that objects like 0.99999… cause some dissonance. This is very productive dissonance. I’m hoping it carries us all the way to the completeness axiom; we’ll see.

One of the three tables produced the standard argument that if x = 0.9999…, then 10x = 9.9999…, so 9x = 10x – x = 9, so x = 1; but even this table found this conclusion unsatisfying. I asked them why. The table that had produced the argument said, “usually this method gives you a fraction.” I asked for an example. They produced one from the problem set:

x = 1.363636…
100x = 136.363636…
99x = 100x – x = 135
x = 135/99 = 15/11
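That multiply-and-subtract maneuver generalizes to any repeating decimal, and Python’s exact rational arithmetic will happily carry it out (the helper’s name and interface are my own invention):

```python
from fractions import Fraction

def repeating_to_fraction(whole, block):
    """E.g. 1.363636... is repeating_to_fraction(1, "36").

    The repeating tail 0.blockblock... equals block / (10**len(block) - 1),
    which is exactly what multiplying by 10**len(block) and subtracting gives.
    """
    return Fraction(whole) + Fraction(int(block), 10**len(block) - 1)

print(repeating_to_fraction(1, "36"))   # 15/11, matching the table's computation
print(repeating_to_fraction(0, "9"))    # 1 -- the controversial customer itself
```

Of course, feeding it "9" and getting 1 back quietly assumes everything the class is rightly uneasy about.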

I asked how many folks found this argument convincing. 7 out of 7. (Well, one raised hand was kind of hesitant.)

I asked the same question about the same argument with .9999…. 4 out of 7. Then I dropped this:

How many people found this one convincing? 0 out of 7.



Then what’s the difference?

At first, they cast about a bit, but then one of them said, “1.363636… has a finite limit, but …9999.0 doesn’t.” Their ideas began to coalesce around this type of language. Another one said, “we can actually estimate 1.363636…, for example we know it’s between 1 and 2.”

From the point of view I am ultimately heading for, this is the rub. Infinite decimals suggest convergent series, and the standard way to give them meaning as real numbers is that they are equal to the limit of the convergent series they suggest. …9999.0 suggests a wildly divergent series, so it cannot become a real number in the same way. (To bring home that convergence is the heart of the matter: there is an alternative way to define distance between numbers, the 10-adic metric, according to which it is actually …9999.0 that has the convergent series, and in this alternative system the above proof is valid and it actually does equal -1.) What I’d like us to do is a) define limits precisely; b) use this to prove that when a series has a limit, you can do the above type of manipulations to find it; c) try to prove that the series suggested by an infinite decimal always has a limit; d) realize that we can’t prove this without articulating the completeness axiom; e) articulate the axiom; and f) prove from the axiom that any infinite decimal has a real number limit. (Somewhere along the line, produce an \epsilon\delta proof that 0.9999… = 1.) Now, how to orchestrate this…
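The 10-adic claim has a finite shadow you can check in ordinary arithmetic (a small sketch of my own): modulo any power of 10, the number written with n nines behaves exactly like -1, because adding 1 rolls it over to 0.

```python
# Modulo 10**n, the n-nines number 99...9 acts like -1: adding 1 gives 0.
for n in range(1, 10):
    nines = int("9" * n)               # 9, 99, 999, ...
    assert nines == 10**n - 1
    assert (nines + 1) % 10**n == 0    # "...999 + 1 = ...000"
```

Each finite truncation is congruent to -1 mod 10^n; the 10-adic statement is what you get by letting n grow without bound.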

For next time I told them to try to craft a definition of the meaning of an infinite decimal 0.abcd… I gave them a few minutes just before the end to discuss this with their groups. I’m expecting to learn a lot about their thinking from what they come up with, but I’m not counting on anyone to have a mathematically satisfying answer. I’ll be pleased if somebody does though.

As I think about next class, here’s what’s on my mind:

1) When we develop the \epsilon\delta limit, what I’m going for is for this definition to feel like a satisfying relief. I know how easy it is for this definition instead to feel like a horrible monstrosity designed to oppress analysis students. I think what I have to do is keep them thinking about the reasons why anything less than this definition is too vague, which means I need to keep coming up with objects and problems that throw monkey wrenches into whatever more naive definitions they go for. (Of course, if they come up with something just as precise as the \epsilon\delta limit but different, that would be amazing.) I feel like we’re off to a good start on this, but I want a fuller catalogue of head-scratchers (like …9999.0 = -1) to push the level of precision higher.

2) Relatedly, I sense a danger that the “real answers” will be unsatisfying because it’ll feel like “wait, I already said that.” For example, the participant who said that the difference between 1.3636… = 15/11 and …9999.0 = -1 is that “the first one has a finite limit”… I mean this is basically the answer. But it’s not based on a precise definition of limit yet, so it’s not what I want yet. I’m afraid of a “what was the big deal?” moment when we’ve got the real sh*t up there. I think the way to avoid this lies in that catalogue of head-scratchers I need to develop, so that nothing less than the real thing is satisfying. What do you think?

3) Where to go immediately next. Basically the question is: stick with decimals? Or change gears completely and press on the notion of instantaneous speed? Most (not all, I think) of the teachers have had a calculus course, but I think at most 1 or 2 of them have internalized the philosophical lesson that instantaneous speed needs to be defined as a limit in order for us to even access it. I’m attracted to the idea of switching gears because I’m drawn to the connection between the disparate realms: two highly familiar, but totally different, objects – infinite decimals and speed in a moment – both getting pressed on to the point where you realize you never fully understood either one, and then you realize that the missing idea you need is the same thing in the two cases. (A precise way to talk about what number some varying quantity is “heading toward.”)

Actually as I write this out, it seems clear to me that switching gears is the way to go. I think it’ll give us a clearer understanding of what we’re missing with the decimals. Also, it’ll allow us to access all this rich historical stuff around the development of calculus. For example, maybe I’ll share with them some choice quotes from Bishop Berkeley’s The Analyst, to help articulate why the 18th century definition of the derivative was inadequate.

Anyway. Very excited about all this. Will definitely keep you posted.