Good Brawls and Honoring Kids’ Dissatisfaction Friday, Mar 8 2013

I was just reading some old correspondence with a friend J who periodically writes me regarding a math question he and his son are pondering together. The exchange was pretty juicy, about how many ways an even number can be decomposed as a sum of primes. But actually, the juiciest thing we got into was this:

Is 1 a prime number?

It was kind of a fight! Since Wikipedia and I agreed on this point (it’s not prime), J acknowledged we must know something he didn’t. But regardless, he kind of wasn’t having it.

Point 1: This is awesome.

Nothing could be better mathematician training than a fight about math. Proofs are called “arguments” for a reason.

When I went to Bob and Ellen Kaplan’s math circle training in 2009, I was heading to do a practice math circle with some high schoolers and Bob asked me, “what question are you opening with?” I said, “does .9999…=1?” He smiled with knowing anticipation and said, “oooh, that one always starts a brawl.”

Well, it wasn’t quite the bloodbath Bob led me to expect, but the kids were totally divided. One kid knew the “proof” where you go

$0.999...=x$

Multiplying by 10,

$9.999...=10x$

Subtracting,

$9 = 9x$

so $x=1$

and the other kids had that same sort of feeling like, “he knows something we don’t know,” but they weren’t convinced, and with only a minimal amount of coaxing, they weren’t shy about it. The resulting conversation was the stuff of real growth: everybody in the room was contending with, and thereby pushing, the limits of their understanding. Even the boy who “knew the right answer” began to realize he didn’t have the whole story, as he found himself struggling to be articulate in the face of his classmates’ doubt.

Now this could have gone a completely different way. It’s common for “0.999… = 1” to be treated as a fact and the above as a proof. Similarly, since the Wikipedia entry on prime numbers says, “… a natural number greater than 1 that has no positive divisors…,” we could just leave it at that.

But in both situations, this would be to dishonor everyone’s dissatisfaction. It is so vital that we honor it. Everybody, school-aged through grown-up, is constantly walking away from math thinking “I don’t get it.” This is a useless perspective. Never let them say they don’t get it. What they should be thinking is that they don’t buy it.

And they shouldn’t! If it wasn’t already clear that I think the above “proof” that 0.999…=1 is bullsh*t, let me make it clear. I think that argument, presented as proof, is dishonest.

I mean, if you understand real analysis, I have no beef with it. But at the level where this conversation is usually happening, this is not a proof, are you kidding me?? THE LEFT SIDE IS AN INFINITE SERIES. That means to make this argument sound, you have to deal with everything that is involved with understanding infinite series! But you just kinda slipped that in the back door, and nobody said anything because they are not used to honoring their dissatisfaction. As I have pointed out in the past, if you ignore all the series convergence issues, the exact same argument proves that …999.0=-1:

$...999.0=x$

Dividing by 10,

$...999.9=0.1x$

Subtracting,

$-.9 = .9x$

so $x=-1$

If you smell a rat, good! My point is that that same rat is smelling up the other proof too. We need to have some respect for kids’ minds when they look funny at you when you tell them 0.999…=1. They should be looking at you funny!
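None of this is in the post, but the contrast can be made concrete in a few lines of arithmetic. In ordinary distance the partial sums of 0.999… approach 1, while the partial sums of …999 run off to infinity; it is only in the 10-adic absolute value (a relative of the p-adic absolute values that come up below) that …999 closes in on −1. A sketch:

```python
# The partial sums 0.9, 0.99, 0.999, ... really do close in on 1:
gaps = [abs(1 - float("0." + "9" * n)) for n in range(1, 8)]
assert all(big > small for big, small in zip(gaps, gaps[1:]))

# The partial sums 9, 99, 999, ... get nowhere near -1 in ordinary distance:
assert abs(-1 - int("9" * 7)) > 10 ** 6

# But 9...9 differs from -1 by exactly 10^n: divisible by ever-higher
# powers of 10, which is the 10-adic sense in which ...999 "is" -1.
for n in range(1, 8):
    assert (int("9" * n) - (-1)) % 10 ** n == 0
```

So each "proof" is secretly a claim about convergence, just in two very different senses of "close."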

Same thing with why 1 is not a prime. If a student feels like 1 should be prime, that deserves some frickin respect! Because they are behaving like a mathematician! Definitions don’t get dropped down from the sky; they take their form by mathematicians arguing about them. And they get tweaked as our understanding evolves. People were still arguing about whether 1 was prime as late as the 19th century. Today, no number theorist thinks 1 is prime; however, in the 20th century we discovered a connection between primes and valuations, which has led to the idea in algebraic number theory that in addition to the ordinary primes there is an “infinite” prime, corresponding to the ordinary absolute value just as each ordinary prime corresponds to a p-adic absolute value. Now for goodness sakes, I hope you don’t buy this! With study, I have gained some sense of the utility of the idea, but I’m not entirely sold myself.

To summarize, point 2: Change “I don’t get it” to “I don’t buy it”.

Now I think this change is a good idea for everyone learning mathematics, at any level but especially in school, and I think we should teach kids to change their thinking in this way regardless of what they’re working on. But there is something special to me about these two questions (Is 0.999…=1? Is 1 prime?) that brings this idea to the foreground. They’re like custom-made to start a fight. If you raise these questions with students and you are intellectually honest with them and encourage them to be honest with you, you are guaranteed to find that many of them will not buy the “right answers.” What is special about these questions?

I think it’s that the “right answers” are determined by considerations that are coming from parts of math way beyond the level where the conversation is happening. As noted above, the “full story” on 0.999…=1, in fact, the full story on the left side even having meaning, involves real analysis. We tend to slip infinite decimals sideways into the grade-school/middle-school curriculum without comment, kind of like, “oh, you know, kids, 0.3333… is just like 0.3 or 0.33 but with more 3’s!” Students are uncomfortable with this, but we just squoosh their discomfort by ignoring it and acting perfectly comfortable ourselves, and eventually they get used to the idea and forget that they were ever uncomfortable.

Meanwhile, the full story on whether 1 is prime involves the full story on what a prime is. As above, that’s a story that even at the level of PhD study I don’t feel I fully have yet. The more I learn the more convinced I am that it would be wrong to say 1 is prime; but the learning is the point. If you tell them “a prime is a number whose only divisors are 1 and itself,” well, then, 1 is prime! Changing the definition to “exactly 2 factors” can feel like a contrivance to kick out 1 unfairly. It’s not until you get into heavier stuff (e.g. if 1 is prime, then prime factorizations aren’t unique) that it begins to feel wrong to lump 1 in with the others.

I highlight this because it means that trying to wrap up these questions with pat answers, like the phony proof above that 0.999…=1, is dishonest. Serious questions are being swept under the rug. The flip side is that really honoring students’ dissatisfaction is a way into this heavier stuff! It’s a win-win. I would love to have a big catalogue of questions like these: 3- to 6-word questions you could pose at the K-8 level but you still feel like you’re learning something about in grad school. Got any more for me?

All this puts me in mind of a beautiful 15-minute digression I witnessed about 2 years ago in the middle of Jesse Johnson’s class regarding the question is zero even or odd? It wasn’t on the lesson plan, but when it came up, Jesse gave it the full floor, and let me tell you it was gorgeous. A lot of kids wanted the answer to be that 0 is neither even nor odd; but a handful of kids, led by a particularly intrepid, diminutive boy, grew convinced that it is even. Watching him struggle to form his thoughts into an articulate point for others, and watching them contend with those thoughts, was like watching brains grow bigger visibly in real time.

Honor your dissatisfaction. Honor their dissatisfaction. Math was made for an honest fight.

p.s. Obliquely relevant: Teach the Controversy (Dan Meyer)

Elementary Mathematics from an Advanced Standpoint Monday, Mar 12 2012

Another of the many reasons I’m in grad school. I benefit as a teacher from understanding the content I teach in way more depth than I teach it. (I think everybody does, but it’s easiest to talk about myself.)

This does a number of things for me. The simplest is that it makes the content more exciting to me. Something that previously seemed routine can become pregnant with significance if I know where it’s going, and there’s a corresponding twinkle that shows up in my eye the whole time my students are dealing with it. A second benefit is that it gives me both tools and inspiration to find more, and different, ways of explaining things. A third is that it helps me see (and therefore develop lessons aimed at) connections between different ideas.

So, this post is a catalogue of some insights that I’ve had about K-12 math that I’ve been led to by PhD study. The title of the post is a reference to Felix Klein’s classic books of the same name. The catalogue is mostly for my own benefit, and I don’t have all that much time, so I’m going to try to suppress the impulse to fully explain some of the more esoteric vocabulary, but I never want to write something here that requires expert knowledge to avoid being useless, so I’ll try to be both clear and pithy. (Wish me luck.)

Elementary level: Multiplication is function composition.

I’m developing the opinion that it’s important for especially middle and high school teachers to have this language. The upshot is that in addition to the usual models of multiplication as (a) repeated addition and (b) arrays and area of rectangles (and, if you’re lucky, (c) double number lines), multiplication is also the net effect of doing two things in a row, such as stretching (and possibly reversing) a number line.

The big thing I want to say here is that understanding this is key to understanding multiplication of signed numbers. I would go so far as to wager that anybody who feels they know intuitively why $-\cdot -=+$ understands it on some level, consciously or not.

When somebody asks me why a negative times a negative is a positive, I have often had the inclination to answer with, “well, what’s the opposite of the opposite of something?” (I have seen many teachers use metaphors with the same upshot.) The problem is that if you understand multiplication only as repeated addition and as the area of rectangles, I’ve changed the subject with this answer. It is a complete non sequitur. It’s probably clear why it has to do with negatives, but why does it have to do with multiplication?

On the other hand, if on any level you realize that one meaning of $2\times 3$ is “double then triple”, then it’s natural for $(-2)\times(-3)$ to mean “double and oppositize, then triple and oppositize.” But for this you had to be able to see multiplication as “do something then do something else.”
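The “do something then do something else” view is easy to play with in code. A minimal sketch (mine, not the post’s): treat a signed number as an action on the number line, and let multiplication be composition of actions.

```python
# A signed number a acts on the number line as "stretch by |a|, and flip
# if a < 0"; multiplying is doing one such action after the other.
def action(a):
    return lambda x: a * x          # the map x -> a*x

double_and_oppositize = action(-2)
triple_and_oppositize = action(-3)

# Flip twice = no flip; stretch by 2 then by 3 = stretch by 6.
def net(x):
    return triple_and_oppositize(double_and_oppositize(x))

assert net(1) == 6                  # matches (-2) * (-3) = +6
assert net(5) == (-2) * (-3) * 5
```

The point of the sketch is that the `+6` isn’t decreed; it falls out of composing the two flips and the two stretches.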

Algebra I and Algebra II: Substitution is calculation inside a coordinate ring.

I just realized this today, and that’s what inspired this blog post. So far, I’m not sure what the benefit of this one to my teaching is beyond the twinkle it will bring to my eye, though perhaps that will become clear later. It’s certainly helping me understand something about algebraic geometry. The basic idea is this: say you’re finding the intersections of some graphs like $y=3x+5$ and $2x+y=30$. You’re like, “alright, substitute using the fact that $y=3x+5$. $2x+(3x+5)=30$, so $5x+5=30$…” and you solve that to find $x=5$, for an intersection point of $(5,20)$. A way to look at what you’re doing when you make the substitution $y=3x+5$ is that you’re working in a special algebraic system determined by the line $y=3x+5$, in particular by the (tautological) fact that on this line, $y$ is exactly three $x$ plus five. In this system, polynomials in $x$ or $y$ alone work the usual way, but polynomials in $x$ and $y$ both can often be simplified using the relation $y=3x+5$ connecting $x$ and $y$. This algebraic system is called “the coordinate ring of the line $y=3x+5$.”

I can’t tell if it will even seem that I’ve said anything at all here. The point, for me, is just a subtle shift in perspective. I imagine myself sitting on the line $y=3x+5$; then this line determines an algebraic system (the coordinate ring) which, as long as I’m on the line, is the right system; and when I substitute $3x+5$ for $y$, what I’m doing is using the rules of that system.
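The substitution itself is trivial to check numerically; a tiny sketch (mine) of the calculation above:

```python
# On the line y = 3x + 5, the equation 2x + y = 30 collapses to
# 2x + (3x + 5) = 30, i.e. 5x + 5 = 30.
x = (30 - 5) / 5        # solve 5x = 25
y = 3 * x + 5           # the defining relation of the line
assert (x, y) == (5.0, 20.0)
assert 2 * x + y == 30  # the point really lies on both lines
```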

Calculus: The chain rule is the functoriality of the derivative.

“Functoriality” is a word from category theory which I will avoid defining. The point is really about the chain rule. The main ways the derivative is presented in a first-year calculus class are as speed, or rate of change, on the one hand (like, you’re always thinking of the independent variable as time, whatever it really is), and the slope of the tangent line of a graph, on the other. There is a third way to look at it, which I learned from differential geometry. If you look at a function as a mapping from the real line to itself, then the derivative describes the factor by which it stretches small intervals. For example, $f(x)=x^2$ has a derivative of $6$ at $x=3$. What this is saying is that very small intervals around $x=3$ get mapped to intervals that are about 6 times as long. (To illustrate: the interval $[3,3.01]$ gets mapped to $[9,9.0601]$, about 6 times as long.)
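The interval example is easy to check directly; a quick numerical sketch (mine) of the post’s numbers:

```python
# f(x) = x^2 maps [3, 3.01] to [9, 9.0601], stretching it by a factor
# of about 6 = f'(3); the factor tends to 6 as the interval shrinks.
def f(x):
    return x * x

a, b = 3.0, 3.01
stretch = (f(b) - f(a)) / (b - a)   # (9.0601 - 9) / 0.01 = 6.01
assert abs(stretch - 6) < 0.02
```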

Seen in this way, the strange formula $[f(g(x))]'=f'(g(x))\cdot g'(x)$ for the chain rule becomes the only sensible way it could be. The function $f(g(x))$ is the net effect of doing $g$ to $x$ and then doing $f$ to the answer $g(x)$. If I want to know how much this function stretches intervals, well, when $g$ is applied to $x$ they are stretched by a factor of $g'(x)$. Then when $f$ is applied to $g(x)$ they are stretched by a factor of $f'(g(x))$. (Note it is clear why you evaluate $f'$ at $g(x)$: that is the number to which $f$ got applied.) So you stretched first by a factor of $g'(x)$ and then by a factor of $f'(g(x))$; net effect, $f'(g(x))\cdot g'(x)$, just like the formula says.

(As an aside, for the sake of being thematic, note the role here of the fact that the multiplication comes from the composition of the two stretches – multiplication is function composition. When I say “the derivative is functorial” what I really mean is that it turns composition of functions into composition of stretches.)
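The two stretch factors can be checked numerically on a concrete composite. The functions here are my choices, not the post’s: $g(x)=x^2$ followed by $f(u)=\sin u$, near $x=3$.

```python
import math

# g stretches near x = 3 by ~g'(3) = 6; f stretches near g(3) = 9 by
# ~f'(9) = cos(9). The composite f(g(x)) should stretch by the product.
def g(x):
    return x * x

def f(u):
    return math.sin(u)

x0, h = 3.0, 1e-6
stretch_fg = (f(g(x0 + h)) - f(g(x0))) / h   # numerical derivative of f(g(x))
assert abs(stretch_fg - math.cos(9.0) * 6.0) < 1e-3
```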

Calculus: The intermediate value theorem is $f(\text{connected})=\text{connected}$. The extreme value theorem is $f(\text{compact})=\text{compact}$.

This is a good example of what I was talking about at the beginning about the twinkle in my eye, and connections between ideas. When I used to teach AP calculus, the extreme value theorem and the intermediate value theorem were things I had trouble connecting to the rest of the curriculum. They were these miscellaneous, intuitively obvious factoids about continuous functions that were stuck into the course in awkward places. They both had the same clunky hypothesis, “if $f$ is a function that is continuous on a closed interval $[a,b]$…” I didn’t do much with them, because I didn’t care about them.

I started to see a bigger picture about three years ago, in a course for calculus teachers taught by the irrepressible Larry Zimmerman. He referred to that clunky hypothesis as something to the effect of “a lilting refrain calling like a siren song.” I was also left with the image of a golden thread weaving through the fabric of calculus but I’m not sure if he said that. The point is, he made a big deal about that hypothesis, making me notice how thematic it is.

Last year when I taught a course on algebra and analysis, having benefited from this education, I made these theorems important goals of the course. But something further clicked into place this fall, when I started to need to draw on point-set topology knowledge as I studied differential geometry. Two fundamental concepts in topology are compactness and connectedness. They have technical definitions for which you can follow the links. Intuitively, connectedness is what it sounds like (all one piece), and compactness means (very loosely) that a set “ends, and reaches everywhere it heads toward.” (A closed interval is compact. The whole real line is not compact because it doesn’t end. An open interval is not compact because it wants to include its endpoints but it doesn’t. A professor of mine described compactness as, “everything that should happen [in the set] does happen.”)

Two basic theorems of point-set topology are that under a continuous mapping, the image of any connected set is connected and the image of any compact set is compact. These theorems are very general: they are true in the setting of any map between any two topological spaces. (They could be multidimensional, curved or twisted, or even more exotic…) What I realized is that the intermediate value theorem is just the theorem about connectedness specialized to the real line, and the extreme value theorem is just the theorem about compactness. What is a compact, connected subset of $\mathbb{R}$? It is precisely a closed interval. Under a continuous function, the image must therefore be compact and connected. Therefore, it must attain a maximum and minimum, because if not, the image either “doesn’t end” or “doesn’t reach its ends,” either of which would make it noncompact. And, for any two values hit by the image, it must hit every value between them; any missing value would disconnect it. So, “if $f$ is a function that is continuous on a closed interval $[a,b]$…”

Creating Balance III / Miscellany Saturday, Oct 23 2010

The Creating Balance in an Unjust World conference is back! I went a year and a half ago and it was awesome. Math education and social justice, what more could you want?

If you’re in NYC and you’re around this weekend, it’s happening right now! I’m going to try to make it to Session 3 this afternoon. It’s at Long Island University, corner of Flatbush and DeKalb in Brooklyn, right off the DeKalb stop on the Q train. I heard from one of the organizers that you can show up and register at the conference. I’m not 100% sure how that works given that it’s already begun, but I am sure you can still go.

* * * * *

I’ve just had a very intense week.

I want to get some thoughts down. I’m going to try very hard to resist my natural inclinations to a) try to work them into an overall narrative, and b) take forever doing it. Let’s see how I do.

(Ed. note: apparently not very well.)

I. Last spring I wrote

20*20 is 400; how does taking away 2 from one of the factors and 3 from the other affect the product? We get kids thinking hard about this and it would support the most contrivance-free explanation for why (neg)(neg)=(pos) that I have ever seen.

Without going into contextual details, let me just say that if you try to use this to actually develop the multiplication rules in a 1-hour lesson, all that will happen is that you will be dragging kids through the biggest, clunkiest, hardest-to-swallow, easiest-to-lose-the-forest-for-the-trees, totally-mathematically-correct-but-come-now model for signed number multiplication that you have ever seen (and this includes the hot and cold cubes). This idea makes sense for building intuition about signed numbers slowly, before they’re an actual object of study. It does not make any sense at all for teaching a one-off lesson explicitly about them. (Yes, the hard way. I totally knew this five months ago – what was I thinking?)
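For what it’s worth, here is the arithmetic the model rests on (my sketch, not the lesson’s):

```python
# (20 - 2) * (20 - 3) expands to 20*20 - 3*20 - 2*20 + (-2)*(-3);
# for the numbers to come out right, the last term has to be +6.
assert (20 - 2) * (20 - 3) == 306
assert 20 * 20 - 3 * 20 - 2 * 20 + 6 == 306
```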

II. I gave a workshop Wednesday night, for about 35 experienced teachers, entitled “Why Linear Algebra Is Awesome.” The idea was to reinterpret the Fibonacci recurrence as a linear transformation and use linear algebra to get a closed form for the Fibonacci numbers. Again, without going into details –

I gave a problem set to make participants notice that the transformation we were working with was linear. I used those PCMI-style tricks like giving two problems in a row that have the same answer for a mathematically significant reason. This worked totally well. Here is the problem set:

Oops I guess I failed to avoid going into details. Anyway, the question was about how to follow this up. I went over 1-4 with everyone (actually, I had individual participants come up to the front for #3 and 4) at which point the only thing I really needed out of this – the linearity of the transformation – had been noticed by pretty much the whole room. One participant had gotten to #9 where you prove it, and I had her go over her proof.

I think this was valueless for the group as a whole. The proof was just a straight computation. You kind of have to do it yourself to feel it at all. It was such a striking difference watching people work on the problem set and have all these lightbulbs go off, vs. listening to somebody prove the thing they’d noticed. It almost seemed like people didn’t see the connection between what they’d noticed and what just got proved. I told them to take 5 minutes and discuss this connection with their table, but I got the feeling that this instruction was actually further disorienting for some participants.

I’m trying to put the experience into language so I get the lesson from it.

It’s like, there was something uninspired and disconnected about watching somebody formally prove the result, and then afterward trying to find the connection between the proof and the observation. Now that I write this down, clearly that was backward. If I wanted the proof (which was really just a boring calculation) to mean anything, especially if I wanted it to be at all engaging to watch somebody else do the proof, we needed to be in suspense about whether the result was true; either because we legitimately weren’t sure, or because we were pretty sure but a lot was riding on it.

This is adding up to: next time I do it, feel no need to prove the linearity. Let them observe it from the problem set and articulate it, but if there is no sense of uncertainty about it, this is enough. Later in the workshop, when we use it to derive a closed form for the Fibonacci numbers, now a lot is riding on it. If it feels right, we could take that moment to make sure it’s true.
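The post doesn’t include the problem set, but the standard construction the workshop was built around can be sketched as follows (the code is mine, not the workshop materials): the Fibonacci step $(F_n, F_{n-1}) \mapsto (F_{n+1}, F_n)$ is multiplication by the matrix $M = \begin{pmatrix}1&1\\1&0\end{pmatrix}$, and diagonalizing $M$ (eigenvalues $\varphi$ and $1-\varphi$) yields Binet’s closed form.

```python
import math

# The step (F(n), F(n-1)) -> (F(n+1), F(n)) is a linear map; iterating
# it generates the sequence.
def fib_matrix(n):
    a, b = 1, 0                    # (F(1), F(0))
    for _ in range(n):
        a, b = a + b, a            # one application of M = [[1,1],[1,0]]
    return b

# Binet's closed form, from the eigenvalues phi and 1 - phi of M.
PHI = (1 + math.sqrt(5)) / 2

def fib_binet(n):
    return round((PHI ** n - (1 - PHI) ** n) / math.sqrt(5))

assert [fib_matrix(n) for n in range(8)] == [0, 1, 1, 2, 3, 5, 8, 13]
assert all(fib_matrix(n) == fib_binet(n) for n in range(30))
```

Everything in the derivation of `fib_binet` rides on the linearity of the map, which is exactly why its truth needs to feel at stake before the proof is worth watching.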

III. As I work on my teacher class, something that’s impressing itself upon me for the first time is that definitions are just as important as proofs. What I mean by this is two things:

a) It makes sense to put a real lot of thought into motivating a course’s key definitions,

and maybe even more importantly,

b) Students of math need practice in creating definitions. You know I think that creating proofs is an underdeveloped skill for most students of math; it strikes me that creating definitions might be even more underdeveloped.

Definitions are one of the most overtly creative products of mathematical work, but they also solve problems. Not in quite the same sense that theorems do – they don’t answer precisely stated questions. But they answer an important question nonetheless – what do we really mean? And to really test a definition, you have to try to prove theorems with it. If it helps you prove theorems, and if the picture that emerges when you prove them matches the image you had when you started trying to make the definition, then it is a “good” definition. (This became clear to me by reading Stephen Maurer’s totally entertaining 1980 article The King Chicken Theorems.)

Anyway this adds up to an activity to put students through that I’ve never explicitly thought about before, but now find myself building up to with my teacher class:

a) Pose a definitional problem. Do a lot of work to make the class understand that we have an important idea at hand for which we lack a good definition.

b) Make them try to create a definition.

c) If they come up with something at all workable, have them try to use it to prove something they already believe true. I’ve often talked in the past about how trying to prove something you already believe true is very difficult, and that will be a problem here. However, unlike in the cases I had in mind (e.g. a typical Geometry “proof exercise”), this situation has the necessary element of suspense: does our definition work?

If they don’t come up with something workable, maybe give them a not entirely precise definition to try out.

d) Refine the definition based on the experience trying to use it to prove something.

I’ll let you know how it goes. I’m excited about it because it mirrors the process that advances mathematics as a discipline. But I expect to have a much better sense of its usefulness once I’ve given it an honest whirl.

The History of Algebra, part II: Unsophisticated vs. Sophisticated Tools Friday, Jul 9 2010

Math ed bloggers love Star Wars. This post is extremely long, and involves a fair amount of math, so in the hopes of keeping you reading, I promise a Star Wars reference toward the end. Also, you can still get the point if you skip the math, though that would be sad.

The historical research project I gave myself this spring in order to prep my group theory class (which is over now – why am I still at it?) has had me working slowly through two more watershed documents in the history of math:

Disquisitiones Arithmeticae
by Carl Friedrich Gauss
(in particular, “Section VII: Equations Defining Sections of a Circle”)

and

Mémoire sur les conditions de résolubilité des équations par radicaux
by Evariste Galois

I’m not done with either, but already I’ve been struck with something I wanted to share. Mainly it’s just some cool math, but there’s a pedagogically relevant idea in here too -

Take-home lesson: The first time a problem is solved the solution uses only simple, pre-existing ideas. The arguments and solution methods are ugly and specific. Only later do new, more difficult ideas get applied, which allow the arguments and solution methods to become elegant and general.

The ugliness and specificity of the arguments and solution methods, and the desire to clean them up and generalize them, are thus a natural motivation for the new ideas.

This is just one historical object lesson in why “build the machinery, then apply it” is a pedagogically unnatural order. Professors delight in using the heavy artillery of modern math to give three-sentence proofs of theorems once considered difficult. (I’ve recently taken courses in algebra, topology, and complex analysis, with three different professors, and deep into each course, the professor gleefully showcased the power of the tools we’d developed by tossing off a quick proof of the fundamental theorem of algebra.) Now, this is a very fun thing to do. But if the goal is to make math accessible, then this is not the natural order.

The natural order is to try to answer a question first. Maybe we answer it, maybe we don’t. But the desire for and the development of the new machinery come most naturally from direct, hands-on experience with the limitations of the old machinery. And that means using it to try to answer questions.

I’m not saying anything new here. But I just want to show you a really striking example from Gauss. (Didn’t you always want to see some original Gauss? No? Okay, well…)

* * * * *

I am reading a 1966 translation of the Disquisitiones by Arthur A. Clarke which I have from the library. An original Latin copy is online here. I don’t read Latin but maybe you do.

I’m focusing on the last section in the book, but at one point Gauss makes use of a result he proved much earlier:

Article 42. If the coefficients $A, B, C, \dots, N; a, b, c, \dots, n$ of two functions of the form

$P=x^m+Ax^{m-1}+Bx^{m-2}+Cx^{m-3}+\dots+N$

$Q=x^{\mu}+ax^{\mu-1}+bx^{\mu-2}+cx^{\mu-3}+\dots+n$

are all rational and not all integers, and if the product of $P$ and $Q$ is

$x^{m+\mu}+\mathfrak{A}x^{m+\mu-1}+\mathfrak{B}x^{m+\mu-2}+etc.+\mathfrak{Z}$

then not all the coefficients $\mathfrak{A}, \mathfrak{B}, \dots, \mathfrak{Z}$ can be integers.

Note that even the statement of Gauss’ proposition here would be cleaned up by modern language. Gauss doesn’t even have the word “polynomial.” The word “monic” (i.e., leading coefficient 1) would also have been handy. In modern language he could have said, “The product of two rational monic polynomials is not an integer polynomial if any of their coefficients are not integers.”

But this is not the most dramatic difference between Gauss’ statement (and proof – just give me a sec) and the “modern version.” On page 400 of Michael Artin’s Algebra textbook (which I can’t stop talking about only because it is where I learned like everything I know), we find:

(3.3) Theorem. Gauss’s Lemma: A product of primitive polynomials in $\mathbb{Z}[x]$ is primitive.

The sense in which this lemma is Gauss’s is precisely the sense in which it is really talking about the contents of Article 42 from Disquisitiones which I quoted above.

Huh?

First of all, what’s $\mathbb{Z}[x]$? Secondly, what’s a primitive polynomial? Third and most important, what does this have to do with the above? Clearly they both have something to do with multiplying polynomials, but…

Okay. $\mathbb{Z}[x]$ is just the name for the set of polynomials with integer coefficients. (Apologies to those of you who know this already.) So a polynomial in $\mathbb{Z}[x]$ is really just a polynomial with integer coefficients. This notation was developed long after Gauss.

More substantively, a “primitive polynomial” is an integer polynomial whose coefficients have gcd equal to 1. I.e. a polynomial from which you can’t factor out a nontrivial integer factor. E.g. $4x^2+4x+1$ is primitive, but $4x^2+4x+2$ is not because you can take out a 2. This idea is from after Gauss as well.

So, “Gauss’s Lemma” is saying that if you multiply two polynomials each of whose coefficients do not have a common factor, you will not get a common factor among all the coefficients in the product.
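A single instance of the modern statement is easy to check in a few lines of code (a spot check, of course, not a proof). One factor is the post’s primitive example $4x^2+4x+1$; the other is a primitive polynomial of my choosing.

```python
from functools import reduce
from math import gcd

def content(coeffs):
    """gcd of the coefficients -- the 'content' of an integer polynomial."""
    return reduce(gcd, coeffs)

def poly_mul(p, q):
    """Multiply two coefficient lists (constant term first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

p = [1, 4, 4]   # 4x^2 + 4x + 1, the post's primitive example
q = [3, 2]      # 2x + 3, another primitive polynomial (my choice)
assert content(p) == 1 and content(q) == 1
assert content(poly_mul(p, q)) == 1   # the product is primitive too
```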

What does this have to do with the result Gauss actually stated?

That’s an exercise for you, if you feel like it. (Me too actually. I feel confident that the result Artin states has Gauss’s actual result as a consequence; less sure of the converse. What do you think?) (Hint, if you want: take Gauss’s monic, rational polynomials and clear fractions by multiplying each by the lcm of the denominators of its coefficients. In this way replace his original polynomials with integer polynomials. Will they be primitive?)

Meanwhile, what I really wanted to show you are the two proofs. Original proof: ugly, long, specific, but containing only elementary ideas. Modern proof: cute, elegant, general, but involving more advanced ideas.

Here is a very close paraphrase of Gauss’ original proof of his original claim. Remember, $P$ and $Q$ are monic polynomials with rational coefficients, not all of which are integers, and the goal is to prove that $PQ$’s coefficients are not all integers.

Demonstration. Put all the coefficients of $P$ and $Q$ in lowest terms. At least one coefficient is a noninteger; say without loss of generality that it is in $P$. (If not, just switch the roles of $P$ and $Q$.) This coefficient is a fraction with a denominator divisible by some prime, say $p$. Among all the terms in $P$, find the one whose coefficient’s denominator is divisible by the highest power of $p$. If there is more than one such term, pick the one with the highest degree. Call it $Gx^g$, and let the highest power of $p$ that divides the denominator of $G$ be $p^t$. ($t \geq 1$ since $p$ was chosen to divide the denominator of some coefficient in $P$ at least once.) The key fact about the choice of $Gx^g$ is, in Gauss’s words, that its “denominator involves higher powers of $p$ than the denominators of all fractional coefficients that precede it, and no lower powers than the denominators of all succeeding fractional coefficients.”

Gauss now divides $Q$ by $p$ to guarantee that at least one term in it (at the very least, the leading term) has a fractional coefficient with a denominator divisible by $p$, so that he can play the same game and choose the term $\Gamma x^{\gamma}$ of $Q/p$ with $\Gamma$ having a denominator divisible by $p$ more times than any preceding fractional coefficient and at least as many times as each subsequent coefficient. Let the highest power of $p$ dividing the denominator of $\Gamma$ be $p^{\tau}$. (Having divided the whole of $Q$ by $p$ guarantees that $\tau \geq 1$, just like $t$.)

I’ll quote Gauss word-for-word for the next step:

“Let those terms in $P$ which precede $Gx^g$ be $'Gx^{g+1}$, $''Gx^{g+2}$, etc. and those which follow be $G'x^{g-1}$, $G''x^{g-2}$, etc.; in like manner the terms which precede $\Gamma x^{\gamma}$ will be $'\Gamma x^{\gamma+1}$, $''\Gamma x^{\gamma+2}$, etc. and the terms which follow will be $\Gamma'x^{\gamma-1}$, $\Gamma''x^{\gamma-2}$, etc. It is clear that in the product of $P$ and $Q/p$ the coefficient of the term $x^{g+\gamma}$ will

$= G\Gamma + 'G\Gamma' + ''G\Gamma'' + etc.$

$+ '\Gamma G' + ''\Gamma G'' + etc.$

“The term $G\Gamma$ will be a fraction, and if it is expressed in lowest terms, it will involve $t+\tau$ powers of $p$ in the denominator. If any of the other terms is a fraction, lower powers of $p$ will appear in the denominators because each of them will be the product of two factors, one of them involving no more than $t$ powers of $p$, the other involving fewer than $\tau$ such powers; or one of them involving no more than $\tau$ powers of $p$, the other involving fewer than $t$ such powers. Thus $G\Gamma$ will be of the form $e/(fp^{t+\tau})$, the others of the form $e'/(f'p^{t+\tau-\delta})$ where $\delta$ is positive and $e, f, f'$ are free of the factor $p$, and the sum will

$=\frac{ef'+e'fp^{\delta}}{ff'p^{t+\tau}}$

The numerator is not divisible by $p$ and so there is no reduction that can produce powers of $p$ lower than $t+\tau$.”

(This is on pp. 25-6 of the Clarke translation.)

This argument guarantees that the coefficient of $x^{g+\gamma}$ in $PQ/p$, expressed in lowest terms, has a denominator divisible by $p^{t+\tau}$. Thus the coefficient of the same term in $PQ$ has a denominator divisible by $p^{t+\tau-1}$. Since $t$ and $\tau$ are each at least 1, this means the denominator of this term is divisible by $p$ at least once, and so the coefficient is a fraction. Q.E.D.
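Gauss’s bookkeeping with fractions and divisibility can be made concrete in a few lines. Here is a small sketch (the example polynomials are my own, and `p_val_denom` and `poly_mul` are throwaway helpers, not from any library) that multiplies two monic rational polynomials and checks how many times $p$ divides the denominators of the product’s coefficients:

```python
from fractions import Fraction

def p_val_denom(frac, p):
    """Exponent of p in the denominator of frac, in lowest terms."""
    d, v = Fraction(frac).denominator, 0
    while d % p == 0:
        d //= p
        v += 1
    return v

def poly_mul(P, Q):
    """Multiply polynomials given as coefficient lists, constant term first."""
    R = [Fraction(0)] * (len(P) + len(Q) - 1)
    for i, a in enumerate(P):
        for j, b in enumerate(Q):
            R[i + j] += a * b
    return R

p = 3
P = [Fraction(1), Fraction(1, 3), Fraction(1)]  # 1 + (1/3)x + x^2: a noninteger coefficient with denominator divisible by 3
Q = [Fraction(1, 2), Fraction(1)]               # 1/2 + x: denominators coprime to 3

PQ = poly_mul(P, Q)
print([str(c) for c in PQ])                # the coefficients of PQ
print(max(p_val_denom(c, p) for c in PQ))  # at least 1, as the proof guarantees
```

Running this on other examples of your own choosing is a good way to watch the term $Gx^g$ do its work.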

Like I said – nasty, right? But the concepts involved are just fractions and divisibility. Compare a modern proof of “Gauss’ Lemma” (the statement I quoted above from Artin – a product of primitive integer polynomials is primitive).

Proof. Let the polynomials be $P$ and $Q$. Pick any prime number $p$, and reduce everything mod $p$. $P$ and $Q$ are primitive so they each have at least one coefficient not divisible by $p$. Thus $P \not\equiv 0 \mod{p}$ and $Q \not\equiv 0 \mod{p}$. By looking at the leading terms of $P$ and $Q$ mod $p$ we see that the product $PQ$ must be nonzero mod $p$ as well. This implies that $PQ$ contains at least one coefficient not divisible by $p$. Since this argument works for any prime $p$, it follows that there is no prime dividing every coefficient in $PQ$, which means that it is primitive. Q.E.D.[1]

Clean and quick. If you’re familiar with the concepts involved, it’s way easier to follow than Gauss’s original. But, you have to first digest a) the idea of reducing everything mod $p$; b) the fact that this operation is compatible with all the normal polynomial operations; and c) the crucial fact that because $p$ is prime, the product of two coefficients that are not $\equiv 0 \mod{p}$ will also be nonzero mod $p$.
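If you want to see facts a)–c) in action on a small example, here is a quick numerical sketch (the polynomials $2+3x$ and $5+7x$ are my own made-up example, not from Artin):

```python
from math import gcd
from functools import reduce

def content(poly):
    """gcd of the coefficients; a polynomial is primitive iff this is 1."""
    return reduce(gcd, poly)

def poly_mul(P, Q):
    """Multiply polynomials given as coefficient lists, constant term first."""
    R = [0] * (len(P) + len(Q) - 1)
    for i, a in enumerate(P):
        for j, b in enumerate(Q):
            R[i + j] += a * b
    return R

P = [2, 3]  # 2 + 3x, primitive: gcd(2, 3) = 1
Q = [5, 7]  # 5 + 7x, primitive: gcd(5, 7) = 1
PQ = poly_mul(P, Q)

# Fact c) in action: mod each prime p, P and Q stay nonzero, so PQ does too.
for p in (2, 3, 5, 7):
    assert any(c % p != 0 for c in PQ)

print(content(PQ))  # 1, i.e. PQ is primitive
```

The loop over a few primes is only illustrative; the single `content` check is what actually certifies primitivity, since no prime can divide a set of coefficients whose gcd is 1.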

Now Gauss actually had access to all of these ideas. In fact it was in the Disquisitiones Arithmeticae itself that the world was introduced to the notation “$a \equiv b \mod{p}$.” So in a way it’s even more striking that he didn’t think to use them here when they would have cleaned up so much.

What bugged me out and made me excited to share this with you was the realization that these two proofs are essentially the same proof.

What?

I’m not gonna spell it out, because what’s the fun in that? But here’s a hint: that term $Gx^g$ that Gauss singled out in his polynomial $P$? Think about what would happen to that term (in comparison with all the terms before it) if you a) multiplied the whole polynomial by the lcm of the denominators to clear out all the fractions and yield a primitive integer polynomial, and then b) reduced everything mod $p$.

(If you are into this sort of thing, I found it to be an awesome exercise, that gave me a much deeper understanding of both proofs, to flesh out the equivalence, so I recommend that.)

* * * * *

What’s the pedagogical big picture here?

I see this as a case study in the value of approaching a problem with unsophisticated tools before learning sophisticated tools for it. To begin with, this historical anecdote seems to indicate that this is the natural flow. I mean, everybody always says Gauss was the greatest mathematician of all time, and even he didn’t think to use reduction mod $p$ on this problem, even though he was developing this tool on the surrounding pages of the very same book.

In more detail, why is this more pedagogically natural than “build the (sophisticated) machine, then apply it”?

First of all, the machine is inscrutable before it is applied. Think about being handed all the tiny parts of a sophisticated robot, along with assembly instructions, but given no sense of how the whole thing is supposed to function once it’s put together. And then trying to follow the instructions. This is what it’s like to learn sophisticated math ideas machinery-first, application-later. I felt this way this spring in learning the idea of Brouwer degree in my topology class. Now that everything is put together, I have a strong desire to go back to the beginning and do the whole thing again knowing what the end goal is. The ideas felt so airy and insubstantial the first time through. I never felt grounded.

Secondly, the quick solution that is powered by the sophisticated tools loses something if it’s not coupled with some experience working on the same problem with less sophisticated tools. The aesthetic delight that professors take in the short and elegant solution of the erstwhile-difficult problem comes from an intimacy with this difficulty that the student skips if she just learns the power tools and then zaps it. Likewise, if the goal is to gain insight into the problem, the short, turbo-powered solution often feels very illuminating to someone (say, the professor) who knows the long, ugly solution, but like pure magic, and therefore not illuminating at all, to someone (say, a student) who doesn’t know any other way. There is something tenuous and spindly about knowing a high-powered solution only.

Here I can cite my own experience with Gauss’s Lemma, the subject of this post. I remember reading the proof in Artin a year ago and being satisfied at the time, but I also remember being unable to recall this proof (even though it’s so simple! maybe because it’s so simple!) several months later. You read it, it works, it’s like poof! done! It’s almost like a sharp thin needle that passes right through your brain without leaving any effect. (Eeew… sorry that was gross.) The process of working through Gauss’ original proof, and then working through how the proofs are so closely related, has made my understanding of Artin’s proof far deeper and my appreciation of its elegance far stronger. Before, all I saw was a cute short argument that made something true. I now see in it the mess that it is elegantly cleaning up.

I’ve had a different form of the same experience as I fight my way through Galois’ paper. (I am working through the translation found in Appendix I of Harold Edwards’ 1984 book Galois Theory. This is a great way to do it because if at any point you are totally lost about what Galois means, you can usually dig through the book and find out what Edwards thinks he means.) I previously learned a modern treatment of Galois theory (essentially the one found in Nathan Jacobson’s Basic Algebra I – what a ridiculous title from the point of view of a high school teacher!). When I learned it, I “followed” everything but I knew my understanding was not where I wanted it to be. Here the words “spindly” and “tenuous” come to mind again. The arguments were built one on top of another till I was looking at a tall machine with a lot of firepower at the very top but supported by a series of moving parts I didn’t have a lot of faith in.

An easy mark for Ewoks, and I knew it.

This version of Galois theory was all based on concepts like fields, automorphisms, vector spaces, separable and normal extensions, of which Galois himself had access to none. The process of fighting through Galois’ original development of his theory and trying to understand how it is related to what I learned before has been slowly filling out and reinforcing the lower regions of this structure for me. Coupling the sophisticated with the less sophisticated approach has given the entire edifice some solidity.

Thirdly, and this is what I feel like I hear folks (Shawn Cornally, Dan Meyer, Alison Blank, etc.) talk about a lot, but it bears repeating, is this:

If you attack a problem with the tools you have, and either you can’t solve it, or you can solve it but your solution is messy and ugly, like Gauss’s solution above (if I may), then you have a reason to want better tools. Furthermore, the way in which your tools are failing you, or in which they are being inefficient, may be a hint to you for how the better tools need to look.

Just as an example, think about how awesome reduction mod $p$ is going to seem if you are already fighting (as Gauss did) with a whole bunch of adding stuff up some of which is divisible by $p$ and some of which is not. What if you could treat everything divisible by $p$ as zero and then summarily forget about it? How convenient would that be?

I want to bring this back to the K-12 level so let me give one other illustration. A major goal of 7th-9th grade math education in NY (and elsewhere) is getting kids to be able to solve all manner of single-variable linear equations. The basic tool here is “doing the same thing to both sides.” (As in, dividing both sides of the equation by 2, or subtracting 2x from both sides…) For the kids this is a powerful and sophisticated tool, one that takes a lot of work to fully understand, because it involves the extremely deep idea that you can change an equation without changing the information it is giving you.

There is no reason to bring out this tool in order to have the kiddies solve $x+7=10$. It’s even unnatural for solving $4x-21=55$. Both of these problems are susceptible to much less abstract methods, such as “working backwards.” The “both sides” tool is not naturally motivated until the variable appears on both sides of the equation. I used to let students solve $4x-21=55$ whatever way made sense to them, but then try to impose on them the understanding that what they had “really done” was to add 21 to both sides and then divide both sides by 4, so that later when I gave them equations with variables on both sides, they’d be ready. This was weak because I was working against the natural pedagogical flow. They didn’t need the new tool yet because I hadn’t given them problems that brought them to the limitations of the old tool. Instead, I just tried to force them to reimagine what they’d already been doing in a way that felt unnatural to them. Please, if a student answers your question and can give you any mathematically sound reason, no matter how elementary, accept it! If you would like them to do something fancier, try to give them a problem that forces them to.
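To make the contrast concrete, here is the arithmetic of the two approaches side by side (the equation $5x-3=2x+18$ is my own made-up example of a “variable on both sides” problem):

```python
# "Working backwards" on 4x - 21 = 55: undo the operations in reverse order.
x = (55 + 21) / 4
print(x)  # 19.0
assert 4 * x - 21 == 55

# The "both sides" tool earns its keep once the variable is on both sides,
# e.g. 5x - 3 = 2x + 18: subtract 2x from both sides, add 3, divide by 3.
y = (18 + 3) / (5 - 2)
print(y)  # 7.0
assert 5 * y - 3 == 2 * y + 18
```

The point is visible even in this tiny sketch: the first equation never needs the “both sides” idea, while the second has no elementary “backwards” path at all.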

Basically this whole post adds up to an excuse to show you some cool historical math and a plea for due respect to be given to unsophisticated solutions. There is no rush to the big fancy general tools (except the rush imposed by our various dark overlords). They are learned better, and appreciated better, if students, teachers, mathematicians first get to try out the tools we already have on the problems the fancy tools will eventually help us answer. It worked for Gauss.

[1] This is the substance of the proof given in Artin but I actually edited it a bit to make it (hopefully) more accessible. Artin talks about the ring homomorphism $\mathbb{Z}[x] \longrightarrow \mathbb{F}_p[x]$ and the images of $P$ and $Q$ (he calls them $f$ and $g$) under this homomorphism.

I recently bumped into a beautiful quote from Hermann Weyl that I had read before (in Bob and Ellen Kaplan’s Out of the Labyrinth, p. 157) and forgotten. It is entirely germane.

Beautiful general concepts do not drop out of the sky. To begin with, there are definite, concrete problems, with all their undivided complexity, and these must be conquered by individuals relying on brute force. Only then come the axiomatizers and systematizers and conclude that instead of straining to break in the door and bloodying one’s hands one should have first constructed a magic key of such and such a shape and then the door would have opened quietly, as if by itself. But they can construct the key only because the successful breakthrough enables them to study the lock front and back, from the outside and from the inside. Before we can generalize, formalize and axiomatize there must be mathematical substance.

Half of a Problem on Negatives – Ideas for the Other Half? Friday, May 21 2010

I’m getting a chance to put all my lofty talk about negative numbers into practice. A teacher I work with is doing an introduction to negative numbers with her sixth graders in a week and a half. I’m helping her plan the unit and she’s asked me to teach the intro lesson. (She is extremely awesome, by the way.)

A fair number of kids in the class can already produce, for example, -7 as the answer if 1 is tripled and then 10 is subtracted. Some kids can’t though. I am looking to make the intro lesson rich enough for the kids who can already do this to remain interested, but my intention is to make no assumptions about what kids already know. I want to give them a problem that will make them interact with the idea of negativeness even if they’ve never heard of it.

One idea is what I described before: consider a problem like 20+10 and ask what is the effect on the answer of adding 3 to or subtracting 3 from 10; do the same with 20-10; highlight the effect of the “added 3” or the “subtracted 3.” I love this idea in principle but actually I think it doesn’t really develop fast enough for a single late-in-the-year lesson with 6th graders who already have some exposure to negatives. When I originally had it I was imagining 3rd and 4th graders, and developing the idea slowly, with little short exercises over the course of weeks, to slowly draw kids’ attention toward the “subtracted 3” as a worthwhile object.

So, another idea. Base the lesson around a problem that involves numbers going in two directions. Make the problem a little mathematically interesting. Don’t try to force the negative idea but make the problem impossible to solve without considering the directions of the numbers. I have an idea for the mathematical content of such a problem but I want to put it in a realistic context to allow kids to reason about it without knowledge of negatives, and I’m having trouble thinking of a context that doesn’t feel contrived. So this is a request for suggestions.

Here’s the mathematical content idea: Three target numbers (positive and negative) and a list of 8-10 numbers, in no particular order, that sort into piles adding up to each target. The puzzle is to do the sorting. For instance:
Targets: -12, -19, and 7
List: -6, -4, -9, -15, 4, 2, -5, 9
I am avoiding posting an answer (there is at least one; I haven’t checked whether there is definitely only one) so you can try out the problem if you want.
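If you want to check the puzzle by machine without spoiling it for yourself, a brute-force count of the sortings is easy to write (this is my own quick sketch; it reports whether a sorting exists, not what it is):

```python
from itertools import combinations

targets = [-12, -19, 7]
nums = [-6, -4, -9, -15, 4, 2, -5, 9]

def count_sortings(nums, targets):
    """Count the ways to split nums into piles summing to the given targets."""
    if not targets:
        return 1 if not nums else 0  # all piles used and nothing left over
    t, rest_targets = targets[0], targets[1:]
    total = 0
    # Try every subset of nums as the pile for the first target.
    for r in range(len(nums) + 1):
        for idx in combinations(range(len(nums)), r):
            if sum(nums[i] for i in idx) == t:
                rest = [n for i, n in enumerate(nums) if i not in idx]
                total += count_sortings(rest, rest_targets)
    return total

print(count_sortings(nums, targets) >= 1)  # True: a sorting exists
```

Printing the count itself would also settle the uniqueness question left open above, so run it only if you want that spoiled.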

I’ve been trying to compose a story so that (a) the positive and negative numbers have a concrete meaning accessible to someone who’s never heard of positives and negatives, and (b) there is some sort of reason why you’d know the target sums and the list but not know how the list sorts into piles, and yet you’d want to know. I have so far been stuck. I’ve been thinking about something like, 3 friends go gambling; A loses $12, B loses $19, C wins $7; for some reason (??) they know the individual transactions that took place (house collects $6, house collects $4, …, house collects $5, house pays out $9) but not who did which transaction, and for some even more far-fetched reason (????) they want to know. As you can see this idea isn’t really working.

So: can you help me out with this? What I like about the problem is that a) it’s got that sudoku-jigsaw-like challenge even for someone for whom the actual arithmetic is elementary; and b) if I can find the right context to allow kids to reason about the problem concretely even if they’ve never heard of negatives, then the problem elicits as an answer what amounts to a set of equations about negatives, without the kids needing to learn anything new beforehand.

Or, is the problem impossible to render in an un-contrived way? In which case, can you help me think of a less-contrived problem that accomplishes some of the same goals?

The History of Algebra, part I: Negative Numbers Saturday, May 1 2010

This is the post I promised over a month ago on two landmark books in the history of algebra:

Kitab al-Jabr wa-l-Muqabala, aka The Compendium on Calculating by Completion and Reduction

and

Ars Magna, aka The Great Art, or The Rules of Algebra
by Girolamo Cardano

A lot can be and has been said about these books. I’m going to zero in on one particular story they tell:

Take-home lesson #1: the mathematical world’s understanding of negative numbers came incredibly slowly, in very gradual stages. We tend to treat learning about negatives like there’s just one big idea to understand. Really, there are like twenty.

Reading these books has given me more respect than ever for the depth of the process we ask kids to go through between sixth and ninth grade as they get comfortable working with negatives.

Take-home lesson #2: In the process of understanding a new and difficult idea, the ability to understand and use the idea to answer a question comes way before the ability to pose a question about the idea. So, it makes sense to get very comfortable with -2 as the answer to 5-7 before ever asking yourself to add -2 to something.

Take-home lesson #3: The development of algebra is an important motivator, historically anyway, for the development of negatives.

Pedagogical idea: How can we use this historical motivation to develop negatives with students?
a) Al-Khwarizmi’s book contains a very limited idea of negativeness: that which has been subtracted. But since he is thinking about how to multiply, for example, an unknown with 2 subtracted by the same unknown with 3 subtracted, he needs to see that, once everything has been distributed, the product of the subtracted 2 and the subtracted 3 contributes an added 6 to the total. It is not immediately obvious how this becomes a classroom activity but I think it definitely can. 20*20 is 400; how does taking away 2 from one of the factors and 3 from the other affect the product? If we get kids thinking hard about this, it would support the most contrivance-free explanation for why (neg)(neg)=(pos) that I have ever seen.
b) Allowing the coefficients of equations to be negative significantly cleaned up the theory of equations. Our students know more about negatives than the inventors of algebra did. It might be really exciting and powerful, increasing their appreciation for both negatives and quadratics, to show or let them develop the original (negative-impaired) theory of quadratics, and then have them use negatives to clean it up.
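The arithmetic behind idea (a) can be written out in modern notation, with each of Al-Khwarizmi’s “four multiplications” appearing as a term; the product of the subtracted 2 and the subtracted 3 is the added 6:

$(20-2)(20-3) = 20 \cdot 20 - 3 \cdot 20 - 2 \cdot 20 + 2 \cdot 3 = 400 - 60 - 40 + 6 = 306$

which checks out, since $18 \cdot 17 = 306$.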

* * * * *

The first of these books was written in Arabic, and published around 820 in Baghdad. 820. Just to make sure you didn’t miss that. The translation I read is 180 years old. The full text of it is available online.

What I am even more anxious to make sure you didn’t miss are certain bits of the author’s name and the book title. The author is Muhammad Ibn Musa Al-Khwarizmi. (Muhammad, son of Musa, from Khwarizm.) He is often referred to just as Al-Khwarizmi. This is the origin of the English word “algorithm.”

And as if that weren’t awesome enough, the “completion” in the title is the Arabic word “al-jabr.” This is the origin of the English word “algebra.”

The second book was written in Latin and published in 1545 in Renaissance Italy. I read a 1968 translation by T. Richard Witmer. I can’t find it online, but in case you read Latin, here is a pdf of the original. Its distinction historically is that it was the first publication of a general method for solving what we would now call cubic and quartic equations. (Cardano attributes the solution for one class of cubics to Niccolo Tartaglia and Scipione del Ferro, the generalization of this solution to other classes of cubics to himself, and the solution of quartics to his student Lodovico Ferrari.)

As it happened, I overshot the historical mark a bit – for the purposes of the class, the mindframes of these two books are unnecessarily archaic. There is real pedagogical fertility here, but it’s around ideas that the participants in my class (who are teachers and mathematicians) already understand.

On the other hand, I’ve taught plenty of students who don’t understand them. In particular, I found myself surprised and intrigued by what each book did and did not say about negative numbers. I felt like I was watching this idea (the negative) coalesce and congeal, roughly and haltingly, over time. Like a churning mixture of hard crystal clarity and murky goo. If I may.

Though separated by 700 years, both books find it necessary to give three different quadratic formulas. Because, you see, you need a different method to solve
$x^{2}+10x=20$
than to solve
$x^{2}=10x+20$
or
$x^{2}+20=10x$.
(Actually, this notation is anachronistic. Neither author uses anything resembling modern notation. Muhammad Ibn Musa writes everything in prose. For the first of these equations, for instance, he would write, “A square and ten roots equal 20 dirhems.” Dirhems are an Arabic unit of currency.)

We think of there being only one quadratic formula because we are comfortable moving everything to the left; the only difference a modern reader can see between these equations is a difference in the signs of the coefficients:
$x^{2}+10x-20=0$
$x^{2}-10x+20=0$
etc. And all the equations can be solved exactly the same way. But for neither of these authors had the idea of negativeness grown adequately supple to make this possible.
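For the modern reader, a single formula handles all three forms; the only difference is in the signs that come along when everything is moved to the left. A quick sketch (just the standard quadratic formula, nothing more):

```python
from math import sqrt

def quadratic_roots(a, b, c):
    """Both roots of ax^2 + bx + c = 0 (assumes a nonnegative discriminant)."""
    disc = sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)

# The three "different" equations, each moved into the single modern form:
print(quadratic_roots(1, 10, -20))   # x^2 + 10x = 20
print(quadratic_roots(1, -10, -20))  # x^2 = 10x + 20
print(quadratic_roots(1, -10, 20))   # x^2 + 20 = 10x
```

The first call, for instance, returns $-5 \pm \sqrt{45}$, whose positive root matches what Al-Khwarizmi’s method for “a square and ten roots equal twenty dirhems” produces.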

Since Cardano presents the full algebraic solution to cubic equations, the situation is even more extreme in Ars Magna. Each of the following gets its own chapter:
“On the cube and first power equal to the number”
“On the cube equal to the first power and number”

“On the cube, first power, and number equal to the square”
“On the cube, square, and number equal to the first power”
These are the first two and last two in a sequence of 13 chapters. This is over 20% of the book. Not only does each equation type get its own method of solution, each method gets its own (geometric) proof.

Histories of mathematics often mention the situation I’ve described here. For example, Mactutor’s history of quadratic, cubic and quartic equations says something like “the different types arise because Al-Khwarizmi had no zero or negatives.” This is the story I’d gotten before I picked up the originals, and what I found out is that it’s not true.

Both books calculate comfortably with something translated as “negative numbers”. Ars Magna goes so far as to contain a calculation with imaginaries. But the scope of the idea of negativeness is limited, in a different way, in each book. And I think I learned something important about how people come to understand negative numbers by taking note of these limitations.

In Muhammad Ibn Musa’s work, a “negative” is a number that’s been subtracted from another number. That’s it; that’s all it is. But this is enough to justify all the rules of arithmetic with negatives that we teach middle schoolers, because Muhammad makes use of all of them:

If there are greater numbers combined with units to be added to or subtracted from them, then four multiplications are necessary; namely, the greater numbers by the greater numbers, the greater numbers by the units, the units by the greater numbers, and the units by the units.

He is talking about FOIL in case that wasn’t clear.

If the units, combined with the greater numbers, are positive, then the last multiplication is positive; if they are both negative, then the fourth multiplication is likewise positive. But if one of them is positive, and one negative, then the fourth multiplication is negative.

This is on pp. 21-22. Elsewhere, he fluently adds and subtracts these “negative” (i.e. subtracted) quantities. For example, on p. 27,

The root of two hundred, minus ten, subtracted from twenty minus the root of two hundred, is thirty minus twice the root of two hundred; twice the root of two hundred is the root of eight hundred.

My point is that Ibn Musa’s use of the idea of negativeness is so limited in scope that the word “negative” might even be sort of a mistranslation to a modern reader; however, this limited-scope idea fully supports all the rules of arithmetic we teach.
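It is easy to spot-check the p. 27 computation numerically (a throwaway sketch of my own, just translating the prose into symbols):

```python
from math import sqrt, isclose

# "The root of two hundred, minus ten, subtracted from twenty minus the root
# of two hundred, is thirty minus twice the root of two hundred..."
lhs = (20 - sqrt(200)) - (sqrt(200) - 10)
rhs = 30 - 2 * sqrt(200)
print(isclose(lhs, rhs))                  # True

# "...twice the root of two hundred is the root of eight hundred."
print(isclose(2 * sqrt(200), sqrt(800)))  # True
```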

Cardano’s understanding of negativeness is much broader. For example, in the first chapter of the book, he explicitly discusses the possibility that a negative number might satisfy an equation. But throughout, his dealings with negatives are marked by a kind of choppiness, an inconsistency. Firstly, he refers to negative solutions to equations as “false” or “fictitious” (as opposed to “true”). Then, once he gets into the nitty gritty of solving equations, he pretty much stops mentioning them entirely. For example in chapter 8 he says “it is evident that when the middle power is equal to the highest power and the constant, there are necessarily two solutions…” We would say there are three (1 negative), and Cardano would have acknowledged this third solution in chapter 1.

What Cardano virtually never does with negatives (the one exception is below) is treat them like they can be coefficients. Solutions, but not coefficients: i.e. a negative number can be the answer to a question, but negatives can’t be the language in which a question is posed. Most of the time, the idea of working with negative coefficients appears simply to not occur to him. On one occasion, the spectre is invoked only to be dismissed (for reasons that are opaque to me). Cardano is discussing positive and negative solutions to equations in which a power equals a certain number. (I.e. solvable by the simple extraction of one root.)

It is always presumed in this case, of course, that the number to which the power is equated is true and not fictitious. To doubt this would be as silly as to doubt the fundamental rule itself for, though opposite reasoning must be observed in opposite cases, the reasoning is still the same. p. 11

What??

The point that I am making is that if Cardano is any example, negatives are much easier to get your head around as an answer than as part of the question. Allowing coefficients to be negative would have caused a massive increase in the efficiency of the theory: as noted above, Cardano gave separate solutions for thirteen forms of cubic equations. With negative coefficients, these thirteen cases are reduced to 2: quadratic term is zero vs. nonzero. I don’t know when this cleaning-up of the theory actually historically took place. Avital Oliver, whom I mentioned in my last post, told me that noticing how much negative coefficients would simplify the theory of equations was a major reason, historically, that negative numbers gained acceptance as numbers. That makes sense to me.

The one moment in the book where the idea of a negative number is entertained as part of the statement of a problem is in the absolutely fascinating chapter 37, On the Rule for Postulating a Negative:

This rule is threefold, for one either assumes a negative, or seeks a negative square root, or seeks what is not. p. 217

Cardano is being highly speculative here. He seems to think maybe the entire chapter he’s writing is crazy talk. He begins by considering equations with negative solutions. Even though he already spent chapter 1 talking about negative solutions, he feels the need to justify them here. He notes that
$x^{2}=4x+32$
and
$x^{2}=x+20$
don’t appear to have a common solution, since 8 solves the first while 5 solves the second. However, the “turned-around” equations
$x^{2}+4x=32$
and
$x^{2}+x=20$
do have a common solution, namely 4. In chapter 1, Cardano asserted that a quadratic and its “turnaround” have opposite solutions: a “true” (positive) solution for one is a “fictitious” (negative) solution, equal in magnitude, for the other. So here, the original pair of equations have a common solution after all: -4. Cardano seems to feel (and I kind of relate) that the presence of the common positive solution between the turned-around equations and the formal relationship between the turned-around pair and the original pair means there ought to be a common solution for the original pair; the fact that this common solution turns out to exist if you allow negative solutions is then a reason to believe in negative solutions.
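A quick check of the arithmetic here (my own verification, not Cardano’s): -4 does solve both original equations, and 4 solves both turned-around ones.

```python
# -4 solves both equations in the original pair...
assert (-4) ** 2 == 4 * (-4) + 32  # x^2 = 4x + 32
assert (-4) ** 2 == (-4) + 20      # x^2 = x + 20

# ...and 4 solves both "turned-around" equations:
assert 4 ** 2 + 4 * 4 == 32        # x^2 + 4x = 32
assert 4 ** 2 + 4 == 20            # x^2 + x = 20

print("all four check out")
```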

Anyway, he follows with two problems about the property of a man named Francis. The problems are totally contrived but they lead to negative solutions for Francis’ property, which he interprets as meaning that Francis has debt. Tellingly, though, he sets up the equations letting -x be Francis’ property, so that the equations he actually solves have positive solutions.

Then, he poses a problem that has no positive or negative solution: divide 10 into two parts whose product is 40. He follows the procedure he uses on comparable problems with real solutions (e.g. divide 10 into two parts whose product is 21): “… it is clear that this case is impossible. Nevertheless, we will work thus:…” (p. 219). The procedure forces him to subtract 40 from 25 and then take the square root of this. He already seems dubious about the subtraction 25-40:

The square root of the remainder, then - if anything remains - added to or subtracted from [five] shows the parts. But since such a remainder is negative, you will have to imagine $\sqrt{-15}$. p. 219

Note the “if anything remains.” So this “square root of a negative” business is a bunch of new hooey built on something that might be hooey to begin with. In that context it almost feels like what we’d now call imaginaries (and what Cardano calls “the sophistic negative”) are only a comparatively small speculative step beyond the craziness of negative numbers in the first place. The whole chapter has this I-know-this-is-complete-madness-but-I’m-just-gonna-do-it tone. A famous passage:

... you will have that which you seek, namely $5+\sqrt{25-40}$ and $5-\sqrt{25-40}$, or $5+\sqrt{-15}$ and $5-\sqrt{-15}$. Putting aside the mental tortures involved, multiply $5+\sqrt{-15}$ and $5-\sqrt{-15}$, making 25 - (-15) which is +15. Hence this product is 40... So progresses arithmetic subtlety the end of which, as is said, is as refined as it is useless. p. 219-220

(As above, the notation here is anachronistic; but the translation I read modernized all Cardano’s notation for ease of reading.)
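Cardano’s “mental tortures” computation is something any complex-arithmetic library now does without complaint; here is a quick check (using Python’s cmath, my anachronistic choice):

```python
import cmath

a = 5 + cmath.sqrt(-15)  # 5 + sqrt(-15)
b = 5 - cmath.sqrt(-15)  # 5 - sqrt(-15)

print(a + b)  # the two parts sum to 10...
print(a * b)  # ...and their product is 40, i.e. 25 - (-15)
```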

It is in this wildly speculative chapter that Cardano – for the only time in the book – suggests a problem posed in terms of negatives:

... If it be said, Divide 6 into two parts the product of which is 40, the problem is one of the sophistic negative... But if it is said, Divide 6 into two parts the product of which is -40, or divide -6 into two parts producing -40, in either case the problem will be one of the pure negative... and the parts will be those that have been given [10 and -4, or -10 and 4]. If it be said, Divide -6 into two parts the product of which is +24, the problem will be one of the sophistic negative. pp. 220-221

What am I getting at with all this? Well I can’t tell you what to think but I am left with a completely new sense of the natural contours of learning about negatives.

I taught Algebra I for a long time. My students entered the class having trouble both conceptually and computationally with negative numbers. I did my duty and explained their meaning and operation, along with lots of practice for the kiddies, early in the year. Having always been concerned with understanding, I looked for models of negatives that would support all the operations I wanted kids doing. I wanted the model to instantiate as much of the mathematical structure as possible. The school I taught at had a woodshop program, and I got them to build me a board with a flat surface with holes cut in it and wooden pucks to fill the holes, so that I could physically model 1 + -1 = 0 and people would physically see how a hole combined with a wooden puck to make a flat surface. Subtraction of negatives would become removing holes, and this clearly required adding pucks to the surface; thus subtraction of negatives is adding positives. The model required another layer of contrivance to support multiplication: I had to ask students to imagine standing upside down, on the other side of the surface, so the holes became pucks and the pucks could be imagined as holes; then 3*-4 could be 3 people with the normal point of view, each standing by 4 holes, while -3*4 was 3 upside down people each standing by what appeared to them as 4 pucks.

It didn’t work as the centerpiece of teaching about positives and negatives. The multiplication problems make the contrivance really obvious, but actually there’s a certain amount of contrivance even in how it models addition. If I combine some pucks and some holes, who says that the pucks need to fall into the holes? I made kids draw tons of pictures of the whole thing, which completely wore them out, and I don’t know how much it added to their understanding. Meanwhile, the model, as all models do, made problems bigger, clunkier. Subtracting -5 from -7 was no thing: just fill 5 holes. But subtracting (-5) from 1 was like a whole production. The kids either needed to create 5 holes by removing pucks from them (and retaining the pucks – why would you do either thing?) before adding 5 new ones to fill the holes, or they needed to make the intensely abstract and not-adequately-justified leap that because subtracting -5 amounted to adding +5 when you were subtracting from a negative, the same thing should be true when subtracting from a positive. Retrospectively, the fact that I asked my kids to make this leap of faith and told myself that I was actually helping them understand how math makes sense is kind of embarrassing.

But the thing is, as models go, I’ll stand behind this one as one of the better ones. I’ve seen cuter models for multiplication, e.g. on the wall of the classroom of my first former student to become a math teacher (yes I am now old enough for that to happen):
Do you LOVE to LOVE? You’re a LOVER.
Do you LOVE to HATE? You’re a HATER.
Do you HATE to LOVE? You’re a HATER.
Do you HATE to HATE? You’re a LOVER.
But none of these cuter models supports addition or subtraction as well, and sometimes it’s hard to see that they are even related to multiplication. Meanwhile, the only model I’ve ever seen, besides mine, that supports all four operations is the IMP curriculum’s “hot and cold cubes.” And if you see the contrivance and unnaturalness in what I described above, “hot and cold cubes” is another level. Again, I think it’s kind of a brilliant model. But if you’ve ever tried to use it with low-skilled kids, you know how much production is involved in even getting them to imagine and buy into the scenario in the first place, let alone use all that machinery to solve problems.

It’s been a few years now that it’s seemed clear to me that the whole idea of teaching negatives through a particular model is not the way to go. People who use negatives effectively have gotten them down to a very slim abstract notion that supports all their operations and all their uses as representations of real things. (I would describe my own understanding with words like “opposite directionyness” – don’t laugh.) Teaching has to be aimed at this slim, efficient understanding as an end product. Forcing kids to engage with a whole clunky megilla of story and visual image every time they want to do a computation with negatives can’t possibly be the right path.

In more recent years I’ve found much more effective ways to teach negatives. I’ve been beginning by brainstorming with my students what negative numbers are actually, in the real world, used to represent. Not just debt, temperature and elevation. These aren’t enough. They capture the “below zeroiness” but not the “opposite directionyness,” since the positive direction is so fixed in each case. Also needed are examples of net change: gain or loss of money by a business; football yardage; etc. Furthermore, examples where negatives are used to specify direction in space or time: say uptown is positive; what would negative mean? What if east were positive? What if downtown were positive? If positive 3 means the space shuttle took off 3 seconds ago, what would -3 mean?

Using this conversation as groundwork has brought me much more success than the wooden board did, but there’s still something missing. It’s hard to find convincing examples familiar to kids that support multiplication, for one thing, except for the private tutoring student whose father was a stockbroker, because then short-selling a stock that goes down in price is (neg)(neg) = pos. But it’s more fundamental than that. I’ve still been starting from the question “what is a negative?” when the student’s only legitimate reason to believe negatives even exist is that school says so and her only legitimate reason to care is that she’ll be accountable for an answer.

This question puts the cart before the horse. A corollary of that amazing conversation with Avital Oliver I described last time is that when I teach a new idea I want to cause it to be needed, or at least cause its presence to be felt, cause students to become aware of it in the room with them, before it is ever named. So “what is a negative?” is not ultimately my desired opener for teaching about negatives.

What I’m left with after reading Cardano and Muhammad Ibn Musa is the beginning of an idea, modeled on the history of the concept itself, for what could take its place. So, here’s a curriculum brainstorm. It spans a lot of years and doesn’t fit in with anybody’s state frameworks, so I hope you’ll forgive the impracticality. I’m just fantasizing.

First, laying the groundwork (inspired by Ibn Musa): When you do arithmetic, how does subtracting something from the numbers affect the answer? How does 20 + 10 change if I subtract 3 from the 10? (To focus attention on the key point, what does the subtracted 3 do to the answer?) How does 20 – 10 change? How does 20*10 change? How does 20*10 change if I subtract 4 from the 20? How about if I subtract 4 from the 20 and 3 from the 10? What if I add 4 to the 20 and subtract 3 from the 10? The point is to engage the students in sorting out all these questions. (Why would they care about these questions? That’s a whole other thing but I don’t think a very hard one, and it will depend on the group of students – but I’m sure given any set of folks we can find a context to make these questions compelling.) Note that there is no “new kind of number” here. Some 3’s are subtracted, some added, that’s all. We very gently call their attention to the “subtracted 3” as an object worth talking about, but they already know what we mean; there’s nothing new to learn. I think this sorting-out is going to attune students’ antennae to the frequency in the universe on which negative numbers live.
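For anyone who wants the answers to those questions laid out (the numbers are the ones from the paragraph above), the arithmetic goes like this:

```python
# What does a "subtracted 3" (or "subtracted 4") do to each answer?
print(20 + 10, 20 + (10 - 3))   # 30 -> 27: the subtracted 3 pulls the answer down by 3
print(20 - 10, 20 - (10 - 3))   # 10 -> 13: now the subtracted 3 pushes the answer UP by 3
print(20 * 10, 20 * (10 - 3))   # 200 -> 140: the subtracted 3 removes three 20's
print(20 * 10, (20 - 4) * 10)   # 200 -> 160: the subtracted 4 removes four 10's
print((20 - 4) * (10 - 3))      # 112 = 200 - 60 - 40 + 12: the two subtractions partly undo each other
print((20 + 4) * (10 - 3))      # 168 = 200 - 60 + 40 - 12
```

The last two lines are where the students' sorting-out really pays off: the "subtracted 3" and "subtracted 4" interact, and tracking what each one does to the product is exactly the attunement the paragraph above is after.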

Much later, once negatives come into play, stay respectful of the fact that they make sense as answers more easily than they make sense as questions. What number could you add to 7 and get 4? (No number! Even if you add nothing, it’s still 7.) If you could add something, what would that thing be like? In other words, bring forth the idea of negativeness as the answer to questions. (Perhaps your earlier “subtracted 3” will be what they come up with; perhaps not.) Do a lot and a lot and a lot of this, before ever asking anybody anything about negatives.

Later still, it will be time to develop equation solving intently. The way we do this in Algebra now, we build in the necessity for the methods to generalize to negative coefficients. Instead, start it earlier and use Muhammad Ibn Musa-type problems. Let them develop techniques that feel most natural to them. (From lots of classroom experience, I can tell you that these will not be methods that generalize to negative coefficients.) Allow problems with negative solutions to creep in, but not negative coefficients. Negative numbers and their operations are becoming familiar, but still let the students do what’s comfortable in the realm of equation solving. Increase the sophistication of the equations; develop the solution of one of the three forms of the quadratic (what number can I multiply by itself, and then add 6 of itself, to get 40?). Pose problems in the other forms as well though. Finally, as a last act, lead them to the fact that allowing coefficients to be negative unifies all three cases of the quadratic into one and they can use a single method on all problems. How useful! Negatives are now official.
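To make the punchline concrete (the function name is mine; this is just the unified completing-the-square recipe written out):

```python
import math

def solve_monic_quadratic(b, c):
    """Real roots of x^2 + b*x + c = 0, assuming a nonnegative discriminant.

    Once b and c are allowed to be negative, the three medieval forms
    (x^2 + bx = c, x^2 + c = bx, x^2 = bx + c) all collapse into this
    one case, and one method handles everything.
    """
    d = math.sqrt(b * b - 4 * c)
    return ((-b + d) / 2, (-b - d) / 2)

# "What number can I multiply by itself, and then add 6 of itself, to get 40?"
# That's x^2 + 6x = 40, which becomes x^2 + 6x - 40 = 0:
print(solve_monic_quadratic(6, -40))  # (4.0, -10.0)
```

Note the second root: the unified method quietly hands back -10 alongside the 4, which is itself a nice occasion for the "negatives as answers" conversation.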

I would really love to do this with an out-of-school math circle of youngish kids or mathphobic adults. I need to get on that.

* * * * *

Two tidbits from these books that didn’t fit in with the main lines of thought above. There’s lots more where these came from but as usual I’ve already OD-ed so I have to draw the line somewhere.

a) Muhammad Ibn Musa gives a beautiful, though not rigorous, justification for the circle area formula that I’ve never seen before. He expresses the circle’s area as half its circumference times half its diameter. He explains that this is true because any regular polygon has an area equal to half its circumference times half the diameter of the inscribed circle. (Draw lines from the center to every vertex, and think about the areas of the triangles you get, to see that this is true.)
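A quick numerical way to watch his argument do its work (a sketch; the function name is mine): for a regular n-gon circumscribed about a circle of radius r, "half the perimeter times half the inscribed circle's diameter" gives the polygon's area exactly, and as n grows the polygon hugs its inscribed circle, so the value closes in on the circle's $\pi r^2$.

```python
import math

def regular_polygon_area(n, r):
    """Area of a regular n-gon whose inscribed circle has radius r:
    half the perimeter times the apothem (half the inscribed circle's diameter)."""
    perimeter = n * 2 * r * math.tan(math.pi / n)
    return 0.5 * perimeter * r

# As n grows, the area approaches pi * r^2:
for n in (4, 12, 96, 10000):
    print(n, regular_polygon_area(n, 1.0))
```

(The n = 96 row is a nod to Archimedes, who used a 96-gon to pin down $\pi$.)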

b) Cardano says something really darling about the solution of the cubic, that I just found delightful and have to share:

In our own days Scipione del Ferro of Bologna has solved the case of the cube and first power equal to a constant, a very elegant and admirable accomplishment. Since this art surpasses all human subtlety and the perspicuity of mortal talent and is a truly celestial gift and a very clear test of the capacity of men's minds, whoever applies himself to it will believe that there is nothing that he cannot understand. p. 8