|
|
|||
| My National Review Online "Diary" column for September 2002 had a note on a mathematical topic: Chebyshev's bias. I include the note below, as it appeared. The note ends with me saying: "Michael Rubinstein and Peter Sarnak proved in 1994 that the violations have nonzero density, a fascinating and counter-intuitive result..." Several readers felt this was a cliff-hanger. What does "nonzero density" mean? they wanted to know. So after the original piece, I've appended a note on logarithmic density. Original piece from my NRO September diary: From the cutting-room floor.
I have just lost a minor battle with the editors of my book about
math. I wanted to add a
6-page appendix on Chebyshev’s Bias.
They: “No!
The darn book is already too long!
No! No!
NO!!!” OK, fine. Chebyshev’s
Bias deserves to be much better known than it is, though, so to get the
word out, I’m going to blog it, right here.
This is absolutely the only conservative web site where you
get serious math. Write down the first few
prime numbers: 2
3 5
7 11 13
17 19
23 29 31
37 41
43 47 53
59 ... Divide each one by 4 and
note the remainder: 2
3 1
3 3 1
1 3
3 1 3
1 1
3 3 1
3 ... Once you get past p
= 2, the remainder must be either 1 or 3.
Which one is "ahead" at any point? Denoting the answer by 1, 3, or T (for "tie"), the
answer is: T
3 T
3 3 3
T 3
3 3 3
3 T
3 3 3
3 ... (That "T" in the seventh position, for example, means that up to that point, there were three remainder-1 primes and three remainder-3 primes — three of each — a tie. The "3" in the eighth position means that at this point — the eighth prime being 19, which is a remainder-3 — the remainder-3's have edged ahead.) That's a Chebyshev bias.
Do the 1’s ever take the lead?
Yes, they do; but not
until p = 26,861. And
that's nothing: if you divide
by 3 instead of 4, the remainder (once you get past p = 3) must be
either 1 or 2. The bias is to
2; and that bias doesn't get violated until p =
608,981,813,029 ! (This
result wasn't found until 1978, by Carter Bays and Richard Hudson.) If you divide by 10 instead
of by 4 or 3, you will just get the last digit of your prime number.
(659 divided by 10 leaves remainder 9.)
Once you get past p = 2 and p = 5, every prime number
must end in 1, 3, 7, or 9. Is
there a Chebyshev bias? I ran
through all the primes up to p = 100,711,433, which is as many as I
keep handy on disk. That’s
the first 5.8 million primes. Threes
and sevens were in the lead roughly 2.8 million times each, ones had
113,922 leads, nines had 357, and there were 26,776 ties.
Notice, by the way, that these “who’s ahead” biases arise
from very small margins. The
actual counts for ones, threes, sevens and nines as last digit in those
first 5.8 million primes were: 1,449,824
ones, 1,450,185 threes, 1,450,153
sevens, and 1,449,836 nines
— a variation of only 361, a niggardly 0.025 percent.
The situation resembles those “first past the post” election
systems, where a nationwide majority of 51 percent can give your party a
landslide in terms of parliamentary seats;
or a foot race with very well-matched runners, in which one runner
manages to stay slightly ahead for most of the race, and gets all the
glory. The
English mathematician J.E. Littlewood proved in 1914 that any Chebyshev
bias gets violated infinitely often, if you go far enough. Michael Rubinstein and Peter Sarnak proved in 1994 that the
violations have nonzero density, a fascinating and counter-intuitive
result... But that’s about
as much math as I can get away with on NRO.
You’ll have to read the amazing Rubinstein-Sarnak result for
yourself: “Chebyshev’s
Bias,” in Experimental Mathematics, Vol.3, 1994 (pp. 173-197).
Additional note on logarithmic density: Logarithmic density. Consider
the ordinary counting numbers 1, 2, 3, 4, ...
Mathematicians call them “the natural numbers.”
Now, it often happens in math that we find ourselves dealing with
some sub-set of the natural numbers.
Examples: The
even numbers: 2, 4, 6, 8, 10,
12, 14, ... The
powers of two: 1, 2, 4, 8,
16, 32, 64, ... The
perfect squares: 1, 4, 9, 16,
25, 36, 49, ... The
primes: 2, 3, 5, 7, 11, 13,
17, ... The
Fibonacci numbers: 1, 2, 3, 5, 8, 13, 21, ... (from 3 on, each is the sum of the
previous two). Numbers
with no square factor: 1, 2,
3, 5, 6, 7, 10, ... Numbers
that leave remainder two when you divide by seven:
9, 16, 23, 30, 37, 44, ... When
we are dealing with one of these sub-sets, an obvious question to ask is:
How big is this sub-set, relative to all the natural numbers?
What proportion of the natural numbers belong to it?
A popular way to ask the same question is:
If I pick a number at random from all the natural numbers, what’s
the probability that the number I picked belongs to this sub-set? The mathematical term of art is:
What’s the sub-set's density? On the one hand this is a
simple question, to which the answer is often perfectly plain.
For the first and last of the sub-sets above, the answers are of
course: one-half,
one-seventh. On the other
hand, there is an infinity of natural numbers, and infinity is a
notoriously difficult thing to handle arithmetically. How would you go about proving that the density of the
even numbers is one-half? “Logarithmic density” is one approach to this issue. (There are others.) It rests on the fact that if you add up the reciprocals of all the natural numbers up to N, the answer is approximately log(N), and the approximation gets better, without limit, as N gets bigger. This is the natural logarithm here, by the way — it appears on many pocket calculators as “ln,” but I prefer “log.” For example: 1 + 1/2+ 1/3 + 1/4 + 1/5 +
... + 1/1000 = 7.48547 while log(1000) is 6.90776. There is only a 7.72 percent error in the approximation. If instead of a thousand you sum to a million, then a billion, then a trillion, the percent errors are 4.01, 2.71, and 2.05. As N gets bigger and bigger, the percent error dwindles to zero. (Though it does so awfully slowly. The error doesn't drop below one percent until N = 6,568,651,717,465,645,315,850,822.) Mathematicians have a special symbol for this phenomenon, the
“twiddle” symbol: 1 + 1/2 + 1/3 + 1/4 + 1/5 +
... + 1/N ~ log(N) Now, pretty obviously, you
could just as well write this as: (1 + 1/2 + 1/3 + 1/4 + 1/5
+ ... + 1/N) / log(N) ~ 1 This gives a way to measure how “dense” a sub-set is, among the natural numbers as a whole. What you do is, take all the reciprocals of the numbers in your sub-set up to some large number N, add ‘em up, divide the result by log(N), and see what you get — more precisely, what your answer twiddles to, as N becomes indefinitely large. Your answer can't possibly be greater than 1, because 1 is the answer when the "sub-set" is all the natural numbers! If your sub-set was the even numbers, you get an answer that
twiddles to one-half. (I
mean, for bigger and bigger N, it gets closer and closer to one-half.)
That’s what you would expect.
For the last of those sub-sets I listed above, the answer is
one-seventh, which again makes sense. All but one of the other
sub-sets I listed above have logarithmic density zero.
Yes, even the primes — a fact expressed very elegantly in Hardy
and Wright’s Theory of Numbers by the sentence:
“Almost all numbers are composite.”
(“Composite” is the opposite of “prime.”)
The exception is the last but one in my list.
The density of “square-free” numbers is about 0.608 — 6
divided by the square of pi. What
on earth is pi — the ratio of a circle’s circumference to its diameter
— doing in an arithmetic problem? Don’t
ask. The wonderful and amazing
result got by Rubinstein & Sarnak is this:
Take a Chebyshev bias; for illustration, I'll take the bias to remainder 3 when you divide
a prime by 4. For each
natural number N in turn, check to see whether the primes up to N satisfy
the bias. That is, check to
see whether or not remainder-3 primes up to N outnumber remainder-1 primes
up to N (inclusive, in both cases). If
they don’t, put N into a special sub-set:
the “Chebyshev bias violation” sub-set.
Now, this sub-set would seem to be pretty darn sparse.
The first violation, as I mentioned in the original piece, doesn’t occur
until 26,861. The first
really big block of violations is from 616,877 to 617,026.
Yet the stunning fact — Rubinstein & Sarnak proved it
mathematically — is that this subset has nonzero logarithmic density.
Its density is, in fact, 0.0041.
In other words, taking the natural numbers as a whole, around one
number in 244 is a Chebyshev violator (for this particular bias).
This is the kind of result
mathematicians love. It is
counter-intuitive — deeply non-obvious.
The math you need to get it is fairly heavy, but not that hard to
follow if you concentrate on it. Best
of all, it is, so far as I know, perfectly useless. If you want to try your hand at this, here is a sub-set of the natural numbers: All those that have a “3” in their ordinary (that is, decimal) form. This sub-set begins: 3, 13, 23, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 43, 53, .... What is the logarithmic density of this sub-set? |
||||