Chance News 7
Sept 26 2005 to Oct 15 2005
While writing my book [Stochastic Processes] I had an argument with Feller [Introduction to Probability Theory and its Applications]. He asserted that everyone said "random variable" and I asserted that everyone said "chance variable." We obviously had to use the same name in our books, so we decided the issue by a stochastic procedure. That is, we tossed for it and he won.
Peter Winkler suggested our first forsooth.
Texas beats Ohio State in their opening game of the season (Saturday Sept 10 2005). The sportscasters (legendary Brent Musburger on play-by-play or Gary Danielson on analysis) observed that of the 14 teams who have previously played in the championship game (at the end of each season) 5 have suffered an earlier defeat. "Thus," they conclude, "Ohio State can still make it to the championship game, but their chances are now less
What is wrong with this?
Here are two forsooths from a recent issue of RSS NEWS
'Big ticket quiz' at the start of Wimbledon:
Q. How many punnets (a small light basket or other container for fruit or vegetables) of strawberries are eaten each day during the Wimbledon tournament?
Is it (a) over 8,000, (b) over 9,000 or (c) over 10,000?
20 June 2005
Waiting time for foot surgery down by 500%
5 July 2005
Fortune's Formula: Wanna Bet?
New York Times Book Section, September 25, 2005
This must be the kind of review that every science writer dreams of. Pogue ends his review with:
"Fortune's Formula" may be the world's first history book, gambling primer, mathematics text, economics manual, personal finance guide and joke book in a single volume. Poundstone comes across like the best college professor you ever had, someone who can turn almost any technical topic into an entertaining and zesty lecture. But every now and then, you can't help wishing there were some teaching assistants on hand to help.
The author, William Poundstone, is a science writer who has written a number of very successful science books. His book "Prisoner's Dilemma: John von Neumann, Game Theory and the Puzzle of the Bomb" was written in the same style as this book. Indeed Helen Joyce, in her review of that book in Plus Magazine, writes:
This book is a curious mixture of biography, history and mathematics, all neatly packaged into an entertaining and enlightening read.
Poundstone describes himself as a visual artist who does books as a "day job." You can learn about his art work here.
Fortune's Formula is primarily the story of Edward Thorp, Claude Shannon, and John Kelly and their attempts to use mathematics to make money gambling in casinos and on the stock market. None of them did their graduate work in mathematics. Thorp and Kelly got their PhDs in physics and Shannon in genetics.
In the spring of 1955, while a graduate student at UCLA, Thorp joined a discussion on the possibility of making money from roulette. Thorp suggested that they could take advantage of the fact that bets are still accepted for a few seconds after the croupier releases the ball, and that in these seconds he could estimate in what part of the wheel the ball would stop.
Thorp did not pursue this and in 1959 became an instructor in math at M.I.T. Here he became interested in blackjack and developed his famous card counting method for winning at blackjack. He decided to publish his method in the most prestigious journal he could find and settled on The Proceedings of the National Academy of Sciences. For this he needed to have a member of the Academy submit his paper. The only member in the math department was Shannon, so he had to persuade him of the importance of his paper. Shannon not only agreed but in the process became fascinated by Thorp's idea for beating roulette. He agreed to help Thorp carry this out. To be continued.
Which foods prevent cancer?
Which of these foods will stop cancer? (Not so fast)
New York Times, 27 September 2005, Sect. F, p. 1
Among other examples, the article includes a data graphic on purported benefits of dietary fiber in preventing colorectal cancer. Early observational studies indicated an association, but subsequent randomized experiments found no effect.
More to follow.
Slices of risk and the broken heart concept
How a Formula Ignited Market That Burned Some Big Investors,
Mark Whitehouse, The Wall Street Journal, September 12, 2005.
This on-line article relates how a statistician, David Li, unknown outside a small coterie of finance theorists, helped change the world of investing.
The article focuses on an event last May when General Motors Corp's debt was downgraded to junk status, causing turmoil in some financial markets. The article gives a nice summary of the underlying financial instruments known as credit derivatives - investment contracts structured so their value depends on the behavior of some other thing or event - with exotic names like collateralized debt obligations and credit-default swaps.
The critical step is to estimate the likelihood that many of the companies in a pool of companies would go bust at once. For instance, if the companies were all in closely related industries, such as auto-parts suppliers, they might fall like dominoes after a catastrophic event. Such a pool would have a 'high default correlation'.
In 1997, nobody knew how to calculate default correlations with any precision. Mr. Li's solution drew inspiration from a concept in actuarial science known as the broken heart syndrome - people tend to die faster after the death of a beloved spouse. Some of his colleagues from academia were working on a way to predict this death correlation, something quite useful to companies that sell life insurance and joint annuities. He says:
"Suddenly I thought that the problem I was trying to solve was exactly like the problem these guys were trying to solve. Default is like the death of a company, so we should model this the same way we model human life."
This gave him the idea of using copulas, mathematical functions the colleagues had begun applying to actuarial science. Copulas help predict the likelihood of various events occurring when those events depend to some extent on one another. Until the events of last May, one of the most popular copulas for bond pools was the Gaussian copula, named after Carl Friedrich Gauss, a 19th-century German statistician.
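To make the idea of default correlation concrete, here is a minimal simulation sketch. It is not Mr. Li's formula, which the article does not spell out. It assumes a simplified one-factor Gaussian copula in which each company has a hidden normal score built from a shared market shock and a private shock, and defaults when that score falls below a threshold. The function names, the pool size, the 5% default probability and the correlation parameter rho are all hypothetical values chosen for illustration.

```python
import random
from statistics import NormalDist


def simulate_pool_defaults(n_companies, p_default, rho, n_trials, seed=0):
    """Return the number of defaults in each simulated trial.

    Assumption: a one-factor Gaussian copula. rho is the correlation
    between the hidden normal scores of any two companies (0 <= rho <= 1).
    """
    rng = random.Random(seed)
    # A company defaults when its score falls below this threshold, so
    # each company alone defaults with probability p_default.
    threshold = NormalDist().inv_cdf(p_default)
    counts = []
    for _ in range(n_trials):
        market = rng.gauss(0.0, 1.0)  # shock shared by the whole pool
        defaults = 0
        for _ in range(n_companies):
            own = rng.gauss(0.0, 1.0)  # shock specific to one company
            score = (rho ** 0.5) * market + ((1.0 - rho) ** 0.5) * own
            if score < threshold:
                defaults += 1
        counts.append(defaults)
    return counts


def prob_at_least(counts, k):
    """Fraction of trials in which at least k companies defaulted."""
    return sum(c >= k for c in counts) / len(counts)


if __name__ == "__main__":
    for rho in (0.0, 0.5):
        counts = simulate_pool_defaults(20, 0.05, rho, 5000, seed=1)
        print("rho =", rho, "P(5 or more defaults) =", prob_at_least(counts, 5))
```

In both runs each company has the same 5% chance of default on its own. What changes with rho is how often many companies fail together, which is the quantity the pool's riskiest slices depend on.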
- The on-line article gives more details about what went wrong in the financial markets in May and the search for a more appropriate copula to better capture the broken heart syndrome between companies.
- Wikipedia is a very worthwhile on-line resource for definitions of technical words, such as copula.
Submitted by John Gavin.
Learning to speak via statistics and graph theory
A language-learning robot may sound like science fiction, but new software developed by Cornell University psychology professor Shimon Edelman, with colleagues Zach Solan, David Horn and Eytan Ruppin from Tel Aviv University in Israel, is well on the way to teaching itself languages and making up its own sentences, the developers claim.
Unlike previous attempts at developing computer algorithms for language learning, "Automatic Distillation of Structure," or "ADIOS" for short, discovers complex patterns in raw text by repeatedly aligning sentences and looking for overlapping parts. Once it has derived a language's rules of grammar, simply from blocks of text in that language, it can then produce sentences of its own.
It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, on coding regions in DNA sequences and on protein data correlating sequence with function.
Adios relies on a statistical method for pattern extraction and on structured generalisations - the two processes that have been implicated in language acquisition. Our experiments show that Adios can acquire intricate structures from raw data including transcripts of parents' speech directed at two- or three-year-olds. This may eventually help researchers understand how children, who learn language in a similar item-by-item fashion, and with little supervision, eventually master the full complexity of their native tongue.
Plus Magazine's website offers a more accessible explanation, with graphs:
The ADIOS algorithm is based on statistical and algebraic methods performed on one of the most basic and versatile objects of mathematics - the graph. Given a text, the program loads it as a graph by representing each word by a node, or vertex, and each sentence by a sequence of nodes connected by lines. A string of words in the text is now represented by a path in the graph.
Next it performs a statistical analysis to see which paths, or strings of words, occur unusually often. It then decides that those that appear most frequently - called "significant patterns" - can safely be regarded as a single unit and replaces the set of vertices in each of these patterns by a single vertex, thus creating a new, generalised, graph.
Finally, the program looks for paths in the graph which just differ by one vertex. These stand for parts of sentences that just differ by one word (or compound of words) like "the cat is hungry" and "the dog is hungry". Performing another statistical test on the frequency of these paths, the program identifies classes of vertices, or words, that can be regarded as interchangeable, or equivalent. The sentence involved is legitimate no matter which of the words in the class - in our example "cat" or "dog" - you put in.
This last step is then repeated recursively.
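The steps described above can be illustrated with a toy sketch. This is not the ADIOS program: it replaces the statistical significance tests with simple counts and thresholds, and it only compares whole sentences of equal length. The function names, the `min_count` threshold and the `_` slot marker are hypothetical choices made for this illustration.

```python
from collections import Counter, defaultdict


def build_graph(sentences):
    """Count directed edges between consecutive words.

    Each word is a vertex and each sentence is a path through the graph.
    """
    edges = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for left, right in zip(words, words[1:]):
            edges[(left, right)] += 1
    return edges


def frequent_paths(sentences, length=2, min_count=2):
    """Return word sequences of the given length seen at least min_count times.

    A crude stand-in for the "significant patterns" step, which in the
    description above relies on a statistical test rather than a raw count.
    """
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(len(words) - length + 1):
            counts[tuple(words[i:i + length])] += 1
    return {path: c for path, c in counts.items() if c >= min_count}


def interchangeable_words(sentences):
    """Group words that fill the same slot in otherwise identical sentences."""
    slots = defaultdict(set)
    for sentence in sentences:
        words = tuple(sentence.lower().split())
        for i, word in enumerate(words):
            # Blank out one position and remember which words filled it.
            frame = words[:i] + ("_",) + words[i + 1:]
            slots[frame].add(word)
    return {frame: sorted(ws) for frame, ws in slots.items() if len(ws) > 1}


if __name__ == "__main__":
    text = ["the cat is hungry", "the dog is hungry", "the cat is asleep"]
    print(build_graph(text))
    print(frequent_paths(text))
    print(interchangeable_words(text))
```

On the three example sentences, the last function groups "cat" with "dog" and "hungry" with "asleep", because each pair fills the same slot in an otherwise identical sentence.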
The website finishes with some reassuring words:
All this doesn't mean, of course, that the program actually "understands" what it's saying. It simply knows how to have a good go at piecing together fragments of sentences it has identified, in the hope that they are grammatically correct. So if, like me, you're prone to swearing at your computer, you can safely continue to do so: it won't answer back for a long while yet.
The ADIOS homepage offers an overview and more detailed description of the program.
Submitted by John Gavin.