Chance News 110
Revision as of 15:31, 22 June 2017
January 1, 2017 to June 30, 2017
Quotations
“When a coincidence seems amazing, that’s because the human mind isn’t wired to naturally comprehend probability and statistics.”
"What Thursday revealed is that polls struggle to capture the crucial nuances of politics today: there’s no longer a single story in Britain – and averages are dead."
“They [new planet candidates] are fascinating, but Kepler’s mission is not to pinpoint the next tourist destination — it is to find out on average how far away such places are. Or, as Dr. Batalha said, ‘We’re not stamp collecting, we’re doing statistics.’”
Forsooth
“[Richard] Florida finds that this population [service workers] currently splits its vote evenly between the two parties — no statistical significance for either Trump or Clinton.”
Statistical artifacts
Artifacts (from XKCD)
Suggested by Michelle Peterson
Crowd size
From Lincoln to Obama, how crowds at the capitol have been counted
by Tim Wallace, New York Times, 18 January 2017
This article anticipates the controversy that ensued from Trump's claims about the size of the crowd for his inauguration.
There is a nice historical retrospective here, starting with Lincoln's inauguration. Period photographs have now been studied using tools like Google Earth to give an estimate of 7350 attendees.
Controversy over crowd estimates is also nothing new. It's now been more than 20 years since Louis Farrakhan's Million Man March in 1995. His supporters threatened to sue the National Park Service for giving an estimate of only 400,000. In the aftermath, the Park Service stopped providing official estimates.
In Crowd estimates from Chance News 68, we described Glenn Beck's 2010 rally and the event held in response by Jon Stewart and Stephen Colbert. A Washington Post story at the time gave an annotated graphic of the satellite photo analysis of Barack Obama's 2009 inaugural. The present NYT article notes that satellite analyses have become more common since that time.
The NYT also references a Scientific American discussion, The simple math behind crunching the sizes of crowds. As their "Math Dude," Jason Marshall, says, "I feel that it’s important to note that estimating crowd sizes is a solved problem that’s actually pretty straightforward."
Of course, when the estimate becomes a proxy for political support, things are not so straightforward. The 2017 inaugural has given us the phrase alternative facts!
Still thinking about the election
Margaret Cibes sent a link to the following:
- The 2016 national polls are looking less wrong after final election tallies
- by Scott Clement, Washington Post, 6 February 2017
Gender stereotypes
Nick Horton sent the following to the Isolated Statisticians list-serv:
Gender stereotypes about intellectual ability emerge early and influence children’s interests
by Lin Bian, Sarah-Jane Leslie, Andrei Cimpian, Science, 27 January 2017
The full article requires a subscription. From the summary on the web page we read:
The distribution of women and men across academic disciplines seems to be affected by perceptions of intellectual brilliance. Bian et al. studied young children to assess when those differential perceptions emerge. At age 5, children seemed not to differentiate between boys and girls in expectations of “really, really smart”—childhood's version of adult brilliance. But by age 6, girls were prepared to lump more boys into the “really, really smart” category and to steer themselves away from games intended for the “really, really smart.”
Nick recommended this study for use in class for a number of reasons, including the fact that the data are available for download from the Open Science Framework, and the analyses are quite accessible with tools such as the t-test and the chi-squared test.
Here is a newspaper story about the study:
- Why young girls don’t think they are smart enough
- by Andrei Cimpian and Sarah-Jane Leslie, New York Times, 26 January 2017
Hans Rosling
Margaret Cibes sent a link to the following:
- Hans Rosling, Swedish doctor and pop-star statistician, dies at 68
- by Sam Roberts, New York Times, 9 February 2017
With his famous Gapminder presentations, Rosling invited us to "Pour the sparkling fresh numbers into your eyes and upgrade your worldview."
Gaussian correlation inequality
Pete Schumer sent a link to the following:
- A long-sought proof, found and almost lost
- by Natalie Wolchover, Quanta, 28 March 2017
Thomas Royen, a retired German statistics professor, has published a proof for the Gaussian correlation inequality, a result originally conjectured in the 1950s. The Quanta article gives an engaging description of Royen's discovery and the path to getting the result published. There is also a nice illustration of a simple case with a bivariate normal distribution.
The general result holds for any multivariate normal distribution centered at the origin and any two convex sets that are symmetric about the origin: the probability of their intersection is at least the product of their individual probabilities. In the example, the sets are the two infinite strips (light shading) that intersect in a rectangle (dark shading).
Royen's original paper can be found here. It is short but still quite technical. Here is a nice blogpost that provides useful additional context for the result.
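For the bivariate example, the inequality says that P(|X| ≤ a, |Y| ≤ b) ≥ P(|X| ≤ a) · P(|Y| ≤ b) when (X, Y) is bivariate normal with mean zero. The following Python sketch checks this by simulation. It is only an illustration: the correlation of 0.6, the strip half-widths, the sample size, and the function name are assumptions chosen here, not values taken from the Quanta article or from Royen's paper.

```python
import math
import random


def estimate_strip_probabilities(rho, a, b, n_samples=100_000, seed=0):
    """Monte Carlo estimates of P(|X|<=a), P(|Y|<=b), and P(both).

    (X, Y) is standard bivariate normal with correlation rho.
    All parameter values used below are illustrative assumptions.
    """
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    in_a = in_b = in_both = 0
    for _ in range(n_samples):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = z1
        y = rho * z1 + scale * z2  # gives corr(X, Y) = rho
        hit_a = abs(x) <= a        # point falls in the vertical strip
        hit_b = abs(y) <= b        # point falls in the horizontal strip
        in_a += hit_a
        in_b += hit_b
        in_both += hit_a and hit_b
    return in_a / n_samples, in_b / n_samples, in_both / n_samples


p_a, p_b, p_both = estimate_strip_probabilities(rho=0.6, a=1.0, b=1.0)
print(f"P(strip A)         = {p_a:.4f}")
print(f"P(strip B)         = {p_b:.4f}")
print(f"P(rectangle)       = {p_both:.4f}")
print(f"P(strip A)*P(strip B) = {p_a * p_b:.4f}")
```

With positive or negative correlation the estimated rectangle probability should exceed the product of the strip probabilities; with rho = 0 the two should agree up to simulation noise.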
Spotting bad statistics
Priscilla Bremser recommended the following TED talk:
- 3 ways to spot a bad statistic, by Mona Chalabi
Chalabi is data editor of the Guardian US. In this short (under 12 minutes) and very entertaining talk, she describes the problem society faces when policymakers can't get agreement on baseline statistical facts.
In between blindly accepting or reflexively denying any data-based claim, she describes three points to remember when evaluating statistics.
- Can you see uncertainty?
- Can I see myself in the data?
- How was the data collected?
Regarding uncertainty, she discusses the reasons that opinion polling has become more difficult, and wonders why the probability of a Hillary Clinton win was reported "with decimal places." On seeing yourself in the data, she notes that reporting only averages frustrates people who don't see their own experience represented. There is a very memorable quote in the section on data collection, where she observes that for one cosmetics commercial L'Oreal was happy to talk to just 48 women to "prove" that their product worked. She says:
Private companies don't have a huge interest in getting the numbers right, they just need the right numbers.
Flint water crisis
Q&A: Using Google search data to study public interest in the Flint water crisis
by John Gramlick, Pew Research Center, 27 April 2017.
The murky tale of Flint's deceptive water data
by Robert Langkjær-Bain, Significance, 5 April 2017
What went wrong In Flint
by Anna Maria Barry-Jester, FiveThirtyEight, 26 January 2016
Same stats (think Anscombe)
Jeff Witmer sent the following link to the Isolated Statisticians list.
- Same stats, different graphs: Generating datasets with varied appearance and identical statistics through simulated annealing
- by Justin Matejka, ACM SIGCHI Conference on Human Factors in Computing Systems
Observing that it is not known how Frank Anscombe went about creating his famous quartet of scatterplots, the authors present the results of their simulated annealing technique to produce some striking visualizations. You'll want to see the Datasaurus Dozen, which even has an R data package.
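The general idea can be sketched in a few lines of Python: nudge one point at a time, reject any move that changes the summary statistics at two decimal places, and otherwise accept the move if it brings the points closer to a target shape (or, occasionally, at random, with a probability that shrinks as the "temperature" cools). This is a simplified sketch written for this note, not the authors' code; the circular target, step size, cooling schedule, and all function names are assumptions.

```python
import math
import random


def summary_stats(points):
    """Means, standard deviations, and Pearson correlation, rounded to 2 places."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sd_x = math.sqrt(sxx / (n - 1))
    sd_y = math.sqrt(syy / (n - 1))
    corr = sxy / math.sqrt(sxx * syy)
    return tuple(round(v, 2) for v in (mean_x, mean_y, sd_x, sd_y, corr))


def distance_to_circle(points, center=(50.0, 50.0), radius=25.0):
    """Average distance of the points from a target circle (smaller is better)."""
    cx, cy = center
    return sum(abs(math.hypot(x - cx, y - cy) - radius) for x, y in points) / len(points)


def anneal(points, n_iter=5000, step=0.1, t_start=0.4, t_end=0.01, seed=1):
    """Move points toward the circle while keeping the rounded statistics fixed."""
    rng = random.Random(seed)
    current = list(points)
    target_stats = summary_stats(current)
    current_fit = distance_to_circle(current)
    for i in range(n_iter):
        # Linear cooling schedule (an assumption of this sketch).
        temperature = t_start + (t_end - t_start) * i / (n_iter - 1)
        k = rng.randrange(len(current))
        x, y = current[k]
        candidate = list(current)
        candidate[k] = (x + rng.gauss(0, step), y + rng.gauss(0, step))
        # Hard constraint: the rounded summary statistics must not change.
        if summary_stats(candidate) != target_stats:
            continue
        cand_fit = distance_to_circle(candidate)
        # Accept improvements always, and worse moves with probability = temperature.
        if cand_fit < current_fit or rng.random() < temperature:
            current, current_fit = candidate, cand_fit
    return current


data_rng = random.Random(0)
start = [(data_rng.gauss(50, 15), data_rng.gauss(50, 15)) for _ in range(60)]
result = anneal(start)
print("stats before:", summary_stats(start))
print("stats after: ", summary_stats(result))
print("distance to circle before/after:",
      distance_to_circle(start), distance_to_circle(result))
```

Many more iterations than the 5000 used here would be needed before the scatterplot visibly takes on the target shape; the point of the sketch is only that the five rounded statistics stay fixed by construction.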
The fivethirtyeight package for R
fivethirtyeight Package
by Albert Y. Kim, Chester Ismay, and Jennifer Chunn, announced 13 March 2017
The authors have developed a package for pedagogical use that provides data and R code corresponding to analyses presented at FiveThirtyEight.com. Their goal is to allow students to get into the data with minimal overhead.
This should be a very valuable resource for teaching about statistics in the news! Here is a quick illustration of how to use the package. More details are available in the package vignette linked above.
Debate over white mortality
Stop saying white mortality Is rising
by Jonathan Auerbach and Andrew Gelman, Slate, 28 March 2017
The forces driving middle-aged white people's 'deaths Of despair'
by Jessica Boddy, NPR Morning Edition, 23 March 2017
Interracial marriage
Peter Doyle sent a link to this chart from the Economist:
- Daily chart: Interracial marriages are rising in America
- Economist, 12 June 2017
Quoting from the article, one reader commented:
"Of the roughly 400,000 interracial weddings in 2015, 82% involved a white spouse, even though whites account for just 65% of America’s adult population." If you lump the population into just two groups A and B, 100% of intergroup marriages will involve a spouse from group A, no matter what fraction of the population belongs to group A.
Exercise: 2015 census data is available by googling "us census quickfacts". While the categories don't precisely match those in this piece, you can use this data to get a rough estimate of the fraction of interracial weddings that would involve a white spouse under random pairing. What do you get? Is your answer more or less than 82%?
Peter notes that he got just over 82%. Here is his solution (using Mathematica):
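Independently of Peter's Mathematica solution, the random-pairing calculation can be sketched in Python. If group i makes up a fraction p_i of the population and spouses are paired at random, a couple is interracial with probability 1 − Σ p_i², and it is interracial with a white spouse with probability 2 · p_white · (1 − p_white). The ratio of the two is the quantity asked for in the exercise. In the sketch below, only the 65% white share comes from the quoted article; the other shares and group labels are made-up placeholders, not Census figures, and should be replaced with QuickFacts values. The function name is also just an illustration.

```python
def white_share_of_interracial(shares, group="white"):
    """Fraction of interracial couples that include `group` under random pairing.

    `shares` maps each group label to its population fraction. Assumes spouses
    are drawn independently from the same population shares.
    """
    total = sum(shares.values())
    p = {g: s / total for g, s in shares.items()}  # normalize to sum to 1
    same_group = sum(v * v for v in p.values())    # both spouses from one group
    interracial = 1.0 - same_group
    involves_group = 2.0 * p[group] * (1.0 - p[group])
    return involves_group / interracial


# Two groups only: every intergroup couple involves group A, as the reader noted.
two_groups = {"white": 0.65, "other": 0.35}
print(white_share_of_interracial(two_groups))

# Several groups: placeholder shares (NOT Census data), apart from the 65%.
placeholder_shares = {
    "white": 0.65,
    "group_b": 0.15,
    "group_c": 0.12,
    "group_d": 0.06,
    "group_e": 0.02,
}
print(white_share_of_interracial(placeholder_shares))
```

The more finely the non-white population is divided, the more interracial couples there are that involve no white spouse, so the fraction drops below 100%.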