Statistical Inference & Techniques

  • A song designed to help teach the basics of multi-armed bandits (MAB), a type of machine learning algorithm that underlies many recommender systems. These algorithms spend part of the time exploiting choices (arms) that they know are good and part of the time exploring new choices. The song (music and lyrics) was written in 2021 by Cynthia Rudin of Duke University and was one of a set of three data science songs that won the grand prize in the 2023 A-mu-sing competition. The lyrics are full of double entendres, so the whole song has a second meaning in which the bandit could be someone who just takes advantage of other people! The composer mentions these examples of lines with important meanings:
    "explore/exploit" - the fundamental topic in MAB!
    "No regrets" - the job of the bandit is to minimize the regret throughout the game for choosing a suboptimal arm
    "I keep score" - I keep track of the regrets for all the turns in the game
    "without thinking too hard," - MAB algorithms typically don't require much computation
    "no context, there to use," - This particular bandit isn't a contextual bandit; it doesn't have feature vectors
    "uncertainty drove this ride." - rewards are probabilistic
    "I always win my game" - asymptotically the bandit always finds the best arm
    "help you, decide without the AB testing you might do" - Bandits are an alternative to massive A/B testing of all pairs of arms
    "Never, keeping anyone, always looking around and around" - There's always some probability of exploration throughout the play of the bandit algorithm

  • A music video designed to help teach the basics of multi-armed bandits (MAB), a type of machine learning algorithm that underlies many recommender systems. These algorithms spend part of the time exploiting choices (arms) that they know are good and part of the time exploring new choices (think of an ad company choosing an advertisement it knows is good versus exploring how good a new advertisement is). The music and lyrics were written by Cynthia Rudin of Duke University, and the song was one of three data science songs that won the grand prize and first place in the song category of the 2023 A-mu-sing competition.
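
The explore/exploit trade-off described above can be sketched with an epsilon-greedy strategy, one simple bandit approach chosen here purely for illustration (the resource does not say which algorithm the song has in mind). The function name `epsilon_greedy`, the Bernoulli reward model, and the three hypothetical ad click rates are assumptions made for this sketch.

```python
import random


def epsilon_greedy(true_means, n_rounds=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit on Bernoulli arms (illustrative sketch)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    pulls = [0] * n_arms        # how many times each arm was played
    estimates = [0.0] * n_arms  # running mean reward observed for each arm
    regret = 0.0                # cumulative expected regret versus the best arm
    best_mean = max(true_means)

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick an arm at random
        else:
            # exploit: pick the arm with the best estimate so far
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        pulls[arm] += 1
        # incremental update of the running mean for the chosen arm
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        regret += best_mean - true_means[arm]
    return pulls, estimates, regret


if __name__ == "__main__":
    # Hypothetical click rates for three advertisements (assumed values).
    pulls, estimates, regret = epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", pulls)
    print("estimated means:", [round(e, 3) for e in estimates])
    print("cumulative regret:", round(regret, 1))
```

Because the exploration rate stays fixed at `epsilon`, this sketch never stops trying the weaker arms, which loosely mirrors the "always looking around" lyric; other bandit strategies reduce exploration over time.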

    The lyrics are full of double entendres, so the whole song has a second meaning in which the bandit could be someone who just takes advantage of other people! The author provides these examples of lines with important meanings:
    "explore/exploit" - the fundamental topic in MAB!
    "No regrets" - the job of the bandit is to minimize the regret throughout the game for choosing a suboptimal arm
    "I keep score" - I keep track of the regrets for all the turns in the game
    "without thinking too hard," - MAB algorithms typically don't require much computation
    "no context, there to use," - This particular bandit isn't a contextual bandit; it doesn't have feature vectors
    "uncertainty drove this ride." - rewards are probabilistic
    "I always win my game" - asymptotically the bandit always finds the best arm
    "help you, decide without the AB testing you might do" - Bandits are an alternative to massive A/B testing of all pairs of arms
    "Never, keeping anyone, always looking around and around" - There's always some probability of exploration throughout the play of the bandit algorithm

  • This song is about overfitting, a central concept in machine learning. It is in the style of mountain music, and the listener should picture someone staying up all night trying to get their algorithm to work, but it just won't stop overfitting! The music and lyrics are by Cynthia Rudin of Duke University, and the song was one of three data science songs by Dr. Rudin that won the grand prize and first place in the song category in the 2023 A-mu-sing competition.

  • This song is about the k-nearest neighbors algorithm in machine learning. This popular algorithm uses case-based reasoning to make a prediction for a current observation based on nearby observations. The music and lyrics were written by Cynthia Rudin of Duke University, who was accompanied on the audio recording by Dargan Frierson of the University of Washington. The song is one of three data science songs written by Cynthia Rudin that took the grand prize and first prize in the song category in the 2023 A-mu-sing competition.
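
The idea of predicting from nearby observations can be sketched in a few lines of Python. This is a minimal illustration that assumes Euclidean distance and a simple majority vote; the function name `knn_predict` and the toy points are hypothetical and are not taken from the song or its materials.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest neighbors."""
    # Pair each training point's distance to the query with its label.
    scored = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by distance only, so ties never compare the labels themselves.
    scored.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in scored[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data (assumed): two well-separated clusters labeled "a" and "b".
    points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(knn_predict(points, labels, (0.5, 0.5)))
    print(knn_predict(points, labels, (5.5, 5.5)))
```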

  • A poem about type II errors in diagnostic testing using a diabetes test context.  The poem was written by Lawrence Lesser from The University of Texas at El Paso and received an honorable mention in the non-song category of the 2023 A-mu-sing Competition.  The author also provided the following outline for a lesson plan:

    Some sample questions (one per stanza) students can explore or discuss
    as a practical application of statistics to a prevalent disease
    that likely affects (or will) a friend or relative of almost everyone.

    First stanza: Look up history of diabetes prevalence to explore questions such as: Is “1 in 10” roughly accurate for the United States and how does that compare to other countries? Was the 2003 lowering of the threshold for a prediabetes diagnosis based on updated medical understanding of the disease or more of a policy decision to give an “earlier warning”?

    Second stanza: How does a hypothesis testing framework apply to an oral glucose tolerance test (OGTT)? It’s warned that a false positive is possible if the patient did not eat at least 150g of carbohydrates for each of the 3 days before the test. (This is likely what happened to the poet, whose diagnosis was overturned just 2 months later by an endocrinologist.)

    Third stanza: Given that the null hypothesis usually means no effect, no difference, or nothing special, explain whether it seems consistent for a normality test such as Anderson-Darling to take normality as the null. When might it make sense for a doctor to view having a particular disease as the null hypothesis, and what would the Type I and Type II errors be in that case?

    Fourth stanza: Explain how having only a few individual values each day from a blood glucose meter (BGM) risks missing dangerously high variability of glucose (students can Google how high variability can be a risk factor for hypoglycemia and diabetes complications). Discuss how output from a Continuous Glucose Monitor (CGM) that records values every 5 minutes can be used to check, for example, that the coefficient of variation is sufficiently low (e.g., < 36%) and that “time in range” (e.g., 70-180 or 70-140 mg/dL) is sufficiently high. Example output is on page S86 of https://diabetesjournals.org/care/issue/45/Supplement_1.

    Fifth stanza: Have students look up current FDA guidelines on how accurate over-the-counter BGM readings need to be (e.g., https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7753858/) and have them connect this to margin of error, confidence intervals, etc.

    Sixth stanza: Find online the diabetes “plate method” of taking a circular plate (9” in diameter) for a meal where half of the plate has non-starchy vegetables, a quarter has lean protein, and a quarter has carbohydrate foods such as whole grains. How do this breakdown and total quantity compare to a pie chart of a typical meal that you (or typical college undergraduates) eat?
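
For the fourth-stanza discussion, the two CGM summaries mentioned there (coefficient of variation and "time in range") can be computed as in the sketch below. It assumes the coefficient of variation is the sample standard deviation divided by the mean, expressed as a percentage, and that time in range is the share of readings between the two limits inclusive. The function name `cgm_summary` and the four readings are hypothetical, not real patient data.

```python
import statistics


def cgm_summary(readings_mg_dl, low=70, high=180):
    """Return (coefficient of variation in %, time in range in %) for glucose readings."""
    mean = statistics.mean(readings_mg_dl)
    sd = statistics.stdev(readings_mg_dl)  # sample standard deviation (assumed choice)
    cv_percent = 100 * sd / mean
    in_range = sum(low <= value <= high for value in readings_mg_dl)
    tir_percent = 100 * in_range / len(readings_mg_dl)
    return cv_percent, tir_percent


if __name__ == "__main__":
    # Made-up readings in mg/dL; a real CGM records a value every 5 minutes.
    cv, tir = cgm_summary([100, 120, 140, 200])
    print(f"CV = {cv:.1f}%, time in range = {tir:.0f}%")
```

Students could rerun the function with the narrower 70-140 mg/dL range by passing `high=140` and compare the results against the thresholds quoted in the stanza.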

  • A cartoon to spark a discussion about the normal equations in the matrix approach to linear models.  The cartoon was created by Kylie Lynch, a student at the University of Virginia.  The cartoon won first place in the non-song categories of the 2023 A-mu-sing competition.

  • A song presenting common hypothesis tests and the steps in carrying them out, with lyrics by Jamie Tan Xin Yee, Joelyn Chong, Deston Tang, Christine Sia, Nellie Lee, Josiah Tan, and Lee Yi Yuan, who were all students at Singapore Management University taught by Rosie Ching Ju Mae. The song may be sung to the tune of "LOVE" by Bert Kaempfert and Milt Gabler, recorded by Nat King Cole in 1965. The vocals and guitar soundtrack on the audio were done by Joelyn. Editing of the soundtrack was done by the entire student team. The song tied for second place in the 2023 A-mu-sing competition (see associated publicity).

  • A song about the value of ANCOVA in adjusting for a covariate. The lyrics were written by Greg Crowther (Everett Community College) and Leila Zelnick (University of Washington) and may be sung to the tune of "You're the One That I Want" by John Farrar, performed by Olivia Newton-John and John Travolta in the movie version of Grease. This parody was performed at the UW Division of Nephrology Grand Rounds on March 18, 2022 and tied for second place in the 2023 A-mu-sing competition. The backing track was purchased from Karaoke-Version.com.

  • A song to discuss how a confidence interval made for a population parameter will be biased if the sample is biased (e.g., starting with a random sample of n=100 but then having individuals drop out one at a time for a non-ignorable reason). The song was written in March 2019 by Lawrence Lesser, The University of Texas at El Paso, and Dennis Pearl, Penn State University, using the mid-20th century recursive folk song "99 Bottles of Beer." The idea for the song came from an article by Donald Byrd of Indiana University in the September 2010 issue of Math Horizons, where he suggested using the song for various learning objectives in mathematics education.
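
The dropout scenario can be explored with a small simulation. The sketch below assumes an extreme non-ignorable mechanism in which the individuals with the highest values are the ones who drop out, a normal population with mean 50 and standard deviation 10, and a normal-approximation 95% interval (mean ± 1.96 standard errors). All of these choices, and the function name `ci_after_dropout`, are assumptions for illustration rather than anything specified by the song.

```python
import math
import random
import statistics


def ci_after_dropout(n=100, max_dropouts=30, step=10,
                     true_mean=50.0, sd=10.0, seed=1):
    """Return (number dropped, CI lower, CI upper) as the highest values drop out."""
    rng = random.Random(seed)
    # Draw a random sample and sort it so the largest values sit at the end.
    sample = sorted(rng.gauss(true_mean, sd) for _ in range(n))
    results = []
    for dropped in range(0, max_dropouts + 1, step):
        remaining = sample[: n - dropped]  # the `dropped` largest values are gone
        mean = statistics.mean(remaining)
        std_err = statistics.stdev(remaining) / math.sqrt(len(remaining))
        half_width = 1.96 * std_err  # normal approximation (assumed)
        results.append((dropped, mean - half_width, mean + half_width))
    return results


if __name__ == "__main__":
    for dropped, lower, upper in ci_after_dropout():
        print(f"{dropped:3d} dropped: 95% CI = ({lower:.2f}, {upper:.2f})")
```

Students can compare each printed interval with the true mean of 50 to see how the interval drifts away from the parameter as more individuals drop out.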

  • This limerick was written in April 2021 by Larry Lesser of The University of Texas at El Paso as a vehicle for discussing the issues and pitfalls of using .05 as a bright-line threshold for declaring statistical significance, in light of ASA recommendations. The poem was also published in the June 2021 Amstat News.

