Computational Thinking and Inferential Thinking: Foundations of Data Science

Michael Jordan

Michael Jordan, University of California, Berkeley

Abstract: In the Fall of 2015, my colleagues and I offered a freshman-level undergraduate course at Berkeley that gives "Data Science" substance as a fine-grained blend of rich intellectual traditions in computer science and statistics. Computer science is more than just programming; it is the creation of appropriate abstractions to express computational structures and the development of algorithms that operate on those abstractions. Similarly, statistics is more than just collections of estimators and tests; it is the interplay of general notions of sampling, models, distributions and decision-making. Our course is based on the idea that these styles of thinking support each other. In teaching statistical inference, rather than making use of formulas and asymptotic justifications, we teach the computing concepts needed to transform and visualize data and to implement resampling-based inferential procedures. Students learn to program (in Python), learning the language gradually in the service of increasingly sophisticated data analysis problems. Moreover, students work throughout with real data sets and learn to draw substantive conclusions. This is achieved in part by a set of two-unit "connector courses" in various disciplines that augment and ground the material taught in the core class. We discuss some of the lessons learned, emphasizing the main lesson: this has been a strikingly successful way to introduce university-level students to statistics.
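
To give a feel for what a "resampling-based inferential procedure" can look like in Python, here is a minimal sketch of a percentile bootstrap confidence interval for a mean. It is an illustration only, not material from the course: the function name `bootstrap_ci`, its parameters, and the `heights` data are all hypothetical, and it uses only the Python standard library.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for stat(sample) (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same interval
    n = len(sample)
    # Resample with replacement, recomputing the statistic on each resample.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Keep the middle `level` fraction of the resampled statistics.
    tail = (1.0 - level) / 2.0
    lo_index = int(tail * n_resamples)
    hi_index = int((1.0 - tail) * n_resamples) - 1
    return estimates[lo_index], estimates[hi_index]


# Made-up data, for illustration only.
heights = [61, 64, 65, 66, 67, 68, 68, 69, 70, 71, 72, 74]
low, high = bootstrap_ci(heights)
print(f"sample mean: {statistics.mean(heights):.2f}")
print(f"95% bootstrap interval: ({low:.2f}, {high:.2f})")
```

The interval comes entirely from repeated sampling and sorting, with no formula for a standard error and no appeal to asymptotics, which is the trade the abstract describes.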

Bio: Michael I. Jordan is the Pehong Chen Distinguished Professor in the Department of Electrical Engineering and Computer Science and the Department of Statistics at the University of California, Berkeley. He received his Master's in Mathematics from Arizona State University, and earned his PhD in Cognitive Science in 1985 from the University of California, San Diego. He was a professor at MIT from 1988 to 1998. His research interests bridge the computational, statistical, cognitive and biological sciences, and have focused in recent years on Bayesian nonparametric analysis, probabilistic graphical models, spectral methods, kernel machines and applications to problems in distributed computing systems, natural language processing, signal processing and statistical genetics. Prof. Jordan is a member of the National Academy of Sciences, a member of the National Academy of Engineering and a member of the American Academy of Arts and Sciences. He is a Fellow of the American Association for the Advancement of Science. He has been named a Neyman Lecturer and a Medallion Lecturer by the Institute of Mathematical Statistics. He received the David E. Rumelhart Prize in 2015 and the ACM/AAAI Allen Newell Award in 2009. He is a Fellow of the AAAI, ACM, ASA, CSS, IEEE, IMS, ISBA and SIAM.



Teaching with Simulation-Based Inference