## Professors

- David Blei (with Computer Science)
- Richard R. Davis
- Victor H. de la Peña
- Andrew Gelman (with Political Science)
- Ioannis Karatzas (with Mathematics)
- Jingchen Liu
- Shaw-Hwa Lo
- David Madigan
- Marcel Nutz (with Mathematics)
- Liam Paninski
- Philip Protter
- Daniel Rabinowitz
- Bodhisattva Sen
- Michael Sobel
- Simon Tavaré (with Biological Sciences)
- Zhiliang Ying
- Ming Yuan
- Tian Zheng (Chair)

## Associate Professors

- John Cunningham
- Samory Kpotufe
- Arian Maleki
- Sumit Mukherjee

## Assistant Professors

- Cynthia Rush
- Anne van Delft

**Term Assistant Professors**

- Marco Avella
- Carsten Chong
- Haoran Li
- Xiaofei Shi
- Thibault Vatter
- Johannes Wiesel

## Adjunct Faculty

- Demissie Alemayehu
- Flavio Bartmann
- Mark Brown
- Guy Cohen
- Regina Dolgoarshinnykh
- Anthony Donoghue
- Hammou El Barmi
- Tat Sang Fung
- Xiaofu He
- Margaret Holen
- Irene Hueter
- Ying Liu
- Ka-Yi Ng
- Ha Nguyen
- Cristian Pasarica
- David Rios
- Ori Shental
- Haiyuan Wang
- Larry Wright
- Rongning Wu

## Lecturers in Discipline

- Banu Baydil
- Wayne Lee
- Ronald Neath
- Joyce Robbins
- Gabriel Young

## Major in Statistics

*The requirements for this program were modified in March 2016. Students who declared this program before this date should contact the director of undergraduate studies for the department in order to confirm their options for major requirements.*

The major should be planned with the director of undergraduate studies. Courses taken for a grade of Pass/D/Fail, or in which the grade of D has been received, do not count toward the major. The requirements for the major are as follows:

Code | Title | Points |
---|---|---|
Mathematics and Computer Science Prerequisites | | |
MATH UN1101 | Calculus I | |
MATH UN1102 | Calculus II | |
MATH UN1201 | Calculus III | |
MATH UN2010 | Linear Algebra | |
One of the following five courses | | |
Honors Introduction to Computer Science | | |
Introduction to Computing for Engineers and Applied Scientists | | |
Introduction to Computer Science and Programming in MATLAB | | |
Applied Statistical Computing | | |
Introduction to Computer Science and Programming in Java | | |
Core courses in probability and statistics | | |
STAT UN1201 | Calculus-Based Introduction to Statistics | |
STAT GU4203 | Probability Theory | |
STAT GU4204 | Statistical Inference | |
STAT GU4205 | Linear Regression Models | |
STAT GU4206 | Statistical Computing and Introduction to Data Science | |
STAT GU4207 | Elementary Stochastic Processes | |
Three approved electives in statistics or, with permission, a cognate field. | | |

- Students preparing for a career in actuarial science are encouraged to replace STAT GU4205 Linear Regression Models with STAT GU4282 Linear Regression and Time Series Methods, and should take as one of their electives STAT GU4281 Theory of Interest.
- Students preparing for graduate study in statistics are encouraged to replace two electives with MATH GU4061 Introduction to Modern Analysis I and MATH GU4062 Introduction to Modern Analysis II.

**Introductory Courses**

Students interested in statistical concepts, but who do not anticipate undertaking statistical analyses, should take STAT UN1001 Introduction to Statistical Reasoning. Students seeking an introduction to applied statistics or preparing for the concentration should take STAT UN1101 Introduction to Statistics (without calculus). Students seeking a foundation for further study of probability theory and statistical theory and methods should take STAT UN1201 Calculus-based Introduction to Statistics. Students seeking a one-semester calculus-based survey should take STAT GU4001 Introduction to Probability and Statistics. The undergraduate seminar STAT UN1202 features faculty lectures prepared with undergraduates in mind; students may attend without registering.

**STAT UN1001 Introduction to Statistical Reasoning.** *3 points*.

CC/GS: Partial Fulfillment of Science Requirement, BC: Fulfillment of General Education Requirement: Quantitative and Deductive Reasoning (QUA).

A friendly introduction to statistical concepts and reasoning with emphasis on developing statistical intuition rather than on mathematical rigor. Topics include design of experiments, descriptive statistics, correlation and regression, probability, chance variability, sampling, chance models, and tests of significance.

Fall 2020: STAT UN1001

Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment |
---|---|---|---|---|---|
STAT 1001 | 001/12833 | T Th 10:10am - 11:25am Online Only | Guy Cohen | 3 | 150/150 |

**STAT UN1010 Statistical Thinking for Data Science with Python Labs.** *4 points*.

CC/GS: Partial Fulfillment of Science Requirement

The advent of large scale data collection and the computer power to analyze the data has led to the emergence of a new discipline known as Data Science. Data Scientists in all sectors analyze data to derive business insights, find solutions to societal challenges, and predict outcomes with potentially high impact. The goal of this course is to provide the student with a rigorous understanding of the statistical thinking behind the fundamental techniques of statistical analysis used by data scientists. The student will learn how to apply these techniques to data, understand why they work and how to use the analysis results to make informed decisions. The student will gain this understanding in the classroom and through the analysis of real-world data in the lab using the programming language Python. The student will learn the fundamentals of Python and how to write and run code to apply the statistical concepts taught in the classroom.

Fall 2020: STAT UN1010

Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment |
---|---|---|---|---|---|
STAT 1010 | 001/12406 | W 4:10pm - 5:25pm Online Only | Anthony Donoghue | 4 | 20/86 |
STAT 1010 | 001/12406 | M W 6:10pm - 7:25pm Online Only | Anthony Donoghue | 4 | 20/86 |

**STAT UN1101 Introduction to Statistics.** *3 points*.

CC/GS: Partial Fulfillment of Science Requirement, BC: Fulfillment of General Education Requirement: Quantitative and Deductive Reasoning (QUA).

Prerequisites: intermediate high school algebra.

Designed for students in fields that emphasize quantitative methods. Graphical and numerical summaries, probability, theory of sampling distributions, linear regression, analysis of variance, confidence intervals and hypothesis testing. Quantitative reasoning and data analysis. Practical experience with statistical software. Illustrations are taken from a variety of fields. Data-collection/analysis project with emphasis on study designs is part of the coursework requirement.

Fall 2020: STAT UN1101

Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment |
---|---|---|---|---|---|
STAT 1101 | 001/12835 | M W 8:40am - 9:55am Online Only | Banu Baydil | 3 | 86/86 |
STAT 1101 | 002/12889 | T Th 6:10pm - 7:25pm Online Only | Ha Nguyen | 3 | 73/86 |
STAT 1101 | 003/12837 | M W 2:40pm - 3:55pm Online Only | Tian Zheng | 3 | 70/50 |

**STAT UN1201 Calculus-Based Introduction to Statistics.** *3 points*.

CC/GS: Partial Fulfillment of Science Requirement, BC: Fulfillment of General Education Requirement: Quantitative and Deductive Reasoning (QUA).

Prerequisites: one semester of calculus.

Designed for students who desire a strong grounding in statistical concepts with a greater degree of mathematical rigor than in *STAT W1111*. Random variables, probability distributions, pdf, cdf, mean, variance, correlation, conditional distribution, conditional mean and conditional variance, law of iterated expectations, normal, chi-square, F and t distributions, law of large numbers, central limit theorem, parameter estimation, unbiasedness, consistency, efficiency, hypothesis testing, p-value, confidence intervals, maximum likelihood estimation. Serves as the pre-requisite for *ECON W3412*.

Fall 2020: STAT UN1201

Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment |
---|---|---|---|---|---|
STAT 1201 | 001/12844 | M W 11:40am - 12:55pm Online Only | David Rios | 3 | 94/86 |
STAT 1201 | 002/12845 | T Th 11:40am - 12:55pm Online Only | Joyce Robbins | 3 | 86/86 |
STAT 1201 | 003/12846 | M W 10:10am - 11:25am Online Only | Carsten Chong | 3 | 86/86 |
STAT 1201 | 004/12847 | T Th 6:10pm - 7:25pm Online Only | Arian Maleki | 3 | 86/86 |

**STAT UN1202 Undergraduate Seminar.** *1 point*.

Prerequisites: Previous or concurrent enrollment in a course in statistics would make the talks more accessible.

Prepared with undergraduates majoring in quantitative disciplines in mind, the presentations in this colloquium focus on the interface between data analysis, computation, and theory in interdisciplinary research. Meetings are open to all undergraduates, whether registered or not. Presenters are drawn from the faculty of departments in Arts and Sciences, Engineering, Public Health and Medicine.

Fall 2020: STAT UN1202

Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment |
---|---|---|---|---|---|
STAT 1202 | 001/12882 | F 10:10am - 12:00pm Online Only | Ronald Neath | 1 | 18/50 |

**STAT GU4001 Introduction to Probability and Statistics.** *3 points*.

BC: Fulfillment of General Education Requirement: Quantitative and Deductive Reasoning (QUA).

Prerequisites: Calculus through multiple integration and infinite sums.

A calculus-based tour of the fundamentals of probability theory and statistical inference. Probability models, random variables, useful distributions, conditioning, expectations, law of large numbers, central limit theorem, point and confidence interval estimation, hypothesis tests, linear regression. This course replaces SIEO 4150.

Fall 2020: STAT GU4001

Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment |
---|---|---|---|---|---|
STAT 4001 | 001/12888 | T Th 2:40pm - 3:55pm Online Only | Larry Wright | 3 | 120/120 |

## Applied Statistics Concentration Courses

The applied statistics sequence, together with an introductory course, forms the concentration in applied statistics. STAT UN2102 Applied Statistical Computing may be used to satisfy the computing requirement for the major, and the other concentration courses may be used to satisfy the elective requirements for the major. (Students who take STAT GU4205 Linear Regression Models for the major will find that they have covered essentially all of the material in STAT UN2103 Applied Linear Regression Analysis.)

**STAT UN2102 Applied Statistical Computing.** *3 points*.

Corequisites: An introductory course in statistics (STAT UN1101 is recommended).

This course is an introduction to R programming. After learning basic programming components, such as defining variables and vectors, and learning different data structures in R, students will, via project-based assignments, study more advanced topics, such as recursion, conditionals, modular programming, and data visualization. Students will also learn the fundamental concepts in computational complexity, and will practice writing reports based on their statistical analyses.

**STAT UN2103 Applied Linear Regression Analysis.** *3 points*.

CC/GS: Partial Fulfillment of Science Requirement

Prerequisites: An introductory course in statistics (STAT UN1101 is recommended). Students without programming experience in R might find STAT UN2102 very helpful.

Develops critical thinking and data analysis skills for regression analysis in science and policy settings. Simple and multiple linear regression, non-linear and logistic models, random-effects models. Implementation in a statistical package. Emphasis on real-world examples and on planning, proposing, implementing, and reporting.

Fall 2020: STAT UN2103

Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment |
---|---|---|---|---|---|
STAT 2103 | 001/12883 | M W 6:10pm - 7:25pm Online Only | Daniel Rabinowitz | 3 | 58/60 |

**STAT UN2104 Applied Categorical Data Analysis.** *3 points*.

Prerequisites: STAT UN2103 is strongly recommended. Students without programming experience in R might find STAT UN2102 very helpful.

This course covers statistical models and methods for analyzing and drawing inferences for problems involving categorical data. The goals are familiarity with and understanding of a substantial and integrated body of statistical methods that are used for such problems, experience in analyzing data using these methods, proficiency in communicating the results of such methods, and the ability to critically evaluate the use of such methods. Topics include binomial proportions, two-way and three-way contingency tables, logistic regression, log-linear models for large multi-way contingency tables, and graphical methods. The statistical package R will be used.

**STAT UN3105 Applied Statistical Methods.** *3 points*.

Prerequisites: At least one, and preferably both, of STAT UN2103 and UN2104 are strongly recommended. Students without programming experience in R might find STAT UN2102 very helpful.

This course is intended to give students practical experience with statistical methods beyond linear regression and categorical data analysis. The focus will be on understanding the uses and limitations of models, not the mathematical foundations for the methods. Topics that may be covered include random and mixed-effects models, classical non-parametric techniques, the statistical theory of causality, sample survey design, multi-level models, generalized linear regression, generalized estimating equations and over-dispersion, and survival analysis including the Kaplan-Meier estimator, log-rank statistics, and the Cox proportional hazards regression model. Power calculations and proposal and report writing will be discussed.

Fall 2020: STAT UN3105

Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment |
---|---|---|---|---|---|
STAT 3105 | 001/12884 | T Th 11:40am - 12:55pm Online Only | Wayne Lee | 3 | 27/60 |

**STAT UN3106 Applied Data Mining.** *3 points*.

Prerequisites: STAT UN2103. Students without programming experience in R might find STAT UN2102 very helpful.

This course will be taught as a machine learning class. We will cover topics including data-based prediction, classification, specific classification methods (such as logistic regression and random forests), and basics of neural networks. Programming in homeworks will require R; students without programming experience in R might find STAT UN2102 helpful.

## Foundation Courses

The calculus-based foundation courses form the core of the statistics major. These courses are GU4203 Probability Theory, GU4204 Statistical Inference, GU4205 Linear Regression Models, GU4206 Statistical Computing and Introduction to Data Science, and GU4207 Elementary Stochastic Processes. Ideally, students would take Probability Theory or the equivalent before taking either Statistical Inference or Elementary Stochastic Processes, would take Statistical Inference before, or at least concurrently with, Linear Regression Models, and would take Linear Regression Models before, or at least concurrently with, the computing and data science course. A semester of calculus should be taken before Probability Theory, additional semesters of calculus are recommended before Statistical Inference, and a course in linear algebra is strongly recommended before Linear Regression Models. For the more advanced electives in stochastic processes, Probability Theory is an essential prerequisite, and many students would benefit from taking Elementary Stochastic Processes, too. Linear Regression Models and the computing and data science course should be taken before the advanced electives in machine learning and data science. Linear Regression Models is a strongly recommended prerequisite, or at least co-requisite, for the remaining advanced statistical electives.

Code | Title | Points |
---|---|---|
STAT GU4203 | Probability Theory | |
STAT GU4204 | Statistical Inference | |
STAT GU4205 | Linear Regression Models | |
STAT GU4206 | Statistical Computing and Introduction to Data Science | |
STAT GU4207 | Elementary Stochastic Processes | |

## Advanced Statistics Courses

Advanced statistics courses combine theory with methods and practical experience in data analysis. Undergraduates enrolling in advanced statistics courses would be well-advised to have completed STAT GU4203 (Probability Theory), GU4204 (Statistical Inference), and GU4205 (Linear Regression).

**STAT GU4221 Time Series Analysis.** *3 points*.

Prerequisites: STAT GU4205 or the equivalent.

Least squares smoothing and prediction, linear systems, Fourier analysis, and spectral estimation. Impulse response and transfer function. Fourier series, the fast Fourier transform, autocorrelation function, and spectral density. Univariate Box-Jenkins modeling and forecasting. Emphasis on applications. Examples from the physical sciences, social sciences, and business. Computing is an integral part of the course.

Fall 2020: STAT GU4221

Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment |
---|---|---|---|---|---|
STAT 4221 | 001/12556 | T Th 1:10pm - 2:25pm Online Only | Flavio Bartmann | 3 | 5/50 |

**STAT GU4222 Nonparametric Statistics.** *3 points*.

CC/GS: Partial Fulfillment of Science Requirement

Prerequisites: STAT GU4204 or the equivalent.

Statistical inference without parametric model assumptions. Hypothesis testing using ranks, permutations, and order statistics. Nonparametric analogs of analysis of variance. Nonparametric regression, smoothing, and model selection.

**STAT GU4223 Multivariate Statistical Inference.** *3 points*.

Prerequisites: STAT GU4205 or the equivalent.

Multivariate normal distribution, multivariate regression and classification; canonical correlation; graphical models and Bayesian networks; principal components and other models for factor analysis; SVD; discriminant analysis; cluster analysis.

**STAT GU4224 BAYESIAN STATISTICS.** *3.00 points*.

Prerequisites: STAT GU4204 or the equivalent.

Bayesian data analysis: building, fitting, evaluating and improving probability models. Prior information, hierarchical models and combining information. Linear and nonlinear models. Simulation of fake data and evaluation of methods. Computing using R and Stan.

Fall 2020: STAT GU4224

Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment |
---|---|---|---|---|---|
STAT 4224 | 001/12490 | M W 6:10pm - 7:25pm Online Only | Ronald Neath | 3.00 | 37/35 |

**STAT GU4231 Survival Analysis.** *0 points*.

Prerequisites: STAT GU4205 or the equivalent.

Survival distributions, types of censored data, estimation for various survival models, nonparametric estimation of survival distributions, the proportional hazard and accelerated lifetime models for regression analysis with failure-time data. Extensive use of the computer.

**STAT GU4232 Generalized Linear Models.** *3 points*.

CC/GS: Partial Fulfillment of Science Requirement

Prerequisites: STAT GU4205 or the equivalent.

Statistical methods for rates and proportions, ordered and nominal categorical responses, contingency tables, odds-ratios, exact inference, logistic regression, Poisson regression, generalized linear models.

**STAT GU4233 Multilevel Models.** *3 points*.

Prerequisites: STAT GU4205 or the equivalent.

Theory and practice, including model-checking, for random and mixed-effects models (also called hierarchical, multi-level models). Extensive use of the computer to analyse data.

**STAT GU4234 Sample Surveys.** *3 points*.

Prerequisites: STAT GU4204 or the equivalent.

Introductory course on the design and analysis of sample surveys. How sample surveys are conducted, why the designs are used, how to analyze survey results, and how to derive from first principles the standard results and their generalizations. Examples from public health, social work, opinion polling, and other topics of interest.

Fall 2020: STAT GU4234

Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment |
---|---|---|---|---|---|
STAT 4234 | 001/12492 | M W 1:10pm - 2:25pm Online Only | Rongning Wu | 3 | 12/25 |

**STAT GU4241 Statistical Machine Learning.** *3 points*.

Prerequisites: STAT GU4206.

The course will provide an introduction to Machine Learning and its core models and algorithms. The aim of the course is to provide students of statistics with detailed knowledge of how Machine Learning methods work and how statistical models can be brought to bear in computer systems - not only to analyze large data sets, but to let computers perform tasks that traditional methods of computer science are unable to address. Examples range from speech recognition and text analysis through bioinformatics and medical diagnosis. This course provides a first introduction to the statistical methods and mathematical concepts which make such technologies possible.

**STAT GU4261 Statistical Methods in Finance.** *3 points*.

Prerequisites: STAT GU4205 or the equivalent.

A fast-paced introduction to statistical methods used in quantitative finance. Financial applications and statistical methodologies are intertwined in all lectures. Topics include regression analysis and applications to the Capital Asset Pricing Model and multifactor pricing models, principal components and multivariate analysis, smoothing techniques and estimation of yield curves, statistical methods for financial time series, value at risk, term structure models and fixed income research, and estimation and modeling of volatilities. Hands-on experience with financial data.

Fall 2020: STAT GU4261

Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment |
---|---|---|---|---|---|
STAT 4261 | 001/12494 | F 8:40am - 11:25am Online Only | Hammou El Barmi | 3 | 25/25 |

**STAT GU4263 Statistical Inference and Time Series Modelling.** *3 points*.

Prerequisites: STAT GU4204 or the equivalent. STAT GU4205 is recommended.

Modeling and inference for random processes, from natural sciences to finance and economics. ARMA, ARCH, GARCH and nonlinear models, parameter estimation, prediction and filtering. This is a core course in the MS program in mathematical finance.

Fall 2020: STAT GU4263

Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment |
---|---|---|---|---|---|
STAT 4263 | 001/12495 | T Th 6:10pm - 7:25pm Online Only | Haoran Li | 3 | 12/35 |
STAT 4263 | 002/12496 | Sa 10:10am - 12:40pm Online Only | Haoran Li | 3 | 17/35 |

**STAT GU4291 Advanced Data Analysis.** *3 points*.

Prerequisites: STAT GU4205 and at least one statistics course numbered between GU4221 and GU4261.

This is a course on getting the most out of data. The emphasis will be on hands-on experience, involving case studies with real data and using common statistical packages. The course covers, at a very high level, exploratory data analysis, model formulation, goodness of fit testing, and other standard and non-standard statistical procedures, including linear regression, analysis of variance, nonlinear regression, generalized linear models, survival analysis, time series analysis, and modern regression methods. Students will be expected to propose a data set of their choice for use as case study material.

Fall 2020: STAT GU4291

Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment |
---|---|---|---|---|---|
STAT 4291 | 001/12515 | F 6:10pm - 8:40pm Online Only | Demissie Alemayehu | 3 | 25/25 |

## Actuarial Sciences Courses

Only students preparing for a career in actuarial sciences should consider the courses in this section. Such students may also be interested in courses offered through the School of Professional Studies M.S. Program in Actuarial Science, but must check with the academic advisors in their schools to find out whether they are allowed to register for those courses. Students majoring in statistics and preparing for a career in actuarial science may take STAT GU4282 (Linear Regression and Time Series Methods) in place of the major requirement STAT GU4205 (Linear Regression Models).

Code | Title | Points |
---|---|---|
STAT GU4281 | Theory of Interest | |
STAT GU4282 | Linear Regression and Time Series Methods | |

## Advanced Data Science Courses

In response to the ever-growing importance of "big data" in scientific and policy endeavors, the last few years have seen an explosive growth in theory, methods, and applications at the interface between computer science and statistics. The Department offers a sequence that begins with the core course STAT GU4206 (Statistical Computing and Introduction to Data Science) and continues with the advanced electives GU4241 (Statistical Machine Learning) and GU4242 (Advanced Machine Learning), and also the advanced elective STAT GU4243 (Applied Data Science). Undergraduate students without experience in programming would likely benefit from taking the statistical computing and data science course before attempting GU4241, GU4242, or GU4243.

Code | Title | Points |
---|---|---|
STAT GU4241 | Statistical Machine Learning | |
STAT GU4242 | Advanced Machine Learning | |
STAT GU4243 | Applied Data Science | |
STAT GU4702 | Exploratory Data Analysis and Visualization | |

## Advanced Stochastic Processes Courses

The stochastic processes electives in this section have STAT GU4203 (Probability Theory) or the equivalent as a prerequisite. Most students would also benefit from taking STAT GU4207 (Elementary Stochastic Processes) before embarking on the more advanced stochastic processes electives.

Code | Title | Points |
---|---|---|
STAT GU4262 | Stochastic Processes for Finance | |
STAT GU4264 | STOCHASTC PROCSSES-APPLIC | |
STAT GU4265 | Stochastic Methods in Finance | |