Program Details
Program Dates & Times
June 12, 2017 to August 4, 2017 (8 weeks)
Mondays, Wednesdays & Thursdays: 10:30am - 6:30pm
Tuesdays: 10:30am - 8:30pm
Fridays: 10:30am - 5:00pm

Dinner provided on Tuesdays, and lunch provided on Thursdays
Number of Participants
We typically have capacity for eight students.
Target Audience
Upper-level undergraduate students attending college in the New York City area who are interested in attending computer science graduate school, and who would benefit from an intensive introduction to data science. We seek to increase diversity in computer science, and so we especially encourage women, minorities, individuals with disabilities, and students from smaller colleges to apply to the program.
Course Description
This introduction to data science will cover tools and techniques for acquiring, cleaning, and utilizing real-world data for research purposes. In contrast to traditional course work, where one is often handed a prepackaged dataset obtained by a third party and prepared for a specific exercise, research projects typically involve not only cleaning and preparing "messy" data, but also acquiring that data oneself (e.g., through an API). The initial phase of these projects involves a good deal of exploratory analysis to gain a preliminary understanding of the dataset. Students will be introduced to scripting (on the command line and with Python and R) for these purposes, and will gain direct experience in acquiring and modeling data from online sources.

The course also serves as an introduction to problems in applied statistics and machine learning. We will cover the theory behind simple but effective methods for supervised and unsupervised learning. Emphasis will be on formulating real-world modeling and prediction tasks as optimization problems and comparing methods in terms of practical efficacy and scalability. Students will learn to fit and evaluate such models, with applications including spam filtering and recommendation systems.
All course material from the 2016 program is available on GitHub.
Research Projects
Students will break up into two groups of four students each. Each group will be led by two Microsoft research scientists, and the groups will collectively select a data-driven project to work on during the summer.