INTRODUCTION TO DATA SCIENCE BOOTCAMP

About the workshop
Data science has become a central component in most analytical and decision-making processes. This week-long bootcamp presents the foundations of data science, providing a broad and solid introduction to this important field.
Through a hands-on methodology, students will learn to use mathematical and computational tools to solve real-world problems involving massive data. In particular, they will see how data science works in practice and how to tackle issues in data cleaning, visualization, prediction, and pattern recognition.

Who is this workshop for
Introduction to Data Science is for anyone interested in improving their skills to tackle data problems in industry, government, financial markets, and academia. Familiarity with basic statistics, linear algebra, and programming is desirable but not mandatory.

Data Science in Practice
Theoretical lectures are followed by practical sessions (Lab classes) using real data. Students are strongly encouraged to bring their laptops to class to put what they have learned into practice. The following software will be used during the labs, and we recommend installing it on your personal computer:

1. Python 3 and packages: numpy, scipy, matplotlib, sklearn, altair
2. Jupyter Notebook
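
Before the first lab, a quick check along the following lines can confirm that the packages listed above are importable. This is only a sketch: the helper name missing_packages is hypothetical, and the import names are assumed to match the list above (scikit-learn is imported as sklearn).

```python
import importlib.util

# Import names taken from the software list above.
REQUIRED = ["numpy", "scipy", "matplotlib", "sklearn", "altair"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

Running the script inside the same Python environment that Jupyter Notebook uses gives the most reliable answer, since packages installed elsewhere will not be visible to the notebooks.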

Event Coordinators

  • Claudio Silva
  • Luis Gustavo Nonato

 

SCHEDULE

09/07 – Monday

9h - 10h30: Programming for Data Science - Part I
11h - 12h30: Programming for Data Science - Part II
14h - 16h: Lab Class - Handling Real Messy Data
16h15 - 17h15: Talk 1
17h30 - 18h: Opening Session

10/07 – Tuesday

9h - 10h30: Machine Learning for Data Science - Part I
11h - 12h30: Machine Learning for Data Science - Part II
14h - 16h: Lab Class - Predicting and Classifying Urban and Social Phenomena
16h15 - 17h15: Talk 2

11/07 – Wednesday

9h - 10h30: Big Data - Part I
11h - 12h30: Big Data - Part II
14h - 16h: Lab Class
16h15 - 17h15: Talk 3

12/07 – Thursday

9h - 10h30: Data Visualization - Part I
11h - 12h30: Data Visualization - Part II
14h - 16h: Lab Class
16h15 - 17h15: Talk 4

13/07 – Friday

9h - 10h30: Deep predictions using deep networks
16h15 - 17h15: Talk 5

Registration open until 02/07/2018

Date: July 09 – 13, 2018
Time: 9h - 18h
Workload: 40h

:: Fees

R$ 4,000.00 general public
R$ 1,000.00 FGV students and employees
R$ 2,000.00 FGV alumni

:: Payment options
2 interest-free installments, or payment in full by bank slip (boleto)
Special payment terms until 30/05

Contents

The data science bootcamp comprises thirty hours of activities split into the following themes:

Programming for Data Science

  • Python data types, syntax, and control flow
  • Pandas: dealing with real data
  • Numpy: first steps towards data analysis
  • Matplotlib: simple data visualization

Machine Learning for Data Science

  • Regression Methods: predictions made simple
  • Clustering: finding patterns in the city
  • Classification: classifying urban phenomena
  • Regression and Classification Trees: empowering your analytical tools

Big Data

  • MapReduce Algorithms and Design Patterns
  • Relational model and Big Data
  • Mining spatio-temporal datasets

Data Visualization

  • The Value of Visualization
  • Introduction to Altair
  • Visualizing financial time series
  • Visualization for communication of results

Deep Learning

  • The basics of deep learning: from shallow to deep models
  • Deep Learning vs. Multilayer Perceptron: empowered predictions
  • Convolutional Networks
  • Learning tricks: batch normalization, dropout and optimization
  • Autoencoders for unsupervised deep learning
  • TensorFlow implementation
 

Speakers' Short Biographies

Claudio Silva
Cláudio Silva is a professor of computer science and engineering and data science at New York University. He received a BS in mathematics from the Federal University of Ceará (Brazil) in 1990, and a PhD in computer science from the State University of New York at Stony Brook in 1996. He has held positions in academia and industry, including at AT&T, IBM, Lawrence Livermore, Sandia, and the University of Utah. Cláudio has advised and/or mentored over 45 PhD students, MS students, and post-doctoral associates. He has published over 250 research publications, is an inventor on 12 US patents, and has co-authored 14 papers that received “Best Paper Awards” (including honorable mentions). He has over 14,000 citations according to Google Scholar. He is an IEEE Fellow and was the recipient of the 2014 Visualization Technical Achievement Award. He worked on the research and development team for MLB.com's Statcast, the Major League Baseball (MLB) player tracking system that won the Alpha Award for Best Analytics Innovation/Technology at the 2015 MIT Sloan Sports Analytics Conference and a 2017 Technology & Engineering Emmy Award from the National Academy of Television Arts & Sciences (NATAS).

Juliana Freire
Juliana Freire is a Professor of Computer Science and Engineering and Data Science at New York University. She holds an appointment at the Courant Institute of Mathematical Sciences and is a faculty member at the NYU Center for Urban Science and at the NYU Center for Data Science. She is the executive director of the NYU Moore-Sloan Data Science Environment, chair of ACM SIGMOD, and a council member of the Computing Community Consortium (CCC). Her recent research has focused on big-data analysis and visualization, large-scale information integration, web crawling and domain discovery, provenance management, and computational reproducibility. Prof. Freire is an active member of the database and Web research communities, with over 180 technical papers, several open-source systems, and 12 U.S. patents. She is an ACM Fellow and a recipient of an NSF CAREER award, two IBM Faculty awards, and a Google Faculty Research award. She has chaired or co-chaired workshops and conferences, and participated as a program committee member in over 70 events. Her research has been funded by the National Science Foundation, DARPA, Department of Energy, National Institutes of Health, Sloan Foundation, Gordon and Betty Moore Foundation, W. M. Keck Foundation, Google, Amazon, AT&T, the University of Utah, New York University, Microsoft Research, Yahoo! and IBM.

Jorge Poco
Jorge Poco is an assistant professor in the Research and Innovation Center in Computer Science (RICS) at the San Pablo Catholic University (UCSP). Previously, he was a research associate in the Interactive Data Lab (IDL) at the University of Washington. He obtained his PhD from the NYU Polytechnic School of Engineering in 2015. He has an M.S. in Computer Science from the Instituto de Ciências Matemáticas e de Computação (ICMC) at the University of São Paulo (USP), Brazil (2010), and a B.E. in Systems Engineering from the National University of San Agustin (UNSA), Peru (2008). He has also worked at zAgile Inc. (2008), Google Inc. (2008 and 2010), Kitware Inc. (2011), Oak Ridge National Laboratory (2012), and Xerox Research (2013). His research focuses on data visualization, visual analytics, and data science. He has participated in projects on information visualization, scientific visualization, and visual analytics, and in interdisciplinary collaborations that developed novel visualization methods for climate and urban data analysis.

Luis Gustavo Nonato
Luis Gustavo Nonato received his PhD degree in Applied Mathematics from the Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro - Brazil, in 1998. His research interests include visualization, visual analytics, geometric computing, and data science. Nonato is a full professor at the Institute of Mathematical and Computer Sciences - University of São Paulo, São Carlos, Brazil, and he is currently a visiting professor at the Center for Data Science - New York University, New York, USA. From 2008 to 2010 Nonato was a visiting scholar at the Scientific Computing and Imaging Institute - University of Utah, Salt Lake City, USA. Besides having served on several program committees, including IEEE SciVis, IEEE InfoVis, and EuroVis, he was associate editor of Computer Graphics Forum from April 2011 to March 2014. Nonato is currently the co-editor of the SBMAC SpringerBriefs in Applied Mathematics and Computational Sciences. He was also the president of the Special Committee on Computer Graphics and Image Processing of the Brazilian Computer Society from October 2011 to September 2013. Nonato has advised 9 PhD and 14 MS students, mentored 5 post-doctoral associates, and published over 120 journal and conference papers, some of which were awarded "Best Paper" and "Honorable Mention" at major conferences such as IEEE InfoVis, PacificVis, and Sibgrapi.

Moacir Ponti
Moacir Antonelli Ponti received his PhD (2008) and MSc (2004) degrees from the Universidade Federal de São Carlos, Brazil. He is currently a Professor at the Institute of Mathematical and Computer Sciences, Universidade de São Paulo, Brazil. During 2016 he was a visiting scholar at the Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, UK. He has been principal investigator of projects funded by the Brazilian research agencies CNPq and FAPESP, as well as by international ones such as UGPN. He also received a Latin America Research Award from Google in 2017. He is the author of more than 40 papers in peer-reviewed journals and conferences, and his current research interests include signal, image, and video processing, in particular representation learning, feature extraction, and deep learning.

 

Location
FGV - FGV Auditorium
Avenida Nove de Julho, 2.029 – Bela Vista
São Paulo/SP

For more information

Phone: 11.3799-3350
E-mail: economia@fgv.br
