Costa Rica National High Technology Center
December 4-8, 2017
The Costa Rica National Research and Education Network (RedCONARE) and the Advanced Computing Laboratory (CNCA) at the Costa Rica National High Technology Center (CeNAT) are proud to host the first Costa Rica Big Data School.
The main goal of this event is to train students, teachers, and researchers of the Costa Rica public education system in core Big Data topics: R, Python, Hadoop, Spark, Data Analytics, Data Visualization, Machine and Deep Learning, and more.
Big Data is an emerging field in Computer Science that is growing quickly because of the breadth of its applications. That is why we decided to host the first Big Data school in Costa Rica, free of charge for those involved in the public education system. We hope you enjoy what we have spent months preparing and take full advantage of it.
Ing. Mariano José Sánchez Bontempo
RedCONARE Scientific Coordinator
Costa Rica Big Data School Chair
Speakers
Antonio González Torres, Ph.D.
Assistant Professor at Tecnológico de Costa Rica and ULACIT (Big Data Visual Analytics)
Biography: Dr. González-Torres has experience in designing and programming visual analytics tools for uncovering patterns in the testing of microprocessors (Intel), cyber threat events (Equifax), the evolution of software projects (Ph.D. thesis) and the analysis of security data.
He received his Master's and Ph.D. degrees from the Universidad de Salamanca and is currently an assistant professor in the Computer Engineering program (Ingeniería en Computadores) at the Costa Rica Institute of Technology and at ULACIT.
Luis L. Pérez, Ph.D.
Chief Data Scientist at Singularities
Biography: Dr. Pérez is the Chief Data Scientist at Singularities, a Costa Rican company dedicated to developing large-scale machine learning and artificial intelligence solutions for the manufacturing, consumer goods, and finance industries. He also teaches distributed databases at Cenfotec.
He obtained his Ph.D. in Computer Science from Rice University in Houston, TX with research on distributed databases and applied machine learning. He is a co-recipient of the IEEE ICDE 2017 Best Paper Award for work on scalable linear algebra over distributed relational databases.
Paul Rodriguez, Ph.D.
Research Analyst at San Diego Supercomputer Center
Biography: Dr. Rodriguez received his Ph.D. in Cognitive Science from the University of California, San Diego (UCSD) in 1999. He spent several years doing research in neural network modeling, dynamical systems simulations, time series analysis, and statistical methods for analysis and prediction with fMRI data. More recently, he has worked on data mining for health care fraud identification and on the optimization of data-intensive network flow models.
Mai H Nguyen, Ph.D.
Lead for Data Analytics at San Diego Supercomputer Center
Biography: Dr. Nguyen has extensive industry and academic experience in machine learning, data mining, business intelligence, data warehousing, and software design & development. She is a data scientist at the San Diego Supercomputer Center (SDSC) at the University of California, San Diego (UCSD), where she works on combining machine learning algorithms with distributed computing to process large-scale data. She has worked in many application areas, including remote sensing, personalized medicine, image analysis, and speech recognition. She has M.S. and Ph.D. degrees in Computer Science from UCSD, with focus on machine learning and artificial intelligence.
Intel Costa Rica Data Science
Intel Costa Rica Data Science is a group of people who share an interest in facilitating learning, information sharing, ideation, and the adoption of Data Science within Intel Costa Rica.
This Data Science group is part of the Intel Costa Rica Communities of Practice (CoP) effort. The Data Science CoP is divided into three main areas: Big Data, Analytics, and Business Intelligence. Its members provide training, give technical talks, and collaborate on projects within Intel on topics related to Data Science.
Instructors
Mahidhar Tatineni, Ph.D.
User Support Group Lead at San Diego Supercomputer Center
Biography: Dr. Tatineni joined the San Diego Supercomputer Center (SDSC) in 2005. Before that, he obtained a Master's and a Ph.D. degree in Aerospace Engineering from the University of California, Los Angeles (UCLA). He received his Bachelor's degree in Aerospace Engineering from the Indian Institute of Technology Madras.
He currently leads the User Services group at SDSC and has carried out many optimization and parallelization projects on SDSC supercomputing resources, including Gordon.
Robert Sinkovits, Ph.D.
Applications Lead for the Gordon project at San Diego Supercomputer Center
Biography: Dr. Sinkovits is responsible for ensuring that data-intensive problems can make effective use of this unique system. He moved into this role after five years with the Baker Cryoelectron Microscopy Laboratory at UCSD, where he was the lead developer of the AUTO3DEM single particle image reconstruction system and of IHRSR++, an enhanced version of Ed Egelman's original helical reconstruction software.
In addition to serving as the Gordon Applications Lead, he works actively with Michael Gilson's lab on developing fast conformational sampling algorithms and with Jonathan Myers of the LSST on asteroid tracking.
Agenda
DAY & TIME | Monday 4th | Tuesday 5th | Wednesday 6th | Thursday 7th | Friday 8th |
---|---|---|---|---|---|
8:00am – 8:30am | Registration | ||||
8:30am – 10:00am | Inauguration (Ing. Sánchez Bontempo) | Introduction to Python (Dr. Sinkovits) | Intermediate Python (Dr. Sinkovits) | Data Analysis with Python (Dr. Sinkovits) | Data Visualization with Python (Dr. Sinkovits) |
10:00am – 10:30am | Morning Break | ||||
10:30am – 12:00pm | Keynote: Machine Learning from Small to Big Data (Dr. Pérez) | Introduction to Hadoop (Dr. Tatineni) | Deep Dive into Hadoop (Dr. Tatineni) | Data Analysis with Spark (Dr. Tatineni) | Data Visualization with R (Dr. Sinkovits) |
12:00pm – 1:00pm | Lunch | ||||
1:00pm – 2:30pm | Big Data Fundamentals (Intel Costa Rica Data Science) | Introduction to R (Dr. Sinkovits) | Programming with R (Dr. Sinkovits) | Data Analysis with R (Dr. Tatineni) | Deep Learning on SDSC Comet (Dr. Rodriguez & Dr. Nguyen) |
2:30pm – 3:00pm | Afternoon Break | ||||
3:00pm – 4:30pm | Big Data Fundamentals (Intel Costa Rica Data Science) | Introduction to Spark (Dr. Tatineni) | Programming with Spark (Dr. Tatineni) | Future Technologies (Dr. Tatineni) | Data Visualization with Spark (Dr. Tatineni) |
Material
Presentations
- Big Data Visual Analytics (Dr. González Torres)
- Machine Learning from Small to Big Data (Dr. Pérez)
- Big Data Fundamentals (Intel Costa Rica Data Science Community of Practice)
- Introduction to Hadoop (Dr. Tatineni)
- Introduction to Spark (Dr. Tatineni)
- Hadoop Deep Dive (Dr. Tatineni)
- Spark Programming (Dr. Tatineni)
- Machine Learning Overview (Dr. Sinkovits)
- Future Technologies (Dr. Tatineni)
- Deep Learning (Dr. Rodriguez) & Transfer Learning with CNN (Dr. Nguyen)
- Data Visualization with Spark (Dr. Tatineni)
Repositories
- Python Series (Dr. Sinkovits)
- R Series (Dr. Sinkovits)
- Spark Hands-on (Dr. Tatineni)
- R Hands-on (Dr. Tatineni)
- SDSC Summer Institute 2017
Registration
Tuition fee
Participation is free: there are no tuition costs for participants affiliated with CONARE institutions.
Maximum quota
The maximum quota is 50 participants.
Application
The application form must be completed in full no later than Wednesday, November 22nd. Accepted participants will be notified by email on Friday, November 24th.
Important dates
- Opening of applications for the School: November 1st.
- Application deadline: November 22nd.
- Notification of acceptance or rejection: November 24th.
Requirements
- Be a student, professor, or researcher at one of the public universities (UCR, TEC, UNA, UNED, UTN), at CONARE, or at one of its affiliated programs: CeNAT, PEN, and SINAES.
- Have an intermediate level of English (reading and listening). Some presentations and exercises may be in English.
- Have basic programming skills (knowledge of R and Python is desirable) and basic Linux skills.
Scholarships
The Costa Rica Big Data School will offer a scholarship program for students of Costa Rica's public universities who live outside the Greater Metropolitan Area (GAM).
Application
The scholarship application form must be completed in full no later than Sunday, November 19th. Scholarship recipients will be notified by email on Monday, November 20th.
Important dates
- Opening of scholarship applications: November 1st.
- Scholarship application deadline: November 19th.
- Notification of acceptance or rejection of the scholarship: November 20th.
Requirements
- Be an active graduate student who has completed at least one year of their degree program at a regional campus, located outside the Greater Metropolitan Area (GAM), of one of the five public universities (UCR, TEC, UNA, UNED, UTN).
- Have an intermediate level of English (reading and listening). All presentations and exercises will be in English.
- Have basic programming skills (knowledge of R and Python is desirable) and basic Linux skills.
- Participate in all School activities. If the scholarship holder does not attend all talks and workshops, the organizers may require a refund of the full scholarship amount.
- Attach the following documents in a single PDF file no larger than 10 MB. Applications with incomplete documents will not be accepted:
  - A letter of interest from the applicant explaining why they want to participate in the School.
  - A letter from the Director or Coordinator of the applicant's degree program, stating that the applicant is an active student in that program and university. The letter should specify the campus where the applicant is enrolled.
  - Two letters of recommendation from faculty members.
Organizers
The Advanced Computing Laboratory (CNCA) is a multidisciplinary space where scientific discovery is accelerated through an advanced computing infrastructure. This infrastructure includes not only specialized, up-to-date hardware, but also a set of efficient applications and a well-trained staff to take full advantage of this technology. This allows the CNCA to work in its main areas: research project development, training, and service provision.