Discovery Engines Workshop: Python and Pandas

By Rob Mitchum // January 23, 2015

Recent years have brought an explosion of software and tools for working with data, making once-arduous tasks such as data processing, analysis, machine learning, and visualization easier and more accessible. But as the data science toolbox grows, it becomes harder to find the right tool for the job at hand, let alone get the most out of it. “Discovery Engines: Under the Hood” is a new monthly workshop series organized by the Computation Institute (CI) that offers practical, hands-on instruction in new and popular computational tools. Attendees will learn from experienced users at CI research centers about programming libraries, web platforms, visualization software, and analytic techniques they can use in their own research and data projects.

Our first workshop focused on Statistical Learning with Python and pandas, using the programming language and its data analysis library to structure, plot, and analyze data. Misha Teplitskiy led the session. He is a sociology graduate student, a Knowledge Lab researcher, and a Data Science for Social Good alum. He used IPython notebooks to walk attendees through the core methods pandas provides for working with data and statistics. Teplitskiy’s complete presentation is available below, and the IPython notebooks used in the workshop can be downloaded from GitHub.
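The sketch below gives a rough idea of the kind of pandas workflow the session covered. It is not taken from Teplitskiy’s notebooks, and the data are made up for illustration. It builds a small DataFrame and summarizes it with `groupby` and `describe`. It then fits a simple least-squares line for each group with NumPy’s `polyfit`. The column names and the `fit_line` helper are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: hours studied and exam scores for two groups
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "hours": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "score": [52.0, 58.0, 64.0, 60.0, 65.0, 70.0],
})

# Structure: split-apply-combine, averaging scores within each group
group_means = df.groupby("group")["score"].mean()

# Analyze: count, mean, std, quartiles, etc. for the numeric columns
summary = df[["hours", "score"]].describe()

# Hypothetical helper: ordinary least-squares line (score ~ hours) for one group
def fit_line(sub: pd.DataFrame) -> pd.Series:
    slope, intercept = np.polyfit(sub["hours"], sub["score"], deg=1)
    return pd.Series({"slope": slope, "intercept": intercept})

# One row of fitted coefficients per group
fits = df.groupby("group")[["hours", "score"]].apply(fit_line)

print(group_means)
print(fits)
```

With matplotlib installed, a call such as `df.plot.scatter(x="hours", y="score")` would cover the plotting step as well. It is left out here so the sketch depends only on pandas and NumPy.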

Future Discovery Engines: Under the Hood workshops will cover web scraping, cloud-based data workflows, natural language processing, and reproducible research. Stay tuned for announcements, or e-mail us to be added to the mailing list.