From Congressional minutes and legislation to online debates and Twitter comments, textual data can provide information essential for understanding decision-making, public sentiment, and social interaction. Yet real-world data is often messy and highly unstructured. This workshop series will address how to extract high-quality data from online text resources and apply natural language processing (NLP) to turn mined information into impactful summaries and visualizations.
With recent improvements in NLP, users now have many options for solving complex challenges. However, it is not always clear which tools or libraries work best for social science research. In this rapidly evolving field, deciding which methods and libraries to use, and in what order, can be overwhelming. This workshop aims to provide a series of blueprints for best-practice solutions to common tasks in text analytics and natural language processing.
In three two-hour sessions, we will:
- Week 1
  - Extract data from APIs and web pages (see the sketch after this list)
  - Prepare textual data for statistical analysis and machine learning
- Week 2
  - Use machine learning for classification, topic modeling, and summarization
- Week 3
  - Explore and visualize semantic similarities with word embeddings
  - Create a knowledge graph based on named entities and their relations
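As a small preview of the Week 1 material, the sketch below fetches a web page and reduces it to a word-frequency table. It is a minimal illustration under assumed choices, not the workshop's actual code: the URL is a placeholder, and the regex-based tag stripping and tokenization stand in for the more careful preparation steps covered in the sessions.

```python
# Minimal sketch of a Week 1-style pipeline: fetch a web page, strip markup,
# and tabulate word frequencies. The URL and cleaning rules are placeholders,
# not the workshop's actual examples.
import re
from collections import Counter

import requests


def fetch_text(url: str) -> str:
    """Download a page and crudely strip HTML tags."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # Remove script/style blocks first, then all remaining tags.
    html = re.sub(r"(?is)<(script|style).*?</\1>", " ", response.text)
    return re.sub(r"(?s)<[^>]+>", " ", html)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


if __name__ == "__main__":
    text = fetch_text("https://example.com")  # placeholder URL
    for word, count in Counter(tokenize(text)).most_common(10):
        print(f"{word:<15}{count}")
```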
Basic knowledge of Python (including Pandas) and some familiarity with Jupyter notebooks are helpful but not essential. To participate and install the required libraries, you should have access to a computer on which you have administrative privileges.
Event Details:
Dates: Wednesdays, November 3rd, 10th, and 17th
Time: 12:00pm – 2:00pm Central Time
Location: Zoom
Audience: Open to faculty, graduate students, and research staff.
The Zoom meeting link will be sent to registrants on the morning of the session.