SC207-5-FY-CO:
Social Data Science: Code, Text and Networks

The details
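Academic year: 2021/22
Department: Sociology
Location: Colchester Campus
Teaching period: Full Year
Level: Undergraduate: Level 5
Status: Current
Start date: Thursday 07 October 2021
End date: Friday 01 July 2022
Credits: 30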
07 October 2021

 

Requisites for this module
(none)

Key module for

BSC L310 Sociology with Data Science
BSC L311 Sociology with Data Science (including Year Abroad)
BSC L312 Sociology with Data Science (including Placement Year)
BSC L313 Sociology with Data Science (including Foundation Year)

Module description

With research methods changing rapidly in response to the large-scale generation of data within society, social science needs to engage with new digital methods, both to benefit from them and to shape them. The module is designed for first-time programmers from social science and humanities backgrounds, and focuses less on statistics and more on the application and practice of key techniques in social network analysis and text analysis. Skills students will learn include analysing social media data, visualising patterns in social interaction, and unearthing topics and themes in texts using large-scale datasets. We teach skills that are in high demand, from a perspective that emphasises application and real-world questions.

In this module students will learn to combine their growing knowledge about society, social processes and research design, with powerful tools to both draw on and analyse the vast amounts and forms of new social data in a way that is critical, ethical and valuable.

This module provides a practical introduction to a range of methods that rely on intensive computational processing. Students will be taught in Python, a general-purpose, accessible programming language that is popular in data science and used across a vast range of sectors. Students are not expected to have any prior programming experience, making this a valuable opportunity to learn new research techniques as well as a skill that is in great demand.

Using Python, students will learn how to generate their own datasets by drawing on social media platforms and by building custom tools that responsibly scrape data from websites. Students will also learn how to manage, clean and explore very large, varied datasets in preparation for analysis, whilst learning about data ethics and the practical and social responsibilities of handling data. Throughout the module students will be given practical introductions to a range of analytical techniques within Social Network Analysis and Computational Content Analysis. They will learn how to find patterns in data through unsupervised machine learning, topic modelling, document clustering and entity recognition, and how to visualise social networks, find influential people and understand patterns of social connection.
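
As a rough illustration of what "finding influential people" in a social network can mean in practice, the sketch below counts how often each account is mentioned by others and normalises that count (in-degree centrality). It is a minimal sketch using only the Python standard library, not the module's own teaching material: the account names, the `mentions` list and both function names are hypothetical, and in-degree is only one of several possible measures of influence.

```python
from collections import Counter

# Hypothetical (author, mentioned_account) pairs, invented for illustration.
mentions = [
    ("ana", "ben"),
    ("cara", "ben"),
    ("dev", "ben"),
    ("ben", "ana"),
    ("cara", "ana"),
]


def in_degree(edges):
    """Count how many distinct accounts mention each account."""
    # set() drops repeated pairs so each author counts once per target.
    return Counter(target for _source, target in set(edges))


def in_degree_centrality(edges):
    """Divide each in-degree by the number of other accounts in the network."""
    nodes = {name for edge in edges for name in edge}
    if len(nodes) < 2:
        return {name: 0.0 for name in nodes}
    counts = in_degree(edges)
    return {name: counts[name] / (len(nodes) - 1) for name in nodes}


centrality = in_degree_centrality(mentions)
most_mentioned = max(centrality, key=centrality.get)
print(most_mentioned, centrality[most_mentioned])
```

A score of 1.0 means every other account in the network mentions that account at least once; a score of 0.0 means nobody does.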

Module aims

The course aims to provide students with:

1. A basic knowledge of the Python programming language.
2. The ability to acquire data both through APIs and the web.
3. Knowledge and understanding of cleaning, managing and reporting on large datasets.
4. The ability to perform basic Social Network Analysis, and visualise networks using Gephi.
5. The ability to perform basic text analysis and topic modelling.
6. Knowledge and understanding of the legal and ethical issues surrounding computational social science practice.

Module learning outcomes

By the end of the course, students should:

1. Have a fundamental proficiency in the Python programming language.
2. Be able to generate new datasets through the use of a social media API.
3. Be able to store, query and clean large datasets of varied data.
4. Understand the impact of different pre-processing techniques on later analysis outcomes.
5. Be able to visualise and measure social networks using Gephi.
6. Be able to find themes and patterns in textual data using topic modelling and document clustering.
7. Be able to clearly communicate method and findings both through visualisations and written reports.
8. Understand the ethical and legal dimensions of computational social science.
9. Be able to situate computational techniques within broader principles of research design in the social sciences.
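
The point in outcome 4, that pre-processing choices shape later results, can be illustrated with a small term-counting sketch. It is a hedged example rather than part of the module materials: the two-sentence corpus, the `STOP_WORDS` set and the `top_terms` function are all hypothetical, and real projects would typically use a fuller stop-word list and tokeniser.

```python
from collections import Counter
import re

# Hypothetical mini-corpus and stop-word list, invented for illustration.
documents = [
    "The Data are big. The data are messy!",
    "Big data, big questions.",
]
STOP_WORDS = {"the", "are"}


def top_terms(docs, lowercase=False, remove_stop_words=False, n=2):
    """Return the n most frequent tokens under the chosen pre-processing."""
    counts = Counter()
    for doc in docs:
        text = doc.lower() if lowercase else doc
        # Keep runs of letters only, which discards punctuation.
        tokens = re.findall(r"[A-Za-z]+", text)
        if remove_stop_words:
            tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
        counts.update(tokens)
    return counts.most_common(n)


raw = top_terms(documents)
cleaned = top_terms(documents, lowercase=True, remove_stop_words=True)
print("raw:", raw)
print("cleaned:", cleaned)
```

Without pre-processing the most frequent terms here are the function words "The" and "are"; after lowercasing and stop-word removal they are "data" and "big", so the same texts yield a different picture of what the corpus is about.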

Module information

Outline Syllabus

Autumn

Fundamentals

Week 2 Session 1 - What is Computational Social Science? Let's Get Started
Week 3 Session 2 - Python Fundamentals: Loops, Lists and Strings, oh my!
Week 4 Session 3 - Python Fundamentals: Functions, scopes and objects
Week 5 Session 4 - Structuring and managing data with Pandas
Week 6 Session 5 - Exploring and visualising data with Pandas

APIs & Social Network Analysis

Week 7 Session 6 - The practice and problems of gathering Twitter data
Week 8 Session 7 - Exploring and summarising Twitter data
Week 9 Session 8 - Restructuring your data into a network with NetworkX
Week 10 Session 9 - Social Network Analysis with Gephi

Catch-up

Week 11 Session 10 - Catch-up and code surgery

Spring

Web scraping

Week 16 Session 11 - Understanding HTML and website structures
Week 17 Session 12 - Restructuring a webpage into a dataset
Week 18 Session 13 - Automated and robust, polite and ethical web scraping

Text Mining

Week 19 Session 14 - Introduction to text as data and entities
Week 20 Session 15 - Analysing and summarising collections of text
Week 21 No Session - Reading Week – Independent Project Time
Week 22 Session 16 - From terms to values: preparing text for AI analysis and discovering significant terms
Week 23 Session 17 - Finding themes in text with topic models
Week 24 Session 18 - Testing and refining your topic models for accurate results

Catch-up

Week 25 Session 19 - Catch-up and code surgery

Summer

Presentations

Weeks 31/32 - Students will be assigned to one of the scheduled presentation sessions.

Please click on the link below to view the Introduction video to SC207 Computational Social Science

Learning and teaching methods

Teaching approach

As there are still restrictions related to COVID-19 in place, some of the teaching on most modules will take place online. Most modules in Sociology are divided into a lecture of around 50 minutes and a class of around 50 minutes. Some are taught as a two-hour seminar, and others via a 50-minute lecture and a two-hour lab. For the majority of modules the lecture-type content will be delivered online, either timetabled as a live online session or available on Moodle in the form of pre-recorded videos. You will be expected to watch this material and engage with any suggested activities before your class each week. Most classes, labs and seminars will be taught face-to-face (assuming social distancing allows this).

Please note that you should be spending up to eight hours per week undertaking your own private study (reading, preparing for classes or assignments, etc.) on each of your modules (e.g. 32 hours in total for four 30-credit modules).

This module, SC207-5-FY, will include a range of activities to help you and your teachers check your understanding and progress. These are: online quizzes, activities and coding exercises provided by DataCamp, independent research projects using taught techniques, and student-produced video presentations.

The lecture-type videos discuss particular techniques and approaches to deepen students' understanding of the practical methods taught. Live coding sessions will provide step-by-step demonstration, explanation and opportunity for practical application of computational methods. The live coding sessions will take place on Zoom or face-to-face (should this be deemed safe). You are strongly encouraged to attend the sessions, as they provide an opportunity to talk with your teacher and other students. The sessions will be recorded and made available for you to watch or listen to again. However, if you want to gain the most you can from these sessions, it is very important that you attend and engage.

Bibliography

  • Analyzing Social Media Data in Python | DataCamp, https://www.datacamp.com/courses/analyzing-social-media-data-in-python
  • Pandas Foundations | DataCamp, https://www.datacamp.com/courses/pandas-foundations
  • Researchers just released profile data on 70,000 OkCupid users without permission - Vox, https://www.vox.com/2016/5/12/11666116/70000-okcupid-users-data-release
  • Feature engineering for NLP | DataCamp, https://learn.datacamp.com/courses/feature-engineering-for-nlp-in-python
  • Web Scraping with Python | DataCamp, https://www.datacamp.com/courses/web-scraping-with-python
  • Python: Network Analysis | DataCamp, https://www.datacamp.com/courses/network-analysis-in-python-part-1
  • Natural Language Processing Fundamentals in Python | DataCamp, https://www.datacamp.com/courses/natural-language-processing-fundamentals-in-python
  • Python Functions | DataCamp, https://www.datacamp.com/courses/python-data-science-toolbox-part-1
  • Veltri, Giuseppe A. (2019-10-25) Digital Social Research: John Wiley and Sons Ltd.
  • GEPHI – Introduction to Network Analysis and Visualization, http://www.martingrandjean.ch/gephi-introduction/

The above list is indicative of the essential reading for the course. The library makes provision for all reading list items, with digital provision where possible, and these resources are shared between students. Further reading can be obtained from this module's reading list.

Assessment items, weightings and deadlines

Coursework / exam   Description          Deadline     Weighting
Coursework          Coding Task 1        19/11/2021   30%
Coursework          Report 1             28/01/2022   30%
Coursework          Coding Task 2        18/02/2022   10%
Coursework          Video Presentation   13/05/2022   20%
Coursework          Report 2             20/05/2022   20%

Overall assessment

Coursework Exam
100% 0%

Reassessment

Coursework Exam
100% 0%

Module supervisor and teaching staff

Dr James Allen-Robertson, email: jallenh@essex.ac.uk.
Dr Valentin Danchev, email: valentin.danchev@essex.ac.uk.
Dr James Allen-Robertson, Dr Valentin Danchev
Jane Harper, Student Administrator, Telephone: 01206 873052 E-mail: socugrad@essex.ac.uk

 

Availability
Yes
Yes
Yes

External examiner

Dr Lorien Jasny

Resources

Available via Moodle
Of 2312 hours, 1 (0%) hours available to students:
2311 hours not recorded due to service coverage or fault;
0 hours not recorded due to opt-out by lecturer(s).

 

Further information
Sociology

Disclaimer: The University makes every effort to ensure that the information in its Module Directory is accurate and up to date. Exceptionally, it can be necessary to make changes, for example to programmes, modules, facilities or fees. Reasons for such changes might include a change of law or regulatory requirements, industrial action, lack of demand, departure of key personnel, change in government policy, or withdrawal or reduction of funding. Changes to modules may, for example, consist of variations to the content and method of delivery or assessment of modules and other services, the discontinuation of modules and other services, and the merging or combining of modules. The University will endeavour to keep such changes to a minimum, and will keep students informed appropriately by updating its programme specifications and Module Directory.

The full Procedures, Rules and Regulations of the University governing how it operates are set out in the Charter, Statutes and Ordinances and in the University Regulations, Policy and Procedures.