GV918-7-AU-CO:
Data for Social Data Science
PLEASE NOTE: This module is inactive. Visit the Module Directory to view modules and variants offered during the current academic year.
2025/26
Government
Colchester Campus
Autumn
Postgraduate: Level 7
Inactive
Thursday 02 October 2025
Friday 12 December 2025
30
08 August 2024
Requisites for this module
(none)
(none)
(none)
(none)
(none)
This module introduces principles and applications of the electronic storage, structuring, manipulation, transformation, extraction, and dissemination of data. In the age of "Big Data", a vast amount of data is generated every day, and computational social scientists equipped with the right set of skills can obtain valuable insights attainable only through a data-driven approach. This module aims to provide an opportunity to learn such skills through programming in Python.
We focus on four key aspects of data management. The first is the study of the various types and shapes of data, and of how to clean and transform them so that they are fit for later analysis. The second is data acquisition. Most data are now stored electronically on the Internet. We will learn what data are available online and how to obtain them, both by scraping websites and by accessing the APIs of online databases and social network services. The third is data storage, in particular databases in both relational and non-relational forms. The module covers the fundamental concepts of databases and how to create, populate, modify, and query relational databases. Lastly, the module uses a project-based learning approach that includes group-based collaboration, an essential ingredient of modern data science projects. We will learn various collaboration and management tools, such as shared computational environments on the cloud and version control tools.
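As an illustration of the relational database operations named above (create, populate, modify, query), the following is a minimal sketch using Python's built-in sqlite3 module. The table name `respondents`, its columns, and the sample rows are hypothetical and are not taken from the module materials; the module itself may use a different database system.

```python
import sqlite3


def build_demo_db():
    """Create, populate, and modify a small in-memory relational database."""
    conn = sqlite3.connect(":memory:")  # temporary database, nothing written to disk
    # Create: define a table (hypothetical survey respondents).
    conn.execute(
        "CREATE TABLE respondents ("
        "id INTEGER PRIMARY KEY, country TEXT NOT NULL, age INTEGER)"
    )
    # Populate: insert made-up rows using parameter placeholders.
    rows = [(1, "UK", 34), (2, "UK", 51), (3, "Japan", 28)]
    conn.executemany(
        "INSERT INTO respondents (id, country, age) VALUES (?, ?, ?)", rows
    )
    # Modify: correct one respondent's age.
    conn.execute("UPDATE respondents SET age = ? WHERE id = ?", (29, 3))
    conn.commit()
    return conn


def mean_age_by_country(conn):
    """Query: average age per country, sorted by country name."""
    cur = conn.execute(
        "SELECT country, AVG(age) FROM respondents "
        "GROUP BY country ORDER BY country"
    )
    return cur.fetchall()


conn = build_demo_db()
print(mean_age_by_country(conn))
```

Using `?` placeholders rather than string formatting keeps the values separate from the SQL text, which is the usual way to avoid injection problems when the values come from scraped or user-supplied data.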
The aims of this module are:
- To provide the following knowledge and comprehension of the basics of modern data science, through lectures and hands-on coding classes:
- An overview of the data lifecycle in social data science, from acquisition and pre-processing to storage and analysis
- Knowledge of collaborative working spaces such as shared computing environments and version control systems
- A general review of cloud computing
- Basic principles of machine learning
By the end of the module, students will be:
1. Able to work with data sets using the Python programming language, and to summarise and visualise the data
2. Able to work with colleagues securely and effectively using an online collaborative working space
3. Familiar with how to set up a cloud computing environment and able to judge when to move to the cloud
4. Capable of implementing online data collection projects for their research and of managing and handling large data sets
5. Equipped with an understanding of the fundamentals of machine learning, which is essential to the next steps of their data science learning
Advisory Note
Many of the assignments in this course are done through Python programming, which is very difficult for programming beginners to follow without some preparation before the start of term. Students who do not have previous experience in statistical programming are strongly encouraged to consult the module supervisor.
Students are expected to have a basic understanding of statistical analysis, demonstrated by successful completion of a module in introductory statistics.
This module will be delivered via
- 4 hours per week
- 2hr lectures per week
- 2hr class per week
- Vanderplas, J.T. (2016) Python Data Science Handbook. 1st edn. Available at: https://jakevdp.github.io/PythonDataScienceHandbook/.
- James, G. et al. (2021) An Introduction to Statistical Learning. 2nd edn. Available at: https://link-springer-com.uniessexlib.idm.oclc.org/book/10.1007/978-1-0716-1418-1.
- Teate, R.M. (2021) SQL for data scientists: a beginner’s guide for building datasets for analysis. Hoboken, New Jersey: John Wiley & Sons, Inc. Available at: https://learning.oreilly.com/library/view/sql-for-data/9781119669364/?sso_link=yes&sso_link_from=university-of-essex.
- Grimmer, J. (2015) ‘We Are All Social Scientists Now: How Big Data, Machine Learning, and Causal Inference Work Together’, PS: Political Science & Politics, 48(01), pp. 80–83. Available at: https://doi.org/10.1017/S1049096514001784.
- Counts, S. et al. (2014) ‘Computational social science’, in Proceedings of the companion publication of the 17th ACM conference on Computer supported cooperative work & social computing. ACM, pp. 105–108. Available at: https://doi.org/10.1145/2556420.2556849.
- Li, N. et al. (2023) ‘Artistic representations of data can help bridge the US political divide over climate change’, Communications Earth & Environment, 4(1). Available at: https://doi.org/10.1038/s43247-023-00856-9.
- Weed, E. (no date) Learning Statistics with Python. Available at: https://ethanweed.github.io/pythonbook/landingpage.html.
- Google Cloud Platform Overview (no date). Available at: https://cloud.google.com/docs/overview.
- McKinney, W. (no date) Python for data analysis: data wrangling with pandas, NumPy, and IPython. Second edition. Sebastopol, CA: O’Reilly Media, Inc. Available at: https://learning.oreilly.com/library/view/python-for-data/9781491957653/?sso_link=yes&sso_link_from=university-of-essex.
- Mitchell, R.E. (2018) Web scraping with Python: collecting more data from the modern web. 2nd edition. Beijing: O’Reilly. Available at: https://ebookcentral.proquest.com/lib/universityofessex-ebooks/detail.action?docID=5326894.
- Boicea, A., Radulescu, F. and Agapin, L.I. (2012) ‘MongoDB vs Oracle -- Database Comparison’, in 2012 Third International Conference on Emerging Intelligent Data and Web Technologies. IEEE, pp. 330–335. Available at: https://doi.org/10.1109/EIDWT.2012.32.
- ‘Analyzing Big Data in less time with Google BigQuery’ (2017). YouTube: Google Cloud Tech. Available at: https://www.youtube.com/watch?v=qqbYrQGSibQ.
- Beaulieu, A. (2020) Learning SQL. 3rd Revised edition. Sebastopol: O’Reilly Media, Inc, USA. Available at: https://ebookcentral.proquest.com/lib/universityofessex-ebooks/detail.action?docID=6128233.
- Grimmer, J., Roberts, M.E. and Stewart, B.M. (2021) ‘Machine Learning for Social Science: An Agnostic Approach’, Annual Review of Political Science, 24(1), pp. 395–419. Available at: https://doi.org/10.1146/annurev-polisci-053119-015921.
- Cranmer, S.J. and Desmarais, B.A. (2017) ‘What Can We Learn from Predictive Modeling?’, Political Analysis, 25(2), pp. 145–166. Available at: https://doi.org/10.1017/pan.2017.3.
The above list is indicative of the essential reading for the course.
The library makes provision for all reading list items, with digital provision where possible, and these resources are shared between students.
Further reading can be obtained from this module's reading list.
Assessment items, weightings and deadlines
Coursework / exam | Description | Deadline | Coursework weighting
Exam format definitions
- Remote, open book: Your exam will take place remotely via an online learning platform. You may refer to any physical or electronic materials during the exam.
- In-person, open book: Your exam will take place on campus under invigilation. You may refer to any physical materials such as paper study notes or a textbook during the exam. Electronic devices may not be used in the exam.
- In-person, open book (restricted): The exam will take place on campus under invigilation. You may refer only to specific physical materials such as a named textbook during the exam. Permitted materials will be specified by your department. Electronic devices may not be used in the exam.
- In-person, closed book: The exam will take place on campus under invigilation. You may not refer to any physical materials or electronic devices during the exam. There may be times when a paper dictionary, for example, may be permitted in an otherwise closed book exam. Any exceptions will be specified by your department.
Your department will provide further guidance before your exams.
Overall assessment
Reassessment
Module supervisor and teaching staff
Dr Akitaka Matsuo, email: a.matsuo@essex.ac.uk.
Akitaka Matsuo
Please contact govpgquery@essex.ac.uk
No
No
Yes
Dr Kyriaki Nanou
Durham University
Associate Professor in European politics
Available via Moodle
No lecture recording information available for this module.
Government