This module introduces the principles and applications of the electronic storage, structuring, manipulation, transformation, extraction, and dissemination of data. In the age of `Big Data`, vast amounts of data are generated every day, and computational social scientists equipped with the right set of skills can obtain valuable insights attainable only through a data-driven approach. This module aims to provide an opportunity to learn such skills through programming in Python.
We focus on four key aspects of data management. The first is the study of the various types and shapes of data, and of how to clean and transform them so that they are fit for later analysis. The second is data acquisition. Most data nowadays are stored electronically on the Internet. We will learn what data are available online and how to obtain them, both by scraping websites and by accessing the APIs of online databases and social network services. The third is data storage, in particular databases in both relational and non-relational forms. The module covers the fundamental concepts of databases and how to create, populate, modify, and query relational databases. Lastly, this module uses a project-based learning approach, including group-based collaboration, which are essential ingredients of modern data science projects. We will learn various collaboration and management tools, such as shared computational environments in the cloud and version control tools.