Postgraduate: Level 7
Monday 13 January 2020
Friday 20 March 2020
01 October 2019
Requisites for this module
MSC G30412 Data Science,
MSC G30424 Data Science,
MSC G304PP Data Science with Professional Placement,
DIP G30009 Statistics,
MSC G30012 Statistics
This module covers four application areas of statistics: multivariate methods, support vector machines, demography and epidemiology, and sampling (PG module 5 of the syllabus of the Royal Statistical Society).
Multivariate methods
Vectors of expected values. Covariance and correlation matrices. Discriminant analysis, choice between two populations, calculation of discriminant function, and probability of misclassification, test and training samples, leave-one-out and k-fold cross-validation, idea of extension to several populations. Principal components; definition, interpretation of calculated components, use in regression. Cluster analysis, similarity measures, single-link and other hierarchical methods, k-means. Informal approaches to checking for multivariate Normality. Tests and confidence regions for multivariate means.
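As an illustration of the two-population discriminant topics listed above, the following is a minimal sketch of a linear discriminant function with a leave-one-out estimate of the misclassification probability. It assumes two-dimensional data, equal prior probabilities and a common (pooled) covariance matrix; the function names and the toy data are hypothetical and are not taken from the module materials.

```python
from statistics import mean

def mean_vector(rows):
    """Column-wise sample mean of a list of equal-length rows."""
    return [mean(col) for col in zip(*rows)]

def pooled_covariance(a, b):
    """Pooled 2x2 sample covariance matrix of two groups of 2-D points."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (a, b):
        m = mean_vector(rows)
        for x in rows:
            d = [x[0] - m[0], x[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(a) + len(b) - 2
    return [[v / dof for v in row] for row in s]

def fit_discriminant(a, b):
    """Return (w, c): allocate x to group A when w.x > c (equal priors assumed)."""
    ma, mb = mean_vector(a), mean_vector(b)
    s = pooled_covariance(a, b)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    # Discriminant direction w = S^{-1} (mean_A - mean_B)
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Cut-off is the discriminant score of the midpoint between the two means
    mid = [(ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2]
    c = w[0] * mid[0] + w[1] * mid[1]
    return w, c

def predict(model, x):
    w, c = model
    return "A" if w[0] * x[0] + w[1] * x[1] > c else "B"

def leave_one_out_error(a, b):
    """Refit with each observation held out in turn and count misallocations."""
    errors = 0
    for i in range(len(a)):
        model = fit_discriminant(a[:i] + a[i + 1:], b)
        errors += predict(model, a[i]) != "A"
    for i in range(len(b)):
        model = fit_discriminant(a, b[:i] + b[i + 1:])
        errors += predict(model, b[i]) != "B"
    return errors / (len(a) + len(b))

# Hypothetical toy samples from two populations
group_a = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2), (1.8, 1.5)]
group_b = [(4.0, 4.5), (4.5, 5.0), (5.0, 4.2), (4.2, 5.5), (4.8, 4.8)]

print(fit_discriminant(group_a, group_b))
print(leave_one_out_error(group_a, group_b))
```

Holding each observation out before refitting avoids the optimistic bias of estimating the error rate on the same data used to calculate the discriminant function; k-fold cross-validation follows the same pattern with larger held-out blocks.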
Support vector machines
Maximal margin classifiers. Hyperplanes. Classification using a separating hyperplane. Non-separable case. Support vector classifiers. Support vector machines; classification with non-linear decision boundaries. The support vector machine. Support vector machines with more than two classes; one-versus-one and one-versus-all classification. Relationship to logistic regression.
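To make the idea of a support vector classifier concrete, here is a minimal sketch that fits a linear soft-margin classifier by subgradient descent on the regularised hinge loss. This is an assumption chosen for brevity: it is not a maximal margin solver based on quadratic programming and it does not use kernels, so it covers only linear decision boundaries. The function names, hyperparameter values and toy data are hypothetical.

```python
def train_linear_svc(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))).

    labels must be +1 or -1. Returns the weight vector w and intercept b
    of the separating hyperplane w.x + b = 0.
    """
    n, dim = len(points), len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of the penalty term
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:                # point is inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def classify(w, b, x):
    """Classify by the side of the hyperplane on which x falls."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical linearly separable toy data
points = [(2, 2), (3, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svc(points, labels)
print(w, b)
print([classify(w, b, x) for x in points])
```

Only observations with margin below 1 contribute to the hinge-loss gradient, which mirrors the role of the support vectors: points well away from the boundary do not influence the fitted hyperplane.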
Demography and epidemiology
Population pyramids. Life tables. Standardised rates (e.g. mortality). Incidence and prevalence. Design and analysis of cohort (prospective) studies. Design and analysis of case-control (retrospective) studies. Confounding and interaction. Matched case-control design and analysis, using McNemar's test. Causation. Relative risk. Odds ratio. Estimation and confidence intervals for 2×2 tables. Mantel-Haenszel procedure. Sensitivity, specificity, ROC curves, positive predictive value, negative predictive value.
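The 2×2 table quantities listed above can be sketched as follows. The example assumes the usual layout (a, b = exposed with and without disease; c, d = unexposed with and without disease), a large-sample confidence interval for the odds ratio computed on the log scale, and McNemar's statistic without continuity correction. The function names and counts are hypothetical.

```python
import math

def risk_measures(a, b, c, d, z=1.96):
    """Relative risk, odds ratio and an approximate 95% CI for the odds ratio.

    a, b: exposed subjects with / without the disease
    c, d: unexposed subjects with / without the disease
    """
    relative_risk = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    # Large-sample standard error of log(odds ratio)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    ci = (math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or))
    return relative_risk, odds_ratio, ci

def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio over a list of (a, b, c, d) strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def mcnemar_statistic(b, c):
    """McNemar chi-squared statistic (1 df) from the two discordant-pair counts."""
    return (b - c) ** 2 / (b + c)

# Hypothetical cohort: 20 of 100 exposed and 10 of 100 unexposed develop the disease
rr, odds, ci = risk_measures(20, 80, 10, 90)
print(rr, odds, ci)

# Hypothetical stratified data and matched-pair discordant counts
print(mantel_haenszel_or([(20, 80, 10, 90), (20, 80, 10, 90)]))
print(mcnemar_statistic(15, 5))
```

In a matched case-control design only the discordant pairs carry information about the association, which is why the McNemar statistic depends on b and c alone.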
Sampling
Census and sample survey design. Target and study populations, uses and limitations of non-probability sampling methods, sampling frames, sampling fraction. Simple random sampling. Estimators of totals, means and proportions; bias. Estimated standard errors, confidence intervals and precision. Sampling fraction and finite population correction. Ratio and regression estimators. Stratified random sampling. Estimators of totals, means and proportions; bias. Estimated standard errors, confidence intervals and precision. Cost functions. Proportional and optimal allocations. Limitations of stratified sampling. One-stage cluster sampling. Estimators for totals, means and proportions with equal cluster sizes and with different cluster sizes. Estimated standard errors, confidence intervals and precision. Link with systematic sampling. Description of two-stage sampling and of multi-stage sampling. Limitations.
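The simple random sampling estimators and the stratified allocations listed above can be sketched as follows. The example assumes sampling without replacement, a normal-approximation confidence interval, and a linear cost function for the optimal allocation (which reduces to Neyman allocation when unit costs are equal). The function names and figures are hypothetical.

```python
import math
from statistics import mean, variance

def srs_mean_estimate(sample, population_size, z=1.96):
    """Estimated mean, standard error and CI under simple random sampling."""
    n = len(sample)
    f = n / population_size                          # sampling fraction
    y_bar = mean(sample)
    # (1 - f) is the finite population correction
    se = math.sqrt((1 - f) * variance(sample) / n)
    return y_bar, se, (y_bar - z * se, y_bar + z * se)

def srs_total_estimate(sample, population_size):
    """Estimated population total and its standard error."""
    y_bar, se, _ = srs_mean_estimate(sample, population_size)
    return population_size * y_bar, population_size * se

def proportional_allocation(n, stratum_sizes):
    """Allocate n in proportion to stratum sizes N_h."""
    total = sum(stratum_sizes)
    return [n * size / total for size in stratum_sizes]

def optimal_allocation(n, stratum_sizes, stratum_sds, unit_costs=None):
    """Allocate n in proportion to N_h * S_h / sqrt(c_h)."""
    if unit_costs is None:
        unit_costs = [1.0] * len(stratum_sizes)      # equal costs: Neyman allocation
    weights = [size * sd / math.sqrt(cost)
               for size, sd, cost in zip(stratum_sizes, stratum_sds, unit_costs)]
    total = sum(weights)
    return [n * wt / total for wt in weights]

# Hypothetical sample of n = 5 from a population of N = 50
sample = [4.0, 6.0, 8.0, 10.0, 12.0]
print(srs_mean_estimate(sample, 50))
print(srs_total_estimate(sample, 50))

# Hypothetical strata: sizes 100 and 300, standard deviations 10 and 5
print(proportional_allocation(50, [100, 300]))
print(optimal_allocation(50, [100, 300], [10.0, 5.0]))
```

The allocations are returned unrounded; in practice they would be rounded to whole sample sizes, and optimal allocation moves sample towards strata that are more variable or cheaper to survey.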
On completion of the course students should be able to:
- Understand and apply multivariate methods;
- Assess the results of discriminant analysis, principal components, cluster analysis and multivariate analysis of variance;
- Understand and apply demographic and epidemiological methods;
- Understand and use the notion of support vector machines, with applications to classification;
- Understand and apply sampling methods.
The module has 38 contact hours in total. These consist of 25 lectures, 5 labs and 5 classes during the spring term, together with 3 revision lectures in the summer term. A project is undertaken in groups. Coursework consists of problem sheets, a project report and presentation.
- James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert. (2013) An Introduction to Statistical Learning: with Applications in R, New York: Springer (Springer Texts in Statistics).
The above list is indicative of the essential reading for the course. The library makes provision for all reading list items, with digital provision where possible, and these resources are shared between students. Further reading can be obtained from this module's reading list.
Assessment items, weightings and deadlines
Coursework: Group Presentation and Group Project
Exam: 180 minutes during Summer (Main Period) (Main)
Module supervisor and teaching staff
Professor Berthold Lausen, Dr Fanlin Meng, Dr Stella Hadjianto
Professor Berthold Lausen
Prof Fionn Murtagh
Professor of Data Science
Available via Moodle
Of 38 hours, 33 (86.8%) hours available to students:
5 hours not recorded due to service coverage or fault;
0 hours not recorded due to opt-out by lecturer(s).