MA321-7-SP-CO: Applied Statistics

The details
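Academic year: 2020/21
Department: Mathematics, Statistics and Actuarial Science (School of)
Campus: Colchester Campus
Term: Spring
Level: Postgraduate, Level 7
Status: Current
Module start date: Sunday 17 January 2021
Module end date: Friday 26 March 2021
Credits: 15
Last updated: 16 July 2020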

Requisites for this module
(none)

Key module for

MSC G30412 Data Science,
MSC G30424 Data Science,
MSC G304PP Data Science with Professional Placement,
DIP G30009 Statistics,
MSC G30012 Statistics,
MPHDG30048 Statistics,
PHD G30048 Statistics,
MPHDG30448 Data Science,
PHD G30448 Data Science

Module description

This module covers four application areas of statistics: multivariate methods, support vector machines, demography and epidemiology, and sampling (PG module 5 of the syllabus of the Royal Statistical Society).

Module aims

The aim of the module is to foster statistical thinking and to develop data scientists who have both theoretical knowledge and practical application skills.

Module learning outcomes

On completion of the course, students should be able to:

- Understand and apply multivariate methods;
- Assess the results of discriminant analysis, principal components analysis, cluster analysis and multivariate analysis of variance;
- Understand and apply demographic and epidemiological methods;
- Understand and use support vector machines, with applications to classification;
- Understand and apply sampling methods.

Module information

Syllabus

Multivariate methods

Vectors of expected values. Covariance and correlation matrices. Discriminant analysis, choice between two populations, calculation of discriminant function, and probability of misclassification, test and training samples, leave-one-out and k-fold cross-validation, idea of extension to several populations. Principal components; definition, interpretation of calculated components, use in regression. Cluster analysis, similarity measures, single-link and other hierarchical methods, k-means. Informal approaches to checking for multivariate Normality. Tests and confidence regions for multivariate means.
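To make the two-population discriminant-analysis items concrete, here is a minimal Python sketch (standard library only) for the case of two variables. It assumes a common covariance matrix in both populations, equal prior probabilities and equal misclassification costs; all function names are hypothetical illustrations, not course material. It computes Fisher's linear discriminant function, the plug-in estimate Phi(-Delta/2) of the probability of misclassification under Normality, and a leave-one-out cross-validation error count.

```python
from statistics import NormalDist, mean


def mean_vec(pts):
    """Sample mean vector of a list of (x1, x2) observations."""
    return [mean(p[0] for p in pts), mean(p[1] for p in pts)]


def pooled_cov(a, b):
    """Pooled within-group 2x2 covariance matrix (divisor n_a + n_b - 2)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for pts in (a, b):
        m = mean_vec(pts)
        for p in pts:
            d = (p[0] - m[0], p[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    df = len(a) + len(b) - 2
    return [[s[i][j] / df for j in range(2)] for i in range(2)]


def inv2(m):
    """Inverse of a non-singular 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def fisher_ldf(a, b):
    """Return (w, c, delta2); allocate x to population A when w . x > c.

    w = S^-1 (mean_a - mean_b), c = w . (mean_a + mean_b) / 2, and
    delta2 is the squared Mahalanobis distance between the sample means.
    """
    ma, mb = mean_vec(a), mean_vec(b)
    sinv = inv2(pooled_cov(a, b))
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [sinv[0][0] * diff[0] + sinv[0][1] * diff[1],
         sinv[1][0] * diff[0] + sinv[1][1] * diff[1]]
    c = w[0] * (ma[0] + mb[0]) / 2 + w[1] * (ma[1] + mb[1]) / 2
    delta2 = w[0] * diff[0] + w[1] * diff[1]
    return w, c, delta2


def classify(x, w, c):
    return "A" if w[0] * x[0] + w[1] * x[1] > c else "B"


def misclassification_prob(delta2):
    """Plug-in estimate Phi(-Delta/2), assuming Normality and equal priors."""
    return NormalDist().cdf(-(delta2 ** 0.5) / 2)


def loo_error_count(a, b):
    """Leave-one-out cross-validation: refit without each point, then classify it."""
    errors = 0
    for k in range(len(a)):
        w, c, _ = fisher_ldf(a[:k] + a[k + 1:], b)
        errors += classify(a[k], w, c) != "A"
    for k in range(len(b)):
        w, c, _ = fisher_ldf(a, b[:k] + b[k + 1:])
        errors += classify(b[k], w, c) != "B"
    return errors
```

For example, with the invented toy samples a = [(1, 2), (2, 3), (3, 3), (2, 1)] and b = [(6, 5), (7, 7), (8, 6), (7, 5)], `fisher_ldf(a, b)` gives a rule that allocates (2, 2) to A and (7, 6) to B, and `loo_error_count(a, b)` returns 0. The 2x2 inverse is written out by hand only to keep the sketch dependency-free; with more variables a linear-algebra library would be the natural choice.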

Support vector machines

Maximal margin classifiers. Hyperplanes. Classification using a separating hyperplane. Non-separable case. Support vector classifiers. Support vector machines; classification with non-linear decision boundaries. The support vector machine. Support vector machines with more than two classes; one-versus-one and one-versus-all classification. Relationship to logistic regression.
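A support vector classifier can be written as the minimiser of the regularised hinge loss lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i (w . x_i + b)). The Python sketch below assumes labels coded as -1 and +1 and fits this soft-margin objective by plain full-batch subgradient descent, chosen here only for transparency; it is not the quadratic-programming or kernel-based fitting normally used for support vector machines, and the names and the settings `lam`, `lr` and `epochs` are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=2000):
    """Fit a soft-margin linear classifier by full-batch subgradient descent on
    lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * (w . x_i + b)), y_i in {-1, +1}.
    The intercept b is not penalised."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # subgradient of the penalty term
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge is active
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def decision_value(w, b, x):
    """Signed value of w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def predict(w, b, x):
    """Classify by the side of the separating hyperplane w . x + b = 0."""
    return 1 if decision_value(w, b, x) >= 0 else -1
```

On invented, linearly separable toy data such as X = [(2, 2), (3, 3), (3, 2), (-1, -1), (-2, -1), (-1, -2)] with y = [1, 1, 1, -1, -1, -1], the fitted hyperplane classifies every training point correctly. Replacing the hinge loss with the logistic loss log(1 + exp(-y f(x))) gives regularised logistic regression, which is one way to see the relationship between the two methods.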

Demography and epidemiology

Population pyramids. Life tables. Standardised rates (e.g. mortality). Incidence and prevalence. Design and analysis of cohort (prospective) studies. Design and analysis of case-control (retrospective) studies. Confounding and interaction. Matched case-control design and analysis, using McNemar's test. Causation. Relative risk. Odds ratio. Estimation and confidence intervals for 2×2 tables. Mantel-Haenszel procedure. Sensitivity, specificity, ROC curves, positive predictive value, negative predictive value.
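Several of the 2×2-table quantities listed above reduce to short formulas. The Python sketch below assumes the cell layout given in its docstrings and uses Woolf's logit interval, which is one of several standard choices of confidence interval for an odds ratio; the function names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio for a 2x2 table with a Woolf (logit) confidence interval.

    Assumed layout:      cases   controls
            exposed        a        b
            unexposed      c        d
    All four cells must be non-zero for this interval.
    """
    or_hat = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    z = NormalDist().inv_cdf(0.5 + level / 2)     # about 1.96 for 95%
    return or_hat, exp(log(or_hat) - z * se_log), exp(log(or_hat) + z * se_log)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel common odds ratio over strata, each given as (a, b, c, d)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


def diagnostic_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a test-versus-truth 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```

For instance, with invented counts, `odds_ratio_ci(20, 80, 10, 90)` gives an odds ratio of 2.25, and two strata that each have an odds ratio of 2 give a Mantel-Haenszel estimate of 2.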

Sampling

Census and sample survey design. Target and study populations, uses and limitations of non-probability sampling methods, sampling frames, sampling fraction. Simple random sampling. Estimators of totals, means and proportions; bias. Estimated standard errors, confidence intervals and precision. Sampling fraction and finite population correction. Ratio and regression estimators. Stratified random sampling. Estimators of totals, means and proportions; bias. Estimated standard errors, confidence intervals and precision. Cost functions. Proportional and optimal allocations. Limitations of stratified sampling. One-stage cluster sampling. Estimators for totals, means and proportions with equal cluster sizes and with different cluster sizes. Estimated standard errors, confidence intervals and precision. Link with systematic sampling. Description of two-stage sampling and of multi-stage sampling. Limitations.
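The main simple random and stratified sampling estimators can be written down directly. The Python sketch below assumes simple random sampling without replacement within each stratum, and uses the standard result that, for a linear cost function C = c0 + sum_h c_h n_h, the variance-minimising allocation takes n_h proportional to N_h S_h / sqrt(c_h) (Neyman allocation when all c_h are equal). The function names are hypothetical, and allocations are returned unrounded.

```python
from math import sqrt
from statistics import mean, variance


def srs_mean(sample, N):
    """SRS without replacement: sample mean and its estimated standard error,
    including the finite population correction (1 - f), where f = n / N."""
    n = len(sample)
    f = n / N
    se = sqrt((1 - f) * variance(sample) / n)  # variance() uses divisor n - 1
    return mean(sample), se


def srs_total(sample, N):
    """Estimated population total N * ybar and its standard error."""
    ybar, se = srs_mean(sample, N)
    return N * ybar, N * se


def stratified_mean(samples, sizes):
    """Stratified estimator sum_h W_h * ybar_h, with W_h = N_h / N, and its SE."""
    N = sum(sizes)
    est, var = 0.0, 0.0
    for ys, Nh in zip(samples, sizes):
        Wh, nh = Nh / N, len(ys)
        est += Wh * mean(ys)
        var += Wh ** 2 * (1 - nh / Nh) * variance(ys) / nh
    return est, sqrt(var)


def proportional_allocation(n, sizes):
    """n_h proportional to the stratum size N_h."""
    N = sum(sizes)
    return [n * Nh / N for Nh in sizes]


def optimal_allocation(n, sizes, sds, costs=None):
    """n_h proportional to N_h * S_h / sqrt(c_h); equal costs give Neyman allocation."""
    costs = costs or [1.0] * len(sizes)
    weights = [Nh * Sh / sqrt(ch) for Nh, Sh, ch in zip(sizes, sds, costs)]
    total = sum(weights)
    return [n * w / total for w in weights]
```

For example, `proportional_allocation(100, [200, 300, 500])` returns [20.0, 30.0, 50.0], and `optimal_allocation` gives the same answer when the stratum standard deviations and costs are all equal; a stratum with a larger standard deviation receives more than its proportional share.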

Learning and teaching methods

Teaching will be delivered in a way that blends face-to-face classes, for students who can be present on campus, with a range of online lectures, teaching, learning and collaborative support.

Bibliography

  • James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert (2013) An Introduction to Statistical Learning: with Applications in R. New York: Springer (Springer Texts in Statistics).

The above list is indicative of the essential reading for the course. The library makes provision for all reading list items, with digital provision where possible, and these resources are shared between students. Further reading can be obtained from this module's reading list.

Assessment items, weightings and deadlines

Coursework: Assignment 1 (25% of the coursework mark)
Coursework: Assignment 2 (75% of the coursework mark)
Exam: Main exam, 240 minutes, during the Summer (Main Period)

Exam format definitions

  • Remote, open book: Your exam will take place remotely via an online learning platform. You may refer to any physical or electronic materials during the exam.
  • In-person, open book: Your exam will take place on campus under invigilation. You may refer to any physical materials such as paper study notes or a textbook during the exam. Electronic devices may not be used in the exam.
  • In-person, open book (restricted): The exam will take place on campus under invigilation. You may refer only to specific physical materials such as a named textbook during the exam. Permitted materials will be specified by your department. Electronic devices may not be used in the exam.
  • In-person, closed book: The exam will take place on campus under invigilation. You may not refer to any physical materials or electronic devices during the exam. Occasionally a paper dictionary, for example, may be permitted in an otherwise closed-book exam. Any exceptions will be specified by your department.

Your department will provide further guidance before your exams.

Overall assessment

Coursework: 20%
Exam: 80%

Reassessment

Coursework: 20%
Exam: 80%

Module supervisor and teaching staff
Module supervisor: Dr Fanlin Meng (fanlin.meng@essex.ac.uk)
Teaching staff: Dr Fanlin Meng (fanlin.meng@essex.ac.uk) and Dr Stella Hadjiantoni (stella.hadjiantoni@essex.ac.uk)

Availability
Yes
No
No

External examiner

Prof Fionn Murtagh
University of Huddersfield
Professor of Data Science

Resources
Available via Moodle
Of 2553 hours, 0 (0%) hours available to students:
2553 hours not recorded due to service coverage or fault;
0 hours not recorded due to opt-out by lecturer(s).

Further information

Disclaimer: The University makes every effort to ensure that this information on its Module Directory is accurate and up-to-date. Exceptionally it can be necessary to make changes, for example to programmes, modules, facilities or fees. Examples of such reasons might include a change of law or regulatory requirements, industrial action, lack of demand, departure of key personnel, change in government policy, or withdrawal/reduction of funding. Changes to modules may for example consist of variations to the content and method of delivery or assessment of modules and other services, to discontinue modules and other services and to merge or combine modules. The University will endeavour to keep such changes to a minimum, and will also keep students informed appropriately by updating our programme specifications and module directory.

The full Procedures, Rules and Regulations of the University governing how it operates are set out in the Charter, Statutes and Ordinances and in the University Regulations, Policy and Procedures.