MA338-6-SP-CO:
Dynamic programming and reinforcement learning
2021/22
Mathematics, Statistics and Actuarial Science (School of)
Colchester Campus
Spring
Undergraduate: Level 6
Current
Monday 17 January 2022
Friday 25 March 2022
15
18 July 2021
Requisites for this module: (none)
Machine learning has become a prominent tool in data analytics. One of its major branches, reinforcement learning (also known as adaptive learning), is widely used in industry to find decisions that maximize a cumulative reward. This module covers the conceptual foundations of reinforcement learning, namely Markov decision processes (MDPs) and dynamic programming.
Modern reinforcement learning approaches and typical applications will also be covered in lectures and laboratory sessions.
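The cumulative reward that a reinforcement learning agent maximizes is commonly formalized as a discounted return, G = r_0 + gamma*r_1 + gamma^2*r_2 + .... The short Python sketch below computes it for a finite episode; the function name and the discount factor `gamma` are illustrative assumptions, not notation taken from the module.

```python
def discounted_return(rewards, gamma=0.9):
    """Return G = r_0 + gamma*r_1 + gamma^2*r_2 + ... for a finite episode."""
    g = 0.0
    # Accumulate from the last reward backwards, so each step is one multiply-add.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

Working backwards from the end of the episode is the same idea that dynamic programming applies to value functions.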
Adaptive learning, or reinforcement learning, has been studied under the heading of dynamic programming (DP) for decades. DP is built on a divide-and-conquer principle, which fits well with the computing content of the MSc Optimization and Analytics and the MSc Data Science. The stochastic version of DP is closely linked to stochastic processes: both describe the status of a problem through stages, states and transition matrices, but stochastic DP also allows decisions to be made throughout the process. The module therefore fits naturally into the current course structure and complements existing provision by linking several topics together (mathematics and computing, deterministic and stochastic models).
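As a sketch of how stages, states, transition matrices and decisions fit together, the finite-horizon Bellman recursion can be written as follows. The notation is a common textbook convention assumed here, not necessarily the module's own: stages t = 0, ..., T-1, admissible decisions A_t(s) in state s, one-step rewards r_t(s, a), transition probabilities P_t(s' | s, a) and a terminal reward R_T(s).

```latex
V_T(s) = R_T(s), \qquad
V_t(s) = \max_{a \in A_t(s)} \Big\{ r_t(s,a) + \sum_{s'} P_t(s' \mid s, a)\, V_{t+1}(s') \Big\},
\quad t = T-1, \dots, 0.
```

If each state admits only one decision, the maximization disappears and the recursion reduces to computing expected rewards along a Markov chain with transition matrix P_t; the decisions are what turn a stochastic process into a stochastic DP.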
This module can be offered, at least as an option, on the computational pathway of G100. It will add one compulsory module to the MSc Optimization and Data Analytics and one optional module to the MSc Data Science (currently our largest MSc programme).
In fact, the module can be taken by any MSc student, with or without a background in optimization: the divide-and-conquer idea behind DP is easy to grasp from the outset (even without knowledge of linear programming), while the later content links closely to statistics (e.g. regression to approximate the value-to-go function), machine learning (e.g. neural networks to predict what will happen in later stages) and stochastic processes (e.g. stochastic dynamic programming, where information is revealed over time). DP also has wide applications in finance, so the module could also be offered as an option on the MSc Maths and Finance.
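As an illustration of "regression to approximate the value-to-go", the sketch below fits a linear approximation V(s) ~ theta0 + theta1*s to sampled (state, value) pairs by ordinary least squares. The single scalar feature, the function names and the sample data are assumptions chosen for brevity; in practice one would use richer features and a library such as NumPy or scikit-learn.

```python
def fit_linear_value(states, values):
    """Least-squares fit of V(s) ~ theta0 + theta1 * s for scalar states.

    Assumes at least two distinct states, so the variance below is non-zero.
    """
    n = len(states)
    mean_s = sum(states) / n
    mean_v = sum(values) / n
    # Slope = covariance(s, V) / variance(s); the fitted line passes through the means.
    cov = sum((s - mean_s) * (v - mean_v) for s, v in zip(states, values))
    var = sum((s - mean_s) ** 2 for s in states)
    theta1 = cov / var
    theta0 = mean_v - theta1 * mean_s
    return theta0, theta1


# Hypothetical value-to-go estimates (e.g. from simulation) at a few sampled states.
sampled_states = [0.0, 1.0, 2.0, 3.0]
estimated_values = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = fit_linear_value(sampled_states, estimated_values)


def approx_value(s):
    """Approximate value-to-go, usable for states that were never sampled."""
    return theta0 + theta1 * s


print(theta0, theta1, approx_value(10.0))  # 1.0 2.0 21.0
```

The fitted function can then stand in for the exact value-to-go when the state space is too large to enumerate.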
To teach the basic mathematical concepts behind dynamic programming and reinforcement learning, and to link the solution approaches to statistics, computing, simulation, calculus and standard optimization techniques.
1. Conceptual understanding of the “divide and conquer” idea behind dynamic programming, and the ability to construct deterministic and stochastic dynamic programming models;
2. Detailed and systematic knowledge of the key elements of a dynamic programming system, and the ability to write Bellman’s equations;
3. Understanding of the basic technique of backward induction, and the ability to write pseudocode to implement it on small-scale problems;
4. The ability to identify suitable approaches and software packages to solve typical applications.
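Learning outcomes 2 and 3 meet in backward induction: Bellman's equation is solved stage by stage, starting from the final stage. The Python sketch below is one possible implementation for a small finite-horizon problem, not the module's reference code; the machine-maintenance example (states "ok"/"broken", decisions "run"/"repair"/"idle") and all of its numbers are hypothetical.

```python
def backward_induction(T, states, actions, reward, transition, terminal):
    """Solve a finite-horizon MDP by backward induction.

    actions(s)          -> iterable of decisions allowed in state s
    reward(t, s, a)     -> immediate reward at stage t
    transition(t, s, a) -> dict {next_state: probability}
    terminal(s)         -> reward collected at the final stage T
    Returns value tables V[t][s] and an optimal decision rule policy[t][s].
    """
    V = [{} for _ in range(T + 1)]
    policy = [{} for _ in range(T)]
    for s in states:
        V[T][s] = terminal(s)
    # Work backwards from the last decision stage T-1 to stage 0.
    for t in range(T - 1, -1, -1):
        for s in states:
            best_a, best_q = None, float("-inf")
            for a in actions(s):
                # Bellman backup: immediate reward plus expected value-to-go.
                q = reward(t, s, a) + sum(
                    p * V[t + 1][s2] for s2, p in transition(t, s, a).items()
                )
                if q > best_q:
                    best_a, best_q = a, q
            V[t][s] = best_q
            policy[t][s] = best_a
    return V, policy


# Hypothetical machine-maintenance example (all numbers are made up).
ACTIONS = {"ok": ["run"], "broken": ["repair", "idle"]}
REWARD = {("ok", "run"): 10.0, ("broken", "repair"): -5.0, ("broken", "idle"): 0.0}
NEXT = {
    ("ok", "run"): {"ok": 0.8, "broken": 0.2},
    ("broken", "repair"): {"ok": 1.0},
    ("broken", "idle"): {"broken": 1.0},
}

V, policy = backward_induction(
    T=2,
    states=["ok", "broken"],
    actions=lambda s: ACTIONS[s],
    reward=lambda t, s, a: REWARD[(s, a)],
    transition=lambda t, s, a: NEXT[(s, a)],
    terminal=lambda s: 0.0,
)
print(V[0], policy)
```

With two stages left, repairing a broken machine pays off (value 5 versus 0 for idling), whereas with one stage left it does not: the kind of stage-dependent policy that backward induction is designed to reveal.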
- Basics of sequential decision process: stage, state, action, objective, policy, etc.
- Deterministic dynamic programming and Bellman's equation
- Stochastic dynamic programming: Markov decision processes (MDPs)
- Standard solution approach – backward induction and its computational challenge
- Approximation approaches – aggregation, regression, neural networks, etc.
- Reinforcement learning techniques: Monte Carlo methods, Q-learning, temporal difference methods
- Applications: knapsack problem, multi-armed bandits, revenue management, vehicle routing, etc.
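The knapsack problem listed among the applications is a standard first example of deterministic DP: each item is a stage, the remaining capacity is the state and the decision is whether to take the item. A minimal sketch of the 0/1 version with integer weights follows; the function name and the item data are illustrative assumptions.

```python
from functools import lru_cache


def knapsack(values, weights, capacity):
    """Maximum total value of a subset of items whose total weight <= capacity."""
    n = len(values)

    @lru_cache(maxsize=None)
    def best(i, cap):
        # Value-to-go from item i onwards when `cap` units of capacity remain.
        if i == n:
            return 0
        skip = best(i + 1, cap)
        if weights[i] <= cap:
            take = values[i] + best(i + 1, cap - weights[i])
            return max(skip, take)
        return skip

    return best(0, capacity)


print(knapsack([60, 100, 120], [10, 20, 30], 50))  # 220
```

The sketch uses top-down memoization via `functools.lru_cache`, which evaluates the same recursion as a backward table fill but only for the (item, capacity) states actually reached.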
This is a joint UG and PGT module, shared by several courses in the department. Teaching will combine standard lectures with lab sessions, since much of the content is computationally intensive. Lab sessions will be designed to support students’ programming and analytical skills. At least one piece of coursework (10%) will be a lab-based report.
- 15 hours of standard lecture room teaching
- 10 hours of lab-based teaching, using Python or Matlab for the computational experiments
- 5 hours of classes
- 3 hours of revision classes in the summer term
This module does not appear to have a published bibliography for this year.
Assessment items, weightings and deadlines
Coursework / exam | Description | Deadline | Coursework weighting
Coursework | Lab report 1 | |
Coursework | Lab report 2 | |
Exam | Main exam: 180 minutes during Summer (Main Period) | |
Exam format definitions
- Remote, open book: Your exam will take place remotely via an online learning platform. You may refer to any physical or electronic materials during the exam.
- In-person, open book: Your exam will take place on campus under invigilation. You may refer to any physical materials such as paper study notes or a textbook during the exam. Electronic devices may not be used in the exam.
- In-person, open book (restricted): The exam will take place on campus under invigilation. You may refer only to specific physical materials such as a named textbook during the exam. Permitted materials will be specified by your department. Electronic devices may not be used in the exam.
- In-person, closed book: The exam will take place on campus under invigilation. You may not refer to any physical materials or electronic devices during the exam. In some cases a paper dictionary, for example, may be permitted in an otherwise closed book exam. Any exceptions will be specified by your department.
Your department will provide further guidance before your exams.
Overall assessment
Reassessment
Module supervisor and teaching staff
Dr Felipe Maldonado, email: felipe.maldonado@essex.ac.uk.
Yes
Yes
Yes
Dr Yinghui Wei
University of Plymouth
Available via Moodle
Of 1313 hours, 20 (1.5%) hours available to students:
1293 hours not recorded due to service coverage or fault;
0 hours not recorded due to opt-out by lecturer(s).
Disclaimer: The University makes every effort to ensure that this information on its Module Directory is accurate and up-to-date. Exceptionally it can be necessary to make changes, for example to programmes, modules, facilities or fees. Examples of such reasons might include a change of law or regulatory requirements, industrial action, lack of demand, departure of key personnel, change in government policy, or withdrawal/reduction of funding. Changes to modules may for example consist of variations to the content and method of delivery or assessment of modules and other services, to discontinue modules and other services and to merge or combine modules. The University will endeavour to keep such changes to a minimum, and will also keep students informed appropriately by updating our programme specifications and module directory.
The full Procedures, Rules and Regulations of the University governing how it operates are set out in the Charter, Statutes and Ordinances and in the University Regulations, Policy and Procedures.