An approximate diffusion process for environmental stochasticity in infectious disease transmission modelling

Sanmitra Ghosh , et al.

This is a PLOS Computational Biology Methods paper.

1 Introduction

Mathematical modelling of the complex dynamics of infectious diseases remains an essential tool to inform public health policies during epidemic outbreaks. The major focus of such modelling work is describing the intrinsic transmission dynamics and the flow of individuals between compartments that segregate the population as per their disease state. However, an epidemic is also driven by a number of extrinsic factors, including population mobility, social cycles (e.g. holidays), non-pharmaceutical interventions, and climatic variations [1]. In a compartmental model, such factors are often introduced explicitly through the description of the hazard (force) of infection when information about these external drivers is available [24]. However, while it is impossible to fully account for all extrinsic factors influencing transmission, yet ignoring this epistemic uncertainty often known as “environmental stochasticity” leads to a structural miss-specification of the model, a “model discrepancy”. Model discrepancy can lead to miss-calibrated models that underestimate uncertainty and produce biased predictions [5]. An elegant approach to account for the un-modelled model discrepancy is to represent the force of infection as a stochastic process. For example, [6, 7] use a diffusion process for this purpose, while [8] use a discrete time stochastic process. Parameter estimation for such stochastic models is, however, challenging. Inference, particularly in a Bayesian context, requires estimation of the joint posterior distribution of both the latent path of the stochastic process and the model parameters. Estimation using a Markov chain Monte Carlo (MCMC) algorithm, involves sampling the realisation of the stochastic process, a high dimensional object, often through data-augmentation techniques, which incur a hefty computational cost [9]. As a result efficient calibration of a compartmental model, which embeds a stochastic process, has received significant attention in the literature [10, 11] with the goal of alleviating the computational bottleneck associated with the inference of the stochastic process.

In this paper we propose a new approach to the calibration problem through the use of a path-wise approximation of a diffusion process. Specifically, we apply a truncated Fourier expansion of a Brownian motion to obtain the approximation. Application of this series expansion turns the task of inferring a high dimensional latent diffusion sample path into the task of inferring a smaller dimensional object, the expansion coefficients, which can be carried out without data-augmentation. This method is also applicable in the context of discrete time processes that converge to a diffusion in the continuous time limit. Such processes can be approximated by first carrying out the series expansion of the limiting diffusion and then applying a suitable time discretisation.

We validate the proposed method, firstly, against a data augmentation technique carried out using a particle MCMC sampler proposed in [6], using a dataset from an influenza outbreak in a boarding school. We then further validate this approach using a model where the force of infection is depicted as a sinusoidal signal. Finally, we apply this method to fit a model of COVID-19 spread in England during the first wave.

2 Background: Epidemic models with a time-varying transmission-potential

We consider the canonical SIR (Susceptible-Infected-Removed) model [12] to introduce the stochastic modelling framework, although the methodology can be applied to other more complex compartmental models. In the SIR model the compartments denote the number of susceptible (S), infected (I), and recovered (R) people in a population subjected to an epidemic at time t. For a population of size N, the SIR model is defined by the following ODE system:
where is the force of infection, describing the generation of infections with a transmission-potential β, between susceptible individuals and the fraction, It/N, of infectious individuals. The expected period spent in the compartment is given by γ−1. The individual compartment sizes sum to N = St + It + Rt.

To include environmental stochasticity we introduce a time-varying βt [1315] to mitigate model discrepancy, leading to a reformulation of the model in Eq (1):
where xt follows a diffusion process described by an Itô stochastic differential equation (SDE) [
16] with drift a(⋅), and diffusion b(⋅) functions parameterised by the vectors ξa and ξb respectively; Wt is a standard Brownian motion; and g(⋅) is a nonlinear transformation that enforces βt > 0, such as exponential or inverse-logit transformation. Here we make some mild assumptions about a(⋅) and b(⋅) such as, for example, being locally Lipschitz with a linear growth bound [16] to ensure a non-explosive solution.

Inference for the stochastic model in Eq (2) within a Bayesian framework, requires inference of the latent sample path x of the diffusion xt, which is indirectly observed through the time evolution of the disease states: St, It, Rt. This is a missing data problem that can be addressed through data-augmentation based MCMC methods [6, 10] in which a high resolution (in time) Euler-Maruyama discretisation of xt is sampled along with the model parameters. Such MCMC methods incur high computational costs and have reduced efficiency in terms of mixing and speed of convergence. In what follows we will investigate a scalable approximation of xt that is faster to sample.

3 Methods

Following [1719], we carry out a Fourier expansion of a Brownian motion Wt and obtain a smooth path-wise series approximation. Using this approximation of a Brownian motion, we can in turn approximate the SDE for xt with a random ODE. Inference of xt can then be carried out by inferring coefficients of this ODE, without requiring data-augmentation.

3.3 Inference using the series approximation

Using the path-wise series approximation of a diffusion process xt, presented in the previous sections, we can re-write the canonical SIR model in Eq (2) as a system of coupled ODEs given by
where a′(⋅) is given by
Eq (8). Note that the randomness in the above model is now encapsulated in the expansion coefficients Z = (Z1, …, Zn). Inference in this model is then relegated to the inference of all the parameters: Z, ξa, ξb, γ, and the initial values: x0, S0, I0, R0. We denote the vector of the parameters governing the dynamics as θ = (ξa, ξb, γ). We denote the state vector evolving in continuous time by Xt = (xt, St, It, Rt), and by X0 = (x0, S0, I0, R0) the vector of initial values.

In order to explain the inferential framework based on the series approximation, in Eq (10), we assume that the available data are the noisy observations of the state It at m time-points. Here we are simply considering prevalence data for the ease of exposition, however the same idea can be extended to more complex observational models where the observed data only provide partial (and often indirect) information of the states Xt [8].

The inferential goal is to learn the posterior distribution of all the unknown quantities, given the data . We place priors p(θ), p(Z), p(X0) on the parameters, expansion coefficients and the initial values. Note that, by construction, the Z = (Z1, …, Zn) have an independent standard Normal prior, see Section 3.1. We then numerically solve Eq (10) to obtain a likelihood , based on the noise assumption, where is the numerical solution of the state It evaluated at the m time-points, and ϵ are the parameters of the chosen data distribution. The posterior distribution, up to a normalisation constant, follows from the Bayes rule:
Samples from the posterior distribution can be obtained using MCMC. The samples of the latent approximate diffusion path are simply the numerical solution of the ODE for evaluated using samples of θ, Z, X0 from the posterior distribution.

Note that if we had described xt using a SDE, as in Eq (2), then we had to infer the sample path of the diffusion along with the parameters. For this we had to discretise the path of xt using, say, the Euler-Maruyama scheme [22]. Specifically, we had to introduce l points between each pair of successive observation times ti and ti+1 (i = 0,1,…,m-1) to allow sufficient accuracy of the Euler-Maruyama scheme. Thus, for a step of δ = 1/(l + 1) we would have ended-up with a discrete skeleton , representing the sample path. This sample path x along with the parameters θ can then be estimated using a data-augmentation based MCMC algorithm, where the path xp(x|y, θ) is preferably drawn using a particle filter as proposed in [6]. If we now contrast this inference scheme, where xt is described as a SDE, to the case where xt is described by an ODE (after applying the series expansion), it can be seen that as long as n = dim(Z) is chosen to be substantially smaller than dim(x), inference using the series expansion will be a simpler problem as we have to deal with the estimation of fewer random variables. Below we show, empirically, that choosing a value of n substantially smaller than dim(x) still renders an estimate of the posterior distribution that is a reliable approximation to the true posterior.

4 Evaluation

To evaluate the proposed approximation method we fit the model in Eq (10) to the data of an outbreak of influenza at a boarding school [23], on the number of infections for a period of T = 14 days among a population of size N = 763. This dataset is publicly available in the R package outbreaks [24]. This dataset was previously used in [25, 26] to fit a SIR model with a time-varying transmission-potential βt under the assumption that βt evolves as a stationary stochastic process, whose dynamics can be modelled using an Ornstein–Uhlenbeck (OU) SDE (since the solution of this SDE evolves as a stationary process). If we describe the dynamics of xt, in Eq (10), through the OU SDE
where (ξ1ξ2xt) is the linear drift function and ξ3 is a standard deviation parameter, commonly referred to as the volatility parameter, representing a constant diffusion function, then the stochastic model in
Eq (10) becomes similar to the model in [25]. We can re-write the model in Eq (10) as follows:
where, following [
25] and [26], the logarithm of the transmission-potential evolves as a stationary stochastic process.

Here we specifically want to compare the outcome of inference using the true OU diffusion used above (SDE) with its series approximation (SA), leading to a model such as in Eq (10), where the dynamics of is given by the following ODE:
where we have chosen the generalised Fourier basis
Eq (4) as the function ϕi(t).

For the SDE model the latent sample path x, the OU SDE parameters ξ, initial value x0 and the parameter γ were also estimated together with the initial susceptibility, s0 = S(t = 0)/N, assuming the initial recovered fraction r0 = 0 and thus i0 = 1 − s0. As this is count data we have specified a Poisson likelihood:
where in this case X0 = (x0, s0) and θ = (ξ, γ). For the SA model we used the inferential framework introduced in the previous section and used the Poisson likelihood as above:

We chose a weakly-informative prior for the parameters governing the dynamics ξ1, …, ξ3, γ ∼ Γ(2, 2). For s0 a Beta(2, 1), since we expect the true value to be near or greater than 2/3, and for the initial value of the diffusion we used a prior , which is the stationary distribution of the OU diffusion.

For the SDE model, data-augmentation using a particle filter was employed to sample the ‘true’ diffusion’s path, following [6], and produce an unbiased estimate of the likelihood. Parameters γ, ξ, X0 were estimated using the Metropolis-Hastings (MH) algorithm, with an adaptive random-walk proposal based on algorithm 4 of [27]. See Appendix B in S1 Text. for further details on this proposal mechanism. The likelihood estimate produced by the particle filter was used in the acceptance step of the MH algorithm. This particle-marginal Metropolis-Hastings (PMMH) MCMC scheme for jointly updating the latent diffusion path along with the parameters has been shown to have superior performance [6] when compared to other data-augmentation approaches. For the PMMH, we used a Bootstrap particle filter [28], where the particles are propagated using Euler-Maruyama discretisation, and set the number of particles to 1000. Following [25], we carried out the Euler-Maruyama iterations with a stepsize δt = 0.1, leading to l = 9 time-points between two observations.

For the SA model we used the Metropolis-Hastings algorithm with the same adaptive random-walk proposal (RWMH) used with the PMMH scheme and the Euler method to numerically solve the ODE adopting the same step-size that is used with the Euler-Maruyama scheme for the SDE.

Note that inference for the SDE model using PMMH will be substantially more computationally heavy compared to the inference for the ODE based SA model, irrespective of the value of n. This is due to the particle filter requiring multiple evaluation of the Euler-Maruyama scheme at each MCMC iteration. Even when parallelised, the particle filter will be bottlenecked by a weight-updating step (see [28] for details) requiring message-passing across processes. The Euler scheme for solving the ODE in Eq (14), in comparison, is evaluated once every iteration of a Metropolis-Hastings algorithm targeting the posterior distribution in Eq (11).

A crucial parameter for the proposed method is the number of basis functions n. If a value of n produces a close match between the marginal densities of the true and approximate diffusion at the end of the analysis period T then the approximation will be valid throughout the course of the epidemic. In this case T = 14. In Fig 1 we compare the time T marginal densities obtained by solving the ODE in Eq (14) associated with the SA, and p(xt)|t=T obtained from the original OU diffusion, both based on some trial parameters sampled from the prior. The value n = 15 produces a close match between the marginal densities. We defer further discussion of the effect of n on estimation to section 4.2.

4.1 Results: Comparison between true and approximate diffusion

We fitted the two models, SDE and SA respectively, using the associated algorithms as described above to the influenza dataset. We ran two chains of both PMMH, for the SDE model, and RWMH, for the SA one, for 106 iterations where the first 5 × 105 iterations were discarded as burn-in and the remaining samples thinned to obtain 1000 samples from the posterior distribution. The running times were 15907 and 2397 seconds for the PMMH and RWMH with n = 15, respectively. We implemented a vectorised particle filter and the Euler solver for the ODE using Jax [29]. The adaptive MCMC algorithm was implemented using Python.

We notice a good agreement between the parameter estimates obtained using the SDE and SA counterparts (see Fig 2). Furthermore, in Fig 3 we compare the goodness-of-fit and display the posterior distribution of the latent diffusion paths and , corresponding to the SDE and SA. Additionally, for aid of visualisation, we have also plotted draws from the (posterior) sample paths for both models in Fig 3(c) and 3(d).


Fig 2. Comparison of the posterior marginal densities of the parameters obtained using the SDE and the SA (with n = 15 basis function). These densities are summarised using a kernel density estimate.


Fig 3.

Influenza dataset: Goodness-of-fit (a); posterior distribution of the latent diffusion paths corresponding to the SDE and SA counterparts (b), with densities summarised by the mean (solid lines) and 95% credible intervals (broken lines); and samples from the posterior distribution of the latent diffusion paths, SDE (c) and SA (d).

We observe a good agreement between the epidemic curves obtained using the SDE and the SA, but for the posterior distribution of the latent diffusion paths the credible intervals are narrower for the SA. The SA, due to the truncation of the infinite series expansion, produces smoother paths, slightly underestimating the variability in the evolution of the latent diffusion path. On a closer introspection of the posterior means (Fig 3(b)), it is noticeable that the latent diffusion paths drop and increase again in the period between the 4-th and 9-th day, around the peak, indicating an increase in transmissibility. These changes are reflected in the estimates of both SDE and SA. After the 9-th day, the variability in the latent paths increase for both SDE and SA and the posterior means match closely. This is expected since after the peak, when the epidemic is receding, a large change in βt will have negligible effect on the case counts. Note that there is also a considerable amount of uncertainty about the SA’s estimate of the maximum value of the transmission-potential around the 9-th day (see the sample paths in Fig 3(d)). That is the SA is uncertain whether the maximum value of the transmission-potential occurs on the 8-th or the 9-th day. This increased uncertainty, in turn, results in the credible intervals of SA matching the credible intervals of SDE after day 9.

These results were confirmed in a simulation study where the simulated datasets mimicked this influenza dataset (see Appendix C in S1 Text).

Additionally, we have fitted the model while using a negative binomial likelihood to allow overdispersion. Details are given in Appendix I in S1 Text.

4.2 The effect of n on the convergence of the posterior

In Fig 1 we noticed that the marginal distribution of the latent diffusion path and its series approximation starts agreeing beyond n ≥ 10 terms. It is worth investigating whether such a threshold exist for the posterior distributions obtained using the SDE and the SA. We did this by further comparing the joint posterior distribution p(θ, X0|y), from SDE and SA while varying n. Note that θ and X0 are quantities which were estimated using both the SDE and SA counterparts, and thus the joint posterior of these were chosen for comparison. For this comparison we estimated the posterior distribution by fitting the SA repeatedly with number of basis set to n = 3, 5, 10, 15, 20, 25, 30. To compare the posterior distributions, we used the maximum mean discrepancy (MMD) metric [30], a divergence metric that can be calculated using samples from the distributions. See Appendix H in S1 Text for further details on this metric.

In Fig 4 we plot the MMD between the posteriors from SDE and SA for increasing n. For n ≥ 10 we found good agreement between the two posteriors, consistent with the results from comparing the marginal densities (Fig 1). This reinforces our approach of choosing the number of basis by comparing marginals of the latent process, while using the SA. We summarise the runtimes of MCMC with the RWMH proposal for each choice of n in Table 1, noting that the increase in the runtimes as we varied n was negligible, especially when compared to the PMMH with SDE.


Table 1. Runtimes (rounded to nearest integer), in seconds, of MCMC for SA, as a function of the number of basis n, in comparison with the runtime of SDE. These were run on a 3.6 GHz machine with 16 GB memory.

5 Approximate diffusion for capturing a complex signal

In the previous section, when fitting the SIR model to an influenza dataset, we considered a stationary stochastic process representing the evolution of the transmission-potential. To further demonstrate the usefulness of the proposed series approximation technique we investigate here the capability of the approximate diffusion process model to recover a complex signal representing the time evolution of the transmission-potential. To this end, we follow closely the experiment furnished in [7], where observations were simulated from a SIRS model whose time-varying transmission potential was given by a deterministic sinusoidal signal. More specifically, we consider the SIRS model given by [7]
where Ct gives the number of new cases, 1/α is the average duration of immunity, γ is the recovery rate and μ is the recruitment or mortality rate. Observation , from this model, were simulated on a discrete time-grid t1:m = (t1, t2…, tm), with observation interval δt = titi−1, (i = 1, …, m) being 7 days for a total of m = 156 weeks constituting an analysis time-span of three years. Again following [
7], we assumed a Poisson observation model for the number of new cases. Thus, the generative model is given by
where hδt(⋅) denotes the cumulative sum of Ct over the observation interval.

The parameters were set to the exact values used in [7] for generating the simulated data. We did the same for the initial values which were chosen in [7] to generate transients realistically reflecting a real application. To fit the data we employed the same SIRS model as in Eq (17), but we assumed the transmission-potential to evolve as an approximate Brownian motion, to capture a non-stationary signal, given by (following Eq (6))
where σ is a volatility parameter and we solve this ODE for βt along with the compartmental ODEs in
Eq (17), using an Euler solver. We chose n = 20, through a comparison of the marginal density at p(βt)|t=T, where T = 7 × 156 = 1092. We again chose the generalised Fourier basis in Eq (4) as the function ϕi(t).

The unknown parameters included θ = (1/α, 1/γ, σ) (μ was assumed known) and the initial values X0 = (S0, I0, β0). The Poisson observation model in Eq (18) was considered as the likelihood. We retained the same priors as in [7]. The priors and the generative values are given in Appendix J in S1 Text. We ran the adaptive MCMC algorithm, used previously for the influenza dataset, for 5 × 105 iterations, discarded the first half as burn-in, and thinned appropriately to collect 1000 samples representing the posterior distribution. The run-time was 808 seconds, which is orders-of-magnitude faster than what was reported in [7].

In Fig 5 we plot the goodness-of-fit together with the posterior sample path of the transmission-potential. The posterior distribution of the parameters are shown in Appendix J in S1 Text. Note how (Fig 5(b)) the approximate Brownian motion model for βt can recover well the true underlying signal, which clearly demonstrates that using an approximate diffusion process we can reliably reproduce the time evolution of the transmission-potential.


Fig 5.

SIRS model with sinusoidal transmission-potential: Goodness-of-fit (a); posterior distribution of the latent diffusion paths corresponding to the SDE and SA approaches (b), with densities summarised by the mean (solid lines) and 95% credible intervals (broken lines).

6 Application: Modelling COVID-19 outbreak in England

Our proposed method of modelling the time-varying transmission-potential as an approximate diffusion can also be applied to a discrete time stochastic process that converges to a diffusion in the continuous time limit. For example, an AR(1) process converges to a OU diffusion. Thus, if one is already using an AR(1) process to model the transmission-potential, then a discretised version of the series approximation of OU diffusion, the ODE in Eq (14), can be chosen as its replacement.

To exemplify the application of the series expansion method in replacing a discrete time stochastic process, we have chosen to fit a compartmental model, originally proposed in [8], whose dynamics are described as a set of first order difference equations, to data from the first wave of the COVID-19 outbreak in England, between February and August 2020. This model captures the effect of unknown extrinsic factors on the force of infection through a time-varying transmission-potential modelled as a Gaussian random-walk. We introduce the model of [8] in what follows and introduce an alternative formulation using the series approximation of Brownian motion.

6.1 Transmission model for COVID-19

This is an age and spatially structured transmission model, stratifying the population into nA = 7 age groups and nr = 7 regions. Within each region, the transmission dynamics are governed by a system of first order difference equations:
where: , , represent the time tk, k = 1, …, K, partitioning of the population of individuals in a region r, r = 1, …, nr, in age-group a, a = 1, …, nA, into S (susceptible), E (exposed) and I (infectious) disease states. The average period spent in the exposed and infectious states are given by the parameters dL and dI respectively; and is the time- and age-varying rate with which susceptible individuals become infected, the force of infection. Time steps of δt = 0.5 days are chosen to be sufficiently small relative to the latent and infectious periods. Following [
31] the initial conditions of the system states S, E1, E2, I1, I2 at t0 are given by region-specific parameters ψr and I0,r, describing the initial exponential growth and the initial number of infectious individuals, respectively. New infections are generated as
where is driven over time by a region-specific time-varying transmission potential, , (see
Eq (22)) which moderates the rate at which effective contact take place. This region-specific transmission-potential captures the discrepancy between how actual contact take place between the age groups, and that encoded by a set of time-varying contact matrices. We refer the reader to [8] for further details on the model dynamics and parameterisation.

Over time is not allowed to vary unconstrained and a smoothing is imposed by assuming, a priori that its evolution follows a Gaussian random-walk process with volatility :
where tlock indicates the time-point corresponding to the lockdown introduced in England on 23rd March 2020. This random-walk formulation requires the inference of the high-dimensional (due to the choice of δt) sample path of this process, an extremely challenging task using MCMC. We will discuss this inferential difficulty later in Section 6.3.2. To restrict the dimensionality of the process, the model formulation in [
8] considered the transmission-potential to be piecewise constant with weekly changepoints, and its values at these changepoints to evolve following a random-walk. Denote wkw(tk) the week in which time tk falls. Then the time evolution of the transmission-potential is modelled at a slower weekly time-scale:
as a Gaussian random-walk, with volatility , following the first week of the lockdown. Realisation of the process, for each region, can then be obtained by sampling the vector Δβr of all the weekly increments . It was assumed in [
8] that the contact matrices sufficiently described how actual contacts took place between different age groups prior to the lockdown and thus over that period. We have retained this assumption in our work.

6.3 Inference

The unknown quantities that need to be inferred can be divided into two groups: (i) Global parameters shared between regions, and (ii) regional parameters specific to each region: . After placing the same priors as was used in [8] (and listed in Appendix E in S1 Text), the posterior distribution of the unknown quantities is as follows:
where we denote by the data for all time-points, ages and regions corresponding to deaths and sero-positive tests, respectively.

6.3.1 Sampling from the posterior.

Sampling from the posterior distribution Eq (26) is challenging due to the large number of random-walk increments corresponding to all regions and weeks since lockdown. MCMC with a vanilla RWMH proposal, as applied in [8], due to the linear scaling of convergence time with increasing dimensions mixes poorly and requires a large number of iterations (≈ 107) of the Markov chain before convergence is reached. To improve convergence we instead used a random-scan Metropolis-within-Gibbs [32] (MwG) algorithm that circumvent the updating of a large parameter vector at each iteration. This MwG algorithm exploits the independence between the regional parameters. Our proposed sampling strategy consists of sampling alternatively, at each MCMC iteration, from the posterior of the global parameters conditioned on all the regional ones: (i) , and regional parameters for one randomly chosen region conditioned on the global ones (since the regional parameters are conditionally independent of any other region’s parameters): (ii) p(θr*|θg, yd, ys), where r* ∼ Uniform(1, nr). Samples from each of these conditional distributions are obtained using an adaptive RWMH move with the same adaptation mechanism used in Section 4.1. The pseudocode for this MwG algorithm is furnished in Appendix F in S1 Text.

6.3.2 An alternative formulation.

The number of region-specific random-walk increments that needs to be sampled increases with time. The performance of the MwG algorithm starts deteriorating and exhibiting poor mixing and slow convergence, as this number becomes large. This limits dramatically the usefulness of this model in the context of a real-time application.

For the model in Eq (23), this problem can be tackled by increasing the time between two successive changepoints thus reducing the number of increments to be sampled for a period of analysis. This is however driven by computational convenience, and it would be more meaningful to learn these changes from data. We could model the time evolution of the transmission-potential at a faster time-scale, for example as in Eq (22). However, in this case the number of random-walk increments, to be sampled per region, equals the number of time-points between lockdown and the end of analysis date. Any MCMC sampler, that uses a RWMH proposal, would struggle severely to move efficiently in such a high-dimensional parameter space.

To alleviate these problems we propose to model the transmission-potential as a Brownian motion Wt,r with volatility evolving in continuous time t and apply the series approximation as follows:
where the second equality follows from choosing ϕi as given in
Eq (4) and carrying out the integration. We can then discretise this approximation using the same time-step of δt that is used for the compartmental dynamics to obtain the following path-wise approximation evolving in discrete time tk:
where T is the number of days between lockdown and analysis date. Note that in this formulation the problem of sampling a large vector of increments Δβr is reduced to that of sampling a n-dimensional vector of the coefficients Zr = (Z1,r, …, Zn,r). From the comparison of the time T marginal distributions of the true and approximate Brownian motion, as for the OU process (see
Fig 1), we found n = 10 to produce a good path-wise approximation. Thus, we used n = 10 for the subsequent comparative evaluations. The regional parameter vector, , now contains the expansion coefficients instead of the random-walk increments Δβr.

6.4 Results: Comparative evaluations

We ran the MwG algorithm to target the posterior distribution in Eq (26) while using the random-walk based piecewise constant transmission-potential in Eq (23) and the Brownian motion approximation using series expansion with discretisation (SAd) in Eq (28). In both cases we ran 3 × 106 iterations, discarded the first half of the iterations as burn-in and subsequently thinned the remaining samples to obtain 1000 samples. We implemented the epidemic model in C++. The MwG algorithm was implemented using Python.

Fig 6(a) compares, for the two alternative choices of modelling the transmission-potential, the posterior predictive distributions of the death data aggregated across all ages and regions with the observed data (see Appendix G in S1 Text for region-specific plots). Clearly the goodness-of-fit is indistinguishable between the two models. In Fig 6(b) we show summaries of the posterior distributions of the latent infections p(Δinfec|yd, ys), aggregated across all ages and regions (region-wise infections are shown in Appendix G in S1 Text) again showing close consistency across models, with the exception of a few days immediately following the lockdown where the number of infections estimated by the SAd is slightly higher.


Fig 6.

Goodness-of-fit of daily death data (a) and the inferred latent infections (b), produced using the random-walk (magenta lines) and SAd (orange lines). These densities are summarised by the mean (solid lines) and 95% credible intervals (broken lines). The black line indicates the day of lockdown in England 23rd March, 2020.

Following [8], we also obtain estimates of the effective region-specific reproduction number , their weighted average Rt,E representing the reproduction number for all of England, (formulae for these are given in Appendix D in S1 Text). In Fig 7 we show the posterior distributions for p(Rt,E|yd, ys), using the two alternative models. It is evident that the estimate obtained from the SAd appears to be smoother than what is obtained using the piecewise constant model, more realistically reflecting the actual transmission process that happens in continuous time. In Table 2 we present infection-fatality ratio estimates from the two models, again showing close agreement across models.

Computational gains.

The MwG algorithm took around 78 hours to finish for both models of the transmission-potential. However, the SAd allows inference at a faster time-scale producing a smoother estimate of Rt,E avoiding artificial model assumptions. Such an inference would be computationally infeasible if using a random-walk model at the more granular time-scale as in Eq (22), given the poor scaling of the RWMH proposal in high dimensions. Thus, using the series approximation we were able to extract more information about the transmission-potential and reproduction-ratio in comparison to the piecewise constant model, while incurring the same computational expense.

Had we used the random-walk model in Eq (22), we would have had to further partition each of the regional parameter block in separate chunks to accommodate a large vector of increments . Consequently, multiple Gibbs moves would have been necessary to update all the increments for a randomly chosen region. This, in turn, would have increased the number of likelihood computations, involving the computationally expensive updates of the transmission model, exponentially at each MCMC iteration.

7 Discussion

By modelling the force of infection as the function of a time-varying transmission-potential we can incorporate extrinsic, un-modelled effects in the description of the transmission process within a compartmental model. Describing this transmission-potential, in turn, as a stochastic process, a diffusion in particular, we can inject environmental stochasticity in an otherwise deterministic model. In this paper we proposed a path-wise approximation of a diffusion process as an alternative to modelling the dynamics of the transmission-potential as a SDE. Through the path-wise approximation we arrive at a random ODE approximating the SDE. As a function of its parameters, the path (solution) of an ODE is completely deterministic. As a result inference of the transmission-potential is simplified, with no need to solve a missing data problem using a computationally expensive data-augmentation procedure.

We demonstrate the efficacy of the proposed path-wise approximation using two epidemic models. In the first one, an influenza model, we replaced an OU SDE with an equivalent path-wise approximation. We noticed similar inference outcomes in terms of parameter estimates and goodness-of-fit using the SDE and its ODE approximation. In the second one, a toy SIRS model, we used an approximation of Brownian motion to reconstruct the non-stationary time evolution of a deterministic sinusoidal transmission potential from simulated data. The path-wise series approximation of Brownian motion was able to faithfully reconstruct the generative values of the sinusoidal transmission potential, as well as the model parameters. Importantly, for both models we observed orders-of-magnitude improvement in computational efficiency of the inference process when using a path-wise approximation of a diffusion process instead of the true diffusion.

We then applied the path-wise approximation to replace a Gaussian random-walk with a discretised path-wise approximation of Brownian motion to model the transmission-potential within a compartmental model of COVID-19 pandemic spread in England. Again we noticed consistent estimates of crucial unknown quantities such as infection-fatality rate, latent infections and a time-varying estimate of the reproduction number. In addition, the path-wise approximation allows the transmission-potential to be modelled at a more granular time-scale providing a smooth estimate of the effective reproduction number. This would be impossible to achieve using the random-walk model due to an exorbitant computational burden.

As an alternative to using our path-wise approximation of Brownian motion to model the transmission-potential, at a faster time-scale, we could have used a different MCMC algorithm, such as the No-U-Turn sampler [33], that is known to perform well for high dimensional problems. This algorithm proposes a move based on the gradient of the target density. Evaluating gradients, however, for the COVID-19 model is challenging as this requires, in addition to extra computations, a complete re-implementation of the model using an automatic differentiation package. However, for modelling studies where such re-implementation is straightforward, we like to point out that by applying a path-wise approximation of a diffusion process we are left with the task of sampling from a posterior distribution with a standard Gaussian prior (over the coefficients). The No-U-Turn sampler generally excels at this task.

In this paper we have used simple diffusion models whose transition densities are known analytically. However, if additional prior information about the force of infection is available, then such information can be incorporated in more complex nonlinear SDEs as models of the time-varying transmission-potential. Moreover, SDEs can be used to model any other time-varying parameters, in addition to the transmission-potential. Our methodology can be seamlessly applied in all such cases to arrive at a path-wise approximation of such complex diffusion processes.

Any time-varying parameter, that is modelled as a diffusion process, can also be modelled using our proposed approximate diffusion process. The usefulness or necessity of considering multiple time-varying parameters, beyond just the transmission potential, depends on the particular application context. In this work our focus primarily was to show the speed-up in inference that can be achieved using an approximate diffusion model of the transmission potential only.


  1. 1.

    Bretó C, He D, Ionides EL, King AA. Time series analysis via mechanistic models. The Annals of Applied Statistics. 2009; p. 319–348.
  2. 2.

    Knock ES, Whittles LK, Lees JA, Perez-Guzman PN, Verity R, FitzJohn RG, et al. Key epidemiological drivers and impact of interventions in the 2020 SARS-CoV-2 epidemic in England. Science Translational Medicine. 2021;. pmid:34158411
  3. 3.

    Keeling MJ, Dyson L, Guyver-Fletcher G, Holmes A, Semple MG, Tildesley MJ, et al. Fitting to the UK COVID-19 outbreak, short-term forecasts and estimating the reproductive number. medRxiv. 2020;.
  4. 4.

    Davies NG, Kucharski AJ, Eggo RM, Gimma A, Edmunds WJ, Jombart T, et al. Effects of non-pharmaceutical interventions on COVID-19 cases, deaths, and demand for hospital services in the UK: a modelling study. The Lancet Public Health. 2020;5(7):e375–e385. pmid:32502389
  5. 5.

    Brynjarsdóttir J, O’Hagan A. Learning about physical parameters: the importance of model discrepancy. Inverse Problems. 2014;30(11):114007.
  6. 6.

    Dureau J, Kalogeropoulos K, Baguelin M. Capturing the time-varying drivers of an epidemic using stochastic dynamical systems. Biostatistics. 2013;14(3):541–555. pmid:23292757
  7. 7.

    Cazelles B, Champagne C, Dureau J. Accounting for non-stationarity in epidemiology by embedding time-varying parameters in stochastic models. PLoS computational biology. 2018;14(8):e1006211. pmid:30110322
  8. 8.

    Birrell P, Blake J, Van Leeuwen E, Gent N, De Angelis D. Real-time nowcasting and forecasting of COVID-19 dynamics in England: the first wave. Philosophical Transactions of the Royal Society B. 2021;376(1829):20200279. pmid:34053254
  9. 9.

    De Angelis D, Presanis AM, Birrell PJ, Tomba GS, House T. Four key challenges in infectious disease modelling using data from multiple sources. Epidemics. 2015;10:83–87. pmid:25843390
  10. 10.

    Fuchs C. Inference for diffusion processes: with applications in life sciences. Springer Science & Business Media; 2013.
  11. 11.

    Sottinen T, Särkkä S. Application of Girsanov theorem to particle filtering of discretely observed continuous-time non-linear systems. Bayesian Analysis. 2008;3(3):555–584.
  12. 12.

    Anderson RM, Anderson B, May RM. Infectious diseases of humans: dynamics and control. Oxford university press; 1992.
  13. 13.

    Ellner SP, Bailey B, Bobashev GV, Gallant A, Grenfell B, Nychka D. Noise and nonlinearity in measles epidemics: combining mechanistic and statistical approaches to population modeling. The American Naturalist. 1998;151(5):425–440. pmid:18811317
  14. 14.

    Martinez-Bakker M, King AA, Rohani P. Unraveling the transmission ecology of polio. PLoS biology. 2015;13(6):e1002172. pmid:26090784
  15. 15.

    Cazelles B, Chau N. Using the Kalman filter and dynamic models to assess the changing HIV/AIDS epidemic. Mathematical biosciences. 1997;140(2):131–154. pmid:9046772
  16. 16.

    Oksendal B. Stochastic differential equations: an introduction with applications. Springer Science & Business Media; 2013.
  17. 17.
    Lyons SMJ, Storkey AJ, Särkkä S. The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes. In: Bartlett PL, Pereira FCN, Burges CJC, Bottou L, Weinberger KQ, editors. Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States; 2012. p. 1961–1969.
  18. 18.

    Luo W. Wiener chaos expansion and numerical solutions of stochastic partial differential equations. California Institute of Technology; 2006.
  19. 19.
    Ghosh S, Birrell PJ, De Angelis D. Differentiable Bayesian inference of SDE parameters using a pathwise series expansion of Brownian motion. In: Proceedings of The 25th International Conference on Artificial Intelligence and Statistics. vol. 151 of Proceedings of Machine Learning Research. PMLR; 2022. p. 10982–10998.
  20. 20.

    Lyons SM, Särkkä S, Storkey AJ. Series expansion approximations of Brownian motion for non-linear Kalman filtering of diffusion processes. IEEE Transactions on Signal Processing. 2014;62(6):1514–1524.
  21. 21.

    Wong E, Zakai M. On the Convergence of Ordinary Integrals to Stochastic Integrals. The Annals of Mathematical Statistics. 1965;36(5):1560–1564.
  22. 22.

    Kloeden PE, Eckhard P. Numerical Solution of Stochastic Differential Equations. springer Berlin; 1992.
  23. 23.

    Jackson C, Vynnycky E, Hawker J, Olowokure B, Mangtani P. School closures and influenza: systematic review of epidemiological studies. BMJ open. 2013;3(2). pmid:23447463
  24. 24.
    Jombart T, Frost S, Nouvellet P, Campbell F, Sudre B. outbreaks: A Collection of Disease Outbreak Data; 2020. Available from:
  25. 25.

    Del Moral P, Murray LM. Sequential Monte Carlo with highly informative observations. SIAM/ASA Journal on Uncertainty Quantification. 2015;3(1):969–997.
  26. 26.
    Ryder T, Golightly A, McGough AS, Prangle D. Black-Box Variational Inference for Stochastic Differential Equations. In: Proceedings of the 35th International Conference on Machine Learning; 2018. p. 4423–4432.
  27. 27.

    Andrieu C, Thoms J. A tutorial on adaptive MCMC. Statistics and computing. 2008;18(4):343–373.
  28. 28.

    Gordon N, Salmond D, Ewing C. Bayesian state estimation for tracking and guidance using the bootstrap filter. Journal of Guidance, Control, and Dynamics. 1995;18(6):1434–1443.
  29. 29.
    Bradbury J, Frostig R, Hawkins P, Johnson MJ, Leary C, Maclaurin D, et al.. JAX: composable transformations of Python+NumPy programs; 2018. Available from:
  30. 30.

    Gretton A, Borgwardt KM, Rasch MJ, Schölkopf B, Smola A. A kernel two-sample test. Journal of Machine Learning Research. 2012;13(Mar):723–773.
  31. 31.

    Birrell PJ, Ketsetzis G, Gay NJ, Cooper BS, Presanis AM, Harris RJ, et al. Bayesian modeling to unmask and predict influenza A/H1N1pdm dynamics in London. Proceedings of the National Academy of Sciences. 2011;108(45):18238–18243.
  32. 32.

    Roberts GO, Rosenthal JS. Markov-chain Monte Carlo: some practical implications of theoretical results. The Canadian Journal of Statistics/La Revue Canadienne de Statistique. 1998; p. 5–20.
  33. 33.

    Hoffman MD, Gelman A. The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research. 2014;15(1):1593–1623.

Source link