Abstract:
Physiological time series are fundamental to clinical practice and life science research. Prominent examples include heart activity recordings (electrocardiograms, or ECGs) and brain activity recordings (electroencephalograms, or EEGs).
However, these time series come with several challenges. In many cases, they are noisy or contain missing values. Additionally, clinicians and researchers are often interested in inferring unobserved physiological variables from these time series, for example, when diagnosing diseases or estimating physiological parameters.
In recent years, deep probabilistic models, which use deep neural networks to model complex, high-dimensional probability distributions, have emerged as a powerful, data-driven approach to tackling many of these challenges.
Despite these advances, many issues remain. For example, deep probabilistic models often require a large amount of data to accurately capture the desired distribution. For physiological time series, however, the amount of available data is often limited because datasets are subject to privacy restrictions or are expensive to obtain. Additionally, the neural networks used in deep probabilistic models often contain millions of parameters, which limits interpretability and can lead to unexpected failure cases.
To address these issues, we introduce data-efficient and interpretable deep probabilistic models for physiological time series, which can be used to predict clinical events, generate high-fidelity neurophysiological time series, and infer parameters for simulators of physiological processes.
Our first contribution concerns apnea prediction in newborn infants. We propose an interpretable model that predicts these clinical events directly from polysomnograms recorded during sleep and reveals which features of the recording are especially predictive.
In our second contribution, we develop data-efficient generative models of neurophysiological time series, such as EEGs, that produce realistic synthetic samples and accurately reproduce many neuroscientific summary statistics. These models also support downstream time-series applications, such as imputing missing values.
Finally, we propose two new methods for inferring distributions of unobserved parameters from observations. Both methods address the simulation-based setting, in which a simulator (e.g., of a physiological process) is used to train a deep probabilistic model for inference.
Specifically, our third contribution introduces a method for estimating high-entropy parameter distributions, which prevents the omission of valid regions of parameter space. Our fourth contribution is a method for simulation-efficient Bayesian inference that requires substantially less simulation data than previous approaches.
We demonstrate the effectiveness of these methods on challenging parameter inference tasks involving simulators of neural voltage dynamics.
Together, the contributions in this thesis advance the applicability of deep probabilistic models for physiological time series, enabling more data-efficient and interpretable prediction, generation, and inference.