Data Augmentation for Time Series¶
In deep learning, our dataset should help the optimization mechanism locate a good spot in the parameter space. However, real-world data is not necessarily diverse enough to cover the required situations with enough records. For example, some datasets have extremely imbalanced class labels, which leads to poor performance in classification tasks [1]. Another problem with a limited dataset is that the trained model may not generalize well [2, 3].
We will cover two topics in this section: augmenting the dataset and applying the augmented data to model training.
Augmenting the Dataset¶
The augmentation methods can be grouped into the following categories:
- Random transformations, e.g., jittering;
- Pattern mixing, e.g., DBA [5];
- Generative models, e.g.,
We also treat the first two methods, random transformations and pattern mixing, as basic methods.
In the following table, we group some of the data augmentation methods along two dimensions: the category of the method and the domain in which the method is applied.
| | Projected Domain | Time Scale | Magnitude |
|---|---|---|---|
| Random Transformation | Frequency Masking, Frequency Warping, Fourier Transform, STFT | Permutation, Slicing, Time Warping, Time Masking, Cropping | Jittering, Flipping, Scaling, Magnitude Warping |
| Pattern Mixing | EMDA [10], SFM [11] | Guided Warping [12] | DFM [7], Interpolation, DBA [5] |
For completeness, we will explain some of the methods in more detail in the following.
Perturbation in Fourier Domain¶
In the Fourier domain, for the amplitude \(A_f\) and phase \(\phi_f\) at a specific frequency \(f\), we can perform [13]
- magnitude replacement using a Gaussian distribution, and
- phase shift by adding Gaussian noise.
We perform such perturbations at some chosen frequency.
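The two perturbations above can be sketched with NumPy's real FFT. This is a minimal illustration, not the exact procedure of the cited work: the function name `fourier_perturb`, the number of perturbed frequencies, the use of the sample mean and standard deviation of the observed amplitudes for the Gaussian, and the phase noise scale are all assumptions.

```python
import numpy as np

def fourier_perturb(x, n_freqs=3, phase_sigma=0.5, rng=None):
    """Perturb amplitude and phase at a few randomly chosen frequencies."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    spec = np.fft.rfft(x)
    amp, phase = np.abs(spec), np.angle(spec)

    # Skip the DC term (index 0) and, for even lengths, the Nyquist term,
    # so the series mean is preserved and the inverse transform stays exact.
    last = len(spec) - 1 if len(x) % 2 == 0 else len(spec)
    candidates = np.arange(1, last)
    idx = rng.choice(candidates, size=min(n_freqs, len(candidates)), replace=False)

    # Magnitude replacement: draw new amplitudes from a Gaussian whose mean
    # and standard deviation are estimated from the observed amplitudes
    # (an assumed choice), keeping them non-negative.
    mu, sd = amp[candidates].mean(), amp[candidates].std()
    amp[idx] = np.abs(rng.normal(mu, sd, size=len(idx)))

    # Phase shift: add zero-mean Gaussian noise to the selected phases.
    phase[idx] += rng.normal(0.0, phase_sigma, size=len(idx))

    return np.fft.irfft(amp * np.exp(1j * phase), n=len(x))

t = np.arange(64)
x = np.sin(2 * np.pi * 4 * t / 64) + 0.1 * np.random.default_rng(1).normal(size=64)
x_aug = fourier_perturb(x, rng=np.random.default_rng(0))
```

Leaving the zero-frequency term untouched keeps the mean of the augmented series equal to that of the original, which is often desirable when the level of the series carries meaning.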
Slicing, Permutation, and Bootstrapping¶
We can slice a series into small segments. With the slices, we can perform different operations to create new series.
- Window Slicing (WS): In a classification task, we can take slices from the original series and assign the class label of the original series to each slice [14]. The slices can also be interpolated to match the length of the original series [2].
- Permutation: We take the slices and permute them to form a new series [15].
- Moving Block Bootstrapping (MBB): First, we remove the trend and seasonality. Then we draw blocks of fixed length from the residual of the series until the desired length of the series is met. Finally, we combine the newly formed residual with the trend and seasonality to form a new series [16].
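The three operations above can be sketched as follows. The function names, the slice ratio, the number of segments, and the block length are illustrative assumptions. The bootstrap function only resamples a residual that has already been separated from trend and seasonality; the decomposition itself (for example, STL) is not shown.

```python
import numpy as np

def window_slice(x, ratio=0.9, rng=None):
    """Cut a random window and stretch it back to the original length."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n = len(x)
    w = max(2, int(round(ratio * n)))          # window length, ratio <= 1
    start = rng.integers(0, n - w + 1)         # upper bound is exclusive
    window = x[start:start + w]
    # Linear interpolation back to n points.
    return np.interp(np.linspace(0, w - 1, n), np.arange(w), window)

def permute_segments(x, n_segments=4, rng=None):
    """Split the series into segments and shuffle their order."""
    rng = np.random.default_rng() if rng is None else rng
    segments = np.array_split(np.asarray(x, dtype=float), n_segments)
    order = rng.permutation(n_segments)
    return np.concatenate([segments[i] for i in order])

def moving_block_bootstrap(residual, block_len, rng=None):
    """Resample a residual series from randomly placed fixed-length blocks."""
    rng = np.random.default_rng() if rng is None else rng
    residual = np.asarray(residual, dtype=float)
    n = len(residual)
    n_blocks = -(-n // block_len)              # ceiling division
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    blocks = [residual[s:s + block_len] for s in starts]
    return np.concatenate(blocks)[:n]          # trim to the original length

rng = np.random.default_rng(0)
series = np.arange(20.0)
sliced = window_slice(series, ratio=0.5, rng=rng)
permuted = permute_segments(series, n_segments=4, rng=rng)
residual = rng.normal(size=20)
boot = moving_block_bootstrap(residual, block_len=6, rng=rng)
```

With a decomposition at hand, a new series would be formed as `trend + seasonal + moving_block_bootstrap(residual, block_len)`, where `trend` and `seasonal` are hypothetical arrays from that decomposition.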
Both the time scale and magnitude can be warped. For example,
- Time Warping: We distort time intervals by taking a range of data points and upsampling or downsampling it [4].
- Magnitude Warping: The magnitude of the time series is rescaled.
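A minimal sketch of both warping operations is given below. It assumes a smooth random curve built by linearly interpolating between a few random knots; the knot count, the noise scale `sigma`, and the function names are illustrative choices, and other implementations may use smoother curves such as cubic splines.

```python
import numpy as np

def _random_curve(n, n_knots, sigma, rng):
    """Random curve around 1.0, linearly interpolated between knots."""
    knots_x = np.linspace(0, n - 1, n_knots + 2)
    knots_y = rng.normal(1.0, sigma, size=n_knots + 2)
    return np.interp(np.arange(n), knots_x, knots_y)

def magnitude_warp(x, n_knots=4, sigma=0.2, rng=None):
    """Rescale the values by a smoothly varying random factor."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    return x * _random_curve(len(x), n_knots, sigma, rng)

def time_warp(x, n_knots=4, sigma=0.2, rng=None):
    """Locally speed up or slow down the series while keeping its length."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n = len(x)
    # A positive local "speed" guarantees a monotonic warped time axis.
    speed = np.clip(_random_curve(n, n_knots, sigma, rng), 0.1, None)
    warped_t = np.cumsum(speed)
    # Rescale so the warped time axis still spans [0, n - 1].
    warped_t = (warped_t - warped_t[0]) / (warped_t[-1] - warped_t[0]) * (n - 1)
    # Regions with speed > 1 are downsampled, regions with speed < 1 upsampled.
    return np.interp(warped_t, np.arange(n), x)

ramp = np.linspace(0.0, 1.0, 50)
ramp_tw = time_warp(ramp, rng=np.random.default_rng(0))
ramp_mw = magnitude_warp(ramp, rng=np.random.default_rng(0))
```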
Another class of data augmentation methods mixes the series. For example, we take two randomly drawn series and average them using DTW Barycenter Averaging (DBA) [5]. (DTW, dynamic time warping, is an algorithm that calculates the distance between two sequences by matching the data points of one series to those of the other [5, 17].) To augment a dataset, we can choose from a list of strategies [18, 19]:
- Average All: We average all series using different sets of weights to create new synthetic series.
- Average Selected: We average series selected by some strategy. For example, Forestier et al. proposed choosing an initial series and combining it with its nearest neighbors [19].
- Average Selected with Distance: The same as Average Selected, except that neighbors that are far from the initial series are down-weighted [19].
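To make the averaging step concrete, here is a minimal sketch of DTW alignment and one weighted DBA update for univariate series. The function names, the squared-difference cost, and the example weights are assumptions made for illustration; how the weights are derived from neighbor distances is left out.

```python
import numpy as np

def dtw_path(a, b):
    """Return the accumulated squared-difference DTW cost and the alignment path."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from the end to recover which points were matched.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: cost[s])
        path.append((i - 1, j - 1))
    return cost[n, m], path[::-1]

def dba_update(center, series, weights):
    """One DBA iteration: weighted mean of all points aligned to each center point."""
    sums = np.zeros(len(center))
    wsum = np.zeros(len(center))
    for s, w in zip(series, weights):
        _, path = dtw_path(center, s)
        for ci, si in path:
            sums[ci] += w * s[si]
            wsum[ci] += w
    return sums / wsum

t = np.linspace(0, 2 * np.pi, 30)
series = [np.sin(t + shift) for shift in (0.0, 0.2, 0.4)]
weights = np.array([0.6, 0.3, 0.1])   # hypothetical weights, larger for closer series
center = series[0].copy()             # start from the selected initial series
for _ in range(5):
    center = dba_update(center, series, weights)
```

Here `center` is a new synthetic series. Drawing different weight vectors yields different synthetic series from the same set of originals.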
Some other similar methods are:
- Equalized Mixture Data Augmentation (EMDA) calculates the weighted average of spectrograms with the same class label [10].
- Stochastic Feature Mapping (SFM) is a data augmentation method for audio data [11].
Data Generating Process¶
Time series data can also be augmented using an assumed data generating process (DGP). Some methods, such as GRATIS [6], use simple generic models such as AR/MAR. Others, such as Gaussian Trees [20], use more complicated hidden structures based on graphs, which can approximate more complicated data generating processes. These methods do not necessarily reflect the actual data generating process; instead, the data is generated using parsimonious phenomenological models. Some other methods are tuned more toward detailed mechanisms. There are also methods that use generative deep neural networks such as GANs.
Dynamic Factor Model (DFM)¶
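For example, suppose we have a series \(X(t)\) which depends on a latent variable \(f(t)\) [7]. In a linear form, with \(\mathbf{A}\) and \(\mathbf{B}\) denoting coefficient matrices,

\[
X(t) = \mathbf{A} f(t) + \eta(t),
\]

where \(f(t)\) is determined by a differential equation

\[
\frac{\mathrm{d} f(t)}{\mathrm{d} t} = \mathbf{B} f(t) + \xi(t).
\]

In the above equations, \(\eta(t)\) and \(\xi(t)\) are the irreducible noise.
The above two equations can be combined into one first-order differential equation.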
Once the model is fit, it can be used to generate new data points. However, we first have to check whether the data is plausibly generated by such a process.
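As a rough sketch of the generation step, the code below simulates a linear dynamic factor model in which the observation is a linear map of the latent factor plus noise, and the latent factor follows a linear first-order differential equation with noise, discretized with a simple Euler step. The matrices `A` and `B`, the noise scales, and the step size `dt` are placeholders for fitted values and are assumptions of this example.

```python
import numpy as np

def simulate_dfm(A, B, n_steps, dt=0.1, obs_sigma=0.1, state_sigma=0.1,
                 f0=None, rng=None):
    """Generate observations X from a latent factor f.

    Assumed model: X = A f + eta, df/dt = B f + xi (Euler discretization).
    """
    rng = np.random.default_rng() if rng is None else rng
    A = np.atleast_2d(np.asarray(A, dtype=float))   # shape (n_obs, n_factors)
    B = np.atleast_2d(np.asarray(B, dtype=float))   # shape (n_factors, n_factors)
    k = B.shape[0]
    f = np.zeros(k) if f0 is None else np.asarray(f0, dtype=float)
    X = np.empty((n_steps, A.shape[0]))
    for t in range(n_steps):
        # Observation equation with measurement noise eta.
        X[t] = A @ f + obs_sigma * rng.normal(size=A.shape[0])
        # Euler step of the latent dynamics with process noise xi.
        f = f + dt * (B @ f) + np.sqrt(dt) * state_sigma * rng.normal(size=k)
    return X

A = [[1.0], [0.5], [-0.3]]      # hypothetical loadings: three series, one factor
B = [[-0.5]]                    # hypothetical mean-reverting factor dynamics
X_new = simulate_dfm(A, B, n_steps=50, f0=[1.0], rng=np.random.default_rng(0))
```

Each call with a different random generator produces a new multivariate series that shares the factor structure encoded in `A` and `B`.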
Applying the Synthetic Data to Model Training¶
Once we have prepared the synthetic dataset, there are two strategies to include it in our model training [18].
| Strategy | Data flow |
|---|---|
| Pooled Strategy | Synthetic data + original data -> model |
| Transfer Strategy | Synthetic data -> pre-trained model; pre-trained model + original data -> model |
The pooled strategy takes the synthetic data and the original data and feeds them together into the training pipeline. The transfer strategy uses the synthetic data to pre-train the model, then uses transfer learning methods (e.g., freezing the weights of some layers) to train the model on the original data.
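The difference between the two strategies can be illustrated with a toy linear model trained by gradient descent. Everything here is an assumption made for illustration: the model, the jittered copies standing in for synthetic data, and the choice to freeze a single weight as a stand-in for freezing layers in a deep network.

```python
import numpy as np

def fit_linear(X, y, w=None, frozen=None, lr=0.1, steps=500):
    """Gradient descent on mean squared error; `frozen` marks weights kept fixed."""
    n, d = X.shape
    w = np.zeros(d) if w is None else np.array(w, dtype=float)
    trainable = np.ones(d) if frozen is None else 1.0 - np.asarray(frozen, dtype=float)
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / n
        w -= lr * grad * trainable      # frozen weights receive no update
    return w

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
X_orig = rng.normal(size=(20, 3))
y_orig = X_orig @ true_w + 0.1 * rng.normal(size=20)

# Stand-in for an augmented dataset: jittered copies of the original inputs.
X_syn = np.tile(X_orig, (5, 1)) + 0.05 * rng.normal(size=(100, 3))
y_syn = np.tile(y_orig, 5)

# Pooled strategy: train once on synthetic and original data together.
w_pooled = fit_linear(np.vstack([X_orig, X_syn]), np.concatenate([y_orig, y_syn]))

# Transfer strategy: pre-train on synthetic data, then fine-tune on the
# original data while keeping part of the model (here the first weight) frozen.
w_pre = fit_linear(X_syn, y_syn)
w_transfer = fit_linear(X_orig, y_orig, w=w_pre, frozen=[1, 0, 0])
```

In a deep learning framework the same pattern applies: the pooled strategy concatenates the datasets before training, while the transfer strategy runs two training stages and disables gradient updates for the frozen layers in the second stage.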
Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. Journal of Big Data 2019; 6: 1–48.
Petitjean F, Ketterlin A, Gançarski P. A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognition 2011; 44: 678–693.
Stock JH, Watson MW. Chapter 8 - Dynamic factor models, factor-augmented vector autoregressions, and structural vector autoregressions in macroeconomics. In: Taylor JB, Uhlig H (eds). Handbook of Macroeconomics. Elsevier, 2016, pp 415–525.
Yoon J, Jarrett D, Schaar M van der. Time-series generative adversarial networks. In: Wallach H, Larochelle H, Beygelzimer A, d'Alché-Buc F, Fox E, Garnett R (eds). Advances in Neural Information Processing Systems. Curran Associates, Inc., 2019. https://papers.nips.cc/paper/2019/hash/c9efe5f26cd17ba6216bbe2a7d26d490-Abstract.html
Cui X, Goel V, Kingsbury B. Data augmentation for deep neural network acoustic modeling. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2014, pp 5582–5586.
Le Guennec A, Malinowski S, Tavenard R. Data augmentation for time series classification using convolutional neural networks. In: ECML/PKDD Workshop on Advanced Analytics and Learning on Temporal Data. 2016. https://halshs.archives-ouvertes.fr/halshs-01357973/document
Um TT, Pfister FMJ, Pichler D, Endo S, Lang M, Hirche S et al. Data augmentation of wearable sensor data for Parkinson’s disease monitoring using convolutional neural networks. 2017. http://arxiv.org/abs/1706.00527
Bergmeir C, Hyndman RJ, Benítez JM. Bagging exponential smoothing methods using STL decomposition and Box–Cox transformation. International Journal of Forecasting 2016; 32: 303–312.
Forestier G, Petitjean F, Dau HA, Webb GI, Keogh E. Generating synthetic time series to augment sparse datasets. In: 2017 IEEE International Conference on Data Mining (ICDM). 2017, pp 865–870.
Cao H, Tan VYF, Pang JZF. A parsimonious mixture of Gaussian trees model for oversampling in imbalanced and multimodal time-series classification. IEEE Transactions on Neural Networks and Learning Systems 2014; 25: 2226–2239.