
Data Augmentation for Time Series

In deep learning, our dataset should help the optimization mechanism locate a good spot in the parameter space. However, real-world data is not necessarily diverse enough to cover the required situations with enough records. For example, some datasets have extremely imbalanced class labels, which leads to poor performance in classification tasks [1]. Another problem with a limited dataset is that the trained model may not generalize well [2,3].

We will cover two topics in this section: augmenting the dataset, and applying the augmented data to model training.

Augmenting the Dataset

There are many different ways of augmenting time series data [2,4]. We categorize the methods into the following groups:

  • Random transformations, e.g., jittering;
  • Pattern mixing, e.g., DBA [5];
  • Generative models, e.g.,
    • phenomenological generative models such as AR [6],
    • first-principle models such as economic models [7],
    • deep generative models such as TimeGAN or TS GAN [8,9].

We treat the first two groups, random transformations and pattern mixing, as basic methods.

Basic Methods

In the following table, we group some of the data augmentation methods along two dimensions: the category of the method, and the domain where the method is applied.

| Category | Projected Domain | Time Scale | Magnitude |
| --- | --- | --- | --- |
| Random Transformation | Frequency Masking, Frequency Warping, Fourier Transform, STFT | Permutation, Slicing, Time Warping, Time Masking, Cropping | Jittering, Flipping, Scaling, Magnitude Warping |
| Pattern Mixing | EMDA [10], SFM [11] | Guided Warping [12] | DFM [7], Interpolation, DBA [5] |

For completeness, we explain some of these methods in more detail below.

Perturbation in Fourier Domain

In the Fourier domain, given the amplitude \(A_f\) and phase \(\phi_f\) at each frequency \(f\), we can perform [13]

  • magnitude replacement using a Gaussian distribution, and
  • phase shift by adding Gaussian noise.

We perform such perturbations at a set of chosen frequencies.
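
Below is a minimal sketch of such a perturbation using NumPy. The number of perturbed frequencies and the noise scales (`n_freqs`, `amp_std`, `phase_std`) are illustrative parameters, not values prescribed by [13]:

```python
import numpy as np

def perturb_fourier(x, n_freqs=5, amp_std=0.1, phase_std=0.1, seed=None):
    """Perturb a 1-D series at randomly chosen frequencies in the Fourier domain."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(x)
    amp, phase = np.abs(spectrum), np.angle(spectrum)

    # Choose a few frequencies to perturb (excluding the DC component).
    idx = rng.choice(np.arange(1, len(spectrum)), size=n_freqs, replace=False)

    # Magnitude replacement: redraw amplitudes from a Gaussian
    # centered on the original amplitude.
    amp[idx] = np.abs(rng.normal(amp[idx], amp_std * amp[idx]))

    # Phase shift: add Gaussian noise to the phases.
    phase[idx] += rng.normal(0.0, phase_std, size=n_freqs)

    return np.fft.irfft(amp * np.exp(1j * phase), n=len(x))

# Example: augment a noisy sine wave.
t = np.linspace(0, 10, 500)
x = np.sin(2 * np.pi * t) + 0.1 * np.random.default_rng(0).normal(size=500)
x_aug = perturb_fourier(x, seed=42)
```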

Slicing, Permutation, and Bootstrapping

We can slice a series into small segments and perform different operations on the slices to create new series; the first two operations below are sketched in code after the list.

  • Window Slicing (WS): In a classification task, we can take slices from the original series and assign to each slice the class label of the original series [14]. The slices can also be interpolated to match the length of the original series [2].
  • Permutation: We take the slices and permute them to form a new series [15].
  • Moving Block Bootstrapping (MBB): First, we remove the trend and seasonality. Then we draw blocks of fixed length from the residual of the series until the desired length is met. Finally, we recombine the bootstrapped residual with the trend and seasonality to form a new series [16].
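
A minimal sketch of the first two operations on univariate NumPy arrays; the window ratio and segment count are illustrative:

```python
import numpy as np

def window_slice(x, ratio=0.9, seed=None):
    """Window Slicing: crop a random window, then stretch it back to the
    original length by linear interpolation."""
    rng = np.random.default_rng(seed)
    n, win = len(x), int(len(x) * ratio)
    start = rng.integers(0, n - win + 1)
    sliced = x[start:start + win]
    return np.interp(np.linspace(0, win - 1, n), np.arange(win), sliced)

def permute(x, n_segments=4, seed=None):
    """Permutation: split the series into segments and shuffle them."""
    rng = np.random.default_rng(seed)
    segments = np.array_split(x, n_segments)
    order = rng.permutation(n_segments)
    return np.concatenate([segments[i] for i in order])
```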

Warping

Both the time scale and the magnitude can be warped, as in the sketches after this list. For example,

  • Time Warping: We distort time intervals by taking a range of data points and upsampling or downsampling it [4].
  • Magnitude Warping: We rescale the magnitude of the series, e.g., by multiplying it with a smooth curve that varies around 1.
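
Both warps can be sketched with smooth random curves built from cubic splines, in the spirit of [15]; the knot count and noise scale below are illustrative:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def _random_curve(n, n_knots=4, std=0.2, seed=None):
    """A smooth random curve fluctuating around 1: a cubic spline through
    Gaussian-perturbed knots."""
    rng = np.random.default_rng(seed)
    knot_pos = np.linspace(0, n - 1, n_knots + 2)  # include the endpoints
    knot_val = rng.normal(1.0, std, size=n_knots + 2)
    return CubicSpline(knot_pos, knot_val)(np.arange(n))

def magnitude_warp(x, seed=None):
    """Magnitude Warping: rescale the series by a smooth random curve."""
    return x * _random_curve(len(x), seed=seed)

def time_warp(x, seed=None):
    """Time Warping: distort the time axis with a smooth, monotone random
    time map, then resample to the original length."""
    n = len(x)
    speed = np.clip(_random_curve(n, seed=seed), 0.1, None)  # keep it positive
    t_warped = np.cumsum(speed)
    t_warped = (t_warped - t_warped[0]) / (t_warped[-1] - t_warped[0]) * (n - 1)
    return np.interp(np.arange(n), t_warped, x)
```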

Series Mixing

Another class of data augmentation methods mixes series. For example, we can take two randomly drawn series and average them using DTW Barycenter Averaging (DBA) [5]. (DTW, dynamic time warping, is an algorithm that computes the distance between two sequences by matching the data points of one series to those of the other [5,17].) To augment a dataset, we can choose from a list of strategies [18,19], the first of which is sketched in code after the list:

  • Average All: average all series, using different sets of weights to create new synthetic series.
  • Average Selected: average series selected by some strategy. For example, Forestier et al. proposed choosing an initial series and combining it with its nearest neighbors [19].
  • Average Selected with Distance: the same as Average Selected, but neighbors far from the initial series are down-weighted [19].
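
As an illustration, here is a minimal sketch of the Average All strategy, assuming tslearn is available; it uses tslearn's `dtw_barycenter_averaging`, which accepts per-series weights:

```python
import numpy as np
from tslearn.barycenters import dtw_barycenter_averaging

def average_all(X, seed=None):
    """Average All: draw a random convex weight vector and compute the
    weighted DBA barycenter of all series in X (shape: n_series x length)."""
    rng = np.random.default_rng(seed)
    weights = rng.dirichlet(np.ones(len(X)))
    return dtw_barycenter_averaging(X, weights=weights)

# Example: three noisy sine waves of length 100.
X = np.sin(np.linspace(0, 6, 100)) + 0.1 * np.random.default_rng(0).normal(size=(3, 100))
synthetic = average_all(X, seed=1)
```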

Some other similar methods are

  • Equalized Mixture Data Augmentation (EMDA) computes the weighted average of spectrograms of the same class label [10].
  • Stochastic Feature Mapping (SFM) is a data augmentation method for audio data [11].

Data Generating Process

Time series data can also be augmented using an assumed data generating process (DGP). Some methods, such as GRATIS [6], use simple generic models such as AR/MAR. Others, such as Gaussian Trees [20], use more complicated hidden structures based on graphs, which can approximate more complicated data generating processes. These methods do not necessarily reflect the actual data generating process; instead, the data is generated by parsimonious phenomenological models. Other methods are tuned toward detailed mechanisms, such as the dynamic factor model below. There are also methods based on deep generative neural networks, such as GANs [8,9].

Dynamic Factor Model (DFM)

For example, suppose we have a series \(X(t)\) that depends on a latent variable \(f(t)\) [7],

\[ X(t) = \mathbf A f(t) + \eta(t), \]

where \(f(t)\) is determined by a differential equation

\[ \frac{d f(t)}{dt} = \mathbf B f(t) + \xi(t). \]

In the above equations, \(\eta(t)\) and \(\xi(t)\) are irreducible noise terms.

The above two equations can be combined into one first-order differential equation.

Once the model is fitted, it can be used to generate new data points. However, we first have to check whether the real data is plausibly generated by such a process.
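
As a sketch, the model can be simulated with an Euler discretization. The matrices \(\mathbf A\), \(\mathbf B\), and the noise scales below are illustrative; in practice they would be estimated from the observed data first:

```python
import numpy as np

def simulate_dfm(A, B, n_steps, dt=0.01, eta_std=0.1, xi_std=0.1, seed=None):
    """Generate synthetic series from X = A f + eta, df/dt = B f + xi."""
    rng = np.random.default_rng(seed)
    k, m = B.shape[0], A.shape[0]   # latent and observed dimensions
    f = np.zeros(k)
    X = np.empty((n_steps, m))
    for t in range(n_steps):
        xi = rng.normal(0.0, xi_std, size=k)
        f = f + dt * (B @ f + xi)   # Euler step for the latent factors
        eta = rng.normal(0.0, eta_std, size=m)
        X[t] = A @ f + eta          # observation equation
    return X

# Example: 2 latent factors driving 3 observed series.
A = np.array([[1.0, 0.5], [0.3, 1.0], [0.8, -0.2]])
B = np.array([[-0.5, 0.2], [-0.2, -0.5]])  # stable latent dynamics
X_synth = simulate_dfm(A, B, n_steps=1000, seed=0)
```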

Applying the Synthetic Data to Model Training

Once we have prepared the synthetic dataset, there are two strategies for including it in model training [18].

| Strategy | Description |
| --- | --- |
| Pooled Strategy | synthetic data + original data -> model |
| Transfer Strategy | synthetic data -> pre-trained model; pre-trained model + original data -> model |

The pooled strategy feeds the synthetic data and the original data together into the training pipeline. The transfer strategy uses the synthetic data to pre-train the model, then uses transfer learning methods (e.g., freezing the weights of some layers) to train the model on the original data.
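
A toy sketch of both strategies in PyTorch; the model, data, and training loop are placeholders rather than the setup of [18]:

```python
import torch
from torch import nn

def train(model, X, y, epochs=20, lr=1e-3):
    """Minimal training loop (MSE regression, for illustration only)."""
    opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()

def make_model():
    return nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

# Toy tensors standing in for the original and synthetic datasets.
X_orig, y_orig = torch.randn(100, 64), torch.randn(100, 1)
X_syn, y_syn = torch.randn(400, 64), torch.randn(400, 1)

# Pooled strategy: train once on synthetic + original data.
pooled = make_model()
train(pooled, torch.cat([X_syn, X_orig]), torch.cat([y_syn, y_orig]))

# Transfer strategy: pre-train on synthetic data, freeze the first
# layer, then fine-tune on the original data.
transfer = make_model()
train(transfer, X_syn, y_syn)
for p in transfer[0].parameters():
    p.requires_grad = False   # freeze weights of the first layer
train(transfer, X_orig, y_orig)
```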


  1. Hasibi R, Shokri M, Dehghan M. Augmentation scheme for dealing with imbalanced network traffic classification using deep learning. 2019. http://arxiv.org/abs/1901.00204

  2. Iwana BK, Uchida S. An empirical survey of data augmentation for time series classification with neural networks. 2020. http://arxiv.org/abs/2007.15951

  3. Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. Journal of Big Data 2019; 6: 1–48.

  4. Wen Q, Sun L, Yang F, Song X, Gao J, Wang X et al. Time series data augmentation for deep learning: A survey. 2020. http://arxiv.org/abs/2002.12478

  5. Petitjean F, Ketterlin A, Gançarski P. A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognition 2011; 44: 678–693.

  6. Kang Y, Hyndman RJ, Li F. GRATIS: GeneRAting TIme series with diverse and controllable characteristics. 2019. http://arxiv.org/abs/1903.02787

  7. Stock JH, Watson MW. Chapter 8 - Dynamic factor models, factor-augmented vector autoregressions, and structural vector autoregressions in macroeconomics. In: Taylor JB, Uhlig H (eds). Handbook of Macroeconomics. Elsevier, 2016, pp 415–525.

  8. Yoon J, Jarrett D, van der Schaar M. Time-series generative adversarial networks. In: Wallach H, Larochelle H, Beygelzimer A, d'Alché-Buc F, Fox E, Garnett R (eds). Advances in Neural Information Processing Systems. Curran Associates, Inc., 2019. https://papers.nips.cc/paper/2019/hash/c9efe5f26cd17ba6216bbe2a7d26d490-Abstract.html

  9. Brophy E, Wang Z, She Q, Ward T. Generative adversarial networks in time series: A survey and taxonomy. 2021. http://arxiv.org/abs/2107.11098

  10. Takahashi N, Gygli M, Van Gool L. AENet: Learning deep audio features for video analysis. 2017. http://arxiv.org/abs/1701.00599

  11. Cui X, Goel V, Kingsbury B. Data augmentation for deep neural network acoustic modeling. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2014, pp 5582–5586.

  12. Iwana BK, Uchida S. Time series data augmentation for neural networks by time warping with a discriminative teacher. 2020. http://arxiv.org/abs/2004.08780

  13. Gao J, Song X, Wen Q, Wang P, Sun L, Xu H. RobustTAD: Robust time series anomaly detection via decomposition and convolutional neural networks. 2020. http://arxiv.org/abs/2002.09545

  14. Le Guennec A, Malinowski S, Tavenard R. Data augmentation for time series classification using convolutional neural networks. In: ECML/PKDD Workshop on Advanced Analytics and Learning on Temporal Data. 2016. https://halshs.archives-ouvertes.fr/halshs-01357973/document

  15. Um TT, Pfister FMJ, Pichler D, Endo S, Lang M, Hirche S et al. Data augmentation of wearable sensor data for Parkinson's disease monitoring using convolutional neural networks. 2017. http://arxiv.org/abs/1706.00527

  16. Bergmeir C, Hyndman RJ, Benítez JM. Bagging exponential smoothing methods using STL decomposition and Box–Cox transformation. International Journal of Forecasting 2016; 32: 303–312.

  17. Hewamalage H, Bergmeir C, Bandara K. Recurrent neural networks for time series forecasting: Current status and future directions. 2019. http://arxiv.org/abs/1909.00590

  18. Bandara K, Hewamalage H, Liu Y-H, Kang Y, Bergmeir C. Improving the accuracy of global forecasting models using time series data augmentation. 2020. http://arxiv.org/abs/2008.02663

  19. Forestier G, Petitjean F, Dau HA, Webb GI, Keogh E. Generating synthetic time series to augment sparse datasets. In: 2017 IEEE International Conference on Data Mining (ICDM). 2017, pp 865–870.

  20. Cao H, Tan VYF, Pang JZF. A parsimonious mixture of Gaussian trees model for oversampling in imbalanced and multimodal time-series classification. IEEE Transactions on Neural Networks and Learning Systems 2014; 25: 2226–2239.


Contributors: LM