Multilayer Perceptron Experiments on Sine Wave Part 2#

In the previous experiment, we showed that an MLP with ReLU activations cannot extrapolate the sine function. Recent research showed that ReLU networks tend to extrapolate linearly and so are not fit for extrapolating periodic functions (Xu et al., 2021).

To induce a periodic extrapolation bias in neural networks, Ziyin et al. (2020) proposed a simple activation function called the “Snake activation” of the form \(x + \frac{1}{a}sin^2(ax)\), where \(a\) can be treated as a constant hyperparameter or a learned parameter.
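
For reference, here is a minimal sketch of the Snake activation as a PyTorch module (the class name, the `trainable` flag, and the buffer handling are our own choices, not the paper's reference implementation):

```python
import torch
import torch.nn as nn

class Snake(nn.Module):
    """Snake activation: x + (1/a) * sin^2(a * x)."""

    def __init__(self, a=15.0, trainable=False):
        super().__init__()
        a = torch.tensor(float(a))
        if trainable:
            self.a = nn.Parameter(a)       # a is learned alongside the weights
        else:
            self.register_buffer("a", a)   # a is kept as a fixed hyperparameter

    def forward(self, x):
        return x + torch.sin(self.a * x) ** 2 / self.a
```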

We experimented with the Snake activation to see if it can fit and extrapolate a simple sine function. We also examined how the Snake activation compares against the following alternative, but similar-looking, activation functions (each variant is written out in code after the list):

  1. \(sin(ax)\)

  2. \(sin^2(ax)\)

  3. \(x + \frac{1}{a}sin(ax)\)

  4. \(x + sin^2(ax)\)

  5. \(x + \frac{1}{a}sin^2(x)\)

  6. \(x\)
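
For concreteness, the variants above (together with Snake itself) can be written as simple element-wise functions; a minimal sketch assuming PyTorch and a fixed \(a\):

```python
import torch

a = 15.0  # fixed value used for this comparison

# The activation variants compared in this experiment, as element-wise functions.
activations = {
    "sin(ax)":        lambda x: torch.sin(a * x),
    "sin^2(ax)":      lambda x: torch.sin(a * x) ** 2,
    "x + sin(ax)/a":  lambda x: x + torch.sin(a * x) / a,
    "x + sin^2(ax)":  lambda x: x + torch.sin(a * x) ** 2,
    "x + sin^2(x)/a": lambda x: x + torch.sin(x) ** 2 / a,
    "x (identity)":   lambda x: x,
    "x + sin^2(ax)/a (Snake)": lambda x: x + torch.sin(a * x) ** 2 / a,
}
```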

Extrapolation Experiment#

We generated synthetic data using the sin(x) function, which we aim to learn. The blue points form the training set, which we use to train our model. The orange points form the test set, which we use to check whether our model generalizes and has actually learned the sine function.

Our inputs are the x-values (horizontal axis) and our targets are y = sin(x), which we will train our model to predict given x.
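
A minimal sketch of how such a split can be generated (the exact ranges and number of points are illustrative, not necessarily the ones used in the plots):

```python
import numpy as np

# Illustrative split: train on a short middle segment of the sine wave and
# test on the surrounding regions.
x_train = np.linspace(-2 * np.pi, 2 * np.pi, 200)
x_test = np.concatenate([
    np.linspace(-6 * np.pi, -2 * np.pi, 200),
    np.linspace(2 * np.pi, 6 * np.pi, 200),
])
y_train, y_test = np.sin(x_train), np.sin(x_test)
```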

../_images/13ac56ab1b209b436714651b9c5f28f78542e62dd91996d355b24d662a41d8df.png

We show an animation below of the neural network's parameters and of how its fit to the data evolves over training (epochs).

In these experiments, we used the following settings (a minimal training sketch follows the list):

  • Xavier Normal initialization

  • Two hidden layers

  • 256 neurons per hidden layer

  • learning rate = 0.0001

  • a = 15 for the activation functions
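
A minimal sketch of this setup, reusing the `Snake` module and the `x_train`/`y_train` arrays from the sketches above (the Adam optimizer, the epoch count, and the per-epoch snapshots used for the animations are our own choices, not part of the original spec):

```python
import torch
import torch.nn as nn

def make_mlp(activation):
    """Two hidden layers of 256 neurons each, Xavier Normal initialization."""
    model = nn.Sequential(
        nn.Linear(1, 256), activation,
        nn.Linear(256, 256), activation,
        nn.Linear(256, 1),
    )
    for layer in model:
        if isinstance(layer, nn.Linear):
            nn.init.xavier_normal_(layer.weight)
            nn.init.zeros_(layer.bias)
    return model

model = make_mlp(Snake(a=15.0))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # optimizer choice is ours
loss_fn = nn.MSELoss()

xb = torch.tensor(x_train, dtype=torch.float32).unsqueeze(-1)
yb = torch.tensor(y_train, dtype=torch.float32).unsqueeze(-1)

snapshots = []                      # per-epoch predictions, for the animations
for epoch in range(5000):           # epoch count is illustrative
    optimizer.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        snapshots.append(model(xb).squeeze(-1).numpy())
```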

\(sin(ax)\)#

\(sin^2(ax)\)#

\(x + \frac{1}{a}sin(ax)\)#

\(x + sin^2(ax)\)#

\(x + \frac{1}{a}sin^2(x)\)#

\(x\)#

\(x + \frac{1}{a}sin^2(ax)\) (Snake)#

Interpolation Experiment#

Now, what if we make this an interpolation problem instead of an extrapolation problem? In other words, what if we reverse the train and test sets? Will the model be able to infer the sine wave in the held-out gap between the training segments? We show below the sine wave colored by train (blue) and test (orange).

Note that we use the term “interpolate” loosely here. This is technically still an extrapolation problem since the distribution of the test set is not within the support of the distribution of the training set.
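
In code, this amounts to swapping the roles of the two splits from the data sketch above:

```python
# Interpolation variant: train on the outer segments and hold out the middle
# segment that was previously used for training.
x_train_interp, y_train_interp = x_test, y_test    # outer segments -> training set
x_test_interp, y_test_interp = x_train, y_train    # middle segment -> held-out set
```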

../_images/0fee6d25726be2c990a46fdbf466112b0d2915536aaab9c87e523aea6d7f69fd.png

\(sin(ax)\)#

\(sin^2(ax)\)#

\(x + \frac{1}{a}sin(ax)\)#

\(x + sin^2(ax)\)#

\(x + \frac{1}{a}sin^2(x)\)#

\(x\)#

\(x + \frac{1}{a}sin^2(ax)\) (Snake)#

How does the \(a\) parameter of the Snake activation affect the model predictions?#
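
One way to probe this is to train an otherwise identical Snake MLP for several fixed values of \(a\) and compare their predictions. A minimal sketch, reusing `make_mlp`, `Snake`, `loss_fn`, `xb`, `yb`, `np`, `x_train`, and `x_test` from the sketches above (the specific \(a\) values and epoch count here are illustrative):

```python
predictions_by_a = {}
for a_value in [0.5, 1.0, 5.0, 15.0, 30.0]:        # illustrative values of a
    model = make_mlp(Snake(a=a_value))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for epoch in range(5000):                      # illustrative epoch count
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
    # Predict over the full range (train + test) for plotting.
    with torch.no_grad():
        x_all = torch.tensor(
            np.concatenate([x_train, x_test]), dtype=torch.float32
        ).unsqueeze(-1)
        predictions_by_a[a_value] = model(x_all).squeeze(-1).numpy()
```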

../_images/11a7a30c55de4b40edcd1672776b09415c376b1333a524a94d0adb64d579c72c.png

Take-aways#

We performed an ablation study of how the choice of activation function affects a model's ability to extrapolate a sine function from a short segment of data.

It looks like the Snake activation does perform best compared to the alternative, but similar-looking, activation functions. The activation with the closest performance to Snake is \(x + \frac{1}{a}sin(ax)\), which doesn't square the sine.

Snake has more trouble “interpolating” (in the loose sense above) a missing segment of the sine function than it has extrapolating outward. This could be a limitation of Snake. It is not clear why it can extrapolate outward but not “interpolate” (or, more precisely, extrapolate between training data).

Interestingly, the models that used \(x + sin^2(ax)\) and \(x + \frac{1}{a}sin^2(x)\) “collapsed” to just predicting a near-horizontal line.

This tells us that the precise formulation of Snake is important and deviations from this formula can lead to collapse or worse extrapolation.

We also found that selecting the \(a\) parameter for the Snake activation matters. If it is too small, then the model does not learn to extrapolate well. For this task, Snake does not seem to be sensitive to very high values of \(a\).