3. Hurst Exponent Estimation
The Hurst exponent (H) is one of the statistical measures used to classify time series. A random series is recognised by H=0.5, while H>0.5 indicates a trend-reinforcing series. The value H=0 denotes that the time series is a white noise whose autocorrelation function (ACF) decreases rapidly with lag.
When the value of H lies within the range 0-0.5, the probability of opposite trends in two consecutive data intervals is very high, so the persistency of the signal is negative (anti-persistent). The upcoming values therefore have a tendency to return to a long-term mean, and hence the series is slower than standard Brownian motion. The stronger this tendency in the time series, the closer the value of H is to 0. The signal contains short-range dependent (SRD) memory and exhibits fractal behaviour. The ACF decreases exponentially with lag, though more slowly than that of white noise.
H=0.5 denotes that the time series shows standard Brownian motion with the Markov chain feature. The ACF decay is slow compared to that of an anti-persistent time series. Arbitrary fluctuations are seen in the signal, and irregular behaviour appears in the differences between the data points of the time series.
When the value of H lies within the range 0.5-1.0, the trends in successive data intervals tend to agree, so the persistency of the signal is positive. The stronger the trend, the closer the Hurst value is to 1. The signal shows long-range dependence (LRD) and non-periodic cycles. Unlike an SRD series, an LRD series exhibits similar statistical properties at different scales (lower or higher). The ACF decays hyperbolically, more slowly than for standard Brownian motion, and the signal is smooth. When the value of H is equal to 1.0, the time series appears to be perfectly smooth and the ACF settles to a constant level.
Different estimators are available for estimating the Hurst exponent of a signal or data set. In this paper, two Hurst estimation methods have been used: the traditional Rescaled Range (R/S) analysis and the Generalized Hurst Exponent (GHE) estimation method. The Rescaled Range method is a statistical measure of a time series; its aim is to estimate how the variability of the series changes with the length of the time period. GHE provides the best finite-sample behaviour among all the methods in respect of bias and has the lowest variance. GHE is suitable for any data series or signal irrespective of the size of its distribution tail.
3.1. R/S Analysis:
R/S analysis (Rescaled Range analysis) was introduced by Harold Edwin Hurst in 1951. The method is straightforward to implement in a program and provides a direct estimate of the Hurst exponent, which is a valuable indicator of the state of randomness of a time series.
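Given a time series with n elements $X_1, X_2, \ldots, X_n$, the R/S statistic is defined as:
$$\frac{R(n)}{S(n)} = \frac{\max_{1 \le k \le n} \sum_{j=1}^{k} (X_j - \bar{X}) - \min_{1 \le k \le n} \sum_{j=1}^{k} (X_j - \bar{X})}{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2}}$$
where $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ is the arithmetic mean and the denominator $S(n)$ is the standard deviation from the mean.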
With this R/S value, Hurst found a generalization of a result in the following formula:
$$E\left[\frac{R(n)}{S(n)}\right] = C\, n^{H} \quad \text{as } n \to \infty$$
where H is the Hurst exponent. It follows that an estimate of the Hurst exponent can be obtained from an R/S analysis.
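The scaling law above suggests a simple estimation recipe, sketched below under stated assumptions: the series is cut into non-overlapping windows of doubling size, R/S is averaged over the windows of each size, and H is taken as the least-squares slope of log(R/S) against log(window size). The function names (`ols_slope`, `rescaled_range`, `hurst_rs`), the doubling window sizes and the minimum window of 8 are illustrative choices, not the procedure used in this paper.

```python
import math
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def rescaled_range(window):
    """R/S of one window; returns None when the window has zero spread."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / n)
    if std == 0.0:
        return None
    # The cumulative deviation ends at 0 (k = n), so 0 is a valid
    # starting value for both the running maximum and minimum.
    cum = hi = lo = 0.0
    for v in window:
        cum += v - mean
        hi = max(hi, cum)
        lo = min(lo, cum)
    return (hi - lo) / std


def hurst_rs(series, min_window=8):
    """Estimate H as the slope of log(mean R/S) versus log(window size)."""
    n = len(series)
    log_size, log_rs = [], []
    size = min_window  # assumed smallest window; sizes then double
    while size <= n // 2:
        values = [rescaled_range(series[i:i + size])
                  for i in range(0, n - size + 1, size)]
        values = [v for v in values if v is not None]
        if values:
            log_size.append(math.log(size))
            log_rs.append(math.log(sum(values) / len(values)))
        size *= 2
    if len(log_size) < 2:
        raise ValueError("series too short for an R/S regression")
    return ols_slope(log_size, log_rs)


if __name__ == "__main__":
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
    print("R/S estimate for synthetic Gaussian noise:", hurst_rs(noise))
```

For short series the plain R/S estimate is known to be biased, so a sketch like this is best read as a rough guide rather than a calibrated estimator.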
3.2. Generalized Hurst Exponent (GHE) method:
This method, introduced by (Hurst, Black, & Simaika, 1965), defines a function $K_p(\tau)$ as
$$K_p(\tau) = \frac{\langle |X(t+\tau) - X(t)|^p \rangle}{\langle |X(t)|^p \rangle}$$
where $X(t)$ is the time series, $p$ is the order of the moment of the distribution and $\tau$ is the lag, which ranges between $1$ and $\tau_{max}$.
The Generalised Hurst Exponent (GHE), $H(p)$, is related to $K_p(\tau)$ through a power law:
$$K_p(\tau) \propto \tau^{pH(p)}$$
Depending upon whether $H(p)$ is independent of $p$ or not, a time series can be judged as uni-fractal or multi-fractal (Matteo, 2007) respectively. The GHE $H(p)$ yields the value of the original Hurst exponent for $p = 1$, i.e. $H(1) = H$.
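The power law for the GHE can be turned into an estimator in a few lines. The following is a minimal sketch, assuming the moment ratio is computed for every integer lag from 1 to a hypothetical `max_lag` (19 here) and that $H(p)$ is read off a single log-log regression; the function names and the default lag range are illustrative, not taken from this paper.

```python
import math
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def generalized_hurst(series, p=1.0, max_lag=19):
    """Estimate H(p) from the scaling of the p-th order moment ratio."""
    n = len(series)
    if n <= max_lag + 1:
        raise ValueError("series too short for the requested lag range")
    # Denominator of the moment ratio: mean of |X(t)|^p.
    norm = sum(abs(v) ** p for v in series) / n
    if norm == 0.0:
        raise ValueError("series is identically zero")
    log_tau, log_k = [], []
    for tau in range(1, max_lag + 1):
        # Numerator: mean of |X(t + tau) - X(t)|^p over all valid t.
        moment = sum(abs(series[t + tau] - series[t]) ** p
                     for t in range(n - tau)) / (n - tau)
        if moment == 0.0:
            raise ValueError("increments vanish at lag %d" % tau)
        log_tau.append(math.log(tau))
        log_k.append(math.log(moment / norm))
    # The ratio scales as tau^(p * H(p)), so the slope is p * H(p).
    return ols_slope(log_tau, log_k) / p


if __name__ == "__main__":
    random.seed(1)
    walk, total = [], 0.0
    for _ in range(4096):
        total += random.gauss(0.0, 1.0)
        walk.append(total)
    print("H(1) for a synthetic random walk:", generalized_hurst(walk, p=1.0))
    print("H(2) for a synthetic random walk:", generalized_hurst(walk, p=2.0))
```

Comparing the estimates for several values of `p` gives a quick check of the uni-fractal versus multi-fractal distinction described above: roughly equal values suggest uni-fractal scaling.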
3. Test for Stationarity or Non-Stationarity:
3.1. Kwiatkowski–Phillips–Schmidt–Shin (KPSS) tests:
Kwiatkowski–Phillips–Schmidt–Shin (KPSS) tests (Kwiatkowski, Phillips, Schmidt, & Shin, 1992) test the null hypothesis that an observable time series is stationary or trend stationary against the alternative that it is non-stationary. This test is used as a complement to the standard tests in analyzing time series properties.
The KPSS test is based on linear regression. The time series is broken down into three parts: a deterministic trend ($\beta t$), a random walk ($r_t$), and a stationary error ($\varepsilon_t$), with the regression equation:
$$x_t = r_t + \beta t + \varepsilon_t$$
If the data are stationary, the series will have a fixed intercept, i.e. it will be stationary around a fixed level (W. Wang, 2006). The test uses OLS to estimate the equation, which differs slightly depending on whether level stationarity or trend stationarity is being tested. A simplified version, without the time trend component, is used to test level stationarity.
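The level-stationarity version of the test can be sketched as follows. The residuals are taken around the sample mean, their partial sums are squared and summed, and the result is scaled by a long-run variance estimate. The Bartlett-weighted variance and the lag rule `int(4 * (n / 100) ** 0.25)` are common choices assumed here rather than details given in this paper, and the function name `kpss_level` is illustrative.

```python
import random


# Critical values for the level-stationarity statistic as tabulated in
# Kwiatkowski et al. (1992); the null of stationarity is rejected when
# the statistic exceeds the value for the chosen significance level.
KPSS_LEVEL_CRITICAL = {"10%": 0.347, "5%": 0.463, "1%": 0.739}


def kpss_level(series, lags=None):
    """KPSS statistic for the null hypothesis of level stationarity."""
    n = len(series)
    mean = sum(series) / n
    resid = [v - mean for v in series]  # residuals of the intercept-only fit

    # Sum of squared partial sums of the residuals.
    partial, sum_sq = 0.0, 0.0
    for e in resid:
        partial += e
        sum_sq += partial * partial

    if lags is None:
        lags = int(4 * (n / 100.0) ** 0.25)  # assumed rule of thumb

    # Long-run variance: sample variance plus Bartlett-weighted autocovariances.
    long_run_var = sum(e * e for e in resid) / n
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1.0)
        autocov = sum(resid[t] * resid[t - k] for t in range(k, n)) / n
        long_run_var += 2.0 * weight * autocov
    if long_run_var <= 0.0:
        raise ValueError("long-run variance is not positive")

    return sum_sq / (n * n * long_run_var)


if __name__ == "__main__":
    random.seed(2)
    noise = [random.gauss(0.0, 1.0) for _ in range(500)]
    stat = kpss_level(noise)
    verdict = "reject" if stat > KPSS_LEVEL_CRITICAL["5%"] else "do not reject"
    print("KPSS statistic:", stat, "->", verdict, "stationarity at 5%")
```

Because the null hypothesis here is stationarity, a large statistic counts as evidence of non-stationarity, which is the reverse of unit-root tests and the reason KPSS is used as their complement.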
3.2. Continuous Wavelet Transform (CWT) test:
Real-world data or signals frequently exhibit slowly changing trends or oscillations punctuated with transients. Although the Fourier Transform (FT) is a powerful tool for data analysis, it does not represent abrupt changes efficiently. The FT represents data as a sum of sine waves, which are not localized in time or space. These sine waves oscillate forever; therefore, to accurately analyse signals that have abrupt changes, a new class of functions that are well localized in time and frequency is needed. This brings up the topic of wavelets.
The primary objective of the Continuous Wavelet Transform (CWT) is to obtain the signal's energy distribution in the time and frequency domains simultaneously. The continuous wavelet transform is a generalization of the Short-Time Fourier Transform (STFT) that allows for the analysis of non-stationary signals at multiple scales. Key features of the CWT are time-frequency analysis and filtering of time-localized frequency components. The mathematical equation for the CWT is given below (Shoeb & Clifford, 2006):
$$C(a, \tau) = \frac{1}{\sqrt{a}} \int \psi\left(\frac{t - \tau}{a}\right) x(t)\, dt$$
where $C(a, \tau)$ is a function of the parameters $a$ and $\tau$. The parameter $a$ is the dilation of the wavelet (scale), $\tau$ defines a translation of the wavelet and indicates the time localization, and $\psi(t)$ is the wavelet. The coefficient $\frac{1}{\sqrt{a}}$ is an energy normalization factor (the energy of the wavelet must be the same for different values $a$ of the scale).
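A direct discretisation of the CWT integral can illustrate how scale relates to frequency. The sketch below assumes a Mexican hat (Ricker) mother wavelet and unit sampling interval, and truncates the wavelet beyond five scale widths; none of these choices is stated in this paper, and the function names are illustrative.

```python
import math


def mexican_hat(t):
    """Mexican hat (Ricker) wavelet, up to a constant factor."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def cwt(signal, scales, wavelet=mexican_hat, support=5.0):
    """Return one row of coefficients C(a, tau) per scale a."""
    n = len(signal)
    rows = []
    for a in scales:
        half = int(math.ceil(support * a))  # wavelet is negligible beyond this
        norm = 1.0 / math.sqrt(a)           # energy normalisation factor
        row = []
        for tau in range(n):
            acc = 0.0
            # Riemann sum of psi((t - tau) / a) * x(t) with unit time step.
            for t in range(max(0, tau - half), min(n, tau + half + 1)):
                acc += wavelet((t - tau) / a) * signal[t]
            row.append(norm * acc)
        rows.append(row)
    return rows


def dominant_scale(signal, scales):
    """Scale with the largest coefficient energy, ignoring the edges."""
    rows = cwt(signal, scales)
    n = len(signal)
    lo, hi = n // 4, 3 * n // 4  # central half, to limit edge effects
    energy = [sum(c * c for c in row[lo:hi]) for row in rows]
    return scales[energy.index(max(energy))]


if __name__ == "__main__":
    scales = [1, 2, 4, 8, 16]
    fast = [math.sin(2 * math.pi * t / 8) for t in range(256)]
    slow = [math.sin(2 * math.pi * t / 32) for t in range(256)]
    print("dominant scale, period 8:", dominant_scale(fast, scales))
    print("dominant scale, period 32:", dominant_scale(slow, scales))
```

For a stationary signal the energy stays concentrated at the same scales for every translation, whereas for a non-stationary signal the rows that carry the energy change as the translation moves along the series.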
4. Results & Discussion
The values of the Hurst exponent for the two time series, a) daily dropped calls and b) daily busy hour call initiated, have been calculated using the two methods, R/S and GHE, and are tabulated below in Table 2.
Table 2: Hurst exponent (H) values for daily dropped calls and daily busy hour call initiation
Method    Daily dropped calls    Daily busy hour initiated calls
R/S       0.2707                 0.2405
GHE       0.2461                 0.1565
The Hurst exponents for both series are less than 0.5, and the Hurst exponent for daily busy hour initiated calls is lower than that of the daily dropped calls. These results indicate anti-persistent behaviour in both series, i.e. their future values tend to revert to their long-term mean, with the daily busy hour initiated calls profile having a stronger tendency to return to its mean than the daily dropped calls profile. Since both profiles tend to return to their respective means, there must be some driving forces which bring the series back towards their means when the profiles deviate from them (the mean being the most stable position of any fluctuation). This implies that some negative feedback system must be at work, continuously trying to stabilise the profiles. Moreover, these low values of H signify that both signals have short-range dependent (SRD) memory. This SRD behaviour also shows that both time series are self-similar at short scales.
The CWT-based time-frequency spectra for the two time series are shown in Figure 2 (daily call initiation) and Figure 3 (daily dropped calls).
Figure 2: CWT for daily call initiation
Figure 3: CWT for daily call drop
Figure 3 clearly indicates that the frequency content of the daily dropped calls varies with time, so the daily dropped calls data set is non-stationary in nature. Figure 2 shows that the busy hour initiated calls signal is nearly stationary, as its frequency contents do not change with time. So it can be concluded that the busy hour initiated calls data are stationary. In a non-stationary signal the frequency contents are functions of time, i.e. they are not independent of time. The frequency of an event signifies the number of times it happens per unit time. So, it can be inferred that the number of call drops per unit time is not independent of time but varies with time. In the case of the busy hour call initiation profile there are nearly eight types of frequency contents, as is evident from Figure 2, but all of them remain constant with respect to time. This can be interpreted as the rate of busy hour call initiation not varying with time, and hence proper modelling and forecasting of busy hour call initiation can be made easily.
5. Conclusion
The Hurst exponent is one of the statistical measures used to classify time series, and the attributes of a series can be predicted from the value of H. H=0 denotes a white noise whose autocorrelation function decreases rapidly with lag. A value of H in the range 0–0.5 indicates a time series with long-term switching between high and low values in adjacent pairs: a single high value will probably be followed by a low value, the value after that will tend to be high, and this tendency to switch between high and low values lasts a long time into the future. The persistency of the signal is negative (anti-persistent), and the probability of opposite trends between any two successive data intervals is very high. Future values therefore tend to return to a long-term mean, and hence the series is slower than classical (standard) Brownian motion. The stronger this tendency in the time series, the closer the value of H is to 0.
A value of H=0.5 can indicate a completely uncorrelated series, but in fact it applies to any series for which the autocorrelations at small time lags can be positive or negative while their absolute values decay exponentially quickly to zero. Such a signal fluctuates arbitrarily.
A value of H in the range 0.5–1 indicates a time series with long-term positive autocorrelation, meaning both that a high value in the series will probably be followed by another high value and that the values a long time into the future will also tend to be high. The persistency of the signal is positive, and the probability of related trends between any two successive data intervals is very high. The stronger the trend, the closer the value of H moves towards 1. H=1 denotes an ideally smooth time series whose autocorrelation function does not vary with lag but settles to a constant level.