Using Time Series Models to Predict the Numbers of People Afflicted with COVID-19 in Iraq, Saudi Arabia and the United Arab Emirates

COVID-19 is an infectious disease caused by a newly discovered coronavirus. This virus was unknown before the outbreak in the Chinese city of Wuhan in December 2019. The coronavirus epidemic has posed a major challenge to the world: it has claimed many lives and disrupted the economies of most countries. This has prompted researchers in many disciplines to study the epidemic and search for ways to confront it, and statistical methods are of great importance to all the other sciences that have taken up this effort. In this paper, we use Box-Jenkins ARIMA time series models to predict the numbers of people infected with COVID-19 in Iraq, Saudi Arabia and the United Arab Emirates, and we compare the three countries, based on daily time series of the numbers of infected people for the period from 3/15/2020 to 4/5/2020, following the emergence of the epidemic in those countries.


Introduction
Coronaviruses are a large family of viruses that may cause disease in animals and humans. Several coronaviruses are known to cause respiratory illness in humans, ranging in severity from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). The newly discovered coronavirus causes COVID-19. Box-Jenkins models are an important statistical method: they are used to represent time series data for a specific phenomenon and to predict its future values, provided that the series is stationary and highly correlated. These models have been used in the economic, financial, medical and other sectors, since forecasting and decision-making are important parts of planning in all domains. The objective of this paper is to use Box-Jenkins ARIMA time series models to predict the numbers of people infected with COVID-19 in Iraq, Saudi Arabia and the United Arab Emirates, and to compare the three countries, based on daily time series of the numbers of infected people for the period from 3/15/2020 to 4/5/2020, following the emergence of the epidemic in those countries.

Autoregressive Integrated Moving Average Model (ARIMA)
Box and Jenkins (1976) described the model comprehensively and assembled an approach for understanding and treating stationarity in the data, arriving at the autoregressive integrated moving average model. If the series is non-stationary, it can be converted into a stationary series by taking differences of degree d (d = 1, 2, …). The model is denoted ARIMA(p, d, q) and takes the following form:

$$\phi_p(B)(1 - B)^d Z_t = \theta_q(B)\, a_t$$

where $\phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^p$ is the autoregressive operator, $\theta_q(B) = 1 - \theta_1 B - \dots - \theta_q B^q$ is the moving average operator, $a_t$ is a white-noise error term, d is the degree of differencing applied to the time series, and B is the backshift operator ($B Z_t = Z_{t-1}$). After taking the appropriate differences to convert the non-stationary time series into a stationary one, the previous model can be written as:

$$\phi_p(B)\, W_t = \theta_q(B)\, a_t, \qquad W_t = (1 - B)^d Z_t$$

A time series is stationary if its data fluctuate around a constant mean, that is, its mean and variance do not change over time; a stationary series therefore has a mean and variance that do not depend on time t. A time series is strictly stationary if the joint distribution of the observations $Z_{t_1}, Z_{t_2}, \dots, Z_{t_n}$ is the same as the joint distribution of $Z_{t_1+k}, Z_{t_2+k}, \dots, Z_{t_n+k}$; this means that the distribution depends only on the time intervals between the observations, not on the particular time points at which they are observed.
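The differencing operator $(1 - B)^d$ can be illustrated with a minimal Python sketch (the function name `difference` is our own, not from the paper). Each pass replaces $Z_t$ by $Z_t - Z_{t-1}$, so a series of length N becomes one of length N - d, and a linear trend is removed by a single difference.

```python
def difference(z, d=1):
    """Apply the differencing operator (1 - B)^d to the sequence z.

    Each pass computes W_t = Z_t - Z_{t-1}, so the result has len(z) - d values.
    """
    if d < 0 or d >= len(z):
        raise ValueError("d must satisfy 0 <= d < len(z)")
    w = list(z)
    for _ in range(d):
        w = [w[t] - w[t - 1] for t in range(1, len(w))]
    return w


# A series with a linear trend (Z_t = 5 + 2t) becomes constant after one difference.
trend = [5 + 2 * t for t in range(6)]
print(difference(trend))                           # [2, 2, 2, 2, 2]
# A quadratic trend needs d = 2.
print(difference([t * t for t in range(5)], d=2))  # [2, 2, 2]
```

This mirrors the application below, where a single difference (d = 1) is taken to remove the increasing trend in each series.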

Autocorrelation Function:
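The autocorrelation coefficient indicates the strength of the relationship between values of the same variable at different lags k. It is denoted by $r_k$ and its value lies between -1 and 1, that is, $-1 \le r_k \le 1$, where $r_k$ is computed by the following formula:

$$r_k = \frac{\sum_{t=1}^{N-k} (Z_t - \bar{Z})(Z_{t+k} - \bar{Z})}{\sum_{t=1}^{N} (Z_t - \bar{Z})^2}$$

where $Z_t$ is the value of the time series at time t, $Z_{t+k}$ is the observed value after lag k, $\bar{Z}$ is the mean of the series, and N is the size of the time series.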
The plot of the autocorrelation coefficients $r_k$ against the lag k is called the autocorrelation function (ACF).
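The formula for $r_k$ translates directly into code. The sketch below is a plain-Python version (the function name `sample_acf` is hypothetical); it uses the full-sample sum of squares in the denominator, as in the formula above, which keeps every $r_k$ inside [-1, 1].

```python
def sample_acf(z, max_lag):
    """Sample autocorrelation coefficients r_1, ..., r_max_lag of the series z.

    r_k = sum_{t=1}^{N-k} (Z_t - Zbar)(Z_{t+k} - Zbar) / sum_{t=1}^{N} (Z_t - Zbar)^2
    """
    n = len(z)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must satisfy 0 < max_lag < len(z)")
    mean = sum(z) / n
    dev = [x - mean for x in z]
    denom = sum(x * x for x in dev)
    if denom == 0:
        raise ValueError("r_k is undefined for a constant series")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


# An alternating series is strongly negatively correlated at lag 1.
print(sample_acf([1, -1, 1, -1], 2))  # [-0.75, 0.5]
```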

Partial Autocorrelation Function:
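It is an indicator that measures the relationship between $Z_t$ and $Z_{t+k}$ in the same series, assuming that the intermediate values $Z_{t+1}, \dots, Z_{t+k-1}$ are held fixed. It can be calculated recursively from the autocorrelation coefficients:

$$\phi_{11} = r_1, \qquad \phi_{kk} = \frac{r_k - \sum_{j=1}^{k-1} \phi_{k-1,j}\, r_{k-j}}{1 - \sum_{j=1}^{k-1} \phi_{k-1,j}\, r_j}, \qquad \phi_{kj} = \phi_{k-1,j} - \phi_{kk}\, \phi_{k-1,k-j}, \quad j = 1, \dots, k-1$$

The plot of the partial autocorrelation coefficients $\phi_{kk}$ against the lag k is called the partial autocorrelation function (PACF).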
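A minimal sketch of computing $\phi_{kk}$ from $r_1, \dots, r_K$, assuming the Durbin-Levinson recursion is the intended formula (the function name is our own). For an AR(1) process with $\rho_k = 0.6^k$ the partial autocorrelations cut off after lag 1, which is the kind of pattern used at the identification stage.

```python
def pacf_from_acf(r):
    """Partial autocorrelations phi_11, ..., phi_KK from r = [r_1, ..., r_K],
    computed with the Durbin-Levinson recursion."""
    pacf = []
    phi_prev = []  # holds phi_{k-1, j} for j = 1..k-1
    for k in range(1, len(r) + 1):
        if k == 1:
            phi_kk = r[0]
            phi_curr = [phi_kk]
        else:
            num = r[k - 1] - sum(phi_prev[j - 1] * r[k - j - 1] for j in range(1, k))
            den = 1.0 - sum(phi_prev[j - 1] * r[j - 1] for j in range(1, k))
            phi_kk = num / den
            # Update phi_{k, j} = phi_{k-1, j} - phi_kk * phi_{k-1, k-j}.
            phi_curr = [phi_prev[j - 1] - phi_kk * phi_prev[k - j - 1] for j in range(1, k)]
            phi_curr.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi_curr
    return pacf


# Theoretical AR(1) autocorrelations: the PACF is 0.6 at lag 1 and (up to rounding) 0 afterwards.
print(pacf_from_acf([0.6 ** k for k in range(1, 4)]))
```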

Box-Jenkins Model building:
a-Identification: the first step in time series analysis is to plot the series to determine whether it is stationary or non-stationary, for example whether it fluctuates around several different means, or shows seasonal effects or outliers.
For this purpose we compute the autocorrelation and partial autocorrelation coefficients. To represent the time series data by an ARIMA(p, d, q) model, we must identify the orders p, d and q, which can be done using several methods, including the following:
i. Akaike Information Criterion (AIC): a criterion used to identify the order of the ARMA(p, q) model, given by
$$\mathrm{AIC}(p, q) = N \ln \hat{\sigma}_a^2 + 2(p + q)$$
ii. Schwarz Criterion (SIC):
$$\mathrm{SIC}(p, q) = N \ln \hat{\sigma}_a^2 + (p + q) \ln N$$
where $\hat{\sigma}_a^2$ is the estimated residual variance; the model with the smallest criterion value is preferred.
b-Estimation: after the proposed model representing the time series data under study has been identified and its order determined, the parameters of the chosen model are estimated. Often the main reason for estimating the model is to use it to compute future forecasts of the time series. There are several methods of estimation.
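Before turning to the estimation methods, the AIC/SIC selection rule in i. and ii. can be sketched in Python. The residual variances below are hypothetical, not results from this paper; the example is chosen to show that SIC, with its heavier penalty $\ln N$ per parameter, can favour a smaller model than AIC.

```python
import math


def aic(n, sigma2, p, q):
    """Akaike criterion: N ln(sigma_a^2) + 2(p + q)."""
    return n * math.log(sigma2) + 2 * (p + q)


def sic(n, sigma2, p, q):
    """Schwarz criterion: N ln(sigma_a^2) + (p + q) ln N."""
    return n * math.log(sigma2) + (p + q) * math.log(n)


def select_order(candidates, n, criterion=aic):
    """Return the (p, q) pair with the smallest criterion value.

    candidates maps (p, q) to the estimated residual variance of that fitted model.
    """
    return min(candidates, key=lambda pq: criterion(n, candidates[pq], *pq))


# Hypothetical residual variances for three fitted models on n = 50 values.
fits = {(0, 1): 1.00, (1, 1): 0.95, (2, 2): 0.88}
print(select_order(fits, 50))       # (1, 1)
print(select_order(fits, 50, sic))  # (0, 1)
```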

Maximum likelihood
To estimate the parameters of the ARMA(p, q) model, we maximize the likelihood function of the parameters given the observations. Other estimation methods include the method of moments and ordinary least squares (OLS).
c-Diagnostic Checking: to check the proposed model representing the time series data, we compute the residuals
$$\hat{a}_t = Z_t - \hat{Z}_t$$
which should be random, uncorrelated variables. This is checked by testing the null hypothesis $H_0: \rho_k(a) = 0$ against the alternative hypothesis $H_1: \rho_k(a) \ne 0$. Several tests are available, as follows:
Box-Pierce (Q)
In 1970, Box and Pierce derived a statistic with which the diagnostic validity of an ARIMA(p, d, q) model can be tested. Assuming that we have m estimated autocorrelations of the residuals, $r_k(\hat{a})$, which are approximately normally distributed with mean zero and variance 1/n, the statistic is
$$Q = n \sum_{k=1}^{m} r_k^2(\hat{a})$$
where n = N - d is the number of observations for the identified model, N is the original number of time series observations, and d is the number of differences taken to achieve stationarity. The calculated Q is then compared with the tabulated $\chi^2$ value with (m - p - q) degrees of freedom. If the calculated Q is smaller than the tabulated value, the null hypothesis is not rejected, that is, the random errors are uncorrelated and the model is appropriate. If it is larger, the model is inappropriate; in this case we must return to the first stage, identify another model to represent the time series, estimate its parameters, and check it.
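The Box-Pierce statistic and its decision rule can be written as a minimal Python sketch. The function names are our own, and the tabulated $\chi^2$ critical value is passed in rather than computed, since it would normally be read from a chi-square table for (m - p - q) degrees of freedom.

```python
def box_pierce(r, n):
    """Box-Pierce statistic Q = n * sum_{k=1}^{m} r_k(a)^2.

    r is [r_1(a), ..., r_m(a)], the residual autocorrelations; n = N - d.
    """
    return n * sum(rk * rk for rk in r)


def residuals_look_random(q_stat, chi2_critical):
    """Do not reject H0 (uncorrelated residuals) when the computed statistic
    is smaller than the tabulated chi-square value."""
    return q_stat < chi2_critical


print(round(box_pierce([0.1, 0.2], n=10), 4))  # 0.5
```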

Box-Ljung (Q)
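Ljung and Box modified the original Q-test formula proposed by Box and Pierce as follows:
$$Q^* = n(n + 2) \sum_{k=1}^{m} \frac{r_k^2(\hat{a})}{n - k}$$
and showed that the modified statistic has an advantage in practice, because its values are closer to those expected under the $\chi^2$ distribution.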
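A matching sketch of the Ljung-Box statistic (function name hypothetical). Each $r_k^2$ is scaled by $n(n+2)/(n-k)$, which amounts to dividing by the finite-sample variance $(n-k)/[n(n+2)]$ of $r_k$ under white noise instead of the large-sample value 1/n. The last lines apply the comparison reported for the differenced Iraq series (24.161 against the tabulated 24.996).

```python
def ljung_box(r, n):
    """Ljung-Box statistic Q* = n(n + 2) * sum_{k=1}^{m} r_k(a)^2 / (n - k).

    r is [r_1(a), ..., r_m(a)]; n is the number of observations used.
    """
    return n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))


print(round(ljung_box([0.1, 0.2], n=10), 4))  # 0.7333

computed, tabulated = 24.161, 24.996
print("do not reject H0" if computed < tabulated else "reject H0")  # do not reject H0
```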
3. If the residual autocorrelation coefficients lie within the 95% confidence limits, $-1.96/\sqrt{n} \le r_k(\hat{a}) \le 1.96/\sqrt{n}$, the residuals are considered random and the model is adequate.
d-Forecasting: one of the primary objectives of time series analysis is prediction. Once the model has been identified, its parameters estimated, and its adequacy for the time series data checked, it is ready to be used for prediction, since it is appropriate and matches the original data when it has the minimum mean squared prediction error. For example, to predict the value of the time series at period (t + L), denoted $\hat{Z}_t(L)$, we compute the conditional expectation of Z at time (t + L):
$$\hat{Z}_t(L) = E(Z_{t+L} \mid Z_t, Z_{t-1}, \dots)$$
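To make the conditional-expectation forecast concrete, here is a minimal Python sketch for the simplest of the identified models, ARIMA(0,1,1), written as $Z_t = Z_{t-1} + a_t - \theta a_{t-1}$. It assumes θ has already been estimated and starts the residual recursion at $a_1 = 0$ (a conditional approximation); the function name, data and θ value in the example are hypothetical. Because future shocks have zero conditional expectation, $\hat{Z}_t(1) = Z_t - \theta a_t$, and every longer horizon repeats that value.

```python
def arima011_forecast(z, theta, horizon):
    """Forecasts Zhat_t(L), L = 1..horizon, for (1 - B) Z_t = (1 - theta B) a_t.

    Residuals are rebuilt from a_t = (Z_t - Z_{t-1}) + theta * a_{t-1},
    starting from a_1 = 0 (conditional on the first observation).
    """
    if len(z) < 2 or horizon < 1:
        raise ValueError("need at least two observations and horizon >= 1")
    a = 0.0
    for t in range(1, len(z)):
        a = (z[t] - z[t - 1]) + theta * a
    # E[Z_{t+1} | past] = Z_t - theta * a_t; later shocks have zero expectation,
    # so all longer horizons repeat the one-step forecast.
    one_step = z[-1] - theta * a
    return [one_step] * horizon


# Hypothetical counts and theta, for illustration only.
print(arima011_forecast([1, 2, 4], theta=0.5, horizon=3))  # [2.75, 2.75, 2.75]
```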

The application side:
Data were collected for three time series, each consisting of 51 observations, for the period from 3/15/2020 to 4/5/2020. These data represent the numbers of people infected with coronavirus disease in Iraq, Saudi Arabia and the United Arab Emirates, and were taken from World Health Organization data sheets.
i-Series of coronavirus cases in Iraq: at this stage, the data are prepared by plotting the time series and computing the autocorrelation and partial autocorrelation coefficients, together with the confidence limits of the autocorrelation function of the original data, to understand the behavior of the data, using the statistical program ( ). Figure (1), which shows the time series of the number of cases in Iraq, reveals an increasing trend over time while the variance tends to be stable, which indicates that the series is non-stationary. To make the series stationary, we take the first difference, as shown in Figure (3).
Thus, after taking the first difference there is no trend and no seasonal effect, and all the sample autocorrelation coefficients lie within the confidence limits $(-0.27 \le r_k \le 0.27)$, as in Figure (2). The significance of the autocorrelation coefficients after the first difference was tested with the Ljung-Box test: its value (24.161) was less than the tabulated value at the 0.05 significance level (24.996), so the null hypothesis is not rejected.
ii-Series of coronavirus cases in Saudi Arabia: the second time series, the number of cases in Saudi Arabia, shows an increasing trend over time, as in Figure (4), so the series is non-stationary in mean. The first difference was taken to make it stationary, as in Figure (6); all the sample autocorrelation and partial autocorrelation coefficients lie within the confidence limits, as in Figure (5), and the Ljung-Box test on these coefficients confirmed this result.
iii-Series of coronavirus cases in the United Arab Emirates: the third time series, the number of cases in the United Arab Emirates, shows an increasing trend, as in Figure (7), so the series is non-stationary; the first difference was taken to make the series stationary in mean, as in Figure (9).

Diagnosis:
The first step in building the time series model is to identify (diagnose) the model. Identification criteria were applied based on the shape of the sample autocorrelation function (ACF) and of the sample partial autocorrelation function (PACF). When the autocorrelation and partial autocorrelation coefficients of each series after the first difference were matched with their theoretical behavior, the ACF was observed to decrease gradually as the lag k increases.
1. The appropriate model for the first time series, whose data represent the number of cases in Iraq, is ARIMA(2,1,2).
2. The appropriate model for the second time series, whose data represent the number of coronavirus disease cases in Saudi Arabia, is ARIMA(1,1,1).
3. The appropriate model for the third time series, whose data represent the number of coronavirus disease cases in the United Arab Emirates, is ARIMA(0,1,1).
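Written out with the backshift operator B, and assuming the Box-Jenkins sign convention $\phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^p$ and $\theta_q(B) = 1 - \theta_1 B - \dots - \theta_q B^q$, the three identified models take the following forms (the estimated coefficients are not reproduced here):

```latex
\begin{aligned}
\text{Iraq, ARIMA}(2,1,2):&\quad (1 - \phi_1 B - \phi_2 B^2)(1 - B) Z_t = (1 - \theta_1 B - \theta_2 B^2)\, a_t \\
\text{Saudi Arabia, ARIMA}(1,1,1):&\quad (1 - \phi_1 B)(1 - B) Z_t = (1 - \theta_1 B)\, a_t \\
\text{UAE, ARIMA}(0,1,1):&\quad (1 - B) Z_t = (1 - \theta_1 B)\, a_t
  \;\Longleftrightarrow\; Z_t = Z_{t-1} + a_t - \theta_1 a_{t-1}
\end{aligned}
```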

The test
To test whether the residual series is random, the autocorrelation and partial autocorrelation coefficients of the estimated residuals were calculated, as shown in Figure (10); all the coefficients $r_k(\hat{a})$ fall within the confidence limits $(-0.27 \le r_k(\hat{a}) \le 0.27)$. To confirm the suitability of the models, the Ljung-Box test statistic was applied; for all models the tabulated value was greater than the calculated value, which indicates that the residuals of these models are random.
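The confidence-limit check on the residual autocorrelations can be sketched as follows (function names are our own). With n = 51, $1.96/\sqrt{51} \approx 0.27$, which appears to be where the limits of ±0.27 quoted above come from; the example coefficients are hypothetical.

```python
import math


def confidence_limit(n, z=1.96):
    """Half-width of the approximate 95% limits +/- 1.96 / sqrt(n)."""
    return z / math.sqrt(n)


def all_within_limits(r, n):
    """True if every residual autocorrelation r_k(a) lies inside +/- 1.96 / sqrt(n)."""
    limit = confidence_limit(n)
    return all(-limit <= rk <= limit for rk in r)


print(round(confidence_limit(51), 2))              # 0.27
print(all_within_limits([0.12, -0.21, 0.05], 51))  # True
print(all_within_limits([0.31, 0.02], 51))         # False
```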
The series $Z_t$ is strictly stationary if
$$(Z_{t_1}, Z_{t_2}, \dots, Z_{t_n}) \overset{d}{=} (Z_{t_1+k}, Z_{t_2+k}, \dots, Z_{t_n+k})$$
for all integer values of k and all n ≥ 1, where $\overset{d}{=}$ indicates that the random vectors have the same joint distribution function. The series is second-order (weakly) stationary if the following conditions hold: the expected value $E(Z_t) = \mu$ is constant for all t, and the covariance matrix of $(Z_{t_1}, \dots, Z_{t_n})$ is the same as the covariance matrix of $(Z_{t_1+k}, \dots, Z_{t_n+k})$. This means that the autocovariance function depends only on the time interval between the observations:
$$\operatorname{Cov}(Z_t, Z_{t+k}) = \gamma(k) = E[(Z_t - \mu)(Z_{t+k} - \mu)]$$
where k is the lag, that is, the time between observations.

Figure (3): autocorrelation and partial autocorrelation coefficients after taking the first difference.

Figure (6): autocorrelation and partial autocorrelation coefficients after taking the first difference.

For the United Arab Emirates series, after taking the first difference (Figure (9)), all the sample autocorrelation and partial autocorrelation coefficients lie within the confidence limits, as in Figure (8), and the Ljung-Box test on these coefficients confirmed this result.

Figure (8): autocorrelation and partial autocorrelation coefficients after taking the first difference.

Table (
Estimation
After verifying the suitability of the model, testing the significance of the parameters, and testing the homogeneity of variance, the next stage in building the time series model is to estimate the models for these series. Applying the estimation methods described above, the following results were obtained: