Normal-Compound Gamma priors with Count Data

Count data models have become common in many disciplines in recent years. Because such models are often analyzed incorrectly with OLS methods, several alternatives have been proposed. One of these is the normal-scale mixture method with different priors on the scale parameter. This method addresses the bias-variance tradeoff by adding a local scale parameter that reduces the variance at the origin and the bias in the tails. In this paper, a compound-gamma prior is placed on the scale parameter and the corresponding Gibbs sampler is derived for posterior inference. A comparison of the proposed model with several existing methods on both very sparse and low-sparsity simulated data shows that it performs very well.


Introduction
Consider the linear model

z = Xβ + ε,    (1)

where z = (z_1, ⋯, z_n)ᵀ is the vector of observations, X is the n × p design matrix of covariates, β = (β_1, ⋯, β_p)ᵀ is the vector of unknown regression coefficients, and ε = (ε_1, ⋯, ε_n)ᵀ, with ε_i ∼ N(0, σ²), is the vector of errors. In this paper we assume that this model is concerned with count data. The study of Bayesian regression for count data has become an important area in this subject. Several studies have been proposed, such as the study of crash frequency and its influencing factors [17], the study of university credits and their relationship to pre-enrolment assessment tests [7], Malaysian motor insurance claims [5], socioeconomic factors and the number of tuberculosis cases [16], and the application of count data to environmental epidemiology [11]. These varied examples show the importance of, and the need to further develop, such models. The development of count data models in the framework of Bayesian regression proceeded in several stages, from the 1970s to the modern day [14,6,13]. The most notable of these stages is the work of [9] involving the use of a Poisson process to analyze regression models. Some modern advances in this arena have involved the introduction of quantiles through conditional quantile functions [10], which requires particular assumptions about the model while allowing the researcher to analyze the effect of the covariates on each quantile of the distribution [12]. Since the dependent variable y in (1) is generated by a Poisson counting process and takes only nonnegative integer values, it is necessary to convert it to a continuous variable. To do this we use the jittering process presented in [12]: a uniform variable u_i is added to the dependent variable y_i and the logarithm is taken, z_i = log(y_i + u_i), producing the desired continuous variable. In this paper we analyze the Bayesian regression framework in the presence of count data with a normal scale mixture combined with a normal-compound gamma prior
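The jittering step described above can be sketched in a few lines (a minimal illustration; the function name and the Poisson example data are ours, not taken from [12]):

```python
import numpy as np

rng = np.random.default_rng(0)

def jitter(y, rng):
    """Jitter counts as in [12]: add Uniform(0, 1) noise to each count
    and take the log to obtain a continuous response z_i = log(y_i + u_i)."""
    u = rng.uniform(0.0, 1.0, size=len(y))
    return np.log(np.asarray(y) + u)

y_counts = rng.poisson(lam=3.0, size=5)  # illustrative Poisson counts
z = jitter(y_counts, rng)                # continuous response for the linear model
```

Each jittered value satisfies y_i ≤ exp(z_i) < y_i + 1, so the original count can always be recovered by truncation.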
of the form

β_j | σ², λ_j ∼ N(0, σ² λ_j),    λ_j ∼ CG(a_1, …, a_k, τ),    σ² ∼ IG(c_0, d_0),    j = 1, …, p,

where CG(a_1, …, a_k, τ) is a compound-gamma distribution of order k and IG(c_0, d_0) is the inverse gamma distribution with shape parameter c_0 and scale parameter d_0.
The paper is structured as follows: in section 2 the compound-gamma prior is introduced, in section 3 our sampler is derived, and in sections 4 and 5 we use simulated and real data, respectively, to compare the accuracy of our model with other models.

The normal-compound gamma prior
The compound-gamma prior CG(a_1, …, a_k, τ) can be written in closed form, or alternatively [1] as the chain of gamma distributions

λ_i | λ_{i+1} ∼ Ga(a_i, λ_{i+1}),    i = 1, …, k,

where λ_1 = λ and λ_{k+1} = τ is a constant. This is a more efficient way of writing the prior as a product of multiple scale mixtures [1], where Ga(a, b) is the gamma distribution with shape parameter a and inverse scale (rate) parameter b.
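The chained-gamma representation gives a direct way to simulate from the prior: sample downward from λ_{k+1} = τ, using each draw as the rate of the next level. A minimal sketch (function name and parameter values are illustrative):

```python
import numpy as np

def sample_compound_gamma(a, tau, size, rng):
    """Draw lambda ~ CG(a_1, ..., a_k, tau) via the chained representation:
    lambda_i ~ Ga(a_i, lambda_{i+1}) for i = k, ..., 1, with lambda_{k+1} = tau."""
    rate = np.full(size, float(tau))     # lambda_{k+1} = tau (constant rate)
    for a_i in reversed(a):              # sample lambda_k, then lambda_{k-1}, ...
        # numpy parameterizes the gamma by scale = 1 / rate
        rate = rng.gamma(shape=a_i, scale=1.0 / rate)
    return rate                          # this is lambda_1 = lambda

rng = np.random.default_rng(1)
lam = sample_compound_gamma(a=[0.5, 0.5], tau=1.0, size=10000, rng=rng)
```

With small shapes such as a_1 = 0.5, most draws concentrate near zero while occasional draws are very large, which matches the sparsity behavior described above.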
Proof. The proof of this equivalence is given in [1]. The properties of this prior were studied extensively in [1]. Most interestingly, it was shown that this prior generalizes several popular models (see [2,15,3] for k = 2 and [4] for k = 4). In particular, it was shown that this model works for data with different degrees of sparsity. This property is further demonstrated in Figure 1. The above prior has been investigated in several scenarios and types of data; in this paper our aim is to derive the posterior inference in the setting of count data. We notice that for small values of a_1, in the cases of compounding both two and four gamma distributions, there is a singularity at zero and the distribution mass is concentrated near zero; this shows how the model works for sparse data, as indicated by the thin lines in the graph. On the other hand, the thick lines show the prior in the non-sparse framework, where more of the mass lies in the tails of the distribution. In [2], it is shown that for k = 2 the EM algorithm can accurately estimate the true sparsity (or density) of the data, giving more insight into the values of the hyperparameters.

Posterior Inference
The full conditionals for k = 2 were calculated by [2]. More generally, for k ≥ 2, the full conditionals are given in [1]; in particular, the conditional for the regression coefficients takes the standard normal scale-mixture form

β | · ∼ N(A⁻¹Xᵀz, σ² A⁻¹),    A = XᵀX + Λ⁻¹,    Λ = diag(λ_1, …, λ_p),

with the conditionals for the local scale parameters λ_i as given in [1]. We set c_0 = d_0 = 10⁻⁵ in the prior of σ². The hyperparameters a_i are estimated with the EM algorithm, which is run every few iterations of the MCMC algorithm; that is, we implement a self-adaptive normal-compound Monte Carlo EM (MCEM) algorithm.
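A minimal sketch of one Gibbs update for β and σ², assuming the standard normal scale-mixture conditionals; the λ_i updates of [1] are not reproduced here, and all names are ours:

```python
import numpy as np

def gibbs_step(X, z, lam, sigma2, c0=1e-5, d0=1e-5, rng=None):
    """One Gibbs update for beta and sigma^2 under the standard
    normal scale-mixture conditionals (local scales lam held fixed)."""
    if rng is None:
        rng = np.random.default_rng()
    n, p = X.shape
    A = X.T @ X + np.diag(1.0 / lam)             # A = X'X + Lambda^{-1}
    A_inv = np.linalg.inv(A)
    # beta | . ~ N(A^{-1} X'z, sigma^2 A^{-1})
    beta = rng.multivariate_normal(A_inv @ X.T @ z, sigma2 * A_inv)
    resid = z - X @ beta
    # sigma^2 | . ~ IG(c0 + (n + p)/2, d0 + (resid'resid + beta'Lambda^{-1}beta)/2)
    shape = c0 + 0.5 * (n + p)
    rate = d0 + 0.5 * (resid @ resid + beta @ (beta / lam))
    sigma2 = 1.0 / rng.gamma(shape, 1.0 / rate)  # inverse-gamma draw
    return beta, sigma2
```

In a full sampler this step would alternate with the λ_i updates and, every few iterations, the MCEM update of the a_i.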

Simulation Studies
To demonstrate the advantages of the proposed model, we use simulated data to analyze the predictive ability of our prior and compare it with other published models. Specifically, the comparison includes the Beta Prime model (k = 2) proposed by [3], our prior with k = 10 (NCG10), the Bayesian Lasso, the Bayesian adaptive Lasso (aLasso) and the elastic net (Enet). The models are compared using the mean squared error (MSE), the false positive rate (FPR) and the false negative rate (FNR).
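These criteria can be computed from the true and estimated coefficients as follows (a sketch; the zero-threshold tol is an illustrative choice, not taken from the paper):

```python
import numpy as np

def selection_metrics(beta_true, beta_hat, tol=1e-8):
    """MSE, FPR and FNR for variable selection; coefficients with
    |beta_hat| <= tol count as excluded (the threshold is illustrative)."""
    beta_true = np.asarray(beta_true, dtype=float)
    beta_hat = np.asarray(beta_hat, dtype=float)
    mse = float(np.mean((beta_hat - beta_true) ** 2))
    truly_zero = beta_true == 0
    selected = np.abs(beta_hat) > tol
    # FPR: fraction of truly-zero coefficients wrongly selected
    fpr = float(np.mean(selected[truly_zero])) if truly_zero.any() else 0.0
    # FNR: fraction of truly-nonzero coefficients wrongly excluded
    fnr = float(np.mean(~selected[~truly_zero])) if (~truly_zero).any() else 0.0
    return mse, fpr, fnr

mse, fpr, fnr = selection_metrics([7, 0, 0, 0], [6.9, 0.0, 0.2, 0.0])
# one of the three truly-zero coefficients is selected, so FPR = 1/3
```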

Simulation 1
In our first simulation, we study data generated from a very sparse model by setting β = (7, 0, 0, 0, 0, 0, 0, 0)ᵀ. The results in Table 1, averaged over N = 100 repeated simulations with 15000 iterations each, show that our prior performs better than the other methods presented. In particular, the compound-gamma prior gives the smallest MSE of all the models considered. The hyperparameters are updated every 100 iterations using equation (11). Additionally, our model performs very well in terms of FPR and FNR, which are essential criteria for variable selection.
From the trace plots and histograms in Figure 2, we see that our prior converges well to the stationary distribution. To study our model further and gain a deeper view of its behavior with different types of data, we keep a similar number of covariates as in the simulation above but decrease the sparsity of the model by setting β = (4.5, 2, 0, 3, 0, 0, 0, 8)ᵀ, with N = 100 repeated simulations of 10000 iterations each. The results are shown in Table 2. They further support what Figure 1 suggests: as the sparsity increases the model performs better for higher values of k, and conversely, models with smaller values of k are better candidates for less sparse data. As in Simulation 1, the trace plots and histograms in Figure 3 show that our model converges well to the stationary distribution, and the hyperparameters are again updated every 100 iterations.

Conclusion
We have shown that our proposed method for count data, using a normal-compound gamma prior for the scale parameter in the normal scale-mixture framework, performs very well compared with existing models such as the Beta Prime prior (NCG2) [3], the Bayesian Lasso, the Bayesian adaptive Lasso and the Bayesian elastic net. We aim to study this method further in the future with other types of data, such as censored and quantile data.

Figure 1. Plot of the NCG prior, with solid and dashed lines representing k = 2 and k = 4, respectively, while the thick and thin lines represent larger and smaller values of a_1, respectively.