Mixture Model of the Exponential, Gamma and Weibull Distributions to Analyse Heterogeneous Survival Data

Aims: In this study, a three-component survival mixture model is considered for analysing survival data of a heterogeneous nature. The components of the survival mixture model are the Exponential, Gamma and Weibull distributions.
Methodology: The proposed model was investigated, and the Maximum Likelihood (ML) estimators of its parameters were evaluated by applying the Expectation Maximization (EM) Algorithm. Graphs, the log-likelihood (LL) and the Akaike Information Criterion (AIC) were used to compare the proposed model, on real survival data, with the pure classical parametric survival models corresponding to each component. The model was also compared with the survival mixture models corresponding to each component.
Results: The graphs, LL and AIC values showed that the proposed model fits the real data better than the pure classical survival models corresponding to each component. The proposed model also fits the real data better than the survival mixture models corresponding to each component.
Conclusion: The proposed model showed that survival mixture models are flexible, retain the features of the pure classical survival models, and are a better option for modelling heterogeneous survival data.
Original Research Article. Mohammed et al.; JSRR, 5(2): 132-139, 2015; Article no. JSRR.2015.080


INTRODUCTION
Survival analysis is concerned with investigating the occurrence of a particular event within a given period of time. It is widely applied in many fields, such as medical studies, biology, the social sciences, economics and engineering. The most commonly used methods in survival analysis are nonparametric methods. Pure classical parametric survival models are also commonly employed; they are a better option when the chosen distribution fits the data properly. The Exponential, Gamma and Weibull distributions are commonly used in the literature for modelling survival data [1][2][3][4]. Survival mixture models are most appropriate for modelling survival data when the data are believed to be heterogeneous in nature, and many recent studies have employed them to analyse survival data. A two-component mixture model of Weibull distributions was proposed to analyse survival data, with the parameters estimated by the weighted least squares method [5]. A two-component survival mixture model of Weibull distributions was also proposed in which the parameters were estimated by a graphical approach [6]. In addition, a new technique for estimating the parameters of a two-component survival mixture model of Weibull distributions was developed to analyse survival data [7].
The Expectation Maximization (EM) Algorithm was employed to estimate the parameters of a two-component Weibull-Weibull survival mixture model, and the stability of the EM was investigated [8]. Two-component survival mixture models of Gamma-Gamma, Weibull-Weibull and Lognormal-Lognormal distributions were proposed to analyse survival data, and a model selection method was used to select the model that best represents the real data [9]. A survival mixture of a mixed distribution was employed for analysing heterogeneous survival data; the mixed distribution model is a two-component survival model of the Extended Exponential-Geometric (EEG) distribution, with the EM employed to estimate the model parameters [10]. Few researchers have considered survival mixture models of different distributions. A two-component parametric survival mixture model of two different distributions, the Exponentiated Pareto and Exponential distributions, was used to model survival data [11]. Two-component survival mixture models of different distributions, namely Exponential-Gamma, Exponential-Weibull and Gamma-Weibull models, were proposed for analysing heterogeneous survival data using the EM [12].
Three-component parametric survival mixture models have not received much attention. In a study of open heart surgery, the risk of death after surgery was divided into three overlapping time phases, which are better analysed by a three-component mixture model [13][14][15]. A parametric survival mixture model of the Exponential, Gamma and Weibull distributions was considered to fit heterogeneous survival data, and simulated data were used to investigate the stability and consistency of the EM [16]. In another study, a model selection technique was employed to compare the parametric survival mixture model of the Exponential, Gamma and Weibull distributions with the parametric survival mixture model corresponding to each component [17]. A three-component parametric survival mixture model of Weibull distributions was proposed to model survival data using a Bayesian estimation method [18]. The EM is usually employed on data believed to contain missing or unobserved observations [19]; in a mixture model, the unobserved quantity is the component to which each observation belongs. Accordingly, the parameters of survival mixture models are commonly estimated with the EM Algorithm [20,21].
In this study, real data are used to investigate the flexibility and appropriateness of a three-component survival mixture of the Exponential, Gamma and Weibull distributions for modelling heterogeneous survival data. The article is organised as follows. Section two highlights survival analysis and some important probability functions, and discusses the development of the three-component survival mixture model and the application of the EM to estimating the ML parameters of the model. Section three applies the proposed model to real data to estimate its parameters and discusses the results. Finally, section four presents the summary and conclusion.

SURVIVAL ANALYSIS AND THE THREE-COMPONENT MIXTURE MODEL
Survival analysis applies specific statistical methods to model and analyse survival data, where the focus is on the occurrence of a particular event of interest within a given period of time. The response of primary interest is the non-negative random variable T, which gives the survival time of an object or an individual. The distribution of T can be described by three interchangeable functions. The probability density function (pdf) is

$$ f(t) = \frac{dF(t)}{dt} = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t}, \quad t \ge 0, \qquad (1) $$

where $F(t) = P(T \le t)$ is the distribution function of the random variable T. The graph of $f(t)$, commonly referred to as the density curve, is frequently used in the literature; since $f(t)$ is non-negative, the area between the curve and the t axis equals one, $\int_0^\infty f(t)\,dt = 1$. The survival function

$$ S(t) = P(T > t) = 1 - F(t) \qquad (2) $$

gives the probability of an individual surviving beyond a specified time t. The hazard function

$$ h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)} \qquad (3) $$

is such that $h(t)\,\Delta t$ approximates the probability that an individual fails within the small interval $[t, t + \Delta t)$, provided that the individual was alive at the beginning of that interval.
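As a quick numerical check of the relationships between the pdf, the survival function and the hazard function, the following minimal Python sketch assumes an Exponential distribution with scale 2.0 (an arbitrary illustrative value, not taken from the article) and computes the hazard both as f(t)/S(t) and from the limit definition with a small Δt. For the Exponential distribution both should give the constant 1/scale.

```python
import math

SCALE = 2.0  # illustrative scale parameter (assumption, not from the article's data)


def pdf(t):
    """Exponential density f(t) = (1/scale) * exp(-t/scale)."""
    return math.exp(-t / SCALE) / SCALE


def survival(t):
    """S(t) = P(T > t) = 1 - F(t) = exp(-t/scale)."""
    return math.exp(-t / SCALE)


def hazard_ratio(t):
    """Hazard from the identity h(t) = f(t) / S(t)."""
    return pdf(t) / survival(t)


def hazard_limit(t, dt=1e-6):
    """Approximate P(t <= T < t + dt | T >= t) / dt with a small dt."""
    return (survival(t) - survival(t + dt)) / (survival(t) * dt)


for t in (0.5, 1.0, 5.0):
    print(t, hazard_ratio(t), hazard_limit(t))
```

The constant hazard is the well-known memoryless property of the Exponential distribution; the Gamma and Weibull components allow the hazard to vary with t.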
Pure classical parametric survival models are powerful methods in survival analysis. They are preferred when the chosen probability distribution appropriately represents the data. The Exponential, Gamma and Weibull distributions are among the most important and frequently used distributions in survival analysis [1,2,3,4].
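Exponential Distribution
The probability density function and survival function of the Exponential distribution are

$$ f_E(t;\lambda) = \frac{1}{\lambda} \exp\left(-\frac{t}{\lambda}\right), \quad t \ge 0, \qquad (4) $$

$$ S_E(t;\lambda) = \exp\left(-\frac{t}{\lambda}\right), \qquad (5) $$

where $\lambda > 0$ is a scale parameter.

Gamma Distribution
The probability density function and survival function of the Gamma distribution are

$$ f_G(t;\alpha_1,\beta_1) = \frac{t^{\alpha_1 - 1}}{\Gamma(\alpha_1)\,\beta_1^{\alpha_1}} \exp\left(-\frac{t}{\beta_1}\right), \quad t \ge 0, \qquad (6) $$

$$ S_G(t;\alpha_1,\beta_1) = 1 - I\left(\alpha_1, \frac{t}{\beta_1}\right), \qquad (7) $$

where $\alpha_1 > 0$ is the shape parameter, $\beta_1 > 0$ is the scale parameter, and $I(\alpha, x) = \frac{1}{\Gamma(\alpha)} \int_0^x u^{\alpha - 1} e^{-u}\,du$ is known as the (regularized) incomplete Gamma function.

Weibull Distribution
The probability density function and survival function of the Weibull distribution are

$$ f_W(t;\alpha_2,\beta_2) = \frac{\alpha_2}{\beta_2} \left(\frac{t}{\beta_2}\right)^{\alpha_2 - 1} \exp\left[-\left(\frac{t}{\beta_2}\right)^{\alpha_2}\right], \quad t \ge 0, \qquad (8) $$

$$ S_W(t;\alpha_2,\beta_2) = \exp\left[-\left(\frac{t}{\beta_2}\right)^{\alpha_2}\right], \qquad (9) $$

where $\alpha_2 > 0$ is the shape parameter and $\beta_2 > 0$ is the scale parameter.

In survival analysis, mixture models are frequently used because of their flexibility. They are often the better option when pure classical parametric survival models fail to fit data of a heterogeneous nature [20,22]. A three-component survival mixture model is used when the data are believed to consist of three subpopulations or subgroups. Equation (10) gives the density of the three-component survival mixture model:

$$ f(t) = \pi_1 f_E(t;\lambda) + \pi_2 f_G(t;\alpha_1,\beta_1) + \pi_3 f_W(t;\alpha_2,\beta_2), \qquad (10) $$

where the $\pi_i$'s represent the mixing probabilities of the three subpopulations, with $0 \le \pi_i \le 1$ and $\pi_1 + \pi_2 + \pi_3 = 1$, and $f_E$, $f_G$ and $f_W$, as defined in (4), (6) and (8), are the probability density functions of the Exponential, Gamma and Weibull distributions, respectively.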
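One of the most efficient and effective methods commonly used to obtain the ML estimates of the parameters of finite mixture models is the EM Algorithm [21].

Let $t_1, t_2, \ldots, t_n$ be the observed survival times, and let $\tau_{ij}$ denote the posterior probability that observation $t_i$ belongs to component $j$ ($j = 1, 2, 3$ for the Exponential, Gamma and Weibull components, respectively). Given the current parameter estimates, the E-step computes

$$ \tau_{ij} = \frac{\pi_j f_j(t_i)}{\sum_{k=1}^{3} \pi_k f_k(t_i)}, \quad f_1 = f_E,\; f_2 = f_G,\; f_3 = f_W. $$

The expected complete-data log-likelihood calculated in the E-step,

$$ Q = \sum_{i=1}^{n} \sum_{j=1}^{3} \tau_{ij} \left[\log \pi_j + \log f_j(t_i)\right], $$

is maximized in the M-step under the condition $\pi_1 + \pi_2 + \pi_3 = 1$; this constrained maximization with respect to the mixing probabilities is carried out by the Lagrange method. The mixing probabilities are then obtained by

$$ \hat{\pi}_j = \frac{1}{n} \sum_{i=1}^{n} \tau_{ij}, \quad j = 1, 2, 3. $$

The ML estimator of the parameter $\lambda$ of the Exponential distribution for the proposed model can be obtained by equation (19) [12,16,17]:

$$ \hat{\lambda} = \frac{\sum_{i=1}^{n} \tau_{i1} t_i}{\sum_{i=1}^{n} \tau_{i1}}. \qquad (19) $$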
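The E-step and the closed-form M-step updates for the mixing probabilities and the Exponential scale can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes uncensored survival times, scale-parameterised densities, and component order (Exponential, Gamma, Weibull); the function names, toy times and starting values are hypothetical.

```python
import math


def f_exp(t, lam):
    """Exponential pdf with scale lam."""
    return math.exp(-t / lam) / lam


def f_gamma(t, alpha, beta):
    """Gamma pdf with shape alpha and scale beta (t > 0)."""
    return math.exp((alpha - 1.0) * math.log(t) - t / beta
                    - math.lgamma(alpha) - alpha * math.log(beta))


def f_weibull(t, alpha, beta):
    """Weibull pdf with shape alpha and scale beta."""
    z = t / beta
    return (alpha / beta) * z ** (alpha - 1.0) * math.exp(-z ** alpha)


def e_step(times, pi, densities):
    """tau[i][j] = pi_j f_j(t_i) / sum_k pi_k f_k(t_i)."""
    tau = []
    for t in times:
        weighted = [p * f(t) for p, f in zip(pi, densities)]
        total = sum(weighted)
        tau.append([wt / total for wt in weighted])
    return tau


def update_mixing(tau):
    """pi_j = (1/n) * sum_i tau_ij, the Lagrange solution under sum_j pi_j = 1."""
    n = len(tau)
    return [sum(row[j] for row in tau) / n for j in range(len(tau[0]))]


def update_exp_scale(times, tau, j=0):
    """Equation (19): tau-weighted mean of the survival times for component j."""
    num = sum(row[j] * t for row, t in zip(tau, times))
    den = sum(row[j] for row in tau)
    return num / den


# Hypothetical toy times and starting values, for illustration only.
times = [1.0, 2.0, 3.0, 10.0, 20.0]
densities = [lambda t: f_exp(t, 3.0),
             lambda t: f_gamma(t, 2.0, 2.0),
             lambda t: f_weibull(t, 1.5, 15.0)]
tau = e_step(times, [1 / 3, 1 / 3, 1 / 3], densities)
pi_new = update_mixing(tau)
lam_new = update_exp_scale(times, tau)
print(pi_new, lam_new)
```

In a full EM loop these updates would be combined with the Gamma and Weibull M-steps and repeated until the log-likelihood stops increasing.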
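The maximum likelihood estimators of the parameters $\alpha_1$ and $\beta_1$ of the Gamma distribution for the proposed model are evaluated using equations (20) and (21), respectively [9,12,16,17]:

$$ \alpha_1^{(r+1)} = \alpha_1^{(r)} - \frac{\log \alpha_1^{(r)} - \psi\left(\alpha_1^{(r)}\right) - s}{1/\alpha_1^{(r)} - \psi'\left(\alpha_1^{(r)}\right)}, \quad s = \log\left(\frac{\sum_{i} \tau_{i2} t_i}{\sum_{i} \tau_{i2}}\right) - \frac{\sum_{i} \tau_{i2} \log t_i}{\sum_{i} \tau_{i2}}, \qquad (20) $$

$$ \hat{\beta}_1 = \frac{\sum_{i} \tau_{i2} t_i}{\hat{\alpha}_1 \sum_{i} \tau_{i2}}, \qquad (21) $$

where $\tau_{i2}$ is the posterior probability, computed in the E-step, that $t_i$ belongs to the Gamma component, $r$ indexes the Newton-Raphson iterations within the EM Algorithm, and $\psi(\cdot)$ and $\psi'(\cdot)$ are the digamma and trigamma functions, respectively.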
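A Newton-Raphson solver for the weighted Gamma M-step can be sketched in pure Python. This is a minimal illustration, assuming the weighted ML equations take the standard form written for (20) and (21), with the weights playing the role of the E-step probabilities for the Gamma component. The digamma and trigamma helpers are implemented here (recurrence plus asymptotic series) only to avoid third-party dependencies. The closed-form starting value is a well-known approximation to the Gamma shape MLE; it is an assumption, not something stated in the article. The times and weights are made up.

```python
import math


def digamma(x):
    """psi(x), x > 0: shift with psi(x) = psi(x + 1) - 1/x, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (1.0 / 252 - inv2 / 240))))
    return result


def trigamma(x):
    """psi'(x), x > 0: shift with psi'(x) = psi'(x + 1) + 1/x**2, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    result += inv + 0.5 * inv2 + inv * inv2 * (
        1.0 / 6 - inv2 * (1.0 / 30 - inv2 * (1.0 / 42 - inv2 / 30)))
    return result


def fit_gamma_weighted(times, w, tol=1e-10, max_iter=100):
    """Weighted ML fit of a Gamma(shape alpha, scale beta) component.

    Newton-Raphson solves log(alpha) - psi(alpha) = s for the shape;
    the scale then follows in closed form as weighted_mean / alpha.
    """
    sw = sum(w)
    mean_t = sum(wi * t for wi, t in zip(w, times)) / sw
    mean_log_t = sum(wi * math.log(t) for wi, t in zip(w, times)) / sw
    s = math.log(mean_t) - mean_log_t  # > 0 unless all times are equal
    alpha = (3.0 - s + math.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
    for _ in range(max_iter):
        g = math.log(alpha) - digamma(alpha) - s
        g_prime = 1.0 / alpha - trigamma(alpha)  # always negative
        alpha_new = alpha - g / g_prime
        if alpha_new <= 0.0:
            alpha_new = alpha / 2.0  # keep the shape strictly positive
        converged = abs(alpha_new - alpha) < tol
        alpha = alpha_new
        if converged:
            break
    return alpha, mean_t / alpha


# Hypothetical times and E-step weights, for illustration only.
times = [0.5, 1.2, 2.0, 3.3, 4.1, 7.5]
weights = [1.0, 0.8, 0.6, 0.9, 0.3, 0.5]
a_hat, b_hat = fit_gamma_weighted(times, weights)
print(a_hat, b_hat)
```

In practice `scipy.special.digamma` and `scipy.special.polygamma(1, x)` could replace the two helpers.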
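The shape and scale parameters $\alpha_2$ and $\beta_2$ of the Weibull distribution in the proposed model are obtained by solving equations (22) and (23), respectively [9,12,16,17]:

$$ \alpha_2^{(r+1)} = \alpha_2^{(r)} - \frac{g\left(\alpha_2^{(r)}\right)}{g'\left(\alpha_2^{(r)}\right)}, \quad g(\alpha) = \frac{1}{\alpha} + \frac{\sum_{i} \tau_{i3} \log t_i}{\sum_{i} \tau_{i3}} - \frac{\sum_{i} \tau_{i3} t_i^{\alpha} \log t_i}{\sum_{i} \tau_{i3} t_i^{\alpha}}, \qquad (22) $$

$$ \hat{\beta}_2 = \left(\frac{\sum_{i} \tau_{i3} t_i^{\hat{\alpha}_2}}{\sum_{i} \tau_{i3}}\right)^{1/\hat{\alpha}_2}, \qquad (23) $$

where $\tau_{i3}$ is the posterior probability, computed in the E-step, that $t_i$ belongs to the Weibull component,

$$ g'(\alpha) = -\frac{1}{\alpha^2} - \left[\frac{\sum_{i} \tau_{i3} t_i^{\alpha} (\log t_i)^2}{\sum_{i} \tau_{i3} t_i^{\alpha}} - \left(\frac{\sum_{i} \tau_{i3} t_i^{\alpha} \log t_i}{\sum_{i} \tau_{i3} t_i^{\alpha}}\right)^2\right], $$

and $r$ indexes the Newton-Raphson iterations within the EM.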
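The weighted Weibull M-step can be sketched in the same way. This is a minimal illustration, assuming the standard weighted ML equations written for (22) and (23), with the weights standing in for the E-step probabilities of the Weibull component. The starting shape of 1.0 and the halving safeguard are illustrative choices, and the times and weights are made up.

```python
import math


def fit_weibull_weighted(times, w, alpha0=1.0, tol=1e-10, max_iter=200):
    """Weighted ML fit of a Weibull(shape alpha, scale beta) component.

    Newton-Raphson solves g(alpha) = 0 with
    g(alpha) = 1/alpha + A - B/C, where A = sum w log t / sum w,
    B = sum w t^alpha log t and C = sum w t^alpha; beta then follows in closed form.
    """
    sw = sum(w)
    logs = [math.log(t) for t in times]
    a_mean_log = sum(wi * lt for wi, lt in zip(w, logs)) / sw
    alpha = alpha0
    for _ in range(max_iter):
        ta = [t ** alpha for t in times]
        c = sum(wi * x for wi, x in zip(w, ta))
        b = sum(wi * x * lt for wi, x, lt in zip(w, ta, logs))
        d = sum(wi * x * lt * lt for wi, x, lt in zip(w, ta, logs))
        g = 1.0 / alpha + a_mean_log - b / c
        # Derivative of g: -1/alpha^2 minus a weighted variance of log t (so g' < 0).
        g_prime = -1.0 / alpha ** 2 - (d / c - (b / c) ** 2)
        alpha_new = alpha - g / g_prime
        if alpha_new <= 0.0:
            alpha_new = alpha / 2.0  # keep the shape strictly positive
        converged = abs(alpha_new - alpha) < tol
        alpha = alpha_new
        if converged:
            break
    beta = (sum(wi * t ** alpha for wi, t in zip(w, times)) / sw) ** (1.0 / alpha)
    return alpha, beta


# Hypothetical times and E-step weights, for illustration only.
times = [0.5, 1.2, 2.0, 3.3, 4.1, 7.5]
weights = [1.0, 0.8, 0.6, 0.9, 0.3, 0.5]
a_w, b_w = fit_weibull_weighted(times, weights)
print(a_w, b_w)
```

Because g is strictly decreasing in α, the shape equation has a single root whenever the times are not all equal, which makes Newton-Raphson a natural choice here.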

REAL DATA APPLICATION AND DISCUSSION
The real data analysed in this section are the Kidney Catheter data, which are included as one of the datasets in the widely used survival package [23] of the R statistical software [24]. The data were originally studied in [25]. They give the recurrence times to infection, at the point of insertion of the catheter, for kidney patients using portable dialysis equipment. The dataset consists of 76 observations on 7 variables. The proposed model was used to analyse the data and was then compared with the pure classical parametric survival models corresponding to each component using the log-likelihood (LL) and the Akaike Information Criterion (AIC). Table 1 shows that the LL value of the proposed model is higher, and its AIC value lower, than those of the pure classical survival models. Because AIC = -2LL + 2k penalizes the number of parameters k, the lower AIC indicates that the improvement in fit outweighs the extra parameters of the mixture, so the proposed model fits these data better than the pure classical models.
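The AIC comparison can be made concrete with a short Python sketch. It assumes the parameter counts implied by the parameterisations in section two (one parameter for the Exponential, two each for the Gamma and Weibull, and seven for the proposed mixture: two free mixing probabilities, since π3 = 1 - π1 - π2, plus λ, α1, β1, α2, β2). The helper names are hypothetical. The sketch does not reproduce Table 1; it only shows how many log-likelihood units the mixture must gain to obtain a lower AIC.

```python
def aic(loglik, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 LL (lower is better)."""
    return 2.0 * n_params - 2.0 * loglik


# Free parameters, assuming the parameterisations of section two.
N_PARAMS = {
    "Exponential": 1,                # lambda
    "Gamma": 2,                      # alpha_1, beta_1
    "Weibull": 2,                    # alpha_2, beta_2
    "Exp-Gamma-Weibull mixture": 7,  # pi_1, pi_2, lambda, alpha_1, beta_1, alpha_2, beta_2
}


def min_loglik_gain(model, baseline):
    """`model` has a lower AIC than `baseline` only if its LL exceeds
    the baseline LL by more than this many units."""
    return N_PARAMS[model] - N_PARAMS[baseline]


for base in ("Exponential", "Gamma", "Weibull"):
    print(base, min_loglik_gain("Exp-Gamma-Weibull mixture", base))
```

For example, against a single Weibull model the mixture must improve the log-likelihood by more than 5 units before its AIC becomes lower.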
The proposed model was also compared graphically with the pure classical parametric survival models corresponding to each component of the mixture model. The probability density functions of the proposed model and of the pure classical parametric survival models, together with the histogram of the Kidney Catheter data, are presented in Fig. 1. Fig. 1 shows that the proposed model fits the real data better than the individual pure parametric survival models.
The Kidney Catheter data were also used to compare the proposed model with the survival mixture models of the Exponential, Gamma and Weibull distributions, respectively, that is, mixtures whose components all follow one of these distributions, in order to select the model that fits the data most appropriately. Table 2 displays the estimated parameters of each model together with the LL and AIC values; the proposed model represents the real data better than the other models. The proposed model was also compared graphically with these survival mixture models: Fig. 2 compares the density function of the proposed model with those of the other models, and again the proposed model represents the real data better than the other models.

CONCLUSION
This article proposed a three-component survival mixture model of the Exponential, Gamma and Weibull distributions to analyse survival data believed to be heterogeneous. Real data were used to estimate the parameters of the model, and the EM Algorithm was employed to obtain the ML estimates of its parameters. Comparing the proposed model with the pure classical survival models and with the survival mixture models corresponding to each distribution showed that the proposed model represents the data better than the other models. The proposed model showed that survival mixture models are flexible, retain the features of the pure classical survival models, and are a better option for modelling heterogeneous survival data.
*Note: R version 3.0.2 (2013-09-25), http://CRAN.R-project.org, was used for all calculations and graphs.

COMPETING INTERESTS
Authors have declared that no competing interests exist.


Fig. 1. The probability density functions of the proposed model and the pure classical distributions for the Kidney Catheter data