JOINT PREDICTION OF CHRONIC CONDITIONS ONSET: COMPARING MULTIVARIATE PROBITS WITH MULTICLASS SUPPORT VECTOR MACHINES

We consider the problem of building accurate models that predict, in the short term (2-3 years), the onset of one or more specific chronic conditions at the individual level.

Methods: We consider five chronic conditions (heart disease, stroke, diabetes, hypertension and cancer) and build two models that predict all possible combinations of these conditions. Covariates include standard demographic and socio-economic variables, risk factors, and the presence of the chronic conditions at baseline. The first model is the multivariate probit, chosen because it can model correlated outcome variables. The second model is the Multiclass Support Vector Machine (MSVM), a leading predictive method in machine learning that is specifically designed to take advantage of correlated outcomes. We use data from the Social, Economic, and Environmental Factor (SEEF) study, a follow-up to the 45 and Up Study survey, which allows us to observe 60,000 individuals in NSW, aged over 45, twice over a period of two to four years.

Lessons Learned: MSVM captured the correlations across chronic conditions much better than multivariate probits. While the specificity of the two methods is comparable, the sensitivity of MSVM is about xx percentage points higher than that of multivariate probits. Since sensitivities are generally low, this translates into a large relative improvement of approximately xx%.

Implications: Researchers in the field of comorbidities, which requires studying joint distributions of events, would benefit greatly from using MSVM rather than multivariate probits, especially since R packages for MSVM are widely available and easy to use.

Accepted for poster presentation at the 9th Health Services and Policy Research Conference, Melbourne, Dec 7-9, 2015
Author(s):
S. Ghassem Pour, F. Girosi
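
The abstract describes predicting all possible combinations of five conditions and comparing methods by sensitivity and specificity. The Python sketch below is only an illustration of that setup, not the authors' code: it assumes each combination is encoded as one of 2^5 = 32 class labels (one plausible way to frame the multiclass problem) and that sensitivity and specificity are computed per condition. The function names `encode`, `decode`, `sensitivity_specificity` and `relative_improvement` are hypothetical, and no study data or results are used.

```python
from itertools import product

# The five conditions named in the abstract; the ordering is an assumption.
CONDITIONS = ["heart disease", "stroke", "diabetes", "hypertension", "cancer"]


def encode(flags):
    """Map a tuple of five 0/1 indicators to a single class label in 0..31."""
    label = 0
    for bit in flags:
        label = (label << 1) | int(bit)
    return label


def decode(label, n=len(CONDITIONS)):
    """Inverse of encode: recover the tuple of 0/1 indicators from a label."""
    return tuple((label >> (n - 1 - i)) & 1 for i in range(n))


def sensitivity_specificity(true_labels, pred_labels, condition_index):
    """Per-condition sensitivity and specificity from joint class labels."""
    tp = fn = tn = fp = 0
    for t, p in zip(true_labels, pred_labels):
        actual = decode(t)[condition_index]
        predicted = decode(p)[condition_index]
        if actual and predicted:
            tp += 1
        elif actual and not predicted:
            fn += 1
        elif not actual and not predicted:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


# All 32 possible combinations of the five conditions map to distinct labels.
ALL_LABELS = sorted(encode(flags) for flags in product((0, 1), repeat=5))
```

The last helper shows why a gain of a few percentage points matters when the baseline is low: the same absolute difference is divided by a small number, so the relative improvement is large.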