We consider the problem of clustering hospitals based on their case-mix distributions. Hospitals belong to the same cluster if they offer the same mix of services and have similar demand for those services. The cluster labels can be used to control for case-mix in hospital-level analyses.

Methods: We obtained distributions over the 770 AR-DRG v7.0 groups for 148 de-identified private and public hospitals. The distance between two hospitals was defined as the mean squared error between the two corresponding DRG distributions. The clustering algorithm of choice was Partitioning Around Medoids (PAM). The optimal number of clusters was found by minimizing the Dunn validation index, subject to the constraint that no cluster contains fewer than 3 hospitals, in order to avoid creating clusters of isolated outliers.

Lessons Learned: The algorithm found 7 clusters, that is, 7 groups of hospitals with similar case-mix. Some clusters were easily interpretable and were dominated by a narrow set of DRGs (psychiatric, neonatal, rehabilitation). Other clusters corresponded to hospitals that offered a broad set of services but differed significantly in the frequency of relatively few groups of DRGs. The 7 groups explained approximately 80% of the variance in length of stay across hospitals.

Implications: Clustering hospital DRG distributions produces highly stable hospital-level variables that are driven purely by case-mix, are easy to compute, and have good predictive power. This offers an alternative to measures, such as the AIHW peer groups, that are based on a variety of variables in addition to case-mix.

Accepted for oral presentation at the 9th Health Services and Policy Research Conference, Melbourne, Dec 7-9, 2015.
M. Hart, S. Ghassem Pour, F. Girosi
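
The quantities named in the Methods paragraph can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy data, and the cluster labels are hypothetical, and the PAM step itself is not implemented here (the labels are assumed to come from a separate PAM run). The sketch only shows how a pairwise mean-squared-error distance matrix, a Dunn index, and the minimum-cluster-size check might be computed with NumPy.

```python
import numpy as np


def mse_distance_matrix(p):
    """Pairwise mean squared error between the rows of p.

    p has shape (n_hospitals, n_drgs); each row is one hospital's
    DRG distribution (non-negative shares summing to 1).
    """
    diff = p[:, None, :] - p[None, :, :]
    return (diff ** 2).mean(axis=2)


def dunn_index(dist, labels):
    """Smallest between-cluster distance over largest within-cluster distance.

    Assumes at least two clusters and at least one cluster with two
    or more members; otherwise the index is undefined.
    """
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diagonal = ~np.eye(len(labels), dtype=bool)
    max_diameter = dist[same & off_diagonal].max()
    min_separation = dist[~same].min()
    return min_separation / max_diameter


def satisfies_size_constraint(labels, min_size=3):
    """True if every cluster contains at least min_size hospitals."""
    _, counts = np.unique(labels, return_counts=True)
    return bool(counts.min() >= min_size)


# Toy data (hypothetical): 6 hospitals, 4 DRGs, two obvious groups.
p = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.60, 0.20, 0.10, 0.10],
    [0.65, 0.15, 0.10, 0.10],
    [0.10, 0.10, 0.10, 0.70],
    [0.10, 0.10, 0.20, 0.60],
    [0.10, 0.10, 0.15, 0.65],
])
labels = np.array([0, 0, 0, 1, 1, 1])  # assumed output of a PAM run

dist = mse_distance_matrix(p)
dunn = dunn_index(dist, labels)
size_ok = satisfies_size_constraint(labels, min_size=3)
```

In a full workflow one would presumably repeat this for each candidate number of clusters, discard any clustering for which `satisfies_size_constraint` is false, and compare the Dunn index across the remaining candidates.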