pages 795–802. IEEE, 2013.

It is well known that supervised learning problems with ℓ1 (Lasso) and ℓ2 (Tikhonov or Ridge) regularizers yield very different solutions. For example, the ℓ1 solution vector is sparser and can be used both for prediction and for feature selection. However, given a data set, it is often hard to determine which form of regularization is more appropriate in a given context. In this paper we use mathematical properties of the two regularization methods, followed by detailed experimentation, to understand their impact along four characteristics: non-stationarity of the data-generating process, the level of noise in the data-sensing mechanism, the degree of correlation between dependent and independent variables, and the shape of the data set. The practical outcome of our research is that it can serve as a guide for practitioners of large-scale data mining and machine learning in their day-to-day work.
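The sparsity contrast mentioned above can be illustrated with the classical closed-form solutions for an orthonormal design (a textbook result, not taken from this paper): Lasso soft-thresholds each ordinary-least-squares coefficient, setting small ones exactly to zero, while Ridge only shrinks them proportionally. The coefficient values below are hypothetical, chosen for illustration.

```python
def soft_threshold(b, lam):
    # Lasso solution under an orthonormal design: shrink toward zero
    # and set the coefficient exactly to zero if |b| <= lam.
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

def ridge_shrink(b, lam):
    # Ridge solution under an orthonormal design: proportional
    # shrinkage; a nonzero coefficient never becomes exactly zero.
    return b / (1.0 + lam)

ols = [2.5, 0.3, -0.1, 1.2]  # hypothetical OLS coefficients
lam = 0.5                    # regularization strength

lasso = [soft_threshold(b, lam) for b in ols]
ridge = [ridge_shrink(b, lam) for b in ols]
# lasso zeroes out the two small coefficients (0.3 and -0.1);
# ridge merely shrinks all four, so none are exactly zero.
```

This is why the abstract notes that the ℓ1 solution can double as a feature selector: coefficients driven exactly to zero correspond to features dropped from the model, whereas Ridge keeps every feature with a reduced weight.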