A common question that every data scientist will need to answer at some point is which machine learning model to use. At one end of the spectrum are simple models that are easy to interpret but often less accurate; at the other end are complex models that are difficult to interpret but can provide higher accuracy.
Models that are easy to interpret are generally grouped as parametric methods. These models assume that the relationship between the features and the output has a specific functional form, so learning reduces to estimating a fixed set of parameters. Regression models, such as linear regression, are generally considered parametric methods. Other commonly used parametric models include logistic regression, polynomial regression, linear discriminant analysis, quadratic discriminant analysis, (parametric) mixture models, and naïve Bayes (when parametric density estimation is used). Approaches often used in conjunction with parametric models for regularization and model selection include ridge regression, lasso, and principal components regression.
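To make "assumes a functional form" concrete, here is a minimal sketch of simple linear regression in plain Python (the function names are illustrative, not taken from any library). Because the form y = intercept + slope * x is fixed in advance, training only estimates two numbers, however many samples there are.

```python
from typing import List, Tuple


def fit_simple_linear_regression(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = intercept + slope * x.

    The functional form is fixed up front, so "learning" is just
    estimating two parameters from the data.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(params: Tuple[float, float], x: float) -> float:
    """Evaluate the fitted line at x."""
    intercept, slope = params
    return intercept + slope * x


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated exactly from y = 1 + 2x
    params = fit_simple_linear_regression(xs, ys)
    print(params)                 # (1.0, 2.0)
    print(predict(params, 10.0))  # 21.0
```

The whole fitted model is the pair (intercept, slope), which is why parametric models like this are easy to read: each coefficient has a direct interpretation.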
Other models behave like complex black boxes: they can provide high accuracy and make no assumptions about the functional form of the relationship between the features and the target. These are called non-parametric methods. A simple example of a non-parametric model is a classification tree, which is a series of recursive binary decisions on the input features. The tree-learning algorithm uses the target variable to choose, at each node, the split that best separates the target values, so that the terminal leaf nodes of the tree contain instances with similar values of the target.
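The recursive splitting described above can be sketched in a few dozen lines of plain Python. This is a minimal, hypothetical implementation assuming numeric features and Gini impurity as the split criterion (a common choice, not the only one); each node greedily picks the feature and threshold that best separate the labels, then recurses on the two halves.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when every label in the node is the same."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Greedily search every (feature, threshold) pair for the lowest weighted Gini."""
    best = None  # (weighted_gini, feature_index, threshold)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively apply binary splits until a node is pure or max_depth is hit."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or gini(labels) == 0.0:
        return {"leaf": majority}
    split = best_split(rows, labels)
    # Stop if no feature can be split or the best split does not reduce impurity.
    if split is None or split[0] >= gini(labels):
        return {"leaf": majority}
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict_tree(tree, row):
    """Follow the binary decisions from the root down to a leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


if __name__ == "__main__":
    # Logical AND of two binary features: only (1, 1) is labelled "y".
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = ["n", "n", "n", "y"]
    tree = build_tree(X, y)
    print(tree)
    print([predict_tree(tree, r) for r in X])  # ['n', 'n', 'n', 'y']
```

Note that no functional form is assumed: the number and position of the splits are decided by the data, and the tree grows deeper as the labels demand (up to the hypothetical `max_depth` cap used here to keep the sketch small).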
Other examples of non-parametric approaches to machine learning include k-nearest neighbors, splines, basis expansion methods, kernel smoothing, generalized additive models, neural nets, bagging, boosting, random forests, and support vector machines.
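k-nearest neighbors is perhaps the clearest case from the list above, because it has no training step at all. The sketch below is a hypothetical plain-Python classifier (not a library API) assuming numeric feature vectors and Euclidean distance via `math.dist` (Python 3.8+): the stored training set itself is the model, so its size grows with the data instead of being fixed in advance.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance from the query to every stored training point.
    by_distance = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    # Majority vote among the k closest labels.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
    y = ["red", "red", "red", "blue", "blue", "blue"]
    print(knn_predict(X, y, (2, 2)))      # red
    print(knn_predict(X, y, (6.5, 6.5)))  # blue
```

Choosing an odd `k` for two classes avoids tied votes; with more classes, ties fall back to whichever label `Counter` saw first among the nearest points.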
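Many machine learning applications favor non-parametric methods because they can capture complex relationships that a fixed functional form may miss, typically at the cost of interpretability and a need for more training data.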
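As a rough illustration of that trade-off, the toy sketch below fits a straight line (parametric) and a k-nearest-neighbors regressor (non-parametric) to noise-free data from y = x². The data and helper names are assumptions made for illustration only; on small or noisy datasets the comparison can easily go the other way.

```python
def fit_line(xs, ys):
    """Parametric: least-squares y = a + b * x (two parameters)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def knn_regress(xs, ys, q, k=2):
    """Non-parametric: average the targets of the k training points nearest to q."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - q))[:k]
    return sum(y for _, y in nearest) / k


def mse(preds, truth):
    """Mean squared error between predictions and true values."""
    return sum((p - t) ** 2 for p, t in zip(preds, truth)) / len(truth)


if __name__ == "__main__":
    xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
    ys = [x ** 2 for x in xs]            # a curved relationship
    test_x = [-2.5, -0.5, 0.5, 2.5]
    truth = [x ** 2 for x in test_x]

    a, b = fit_line(xs, ys)              # symmetric data: slope 0, intercept 4
    line_preds = [a + b * x for x in test_x]
    knn_preds = [knn_regress(xs, ys, x) for x in test_x]

    print(mse(line_preds, truth))  # 9.5625
    print(mse(knn_preds, truth))   # 0.0625
```

The straight line cannot bend, so it collapses to a constant on this symmetric curve, while k-NN simply follows the local shape of the training data.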