Data science is a vast field. Its techniques help extract meaningful insights from data, and many of them serve as a foundation for other well-known algorithms. However, they differ from one another in how they work and in the results they produce. Here are the major differences between four frequently used techniques in data science.
Logistic Regression v/s Discriminant Analysis
Logistic regression is used to predict the probability of a dichotomous (two-category) dependent variable from one or more independent variables, which can be either continuous or categorical.
Discriminant analysis is a statistical technique that predicts a categorical dependent variable (called a grouping variable) from one or more continuous or binary independent variables (called predictor variables).
Logistic regression and discriminant analysis look similar, but they differ in the following ways.
Logistic Regression: Is based on maximum likelihood estimation.
Discriminant Analysis: Is based on least squares estimation; with two groups it is equivalent to linear regression on a binary dependent variable.
Logistic Regression: Estimates the probability of group membership directly (the observed dependent variable is itself treated as a probability) and conditionally on the predictors.
Discriminant Analysis: Estimates the probability indirectly (the dependent variable is viewed as a binned continuous variable, the discriminant) via a classification device (such as naive Bayes) that uses both conditional and marginal information.
Logistic Regression: Not very demanding about the measurement level of the predictors or the form of their distribution.
Discriminant Analysis: Predictors should preferably be interval-level and follow a multivariate normal distribution.
Logistic Regression: No requirements about the within-group covariance matrices of the predictors.
Discriminant Analysis: The within-group covariance matrices should be identical in the population.
Logistic Regression: The groups may have quite different sizes (n).
Discriminant Analysis: The groups should have similar sizes (n).
Logistic Regression: Not so sensitive to outliers.
Discriminant Analysis: Quite sensitive to outliers.
Logistic Regression: Usually preferred, because it is less demanding and more robust.
Discriminant Analysis: With all its requirements met, often classifies better than binary logistic regression (its asymptotic relative efficiency is then 3/2 times higher).
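To make the estimation contrast concrete, here is a minimal sketch in plain Python. It assumes a single continuous predictor and two equally sized groups simulated from normal distributions with equal variance; the data, the function names and the learning-rate settings are illustrative assumptions, not part of any library. Logistic regression finds its coefficients by maximising the conditional likelihood, while discriminant analysis first estimates the group means, a pooled variance and the group proportions, and only then converts them into a probability through Bayes' rule.

```python
import math
import random


def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Logistic regression: maximise the conditional log-likelihood
    of y given x by plain gradient ascent (illustrative settings)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(w * x + b)
            grad_w += err * x
            grad_b += err
        w += lr * grad_w / n
        b += lr * grad_b / n
    return w, b


def fit_lda(xs, ys):
    """Two-group discriminant analysis with one predictor: estimate group
    means, pooled variance and priors, then apply Bayes' rule."""
    g0 = [x for x, y in zip(xs, ys) if y == 0]
    g1 = [x for x, y in zip(xs, ys) if y == 1]
    m0, m1 = sum(g0) / len(g0), sum(g1) / len(g1)
    ss = sum((x - m0) ** 2 for x in g0) + sum((x - m1) ** 2 for x in g1)
    var = ss / (len(xs) - 2)  # pooled within-group variance
    w = (m1 - m0) / var
    b = -(m1 ** 2 - m0 ** 2) / (2 * var) + math.log(len(g1) / len(g0))
    return w, b  # P(y=1 | x) = sigmoid(w * x + b)


def accuracy(w, b, xs, ys):
    hits = sum((sigmoid(w * x + b) >= 0.5) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(xs)


# Simulated data (an assumption for illustration): two normal groups
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(200)] + \
     [random.gauss(2.0, 1.0) for _ in range(200)]
ys = [0] * 200 + [1] * 200

w_lr, b_lr = fit_logistic(xs, ys)
w_lda, b_lda = fit_lda(xs, ys)

print("logistic regression  :", round(w_lr, 3), round(b_lr, 3),
      "accuracy", accuracy(w_lr, b_lr, xs, ys))
print("discriminant analysis:", round(w_lda, 3), round(b_lda, 3),
      "accuracy", accuracy(w_lda, b_lda, xs, ys))
```

Both fits yield a slope and an intercept for the same kind of logistic curve. When the normality and equal-variance requirements hold, as they do in this simulation, the two sets of coefficients should come out close to each other; the methods are expected to diverge more when those requirements are violated or outliers are present.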
Factor Analysis v/s Cluster Analysis
The main application of factor analysis is to reduce the number of variables and to detect structure in the relationships between variables, that is, to classify variables.
Cluster analysis is a group of multivariate techniques whose primary purpose is to group objects (e.g. respondents, products, or other entities) based on their characteristics. It is a means of grouping records based on the attributes that make them similar.
Factor Analysis: Dimension reduction technique.
Cluster Analysis: A classification technique.
Factor Analysis: An interdependence technique.
Cluster Analysis: There is no prior information about the group.
Factor Analysis: The objective is to explain the correlations in a set of data and to relate the variables to each other.
Cluster Analysis: The objective is to address heterogeneity within a set of data.
Factor Analysis: The main types are exploratory and confirmatory factor analysis.
Cluster Analysis: The main types are hierarchical clustering, partitional clustering (K-means, Fuzzy K-Means, Isodata) and density-based clustering (Denclust, CLUPOT, Mean Shift, SVC, Parzen).
Factor Analysis: Statistics associated include Correlation Matrix, Communality, Eigenvalue, Factor Loadings and Factor Scores.
Cluster Analysis: Statistics associated include Agglomeration Schedule, Cluster Centroid, Cluster Centres and Dendrogram.
Factor Analysis: Examples include understanding the characteristics of customers.
Cluster Analysis: Examples include grouping the customers into different clusters for comparison.
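The contrast can be sketched in plain Python: factor analysis works on the variables (the columns, through their correlation matrix), while cluster analysis works on the objects (the rows). Everything below is an illustrative assumption: the simulated customer data, the function names, and the choice of principal-component extraction by power iteration as a simple stand-in for a full factor-analysis fit. The clustering part is a basic K-means with a simple deterministic start.

```python
import math
import random


def correlation_matrix(rows):
    # Correlations between variables (columns): the input to factor analysis
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows)
             / ((n - 1) * sds[i] * sds[j]) for j in range(p)]
            for i in range(p)]


def first_factor(corr, iters=200):
    # Largest eigenvalue and its loadings by power iteration
    # (principal-component extraction, used here as a stand-in)
    p = len(corr)
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eig = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p)) for i in range(p))
    return eig, [math.sqrt(eig) * x for x in v]


def kmeans(rows, k=2, iters=50):
    # Groups objects (rows); needs k >= 2. Start from rows spread evenly
    # along the ordering by row sum, a simple deterministic initialisation.
    ordered = sorted(rows, key=sum)
    n = len(ordered)
    centroids = [list(ordered[i * (n - 1) // (k - 1)]) for i in range(k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(r, centroids[c])))
                  for r in rows]
        for c in range(k):
            members = [r for r, l in zip(rows, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Simulated customers (an assumption): three survey scores driven by one
# latent trait, with two customer segments that differ on that trait.
random.seed(1)
segments = [i % 2 for i in range(100)]
rows = []
for s in segments:
    trait = random.gauss(0.0, 1.0) + 3.0 * s
    rows.append([trait + random.gauss(0.0, 0.5) for _ in range(3)])

corr = correlation_matrix(rows)
eigenvalue, loadings = first_factor(corr)
labels, centroids = kmeans(rows, k=2)

print("eigenvalue of first factor:", round(eigenvalue, 3))
print("factor loadings:", [round(l, 3) for l in loadings])
print("cluster sizes:", labels.count(0), labels.count(1))
```

The first half summarises three correlated variables with a single factor, which is dimension reduction. The second half assigns each customer to one of two groups, which is the kind of grouping cluster analysis is meant for.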
I hope this article helped you understand the basic differences between these four techniques. They are widely used across industries and fields, including marketing and market research, where they help companies understand their customer base, market products and solve business problems in real time.