An Introduction to Cluster Analysis

February 15, 2018

What is Cluster Analysis?

Cluster analysis is a statistical method used to group similar objects into respective categories. It is also known as segmentation analysis, taxonomy analysis, or clustering.

The goal of performing a cluster analysis is to sort different objects or data points into groups so that the degree of association between two objects is high if they belong to the same group and low if they belong to different groups.
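One common way to make "degree of association" concrete is to measure distance: points in the same group should sit close together, and points in different groups should sit far apart. The minimal sketch below assumes numeric data and Euclidean distance (other similarity measures exist). The points and their group labels are made up for illustration.

```python
from itertools import combinations
from math import dist

# Hypothetical 2-D data points, already sorted into two groups
points = {
    "A": [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8)],
    "B": [(8.0, 8.0), (8.5, 9.0), (7.8, 8.4)],
}


def mean_within_distance(groups):
    """Average Euclidean distance between pairs of points in the same group."""
    pairs = [(p, q) for pts in groups.values() for p, q in combinations(pts, 2)]
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def mean_between_distance(groups):
    """Average Euclidean distance between pairs of points in different groups."""
    pairs = [
        (p, q)
        for a, b in combinations(list(groups), 2)
        for p in groups[a]
        for q in groups[b]
    ]
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


within = mean_within_distance(points)
between = mean_between_distance(points)
print(f"within-group: {within:.2f}, between-group: {between:.2f}")
```

A good grouping is one where the within-group figure is small relative to the between-group figure.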

Cluster analysis differs from many other statistical methods in that it is mostly used when researchers have no prior hypothesis or assumed principle on which to base their research.

This technique is typically performed during the exploratory phase of research because, like factor analysis, it makes no distinction between dependent and independent variables.

Put simply, cluster analysis discovers structures in data without explaining why those structures exist. 

For example, when cluster analysis is performed as part of market research, it can identify specific groups within a population. Analyzing these groups can then show how likely each cluster is to purchase products or services. If these groups are clearly defined, a marketing team can target each cluster with tailored communication.

Common Applications of Cluster Analysis 

Marketing

Marketers commonly use cluster analysis to develop market segments, which allow for better positioning of products and messaging. Segmentation helps a company position itself, explore new markets, and develop products that specific clusters find relevant and valuable.

Insurance  

Insurance companies often leverage cluster analysis when there is a high number of claims in a given region. This helps them investigate what is driving the increase in claims.

Geology  

For cities on fault lines, geologists use cluster analysis to evaluate seismic risk and the potential weaknesses of earthquake-prone regions. Using the results of this research, residents can do their best to prepare for and mitigate potential damage.

Putting Clustering into Context

It’s easy to overthink cluster analysis, but our brains naturally cluster data on a regular basis in order to simplify the world around us. Whether we realize it or not, we deal with clustering in practically every aspect of our day-to-day lives.

For example, a group of friends sitting at the same table in a restaurant can be considered a cluster. 

In grocery stores, goods of a similar nature are grouped together in order to make shopping more convenient and efficient.

The list of everyday situations that involve clustering could go on forever, but it may be more useful to consider a classic, archetypal example.

In biology, humans belong to the following clusters: primates, mammals, amniotes, vertebrates, and animals. Note that as we move along this chain toward broader clusters, humans share fewer and fewer similarities with the other members of the group. Humans have more in common with primates than they do with other mammals, and more in common with mammals than they do with animals in general.

The Benefits of Cluster Analysis

Clustering allows researchers to identify and define patterns between data elements. 

Revealing these patterns between data points helps to distinguish and outline structures which might not have been apparent before, but which give significant meaning to the data once they are discovered.

Once a clearly defined structure emerges from the dataset at hand, informed decision-making becomes much easier.

The Different Types of Cluster Analysis

There are three primary methods used to perform cluster analysis:  

Hierarchical Cluster

This is one of the most common methods of clustering. It creates a series of models with cluster solutions ranging from 1 (all cases in one cluster) to n (each case is an individual cluster). The approach can also be applied to variables instead of cases: hierarchical clustering can group variables together in a manner similar to factor analysis.

Finally, hierarchical cluster analysis can handle nominal, ordinal, and scale data, but remember not to mix different levels of measurement in the same analysis.
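The series of solutions from n clusters down to 1 can be sketched as agglomerative clustering: start with every case in its own cluster, then repeatedly merge the two closest clusters until one remains, recording each intermediate solution. This minimal illustration assumes numeric data, Euclidean distance, and single linkage (the distance between two clusters is the distance between their closest members). It is not how statistical packages implement the method, and it is far too slow for large datasets.

```python
from itertools import combinations
from math import dist


def agglomerative(points, linkage=min):
    """Return every cluster solution, from n clusters (one case each) down to 1.

    `linkage` reduces the pairwise distances between two clusters to one number:
    min gives single linkage; statistics.mean would give average linkage.
    """
    clusters = [[p] for p in points]
    solutions = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(
                dist(p, q) for p in clusters[ij[0]] for q in clusters[ij[1]]
            ),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        solutions.append([list(c) for c in clusters])
    return solutions


# Hypothetical 2-D cases
data = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.3, 4.9), (9.0, 1.0)]
solutions = agglomerative(data)
for s in solutions:
    print(len(s), "clusters:", s)
```

The researcher then picks whichever solution in the list seems most meaningful.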

K-Means Cluster

This method is used to quickly cluster large datasets. Here, researchers define the number of clusters before running the analysis, which makes the approach useful for comparing models that assume different numbers of clusters.
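The classic k-means procedure (Lloyd's algorithm) illustrates why the number of clusters must be chosen up front: k centroids are placed first, then the algorithm alternates between assigning each case to its nearest centroid and moving each centroid to the mean of its cases. This is a minimal sketch assuming numeric data and Euclidean distance; the random initialization from the data and the hypothetical customer scores are illustrative choices.

```python
import random
from math import dist


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: dist(point, centroids[c]))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: the number of clusters k is fixed before running."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen cases
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Move each centroid to the mean of its members (keep it if empty)
            new_centroids.append(
                tuple(sum(xs) / len(members) for xs in zip(*members))
                if members
                else centroids[c]
            )
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return [nearest(p, centroids) for p in points], centroids


# Hypothetical customers described by two survey scores
customers = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (7.8, 8.4)]
labels, centers = kmeans(customers, k=2)
print(labels, centers)
```

Re-running with a different `k` produces an alternative model that can be compared against this one.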

Two-Step Cluster

This method identifies groupings by first pre-clustering the data and then applying hierarchical methods to the pre-clusters. Two-step clustering is best suited to large datasets that would take too long to process with purely hierarchical methods.

Essentially, two-step cluster analysis is a combination of hierarchical and k-means cluster analysis. It can handle both scale and ordinal data, and it automatically selects the number of clusters.
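The two-step idea can be sketched as follows: one fast pass groups cases into many small sub-clusters, and a hierarchical merge then combines those sub-clusters instead of the raw cases. This is a simplified illustration, not the algorithm of any particular package. The one-pass "leader" pre-clustering, the `radius` parameter, and the `merge_threshold` stopping rule (a crude stand-in for automatic selection of the number of clusters) are all assumptions; tools such as IBM SPSS TwoStep use more elaborate procedures.

```python
from itertools import combinations
from math import dist


def centroid(pts):
    """Mean position of a list of points."""
    return tuple(sum(xs) / len(pts) for xs in zip(*pts))


def pre_cluster(points, radius):
    """Step 1: a single pass that adds each point to the first sub-cluster whose
    centroid lies within `radius`, or starts a new sub-cluster otherwise."""
    subs = []
    for p in points:
        for s in subs:
            if dist(p, centroid(s)) <= radius:
                s.append(p)
                break
        else:
            subs.append([p])
    return subs


def two_step(points, radius, merge_threshold):
    """Step 2: hierarchically merge the closest sub-clusters (by centroid),
    stopping once the closest pair is farther apart than `merge_threshold`."""
    clusters = pre_cluster(points, radius)
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(centroid(clusters[ij[0]]), centroid(clusters[ij[1]])),
        )
        if dist(centroid(clusters[i]), centroid(clusters[j])) > merge_threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical cases forming two well-separated groups
cases = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (7.8, 8.4)]
clusters = two_step(cases, radius=1.0, merge_threshold=3.0)
print(len(clusters), "clusters:", clusters)
```

Because the hierarchical step only sees the sub-clusters, its cost depends on how many sub-clusters the first pass produces rather than on the full number of cases.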

What Does The Clustering Process Look Like?

Step #1: Build and Distribute a Survey

Your survey should be designed to include multiple measures of propensity to purchase and of preferences for the product at hand. It should be distributed to your population of interest, and your sample size should be large enough to support statistically sound decisions.

Step #2: Analyze Response Data

It's considered best practice to perform a factor analysis on your survey to reduce the number of variables being clustered. If the factor analysis shows that a handful of questions are measuring the same thing, combine these questions before performing your cluster analysis.
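A full factor analysis is beyond a short example. As a much simpler stand-in (not factor analysis itself), the sketch below flags questions whose answers correlate strongly, using Pearson r at or above a hypothetical threshold of 0.7, and averages each such group into a single composite score. The question names and ratings are invented. This sketch only looks for positive correlation, so reverse-worded questions would need to be flipped first.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of ratings."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def combine_redundant(items, threshold=0.7):
    """Greedily group questions that all correlate >= threshold with each other,
    then replace each group with the per-respondent mean of its questions."""
    groups = []
    for name in items:
        for group in groups:
            if all(pearson(items[name], items[m]) >= threshold for m in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return {
        "+".join(group): [mean(row) for row in zip(*(items[m] for m in group))]
        for group in groups
    }


# Hypothetical 1-5 ratings from six respondents
responses = {
    "likely_to_buy": [5, 4, 2, 1, 5, 2],
    "would_recommend": [5, 5, 2, 1, 4, 2],
    "prefers_online": [3, 1, 4, 2, 2, 5],
}
combined = combine_redundant(responses)
print(combined)
```

Here the two purchase-related questions move together closely enough to be merged, while the third question stays separate.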

After reducing your data with factor analysis, run the cluster analysis, decide how many clusters seem appropriate, and record each respondent's cluster assignment. You can then compare the mean of each factor across clusters.
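Once each respondent has a cluster assignment, the per-cluster factor means amount to a group-by-and-average. A minimal sketch; the factor names, scores, and cluster labels are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def cluster_means(scores, labels):
    """Mean of each factor score within each cluster.

    `scores` maps factor name -> one score per respondent;
    `labels` holds each respondent's cluster assignment, in the same order.
    """
    by_cluster = defaultdict(lambda: defaultdict(list))
    for factor, values in scores.items():
        for value, label in zip(values, labels):
            by_cluster[label][factor].append(value)
    return {
        label: {factor: mean(vals) for factor, vals in factors.items()}
        for label, factors in sorted(by_cluster.items())
    }


# Hypothetical factor scores for four respondents in two clusters
scores = {"purchase_intent": [4.5, 4.0, 1.5, 2.0], "price_focus": [2.0, 1.0, 4.0, 5.0]}
labels = [0, 0, 1, 1]
means = cluster_means(scores, labels)
print(means)
```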

Step #3: Take Informed Action!

Comb through your data to identify differences in the means of factors, and name your clusters based on these differences. These differences can then inform your marketing, allowing you to target precise groups of customers with the right message, at the right time, in the right manner.
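One way to start naming clusters is to list the factors on which each cluster sits clearly above or below the overall average. The sketch below applies that idea with a hypothetical rule of thumb: a factor counts as "high" or "low" when the cluster mean differs from the overall mean by at least `margin`, an arbitrary cut-off chosen for illustration. The factor names and means are made up, and the generated descriptions are only raw material for a human-chosen cluster name.

```python
def describe(cluster_means, overall_means, margin=0.5):
    """Describe a cluster by the factors whose mean is clearly above or below
    the overall mean (margin is a hypothetical cut-off)."""
    parts = []
    for factor, value in cluster_means.items():
        diff = value - overall_means[factor]
        if diff >= margin:
            parts.append(f"high {factor}")
        elif diff <= -margin:
            parts.append(f"low {factor}")
    return ", ".join(parts) or "average on all factors"


# Hypothetical overall and per-cluster factor means
overall = {"purchase_intent": 3.0, "price_focus": 3.0}
means = {
    0: {"purchase_intent": 4.25, "price_focus": 1.5},
    1: {"purchase_intent": 1.75, "price_focus": 4.5},
}
for label, m in means.items():
    print(label, describe(m, overall))
```

A marketer might then turn "high purchase_intent, low price_focus" into a name such as "eager buyers" for campaign planning.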
