Cluster Analysis: What It Is and How to Use It

Derek Higgs, MPA, Researcher
February 12, 2020


A cluster analysis, or clustering, is an effective way to organize data into groups that share similar characteristics, and it is usually performed near the beginning of a data analysis. This ‘clustering’ of data benefits the researcher by placing data points with similar attributes into a single group. Having similar data points together allows the researcher to better recognize the influence of a particular characteristic. So, if you have an overwhelming mountain of data and are unsure where to begin, consider using a cluster analysis as a first step. Your inner organizer will thank you.

Researchers should view a cluster analysis more as a preliminary organizational tool than as an actual analysis. For example, I recently moved my family to a new house, and organizing data follows the same method most of us use when moving our belongings from one house to another. If you imagine each data point in the data set as a single household item, a cluster analysis would be like putting all of the kitchen supplies together in a single box, all of the upstairs bathroom items into a separate box, and all of your wall art into its own box. You are not moving the boxes to the new house yet, nor are you analyzing the quality and importance of each item; instead, you are simply taking a preliminary, organizational step to make your move easier. If you have ever moved houses in this manner, you have already applied the logic of a cluster analysis!

I use cluster analyses in my own research projects. I am currently identifying high school course-taking patterns to measure how they influence a student’s level of ‘college readiness’. One of my first steps was to use a cluster analysis to group high school students together according to several characteristics, such as grade level, GPA, gender, school district, and participation in college credit programs. Organizing the individual students into clusters based on similar traits has made it simpler for me to notice, for example, the relationship between a student’s GPA and their likelihood of being enrolled in a college credit program. It also helps the researcher keep track of all the data they are using. Every variable gets used to its full potential, and the risk of misplacing or misusing variables becomes minimal.
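To make the student example concrete, here is a minimal sketch in Python of one common clustering technique, k-means. It is only an illustration, not the method or data from the project described above: the student records are made up, the choice of k-means and of two clusters is an assumption, and the names students, standardize, squared_distance, and k_means are hypothetical. Each column is rescaled first so that GPA, grade level, and the program flag carry comparable weight.

```python
from statistics import mean, pstdev

# Hypothetical records, invented for illustration:
# (GPA, grade level, enrolled in a college credit program: 1 = yes, 0 = no)
students = [
    (3.8, 12, 1),
    (3.6, 11, 1),
    (3.9, 12, 1),
    (2.1, 9, 0),
    (2.4, 10, 0),
    (2.0, 9, 0),
]


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 so a constant column does not cause division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stds))
        for row in rows
    ]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iterations=100):
    """Plain k-means. The first k points are used as starting centers,
    which is a simplification chosen to keep the sketch deterministic."""
    centers = list(points[:k])
    labels = None
    for _ in range(max_iterations):
        # Assign every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # Assignments stopped changing, so the clusters are stable.
        labels = new_labels
        # Move each center to the average of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(mean(col) for col in zip(*members))
    return labels


labels = k_means(standardize(students), k=2)
for student, label in zip(students, labels):
    print(label, student)
```

With this made-up data, the three higher-GPA students who are enrolled in a college credit program fall into one cluster and the other three students into the second. Real data is rarely this tidy, and both the number of clusters and the starting centers usually deserve more care than this sketch gives them.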

A cluster analysis differs from the popular ‘factor’ analysis in what it groups: a cluster analysis groups observations, such as individual students, based on their similarities, whereas a factor analysis groups correlated variables into a smaller set of underlying factors. Cluster analyses have been used in marketing, medical sciences, and consumer behavior analysis. For example, credit card companies may use a cluster analysis to group customers with similar spending and repayment behavior. The health care field uses cluster analyses to create patient profiles, which allows for better medical care from both the administrative and patient points of view.

Any researcher worth their salt knows that organization is the key to a great data analysis. A cluster analysis is one of the best, and simplest, organizational methods for starting your analysis off on the right foot, and it deserves strong consideration at the beginning stages of your next data analysis project.