GWS 2020: Clustering eras
This entry was posted on November 9, 2020.
The prior series of GWS 2020 analyses focused primarily on descriptive analysis of a few select topics or survey questions. Today, we dive a bit deeper and turn attention toward predictive analytics.
For now, we take the familiar 'favorite gaming period' analysis one step further. While earlier analyses counted responses across various categories, this analysis attempts to infer relationships and tendencies among respondents' top selections of game periods. To accomplish this task, the top five game periods for each respondent are identified and added to the study. Respondents' choices (the Top 5 per respondent) are aggregated and examined using statistical modeling techniques. The technique for this study is cluster analysis, in which game periods (or settings) are grouped so that those in the same group are more similar to each other than to those in other groups.
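For readers who want to experiment, the aggregation step can be sketched in a few lines of Python. This is an illustration only, not the code behind this study: the four respondent lists are invented, most of the period names are placeholders, and Jaccard distance (one minus the share of respondents picking both periods among those picking either) is just one possible way to measure how "similar" two periods are.

```python
from itertools import combinations

# Hypothetical top-five lists, one per respondent (invented for illustration).
top5 = [
    ["Ancients", "Medieval", "Dark Ages", "Napoleonics", "WWII"],
    ["Ancients", "Medieval", "Dark Ages", "Pike & Shotte", "Napoleonics"],
    ["Fantasy", "Sci-Fi", "Pulp", "WWII", "Medieval"],
    ["Fantasy", "Sci-Fi", "Pulp", "Ancients", "WWII"],
]


def period_sets(responses):
    """Map each period to the set of respondent indices that chose it."""
    chosen_by = {}
    for i, picks in enumerate(responses):
        for period in picks:
            chosen_by.setdefault(period, set()).add(i)
    return chosen_by


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B| for two sets of respondents."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


chosen_by = period_sets(top5)

# Pairwise distances between periods, keyed by alphabetically ordered pairs.
distances = {
    (p, q): jaccard_distance(chosen_by[p], chosen_by[q])
    for p, q in combinations(sorted(chosen_by), 2)
}

for pair in [("Fantasy", "Sci-Fi"), ("Ancients", "Fantasy")]:
    print(pair, round(distances[pair], 2))
```

Periods that are always chosen by the same respondents end up at distance 0, and periods that never share a respondent end up at distance 1. A table of such distances is the kind of input a clustering routine works from.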
Some possible questions to consider before diving into the analysis are:
- Based upon gaming period choice only, do distinctions between historical and fantasy gamers emerge?
- Do particular game periods tend to cluster together? If so, which?
- If distinct groups emerge from clustering, are these distinct groups intuitive?
- Do favorite game periods group into reasonable and explainable buckets?
- What can be inferred from this analysis?
First, using only data from Question 12 (a respondent’s favorite game periods, Top 5 only), the responses are aggregated and classified using unsupervised machine learning. The result of this classification is illustrated in the dendrogram in Figure 1, in which each of the 21 game periods is represented. Starting at the right and drawing a line across the first two branches of the dendrogram tree identifies two clusters of game periods (see Figure 2). What does this primary division suggest?
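To make the mechanics of a dendrogram concrete, here is a minimal sketch of agglomerative clustering written from scratch. It is not the routine used for the figures: average linkage is an assumed choice, the function name `average_linkage` is hypothetical, and the five periods and their dissimilarities are invented. Each period starts as its own cluster, and the two closest clusters are merged repeatedly until the requested number remains, which is the same as drawing a line across that many branches of the tree.

```python
def average_linkage(labels, dist, k):
    """Merge the two closest clusters until only k clusters remain.

    dist(a, b) gives the distance between two items. The distance between
    two clusters is the mean of all item-to-item distances (average linkage).
    """
    clusters = [[label] for label in labels]

    def cluster_distance(c1, c2):
        return sum(dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the indices of the closest pair of clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters


# Hypothetical dissimilarities between five periods (0 = always chosen together).
toy = {
    frozenset(["Ancients", "Medieval"]): 0.2,
    frozenset(["Ancients", "Napoleonics"]): 0.6,
    frozenset(["Ancients", "Fantasy"]): 0.9,
    frozenset(["Ancients", "Sci-Fi"]): 0.9,
    frozenset(["Medieval", "Napoleonics"]): 0.6,
    frozenset(["Medieval", "Fantasy"]): 0.8,
    frozenset(["Medieval", "Sci-Fi"]): 0.9,
    frozenset(["Napoleonics", "Fantasy"]): 0.9,
    frozenset(["Napoleonics", "Sci-Fi"]): 0.9,
    frozenset(["Fantasy", "Sci-Fi"]): 0.3,
}
labels = ["Ancients", "Medieval", "Napoleonics", "Fantasy", "Sci-Fi"]


def dist(a, b):
    return toy[frozenset([a, b])]


two = average_linkage(labels, dist, 2)
print(two)
```

With these invented numbers, stopping at two clusters leaves Fantasy and Sci-Fi on one side and the three historical periods on the other. Other linkage rules (single, complete, Ward) can produce different trees from the same distances, so the choice matters when reading a dendrogram.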
This initial clustering cleanly bifurcates favorite game periods into two distinct groups. Based on the periods found in each cluster, we can infer that there is a clear distinction between Historical and Non-historical gamers with respect to the favorite periods chosen. Historical gamers tend to select other historical periods, and non-historical gamers tend to stick with non-historical game periods. Of course, there will be cross-over, but in broad terms this holds. At a high level, this is a reasonable result. What if more granularity is wanted? What does the analysis suggest as the next clustering solution as focus moves farther out onto the dendrogram?
The next clustering solution as we move left crosses three branches, as shown in Figure 3. What is this three-cluster solution? The three-cluster solution keeps Non-historicals intact but bifurcates historicals into two components. I label this break-out as Pre-1700 and Post-1700. Pike & Shotte, Medieval, Dark Ages, and Ancients are in the former, and all other periods cluster into the latter. It may seem odd that Pike & Shotte finds itself grouped with Ancients/Dark Ages/Medieval, but notice that there is a distinction between Pike & Shotte and its trio of Pre-1700 compatriots. Perhaps this grouping primarily reflects combat with hand weapons: spear, pike, sword, and bow?
Many game periods remain within the Post-1700 cluster. Can this group be broken into meaningful components? For that, let's move to the five-cluster solution as shown in Figure 4.
In a five-cluster solution, Non-historicals and Pre-1700 remain unchanged. The Post-1700 group is split into three parts. The three new clusters form the Age of Muskets/Rifles, Hollywood, and Modern. Of course, these are general names I give to each group for my own identification, but more interesting (and precise) identifiers are possible. Suggestions?
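The way the two-, three-, and five-cluster solutions nest inside one another can be reproduced with a library routine. The sketch below uses SciPy's `linkage` and `cut_tree` as a stand-in. The software, distance measure (Jaccard), and linkage method (average) are all assumptions made for this example, and the six periods and eleven respondents are invented toy data in which respondents pick fewer than five periods.

```python
import numpy as np
from scipy.cluster.hierarchy import cut_tree, linkage
from scipy.spatial.distance import pdist

# Hypothetical period labels; rows of `picks` follow this order.
periods = ["Ancients", "Medieval", "Napoleonics", "ACW", "Fantasy", "Sci-Fi"]

# Rows = periods, columns = 11 invented respondents.
# A 1 means the period appears in that respondent's list of favorites.
picks = np.array([
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # Ancients
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # Medieval
    [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0],  # Napoleonics
    [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0],  # ACW
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1],  # Fantasy
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0],  # Sci-Fi
], dtype=bool)

# Build one tree from pairwise Jaccard distances between periods.
tree = linkage(pdist(picks, metric="jaccard"), method="average")

# Cut the same tree at three levels; one column of labels per level.
cuts = [2, 3, 5]
labels = cut_tree(tree, n_clusters=cuts)


def groups(column):
    """Collect period names by cluster label for one cut of the tree."""
    out = {}
    for period, label in zip(periods, column):
        out.setdefault(int(label), []).append(period)
    return sorted(out.values())


for k, column in zip(cuts, labels.T):
    print(k, groups(column))
```

Because every cut is taken from the same tree, each finer cluster sits wholly inside one coarser cluster. That is why, in the solutions above, Non-historicals and Pre-1700 carry over unchanged while only the Post-1700 branch splits further.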
The analysis could continue marching down the tree, pruning branches along the way to group game periods into even smaller groups, but I stop here for now.
What this exercise suggests is that aggregating the 10,783 survey respondents' top choices of game periods brings forth underlying and hidden patterns. These seemingly natural groupings are brought about simply by examining respondents' choices of game period. Notice, once again, the clear distinction between non-historical and historical game periods that the analysis identifies using no more than personal choice. It is fascinating that distinct period clusters emerge from within this one analysis.
If there is interest, I can continue the analysis by climbing out on a limb to investigate these ever-smaller tree branches. Hopefully, I do not prune the branch upon which I am sitting.
As always, questions and comments welcome.