The problem with mean-centered personas (i.e., those that describe average, typical users) is the general problem with the mean: if half of your users are right-handed and half are left-handed, should your persona be middle-handed?
Obviously not. Instead, you need personas for both left- and right-handed users. This is what we mean when we talk about diversity of personas: a good persona set covers different user types, not only their hypothetical amalgamation.
One solution to the problem of the mean is to focus on range rather than the mean, so that the personas also capture extreme user attributes instead of only mean values.
Conceptually, this could be done with a grid-based approach. “Grid” means that the persona generation algorithm first maps the range of each attribute and then selectively chooses values for the constructed personas so that each segment of the range (e.g., each decile or quartile) is equally represented.
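To make the "equal representation per segment" idea more concrete, here is a minimal sketch for a single numeric attribute. It assumes that "segment" means an equal-count quantile bin (quartiles by default, deciles with `n_bins=10`) and that each bin is represented by its median value. Both choices are illustrative assumptions, and the function name `quantile_representatives` is hypothetical.

```python
import statistics

def quantile_representatives(values, n_bins=4):
    """Split a numeric attribute into n_bins equal-count (quantile) bins
    and return one representative value per bin: the bin's median.

    Assumption: the median of each bin stands in for that segment;
    other choices (bin midpoint, a sampled real user) are equally valid.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < n_bins:
        raise ValueError("need at least as many values as bins")
    reps = []
    for i in range(n_bins):
        # Integer arithmetic gives contiguous, non-empty, near-equal chunks
        start = i * n // n_bins
        end = (i + 1) * n // n_bins
        reps.append(statistics.median(ordered[start:end]))
    return reps

# Usage: one age value per quartile of a (made-up) user base
ages = [18, 22, 25, 29, 34, 38, 41, 47, 52, 58, 63, 71]
print(quantile_representatives(ages))
```

With these twelve example ages, the sketch returns one age per quartile (22, 34, 47, 63), so the youngest and oldest quarters of users get a persona just like the middle ones.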
For example, imagine three attributes: age, gender, and country.
The number and composition of the generated personas need to be such that the personas cover (a) the upper extreme, (b) the midpoint, and (c) the lower extreme of each persona attribute. In other words, there are elderly personas, middle-aged personas, and younger personas. Similarly, there are male and female personas, as well as personas beyond the binary gender classification.
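A rough sketch of the grid itself might look like this. It assumes user records are plain dictionaries, treats an attribute as numeric if all its values are numbers, takes the minimum, median, and maximum as the lower extreme, midpoint, and upper extreme (the median as "midpoint" is an assumption), and keeps every observed category for non-numeric attributes. The personas are then the Cartesian product of these levels. The function names and the example data are hypothetical.

```python
import itertools
import statistics

def attribute_levels(values):
    """Return the levels the persona set should cover for one attribute."""
    if all(isinstance(v, (int, float)) for v in values):
        # Numeric: lower extreme, midpoint (assumed: median), upper extreme
        return [min(values), statistics.median(values), max(values)]
    # Categorical: every observed category, in first-seen order
    return list(dict.fromkeys(values))

def grid_personas(users, attributes):
    """Build one persona skeleton per combination of attribute levels."""
    levels = [attribute_levels([u[a] for u in users]) for a in attributes]
    return [dict(zip(attributes, combo)) for combo in itertools.product(*levels)]

# Usage with made-up user records
users = [
    {"age": 19, "gender": "female"},
    {"age": 34, "gender": "male"},
    {"age": 47, "gender": "non-binary"},
    {"age": 72, "gender": "male"},
    {"age": 40, "gender": "female"},
]
for persona in grid_personas(users, ["age", "gender"]):
    print(persona)
```

Note that the full product grows multiplicatively: three age levels times three gender categories already gives nine personas, and every extra attribute multiplies that number again. This is exactly why categorical attributes with many values need some reduction rule.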
For a categorical variable such as country, the concept of range is problematic because there are no numerically meaningful upper and lower bounds (although the mode describes the center point well). To address this, a more or less arbitrary decision rule is needed to choose representative countries without blowing up the number of personas. For example, one could group the countries in the baseline user data by continent and ensure that every continent present in the user data is represented in the generated personas.
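The continent rule could be sketched as follows. The country-to-continent table is a small hypothetical stand-in (a real system would need a complete mapping), and picking the most frequent country within each continent as its representative is one possible decision rule among many, not the only sensible one.

```python
from collections import Counter, defaultdict

# Hypothetical, deliberately incomplete lookup table for illustration only
CONTINENT = {
    "Qatar": "Asia", "India": "Asia", "Japan": "Asia",
    "Finland": "Europe", "Germany": "Europe",
    "Brazil": "South America", "Kenya": "Africa",
}

def representative_countries(countries, continent_of=CONTINENT):
    """Pick one country per continent present in the user data:
    the most frequent country of that continent (assumed rule).
    Raises KeyError for countries missing from the lookup table."""
    by_continent = defaultdict(Counter)
    for country in countries:
        by_continent[continent_of[country]][country] += 1
    # most_common(1) returns [(country, count)] for the top country
    return {cont: counts.most_common(1)[0][0]
            for cont, counts in by_continent.items()}

# Usage: the country column of a made-up user base
user_countries = ["Qatar", "India", "Qatar", "Finland", "Germany", "Germany", "Brazil"]
print(representative_countries(user_countries))
```

Here Africa gets no persona because no user in the example data is from an African country, which matches the rule of covering only the continents present in the user data. The number of country values the personas need then grows with the number of continents, not the number of countries.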
Hope this gave you some interesting ideas! If you have feedback on this post, please reach out to me at: jsalminen(at)hbku(dot)edu(dot)qa. (Always happy to talk personas 🙂)