The Basics of Data-Driven Personas
Data-driven personas can be generated from almost any data. Essentially, the generation consists of two steps: (1) pattern seeking, and (2) enrichment.
The first step, pattern seeking, means identifying regularities (i.e., patterns) in the dataset. This is typically done using clustering or dimensionality reduction techniques, such as principal component analysis or matrix factorization.
The second step, enrichment, focuses on finding statistically robust associations between the patterns identified in the first step and secondary variables, such as demographics. These variables are then shown as representative information in the finalized persona profiles.
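To make the two steps concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy `users` data, the choice of k-means as the pattern-seeking method, and the helper names `kmeans` and `enrich`. The enrichment step simply reports the most common age group per cluster; a real pipeline would test whether that association is statistically robust (for example, with a chi-squared test) before showing it in a persona profile.

```python
from collections import Counter
import math

# Hypothetical toy data: each user has content-view counts
# (columns: sports, cooking) and an age group used for enrichment.
users = [
    {"views": [9, 1], "age": "18-24"},
    {"views": [8, 2], "age": "18-24"},
    {"views": [7, 1], "age": "25-34"},
    {"views": [1, 9], "age": "35-44"},
    {"views": [2, 8], "age": "35-44"},
    {"views": [1, 7], "age": "25-34"},
]


def kmeans(points, k, iterations=10):
    """Step 1 (pattern seeking): plain k-means.

    Initial centroids are the first k points, so the result is deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def enrich(users, labels, k, field):
    """Step 2 (enrichment): most common value of a secondary variable per cluster."""
    profiles = []
    for c in range(k):
        values = Counter(u[field] for u, lab in zip(users, labels) if lab == c)
        if not values:
            continue
        top_value, count = values.most_common(1)[0]
        profiles.append(
            {"cluster": c, field: top_value, "share": count / sum(values.values())}
        )
    return profiles


labels, centroids = kmeans([u["views"] for u in users], k=2)
profiles = enrich(users, labels, k=2, field="age")
for profile in profiles:
    print(profile)
```

Each resulting profile is a skeleton for one persona: the cluster centroid describes the behavior, and the enriched field supplies the demographic detail.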
Three Data Types for Personas
The three main sources of data for persona generation are:
- survey-based data: this is data collected via a questionnaire from users or customers. According to our study, it is the most popular source of persona data in the literature.
- online and web analytics data: this is behavioral and demographic data collected from online analytics or social media platforms, typically using application programming interfaces (APIs). It is, in our opinion, the most promising data source for persona generation at the moment.
- sensor-based data: this is data collected using hardware, such as GPS devices or medical sensors (e.g., Fitbit). It is the least used form of data. However, as Internet-of-Things (IoT) and medical/wellness sensors are becoming more popular, this data source will also increase in potential.
The data from these three sources is typically in numerical format. For surveys, a Likert scale (e.g., 1-5) is often used. Online analytics platforms typically output count data (e.g., number of visits / views / clicks / purchases). Sensors typically output time-series data, often at a high sampling rate.
However, it is also possible to make use of textual data. More specifically, one can analyze social media comments for persona generation. In these efforts, natural language processing (NLP) techniques can be useful, e.g., to infer the persona’s topics of interest. In addition, researchers have quantified qualitative data (e.g., interviews) by manually coding / labeling the data and then using the counts as the input for quantitative analysis.
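As a rough illustration of quantifying coded qualitative data, the sketch below turns manually assigned codes into a count matrix. The participant IDs and code names are hypothetical, and this is only one simple way to do the counting, not a method taken from a specific study.

```python
from collections import Counter

# Hypothetical manually coded interviews: each participant maps to the
# codes an analyst assigned to segments of their transcript.
coded_interviews = {
    "P1": ["price", "price", "usability"],
    "P2": ["usability", "support"],
    "P3": ["price", "support", "support"],
}

# Fixed, sorted column order so every row of the matrix lines up.
codes = sorted({code for labels in coded_interviews.values() for code in labels})

# One row per participant, one column per code, each cell a count.
count_matrix = {}
for participant, labels in coded_interviews.items():
    counts = Counter(labels)
    count_matrix[participant] = [counts[code] for code in codes]

print(codes)
for participant, row in count_matrix.items():
    print(participant, row)
```

The rows of `count_matrix` are numerical, so they can feed the same kind of pattern-seeking analysis used for survey or analytics data.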
Want to Learn More?
I hope this article provided useful information for you regarding the three main data types for personas. If you are interested in learning more, I suggest you check out our persona research for peer-reviewed research papers.