Introduction to Data-Driven Personas

Automatic Persona Generation (APG) is a system developed by the persona research team at Qatar Computing Research Institute. APG is both a methodology and a system for the automatic creation of personas from online analytics data.

Automatic Persona Generation has specifically been developed to address the limitations of manual persona creation. This blog post details the main benefits of data-driven personas compared to manually created personas.

Overall, personas of any kind are useful for decision makers in companies and other organizations because personas give faces to customer data. Personas help team members understand the organization’s audiences, customers, and users and make customer-oriented decisions.

Main Benefits of Data-Driven Personas

Without further ado, here are the nine main benefits of data-driven personas:

  1. Giving Faces to Numbers
  2. Making Data Communicable
  3. Representing Big Data
  4. Always Up-to-Date
  5. Quick to Generate
  6. Behaviorally Accurate
  7. Providing Full-Stack Access
  8. Protecting Privacy of Individual Customers
  9. Cost-Effective to Create

1. Data-Driven Personas Give Faces to Numbers

Data-driven personas comprise a lot of numerical data on customers, but they present those numbers in an easy-to-understand format, i.e., personas. Most people don’t like working with numbers, and even those who do can’t work with a lot of numbers at one time.

However, many decisions do require using numbers, including numbers about customers. The APG system solves this problem by reducing the need for users of customer analytics to work with a lot of numbers while still providing access to the raw numbers if needed.

2. Data-Driven Personas Make Data Communicable

Personas have been shown to be valuable during design, brainstorming, and other decision making in a variety of verticals, especially in communication among team members about customers.

While it is difficult to discuss a spreadsheet of thousands of data points, it is much easier to talk about a person, since people habitually communicate at this personal level.

3. Data-Driven Personas Are Based on Representative Big Data

Traditional (manual) persona generation relies on qualitative data with a low number of observations, resulting in issues of representativeness and statistical validity. While for organizations with a low number of products or customer segments this might not pose a problem, for companies and other organizations engaged in large-scale online activities (e.g., content creation, social media, e-commerce), there is a need for representative, data-driven personas.

Consider, for example, a news organization that produces thousands of videos and has millions of views from people in over 150 countries (a real scenario) – how could qualitative methods accurately capture the variation between core audience segments?

Finding ways to acquire and efficiently analyze such data is the key aim of Automatic Persona Generation. In most cases, data-driven personas are representative because they are based on analyzing the entire online analytics dataset.

Furthermore, APG is programmed to operate highly efficiently and can readily scale to millions of customers. Research has shown that automatically generated personas can scale to millions of content interactions among thousands of content pieces.

4. Data-Driven Personas Are Always Up-to-Date

Manually created personas are static, requiring laborious data collection every time customer behavior changes. By employing data-driven personas, it is possible to create personas in real time, based on automated analysis of actual aggregated social media data, integrating data from Facebook, YouTube, and website channels of commercial organizations.

From these platforms, APG gathers demographic data and topical interests, leveraging up to hundreds of thousands of profiles and millions of user interactions, along with user insights representing interests and viewpoints. The resulting data-driven personas provide insights into competitive marketing, topical interests, and preferred system features for the users of online content and products.

Moreover, APG is responsive to changes in the underlying content consumption. By using a threshold parameter, data-driven personas can be generated automatically at given time intervals, e.g., on a monthly basis. These data-driven personas are kept up to date through an automated loop of data collection and re-computation of the personas.
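As a rough illustration of such an update loop, here is a minimal sketch in Python. It is not APG's actual scheduler, which this post does not describe. The function names, the `interval_days` parameter (standing in for the threshold parameter), and the `collect` and `generate` callbacks are all hypothetical.

```python
from datetime import date, timedelta


def is_refresh_due(last_generated: date, today: date, interval_days: int = 30) -> bool:
    """Return True once at least interval_days have passed since the last run.

    interval_days is a hypothetical stand-in for the threshold parameter.
    """
    return today - last_generated >= timedelta(days=interval_days)


def refresh_personas(last_generated, today, collect, generate, interval_days=30):
    """Re-collect data and regenerate personas if the interval has elapsed.

    collect and generate are placeholder callbacks for the data collection
    and persona computation steps. Returns (personas or None, date of the
    most recent generation).
    """
    if not is_refresh_due(last_generated, today, interval_days):
        return None, last_generated
    data = collect()
    return generate(data), today
```

Calling `refresh_personas` once a day with a 30-day interval would then regenerate the personas roughly once a month and leave them untouched on the other days.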

5. Data-Driven Personas Are Quick to Generate

While manual persona creation can take up to six months, data-driven personas can typically be generated within a matter of days. Due to the fast runtime of the core algorithm and automated data collection, APG personas can even be created in a few minutes (given that a topical taxonomy exists).

6. Data-Driven Personas Are Behaviorally Accurate

The advantage of the APG system is that it uses reliable algorithmic methods to identify fine-grained market segments and then, using this real user data, automatically generates appropriate attributes for persona descriptions.

We use non-negative matrix factorization (NMF) to infer patterns from customers’ interaction with online content, products, or any other target entity. This approach leads to more accurate personas that can then aid in better design and decision making. The data-driven personas are more accurate because they are based on actual user data, can reflect granular audiences, and can be easily updated.
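To make the NMF idea more concrete, here is a minimal sketch in Python. It is an illustration only, not APG's implementation. The group labels, topics, and view counts are invented, and the plain-Python multiplicative update rule (Lee and Seung) is chosen only to keep the example self-contained. In practice, a library implementation such as scikit-learn's `NMF` would be the more natural choice.

```python
import random

# Hypothetical aggregated data: rows are demographic groups, columns are
# content topics, and values are view counts. All numbers are invented.
GROUPS = ["F 18-24 US", "F 25-34 UK", "M 25-34 IN", "M 35-44 DE"]
TOPICS = ["sports", "politics", "cooking", "travel"]
VIEWS = [
    [90, 80, 5, 0],
    [85, 95, 0, 5],
    [5, 0, 70, 90],
    [0, 5, 80, 75],
]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, k, iterations=500, seed=0, eps=1e-9):
    """Approximate non-negative v (n x m) as w (n x k) times h (k x m)
    using multiplicative updates that reduce the squared error."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # h <- h * (w^T v) / (w^T w h), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # w <- w * (v h^T) / (w h h^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def dominant_pattern(w, h):
    """For each group, pick the latent pattern that explains most of its views."""
    strengths = [sum(row) for row in h]
    return [max(range(len(strengths)), key=lambda c: w_row[c] * strengths[c]) for w_row in w]


w, h = nmf(VIEWS, k=2)
patterns = dominant_pattern(w, h)
for group, pattern in zip(GROUPS, patterns):
    print(f"{group}: behavioral pattern {pattern}")
```

Each row of `h` describes one latent behavioral pattern as a mix of topics, and each row of `w` says how strongly a group follows each pattern. In this sketch, a persona would be built around each pattern by attaching the demographics of the groups that follow it most strongly.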

7. Data-Driven Personas Provide Full-Stack Access

APG’s technology takes numbers, algorithmically identifies similar groups of customers, and automatically generates personas, all while still providing access to the underlying data. Therefore, data-driven personas provide a full-stack analytics solution.

8. Data-Driven Personas Protect Privacy of Individual Customers

Automatically generated personas from online analytics data do not compromise the privacy of individual customers, since the data provided by the online analytics platforms (e.g., YouTube Analytics, Google Analytics, Facebook Insights) are always at an aggregated group level, rather than showing information on the individual users of those platforms. The use of aggregated data ensures that we only use non-personally identifiable information.

9. Data-Driven Personas Are Cost-Effective

Finally, data-driven personas are relatively cheap to produce, especially compared to manual persona creation projects done by consultancies and marketing agencies (these can range between 50,000-100,000 USD). The high degree of automation means there is less manual labor involved, which keeps production costs down.

In fact, our persona team’s dream is to democratize personas by giving small businesses, startups, and non-profit organizations affordable access to high-quality data-driven personas.

Because automatic personas use digital data, their creation, replication and distribution are considerably more cost-efficient than with personas created manually. When these cost savings are passed to client organizations, automatic personas can be accessed by small organizations, non-profits, and startups that often lack the means to create high-quality personas.

Conclusion on Benefits of Data-Driven Personas

Understanding online analytics numbers and relating them to key performance indicators can be challenging, especially with large volumes of data. Even though there are many online analytics tools (e.g., Google Analytics, Adobe Analytics) that one can employ, these tools require analytical sophistication that many end users do not have.

Additionally, online analytics tools often don’t reduce the complexity of numerical data for decision making and communication. Dealing with numbers poses cognitive challenges for individuals, who often cannot recall many numbers at a time, whereas a persona’s human attributes are more easily remembered.

Personas therefore seem to provide a useful alternative for presenting numerical data. The best use of personas in online analytics is to combine numbers and human attributes to create dynamic, accurate, and constantly updated data-driven persona profiles.

Using data-driven personas has considerable potential for product and content development, as well as for marketing and strategic decision making. Real organizations are using personas to increase their performance.

Although qualitative persona creation can be data-driven to some degree, it is not efficient in processing big data volumes of online and social media analytics. The APG approach is also more cost-effective than alternative methods of persona creation and can scale to millions of customers.

The bottom line: Better personas => better decisions => better results

Jung, S., An, J., Kwak, H., Ahmad, M., Nielsen, L., and Jansen, B. J. (2017). Persona Generation from Aggregated Social Media Data. ACM Conference Extended Abstracts on Human Factors in Computing Systems 2017 (CHI2017). Denver, Colorado. p. 1748-1755. 6-11 May.

Contact the Authors

Interested in automatic persona generation for your company? Contact Dr. Jim Jansen:

Want more information? See …

Jansen, B. J., Salminen, J., Jung, S.G., and Guan, K. (2021). Data-Driven Personas. Synthesis Lectures on Human-Centered Informatics, Carroll, J. (Ed.). Morgan & Claypool: San Rafael, CA, 4:1, i-317.

Read more about data-driven personas

Got too many personas? This approach can help!
