
How APG Assigns an Age to a Data-Driven Persona

The following is a post from the APG Team’s summer 2020 intern, Jaad Mohammed.
--------

In this article, we explain how we determine the age of a data-driven persona.

To give a full picture of this process, we first outline how the APG system creates a persona profile. The goal of a persona is to accurately represent an audience segment of an online platform (Facebook, Instagram, YouTube, or Google Analytics) by analyzing user interactions with the specified account holder’s content.

For this, APG needs to gather detailed information regarding the interactions of users with each of the online content pieces on the corresponding platform.

This data is available only to the owners of a particular social media channel (e.g., a YouTube channel) or social media account (e.g., a Facebook Page or Instagram account), not to the general public. With the account holder’s permission, APG can gather this data via the application programming interface (API) of these platforms.

Because the data from these online platforms is aggregated, APG must disaggregate it. To do this, we build a matrix representing users’ interactions with online content, such as videos. This matrix is then decomposed using non-negative matrix factorization (NMF), a process illustrated in Figure 1.

Figure 1 Matrix decomposition using NMF. Matrix V is decomposed into W and H. g denotes demographic groups in the dataset, c denotes product units, p is the number of latent behaviors of demographic groups over product units, and ε is the error term.

Here, V is the g × c matrix of users’ interactions with content, W is a g × p matrix, H is a p × c matrix, and ε is an error term, so that V = WH + ε. In these dimensions, g is the number of demographic groups in the dataset, c is the number of content items, and p is the number of latent behaviors of demographic groups over the content items. [1]
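To make the decomposition concrete, here is a minimal sketch of NMF using the multiplicative update rules described by Lee and Seung. It is an illustration only, not APG's actual implementation: the function name `nmf`, the iteration count, the choice of p = 2, and the interaction counts in `V` are all assumptions made up for this example.

```python
import numpy as np

def nmf(V, p, n_iter=500, seed=0, eps=1e-9):
    """Approximate a non-negative V (g x c) as W (g x p) @ H (p x c)."""
    rng = np.random.default_rng(seed)
    g, c = V.shape
    # Start from small positive random factors.
    W = rng.random((g, p)) + eps
    H = rng.random((p, c)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical interaction counts: 4 demographic groups x 5 content items.
V = np.array([
    [120.0, 80.0, 5.0, 0.0, 10.0],
    [100.0, 90.0, 0.0, 5.0, 8.0],
    [4.0, 0.0, 70.0, 95.0, 60.0],
    [0.0, 6.0, 85.0, 90.0, 75.0],
])

W, H = nmf(V, p=2)
residual = V - W @ H  # plays the role of the error term ε
```

In this toy matrix the first two groups and the last two groups consume different content, so two latent behaviors should capture most of the structure and leave a small residual.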

NMF is mainly intended for reducing the complexity of large datasets by finding hidden factors.[2]

A row in W shows how strongly each user group is associated with the different consumption patterns, also known as base personas. A column in W shows how a given base persona is associated with the different user groups. Thus, for each column, the user group with the largest coefficient can be interpreted as the “representative” user group for the corresponding persona.
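Picking the representative user group amounts to finding the largest coefficient in each column of W. The sketch below shows this with a small, made-up W; the group labels, the coefficients, and the helper name `representative_groups` are hypothetical and chosen only for illustration.

```python
# Hypothetical W: rows are user groups, columns are base personas.
group_labels = ["18-24, Female, US", "25-34, Male, US", "35-44, Female, QA"]
W = [
    [0.90, 0.05],
    [0.20, 0.70],
    [0.10, 0.30],
]

def representative_groups(W, labels):
    """For each column of W, return the label of the row with the largest coefficient."""
    n_personas = len(W[0])
    reps = []
    for j in range(n_personas):
        column = [row[j] for row in W]
        best_row = max(range(len(column)), key=column.__getitem__)
        reps.append(labels[best_row])
    return reps

reps = representative_groups(W, group_labels)
print(reps)  # ['18-24, Female, US', '25-34, Male, US']
```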

Next, we determine the demographics of the representative user groups. This depends on how one defines user groups in V. The most efficient way is to use the data broken down by demographics when building V. If a row in V maps to a group defined by age group, gender, and country, a format that social analytics tools often provide, then it is trivial to find the representative demographics of a persona.

In essence, the information about the persona’s age is already available to us at the beginning of the persona creation process, that is, when we collect data from an online platform’s API.
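As a rough sketch of why this lookup is trivial, the example below builds V from aggregated records whose rows are keyed by (age group, gender, country). The record layout, the field names, and the counts are assumptions made for this example and do not reflect any platform's actual API response.

```python
from collections import namedtuple

Group = namedtuple("Group", ["age_group", "gender", "country"])

# Hypothetical aggregated records:
# (age group, gender, country, content id, view count)
records = [
    ("18-24", "female", "US", "video_a", 120),
    ("18-24", "female", "US", "video_b", 15),
    ("25-34", "male", "QA", "video_a", 10),
    ("25-34", "male", "QA", "video_b", 95),
]

def build_matrix(records):
    """Build V with one row per demographic group and one column per content item."""
    groups = sorted({Group(*r[:3]) for r in records})
    contents = sorted({r[3] for r in records})
    row_of = {g: i for i, g in enumerate(groups)}
    col_of = {c: j for j, c in enumerate(contents)}
    V = [[0] * len(contents) for _ in groups]
    for age, gender, country, content, count in records:
        V[row_of[Group(age, gender, country)]][col_of[content]] += count
    return V, groups, contents

V, groups, contents = build_matrix(records)

# If row 1 turns out to be a persona's representative group,
# its age group can be read directly from the row's key.
persona_row = 1
persona_age = groups[persona_row].age_group
print(persona_age)  # 25-34
```

Because each row index maps back to a demographic key, no extra inference is needed to attach an age group to a persona once its representative row is known.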

References

  [1] Jung, S., An, J., Kwak, H., Ahmad, M., Nielsen, L., and Jansen, B. J. (2017) Persona Generation from Aggregated Social Media Data. ACM Conference Extended Abstracts on Human Factors in Computing Systems 2017 (CHI2017). Denver, Colorado. p. 1748-1755. 6-11 May.
  [2] Lee, D. D. and Seung, H. S. (1999) Learning the Parts of Objects by Non-Negative Matrix Factorization. Nature, vol. 401, p. 788-791.

By Jim Jansen

Dr. Jansen is a Principal Scientist in the social computing group of the Qatar Computing Research Institute, a professor with the College of Science and Engineering, Hamad bin Khalifa University, and an adjunct professor with the College of Information Sciences and Technology at The Pennsylvania State University. He is a graduate of West Point and has a Ph.D. in computer science from Texas A&M University, along with master’s degrees from Texas A&M (computer science) and Troy State (international relations). Dr. Jansen served in the U.S. Army as an enlisted Infantry soldier and a commissioned communications officer.
