The following is a post from the APG Team’s summer 2020 intern, Jaad Mohammed.
In this article, we explain how we determine the age of a data-driven persona.
To make this process clear, we first give the gist of how the APG system creates a persona profile. The final goal of a profile is to accurately represent an audience segment of an online platform (Facebook, Instagram, YouTube, or Google Analytics) by analyzing user interactions with the specified account holder’s content.
For this, APG needs to gather detailed information regarding the interactions of users with each of the online content pieces on the corresponding platform.
This data is available only to the owners of a particular social media channel (e.g., a YouTube channel) or social media account (e.g., a Facebook Page or Instagram account) and is not available to the general public. Given the account holder’s permission, APG can gather this data via the Application Programming Interface (API) of these platforms.
Because the data from these online platforms are aggregated, APG must disaggregate them. To do this, we build a matrix representing users’ interactions with online content, such as videos. This matrix is then decomposed using non-negative matrix factorization (NMF), a process that looks something like what is shown in Figure 1.
In the factorization V = WH + ε, V is the g × c matrix of user groups’ interactions with content, W is a g × p matrix, H is a p × c matrix, and ε is an error term. Here, g is the number of demographic groups in the dataset, c is the number of content pieces, and p is the number of latent behaviors of demographic groups over multiple contents.
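As a rough illustration of how such a matrix V could be assembled, here is a minimal Python sketch. The record format (user group, content ID, view count), the group labels, and the function name build_interaction_matrix are all assumptions made for this example; they are not APG’s actual data schema or code.

```python
import numpy as np

# Hypothetical aggregated records: (user group, content id, view count).
# Real platform exports will look different; this only shows the shape of V.
records = [
    ("18-24|female|US", "video_a", 120),
    ("18-24|female|US", "video_b", 30),
    ("25-34|male|QA", "video_a", 10),
    ("25-34|male|QA", "video_c", 200),
    ("35-44|female|FI", "video_b", 90),
]

def build_interaction_matrix(records):
    """Return (V, groups, contents), where V[i, j] is the interaction
    count of user group i with content j."""
    groups = sorted({g for g, _, _ in records})
    contents = sorted({c for _, c, _ in records})
    g_idx = {g: i for i, g in enumerate(groups)}
    c_idx = {c: j for j, c in enumerate(contents)}
    V = np.zeros((len(groups), len(contents)))
    for g, c, n in records:
        V[g_idx[g], c_idx[c]] += n  # sum counts for repeated pairs
    return V, groups, contents

V, groups, contents = build_interaction_matrix(records)
```

Each row of V corresponds to one user group and each column to one content piece, so V has g rows and c columns.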
NMF is mainly intended for reducing the complexity of large datasets by finding hidden factors.
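To make the factorization more concrete, the sketch below implements NMF with the multiplicative update rules popularized by Lee and Seung, using only NumPy. It is an illustration under stated assumptions, not APG’s implementation: the function name nmf, the toy matrix, the number of iterations, and the choice p = 2 are all made up for this example.

```python
import numpy as np

def nmf(V, p, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative g x c matrix V as W @ H, with W (g x p)
    and H (p x c) non-negative, via multiplicative updates that reduce
    the squared reconstruction error."""
    rng = np.random.default_rng(seed)
    g, c = V.shape
    W = rng.random((g, p)) + eps
    H = rng.random((p, c)) + eps
    for _ in range(n_iter):
        # eps keeps the denominators away from zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy interaction matrix: 3 user groups (rows) x 3 content pieces (columns).
V = np.array([
    [120.0, 30.0, 0.0],
    [10.0, 0.0, 200.0],
    [0.0, 90.0, 5.0],
])
W, H = nmf(V, p=2)
residual = V - W @ H  # plays the role of the error term
```

Because every update multiplies by a non-negative ratio, W and H stay non-negative throughout, which is what lets their entries be read as strengths of association.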
Each row of W describes how a user group can be characterized by the different consumption patterns, also known as base personas. Each column of W shows how a given base persona is associated with the different user groups. Thus, for each column, the user group with the largest coefficient can be interpreted as the “representative” user group for the corresponding persona.
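Picking the representative user group for each base persona amounts to taking the row with the largest value in each column of W. The short sketch below shows this with NumPy; the values in W, the group labels, and the function name representative_groups are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical labels for the rows of W (one per user group).
groups = ["18-24|female|US", "25-34|male|QA", "35-44|female|FI"]

# Hypothetical W: 3 user groups (rows) x 2 base personas (columns).
W = np.array([
    [0.9, 0.1],
    [0.2, 0.7],
    [0.4, 0.3],
])

def representative_groups(W, groups):
    """For each column (base persona) of W, return the label of the
    user group with the largest coefficient."""
    top_rows = np.argmax(W, axis=0)  # row index of the maximum per column
    return [groups[i] for i in top_rows]

reps = representative_groups(W, groups)
```

With these made-up numbers, the first persona is represented by the first group and the second persona by the second group.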
Next, we determine the demographics of the representative user groups. This depends on how one defines user groups in V. The most efficient way is to use the data broken down by demographics when building V. If each row in V maps to a group defined by age group, gender, and country, a format that social analytics tools often provide, then it is trivial to find the representative demographics of a persona.
In other words, information about the persona’s age is already available to us at the very beginning of the persona creation process, i.e., when we collect data from an online platform’s API!
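If the row labels of V encode the demographic breakdown, reading off a persona’s age group is a simple lookup on the label of its representative user group. The sketch below assumes a made-up label format of age group, gender, and country separated by a vertical bar; the names Demographics and parse_group_label are hypothetical, and real platform data will use its own format.

```python
from typing import NamedTuple

class Demographics(NamedTuple):
    age_group: str
    gender: str
    country: str

def parse_group_label(label, sep="|"):
    """Split a label such as '25-34|male|QA' into its three parts.
    The label format is an assumption made for this example."""
    age_group, gender, country = label.split(sep)
    return Demographics(age_group, gender, country)

# Label of the representative user group for some persona.
persona = parse_group_label("25-34|male|QA")
persona_age_group = persona.age_group
```

The age here is an age group (a range) rather than a single number, because that is the level of detail such a breakdown carries.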
- Jung, S., An, J., Kwak, H., Ahmad, M., Nielsen, L., and Jansen, B. J. (2017) Persona Generation from Aggregated Social Media Data. ACM Conference Extended Abstracts on Human Factors in Computing Systems 2017 (CHI2017). Denver, Colorado. p. 1748-1755. 6-11 May.
- Lee, D. D., and Seung, H. S. (1999) Learning the Parts of Objects by Non-Negative Matrix Factorization. Nature, vol. 401, p. 788-791.