Mobile Games' User Behavior (User Retention): A/B Testing


Resources

First of all, thanks to Aurelia Sui for the dataset they shared on Kaggle. The dataset is about an A/B test with a mobile game, Cookie Cats.

1. Project background

This original data is based on a project from Datacamp.

Cookie Cats is a hugely popular mobile puzzle game developed by Tactile Entertainment. It's a classic "connect three"-style puzzle game where the player must connect tiles of the same color to clear the board and win the level. It also features singing cats.

As players progress through the levels of the game, they will occasionally encounter gates that force them to wait a non-trivial amount of time or make an in-app purchase to progress. In addition to driving in-app purchases, these gates serve the important purpose of giving players an enforced break from playing the game, hopefully resulting in the player's enjoyment of the game being increased and prolonged.

1.2 AB Testing Process

Here are the main steps of the A/B testing process in this project:

  1. Understand the business problem & data
  2. Detect and resolve problems in the data (missing values, outliers, unexpected values)
  3. Look at summary stats and plots
  4. Approach 1: Apply hypothesis testing and check assumptions
    • Check normality & homogeneity
    • Apply tests (Shapiro, Levene test, t-test, Welch test, Mann-Whitney U test)
  5. Approach 2: Apply a computational test (bootstrap)
  6. Evaluate the results
  7. Make inferences
  8. Recommend a business decision to your customer/director/CEO, etc.

2. About the data

2.1 Data description

(Data description from Aurelia Sui). The data is from 90,189 players that installed the game while the AB-test was running.

The variables are:

  • userid: a unique number that identifies each player.
  • version: whether the player was put in the control group (gate_30, a gate at level 30) or the test group (gate_40, a gate at level 40).
  • sum_gamerounds: the number of game rounds played by the player after install.
  • retention_1: did the player come back and play 1 day after installing?
  • retention_7: did the player come back and play 7 days after installing?

When a player installed the game, he or she was randomly assigned to either gate_30 or gate_40.
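As a quick sanity check on the randomisation (a minimal sketch, run after the data is loaded in section 3.2 below; the assignment is assumed to be stored in the version column):

```python
# Players assigned to each gate group; the two counts should be roughly equal
print(df["version"].value_counts())
```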

3. Analyzing Player Behavior

3.1 Import packages for the project
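A minimal sketch of the imports this analysis relies on (assuming pandas, NumPy, Plotly Express, and SciPy are the libraries behind the plots and tests below):

```python
# Core data handling
import pandas as pd
import numpy as np

# Interactive plots (Plotly Express is used for the histograms and faceting below)
import plotly.express as px

# Statistical tests: Shapiro, Levene, t-test/Welch, Mann-Whitney U, normal CDF
from scipy import stats
```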

3.2 Read and check the data
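A sketch of the loading step, assuming the Kaggle file is named cookie_cats.csv (the exact path may differ):

```python
# Load the A/B test data (file name assumed from the Kaggle dataset)
df = pd.read_csv("cookie_cats.csv")

# Quick structural checks: shape, types, missing values, duplicated players
print(df.shape)
print(df.dtypes)
print(df.isna().sum())
print(df["userid"].duplicated().sum())
df.head()
```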

3.3 Check player behavior data statistics

Note: for subplots, Plotly Express does not work with the plotly.subplots module; instead, it supports faceting by a given data dimension via its 'facet_col' and 'facet_row' parameters.
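For example, a histogram of game rounds split into one panel per A/B group can be drawn directly with Plotly Express (a sketch, assuming the columns are named sum_gamerounds and version):

```python
# One histogram panel per group (gate_30 / gate_40) using facet_col
fig = px.histogram(
    df,
    x="sum_gamerounds",
    facet_col="version",  # faceting replaces plotly.subplots here
    nbins=100,
)
fig.show()
```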

3.4 Check the outlier

3.5 Remove the outlier
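A sketch of the outlier check and removal, assuming the single extreme value sits in sum_gamerounds:

```python
# Inspect the largest game-round counts to spot the extreme record
print(df["sum_gamerounds"].describe())
print(df.sort_values("sum_gamerounds", ascending=False).head())

# Drop the single maximum value and keep everything else
df = df[df["sum_gamerounds"] < df["sum_gamerounds"].max()]
```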

3.6 Check data after removing the outlier

3.7 Further details on player behavior

3.7.1 Users installed the game, but 3,994 players never played it.

  1. Players just installed the game, but were too busy to play or forgot about it afterwards.
  2. Players briefly opened or played the game, but didn't like it and didn't proceed any further.
  3. Players were distracted and played other games instead.
  4. Players were not really interested in games, and decided to spend time on social media, watching videos, etc.

3.7.2 In both A/B groups, many players didn't even reach game round 30 or 40, where the gate was set.

While testing whether the gate makes a difference in the user retention rate, note that there were users who didn't reach the gated game rounds at all.

Plot the cumulative histogram of the number of players along game rounds:
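A sketch of that plot with Plotly Express (assuming the cleaned data frame df and the sum_gamerounds column):

```python
# Cumulative count of players along game rounds; hovering shows how many
# players stopped at or below a given round
fig = px.histogram(
    df,
    x="sum_gamerounds",
    cumulative=True,
    nbins=500,
)
fig.show()
```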

  1. 70% of players (63.301K when hovering over the plot) didn't reach game round 40.
  2. About 64% of players (57.562K) didn't reach round 30.
  3. Only 32.624K users played beyond game round 30, and 26.885K users played beyond round 40.
  4. Note that for players who didn't reach the gate (below round 30 or 40), the data can't capture/measure the user experience of the gate setting.
  5. Most users played the game only at a very early stage and didn't play any further: the data shows 50% of users only reached round 16. More research needs to be conducted on user churn and the reasons behind it.
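These figures can be checked directly from the data (a sketch using the cleaned data frame df):

```python
# Share of players who never reached the gated rounds, and the median round reached
print((df["sum_gamerounds"] < 40).mean())  # ~70% never reached round 40
print((df["sum_gamerounds"] < 30).mean())  # ~64% never reached round 30
print(df["sum_gamerounds"].median())       # ~round 16
```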

4. Comparing 1-day retention

4.1 Basics on retention data

  1. retention_1: did the player come back and play 1 day after installing?
  2. retention_7: did the player come back and play 7 days after installing?

The retention rates of players who came back 1 day and 7 days after installing the app are 44.5% and 18.6%, respectively.
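These overall rates can be reproduced with a simple mean over the boolean retention flags (a sketch assuming the columns retention_1 and retention_7):

```python
# Overall share of players returning 1 day and 7 days after install
print(df[["retention_1", "retention_7"]].mean())

# The same rates split by A/B group
print(df.groupby("version")[["retention_1", "retention_7"]].mean())
```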

4.2 AB testing

1-day retention is a common metric for measuring how fun and engaging a game is.

4.2.1 Hypothesis

Null hypothesis: the retention rate of group A is equal to that of group B.

$$H_{0}: (\text{Retention}_{1})_{a} - (\text{Retention}_{1})_{b} = 0$$

Alternative hypothesis: the retention rate of group A is statistically significantly larger than that of group B.

$$H_{1}: (\text{Retention}_{1})_{a} - (\text{Retention}_{1})_{b} > 0$$

Note:

We can only test whether the H0 hypothesis is true; the H1 hypothesis cannot be tested directly. If we reject H0, then we accept H1.

4.2.2 Parametric model-based A/B testing

Unpaired (independent) t-test

Note:

a. An unpaired test is a much harder criterion than a paired test. You can't track individual samples; you have to compare the two groups as wholes and make very strong statements about whether or not the distributions are truly different.

b. A paired test is much more powerful, because it can look at how each individual sample in the original distribution shifts to the new one.

  1. Calculate the difference between the two sample means, $(\bar P_{1} - \bar P_{2})$
  2. Under the $H_{0}$ hypothesis, $$ \mu_{1} - \mu_{2}=0 $$
  3. Calculate the pooled standard deviation $$S_{pool} =\sqrt{ \frac{(n_{1}-1)S_{1}^2+(n_{2}-1)S_{2}^2}{n_{1}+n_{2} - 2}}$$
  4. Calculate the standard error of the difference between the means $$SE(\hat P_{1}- \hat P_{2})=S_{pool}\sqrt{\frac{1}{n_{1}}+\frac{1}{n_{2}}}$$
  5. Calculate the T value: $$T=\frac{(\bar P_{1} - \bar P_{2}) - 0}{SE(\hat P_{1}- \hat P_{2})}$$ which follows a t-distribution with $(n_{1}+n_{2}-2)$ degrees of freedom
  6. Note: for the unpaired t-test to be valid, the samples should be roughly normally distributed and should have approximately equal variances. If the variances are clearly unequal, we must use (Welch's correction): $$SE(\hat P_{1} - \hat P_{2})=\sqrt{\frac{S_{1}^2}{n_{1}}+\frac{S_{2}^2}{n_{2}}}$$
  7. If $n_{1},n_{2}$ are reasonably large, the statistic approximately follows a standard normal distribution: $$\frac{(\bar P_{1}- \bar P_{2})}{SE(\hat P_{1} - \hat P_{2})} \sim N(0,1)$$
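A minimal sketch of how the assumption checks and tests listed in the process (Shapiro, Levene, t-test/Welch, Mann-Whitney U) might be run with SciPy, assuming df holds the cleaned data with columns version and retention_1:

```python
# Split 1-day retention (0/1) by A/B group
a = df.loc[df["version"] == "gate_30", "retention_1"].astype(int)
b = df.loc[df["version"] == "gate_40", "retention_1"].astype(int)

# Normality per group (Shapiro-Wilk) and homogeneity of variances (Levene)
print(stats.shapiro(a.sample(5000, random_state=42)))  # Shapiro is unreliable for very large n
print(stats.shapiro(b.sample(5000, random_state=42)))
print(stats.levene(a, b))

# Parametric: Student's t-test (equal_var=True) or Welch's test (equal_var=False)
print(stats.ttest_ind(a, b, equal_var=False))

# Non-parametric alternative if the normality assumption fails
print(stats.mannwhitneyu(a, b, alternative="two-sided"))
```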
For the retention rates treated as proportions, the corresponding two-proportion z-test can be set up as follows:

  1. Hypothesis:

$H_{0}: P_{1}-P_{2}=0$

$H_{1}: P_{1}-P_{2}>0$

  2. Set the significance level $\alpha = 0.05$

  3. Calculate the variance of the sample statistic $(\hat{P_{1}}-\hat{P_{2}})$.

Note: in order to estimate the variance analytically, we assume the probability metrics follow a binomial distribution.

Variance of group A: $$\sigma^{2}_\hat{{P}_{1}} = \frac{\hat{{P}_1}(1-\hat{{P}_1})}{n_{1}}$$

Variance of group B: $$\sigma^{2}_\hat{{P}_{2}}= \frac{\hat{{P}_2}(1-\hat{{P}_2})}{n_{2}}$$

$$\sigma^{2}_{\hat{P}_{2}-\hat{P}_{1}} =\sigma^{2}_\hat{{P}_{1}}+\sigma^{2}_\hat{{P}_{2}} =\frac{\hat{{P}_1}(1-\hat{{P}_1})}{n_{1}}+\frac{\hat{{P}_2}(1-\hat{{P}_2})}{n_{2}}$$

Under the $H_{0}$ hypothesis, assume groups A and B follow binomial distributions with the same mean $\mu$ and variance $\sigma^{2}$.

$$\mu= P_{pool} = \frac{n_{1}\hat{P_{1}} + n_{2}\hat{P_{2}}}{n_{1}+n_{2}}$$

$$\sigma^{2}_{\hat{P}_{2}-\hat{P}_{1}} =\sigma^{2}_\hat{{P}_{1}}+\sigma^{2}_\hat{{P}_{2}} =P_{pool}(1-P_{pool})\left(\frac{1}{n_{1}}+\frac{1}{n_{2}}\right)$$
  4. For the statistic $(P_{1}-P_{2})$, if $n_{1},n_{2}$ are reasonably large, given the condition:
    • np>5, and n(1-p)>5
    • or np>10, and n(1-p)>10

we can approximate the binomial distribution by a normal distribution, with a mean of $(\bar p_{1} - \bar p_{2})$ and a standard deviation of $$\sqrt{\sigma^{2}_\hat{{P}_{1}}+\sigma^{2}_\hat{{P}_{2}}}=\sqrt{P_{pool}(1-P_{pool})\left(\frac{1}{n_{1}}+\frac{1}{n_{2}}\right)}$$

This normal approximation can be written as $$\frac{(\bar P_{1}- \bar P_{2})}{SD(P_{1} - P_{2})} \sim N(0,1)$$ with $$Z=\frac{(\bar P_{1}- \bar P_{2})-(\mu_{1}- \mu_{2})}{\sigma}=\frac{(\bar P_{1}- \bar P_{2})-0}{\sqrt{P_{pool}(1-P_{pool})\left(\frac{1}{n_{1}}+\frac{1}{n_{2}}\right)}}$$

  5. Calculate the critical Z value and compare it with the observed Z statistic:
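A sketch of that pooled Z statistic computed directly from the formulas above (reusing the group splits a and b from the t-test sketch):

```python
# Observed 1-day retention rates and group sizes
p1, p2 = a.mean(), b.mean()
n1, n2 = len(a), len(b)

# Pooled proportion and standard error under H0
p_pool = (n1 * p1 + n2 * p2) / (n1 + n2)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# Z statistic and one-sided p-value for H1: p1 - p2 > 0
z = (p1 - p2) / se
p_value = 1 - stats.norm.cdf(z)
print(z, p_value)
```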

4.2.3 Computational A/B testing (bootstrapping)
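A minimal bootstrap sketch for the difference in 1-day retention between the two groups (resampling players with replacement; the number of iterations is an arbitrary choice):

```python
# Bootstrap the difference in 1-day retention rate between gate_30 and gate_40
rng = np.random.default_rng(42)
n_iter = 5000
diffs = []
for _ in range(n_iter):
    # Resample each group with replacement and record the difference in means
    boot_a = rng.choice(a.to_numpy(), size=len(a), replace=True)
    boot_b = rng.choice(b.to_numpy(), size=len(b), replace=True)
    diffs.append(boot_a.mean() - boot_b.mean())
diffs = np.array(diffs)

# 95% bootstrap confidence interval for the difference; if it covers 0,
# there is no evidence of a retention difference between the two gates
print(np.percentile(diffs, [2.5, 97.5]))
```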

5. Conclusion

Come back to the business problem!

In this project, an A/B test is conducted to analyze players' behavior when the gate is moved from level 30 to level 40; in particular, the 1-day retention rate is used as the evaluation metric.

Firstly, we performed data sanitisation. There was no missing data, but one outlier was removed from the dataset. We then conducted exploratory analysis, checking the data summary table and plots to understand the data structure.

Before A/B testing, insights on player behavior were generated regarding game rounds. I pointed out the risks and drawbacks of A/B sampling in this test, since 64% of players didn't reach level 30 and 70% of players didn't reach level 40. Including such players in the A/B test might not correctly indicate the impact of the gate, as they never reached the gate at all.

Then, using the 1-day retention rate as the evaluation metric, both a theoretical parameter-based solution and a computation-based solution (bootstrap) were applied. The computation-based solution is recommended: it is easy to implement, can be applied to any dataset, and has no prerequisites on the data distribution.

Neither solution can reject the null hypothesis: no statistically significant difference in the 1-day retention rate was detected between the two groups when the first gate was moved from level 30 to level 40.