Sunday, July 15, 2018

Creating an A/B Test Sample Size Calculator from Evan Miller's Simple Sequential A/B Testing Blog

To learn more about sample size selection for A/B testing, I am attempting to recreate the sample size calculator from Evan Miller's popular blog post (https://www.evanmiller.org/sequential-ab-testing.html). However, there seems to be an error that prevents me from reproducing the sample sizes given in the article. What would you suggest I re-examine to find a solution to my problem?

The error must be in my calculation or my reading of the problem: solving the constraint equations yields an implausibly small sample size of ~6. The article states that the test-statistic cutoff, which indicates whether the treatment has a higher conversion rate than the control, is a function of the sample size. It then lists two inequalities to solve for one variable, N, the sample size. How do I reproduce the sample size calculations?

Most of these problems would go away if the test-statistic cutoff, d_star, weren't a function of N.

d_star = z*sqrt(N), where N is the sample size and z is the normal z-value

However, in the tables in the last half of the article, d_star = z*sqrt(N) relates N and d_star very precisely, which suggests that d_star does vary with N.
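As a quick numerical check of that relation (the rounding choices, ceiling for d_star and floor for N, are my assumption, not something the article specifies):

```python
import math

z = 1.96  # two-sided z-value for alpha = 0.05
N = 2922  # sample size the article reports for a 10% relative lift

# Forward direction: cutoff implied by the sample size
d_star = math.ceil(z * math.sqrt(N))   # 106

# Inverse direction: sample size implied by that cutoff
N_back = math.floor(d_star**2 / z**2)  # 2924, not exactly 2922

print(d_star, N_back)
```

Note that the round trip is not exact because of the rounding, which may matter when trying to match the article's tables entry for entry.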

Constraints, given alpha and beta (writing the success probability p as in the code below):

Sum R(p = (1+delta)/(2+delta)) > 1 - beta
Sum R(p = 1/2) < alpha
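For concreteness, here is a single term of those sums as I am computing it, a minimal sketch assuming the summand has the form (d*/n) * P[Binomial(n, p) = (n + d*)/2] used in the code below; the name R_term is my own:

```python
from scipy.stats import binom


def R_term(n, d_star, p):
    # One term of the constraint sums: (d*/n) * P[Binomial(n, p) = (n + d*)/2].
    # n and d_star must have the same parity so (n + d_star) / 2 is an integer.
    assert (n + d_star) % 2 == 0, "n and d_star must have the same parity"
    k = (n + d_star) // 2
    return (d_star / n) * binom.pmf(k, n, p)


# Example: first term of the alpha-constraint sum (n = d_star = 106, p = 1/2),
# which reduces to P[all 106 trials succeed] = 0.5**106
print(R_term(106, 106, 0.5))
```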

I'll append my Python code and a plot for each constraint equation.

#### Begin Python Code to Calculate Sample Size ####
import math
import sys

import scipy.stats

print(sys.version)


# Functions and helper functions to
# calculate the sample size.
def calcStatPowerSum(N, script_delta, alpha):
    """Left-hand side of the 1 - beta (power) constraint."""
    z = scipy.stats.norm.isf(alpha / 2.0)  # 1.96 for alpha = 0.05
    d_star = int(math.ceil(z * math.sqrt(N)))
    statPowerSum = 0.0
    for i in range(1, N + 1):
        # p and the (1 - p) terms are reversed relative to the binomial distribution
        statPowerSum += (d_star / i) * scipy.stats.binom.pmf((i + d_star) // 2, i, 1.0 - 1.0 / (2 + script_delta))
    return statPowerSum


def calcCritValueSum(N, script_delta, alpha):
    """Left-hand side of the alpha (false-positive) constraint."""
    z = scipy.stats.norm.isf(alpha / 2.0)  # 1.96 for alpha = 0.05
    d_star = int(math.ceil(z * math.sqrt(N)))
    critValueSum = 0.0
    for i in range(d_star, N + 1, 2):
        critValueSum += (d_star / i) * scipy.stats.binom.pmf((i + d_star) // 2, i, 0.5)
    return critValueSum


def determineSampleSize(alpha, beta, script_delta):
    z = scipy.stats.norm.isf(alpha / 2.0)  # 1.96 for alpha = 0.05
    d = 1
    N = int(math.ceil(z * z)) - 1
    statPowerSum = 0.0
    critValueSum = 1.0
    # Grow the cutoff d (and with it N) until both constraints are satisfied.
    while (statPowerSum <= 1 - beta or critValueSum >= alpha) and N < 3000:
        d += 1
        N = int(math.floor(d * d / z / z))
        statPowerSum = calcStatPowerSum(N, script_delta, alpha)
        critValueSum = calcCritValueSum(N, script_delta, alpha)
    return N


alpha = 0.05
beta = 0.8
lift = script_delta = 0.10
sampleSize = determineSampleSize(alpha, beta, script_delta)
print("beta:", beta, ", alpha:", alpha, ", sampleSize:", sampleSize)

##  The article suggests that N=2922 satisfies the constraint equations,
##  but the calculation suggests otherwise.
print("calcCritValueSum:", calcCritValueSum(2922, script_delta, alpha))
print("statPowerSum:    ", calcStatPowerSum(2922, script_delta, alpha))

#### End Python Code ####

The following Mathematica code reproduces the constraint plots as a function of d_star. Copying and pasting the code into https://sandbox.open.wolframcloud.com/ will generate these plots. Generating more than one plot at a time will exceed the free usage limit.

(*   alpha constraint Plot   *)
z = 1.96;    (* alpha = 0.05 *)
Table[Sum[(d/n)*PDF[BinomialDistribution[Floor[d^2/z^2], 1/2], (d+n)/2], {n, d, Floor[d^2/z^2], 2}], {d, 1, 130}]
ListLinePlot[%, PlotRange -> All, AxesLabel -> {"d_star", "prob"}, PlotLabel -> "alpha Constraint Plot"]


(*   1-beta constraint Plot   *)
delta = 0.10;
z = 1.96;    (* alpha = 0.05 *)
Table[Sum[(d/n)*PDF[BinomialDistribution[Floor[d^2/z^2], (1+delta)/(2+delta)], (d+n)/2], {n, d, Floor[d^2/z^2], 2}], {d, 1, 130}]
ListLinePlot[%, PlotRange -> All, AxesLabel -> {"d_star", "prob"}, PlotLabel -> "1 - beta Constraint Plot"]
