Interview question

Data Scientist, Analytics interview at Meta

Let's say the population on Facebook clicks ads with a click-through rate of $P$. We select a sample of size $N$ and examine the sample's click-through rate, denoted by $\hat{P}$. What is the minimum sample size $N$ such that $\Pr(|\hat{P} - P| < \delta) = 0.95$? In other words (this is my translation), find the minimum sample size $N$ such that our sample estimate $\hat{P}$ is within $\delta$ of the true click-through rate $P$ with 95% confidence.

Interview answers

6 answers


Interpret the question this way: we want to choose N such that P_hat lies in [P - delta, P + delta] with probability 95%.

First, note that P_hat is the average of N Bernoulli trials with a common parameter P (by assumption), which is what we are trying to estimate. By the central limit theorem, for large N we can treat P_hat as approximately normal with mean equal to the true rate P and variance P(1 - P) / N.

Now, when does a normally distributed random variable fall within delta of its mean with 95% probability? That depends on how big delta is. About 95% of a normal distribution lies within 1.96 (roughly 2) standard deviations of its mean. So we want [P - delta, P + delta] = [P - 2*SE(P_hat), P + 2*SE(P_hat)], that is, delta = 2*SE(P_hat).

So what is the SE ("standard error") of P_hat? It is the square root of its variance, Sqrt(P * (1 - P) / N), usually estimated from the sample as Sqrt(P_hat * (1 - P_hat) / N). But wait! We haven't run the experiment yet, so we don't know P_hat. We can either (a) make an educated guess, or (b) take the worst case and use it to upper-bound N. Let's go with option (b): P * (1 - P) is maximized at P = 0.5, where the product is 0.25.

Putting it all together: delta = 2 * Sqrt(0.25) / Sqrt(N) = 2 * 0.5 / Sqrt(N) = 1 / Sqrt(N), so N = (1 / delta)^2. As long as N is at least (1 / delta)^2, P_hat will fall within the acceptable range about 95% of the time, whatever the true P. (With 1.96 instead of 2, the bound is 1.96^2 * 0.25 / delta^2, about 0.96 / delta^2.)

Will
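A minimal Python sketch of the sizing rule in the answer above, assuming the normal approximation holds. `min_sample_size` and `simulated_coverage` are hypothetical helper names chosen for illustration; passing `z=2` reproduces the answer's (1 / delta)^2, while the default takes z from `statistics.NormalDist` (about 1.96). The simulation is only a rough sanity check, with an arbitrary seed and trial count.

```python
import math
import random
from statistics import NormalDist


def min_sample_size(delta, confidence=0.95, p=0.5, z=None):
    """Smallest N with z * sqrt(p * (1 - p) / N) <= delta (normal approximation).

    p=0.5 is the worst case, since p * (1 - p) <= 0.25; pass a prior guess for p
    to get a smaller N. If z is None it is taken from the standard normal
    (about 1.96 for 95% two-sided confidence).
    """
    if z is None:
        z = NormalDist().inv_cdf(0.5 + confidence / 2)
    raw = z ** 2 * p * (1 - p) / delta ** 2
    # Small tolerance so float noise just above an integer
    # (e.g. 400.00000000000006) does not round up to 401.
    return math.ceil(raw - 1e-9)


def simulated_coverage(n, p, delta, trials=2000, seed=0):
    """Fraction of simulated samples of size n whose rate lands within delta of p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        clicks = sum(rng.random() < p for _ in range(n))  # one Bernoulli(p) per ad view
        if abs(clicks / n - p) < delta:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(min_sample_size(0.05, z=2))  # 400, the (1 / delta)^2 rule
    print(min_sample_size(0.05))       # 385, using z of about 1.96
    # Should come out close to 0.95; binomial discreteness shifts it slightly.
    print(simulated_coverage(385, p=0.5, delta=0.05))
```

Passing a smaller prior guess for `p` (say, a historical click-through rate) shrinks N, which is option (a) from the answer.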


Why is the variance P(1 - P) / N? Isn't it NP(1 - P), since it's a binomial distribution (a sum of Bernoulli trials)?

Katharina
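The two formulas in Katharina's question describe different quantities: NP(1 - P) is the variance of the click count X (the binomial sum), while P_hat = X / N, and Var(X / N) = Var(X) / N^2 = P(1 - P) / N. A small simulation can show both side by side; the helper name and the values N = 50, P = 0.3 are arbitrary and purely illustrative.

```python
import random
from statistics import pvariance


def variance_of_count_and_rate(n, p, trials=10000, seed=1):
    """Empirical variance of the click count X and of the rate X / n."""
    rng = random.Random(seed)
    # Each trial: n Bernoulli(p) clicks summed into a binomial count.
    counts = [sum(rng.random() < p for _ in range(n)) for _ in range(trials)]
    rates = [c / n for c in counts]
    return pvariance(counts), pvariance(rates)


if __name__ == "__main__":
    n, p = 50, 0.3
    var_count, var_rate = variance_of_count_and_rate(n, p)
    print(var_count, n * p * (1 - p))   # both near 10.5
    print(var_rate, p * (1 - p) / n)    # both near 0.0042
```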


Use Chebyshev's inequality

Anonymous
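A sketch of the Chebyshev route, which needs no normal approximation: Chebyshev's inequality bounds the tail probability by the variance, and P(1 - P) <= 1/4 covers the worst case as in the first answer.

$$
\Pr\left(|\hat{P} - P| \ge \delta\right)
\le \frac{\operatorname{Var}(\hat{P})}{\delta^2}
= \frac{P(1-P)}{N\delta^2}
\le \frac{1}{4N\delta^2},
\qquad
\frac{1}{4N\delta^2} \le 0.05 \iff N \ge \frac{5}{\delta^2}.
$$

This guarantee holds for every N without appealing to the central limit theorem, but it is conservative: it asks for five times the normal-approximation answer of (1 / delta)^2.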


The rate has a Poisson distribution, not a Bernoulli one. The mean equals the variance, so SE = sqrt(P/N).

Anonymous
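For small rates such as a click-through rate of a few percent (an illustrative assumption), the Poisson standard error sqrt(P/N) from the comment above and the binomial one sqrt(P(1 - P)/N) are nearly equal, because their ratio is 1 / sqrt(1 - P). A quick comparison sketch; the function names and the P values are hypothetical, chosen only to show where the two diverge.

```python
import math


def se_binomial(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def se_poisson(p, n):
    """Standard error under the Poisson model from the comment: sqrt(p / n)."""
    return math.sqrt(p / n)


if __name__ == "__main__":
    n = 10_000
    for p in (0.01, 0.02, 0.1, 0.5):
        ratio = se_poisson(p, n) / se_binomial(p, n)
        # ratio equals 1 / sqrt(1 - p): about 1.005 at p = 0.01, sqrt(2) at p = 0.5
        print(f"p={p}: binomial SE={se_binomial(p, n):.5f}, "
              f"Poisson SE={se_poisson(p, n):.5f}, ratio={ratio:.3f}")
```

Since the ratio is always above 1, the Poisson model slightly overstates the standard error, and therefore the required N, with the gap growing as P increases.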


Doesn't the answer imply N >= 1?

Anonymous


Hi, I am not sure I understand your solution. Could anyone explain more?

Anonymous
