When analyzing a dataset, we are sometimes working with data that does not represent reality but is the result of a selection. This means that the selection made before the analysis has already introduced a bias into the data.
For example, if we work with panel data we must make sure that the panelists form a balanced sample of the population. Hence, if the population we want to represent is 55% male and 45% female, we need to make sure our panel has the same distribution (55/45). It is not that simple (in practice the sample never matches reality exactly, the reference distribution is determined with an establishment survey, and weighting factors are also applied), but for the sake of this explanation let’s keep it simple.
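For the curious, here is a minimal sketch of how such weighting factors could be computed in the simplest case. It assumes a made-up panel that is 70% male and 30% female, while the target stays the 55/45 split from the example above. The function name `poststratification_weights` and the panel composition are illustrative assumptions, not a description of how any real panel provider does it.

```python
from collections import Counter


def poststratification_weights(panel, target_shares):
    """Return one weight per group: target share divided by observed share."""
    counts = Counter(panel)
    total = len(panel)
    weights = {}
    for group, target in target_shares.items():
        if counts[group] == 0:
            # A group that is absent from the panel cannot be fixed by weighting.
            raise ValueError(f"no panelists in group {group!r}")
        observed_share = counts[group] / total
        weights[group] = target / observed_share
    return weights


# Target distribution taken from the example in the text.
target_shares = {"male": 0.55, "female": 0.45}

# Hypothetical panel (assumption): 70 men and 30 women.
panel = ["male"] * 70 + ["female"] * 30

weights = poststratification_weights(panel, target_shares)

# After weighting, each group's share of the total weight matches the target.
total_weight = sum(weights[person] for person in panel)
weighted_shares = {
    group: sum(weights[p] for p in panel if p == group) / total_weight
    for group in target_shares
}
print(weights)
print(weighted_shares)
```

Over-represented panelists get a weight below 1 and under-represented ones a weight above 1, so that the weighted panel lines up with the population again.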
Another example: if we analyze an NPS score, we should make sure that the data was collected from an unbiased sample. How can a sample be biased? Well, one example is very clear: if we collect an NPS score from a specific page, we need to make sure we select a proper page, for example a page that everyone has the chance to see, regardless of whether they purchase or not. So, placing the NPS survey on the confirmation page of your ecommerce site will collect biased data. Why? Because you are implicitly making a selection: you are showing the NPS survey only to people (customers) who have successfully completed a purchase.
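To make the effect concrete, here is a small sketch with invented numbers. NPS is the percentage of promoters (scores 9 or 10) minus the percentage of detractors (scores 0 to 6). The `visitors` list is purely hypothetical and exists only to show how the score moves when we keep the purchasers alone.

```python
def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("at least one score is required")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical visitors (assumption): (score, completed_purchase)
visitors = [
    (10, True), (9, True), (8, True), (9, True), (6, True),
    (3, False), (7, False), (5, False), (9, False), (2, False),
]

# Everyone who visited the site, buyers or not.
all_scores = [score for score, _ in visitors]

# Only the people who would reach the confirmation page.
buyer_scores = [score for score, purchased in visitors if purchased]

print(nps(all_scores))    # 0.0
print(nps(buyer_scores))  # 40.0
```

Same site, same visitors, yet the score jumps simply because the second calculation only listens to people who were happy enough to buy.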
Let’s take a look at the two NPS score analyses below:
Both are the results of a survey shown to the customers of an ecommerce retail website. The first one was triggered on exit intent from any page of the site except the cart page and the thank-you page. The second one was delivered on the thank-you page, after the purchase. We can clearly spot the difference in the results.
This is a clear example of how different the results can be when your sample does not represent the entire population but just a biased slice of it. And yes, the difference on its own is not yet a selection bias. It becomes a selection bias if we cherry-pick the results we like (for example, if we circulate the second NPS result inside our organization instead of the first one).
Now you know it! Happy analysis!