# Sampling and Sampling Bias

- Posted by lhmay
- on Apr 22, 2018
- in Data Science

In statistics, **sampling bias** is a bias in which a sample is collected in such a way that some members of the intended population are less likely to be included than others. It results in a **biased sample**: a non-random sample of a population (or of non-human factors) in which not all individuals, or instances, were equally likely to have been selected. If this is not accounted for, results can be erroneously attributed to the phenomenon under study rather than to the method of sampling.

A self-selection bias can result when the non-random component occurs after the potential subject has enlisted in the experiment.

**How do you know if your sample has sampling bias?**

If you know the true mean of the population you sampled from, you can draw many samples using the same procedure and check whether the means of those samples are approximately normally distributed around the true population mean. If the sample means center on a different value, the sampling procedure is biased.
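The check described above can be sketched as a small simulation. Everything here is an assumption made for illustration: the population is synthetic (exponentially distributed values), the helper names `sample_means`, `random_draw`, `biased_draw`, and `offset_in_sds` are hypothetical, and the "biased" procedure simply makes larger values more likely to be picked.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population with a known true mean (synthetic data).
population = rng.exponential(scale=50.0, size=20_000)
true_mean = population.mean()


def sample_means(draw, n_repeats=1_000, sample_size=200):
    """Repeat a sampling procedure and return the mean of each sample."""
    return np.array([draw(sample_size).mean() for _ in range(n_repeats)])


def random_draw(size):
    # Simple random sampling: every member is equally likely to be selected.
    return rng.choice(population, size=size, replace=False)


# Biased procedure: selection probability is proportional to the value itself.
weights = population / population.sum()


def biased_draw(size):
    return rng.choice(population, size=size, replace=False, p=weights)


def offset_in_sds(means):
    """Distance of the average sample mean from the true mean,
    measured in standard deviations of the sample means."""
    return (means.mean() - true_mean) / means.std(ddof=1)


unbiased_offset = offset_in_sds(sample_means(random_draw))
biased_offset = offset_in_sds(sample_means(biased_draw))

print(f"true mean:        {true_mean:.2f}")
print(f"unbiased offset:  {unbiased_offset:+.2f} standard deviations")
print(f"biased offset:    {biased_offset:+.2f} standard deviations")
```

With random sampling the sample means should cluster around the true mean, so the offset stays close to zero. With the weighted procedure the whole distribution of sample means shifts away from the true mean, which is the signature of sampling bias.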

**Python code for stratified sampling**

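```
>>> import pandas as pd
>>> from sklearn.model_selection import train_test_split
>>> # Load the reviews; this path is specific to the author's machine
>>> Meta = pd.read_csv('C:\\Users\\a578209\\Downloads\\so\\Book1.csv')
>>> # Split off the label column that defines the strata
>>> y = Meta.pop('Categories')
>>> y
0    Mobile
1     drugs
2       dvd
Name: Categories, dtype: object
>>> X = Meta
>>> # stratify=y keeps each category's share the same in the train and test sets;
>>> # each category needs at least two rows, otherwise a ValueError is raised
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.33, random_state=42, stratify=y)
>>> X_test
   ReviewerID    ReviewText  ProductId
0        1212  good product   14444425
```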
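To see what stratification is meant to achieve, here is a hand-rolled sketch that uses only the standard library. It is an illustration of the idea, not how `train_test_split` is implemented. The function `stratified_sample` and the label counts are hypothetical and are not taken from the data above.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, fraction, seed=42):
    """Return sorted indices of a sample that takes the same fraction
    of rows from every label (stratum)."""
    rng = random.Random(seed)

    # Group row indices by their label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    # Draw the same fraction, at random, from each group.
    chosen = []
    for indices in by_label.values():
        k = round(len(indices) * fraction)
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Hypothetical, imbalanced category labels.
labels = ["Mobile"] * 600 + ["drugs"] * 300 + ["dvd"] * 100
test_idx = stratified_sample(labels, fraction=0.25)

# Compare each category's share in the population and in the sample.
population_share = {k: v / len(labels) for k, v in Counter(labels).items()}
sample_counts = Counter(labels[i] for i in test_idx)
sample_share = {k: v / len(test_idx) for k, v in sample_counts.items()}

for category in population_share:
    print(category, population_share[category], sample_share[category])
```

Because every category contributes the same fraction of its rows, the category shares in the sample match those in the full data set. A purely random draw of the same size would only match them on average.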