# How to Perform A Repeatability Test 2:

## Collecting Data

The number of samples you collect during an experiment affects both the analysis of the results and the validity of the test data. Over the years, I have come across several conflicting accounts of how many samples one should collect. According to many introductory statistics textbooks, the number of samples to collect depends on the population: a small population is a collection of 30 samples or fewer, while a large population is a collection of more than 30 samples. More advanced areas of statistics, however, make different recommendations depending on the goals of the analysis and the confidence expected in the results. If you surveyed a group of professionals from different organizations and industries, you would receive a wide range of opinions. That is the result of differing world views: each person and organization has different goals, so their opinions will vary to some degree.

## Let’s Break It Down

Not sure if my theory is valid? Then let me show you quantitative and qualitative results that support my opinion. Using a Monte Carlo simulation, I will generate a pool of random data that follows a Gaussian distribution and is expected to conform to a specified level of confidence (i.e. 95.45%). With this data, I will calculate the mean, standard deviation, and degrees of freedom, and report the results for you to evaluate. From there, you can form your own opinion and choose to agree or disagree with me.
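To make the procedure concrete, here is a minimal sketch of such a simulation in Python. It is not the tool I used for the results in this post. The 22 trials of 22 samples each match the notes further down, while the function name `simulate_trials`, the true mean of 0, the true standard deviation of 1, and the seed are assumptions chosen only for illustration.

```python
import random
import statistics


def simulate_trials(n_trials=22, n_samples=22, mu=0.0, sigma=1.0, seed=0):
    """Generate Gaussian random data and summarize each trial.

    mu, sigma, and seed are illustrative assumptions, not values from the post.
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        trials.append({
            "samples": samples,
            "mean": statistics.mean(samples),
            # Sample standard deviation (divides by n - 1).
            "stdev": statistics.stdev(samples),
            # Degrees of freedom for a single-sample standard deviation.
            "dof": n_samples - 1,
        })
    return trials


trials = simulate_trials()
for number, trial in enumerate(trials[:3], start=1):
    print(f"trial {number}: mean={trial['mean']:.3f}, "
          f"s={trial['stdev']:.3f}, dof={trial['dof']}")
```

Changing `n_samples` and rerunning is a quick way to see how much the calculated mean and standard deviation wander from trial to trial at different sample sizes.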

### The Results

95.46% of trials exhibited one outlier or fewer

68.18% of trials exhibited at least one outlier

4.54% of trials exhibited more than one outlier

### Notes

1| The numbers in the left column represent the sample number for each trial, totaling 22.

2| The numbers in the top row represent the trial number, totaling 22.

3| The upper and lower limits were calculated as the mean plus and minus twice the standard deviation (i.e. 2-sigma).

4| The values that do not conform, or outliers, are the cells not highlighted in green.

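The limit check and outlier tally described in the notes can be sketched as follows. This is an illustration under assumptions, not the spreadsheet behind the figures above: the helper names `count_outliers` and `summarize`, the standard normal data, and the seed are all hypothetical, so the percentages it prints will generally differ from the ones reported in this post.

```python
import random
import statistics


def count_outliers(samples, k=2.0):
    """Count values outside mean ± k sample standard deviations."""
    mean = statistics.mean(samples)
    s = statistics.stdev(samples)
    lower, upper = mean - k * s, mean + k * s
    return sum(1 for x in samples if x < lower or x > upper)


def summarize(trials, k=2.0):
    """Return the percentage of trials in each outlier category."""
    counts = [count_outliers(samples, k) for samples in trials]
    n = len(counts)
    return {
        "one outlier or fewer": 100.0 * sum(c <= 1 for c in counts) / n,
        "at least one outlier": 100.0 * sum(c >= 1 for c in counts) / n,
        "more than one outlier": 100.0 * sum(c > 1 for c in counts) / n,
    }


rng = random.Random(0)  # arbitrary seed, assumed for reproducibility
# 22 trials of 22 samples each, as in the notes above.
trials = [[rng.gauss(0.0, 1.0) for _ in range(22)] for _ in range(22)]
for label, pct in summarize(trials).items():
    print(f"{label}: {pct:.2f}% of trials")
```

Because the limits are computed from each trial's own mean and standard deviation, a single extreme value also widens the limits it is judged against, which matters more as the sample size shrinks.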

Now that I have given you some information and methods for determining the most efficient number of samples to collect for your repeatability experiments, how many samples will you collect?
