
Introduction
Reproducibility is an important contributor to measurement uncertainty.
It is a Type A uncertainty that should be included in every uncertainty budget. However, many people neglect to evaluate it. If you evaluate measurement repeatability, then you should evaluate reproducibility too.
Most people are familiar with repeatability and less familiar with reproducibility. When labs find out (during an assessment) that their uncertainty budgets should include reproducibility, I get a lot of questions.
So, I decided to create a complete guide all about reproducibility.
In this guide, I am going to cover everything you need to know about reproducibility, including what it is, why it is important, how to design a reproducibility test, and how to calculate the results.
If you need to evaluate reproducibility for your measurement uncertainty analysis, then keep reading. This guide is going to help you.
If you only want to know how to evaluate the results, then jump ahead to “How to Calculate Reproducibility.” I have included step-by-step instructions to make the process easy for you.
What is Reproducibility
According to the International Vocabulary of Metrology (VIM), reproducibility is measurement precision under reproducibility conditions of measurement.
In the image below, you will see the definition (2.25) from the VIM.
To better understand the definition of reproducibility, focus on the key phrase “reproducibility conditions of measurement.” I will tell you more about it later in this guide.
Why is Reproducibility Important
Reproducibility (in my opinion) is a better Type A uncertainty evaluation of performance compared to repeatability. Repeatability evaluates the short-term performance variability while reproducibility evaluates the long-term performance variability under various conditions encountered by the laboratory over time.
Over time, a laboratory will perform its testing or calibration activities under various conditions of measurement, such as different days, operators, methods, equipment, etc.
By evaluating these conditions, you can get a better estimate of measurement uncertainty for the laboratory’s activities.
Therefore, it is very important to perform reproducibility testing.
Reproducibility Conditions of Measurement
In the definition of reproducibility, you notice the phrase “reproducibility conditions of measurement.”
This is important because it helps you understand the difference between repeatability and reproducibility testing.
According to definition 2.24 of the Vocabulary in Metrology (VIM), you need the following conditions to perform a reproducibility test:
- Different Procedures,
- Different Operators,
- Different Measuring Systems,
- Different Operating Conditions,
- Different Locations, and
- Replicate Measurements on the Same or Similar Objects
In the below image, you will see the definition of reproducibility conditions of measurement from the VIM.
Unlike repeatability, where all conditions of measurement are the same, reproducibility requires the conditions of measurement to be different. Therefore, you will need to change factors that significantly contribute to measurement uncertainty.
However, I only recommend evaluating one condition at a time (per ISO 5725-3) to avoid confounding results.
What Reproducibility Conditions Should You Evaluate?
In the table below, you will find the most commonly evaluated reproducibility conditions. Also, I included valuable information to help you pick the condition that is best for you.
In the sections below, I have provided more details about each reproducibility condition.
1. Different Operators/Technicians
This is the most recommended condition to change for reproducibility testing. Some of the largest uncertainties occur from the inconsistencies between operators.
This option is best (for most labs) when there is more than one qualified technician. Therefore, it is recommended to pick two or more qualified technicians and have them independently perform the test or measurement. Then, their results can be evaluated to determine operator-to-operator reproducibility.
2. Different Days
This reproducibility condition is best for labs with only one qualified operator and one measurement system. Typically, this method is recommended for single operator labs.
When testing this condition, you will want to perform the test or measurement on two or more different days. For example, a technician will perform the same test or measurement on Monday, Tuesday, and Wednesday. Then, the results can be evaluated to determine day-to-day reproducibility.
3. Different Methods/Procedures
This condition is best for labs that regularly use more than one method for their testing or calibration activities. This will allow you to evaluate the intermediate precision of selecting different methods.
Evaluating the reproducibility between methods can be helpful. However, this condition is typically overlooked even though it is more common than most people think.
For example, here are three common scenarios:
- Example 1: A calibration laboratory that has 2 different procedures (using the same method of comparison) to calibrate a pressure gauge.
- Example 2: A chemical testing laboratory that prepares solutions using either the gravimetric or volumetric method.
- Example 3: A microbiology laboratory that inoculates plates with different culture media.
4. Different Equipment
This condition is best for labs with multiple (similar) measurement systems or workstations. In this scenario, you are evaluating the uncertainty associated with the random selection of a measurement system or workstation.
This option is great for laboratories with two or more similar measurement systems. However, you may want to consider evaluating operator-to-operator reproducibility instead.
In my experience, the uncertainty associated with different operators is likely to be larger than the uncertainty associated with different measurement systems.
5. Different Environments
This condition is best for labs that perform testing and/or calibration activities both in the laboratory and in the field (i.e. at the customer site).
It can help you evaluate the uncertainty between controlled and uncontrolled environments.
However, most labs create two sets of uncertainty budgets: one set that evaluates uncertainty for measurements performed in the laboratory, and another set for measurements performed in the field.
This way, they can showcase their (typically) better measurement capabilities in the laboratory versus their field activities.
Other Conditions
In the table below, you will see an excerpt from ISO 5725-3 that lists conditions of measurement (i.e. factors) that you can evaluate for reproducibility testing.
Many of the conditions I previously listed are included in this table. However, there are a few more I did not cover that you may be interested in.
Below, you will see another table from ISO 5725-3 that provides rationale for specific conditions of measurement.
Reproducibility Testing Scheme
One-factor balanced experiment design
When you need to carry out a reproducibility test, you should use an experiment design. It will help you control your testing scheme and ensure consistent results that can be easily evaluated.
Additionally, an experiment design will help you replicate your reproducibility testing when you need to repeat it in the future.
Most accredited labs will only test one factor at a time. Therefore, I recommend using a one-factor balanced fully nested experiment design.
It is a simple experiment design where you will need to specify the following:
- Level 1: Measurement function and value (to evaluate),
- Level 2: Reproducibility Conditions (to evaluate), and
- Level 3: Number of repeated measurements (under each condition)
In the image below, you will see a visual representation of this experiment design that shows the levels and parameters of the scheme.
The gray boxes represent the ability to expand the experiment to add additional conditions or additional samples under each condition.
Example of one-factor balanced experiment design
Hopefully, the previous image is helpful. However, just in case you need more, here is an example of a common setup for a repeatability and reproducibility testing scheme.
- Level 1: Measurement function and value – 1 in Gage Block with a Caliper
- Level 2: Reproducibility Condition – Operators, and
- Level 3: Number of repeated measurements – 10 each
In the image below, you will see an example of how I typically set up a repeatability and reproducibility testing scheme. For your benefit, I marked up the image with details to help you see the parameters of each level in the scheme.
Hopefully, you find this example helpful.
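If it helps, the same scheme can also be written down as a small data structure before any measurements are taken. The sketch below is only an illustration: the field names, the operator labels, and the choice of two operators are my assumptions, not part of any standard.

```python
# Hypothetical description of a one-factor balanced design.
# Field names and operator labels are made up for illustration.
design = {
    "measurement_function": "1 in gage block measured with a caliper",  # Level 1
    "condition": "operator",                                            # Level 2
    "levels": ["Operator A", "Operator B"],                             # assumed: two operators
    "repeats_per_level": 10,                                            # Level 3
}


def build_run_sheet(design):
    """Return one planned slot per (condition level, repeat number)."""
    return [
        (level, repeat)
        for level in design["levels"]
        for repeat in range(1, design["repeats_per_level"] + 1)
    ]


run_sheet = build_run_sheet(design)
print(len(run_sheet))  # 2 operators x 10 repeats = 20 planned measurements
```

Because every level gets the same number of repeats, the design stays balanced, which keeps the later calculations simple.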
If you set up your repeatability and reproducibility testing schemes this way, it will make analyzing the results much easier. In the next section, I will show you three different methods you can use to calculate reproducibility.
How to Calculate Reproducibility
There are several methods to calculate reproducibility.
If you were to ask several experts how to evaluate reproducibility, you would likely get a variety of responses based on their experience and expertise.
Personally, I prefer to evaluate reproducibility as a standard deviation. This is based on the definitions of reproducibility in the Vocabulary in Metrology (VIM) and ISO 5725-1. Both documents refer to reproducibility as a standard deviation.
In the image below, you will see the definition of reproducibility from ISO 5725. As you can see, the standard refers to reproducibility as a “standard deviation.”
The viewpoint of reproducibility as a standard deviation is further supported by the GUM (JCGM 100:2008).
- Section 4.2 – covers the evaluation of Type A uncertainties and refers to estimating the experimental standard deviation.
- Annex B.2.16 – gives another definition for reproducibility where Note 3 states it can be expressed as the dispersion characteristics of the results.
- Annex H.5 – provides an example for the analysis of variance that shows how to perform repeatability and reproducibility evaluations.
- Examples – Many of the examples in the GUM express Type A uncertainties as variance (i.e. the square root of the variance is the standard deviation).
However, to be fair, there are a lot of statistical treatments that can be applied to experimental data. Several definitions of reproducibility include notes stating “dispersion characteristics of the results,” or similar.
Therefore, if you research “measures of statistical dispersion,” you can find many types of evaluations, such as:
- Range,
- Standard deviation,
- Variance,
- Coefficient of Variation (CV), and
- many more.
In the sections below, I am going to show you how to calculate reproducibility as both a standard deviation and a range. Each of the methods given is supported by ISO standards, which should serve as objective evidence if your evaluations ever come into question (this is common).
Whichever method you decide to use, make sure you include a note in your uncertainty budgets that specifies the method you used and the reference document it came from. This helps assessors and other observers confirm that you used appropriate methods to evaluate measurement uncertainty.
Reproducibility per ISO 5725-3
The most common method used to calculate reproducibility can be found in ISO 5725-3.
The document specifies how to calculate the reproducibility standard deviation by evaluating intermediate precision.
This is done by evaluating one reproducibility condition of measurement at a time. Most of the time, reproducibility between operators (e.g. technicians) is evaluated. However, other conditions may be evaluated based on a laboratory’s operations and available resources.
Now, this evaluation of reproducibility is considered easy for most people. On a 5-point difficulty scale (where 1 is easy and 5 is hard), I rate this evaluation as a 2 (for most people).
Even though the evaluation is considered easy, I have broken down the process into simpler steps to help you perform the calculations.
Read the sections below to evaluate reproducibility.
Calculate Reproducibility (ISO 5725-3) Step-by-Step
Follow the instructions below to calculate reproducibility per ISO 5725-3:
- Select the test or measurement function to evaluate,
- Determine the requirements to conduct the test or measurement,
- Determine the reproducibility condition to evaluate,
- Perform the test or measurement under:
  - condition A,
  - condition B,
  - if applicable, additional conditions, and
- Evaluate the results.
Select the Test or Measurement Function
Pick a test or calibration measurement function to evaluate the reproducibility of results. Typically, you will want to pick a parameter from the laboratory’s scope of accreditation because this process will be used to evaluate Type A uncertainties for an uncertainty analysis for ISO/IEC 17025 accreditation.
Requirements to Conduct Test or Measurement
Determine the requirements to perform the test or measurement. This can include, but is not limited to, the following:
- Personnel,
- Equipment,
- Reference Standards,
- Method,
- Environmental Conditions,
- Item Under Test,
- etc.
Reproducibility Conditions
Determine the reproducibility condition to evaluate. This can include, but is not limited to, the following conditions:
- Operators,
- Days,
- Equipment or Standards,
- Methods,
- Environmental Conditions
The most commonly recommended condition to evaluate is the reproducibility between operators. However, choose the condition that is most appropriate for your laboratory.
According to ISO 5725-3, it is best to evaluate one condition at a time. Otherwise, you may end up with results that are confounded (i.e. mixed up where the effects of each contributor are too difficult to tell apart).
If you want to evaluate more than one condition (at a time), you may want to consider using a Full Factorial experiment design. It will allow you to efficiently evaluate more than one condition while allowing you to evaluate their independent effects and interactions with other factors.
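To give a rough idea of what a full factorial layout means in practice, the sketch below lists every combination of two conditions. The operator and day labels are hypothetical, and this only shows the run plan, not the analysis of effects and interactions.

```python
from itertools import product

# Hypothetical levels for two conditions evaluated together.
operators = ["Operator A", "Operator B"]
days = ["Day 1", "Day 2", "Day 3"]

# Full factorial: every level of one condition is paired with
# every level of the other condition.
runs = list(product(operators, days))

for operator, day in runs:
    print(operator, day)
```

With two operators and three days, this gives six combinations, and each combination would get its own set of repeated measurements.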
Performing the Test or Measurement
Independently perform the test or measurement under each condition. Make sure that the test or measurement is independently done, from start to finish, for each condition.
Some professionals call this a ‘true replicate.’ While there is no official definition for this term, you can find it used in many papers and presentations.
Evaluate the Results
Evaluate the results of the reproducibility test by calculating the standard deviation of the results under different conditions.
The image below shows the standard deviation formula from ISO 5725-3, section 6.2.1, which is the simplest approach to evaluate intermediate precision (i.e. reproducibility standard deviation) within one laboratory.
You can perform this evaluation in Microsoft Excel or Google Sheets using the following formula:
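Outside a spreadsheet, the same kind of evaluation can be sketched in Python. This is only a rough illustration: it assumes the usual sample standard deviation (n - 1 divisor), one independent result per condition, and the day names and values are made up.

```python
import statistics

# Hypothetical results: one independent result per day.
results_by_day = {
    "Monday": 1.0002,
    "Tuesday": 1.0005,
    "Wednesday": 0.9999,
}

# Sample standard deviation (n - 1 divisor) of the results
# obtained under the different conditions.
reproducibility = statistics.stdev(list(results_by_day.values()))

print(f"Day-to-day reproducibility: {reproducibility:.4f}")
```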
Tip for Evaluating Reproducibility Results
Many times, repeatability and reproducibility are evaluated at the same time. If you are evaluating both (at the same time), then calculate reproducibility by calculating the:
- Mean or average of each data set,
- Standard deviation of the mean or averages from the previous step.
Reproducibility Example
For example, imagine two technicians independently perform the same measurement 10 times each. To determine reproducibility, first calculate the mean or average of each technician’s results. This should give you two results: an average result for each technician.
Next, calculate the standard deviation of the two calculated average values. The result will be the reproducibility between operators.
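Here is a minimal sketch of that example in Python. The readings are invented for illustration, and I am assuming the sample standard deviation (n - 1 divisor) for the final step.

```python
import statistics

# Hypothetical readings (in inches): 10 repeated measurements per technician.
technician_a = [1.0001, 1.0002, 1.0000, 1.0001, 1.0002,
                1.0001, 1.0000, 1.0001, 1.0002, 1.0000]
technician_b = [1.0003, 1.0004, 1.0003, 1.0002, 1.0004,
                1.0003, 1.0003, 1.0004, 1.0002, 1.0002]

# Step 1: mean of each technician's data set.
mean_a = statistics.mean(technician_a)
mean_b = statistics.mean(technician_b)

# Step 2: standard deviation of the two means.
reproducibility = statistics.stdev([mean_a, mean_b])

print(f"Mean A: {mean_a:.4f}, Mean B: {mean_b:.4f}")
print(f"Operator-to-operator reproducibility: {reproducibility:.6f}")
```

With only two means, the standard deviation reduces to the difference between them divided by the square root of 2, so adding more technicians gives a more stable estimate.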
Reproducibility per ISO 5725-2
ISO 5725-2 gives a technique for determining the repeatability and reproducibility standard deviations of a measurement method. ISO 21748 also uses this same technique to evaluate reproducibility.
When a laboratory needs to evaluate the measurement uncertainty of a measurement method, this technique is used to:
- Determine repeatability and reproducibility, or
- Determine the homogeneity of materials.
Now, this evaluation of reproducibility is not easy for most people. On a 5-point difficulty scale (where 1 is easy and 5 is hard), I rate this evaluation as a 4 (for most people).
Therefore, I have broken down the process into simpler steps to help you perform the calculations.
Read the sections below to evaluate reproducibility.
Calculate Reproducibility (ISO 5725-2) Step-by-Step
Follow the instructions below to calculate reproducibility per ISO 5725-2:
- Select the test or measurement function to evaluate,
- Determine the requirements to conduct the test or measurement,
- Determine the reproducibility condition to evaluate,
- Each participant shall independently perform the measurement(s),
- Evaluate the results:
  - Calculate the Mean Square within groups,
  - Calculate the Mean Square between groups, and
  - Calculate Reproducibility.
Select the Test or Measurement Function
Pick a test or calibration measurement function to evaluate the reproducibility of results. Typically, this is performed for method validation, but it can also be used to evaluate Type A uncertainties for an uncertainty analysis for ISO/IEC 17025 accreditation.
Requirements to Conduct Test or Measurement
Determine the requirements to perform the test or measurement. This can include, but is not limited to, the following:
- Personnel,
- Equipment,
- Reference Standards,
- Method,
- Environmental Conditions,
- Item Under Test,
- etc.
Reproducibility Conditions
The reproducibility condition for this type of test evaluates the reproducibility between conditions.
For ISO 17034 accredited reference material producers, this evaluation is used to evaluate within bottle and between bottle homogeneity when estimating uncertainty per ISO 33405.
Evaluate the Results
Evaluate the results of the reproducibility test by calculating the square root of the difference between the between-condition mean square and the within-condition mean square.
This is not an easy process for most people. So, I am going to break it down into three stages for you. These are:
- Calculating the Mean Square Within Conditions,
- Calculating the Mean Square Between Conditions, and
- Calculating the Reproducibility Standard Deviation.
Calculate the Mean Square, Within Conditions
Follow the instructions below to calculate the Mean Square within conditions:
- Calculate the Standard Deviation for each run.
- Calculate the Degrees of Freedom for each run.
- Square the Standard Deviation for each run to get the variance.
- Calculate the Sum of Squares for each run by multiplying the Degrees of Freedom by the variance.
- Calculate the Total Sum of Squares.
- Calculate the Total Degrees of Freedom.
- Calculate the Mean Square Within Conditions.
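The within-conditions steps can be sketched in Python as follows. The data are hypothetical (three runs of four results each), and the sum of squares for each run is taken as the degrees of freedom times the variance (the squared standard deviation), which is the usual analysis of variance convention.

```python
import statistics

# Hypothetical data: three runs with four results each.
runs = {
    "Run 1": [10.1, 10.2, 10.0, 10.1],
    "Run 2": [10.3, 10.4, 10.3, 10.2],
    "Run 3": [10.0, 10.1, 10.1, 10.0],
}


def mean_square_within(runs):
    """Pooled within-run variance: total sum of squares / total degrees of freedom."""
    total_sum_of_squares = 0.0
    total_degrees_of_freedom = 0
    for results in runs.values():
        std_dev = statistics.stdev(results)          # standard deviation of the run
        degrees_of_freedom = len(results) - 1        # n - 1 for the run
        total_sum_of_squares += degrees_of_freedom * std_dev ** 2
        total_degrees_of_freedom += degrees_of_freedom
    return total_sum_of_squares / total_degrees_of_freedom


msw = mean_square_within(runs)
print(f"Mean Square Within Conditions: {msw:.6f}")
```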
Calculate the Mean Square, Between Conditions
Follow the instructions below to calculate the Mean Square between conditions:
- Calculate the Mean for each run.
- Calculate the Grand Mean (i.e. Mean of all results).
- Calculate the Squared Deviation for each run.
- Count the Number of Samples for each run.
- Multiply the Number of Samples and Squared Deviation for each run.
- Calculate the Total Sum of Squares.
- Calculate the Total Degrees of Freedom.
- Calculate the Mean Square Between Conditions.
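The between-conditions steps can be sketched the same way. Again, the data are hypothetical, and I am assuming the degrees of freedom between conditions is the number of runs minus one.

```python
import statistics

# Hypothetical data: three runs with four results each.
runs = {
    "Run 1": [10.1, 10.2, 10.0, 10.1],
    "Run 2": [10.3, 10.4, 10.3, 10.2],
    "Run 3": [10.0, 10.1, 10.1, 10.0],
}


def mean_square_between(runs):
    """Between-run mean square: weighted squared deviations of run means / (runs - 1)."""
    all_results = [value for results in runs.values() for value in results]
    grand_mean = statistics.mean(all_results)        # mean of all results
    total_sum_of_squares = 0.0
    for results in runs.values():
        run_mean = statistics.mean(results)
        squared_deviation = (run_mean - grand_mean) ** 2
        total_sum_of_squares += len(results) * squared_deviation
    degrees_of_freedom = len(runs) - 1               # assumed: number of runs minus one
    return total_sum_of_squares / degrees_of_freedom


msb = mean_square_between(runs)
print(f"Mean Square Between Conditions: {msb:.6f}")
```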
Calculate the Reproducibility Standard Deviation
Follow the instructions below to calculate the Reproducibility Standard Deviation:
- Find the Mean Square Within Conditions.
- Find the Mean Square Between Conditions.
- Subtract the Mean Square Within Conditions from the Mean Square Between Conditions.
- If the result is less than zero, then replace the result with zero. Otherwise, use the result (from the previous step).
- Calculate the Reproducibility Standard Deviation by finding the square root of the result (from the previous step).
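As a minimal sketch, the final stage looks like this in Python. It follows the steps exactly as listed in this guide (between minus within, negative values replaced with zero, then the square root); I have not reproduced the text of ISO 5725-2 here, so check the standard before relying on it. The input values are hypothetical.

```python
import math


def reproducibility_sd(ms_between, ms_within):
    """Square root of (between - within), with negative differences set to zero."""
    difference = ms_between - ms_within
    if difference < 0:
        # A negative difference cannot be square-rooted, so it is replaced with zero.
        difference = 0.0
    return math.sqrt(difference)


# Hypothetical mean squares.
print(reproducibility_sd(0.07, 0.0056))
print(reproducibility_sd(0.001, 0.005))  # between < within, so the result is 0.0
```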
Reproducibility from a Range of Measurement Results
Alternatively, there is a different method for estimating uncertainty due to reproducibility.
This method is highlighted in ISO 376 and ISO 6789.
Instead of evaluating reproducibility as a standard deviation, these methods evaluate it as a range of measurement results.
Similar to other methods, labs will perform a series of reproducibility tests with different operators or runs. The difference is in the evaluation, which uses the spread between the maximum and minimum results.
Reproducibility per ISO 376
Below, you will find an excerpt from ISO 376. In this method, reproducibility is evaluated as a relative uncertainty (i.e. expressed relative to the result) in percent, based on the difference between the maximum and minimum results.
Reproducibility per ISO 6789
Below, you will find an excerpt from ISO 6789. In this method, reproducibility is evaluated as an absolute uncertainty (i.e. expressed in the same unit of measurement as the result), based on the difference between the maximum and minimum results.
How to Calculate the Reproducibility from a Range of Values
Follow the instructions below to calculate reproducibility:
- Select the test or measurement function to evaluate,
- Determine the requirements to conduct the test or measurement,
- Determine the reproducibility condition to evaluate,
- Perform the test or measurement under each condition, and
- Evaluate the results:
  - Find the Maximum Result in the series,
  - Find the Minimum Result in the series,
  - Subtract the Minimum Result from the Maximum Result, and
  - Calculate Reproducibility by dividing the result (from the previous step) by the square root of 3.
This method determines the bounds or limits (i.e. maximum and minimum) of the measurement results and evaluates the uncertainty in accordance with the JCGM 100:2008, section 4.3.7 (i.e. Rectangular distribution).
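The range-based steps can be sketched in Python as shown below. The results are hypothetical. I have made the divisor a parameter (defaulting to the square root of 3, as in the steps above) because conventions differ on whether the rectangular bounds refer to the full range or the half-width, so check the standard you are following.

```python
import math


def reproducibility_from_range(results, divisor=math.sqrt(3)):
    """Return (maximum - minimum) / divisor for a series of results."""
    spread = max(results) - min(results)
    return spread / divisor


# Hypothetical results from three operators or runs.
results = [100.2, 100.5, 100.3]
reproducibility = reproducibility_from_range(results)

print(f"Reproducibility from range: {reproducibility:.4f}")
```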
The problem with this evaluation (to me) is that reproducibility is evaluated as a Type B uncertainty instead of a Type A uncertainty.
I do not fully agree with this method. However, it is published in a standard method and should be followed by laboratories that are accredited to perform this activity.
For the benefit of anyone looking to evaluate reproducibility, I decided to include the method in this guide. It is a valid way to evaluate reproducibility, just not the method I would default to.
Frequently Asked Questions
Below, you will find answers to common questions I receive about measurement reproducibility.
What is reproducibility?
A: According to the International Vocabulary of Metrology (VIM), it is measurement precision under reproducibility conditions of measurement. In simple terms, it is a statistical evaluation of the ability to produce consistent results under different measurement conditions.
How is reproducibility different from repeatability?
A: Repeatability is an evaluation of a process’s ability to produce consistent results under the same measurement conditions, while reproducibility is an evaluation of a process’s ability to produce consistent results under different measurement conditions.
Why do we need to include reproducibility in our uncertainty budget?
A: According to many reputable organizations and documents, reproducibility is considered a more realistic Type A uncertainty over time because most labs will reproduce test and measurement results under various conditions.
How to calculate reproducibility?
A: There are several different methods for calculating reproducibility. Refer to one or more of the following:
- ISO 5725-2,
- ISO 5725-3,
- ISO 21748,
- ISO 376, or
- ISO 6789
Should you remove repeatability from reproducibility?
A: Only remove repeatability from reproducibility if you are following the method in ISO 5725-2.
Does reproducibility include repeatability?
A: Not always. If you evaluate reproducibility in accordance with ISO 5725-2, then the answer is yes. If you follow another method, such as ISO 5725-3, then no. ISO 5725-3 evaluates intermediate precision which is not a holistic view of measurement precision under various conditions.
Conclusion
Reproducibility is an important contributor to estimating measurement uncertainty. It should be included in every uncertainty budget even though it is omitted in some ISO and ASTM methods.
In this guide, you should have learned:
- What reproducibility is,
- Why reproducibility is important,
- How to design a reproducibility test, and
- How to calculate reproducibility.
Hopefully, this guide helps you perform a reproducibility test and analyze the results.
I promise, once you do it, it becomes easy to repeat. Also, you will begin to see opportunities to improve your measurement results and reduce measurement uncertainty.
Finally, I hope this guide answers your questions. If not, feel free to email me your questions. I will be happy to answer your questions and use the information to update this guide.