|eTRNG Statistical Test Results|
The output of eTRNG was extensively analyzed with a number of common statistical test suites for assessing the randomness of random number generators.
The three most significant test suites used were Diehard, NIST SP 800-22,
and FIPS 140-2. The results of these tests are summarized below.
The Diehard test suite was developed in 1995 by Dr. George Marsaglia, at the time Professor Emeritus of Statistics at Florida State University. The Diehard test suite was
considered the gold standard for testing random number generator quality for many years and is still one of the top three random number test suites.
The Diehard test suite is run on a 10 Mbyte data set and produces over 210 scores referred to as p-values. Individual test failures are indicated by a p-value of 0 or 1, but there
are no hard pass/fail criteria for the suite as a whole; it is up to the user to decide whether the test scores indicate an acceptable level of randomness for their application. The Diehard results file contains
almost 900 lines of text (and the 210+ p-values), so the best way to interpret the results is graphically. With no firm pass/fail criteria, this is particularly useful when comparing the
results for two sets of data. In the graph below on the left are the Diehard results for a data set produced by a well-known hardware true random number
generator (HW TRNG) based on the decay of a radioactive isotope. The graph in the center shows the results for a data set generated using eTRNG. The graph on the right
shows the results for Mersenne Twister, a software PRNG that is generally considered to be one of the better PRNGs (except for security related applications).
The red line in each graph is plotted using the p-values sorted from smallest to largest. The blue line represents a perfect progression from 0 to 1. The more closely the red
line tracks the blue line, the better that data set performed under the Diehard analysis, because the p-value distribution
should be uniform for a sequence of numbers with a high degree of randomness. A common misconception regarding the Diehard results is that a p-value of 0.5 indicates random data. Truly random data with no discernible patterns actually
produces a uniform distribution of p-values between 0 and 1, which yields an arithmetic mean very close to 0.5. Deviations from a uniform distribution (appearing as divergences
from the blue line) indicate the Diehard tests detected signs of low randomness in the data. Slight divergences between the blue and red lines aren't necessarily bad,
particularly if the diverging segment is fairly short and runs parallel to the blue line. Patterns to look for that indicate a lack of randomness are:
- Nearly vertical segments indicate a number of tests produced very similar p-values. The longer the vertical segment, the higher the number of tests that produced a
similar result. Pay particular attention to vertical segments just above the '0' line or just below the '1' line; groups of similar p-values in these areas are strong
indicators of a lack of randomness.
- Nearly horizontal segments indicate gaps in the p-values. The longer the horizontal segment the larger the gap in p-values.
- Line segments following the '0' or '1' line may indicate test failures or p-values that are very close to 0 or 1. This appears at the beginning of the Mersenne Twister
line where the first 5 p-values are below 0.01. If one of these segments is very noticeable and very flat, it is likely due to test failures.
- A smooth red line indicates a more uniform distribution. A jagged red line with vertical segments followed by horizontal segments indicates a poor distribution with
groups of similar p-values followed by gaps in p-values. The closer these jagged segments are to right angles, the tighter the groups of similar values and the larger the gaps
between segments of uniform distribution. This is noticeable in several places on the HW TRNG line.
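The sorted-p-value plot described above is straightforward to reproduce. The sketch below is illustrative only (the sample p-values are generated, not actual Diehard output); it builds the "red line" by sorting the p-values and compares it against the ideal uniform "blue line":

```python
import random

def sorted_pvalue_curve(pvalues):
    """Return (red, blue) lines: p-values sorted ascending, paired with
    an ideal uniform progression from 0 to 1."""
    n = len(pvalues)
    red = sorted(pvalues)
    # The k-th of n points from a perfectly uniform distribution is
    # expected to fall near (k + 0.5) / n.
    blue = [(k + 0.5) / n for k in range(n)]
    return red, blue

def max_divergence(pvalues):
    """Largest vertical gap between the red and blue lines; large values
    flag the vertical/horizontal segments described above."""
    red, blue = sorted_pvalue_curve(pvalues)
    return max(abs(r - b) for r, b in zip(red, blue))

# Illustrative data: 210 uniformly drawn p-values should hug the blue line.
random.seed(1)
uniform_scores = [random.random() for _ in range(210)]
print(round(max_divergence(uniform_scores), 3))
```

The maximum vertical gap is essentially a Kolmogorov-Smirnov-style statistic: small values mean the red line hugs the blue line, while a large value corresponds to the vertical or horizontal segments called out in the list above.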
Another way to look at the Diehard results is to look at the p-value distribution across the ranges of 0 to 0.1, 0.1 to 0.2, 0.2 to 0.3 and so on up to 0.9 to 1. For a perfectly
uniform distribution, 10% of the p-values would fall within each of those ranges. The graph below shows the distribution for the same data sets as used in the graphs above. The thick
black line indicates the 10% level. The green lines show the range of percentages for the eTRNG data and the red lines show the range for the hardware TRNG data.
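The decile breakdown can be computed directly from the p-value list. A minimal sketch (again with illustrative data rather than the actual Diehard scores):

```python
def decile_percentages(pvalues):
    """Percentage of p-values falling in each range 0-0.1, 0.1-0.2, ...,
    0.9-1.0. For a perfectly uniform distribution each bin holds 10%."""
    counts = [0] * 10
    for p in pvalues:
        # min() keeps p == 1.0 in the top bin instead of an eleventh bin
        counts[min(int(p * 10), 9)] += 1
    return [100.0 * c / len(pvalues) for c in counts]

# Illustrative: 210 evenly spaced p-values give exactly 10% per bin.
even = [(k + 0.5) / 210 for k in range(210)]
print(decile_percentages(even))
```

For real generator output the bins will scatter around the 10% line; the spread of that scatter is what the green and red ranges in the graph summarize.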
This comparison of Diehard results shows the eTRNG generated data compares very favorably to the data from a high-quality hardware true random number generator based
on radioactive isotope decay (considered by many to be the perfect source of entropy). With the cleaner line in the p-value plot and tighter range on the p-value distribution graph
it could be said eTRNG outperformed this particular HW TRNG (the irregularities in the HW TRNG p-values are likely due to Geiger counter measurement anomalies/inaccuracies or to how the data is
manipulated in producing the random number output). What these results don't show is that the HW TRNG produces about 100 bytes/sec while eTRNG can generate over 60 Mbytes/sec, and eTRNG
is a MUCH less expensive solution.
Diehard test results files (57KB text files):
The 800-22 suite of tests developed by the National Institute of Standards and Technology (NIST, in the US) has replaced Diehard as the gold standard for judging the quality of random
number generators. While many of the 800-22 tests are similar in nature to those in Diehard, the results are presented in a way that is more easily judged as pass/fail and is less subjective
than the Diehard test results. Each test in 800-22 has specified failure criteria, and an overall pass rate greater than 95% is considered to indicate a high degree of randomness.
Each run of 800-22 produces 190 test scores.
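The per-test minimum pass counts used by 800-22 (such as the 96-of-100 minimum that appears in the results below) come from a confidence interval on the proportion of sequences passing, defined in section 4.2.1 of NIST SP 800-22. A short sketch of that calculation:

```python
from math import sqrt

def min_pass_proportion(m, alpha=0.01):
    """NIST SP 800-22 sec. 4.2.1: the acceptable proportion of sequences
    passing a test lies within p_hat +/- 3*sqrt(p_hat*(1 - p_hat)/m),
    where p_hat = 1 - alpha and m is the number of sequences tested."""
    p_hat = 1.0 - alpha
    return p_hat - 3.0 * sqrt(p_hat * (1.0 - p_hat) / m)

# For 100 sequences at the default alpha = 0.01 the lower bound is ~0.9602,
# which the reference sts tool reports as approximately 96 passing sequences.
print(round(min_pass_proportion(100), 4))
```

Note the threshold tightens toward p_hat as the number of sequences grows, which is why larger analyses demand a higher passing proportion.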
The table below shows the 800-22 (version 2.1.1) results for three 100 Mbyte data sets generated by eTRNG Advanced ("Pass 2" is the data set used for the Diehard results above) and published
results for the same HW TRNG data set used for the Diehard results above. For the tests that produce multiple scores, the average of those scores is presented. The best scores for each
of the tests are highlighted in green. Each of the eTRNG data sets was analyzed as 100 strings of 1 Mbyte using the 800-22 default settings. For the
three passes (a total of 570 tests) there was a single failure, on one of the Nonperiodic-template tests in Pass 3 (a score of 95 against a minimum passing score of 96). The average of the
pass rates for the three eTRNG data sets is 98.9%. As with the Diehard results, on the 800-22 test suite the eTRNG results are again comparable to the HW TRNG results.
|Pass rate percentage|
|Test||# of tests||Pass 1||Pass 2||Pass 3||HW TRNG|
|Cumulative sums ||2||100||100||100||97.66|
|Longest run ||1||100||97||100||100|
|Overlapping templates ||1||98||97||98||97.66|
|Random excursions ||8||98.82||99.03||98.87||99.69|
|Random excursions - variant ||18||99.26||99.20||98.75||98.69|
|Overall average || ||99.15||98.54||99.06||98.51|
NIST 800-22 test results files (18KB text file):
The US Federal Information Processing Standard (FIPS) 140-2 test suite is primarily intended for analyzing random numbers for use in secure applications such as password
generation and encryption key creation. These tests primarily look for issues that could compromise security in these applications, such as predictability and long sequences
of '0' or '1' bits. The FIPS 140-2 suite performed 1,000 tests, each on a 20,000-bit block taken from the first 20,000,032 bits of each data set. Lacking features such as whitening and balancing, eTRNG is not intended
for use in these security-related applications, but it performs very well on the FIPS 140-2 tests, as shown below.
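The "long run" check referenced in the results is one of the simpler FIPS 140-2 criteria: a 20,000-bit block fails if it contains a run of 26 or more identical bits (per the FIPS 140-2 change notice). A minimal sketch of that single test, using illustrative bit sequences:

```python
def long_run_failure(bits, max_run=25):
    """FIPS 140-2 long run test on one block of bits: the block fails if
    any run of identical bits exceeds max_run (i.e. 26 or more identical
    bits in a row, per the FIPS 140-2 change notice).  `bits` is any
    sequence of 0/1 values."""
    run = 0
    prev = None
    for b in bits:
        run = run + 1 if b == prev else 1
        prev = b
        if run > max_run:
            return True
    return False

# Illustrative: a 26-bit run of zeros fails; alternating bits pass.
bad = [0] * 26 + [1, 0] * 100
good = [1, 0] * 10000
print(long_run_failure(bad), long_run_failure(good))
```

The full FIPS 140-2 battery also includes monobit, poker, runs, and continuous run checks on each block; the long run test is shown here only because it directly targets the long sequences of identical bits mentioned above.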
|Test (failures) ||Pass 1||Pass 2||Pass 3|
|Long run ||1||0||0|
|Continuous run ||0||0||0|
|Passing % ||99.8||99.9||100|