SlideShare uma empresa Scribd logo
1 de 9
Baixar para ler offline
JOURNAL OF ——, 1
Analysis of SHA3-256 as a Random Number
Generator
Andrew Pollock, Chad Maybin and Elliott Whitling
Abstract—In this paper, we present an empirical evaluation
of the SHA3-256 hash algorithms’ output statistical randomness.
Hash algorithms must produce random and independent output
in order to be considered cryptographically secure. This output
must be random over varying sizes of data sets. Testing for
Randomness has historically been conducted using the industry
standard NIST STS. However, this tool is designed for statistical
analysis of small datasets and does not scale to larger sample
sizes. To assess SHA3-256, we adapted several tests from STS
to run on massive data sets. Assessment of Randomness changes
with scale and consequently, theoretical computational statistics
were given significant consideration. Between 996 million and
101 billion samples were tested using five of the NIST STS tests.
SHA3-256 hashes showed no evidence against randomness in four
of the five tests. However, the longest runs test did show evidence
against randomness and the experiment should be replicated.
Overall, a first pass at Big Data statistical cryptanalysis is
presented with many opportunities for improving on both STS
and our additions.
Index Terms—SHA3, Statistics, HPC, NIST, STS
I. INTRODUCTION
THe Keccak sponge function was accepted as the SHA3
standard in FIPS 202. The National Institute of Standards
and Technology (NIST) conducted extensive cryptographic
testing under the broad battery known as Cryptographic Al-
gorithm Validation Program (”CAVP”) to validate candidate
functions. NIST uses the Statistical Testing Suite (”STS”) to
ensure that the output of an encryption algorithm are suitably
random. If an algorithm were found to be predictable, the
strength of the encryption would suffer. As an algorithm weak-
ens, it becomes much easier for an attacker to compromise
communications which are encrypted with this algorithm. The
father of modern computing, John Von Neumann, warned
against the mathematical generation of random numbers. To
paraphrase, semi-numeric algorithmic approaches to random
number generation cannot achieve perfect randomness as,
given identical input and sufficient compute time, the value
could be reliably reproduced. Schneier [1] articulates two traits
for cryptographically secure pseudo-random sequences:
1) It looks random. This means that it passes all the
statistical tests of randomness that we can find.
2) It is unpredictable. It must be computationally infeasible
to predict what the next random bit will be, given com-
plete knowledge of the algorithm or hardware generating
the sequence and all of the previous bits in the stream.
The authors are with the Masters of Data Science program, Southern
Methodist University, Dallas, Tx, 75205 USA e-mail: please direct commu-
nication to ewhitling@smu.edu
Manuscript received ——; revised ——–.
In the seminal book, Art of Computer Programming Volume
2, Knuth [2] describes fundamental statistical tools to test
pseudo-random sequences. Of the discussed tests, the spectral
test is considered to be one of the most valuable as it can
identify prominent pseudo-random generators (PRNG) like a
linear feedback shift register. These were the first attempts to
assess hash functions as random generators.
With the Big Data revolution, came the ability to compute
previously unheard of sample sizes. The NIST STS is not
designed to accommodate massive data. We propose and
deliver the next step in random number generation testing: a
test suite designed to scale infinitely towards the total output
space of a random number generator function. In this paper,
we attempt to empirically determine the randomness, and
therefore the security, of SHA3-256 by applying tests based
on NIST Statistical Testing Suite to large sequences of values
generated using SHA.
SHA3 is an ideal first candidate to test near-population data
samples (NPDS). The keccak sponge function was recently
awarded the winner of the SHA3 standard competition, making
it the next hash algorithm to be universally adopted.[3] NIST
used STS to assess Keccak but was limited to a sample size
contained within the memory of a single computer.[4]
Given the importance of randomness in Hash Sequences,
very little effort has been put forth by the security community
to assess significant samples from the total output population.
The emergence of Big Data techniques from the Data Science
community warrants adaptation to cryptography problems. In
this paper, we will present the results of NPDS testing that is
several orders of magnitude larger than the NIST test.
The history of detecting non-randomness has been focused
on the theoretical crafting of Psudo-Random Number Gener-
ators PRNG against known statistical techniques. However,
little emphasis has been placed on applying these tools at
suitable scale. All implementations known to the authors have
only conducted tests to data sets that can be stored in memory.
In the source code of STS, the authors explicitly declare such
a limitation.
Six STS tests were adapted to scale up: the monobit,
random block, longest runs, independent runs, binary matrix
and spectral tests. Five of the six tests passed a validation
process designed to ensure consistent results with the existing
NIST STS toolset. The Spectral FFT test failed validation as
it always reported that any input had evidence against ran-
domness. It was therefore not included in the large scale tests.
The other five tests ran between 996 million and 101 billion
samples. Only one test, Longest Runs, suggested evidence
against randomness for SHA3-256.
JOURNAL OF ——, 2
Our results suggest that the Keccak Sponge Function as
SHA3-256 is probably a suitable replacement for SHA2.
Despite a test providing evidence against randomness, the
overwhelming conclusion is that SHA3-256 appears random.
The Longest Runs test should be re-validated and replicated
to ensure a consistent result. Further, the Spectral test should
be assessed. It is largely considered one of the superior
statistical tests for testing randomness and its failure at scale is
unsettling. Finally, we recommend the community as a whole
rethink the scale of statistical testing. The current toolset is
not truly ready to handle computation and calculation beyond
the limitation of a single CPU and RAM.
The remainder of this paper is organized as follows. Sec-
tion II, provides a brief explanation of the mathematical back-
ground of the NIST STS tests. In Section III, we present our
approach to validating the tests. Section IV briefly describes
our SHA3 generation process. Section V contains our results
from one-hundred billion SHA3-256 hashes analyzed with the
tool. In this section, the main focus is on the statistical security
of SHA3-256 but will also touch briefly on validation using
known SHA2 datasets. The final section, VI, concludes with
a summary of major results, a discussion of limitations of the
new toolset and suggested steps for further research.
II. NIST TESTS
In this section we cover the basics of the NIST STS tests.
The core functions for creating P-Values are also discussed. At
the end, we summarize modern constraints with STS focusing
on scaling.
A. Monobit
The monobit test is perhaps the most simplistic of all the
STS tests. Given any hash input as binary, a truly random
population should have roughly equal quantities of 0’s and
1’s. A test sequence, Sn, is generated by transforming the
generated bit sequence ( ) into a sum of bits, each doubled
and minus one.
Sn =
n
i=1
2 i − 1 (1)
And the P test statistic calculated as:
P − value = erfc(
|Sn|
√
2
) (2)
with erfc being the complementary error function.
A computed P-value that is smaller than then 0.01 indicates
that the sequence is likely to be random. The directionality
of Sn signifies the relative proportion of 0’s and 1’s in the
sequence. A large Sn means that a larger quantity of 1’s exist
in input than 0’s. NIST recommends that test sequences are
composed of at least 100 bits.
B. Random Block
The Random Block (also known as binary frequency) test
is an extension of Monobit. Again, the quantity of 0’s and 1’s
should be approximately equal but in this version the test is
within a M-bit block instead of an input sequence. This test is
identical to Monobit when M = 1. A series of test sequences
are created using: N = | n
M |. When M = 4 for a 256 bit SHA-3
sequence, then N = 128.
The ratio is calculated with:
πi =
M
j=1(i − 1)M + j
M
(3)
The test statistic is calculated as:
X2
(obs) = 4M
N
i=1
(πi − 1/2)2
(4)
The test statistic is then compared with a P-Value calculated
as:
P − V alue = igamc(N/2,
X2
(obs)
2
) (5)
As the P-value approaches zero, the ratio of 0’s and 1’s
become uneven.
C. Independent Runs
The independent runs test checks for contiguous series of
0’s or 1’s. The goal is to ensure the continuous length of the
same bit is random.
The ratio of ones is calculated with:
π =
j j
n
(6)
The test statistic is calculated:
Vn(obs) =
n−1
k=1
r(k) + 1 (7)
where r(k) = 1 when k = k+1, else 0.
And the P-value calculated as:
P − value = erfc(
|Vn(obs) − 2nπ(1 − π)|
2
√
2nπ(1 − π)
) (8)
with erfc being the complementary error function.
D. Longest Runs
Longest runs assess the largest series of 1’s in an input
sequence.
The test statistic is calculated:
X2
(obs) =
K
i=0
(vi − Nπi)2
Nπi
(9)
when M = 8, K = 3 and N = 16, M = 128 then K = 5 and N
= 49, and when M = 104
then K = 6 and N = 75.
The test statistic is then compared with a P-Value calculated
as:
P − V alue = igamc(
k
2
,
X2
(obs)
2
) (10)
A sequence is considered non-random if the P-value is
greater than the 0.01 threshold.
JOURNAL OF ——, 3
E. Binary Matrix
The Binary Matrix Rank test (section 2.5 of [5]) is intended
to test for linear correlation of subsets of the bitsequence. The
test follows an algorithm of iterating over blocks where n is
the length of bits, M is the rows and Q is the columns of the
Matrix. Both M and Q are hard coded to 32 in the NIST STS
implementation. The test starts by creating blocks using:
N = |
n
MQ
| (11)
Each block N is filled row by row sequentially then any
remaining bits are transferred to the next block. Each matrix
is then ranked. A chi-square test is then conducted with:
X2
(obs) =
(FM − 0.2888N)2
0.2888N
+ (12)
(FM − 1 − 0.5776N)2
0.5776N
+
(N − FM − FM − 1 − 0.1336N)2
0.1336N
Where FM are the count of matrices with full rank, FM − 1
are the count of matrices with full rank minus one and N −
FM − FM − 1 are the remainder.
The test statistic is then compared with a P-Value calculated
as:
P − V alue = igamc(1,
X2
(obs)
2
)e−t
(13)
Large X2
(obs) values indicate deviation of rank distribution
from that of a random sequence.
F. Spectral
The spectral test (section 2.6 of [5]) is based on the
Linear Algebra approach of mapping Euclidean distances into
Hamming, Banach or Hilbert space. In this case, the algorithm
maps data from Hamming space and mapping the point as
either a negative or positive unit. Once mapped to a sequence,
X, a Discrete Fourier Transformation (DFT) is applied. DFT
takes the sequence and returns Fourier coefficients or a set
samples of the sequence extracted for faster computation. This
new sequence has a modulus function applied that produces
a series of peaks from the original data. These peaks should
be less than 95 percent of total samples divided by two, T.
The theoretical value, N0, is compared to the actual number
of peaks, N1, less than T. The test statistic D is calculated as:
d =
(N1 − N0)
n(.95)(.05)/4)
(14)
And the P test statistic calculated as:
P − value = erfc(
|d|
√
2
) (15)
with erfc being the complementary error function.
A low d value indicates that the number of peaks in the
sample are outside the 95 percent range of T. A relatively
large d will result in a P-value that is greater than or equal
to the significance level which indicates the sequence maybe
random.
G. Core functions
There are two functions commonly used in the NIST tests,
IGAMC and erfc. Both serve different purposes but between
them, the two are included in every test. The function erfc,
or complementary error function, is a ANSI defined formula
included in the C math.h by default; however, NIST STS
utilizes the CEPHES version. A complementary error function
is used to calculate the likelihood a value is outside of the
defined range - often the range is defined by a Gaussian
distribution. [5]
IGAMC is an incomplete gamma function that returns a
normalized range between 0 and 1. Two parameters are passed
to the function, a and x, and the function inverts when a = x.
Both parameters must be non-negative but x may equal zero
( a > 0 and x >= 0). [5]
IGAMC is defined
P(a, x) =
1
Γ(a)
x
0
e−tta−1
dt (16)
where P(a,0) = 0 and P(a,1) = 1
H. Constraints and Improvements
Despite the well constructed source code for the STS, these
suites still have a critical issue: scaling to massive datasets.
Initial runs of SHA3-256 were run using a compiled version
of the official C source. The key error: the data size would
produce IGAMC overflow errors. IGAMC is an incomplete
gamma function used to create a custom distribution for
hypothesis testing. This function is used in six of the fifteen
tests.
IGAMC should be able to scale to an infinite population
size but has a built-in constraint in the NIST implementation:
it can only scale to the size of memory on a single server
node. This severely limits the scale of the tests and our test
server couldn’t surpass a few million hashes before creating an
overflow error. Memory issues are the core of most of NISTs
limitations. Many constants, like the max column and row size
for the Binary Matrix test, appear to be set with protecting
against overflow errors. Despite these measures, the bitstream
size is critical limitation prevent large scale execution.
There has been much research in the area of cryptographic
hash functions, and novel ways of generating pseudoran-
dom and random sequences. Bertoni et al. [6] covered the
submission of the KECCEK sponge function to NIST for
consideration as the SHA3 algorithm. The functions are tested
algebraically in the research, but output from the functions is
not tested empirically. Gholipour and Mirzakuchaki [7] applied
the NIST test suite to output generated from an algorithm
based on the KECCAK hash function (which later was adopted
as SHA3). The sample of data input into the test was much
smaller, however, roughly 150 megabytes of output. We were
unable to find instances of any research that proposed to test
an amount of output data of any algorithm on several orders
of magnitude greater.
III. VALIDATION OF HPC TESTS
The purpose of the testing proofs was to make sure that each
test was both understood and scalable in the context of this
JOURNAL OF ——, 4
(a) Monobit Series (b) Longest Runs Series
(c) Binary Frequency Series (d) Independent Runs Series
(e) Binary Matrix Series
Fig. 1: Five of the six tests P-values by series. The scatter of P-Values indicates the tests and validations are working. Note
the difference in y-axis scaling for the Independent Runs. FFT is excluded as all testing resulted in P-Values indistinguishable
from zero.
project. Accordingly, the source code and mathematical back-
ground of each test is from the NIST documentation and each
test was run against an increasing variety of known scenarios
to ensure that results were as expected before expanding the
testing to ever larger binary strings.
All six tests were conducted using a Lenovo Thinkstation
S20 with 24G ram and 8 cores at 2 gigahertz each. The tests
were validated one at a time and was the sole task of the
computer.
The testing proof methodology was fundamentally the same
for each NIST test used (see appendix for flow chart):
1) SHA2-256 is used to create 10,000 outputs using
random sentences. The test should produce no evidence
against randomness.
2) A repeating series of 0s and 1s that is known to be
non-random is tested. The test should demonstrate
evidence against randomness.
3) Step 1 is repeated but with a larger scale of 1,000,000
sentences.
4) A random binary string produced from Random.Org
should produce no evidence against randomness. Note
that Random.Org vets its services using NIST STS to
ensure its samples have no evidence against randomness.
JOURNAL OF ——, 5
5) Step 1 is repeated but with SHA3-256. A different
10,000 sentences are analyzed.
6) SHA3-256 outputs around another 33,000 sentences
taken from the New King James Bible as source. This is
meant to emulate a large amount of text generation with
cohesive themes unlike our previous random generation.
There should be no evidence against randomness.
7) Large bitstreams created and passed through SHA3-256
in multiple series producing sample sizes between
approximately one-thousand and one million. This
final test should have no evidence against randomness
and should show no significant difference in P-Values
between iterations.
Step 7 is the most important part of the validation process.
The process had to ensure that spurious results were not being
produced as a by-product of the test. For example, the length
in bits should produce a P-Value indistinguishable from other
lengths. Bit lengths were tested at intervals between 1024 and
896,000, doubling each previous iteration for a total of 12
lengths tested. Additionally, there should be no impact on
the sequence used. The each bit length test was conducted
in twenty iterations to that no contiguous seeds or PRNG’s
affected the result of the test. A Tukey-Kramer differences
analysis was conducted between the series and bit lengths to
ensure that all P-Value output was not different at a statistically
significant level.
Inconsistent P-Values are a strong indication that the vali-
dation process worked. Random generation of hashes should
not produce identical P-Values. The noise in figures 1 and 2
between P-Values supports the validation process. Note that
the independent runs test has a larger range and confidence
intervals than other tests producing visually reduced noise.
The FFT test fails all inputs as non-random regardless of
bit length or series. Our diagnostic efforts could not reveal
a flaw in the implementation in time to include it in our
large scale run on the super-computer. Additionally, FFT may
need an entire code redesign to work at scale. One of the
problems we ran into with FFT is that the test needs to be
run against a full data sequence. The reason for this is that
the values of the FFT are created across the entire sequence -
breaking the sequence into multiple pieces does not simply
extend the original calculation. The definition of peaks is
likewise contingent on the full sequence itself. Had our FFT
test returned meaningful values, we were planning to look at
a distribution of p-values on smaller tests as the creation of
a distributed FFT model was out of scope for our project.
Consequently, the FFT test was not included in the full run.
IV. SHA3 HASH GENERATION
A python script automated the generation of SHA3 256-
bit hashes using Keccak’s python implementation. We utilized
their Keccak function with a message pair of the input hex
string length and string. The string was generated by seeding
the process with the string composed of 77 characters then
adding one per hash generation. The bitrate was set to 1088
bits with a capacity of 256 bits. Per Keccak’s implementation
a suffix for SHA3 was selected as 0x06 and the output bit
length was set to 256. [6]
V. RESULTS
The large scale testing was completed at Southern Methodist
University’s ManeFrame super computer. ManeFrame uses a
queue based system to manage tasks and consequently, all
jobs were interlaced with the processing of other computations.
ManeFrame runs Scientific Linux 6 (64bit) using the SLURM
scheduler. ManeFrame has several queue types depending on
duration and node requirements. Our testing was conducted
on the Serial Queue with a maximum duration of 24 hours
per job allocation. The Serial Queue has access to 384 worker
nodes, each with 24 GB of Ram and 8-core Intel(R) Xeon(R)
CPU X5560 @ 2.80GHz 107 processors.
Both SHA2 and SHA3 256 were assessed. Two variants of
the suite were utilized. A full test suite used all test methods. A
quick test suite, composed of Monobit and Binary Frequency,
was also launched with the aim to produce as many samples
as possible. In general, the SHA2 hash generation was signifi-
cantly faster than SHA3 hash generation. The SHA2 test suites
both have an order of magnitude greater sample size than their
SHA3 counter parts. A post-mortem of python implementation
revealed a Python construct that throttled SHA3 generations.
This defect was fixed and did not result in abhorrent hash
outputs - everything was functioning correctly, only slower
than optimal.
A total of 15.267 billion hashes were produced for the
SHA2 full suite. The quick suite produced and tested 101.976
billion samples. SHA3 produced 0.996 and 23.180 billion sam-
ples respective to full and quick suites. All samples generated
within a suite were unique and sequential but the actual value
was not stored due to hard disk limitations.
A. Monobit
The monobit test provided no evidence against randomness
for all test suites. All P-Values returned were larger than
the 0.01 significance level. Both SHA2 samples produced
P-Values above 1.0. This appears to be caused by the test
sequence, Sn, to generate such an approximately even distri-
bution of 0’s and 1’s that when the P-Value was calculated
using Equation 2, the absolute value of Sn was outside the
practical limits of the erfc distribution.
It should be noted that SHA3-256 P-Values decreased with
the increase of sample size by more than double. This maybe a
by-product of randomness during calculation of P-Values with
large sample sizes or it may hint at a real trend. Our current
results cannot distinguish between the two possibilities. Repli-
cations of this experiment, along with storage of confidence
intervals or standard error would enable the necessary methods
to determine if SHA3 Monobit does trend towards a significant
level of 0.01 as the sample size increases.
JOURNAL OF ——, 6
(a) Monobit Bitlength (b) Longest Runs Bitlength
(c) Binary Frequency Bitlength (d) Independent Runs Binlength
(e) Binary Matrix Bit Length
Fig. 2: Five of the six tests P-values by bit length. The scatter of P-Values indicates the tests and validations are working. Note
the difference in y-axis scaling for the Independent Runs. FFT is excluded as all testing resulted in P-Values indistinguishable
from zero.
B. Random Block
The random block test produced no evidence against ran-
domness for all test suites. The SHA2-256 suites had P-Values
of 0.5169 and 0.9343 for full and quick suites respectively.
These are well above the 0.01 significance level indicating an
even distribution of 0s and 1s among random chunks of M-
bit sized blocks. The SHA3-256 also had P-Values larger than
0.01 but at 0.1338 and 0.2762 respective to full and quick,
the output is prone to a more uneven distribution of 0’s and
1’s. As NIST notes, the chosen significance level illustrates
an arbitrary point where the output should be considered
non-random. And as the P-Values approach this point, the
methodology is informing us that the distribution is becoming
more uneven. However, both values are an order of magnitude
above the 0.01 threshold and consequently, do not show any
evidence against randomness.
C. Longest Runs
The SHA2 full test suite calculated a P-Value of 2.0. This
indicates there is no evidence against randomness. It should
be noted that this is an atypical P-Value but is likely caused by
the methodology. The value is calculated with a Chi-Square
that must have resulted in a ratio of 0’s to 1’s that are evenly
distributed as the largest series of 1’s is near 1. We conjecture
that incomplete gamma function did not contain an upper limit
JOURNAL OF ——, 7
P-Values
hash algorithm Sample Size Monobit Longest Runs Independent Runs Binary Frequency Binary Matrix
SHA2-256 15,267,000,000 1.2596801123956967 2.0 1.0 0.51695668919649718 1.0
SHA2-256 101,976,000,000 1.1912280400600554 - - 0.93439071101230919 -
SHA3-256 996,700,000 0.5662553593860834 0.0 1.0 0.13385067814625731 0.19970306365206464
SHA3-256 23,180,000,000 0.19447400097104728 - - 0.27626007166449562 -
TABLE I: The resultant P-Values of each test and the related sample size
large enough to encompass the value produced from the Chi-
Square.
Contrariwise, the SHA3 test produced a P-Value of 0.0.
Assuming an accurate value, this provides evidence against
randomness. This result should be replicated and the methods
should be vetted again before suggesting that SHA3-256 pro-
duces non-random values. This result should be viewed with
additional skepticism since the Independent Runs test results
in no evidence against randomness (see next subsection). The
independent runs test and longest runs tests are very similar.
The independent runs test checks for the number of continuous
0’s and 1’s, ensuring that the length of runs is random. The
largest runs assess the largest series of 1’s in an sequence.
Given the two extremes of results, it is highly probable
that the methodology is either incorrectly implemented or the
test cannot scale to large values. A replication of this test is
especially called for, in addition to increased analysis.
D. Independent Runs
Only the Full Suites of SHA2 and SHA3 ran the indepen-
dent runs test. In both algorithms trials, the P-Value return
with a perfect 1.0, which indicates there is no evidence against
randomness. More precisely, this means that the contiguous
series of 0’s and 1’s are approximately even in size.
E. Binary Matrix
The Binary Matrix suites resulted in no evidence against
randomness. The SHA2 full suite produced a P-Value of 1.0.
The SHA3 test measured a P-Value of 0.1997. Both suggest
that subsets of bit sequences broken into 32x32 matrices do
not have any linear correlation between elements within the
matrix.
VI. CONCLUSION
In this paper we present an initial approach to assessing
hash output as random number generators using large sample
sizes. The tests were chosen from the industry standard NIST
STS and were adapted to not be limited by memory utiliza-
tion. Additionally, we constructed an automation process to
schedule analysis jobs at a massive scale.
Five of the six tests adapted passed validation. The valida-
tion process assessed known random and non-random values.
Additionally, we compared the P-Value output by series and bit
length. The Monobit, Longest Runs, Independent Runs, Binary
Frequency and Binary Matrix tests all passed verification.
However, the FFT or Spectral Test always reported that all
inputs are non-random. This occurred even on SHA2-256
tests of known random sequences. Consequently, this test was
removed from the large scale run. One of the first efforts after
this experiment should be a post-mortem of the FFT code to
diagnose the failure. Spectral FFT is widely considered one
of the most powerful tools to detect subtle deviations against
randomness and its exclusion in our results shift this effort
from a reliable assessment of SHA3 as a random number
generator to an initial pass at execution of large sample size
testing of hash outputs.
Using the updated toolkit, we tested SHA2 and SHA3
output with sample sizes between 0.996 billion and 101.976
billion. All tests demonstrated no evidence against randomness
for SHA2-256. Four of the five tests showed no evidence of
randomness for SHA3 hash output. The Longest Runs test
resulted in a P-Value smaller than a 0.01 significance level. The
method passed the verification process, making it improbable
that an implementation error occurred. Two other possibilities
are likely. First, this result is caused by the test itself being
incapable of scaling. The incomplete gamma function used
to evaluate the sequence might not produce a distribution
compatible with this size of samples. This observation is
supported by the atypical result of a P-Value of 2.0 for the
SHA2 Longest Runs. The other possibility is that SHA3-
256 really does show evidence against non-randomness when
assess at scale. While this result would be a significant blow
to the adoption of SHA3, it is also the least likely given the
overall lack of evidence against randomness. This test needs to
be replicated and analyzed before any conclusion can be made
about SHA3’s ability to act like a random number generator.
In addition to the validation of the Longest Run result and
analysis of FFT testing, there are several additional next steps
that can strengthen the ability to assess randomness in hash
algorithms. The adaptation and validation of the other tests to
no longer be constricted by memory would allow for full NIST
STS parity with large samples. Additionally, the incomplete
gamma function and error function should be assessed from a
theoretical and experimental perspective to ensure it can scale
to large samples.
APPENDIX
Test Validation Flow Chart
JOURNAL OF ——, 8
Fig. 3: Validation Process Flow
JOURNAL OF ——, 9
ACKNOWLEDGMENT
The authors would like to thank Dr. Engels and Dr. Mcgee
at Southern Methodist University.
REFERENCES
[1] B. Schneier, Applied cryptography: protocols, algorithms, and source
code in C. Wiley, 1996.
[2] D. Knuth, Art of Computer Programming, Volume 2: Seminumerical
Algorithms (Page 2). Addison-Wesley, 1997.
[3] NIST, “Sha-3 standard: Permutation-based hash and extendable-output
functions, fips 202,” Tech. Rep.
[4] C. Boutin, “Nist selects winner of secure has algorithm (sha-3) competi-
tion. nist tech beat. [online],” 2012.
[5] Statistical Test Suite for Random and Pseudorandom Number Generators
for Cryptographic Applications., 1st ed., NIST, April 2010.
[6] G. Bertoni, J. Daemen, M. Peeters, and G. V. Assche, “Keccak sponge
function family main document. submission to nist (round 2),” Keccak,
Tech. Rep., 2009.
[7] A. Gholipour and S. Mirzakuchaki, “A pseudorandom number generator
with keccak hash function.” International Journal of Computer and
Electrical Engineering 3.6, 2011.

Mais conteúdo relacionado

Mais procurados

On Improving the Performance of Data Leak Prevention using White-list Approach
On Improving the Performance of Data Leak Prevention using White-list ApproachOn Improving the Performance of Data Leak Prevention using White-list Approach
On Improving the Performance of Data Leak Prevention using White-list Approach
Patrick Nguyen
 
Selection K in K-means Clustering
Selection K in K-means ClusteringSelection K in K-means Clustering
Selection K in K-means Clustering
Junghoon Kim
 
Performance Comparison of Cluster based and Threshold based Algorithms for De...
Performance Comparison of Cluster based and Threshold based Algorithms for De...Performance Comparison of Cluster based and Threshold based Algorithms for De...
Performance Comparison of Cluster based and Threshold based Algorithms for De...
Eswar Publications
 

Mais procurados (18)

Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithms
 
Sccp forces-2009 v3
Sccp forces-2009 v3Sccp forces-2009 v3
Sccp forces-2009 v3
 
Computational Complexity Comparison Of Multi-Sensor Single Target Data Fusion...
Computational Complexity Comparison Of Multi-Sensor Single Target Data Fusion...Computational Complexity Comparison Of Multi-Sensor Single Target Data Fusion...
Computational Complexity Comparison Of Multi-Sensor Single Target Data Fusion...
 
COMPUTATIONAL COMPLEXITY COMPARISON OF MULTI-SENSOR SINGLE TARGET DATA FUSION...
COMPUTATIONAL COMPLEXITY COMPARISON OF MULTI-SENSOR SINGLE TARGET DATA FUSION...COMPUTATIONAL COMPLEXITY COMPARISON OF MULTI-SENSOR SINGLE TARGET DATA FUSION...
COMPUTATIONAL COMPLEXITY COMPARISON OF MULTI-SENSOR SINGLE TARGET DATA FUSION...
 
On Improving the Performance of Data Leak Prevention using White-list Approach
On Improving the Performance of Data Leak Prevention using White-list ApproachOn Improving the Performance of Data Leak Prevention using White-list Approach
On Improving the Performance of Data Leak Prevention using White-list Approach
 
Efficient projections
Efficient projectionsEfficient projections
Efficient projections
 
Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)
 
50120140503004
5012014050300450120140503004
50120140503004
 
A Non Parametric Estimation Based Underwater Target Classifier
A Non Parametric Estimation Based Underwater Target ClassifierA Non Parametric Estimation Based Underwater Target Classifier
A Non Parametric Estimation Based Underwater Target Classifier
 
Clustering techniques
Clustering techniquesClustering techniques
Clustering techniques
 
Sensing Method for Two-Target Detection in Time-Constrained Vector Poisson Ch...
Sensing Method for Two-Target Detection in Time-Constrained Vector Poisson Ch...Sensing Method for Two-Target Detection in Time-Constrained Vector Poisson Ch...
Sensing Method for Two-Target Detection in Time-Constrained Vector Poisson Ch...
 
Propagation of Error Bounds due to Active Subspace Reduction
Propagation of Error Bounds due to Active Subspace ReductionPropagation of Error Bounds due to Active Subspace Reduction
Propagation of Error Bounds due to Active Subspace Reduction
 
Concurrency Control for Parallel Machine Learning
Concurrency Control for Parallel Machine LearningConcurrency Control for Parallel Machine Learning
Concurrency Control for Parallel Machine Learning
 
Algoritmo quântico
Algoritmo quânticoAlgoritmo quântico
Algoritmo quântico
 
Selection K in K-means Clustering
Selection K in K-means ClusteringSelection K in K-means Clustering
Selection K in K-means Clustering
 
Multiple Dimensional Fault Tolerant Schemes for Crypto Stream Ciphers
Multiple Dimensional Fault Tolerant Schemes for Crypto Stream CiphersMultiple Dimensional Fault Tolerant Schemes for Crypto Stream Ciphers
Multiple Dimensional Fault Tolerant Schemes for Crypto Stream Ciphers
 
Performance Comparison of Cluster based and Threshold based Algorithms for De...
Performance Comparison of Cluster based and Threshold based Algorithms for De...Performance Comparison of Cluster based and Threshold based Algorithms for De...
Performance Comparison of Cluster based and Threshold based Algorithms for De...
 
Computing probabilistic queries in the presence of uncertainty via probabilis...
Computing probabilistic queries in the presence of uncertainty via probabilis...Computing probabilistic queries in the presence of uncertainty via probabilis...
Computing probabilistic queries in the presence of uncertainty via probabilis...
 

Semelhante a HPC_NIST_SHA3

Semelhante a HPC_NIST_SHA3 (20)

AN ALTERNATIVE APPROACH FOR SELECTION OF PSEUDO RANDOM NUMBERS FOR ONLINE EXA...
AN ALTERNATIVE APPROACH FOR SELECTION OF PSEUDO RANDOM NUMBERS FOR ONLINE EXA...AN ALTERNATIVE APPROACH FOR SELECTION OF PSEUDO RANDOM NUMBERS FOR ONLINE EXA...
AN ALTERNATIVE APPROACH FOR SELECTION OF PSEUDO RANDOM NUMBERS FOR ONLINE EXA...
 
Cost Optimized Design Technique for Pseudo-Random Numbers in Cellular Automata
Cost Optimized Design Technique for Pseudo-Random Numbers in Cellular AutomataCost Optimized Design Technique for Pseudo-Random Numbers in Cellular Automata
Cost Optimized Design Technique for Pseudo-Random Numbers in Cellular Automata
 
A Modular approach on Statistical Randomness Study of bit sequences
A Modular approach on Statistical Randomness Study of bit sequencesA Modular approach on Statistical Randomness Study of bit sequences
A Modular approach on Statistical Randomness Study of bit sequences
 
Computational intelligence based simulated annealing guided key generation in...
Computational intelligence based simulated annealing guided key generation in...Computational intelligence based simulated annealing guided key generation in...
Computational intelligence based simulated annealing guided key generation in...
 
Random Keying Technique for Security in Wireless Sensor Networks Based on Mem...
Random Keying Technique for Security in Wireless Sensor Networks Based on Mem...Random Keying Technique for Security in Wireless Sensor Networks Based on Mem...
Random Keying Technique for Security in Wireless Sensor Networks Based on Mem...
 
Efficient Query Evaluation of Probabilistic Top-k Queries in Wireless Sensor ...
Efficient Query Evaluation of Probabilistic Top-k Queries in Wireless Sensor ...Efficient Query Evaluation of Probabilistic Top-k Queries in Wireless Sensor ...
Efficient Query Evaluation of Probabilistic Top-k Queries in Wireless Sensor ...
 
Estimating Space-Time Covariance from Finite Sample Sets
Estimating Space-Time Covariance from Finite Sample SetsEstimating Space-Time Covariance from Finite Sample Sets
Estimating Space-Time Covariance from Finite Sample Sets
 
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
 
A STATISTICAL COMPARATIVE STUDY OF SOME SORTING ALGORITHMS
A STATISTICAL COMPARATIVE STUDY OF SOME SORTING ALGORITHMSA STATISTICAL COMPARATIVE STUDY OF SOME SORTING ALGORITHMS
A STATISTICAL COMPARATIVE STUDY OF SOME SORTING ALGORITHMS
 
A statistical comparative study of
A statistical comparative study ofA statistical comparative study of
A statistical comparative study of
 
First paper with the NITheCS affiliation
First paper with the NITheCS affiliationFirst paper with the NITheCS affiliation
First paper with the NITheCS affiliation
 
A linear-Discriminant-Analysis-Based Approach to Enhance the Performance of F...
A linear-Discriminant-Analysis-Based Approach to Enhance the Performance of F...A linear-Discriminant-Analysis-Based Approach to Enhance the Performance of F...
A linear-Discriminant-Analysis-Based Approach to Enhance the Performance of F...
 
An SPRT Procedure for an Ungrouped Data using MMLE Approach
An SPRT Procedure for an Ungrouped Data using MMLE ApproachAn SPRT Procedure for an Ungrouped Data using MMLE Approach
An SPRT Procedure for an Ungrouped Data using MMLE Approach
 
Multiple Dimensional Fault Tolerant Schemes for Crypto Stream Ciphers
Multiple Dimensional Fault Tolerant Schemes for Crypto Stream CiphersMultiple Dimensional Fault Tolerant Schemes for Crypto Stream Ciphers
Multiple Dimensional Fault Tolerant Schemes for Crypto Stream Ciphers
 
Design of optimized Interval Arithmetic Multiplier
Design of optimized Interval Arithmetic MultiplierDesign of optimized Interval Arithmetic Multiplier
Design of optimized Interval Arithmetic Multiplier
 
New Data Association Technique for Target Tracking in Dense Clutter Environme...
New Data Association Technique for Target Tracking in Dense Clutter Environme...New Data Association Technique for Target Tracking in Dense Clutter Environme...
New Data Association Technique for Target Tracking in Dense Clutter Environme...
 
Design of ternary sequence using msaa
Design of ternary sequence using msaaDesign of ternary sequence using msaa
Design of ternary sequence using msaa
 
08 entropie
08 entropie08 entropie
08 entropie
 
SVM - Functional Verification
SVM - Functional VerificationSVM - Functional Verification
SVM - Functional Verification
 
Compressive Data Gathering using NACS in Wireless Sensor Network
Compressive Data Gathering using NACS in Wireless Sensor NetworkCompressive Data Gathering using NACS in Wireless Sensor Network
Compressive Data Gathering using NACS in Wireless Sensor Network
 

HPC_NIST_SHA3

  • 1. JOURNAL OF ——, 1 Analysis of SHA3-256 as a Random Number Generator Andrew Pollock, Chad Maybin and Elliott Whitling Abstract—In this paper, we present an empirical evaluation of the SHA3-256 hash algorithms’ output statistical randomness. Hash algorithms must produce random and independent output in order to be considered cryptographically secure. This output must be random over varying sizes of data sets. Testing for Randomness has historically been conducted using the industry standard NIST STS. However, this tool is designed for statistical analysis of small datasets and does not scale to larger sample sizes. To assess SHA3-256, we adapted several tests from STS to run on massive data sets. Assessment of Randomness changes with scale and consequently, theoretical computational statistics were given significant consideration. Between 996 million and 101 billion samples were tested using five of the NIST STS tests. SHA3-256 hashes showed no evidence against randomness in four of the five tests. However, the longest runs test did show evidence against randomness and the experiment should be replicated. Overall, a first pass at Big Data statistical cryptanalysis is presented with many opportunities for improving on both STS and our additions. Index Terms—SHA3, Statistics, HPC, NIST, STS I. INTRODUCTION THe Keccak sponge function was accepted as the SHA3 standard in FIPS 202. The National Institute of Standards and Technology (NIST) conducted extensive cryptographic testing under the broad battery known as Cryptographic Al- gorithm Validation Program (”CAVP”) to validate candidate functions. NIST uses the Statistical Testing Suite (”STS”) to ensure that the output of an encryption algorithm are suitably random. If an algorithm were found to be predictable, the strength of the encryption would suffer. As an algorithm weak- ens, it becomes much easier for an attacker to compromise communications which are encrypted with this algorithm. The father of modern computing, John Von Neumann, warned against the mathematical generation of random numbers. To paraphrase, semi-numeric algorithmic approaches to random number generation cannot achieve perfect randomness as, given identical input and sufficient compute time, the value could be reliably reproduced. Schneier [1] articulates two traits for cryptographically secure pseudo-random sequences: 1) It looks random. This means that it passes all the statistical tests of randomness that we can find. 2) It is unpredictable. It must be computationally infeasible to predict what the next random bit will be, given com- plete knowledge of the algorithm or hardware generating the sequence and all of the previous bits in the stream. The authors are with the Masters of Data Science program, Southern Methodist University, Dallas, Tx, 75205 USA e-mail: please direct commu- nication to ewhitling@smu.edu Manuscript received ——; revised ——–. In the seminal book, Art of Computer Programming Volume 2, Knuth [2] describes fundamental statistical tools to test pseudo-random sequences. Of the discussed tests, the spectral test is considered to be one of the most valuable as it can identify prominent pseudo-random generators (PRNG) like a linear feedback shift register. These were the first attempts to assess hash functions as random generators. With the Big Data revolution, came the ability to compute previously unheard of sample sizes. The NIST STS is not designed to accommodate massive data. We propose and deliver the next step in random number generation testing: a test suite designed to scale infinitely towards the total output space of a random number generator function. In this paper, we attempt to empirically determine the randomness, and therefore the security, of SHA3-256 by applying tests based on NIST Statistical Testing Suite to large sequences of values generated using SHA. SHA3 is an ideal first candidate to test near-population data samples (NPDS). The keccak sponge function was recently awarded the winner of the SHA3 standard competition, making it the next hash algorithm to be universally adopted.[3] NIST used STS to assess Keccak but was limited to a sample size contained within the memory of a single computer.[4] Given the importance of randomness in Hash Sequences, very little effort has been put forth by the security community to assess significant samples from the total output population. The emergence of Big Data techniques from the Data Science community warrants adaptation to cryptography problems. In this paper, we will present the results of NPDS testing that is several orders of magnitude larger than the NIST test. The history of detecting non-randomness has been focused on the theoretical crafting of Psudo-Random Number Gener- ators PRNG against known statistical techniques. However, little emphasis has been placed on applying these tools at suitable scale. All implementations known to the authors have only conducted tests to data sets that can be stored in memory. In the source code of STS, the authors explicitly declare such a limitation. Six STS tests were adapted to scale up: the monobit, random block, longest runs, independent runs, binary matrix and spectral tests. Five of the six tests passed a validation process designed to ensure consistent results with the existing NIST STS toolset. The Spectral FFT test failed validation as it always reported that any input had evidence against ran- domness. It was therefore not included in the large scale tests. The other five tests ran between 996 million and 101 billion samples. Only one test, Longest Runs, suggested evidence against randomness for SHA3-256.
  • 2. JOURNAL OF ——, 2 Our results suggest that the Keccak Sponge Function as SHA3-256 is probably a suitable replacement for SHA2. Despite a test providing evidence against randomness, the overwhelming conclusion is that SHA3-256 appears random. The Longest Runs test should be re-validated and replicated to ensure a consistent result. Further, the Spectral test should be assessed. It is largely considered one of the superior statistical tests for testing randomness and its failure at scale is unsettling. Finally, we recommend the community as a whole rethink the scale of statistical testing. The current toolset is not truly ready to handle computation and calculation beyond the limitation of a single CPU and RAM. The remainder of this paper is organized as follows. Sec- tion II, provides a brief explanation of the mathematical back- ground of the NIST STS tests. In Section III, we present our approach to validating the tests. Section IV briefly describes our SHA3 generation process. Section V contains our results from one-hundred billion SHA3-256 hashes analyzed with the tool. In this section, the main focus is on the statistical security of SHA3-256 but will also touch briefly on validation using known SHA2 datasets. The final section, VI, concludes with a summary of major results, a discussion of limitations of the new toolset and suggested steps for further research. II. NIST TESTS In this section we cover the basics of the NIST STS tests. The core functions for creating P-Values are also discussed. At the end, we summarize modern constraints with STS focusing on scaling. A. Monobit The monobit test is perhaps the most simplistic of all the STS tests. Given any hash input as binary, a truly random population should have roughly equal quantities of 0’s and 1’s. A test sequence, Sn, is generated by transforming the generated bit sequence ( ) into a sum of bits, each doubled and minus one. Sn = n i=1 2 i − 1 (1) And the P test statistic calculated as: P − value = erfc( |Sn| √ 2 ) (2) with erfc being the complementary error function. A computed P-value that is smaller than then 0.01 indicates that the sequence is likely to be random. The directionality of Sn signifies the relative proportion of 0’s and 1’s in the sequence. A large Sn means that a larger quantity of 1’s exist in input than 0’s. NIST recommends that test sequences are composed of at least 100 bits. B. Random Block The Random Block (also known as binary frequency) test is an extension of Monobit. Again, the quantity of 0’s and 1’s should be approximately equal but in this version the test is within a M-bit block instead of an input sequence. This test is identical to Monobit when M = 1. A series of test sequences are created using: N = | n M |. When M = 4 for a 256 bit SHA-3 sequence, then N = 128. The ratio is calculated with: πi = M j=1(i − 1)M + j M (3) The test statistic is calculated as: X2 (obs) = 4M N i=1 (πi − 1/2)2 (4) The test statistic is then compared with a P-Value calculated as: P − V alue = igamc(N/2, X2 (obs) 2 ) (5) As the P-value approaches zero, the ratio of 0’s and 1’s become uneven. C. Independent Runs The independent runs test checks for contiguous series of 0’s or 1’s. The goal is to ensure the continuous length of the same bit is random. The ratio of ones is calculated with: π = j j n (6) The test statistic is calculated: Vn(obs) = n−1 k=1 r(k) + 1 (7) where r(k) = 1 when k = k+1, else 0. And the P-value calculated as: P − value = erfc( |Vn(obs) − 2nπ(1 − π)| 2 √ 2nπ(1 − π) ) (8) with erfc being the complementary error function. D. Longest Runs Longest runs assess the largest series of 1’s in an input sequence. The test statistic is calculated: X2 (obs) = K i=0 (vi − Nπi)2 Nπi (9) when M = 8, K = 3 and N = 16, M = 128 then K = 5 and N = 49, and when M = 104 then K = 6 and N = 75. The test statistic is then compared with a P-Value calculated as: P − V alue = igamc( k 2 , X2 (obs) 2 ) (10) A sequence is considered non-random if the P-value is greater than the 0.01 threshold.
  • 3. JOURNAL OF ——, 3 E. Binary Matrix The Binary Matrix Rank test (section 2.5 of [5]) is intended to test for linear correlation of subsets of the bitsequence. The test follows an algorithm of iterating over blocks where n is the length of bits, M is the rows and Q is the columns of the Matrix. Both M and Q are hard coded to 32 in the NIST STS implementation. The test starts by creating blocks using: N = | n MQ | (11) Each block N is filled row by row sequentially then any remaining bits are transferred to the next block. Each matrix is then ranked. A chi-square test is then conducted with: X2 (obs) = (FM − 0.2888N)2 0.2888N + (12) (FM − 1 − 0.5776N)2 0.5776N + (N − FM − FM − 1 − 0.1336N)2 0.1336N Where FM are the count of matrices with full rank, FM − 1 are the count of matrices with full rank minus one and N − FM − FM − 1 are the remainder. The test statistic is then compared with a P-Value calculated as: P − V alue = igamc(1, X2 (obs) 2 )e−t (13) Large X2 (obs) values indicate deviation of rank distribution from that of a random sequence. F. Spectral The spectral test (section 2.6 of [5]) is based on the Linear Algebra approach of mapping Euclidean distances into Hamming, Banach or Hilbert space. In this case, the algorithm maps data from Hamming space and mapping the point as either a negative or positive unit. Once mapped to a sequence, X, a Discrete Fourier Transformation (DFT) is applied. DFT takes the sequence and returns Fourier coefficients or a set samples of the sequence extracted for faster computation. This new sequence has a modulus function applied that produces a series of peaks from the original data. These peaks should be less than 95 percent of total samples divided by two, T. The theoretical value, N0, is compared to the actual number of peaks, N1, less than T. The test statistic D is calculated as: d = (N1 − N0) n(.95)(.05)/4) (14) And the P test statistic calculated as: P − value = erfc( |d| √ 2 ) (15) with erfc being the complementary error function. A low d value indicates that the number of peaks in the sample are outside the 95 percent range of T. A relatively large d will result in a P-value that is greater than or equal to the significance level which indicates the sequence maybe random. G. Core functions There are two functions commonly used in the NIST tests, IGAMC and erfc. Both serve different purposes but between them, the two are included in every test. The function erfc, or complementary error function, is a ANSI defined formula included in the C math.h by default; however, NIST STS utilizes the CEPHES version. A complementary error function is used to calculate the likelihood a value is outside of the defined range - often the range is defined by a Gaussian distribution. [5] IGAMC is an incomplete gamma function that returns a normalized range between 0 and 1. Two parameters are passed to the function, a and x, and the function inverts when a = x. Both parameters must be non-negative but x may equal zero ( a > 0 and x >= 0). [5] IGAMC is defined P(a, x) = 1 Γ(a) x 0 e−tta−1 dt (16) where P(a,0) = 0 and P(a,1) = 1 H. Constraints and Improvements Despite the well constructed source code for the STS, these suites still have a critical issue: scaling to massive datasets. Initial runs of SHA3-256 were run using a compiled version of the official C source. The key error: the data size would produce IGAMC overflow errors. IGAMC is an incomplete gamma function used to create a custom distribution for hypothesis testing. This function is used in six of the fifteen tests. IGAMC should be able to scale to an infinite population size but has a built-in constraint in the NIST implementation: it can only scale to the size of memory on a single server node. This severely limits the scale of the tests and our test server couldn’t surpass a few million hashes before creating an overflow error. Memory issues are the core of most of NISTs limitations. Many constants, like the max column and row size for the Binary Matrix test, appear to be set with protecting against overflow errors. Despite these measures, the bitstream size is critical limitation prevent large scale execution. There has been much research in the area of cryptographic hash functions, and novel ways of generating pseudoran- dom and random sequences. Bertoni et al. [6] covered the submission of the KECCEK sponge function to NIST for consideration as the SHA3 algorithm. The functions are tested algebraically in the research, but output from the functions is not tested empirically. Gholipour and Mirzakuchaki [7] applied the NIST test suite to output generated from an algorithm based on the KECCAK hash function (which later was adopted as SHA3). The sample of data input into the test was much smaller, however, roughly 150 megabytes of output. We were unable to find instances of any research that proposed to test an amount of output data of any algorithm on several orders of magnitude greater. III. VALIDATION OF HPC TESTS The purpose of the testing proofs was to make sure that each test was both understood and scalable in the context of this
  • 4. JOURNAL OF ——, 4 (a) Monobit Series (b) Longest Runs Series (c) Binary Frequency Series (d) Independent Runs Series (e) Binary Matrix Series Fig. 1: Five of the six tests P-values by series. The scatter of P-Values indicates the tests and validations are working. Note the difference in y-axis scaling for the Independent Runs. FFT is excluded as all testing resulted in P-Values indistinguishable from zero. project. Accordingly, the source code and mathematical back- ground of each test is from the NIST documentation and each test was run against an increasing variety of known scenarios to ensure that results were as expected before expanding the testing to ever larger binary strings. All six tests were conducted using a Lenovo Thinkstation S20 with 24G ram and 8 cores at 2 gigahertz each. The tests were validated one at a time and was the sole task of the computer. The testing proof methodology was fundamentally the same for each NIST test used (see appendix for flow chart): 1) SHA2-256 is used to create 10,000 outputs using random sentences. The test should produce no evidence against randomness. 2) A repeating series of 0s and 1s that is known to be non-random is tested. The test should demonstrate evidence against randomness. 3) Step 1 is repeated but with a larger scale of 1,000,000 sentences. 4) A random binary string produced from Random.Org should produce no evidence against randomness. Note that Random.Org vets its services using NIST STS to ensure its samples have no evidence against randomness.
  • 5. JOURNAL OF ——, 5 5) Step 1 is repeated but with SHA3-256. A different 10,000 sentences are analyzed. 6) SHA3-256 outputs around another 33,000 sentences taken from the New King James Bible as source. This is meant to emulate a large amount of text generation with cohesive themes unlike our previous random generation. There should be no evidence against randomness. 7) Large bitstreams created and passed through SHA3-256 in multiple series producing sample sizes between approximately one-thousand and one million. This final test should have no evidence against randomness and should show no significant difference in P-Values between iterations. Step 7 is the most important part of the validation process. The process had to ensure that spurious results were not being produced as a by-product of the test. For example, the length in bits should produce a P-Value indistinguishable from other lengths. Bit lengths were tested at intervals between 1024 and 896,000, doubling each previous iteration for a total of 12 lengths tested. Additionally, there should be no impact on the sequence used. The each bit length test was conducted in twenty iterations to that no contiguous seeds or PRNG’s affected the result of the test. A Tukey-Kramer differences analysis was conducted between the series and bit lengths to ensure that all P-Value output was not different at a statistically significant level. Inconsistent P-Values are a strong indication that the vali- dation process worked. Random generation of hashes should not produce identical P-Values. The noise in figures 1 and 2 between P-Values supports the validation process. Note that the independent runs test has a larger range and confidence intervals than other tests producing visually reduced noise. The FFT test fails all inputs as non-random regardless of bit length or series. Our diagnostic efforts could not reveal a flaw in the implementation in time to include it in our large scale run on the super-computer. Additionally, FFT may need an entire code redesign to work at scale. One of the problems we ran into with FFT is that the test needs to be run against a full data sequence. The reason for this is that the values of the FFT are created across the entire sequence - breaking the sequence into multiple pieces does not simply extend the original calculation. The definition of peaks is likewise contingent on the full sequence itself. Had our FFT test returned meaningful values, we were planning to look at a distribution of p-values on smaller tests as the creation of a distributed FFT model was out of scope for our project. Consequently, the FFT test was not included in the full run. IV. SHA3 HASH GENERATION A python script automated the generation of SHA3 256- bit hashes using Keccak’s python implementation. We utilized their Keccak function with a message pair of the input hex string length and string. The string was generated by seeding the process with the string composed of 77 characters then adding one per hash generation. The bitrate was set to 1088 bits with a capacity of 256 bits. Per Keccak’s implementation a suffix for SHA3 was selected as 0x06 and the output bit length was set to 256. [6] V. RESULTS The large scale testing was completed at Southern Methodist University’s ManeFrame super computer. ManeFrame uses a queue based system to manage tasks and consequently, all jobs were interlaced with the processing of other computations. ManeFrame runs Scientific Linux 6 (64bit) using the SLURM scheduler. ManeFrame has several queue types depending on duration and node requirements. Our testing was conducted on the Serial Queue with a maximum duration of 24 hours per job allocation. The Serial Queue has access to 384 worker nodes, each with 24 GB of Ram and 8-core Intel(R) Xeon(R) CPU X5560 @ 2.80GHz 107 processors. Both SHA2 and SHA3 256 were assessed. Two variants of the suite were utilized. A full test suite used all test methods. A quick test suite, composed of Monobit and Binary Frequency, was also launched with the aim to produce as many samples as possible. In general, the SHA2 hash generation was signifi- cantly faster than SHA3 hash generation. The SHA2 test suites both have an order of magnitude greater sample size than their SHA3 counter parts. A post-mortem of python implementation revealed a Python construct that throttled SHA3 generations. This defect was fixed and did not result in abhorrent hash outputs - everything was functioning correctly, only slower than optimal. A total of 15.267 billion hashes were produced for the SHA2 full suite. The quick suite produced and tested 101.976 billion samples. SHA3 produced 0.996 and 23.180 billion sam- ples respective to full and quick suites. All samples generated within a suite were unique and sequential but the actual value was not stored due to hard disk limitations. A. Monobit The monobit test provided no evidence against randomness for all test suites. All P-Values returned were larger than the 0.01 significance level. Both SHA2 samples produced P-Values above 1.0. This appears to be caused by the test sequence, Sn, to generate such an approximately even distri- bution of 0’s and 1’s that when the P-Value was calculated using Equation 2, the absolute value of Sn was outside the practical limits of the erfc distribution. It should be noted that SHA3-256 P-Values decreased with the increase of sample size by more than double. This maybe a by-product of randomness during calculation of P-Values with large sample sizes or it may hint at a real trend. Our current results cannot distinguish between the two possibilities. Repli- cations of this experiment, along with storage of confidence intervals or standard error would enable the necessary methods to determine if SHA3 Monobit does trend towards a significant level of 0.01 as the sample size increases.
  • 6. JOURNAL OF ——, 6 (a) Monobit Bitlength (b) Longest Runs Bitlength (c) Binary Frequency Bitlength (d) Independent Runs Binlength (e) Binary Matrix Bit Length Fig. 2: Five of the six tests P-values by bit length. The scatter of P-Values indicates the tests and validations are working. Note the difference in y-axis scaling for the Independent Runs. FFT is excluded as all testing resulted in P-Values indistinguishable from zero. B. Random Block The random block test produced no evidence against ran- domness for all test suites. The SHA2-256 suites had P-Values of 0.5169 and 0.9343 for full and quick suites respectively. These are well above the 0.01 significance level indicating an even distribution of 0s and 1s among random chunks of M- bit sized blocks. The SHA3-256 also had P-Values larger than 0.01 but at 0.1338 and 0.2762 respective to full and quick, the output is prone to a more uneven distribution of 0’s and 1’s. As NIST notes, the chosen significance level illustrates an arbitrary point where the output should be considered non-random. And as the P-Values approach this point, the methodology is informing us that the distribution is becoming more uneven. However, both values are an order of magnitude above the 0.01 threshold and consequently, do not show any evidence against randomness. C. Longest Runs The SHA2 full test suite calculated a P-Value of 2.0. This indicates there is no evidence against randomness. It should be noted that this is an atypical P-Value but is likely caused by the methodology. The value is calculated with a Chi-Square that must have resulted in a ratio of 0’s to 1’s that are evenly distributed as the largest series of 1’s is near 1. We conjecture that incomplete gamma function did not contain an upper limit
  • 7. JOURNAL OF ——, 7 P-Values hash algorithm Sample Size Monobit Longest Runs Independent Runs Binary Frequency Binary Matrix SHA2-256 15,267,000,000 1.2596801123956967 2.0 1.0 0.51695668919649718 1.0 SHA2-256 101,976,000,000 1.1912280400600554 - - 0.93439071101230919 - SHA3-256 996,700,000 0.5662553593860834 0.0 1.0 0.13385067814625731 0.19970306365206464 SHA3-256 23,180,000,000 0.19447400097104728 - - 0.27626007166449562 - TABLE I: The resultant P-Values of each test and the related sample size large enough to encompass the value produced from the Chi- Square. Contrariwise, the SHA3 test produced a P-Value of 0.0. Assuming an accurate value, this provides evidence against randomness. This result should be replicated and the methods should be vetted again before suggesting that SHA3-256 pro- duces non-random values. This result should be viewed with additional skepticism since the Independent Runs test results in no evidence against randomness (see next subsection). The independent runs test and longest runs tests are very similar. The independent runs test checks for the number of continuous 0’s and 1’s, ensuring that the length of runs is random. The largest runs assess the largest series of 1’s in an sequence. Given the two extremes of results, it is highly probable that the methodology is either incorrectly implemented or the test cannot scale to large values. A replication of this test is especially called for, in addition to increased analysis. D. Independent Runs Only the Full Suites of SHA2 and SHA3 ran the indepen- dent runs test. In both algorithms trials, the P-Value return with a perfect 1.0, which indicates there is no evidence against randomness. More precisely, this means that the contiguous series of 0’s and 1’s are approximately even in size. E. Binary Matrix The Binary Matrix suites resulted in no evidence against randomness. The SHA2 full suite produced a P-Value of 1.0. The SHA3 test measured a P-Value of 0.1997. Both suggest that subsets of bit sequences broken into 32x32 matrices do not have any linear correlation between elements within the matrix. VI. CONCLUSION In this paper we present an initial approach to assessing hash output as random number generators using large sample sizes. The tests were chosen from the industry standard NIST STS and were adapted to not be limited by memory utiliza- tion. Additionally, we constructed an automation process to schedule analysis jobs at a massive scale. Five of the six tests adapted passed validation. The valida- tion process assessed known random and non-random values. Additionally, we compared the P-Value output by series and bit length. The Monobit, Longest Runs, Independent Runs, Binary Frequency and Binary Matrix tests all passed verification. However, the FFT or Spectral Test always reported that all inputs are non-random. This occurred even on SHA2-256 tests of known random sequences. Consequently, this test was removed from the large scale run. One of the first efforts after this experiment should be a post-mortem of the FFT code to diagnose the failure. Spectral FFT is widely considered one of the most powerful tools to detect subtle deviations against randomness and its exclusion in our results shift this effort from a reliable assessment of SHA3 as a random number generator to an initial pass at execution of large sample size testing of hash outputs. Using the updated toolkit, we tested SHA2 and SHA3 output with sample sizes between 0.996 billion and 101.976 billion. All tests demonstrated no evidence against randomness for SHA2-256. Four of the five tests showed no evidence of randomness for SHA3 hash output. The Longest Runs test resulted in a P-Value smaller than a 0.01 significance level. The method passed the verification process, making it improbable that an implementation error occurred. Two other possibilities are likely. First, this result is caused by the test itself being incapable of scaling. The incomplete gamma function used to evaluate the sequence might not produce a distribution compatible with this size of samples. This observation is supported by the atypical result of a P-Value of 2.0 for the SHA2 Longest Runs. The other possibility is that SHA3- 256 really does show evidence against non-randomness when assess at scale. While this result would be a significant blow to the adoption of SHA3, it is also the least likely given the overall lack of evidence against randomness. This test needs to be replicated and analyzed before any conclusion can be made about SHA3’s ability to act like a random number generator. In addition to the validation of the Longest Run result and analysis of FFT testing, there are several additional next steps that can strengthen the ability to assess randomness in hash algorithms. The adaptation and validation of the other tests to no longer be constricted by memory would allow for full NIST STS parity with large samples. Additionally, the incomplete gamma function and error function should be assessed from a theoretical and experimental perspective to ensure it can scale to large samples. APPENDIX Test Validation Flow Chart
  • 8. JOURNAL OF ——, 8 Fig. 3: Validation Process Flow
  • 9. JOURNAL OF ——, 9 ACKNOWLEDGMENT The authors would like to thank Dr. Engels and Dr. Mcgee at Southern Methodist University. REFERENCES [1] B. Schneier, Applied cryptography: protocols, algorithms, and source code in C. Wiley, 1996. [2] D. Knuth, Art of Computer Programming, Volume 2: Seminumerical Algorithms (Page 2). Addison-Wesley, 1997. [3] NIST, “Sha-3 standard: Permutation-based hash and extendable-output functions, fips 202,” Tech. Rep. [4] C. Boutin, “Nist selects winner of secure has algorithm (sha-3) competi- tion. nist tech beat. [online],” 2012. [5] Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications., 1st ed., NIST, April 2010. [6] G. Bertoni, J. Daemen, M. Peeters, and G. V. Assche, “Keccak sponge function family main document. submission to nist (round 2),” Keccak, Tech. Rep., 2009. [7] A. Gholipour and S. Mirzakuchaki, “A pseudorandom number generator with keccak hash function.” International Journal of Computer and Electrical Engineering 3.6, 2011.