6. Parallelize Digesting
- Independent IO and digest threads
- Always have work for the digest algorithm
- Large files saw over 95% of algorithm potential
- Small files unchanged
8. Integrity across the network
- Internal Auditing - prove your hardware
- Peer Auditing - prove your friends
- Digital Signatures - prove identity
- Token Based - prove time
Cover two topics:
- Observations on storage and hashing performance
- Hashing in Chronopolis and larger systems
ACNC RAID - 20MB/s small IO, 62MB/s large block
Hash implementations:
- Java implementation is the default implementation
- Crypto++
- But… what matters is real-world performance, where you have a read/digest pattern
Over time the performance bottleneck flipped: storage subsystems used to be the bottleneck, now the hashing algorithm is. Can we hash enough in time?
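The read/digest pattern above can be illustrated with a minimal sketch using Java's standard `MessageDigest` API. This is the naive sequential form: the digest sits idle while the disk reads, and the disk sits idle while the digest updates, which is why the hashing algorithm becomes the visible bottleneck. Buffer size and algorithm choice here are arbitrary, not the talk's actual configuration.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

public class SequentialDigest {
    // Naive read-then-digest loop: IO and hashing strictly alternate.
    static byte[] digestFile(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buf = new byte[1 << 20]; // 1 MB read buffer (arbitrary size)
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) > 0) {
                md.update(buf, 0, n); // the next read waits on this update
            }
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("seq", ".bin");
        Files.write(tmp, new byte[4 * 1024 * 1024]);
        byte[] d = digestFile(tmp);
        System.out.println(d.length); // SHA-256 digest is 32 bytes
    }
}
```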
Two ways to parallelize: multiple simultaneous files, or threading a single file's digest. The implementation was two threads and a set of buffers that were passed between the threads.
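The two-thread, shared-buffer scheme might look like the following sketch: a reader thread fills buffers while the main thread digests, with buffers recycled through a pair of queues so the digest always has work. Queue depth and buffer size are illustrative choices, not the talk's actual parameters.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelinedDigest {
    static class Chunk {
        final byte[] buf; final int len;
        Chunk(byte[] b, int l) { buf = b; len = l; }
    }

    static byte[] digestFile(Path file) throws Exception {
        BlockingQueue<byte[]> empty = new ArrayBlockingQueue<>(4);
        BlockingQueue<Chunk> full = new ArrayBlockingQueue<>(4);
        for (int i = 0; i < 4; i++) empty.put(new byte[1 << 20]);
        MessageDigest md = MessageDigest.getInstance("SHA-256");

        // Reader thread: pull an empty buffer, fill it from disk, hand it off.
        Thread reader = new Thread(() -> {
            try (InputStream in = Files.newInputStream(file)) {
                while (true) {
                    byte[] b = empty.take();
                    int n = in.read(b);
                    full.put(new Chunk(b, n)); // n == -1 is the EOF sentinel
                    if (n < 0) break;
                }
            } catch (Exception e) { throw new RuntimeException(e); }
        });
        reader.start();

        // Digest thread (here, the caller): hash while the reader reads ahead.
        while (true) {
            Chunk c = full.take();
            if (c.len < 0) break;
            md.update(c.buf, 0, c.len);
            empty.put(c.buf); // recycle the buffer back to the reader
        }
        reader.join();
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("pipe", ".bin");
        Files.write(tmp, new byte[3 * 1024 * 1024 + 7]);
        byte[] pipelined = digestFile(tmp);
        byte[] direct = MessageDigest.getInstance("SHA-256")
                .digest(Files.readAllBytes(tmp));
        System.out.println(java.util.Arrays.equals(pipelined, direct)
                ? "match" : "mismatch");
    }
}
```

The blocking queues double as flow control: the reader can run at most four buffers ahead, bounding memory while keeping the digest fed.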
This is possible today. Question: what happens in the future for recovery?
There is no one-size-fits-all for integrity checking; each approach has its strengths and weaknesses:
- Internal - vulnerable to malice, deletions, etc.
- Peer - requires an existing relationship and data at both sides
- Digital Signatures - trusting the signature hasn't been compromised; if it has, then nothing can be trusted - revocation doesn't really work
- Token - this is ACE: a small piece of information next to the file can prove the file hasn't been tampered with - proves date, but not necessarily identity
What should you use? All of them - whatever is appropriate.
Chronopolis uses ACE internally. Manifests are producer supplied - we create our own tokens due to weak manifests from a producer (md5, etc.). To trace back, we need tokens from the ingestion node.
Single token back to ingest - the token is issued inline with manifest validation, so nodes become transparent. Ideal - tokens issued at the producer (explain how tokens can be issued that early).
There is an ACE token format which packs file identifiers (paths) together with ACE tokens. It is designed to be embedded in a process.
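The idea of packing paths with their tokens can be sketched as a trivial text layout: one path/token pair per line. This layout is hypothetical and is not the real ACE token store format; it only illustrates why bundling the identifier with the token lets the pair travel together through a process.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;

public class TokenStoreSketch {
    // Hypothetical layout: "path<TAB>token", one entry per line.
    static void write(Map<String, String> tokens, Writer out) throws IOException {
        for (Map.Entry<String, String> e : tokens.entrySet()) {
            out.write(e.getKey() + "\t" + e.getValue() + "\n");
        }
    }

    static Map<String, String> read(BufferedReader in) throws IOException {
        Map<String, String> tokens = new LinkedHashMap<>();
        String line;
        while ((line = in.readLine()) != null) {
            int tab = line.indexOf('\t');
            tokens.put(line.substring(0, tab), line.substring(tab + 1));
        }
        return tokens;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> tokens = new LinkedHashMap<>();
        tokens.put("data/file1.tif", "token-aaa"); // placeholder token values
        tokens.put("data/file2.tif", "token-bbb");
        StringWriter sw = new StringWriter();
        write(tokens, sw);
        Map<String, String> back =
                read(new BufferedReader(new StringReader(sw.toString())));
        System.out.println(back.equals(tokens) ? "roundtrip-ok" : "roundtrip-failed");
    }
}
```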
We can use ACE tokens and extended integrity information to prove provenance. In a cloud, digests ONLY validate a non-corrupt transfer - they do not protect against tampering. Most/all cloud systems support extended metadata - use it for advanced integrity information. Tokens are 5-6 extra headers, which allows for end-user validation of data.
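Carrying a token as "5-6 extra headers" could look like the sketch below: the token's fields become user-metadata keys on a cloud object. The header names and field values here are invented for illustration, not the actual ACE header scheme; the point is only that standard object metadata has room for the whole token.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AceMetadataSketch {
    // Hypothetical header names carrying token fields as object metadata.
    static Map<String, String> aceHeaders(String digest, String proof,
                                          String round, String service,
                                          String algorithm, String issued) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("x-ace-file-digest", digest);         // digest of the object
        m.put("x-ace-proof", proof);                // hash path up to the round hash
        m.put("x-ace-round", round);                // token round identifier
        m.put("x-ace-ims-service", service);        // issuing token service
        m.put("x-ace-digest-algorithm", algorithm); // algorithm used
        m.put("x-ace-issued", issued);              // issuance timestamp
        return m;
    }

    public static void main(String[] args) {
        Map<String, String> headers = aceHeaders(
                "d0ff...", "p1/p2", "4021",
                "example-token-service", "SHA-256", "2012-01-01T00:00:00Z");
        // Six headers, in line with the "5-6 extra headers" noted above.
        System.out.println(headers.size());
    }
}
```

An end user who can fetch the object and these headers can re-digest the data and check it against the token without asking the archive anything.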
Reasons for hashing: operational, malice, provenance. Hashing costs: currently flat, however SHA3 may change that (the BLAKE algorithm runs at 15Gbps+).