2. +
Business (legal) use case
• Duty to disclose information (FRCP Rule 26)
• Preserve relevant information
• Produce information on request
• Keep the information for X years
• Sanctions for obstruction
• Sanctions for non-compliance
5. +
Discovery basics
• Obligations of the parties
• At the start of a lawsuit, or when litigation
is anticipated, preserve relevant data
• Produce data at request, within timelines
• Review the data before production
• Can request eDiscovery from opponents
• Store and archive
6. +
Interesting facts about eDiscovery
• Most of these are proprietary or under NDA
• Representative case size: 5GB to 500GB
• Cost per GB of processing: $5-$200, roughly $100
• Takes 25-50% of litigation budget
• Days to process and months to review
• Preservation: 3-7 years
• 500 providers, with 10 majors
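The figures above imply a wide range of processing costs. A minimal sketch of the arithmetic follows, assuming the "~$100" figure is a typical per-GB rate; the function name and its parameters are hypothetical and only illustrate the calculation.

```python
def processing_cost_range(case_size_gb, low_per_gb=5.0,
                          typical_per_gb=100.0, high_per_gb=200.0):
    """Return (low, typical, high) processing cost in dollars.

    The default rates come from the slide ($5-$200 per GB, ~$100);
    treating ~$100 as the typical rate is an assumption.
    """
    if case_size_gb < 0:
        raise ValueError("case size must be non-negative")
    return (case_size_gb * low_per_gb,
            case_size_gb * typical_per_gb,
            case_size_gb * high_per_gb)


if __name__ == "__main__":
    # Representative case sizes from the slide: 5 GB and 500 GB.
    for size in (5, 500):
        low, typical, high = processing_cost_range(size)
        print(f"{size} GB: ${low:,.0f} to ${high:,.0f}, "
              f"about ${typical:,.0f} at the typical rate")
```

This covers processing only; review, which the slide says takes months, is a separate cost.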
7. +
Challenges of eDiscovery
• Data sizes in the TB
• Seasonal loads, tight deadlines
• Hundreds of file formats
• Heavy read/write load in review
• Text analytics is of paramount importance
• Huge price tags obstruct justice
8. +
FreeEed main features
• Open source Hadoop-based eDiscovery:
• As scalable as Hadoop
• Fast review with NoSQL
• Scales with the lawsuit - time and volume
• Data preservation and archiving with VM
• Only possible with open source license
9. +
Design goals
• Built on open source components
• Big Data scalable
• Preservation, chain of custody, archiving
• Scalable both technically and as a business
• Stable (don’t laugh: people get different
results on different runs)
• Closed-source compatible (Microsoft and Azure too)
10. +
Packaging architecture
• Comes as VMs
• Grab as few or as many as you want
• No mixing of matters
• No ethical problems
• Preserve for as many years as you want
• 1 VM = 1 corn, FreeEed = free popcorn
14. +
FreeEed popcorn
• Deploy on laptops, servers or cloud
• One-node or any number of nodes
• Scalable storage
• Different cooking recipes
• No mixing of matters
• Easy archiving
• Easy deletion
15. +
Processing architecture
• Based on golden-image VM
• Controlled cluster start in any environment
• Index / cull on the fly or later
• Immediately searchable
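Culling means narrowing a document set before review. The sketch below shows one common reading of it: keep documents inside a date range that mention at least one keyword, and drop exact duplicates by content hash. The `Document` type, the `cull` function, and the choice of criteria are all hypothetical; the slides do not say how FreeEed culls.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    # Hypothetical record type, used for illustration only.
    doc_id: str
    custodian: str
    sent: date
    text: str


def cull(documents, start, end, keywords):
    """Keep documents dated within [start, end] that contain at least
    one keyword, skipping exact duplicates (same SHA-256 of the text)."""
    wanted = [k.lower() for k in keywords]
    seen_hashes = set()
    kept = []
    for doc in documents:
        if not (start <= doc.sent <= end):
            continue  # outside the date range
        lowered = doc.text.lower()
        if not any(k in lowered for k in wanted):
            continue  # no keyword hit
        digest = hashlib.sha256(doc.text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate of a document already kept
        seen_hashes.add(digest)
        kept.append(doc)
    return kept
```

Because each document is judged in a single pass, the same filter could run during processing ("on the fly") or later over stored results.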
17. +
Cloud integration
• Downloadable VMs
• Same VMs on Amazon AWS
• Amazon VMs are very convenient
• Immediate deployment
• Any hardware configuration you need
• Control lots of power from a limited-power laptop
• Azure – working with Microsoft
18. +
Review architecture
• Lucene
• Solr
• HBase
• Lucene indexes created in reducers and
combined in Solr
• For small matters, write directly to Solr
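The idea of building index pieces in reducers and then combining them can be illustrated with a toy inverted index. This is a plain-Python sketch of the pattern only: it does not use the Lucene, Solr, or Hadoop APIs, and every function name in it is hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_partial_index(docs):
    """Stand-in for one reducer: index its share of (doc_id, text) pairs."""
    index = defaultdict(set)
    for doc_id, text in docs:
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def merge_indexes(partials):
    """Stand-in for the combine step: union the posting sets per term."""
    merged = defaultdict(set)
    for partial in partials:
        for term, doc_ids in partial.items():
            merged[term] |= doc_ids
    return merged


def search(index, term):
    """Return the sorted ids of documents containing the term."""
    return sorted(index.get(term.lower(), set()))


if __name__ == "__main__":
    shard_a = [("d1", "Merger agreement draft"), ("d2", "Lunch on Friday")]
    shard_b = [("d3", "Signed merger documents")]
    merged = merge_indexes([build_partial_index(shard_a),
                            build_partial_index(shard_b)])
    print(search(merged, "merger"))
```

For a small matter, the equivalent shortcut is a single call to `build_partial_index` over all documents with no merge step, which mirrors writing directly to Solr.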
25. +
FreeEed and data governance
• Virtualization for data preservation
• Scalable processing
• Archiving
• Document groups are never mixed
• Data stored together with the software that
understands its format
26. +
Hadoop & Big Data applications
• Other related applications
• Financial – text analytics
• Energy – document and procedure
analytics
• Actual on-going projects
27. +
FreeEed as a learning tool
• Hundreds of downloads
• Dozens of active users
• Real-world Hadoop application
• Many developers download to learn
• Complex, real, but manageable
28. +
FreeEed adoption – who is trying
our “popcorn”?
• Large law firms
• Small law firms and solos
• Government agencies
• Universities
• Enterprises
• Developers learning Big Data
30. +
How you can use FreeEed
• For its intended purpose
• Large law firms
• Small firms and solos
• Pro se litigants
• Integrate into legal IT
• Start a similar document management project
32. +
Q&A
• Thank you!
• People usually ask:
• How can I put my data in the cloud?
• Is it safe?
• Do you do OCR, PST, OST, etc…?