+
Hadoop-based Open Source eDiscovery:
FreeEed
(Easy as popcorn)
+
Business (legal) use case
2
• Duty to disclose information – FRCP Rule 26
• Preserve relevant information
• Produce information on request
• Keep the information for X years
• Sanctions for obstruction
• Sanctions for non-compliance
+
Before the 1930s
3
• Courtroom was full of surprises
+
Civil discovery changes this
4
+
Discovery basics
5
• Obligations of the parties
• At the start of a lawsuit, or when litigation becomes possible, preserve relevant data
• Produce data on request, within timelines
• Review the data before production
• Can request eDiscovery from opponents
• Store and archive
+
Interesting facts about eDiscovery
6
• Most of these figures are proprietary or under NDA
• Representative case size: 5GB to 500GB
• Cost per GB of processing: $5-200, typically ~$100
• Takes 25-50% of litigation budget
• Days to process and months to review
• Preservation: 3-7 years
• 500 providers, with 10 majors
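
As a rough worked example of the figures above, the sketch below multiplies case size by the per-GB processing cost. The function name processing_cost_range is hypothetical, and reading "~$100" as a typical rate is an assumption, not something the slide states.

```python
# Back-of-the-envelope processing cost, using the ranges quoted on the slide.
# The function name and the "typical rate" reading of ~$100/GB are assumptions.

def processing_cost_range(size_gb, low_per_gb=5, high_per_gb=200, typical_per_gb=100):
    """Return (low, typical, high) processing cost in dollars for one case."""
    return (size_gb * low_per_gb, size_gb * typical_per_gb, size_gb * high_per_gb)


for size_gb in (5, 500):
    low, typical, high = processing_cost_range(size_gb)
    print(f"{size_gb} GB: ${low:,} to ${high:,} (about ${typical:,} at the typical rate)")
```

Under these assumptions, processing alone for a 500GB case lands somewhere between a few thousand and a hundred thousand dollars, before any review costs.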
+
Challenges of eDiscovery
7
• Data sizes in the TB
• Seasonal loads, tight deadlines
• Hundreds of file formats
• Heavy read/write load in review
• Text analytics is of paramount importance
• Huge price tags obstruct justice
+
FreeEed main features
8
• Open source Hadoop-based eDiscovery:
• As scalable as Hadoop
• Fast review with NoSQL
• Scales with the lawsuit - time and volume
• Data preservation and archiving with VM
• Only possible with open source license
+
Design goals
9
• Built on open source components
• Big Data scalable
• Preservation, chain of custody, archiving
• Scalable technically and business-ly
• Stable (don’t laugh, people get different results on different runs)
• Closed-source compatible (MS + Azure too)
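
One way to make the "chain of custody" and "stable" goals concrete is a hash manifest that comes out identical on every run. The sketch below is an illustration using only the Python standard library; the names sha256_of_file and custody_manifest are hypothetical and this is not FreeEed's actual implementation.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large evidence files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_manifest(root):
    """Return a sorted list of (relative_path, sha256) for every file under root."""
    entries = []
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            full_path = os.path.join(directory, name)
            entries.append((os.path.relpath(full_path, root), sha256_of_file(full_path)))
    # Sorting removes any dependence on filesystem listing order,
    # so two runs over the same data produce the same manifest.
    return sorted(entries)


# Small demonstration on throwaway files (contents are made up).
with tempfile.TemporaryDirectory() as demo_root:
    for file_name, content in (("memo.txt", b"quarterly report"), ("email.eml", b"Subject: hello")):
        with open(os.path.join(demo_root, file_name), "wb") as out:
            out.write(content)
    first_run = custody_manifest(demo_root)
    second_run = custody_manifest(demo_root)
    print("runs agree:", first_run == second_run)
    for relative_path, file_hash in first_run:
        print(file_hash[:12], relative_path)
```

Comparing a stored manifest against a freshly computed one is a simple check that preserved data has not changed between collection and production.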
+
Packaging architecture
10
• Comes as VMs
• Grab as few or as many as you want
• No mixing of matters
• No ethical problems
• Preserve for as many years as you want
• 1 VM = 1 corn, FreeEed = free popcorn
+
FreeEed makes lawyers happy
11
+
FreeEed : Architecture
12
+
FreeEed popcorn is very popular with
lawyers, legal techs, IT, etc.
+
FreeEed popcorn
14
• Deploy on laptops, servers or cloud
• One-node or any number of nodes
• Scalable storage
• Different cooking recipes
• No mixing of matters
• Easy archiving
• Easy deletion
+
Processing architecture
15
• Based on golden-image VM
• Controlled cluster start in any environment
• Index / cull on the fly or later
• Immediately searchable
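
Culling, mentioned above, means narrowing the document set before review, typically by date range, custodian, or keyword. The sketch below shows the idea in plain Python; the Document fields and the cull function are hypothetical and do not reflect FreeEed's own data model or culling syntax.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    # Hypothetical minimal metadata; real eDiscovery records carry far more fields.
    doc_id: str
    custodian: str
    sent: date
    text: str


def cull(documents, start, end, custodians=None, keywords=()):
    """Yield documents inside the date range that also pass the optional filters."""
    lowered = [keyword.lower() for keyword in keywords]
    for doc in documents:
        if not (start <= doc.sent <= end):
            continue
        if custodians is not None and doc.custodian not in custodians:
            continue
        body = doc.text.lower()
        if lowered and not any(keyword in body for keyword in lowered):
            continue
        yield doc


docs = [
    Document("D1", "alice", date(2012, 3, 1), "Draft merger agreement attached"),
    Document("D2", "bob", date(2012, 3, 5), "Lunch on Friday?"),
    Document("D3", "alice", date(2010, 1, 15), "Old merger notes"),
]
kept = list(cull(docs, date(2012, 1, 1), date(2012, 12, 31), keywords=["merger"]))
print([doc.doc_id for doc in kept])
```

Because cull is a generator, the same filter can run on the fly while documents are being processed or later over an already stored set.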
+
Cluster start-up on EC2
16
+
Cloud integration
17
• Downloadable VMs
• Same VMs on Amazon AWS
• Amazon VMs are very convenient
• Immediate deployment
• Any hardware configuration you need
• Control lots of power from a limited-power laptop
• Azure – working with Microsoft
+
Review architecture
18
• Lucene
• Solr
• HBase
• Lucene indexes created in reducers and
combined in Solr
• For small matters, write directly to Solr
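
The two indexing paths above (partial indexes built in reducers and then combined, versus writing directly for small matters) can be illustrated with a plain-Python stand-in. This sketch uses dictionaries instead of Lucene or Solr and calls none of their APIs; the function names, the round-robin partitioning, and the small_matter_threshold value are all assumptions made for the example.

```python
from collections import defaultdict


def build_partial_index(partition):
    """Stand-in for one reducer: map each term to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in partition:
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def merge_indexes(partials):
    """Stand-in for the combine step: union the posting sets of every partial index."""
    merged = defaultdict(set)
    for partial in partials:
        for term, doc_ids in partial.items():
            merged[term] |= doc_ids
    return merged


def index_matter(documents, reducers=2, small_matter_threshold=3):
    """Index directly when the matter is small, otherwise partition and merge."""
    if len(documents) <= small_matter_threshold:
        return build_partial_index(documents)
    partitions = [documents[i::reducers] for i in range(reducers)]
    return merge_indexes(build_partial_index(partition) for partition in partitions)


documents = [
    ("D1", "merger agreement draft"),
    ("D2", "lunch on friday"),
    ("D3", "merger timeline"),
    ("D4", "friday board meeting"),
]
index = index_matter(documents)
print(sorted(index["merger"]))
print(sorted(index["friday"]))
```

The merged result is the same index a single pass would have produced, which is why the partitioned path can scale out without changing what reviewers see when they search.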
+
Review screen
19
+
Review capabilities
20
• Search
• Cull down
• View text and metadata
• Tag documents
• Export as images or as native files
+
Bird’s-eye view – EDRM (Electronic Discovery Reference Model)
21
+
Left of EDRM – Legal Hold
22
• FreeEedCollect
• Architecture: https://github.com/markkerzner/FreeEedCollect
• ZooKeeper/MapReduce/Flume/HDFS
+
Right of EDRM – Org. charts
23
Partnership with Sintelix
+
Analytics – network of actors
24
Partnership with Sintelix
+
FreeEed and data governance
25
• Virtualization for data preservation
• Scalable processing
• Archiving
• Document groups are kept separate
• Data format stored together with the software that understands it
+
Hadoop & Big Data applications
26
• Other related applications
• Financial – text analytics
• Energy – document and procedure analytics
• Actual on-going projects
+
FreeEed as a learning tool
27
• Hundreds of downloads
• Dozens of active users
• Real-world Hadoop application
• Many developers download to learn
• Complex, real, but manageable
+
FreeEed adoption – who is trying
our “popcorn”?
28
• Large law firms
• Small law firms and solos
• Government agencies
• Universities
• Enterprises
• Developers learning Big Data
+
Looking forward
29
• Add
• Collection
• Analytics
• Community
• Integrations
• Implementations
+
How you can use FreeEed
30
• For its intended purpose
• Large law firms
• Small firms and solos
• Pro se
• Integrate into legal IT
• Start a similar document management project
+
Q&A
32
• Thank you!
• People usually ask:
• How can I put my data in the cloud?
• Is it safe?
• Do you do OCR, PST, OST, etc…?
