Predictive Coding 2.0
Making E-Discovery
More Efficient and Cost Effective


   John Tredennick
   Jeremy Pickens
    Jim Eidelman
How Many Do I Have to Check?
 1.  You have a bag with 1 million M&Ms
 2.  It contains mostly brown M&Ms
 3.  You cannot see into the bag
 4.  You have a scoop that will pull out 100
     M&Ms at a time
 5.  Your hope is that there are no red
     M&Ms in the bag
 6.  You pull out a scoop and they are all
     brown

  How many scoops do you need to review
  to be confident there are no red M&Ms?
Let’s Take a Poll

  How many scoops?

    1    2    3    5    10    20    100?    500?    1,000?
How Confident Do You Need to Be?

    Does 95% work? How about 99%?
    How many errors can you tolerate?
      §  Five out of a hundred?
      §  One out of a hundred?
      §  One percent of 1 million = 10,000 M&Ms

  At a 95% confidence level and 5% margin of error: 384 M&Ms
  At a 99% confidence level and 1% margin of error: 459 M&Ms
  At a 100% confidence level and 0% margin of error: 1,000,000 M&Ms
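
The figures above can be sanity-checked with two textbook calculations. The sketch below is an illustration, not the presenters' own method: `margin_of_error_sample` is the usual margin-of-error formula with a finite population correction (worst case p = 0.5), and `zero_defect_sample` asks how many all-brown M&Ms you must draw before a red rate at or above a given threshold becomes implausible. Both function names are invented for this example. Under these assumptions the 384 figure matches the first formula, while 459 matches the second calculation (99% confidence that fewer than 1% are red, having seen none); the margin-of-error formula at 99% and 1% would call for roughly 16,300.

```python
import math
from statistics import NormalDist


def margin_of_error_sample(population, confidence, margin, p=0.5):
    """Margin-of-error sample size with finite population correction."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided z-score
    n0 = z * z * p * (1 - p) / (margin * margin)    # infinite-population size
    return n0 / (1 + (n0 - 1) / population)         # shrink for a finite bag


def zero_defect_sample(confidence, max_rate):
    """Smallest n where drawing n items with no hits would happen with
    probability at most 1 - confidence if the true rate were max_rate."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_rate))


BAG = 1_000_000
print(round(margin_of_error_sample(BAG, 0.95, 0.05)))  # 384
print(round(margin_of_error_sample(BAG, 0.99, 0.01)))  # roughly 16,300
print(zero_defect_sample(0.99, 0.01))                  # 459
```

Either way, the point of the slide stands: no sample short of the whole bag gives 100% confidence.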
Predictive Coding
Does it Work?
What Have the Courts Said?

       “Until there is a judicial opinion approving (or
       even critiquing) the use of predictive coding,
       counsel will just have to rely on this article as a
       sign of judicial approval. In my opinion,
       computer-assisted coding should be used in
       those cases where it will help ‘secure the just,
       speedy, and inexpensive’ (Fed. R. Civ. P. 1)
       determination of cases in our e-discovery
       world.”

       Magistrate Judge Andrew Peck
Predictive Coding 1.0
 1.  Assemble your corpus.
 2.  Assemble a seed set of
     documents.
 3.  Review the seed set.
 4.  Apply machine learning and
     automatically tag the remainder
     of the corpus.
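
The four steps above can be sketched in a few lines. This is a toy illustration only: the deck does not say which learning algorithm any Predictive Coding 1.0 product uses, so a small naive Bayes classifier stands in for "machine learning" here, and the documents, labels, and function names (`train`, `predict`) are invented for the example.

```python
import math
from collections import Counter

LABELS = ("responsive", "junk")


def train(seed):
    """Step 3 output: seed is a list of (text, label) pairs coded by reviewers."""
    term_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in seed:
        doc_counts[label] += 1
        term_counts[label].update(text.lower().split())
    vocab = set(term_counts["responsive"]) | set(term_counts["junk"])
    return term_counts, doc_counts, vocab


def predict(model, text):
    """Step 4: tag one unreviewed document, or None if there is no evidence."""
    term_counts, doc_counts, vocab = model
    known = [t for t in text.lower().split() if t in vocab]
    if not known:
        return None  # the seed set never saw any of these terms
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in LABELS:
        total_terms = sum(term_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for term in known:  # add-one smoothing keeps unseen terms finite
            score += math.log((term_counts[label][term] + 1)
                              / (total_terms + len(vocab)))
        scores[label] = score
    return max(LABELS, key=lambda label: scores[label])


# Step 1: the corpus. Step 2: the seed set (d1, d2).
corpus = {
    "d1": "pricing of components discussed with competitor",
    "d2": "company picnic hamburger buns",
    "d3": "competitor pricing meeting tonight",
    "d4": "picnic volunteers needed for buns",
    "d5": "quarterly parking permit renewal",
}
model = train([(corpus["d1"], "responsive"), (corpus["d2"], "junk")])
tags = {d: predict(model, corpus[d]) for d in ("d3", "d4", "d5")}
print(tags)  # {'d3': 'responsive', 'd4': 'junk', 'd5': None}
```

Note that `d5` gets no tag at all: a model trained once on a fixed seed set has nothing to say about vocabulary it has never seen.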
Predictive Coding 1.0
 §  Tremendous gains in review
     effectiveness
 §  Substantial cost savings
 §  It works. Often quite well...

 ...when the corpus is complete.
67.5 uploads per case
   533 matters, nearly 36,000 uploads across the matters.
166.3 days loading per case
   This is driven by collection, not by loading limits.
In which upload and on which day do your responsive
documents show up?

    67 uploads, 166 days

  Terms that do not appear early begin appearing later.
Machine-Assisted Decision Making

  Upload timeline of 6 TB case.

  When should machine-assisted decision making
  (e.g. early case assessment) begin?

  Is it here? Or here?
Example: Responsive Early, Junk Later
       To: bob@company.com, alice@company.com
       From: charles@company.com
       Subject: Company Picnic
       Bob, would you coordinate with Alice and make sure we have
       enough hamburger buns for the company picnic? Please try
       and find them at a reasonable price.


Early: Responsive                    Later: Junk
Example: Junk Early, Responsive Later
        To: bob@company.com, alice@privatemail.com
        From: charles@company.com
        Subject: Get Together
        Let’s get together at 7pm at the Sports Bar to discuss pricing of
        our components. The Broncos are playing and I really want to
        watch Tebow.


Early: Junk                          Later: Responsive
Problems With Predictive Coding 1.0

  The corpus is almost never complete
     §  Continuous collection and rolling uploads
     §  When does “Early Case Assessment” begin?


  Changing Issues
     §  Responsiveness is “bursty”


  Shifting Concept Relationships
     §  Due both to increasing corpus and changing issues
     §  Exploration is extremely limited
Our Approach
 Predictive Coding 2.0 must cope with continuous change.
 We have developed a flexible analytics framework based
 on bipartite graphs.
 It is aware of changes in the corpus and in coding, which
 enables smart review and adaptive related-concept
 suggestions as information pours in.
Our Approach
Avoid locking in poor decisions made early in the matter,
when corpus (collection) and coding information are incomplete.

Goal:
Continuous Case Assessment
What Is Underneath?


    A full bipartite graph of the
    documents and the features (e.g.
    words, phrases, dates) that
    make up those documents
[Figure: bipartite graph linking Terms and Documents]
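
As a rough illustration of the data structure, a bipartite graph of terms and documents can be held as two adjacency maps, one per side. This is a minimal sketch, not the presenters' implementation: it uses only whitespace-separated words as features (no phrases or dates), and the function name `build_bipartite` and the sample emails are made up.

```python
from collections import defaultdict


def build_bipartite(docs):
    """docs: mapping of document id -> text. Returns both adjacency maps."""
    doc_to_terms = {}
    term_to_docs = defaultdict(set)
    for doc_id, text in docs.items():
        terms = set(text.lower().split())
        doc_to_terms[doc_id] = terms          # edges seen from the document side
        for term in terms:
            term_to_docs[term].add(doc_id)    # the same edges from the term side
    return doc_to_terms, term_to_docs


docs = {
    "email-1": "fuel pricing model",
    "email-2": "fuel battery ventilation",
    "email-3": "pricing model assumptions",
}
doc_to_terms, term_to_docs = build_bipartite(docs)

print(sorted(term_to_docs["fuel"]))  # ['email-1', 'email-2']

# Two hops (term -> documents -> terms) reach every term that co-occurs with "fuel".
neighbors = set().union(*(doc_to_terms[d] for d in term_to_docs["fuel"])) - {"fuel"}
print(sorted(neighbors))  # ['battery', 'model', 'pricing', 'ventilation']
```

Every edge joins a term to a document, never two terms or two documents, which is what makes the graph bipartite.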
Feedback: Immediate and Continuous



Continuous feedback supports better decision
making and better predictive coding.

It adapts to both:
    Newly arriving coding information
    Newly arriving documents and terms
[Figure: bipartite graph linking Terms and Documents]
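
One way to picture "adapts to both" is an index whose scores are derived from whatever documents and coding calls exist at the moment of asking, so nothing has to be retrained when either one changes. The sketch below is an assumption-laden toy: the class `LiveIndex`, its methods, and the simple "responsive minus non-responsive" term weight are all invented for illustration and are not the framework described in the deck.

```python
from collections import defaultdict


class LiveIndex:
    def __init__(self):
        self.doc_terms = {}                # doc id -> set of terms
        self.term_docs = defaultdict(set)  # term -> set of doc ids
        self.coding = {}                   # doc id -> True (responsive) / False

    def add_document(self, doc_id, text):
        """New arrival of documents and terms."""
        terms = set(text.lower().split())
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.term_docs[term].add(doc_id)

    def code(self, doc_id, responsive):
        """New arrival of coding; a later call overwrites an earlier one."""
        self.coding[doc_id] = responsive

    def term_weight(self, term):
        docs = self.term_docs.get(term, ())
        pos = sum(1 for d in docs if self.coding.get(d) is True)
        neg = sum(1 for d in docs if self.coding.get(d) is False)
        return pos - neg

    def score(self, doc_id):
        """Computed from the current state, so it is always up to date."""
        return sum(self.term_weight(t) for t in self.doc_terms[doc_id])


idx = LiveIndex()
idx.add_document("a", "company picnic buns")
idx.add_document("b", "sports bar pricing")
idx.code("a", False)
print(idx.score("b"))  # 0: nothing coded so far shares a term with b

idx.add_document("c", "component pricing discussion")  # a later upload
idx.code("c", True)
print(idx.score("b"))  # 1: "pricing" now carries responsive evidence

idx.add_document("d", "picnic pricing")
print(idx.score("d"))  # 0: "picnic" (-1) cancels "pricing" (+1)
idx.code("a", True)    # the coding call on document a changes
print(idx.score("d"))  # 2
```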
Predictive Coding 2.0

Feedback, and with it improvement, is iterative,
continuous, and amplified.

The more you review, the less you have to review.

[Chart axis: % of Docs Examined Manually]
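
A small simulation can make "the more you review, the less you have to review" concrete. It is a hedged toy, not the presenters' algorithm: documents are re-ranked after every coding call using a simple term weight (+1 for each responsive document containing the term, -1 for each non-responsive one), and the corpus, the labels, and both function names are invented. The `truth` mapping plays the role of the human reviewer; a real review would not know in advance when the last responsive document had been found.

```python
def prioritized_review(docs, truth, first):
    """Review one document at a time, re-ranking after every coding call.

    docs: id -> set of terms; truth: id -> bool (the reviewer's call);
    first: id of the document the reviewer happens to start with.
    Returns the order in which documents were reviewed.
    """
    coded, order = {}, []
    next_doc = first
    while next_doc is not None:
        coded[next_doc] = truth[next_doc]
        order.append(next_doc)
        # Rebuild term weights from everything coded so far.
        weight = {}
        for doc_id, responsive in coded.items():
            for term in docs[doc_id]:
                weight[term] = weight.get(term, 0) + (1 if responsive else -1)
        remaining = [d for d in docs if d not in coded]
        next_doc = max(
            remaining,
            key=lambda d: sum(weight.get(t, 0) for t in docs[d]),
            default=None,
        )
    return order


def reviews_until_all_found(order, truth):
    """How many documents were reviewed before the last responsive one turned up."""
    return max(i for i, d in enumerate(order) if truth[d]) + 1


docs = {
    "d1": {"pricing", "components", "meeting"},
    "d2": {"picnic", "buns"},
    "d3": {"pricing", "competitor"},
    "d4": {"picnic", "volunteers"},
    "d5": {"competitor", "meeting", "bar"},
    "d6": {"parking", "permit"},
    "d7": {"holiday", "schedule"},
    "d8": {"components", "pricing", "bar"},
}
truth = {d: d in {"d1", "d3", "d5", "d8"} for d in docs}

ranked = prioritized_review(docs, truth, first="d1")
print(ranked[:4])                                   # ['d1', 'd8', 'd3', 'd5']
print(reviews_until_all_found(ranked, truth))       # 4
print(reviews_until_all_found(list(docs), truth))   # 8 when reviewed in load order
```

In this toy corpus every coding call sharpens the ranking, so all four responsive documents surface in the first four reviews instead of the eight needed in load order.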
Better Decisions As Understanding Improves



Term relationships change over time.

Using continuous improvement,
decisions can be revised and refined
as the matter proceeds.
[Figure: bipartite graph linking Terms and Documents]
Time uncovers new relationships
Looking at Concepts Over Time
  Start with the key term “fuel”. These are the related terms at 20%, and at 65%:

  20%             65%
  lube            fuels
  piping          fob
  battery         purityethane
  mounted         petrochemicals
  redundant       fin
  batteries       paraxylene
  compartments    cif
  mixture         phy
  airflow         fwd
  ansi            swopt
  ventilation     brentpartials
  chargers        brg
  stainless       locswap
  rotor           benzene
  bleed           diff
  accessory       spd
  plenum          liquids
  detector        opt
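
One hedged way to put a number on how far the related-term list has moved between the two snapshots is the Jaccard overlap of the two lists (shared terms divided by all distinct terms). The deck does not say the system computes this; the sketch simply applies the measure to the terms shown on the slide.

```python
def jaccard(a, b):
    """Shared items divided by all distinct items; 1.0 means identical sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


# Terms related to "fuel", copied from the slide.
at_20 = ["lube", "piping", "battery", "mounted", "redundant", "batteries",
         "compartments", "mixture", "airflow", "ansi", "ventilation",
         "chargers", "stainless", "rotor", "bleed", "accessory", "plenum",
         "detector"]
at_65 = ["fuels", "fob", "purityethane", "petrochemicals", "fin",
         "paraxylene", "cif", "phy", "fwd", "swopt", "brentpartials", "brg",
         "locswap", "benzene", "diff", "spd", "liquids", "opt"]

print(jaccard(at_20, at_65))  # 0.0
```

The two lists share no terms at all: the early snapshot is dominated by mechanical vocabulary, the later one by fuel-trading vocabulary.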
  
Related Terms Through Coding Filters
[Figure: Terms linked to Documents, with documents grouped as Responsive and NonResponsive]
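
To illustrate the idea of related terms seen through a coding filter, the sketch below counts the terms that co-occur with a key term, but only in documents that pass a caller-supplied filter. Everything here is an assumption made for illustration: the deck does not say how relatedness is scored, so plain co-occurrence counts are used, the function name `related_terms` and the sample documents are invented, and the scaling to a top score of 1000 simply mirrors the score tables elsewhere in the deck.

```python
from collections import Counter


def related_terms(docs, key, keep=lambda doc: True, top=3):
    """docs: list of dicts with 'terms' (a set) plus coding/metadata fields.

    Counts terms that co-occur with `key` in documents accepted by `keep`,
    then scales the counts so the strongest term scores 1000.
    """
    counts = Counter()
    for doc in docs:
        if key in doc["terms"] and keep(doc):
            counts.update(doc["terms"] - {key})
    if not counts:
        return []
    best = max(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(term, round(1000 * n / best)) for term, n in ranked[:top]]


docs = [
    {"terms": {"model", "flows", "assumptions"}, "responsive": True},
    {"terms": {"model", "flows", "shares"}, "responsive": True},
    {"terms": {"model", "equation", "stochastic"}, "responsive": False},
    {"terms": {"model", "equation", "variables"}, "responsive": False},
    {"terms": {"model", "equation", "flows"}, "responsive": False},
    {"terms": {"bids", "congestion"}, "responsive": True},
]

whole = related_terms(docs, "model")
filtered = related_terms(docs, "model", keep=lambda doc: doc["responsive"])
print(whole)     # [('equation', 1000), ('flows', 1000), ('assumptions', 333)]
print(filtered)  # [('flows', 1000), ('assumptions', 500), ('shares', 500)]
```

Because `keep` is an arbitrary predicate, the same call could filter on any mix of coding and metadata fields, such as custodian or date range.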
Putting Related Concepts to Work
  The whole corpus: TREC collection with many topics identified

  Topic 203
  …whether the Company had met, or could, would, or might meet its
  financial forecasts, models, projections, or plans…

  Topic 205
  …analyses, evaluations, projections, plans, and reports on
  the volume(s) or geographic location(s) of energy loads.
Model In the Whole Collection
  Look at the keyword “model”
  Scope is the whole collection

  Term            Score
  modeling        1000
  equation        864
  stochastic      706
  variables       677
  parameters      518
  probability     365
  simulation      337
  assumption      325
  returns         251
  curves          211
  
Model In Topic 203
  Look at the keyword “model”
  Scope: Topic 203 (meeting financial forecasts)

  Term            Score
  flows           1000
  assumptions     913
  gains           872
  shares          864
  liquidity       486
  fluctuations    374
  analysts        285
  cents           254
  whitewing       237
  handles         166
  
Model In Topic 205
  Look at the keyword “model”
  Scope: Topic 205 (analyzing energy volumes)

  Term            Score
  bids            1000
  congestion      611
  loads           455
  constraints     354
  clearing        292
  zonal           194
  signals         192
  procure         190
  dispatch        152
  csc             120
  
Model In Comparison
  Whole Corpus    Topic 203       Topic 205
  modeling        flows           bids
  equation        assumptions     congestion
  stochastic      gains           loads
  variables       shares          constraints
  parameters      liquidity       clearing
  probability     fluctuations    zonal
  simulation      analysis        signal
  assumption      cents           procure
  returns         whitewing       dispatch
  curves          handles         csc

  Now, imagine this with batches and coding changes over time!

  Note: Our system can accept any combination of coding and
  metadata filters to dynamically assess your data.
Summary

  Incomplete Collections



  Changing Coding Calls



Havoc for Machine Coding
Predictive Coding 2.0
Problem: The corpus is almost never complete
   Answer: Review Algorithms that are iterative and continuous
Problem: Changing Issues
   Answer: Review Algorithms that are adaptive and continuous
Problem: Shifting Concept Relationships
   Answer: Concept Relationships that are calculated dynamically,
   on the fly, and coding-aware.



Continuous Case Assessment
Analytics Consulting
§  Analytics consulting and predictive ranking for nearly 4 years
§  How it started, before “Predictive Coding” became popular:

     “Can’t you predict what documents are probably
      relevant based on your review so far?”
                                   – Judge, SDNY

§  Predictive Ranking: Iterative search techniques + algorithms
§  Then off-the-shelf Predictive Coding 1.0 technologies
§  Catalyst’s research is exciting! We apply the research to real-world
    scenarios. Applying Bipartite Analytics…
Smart Review with Bipartite Analytics


  Technology Advantages:
     §  Accurate
     §  Dynamic
     §  Flexible
     §  “Just in Time” suggestions
Smart Review Scenarios
1. “What happened”: e.g., FCPA investigation, conspiracy ECA

2. Typical large-scale litigation with lots of ESI,
   e.g., class action lawsuit

3. Highly complex litigation with multiple issues,
   e.g., patent and unfair competition claims
Scenario 1 – What happened?
  Goal: Rapidly determine facts and resolve matter if possible

  Applying the Technology
  A small number of knowledgeable attorneys drill into documents using
  a fusion of advanced search features and flexible predictive coding.

     §  Faster location of valuable “veins” of information
         thanks to search filters
     §  Rapid learning, and application of that learning,
         through flexible, “just in time” Predictive Coding 2.0
     §  “Choose your own adventure”
Scenario 2 – Large Scale Litigation
  Goal: Minimize cost by learning across a large document set,
  increase quality with focused review, and maximize protection of
  privilege and trade secrets

  Applying the Technology:
    §  Prioritized review based on rapid, continuous learning
    §  Large-scale defensible culling
    §  More accurate ranking of “potentially privileged” documents
Scenario 3 – Highly Complex Litigation
Goal: Review and produce with multiple and changing issues

Applying the Technology
 §  Rapid learning across multiple topics
 §  Leverage ability to adjust for change in topics
 §  Review quality improves because of focus
 §  Explore otherwise hidden subjects with Concept Explorer
 §  Leverage learning across narrow, focused lines of inquiry (e.g.,
     emails between two people in a narrow time window)
 §  Protect privileged documents