EDRM Foundational e-Discovery Practices-ilta
1. 6/20/2013
1
The EDRM - Understanding
Foundational Electronic
Discovery Principles
David J. Kearney
ILTA Volunteer City Representative
June 20, 2013
EDRM - Overview
http://www.edrm.net/
Stands for The Electronic Discovery Reference
Model
First launched in 2005 and released publicly in
2006
Developed to provide a standardized approach to
e-Discovery related activities
Helps visually depict the movement of electronic
discovery components from one phase to the next.
Contains 9 phases/stages
EDRM - Overview
Information Management
Identification
Preservation
Collection
Processing
Review
Analysis
Production
Presentation
EDRM - Overview
Stages standardize workflow
Stages are not fixed sequentially
Not meant as a literal, linear or
waterfall model
The EDRM is meant to be iterative
in nature
EDRM - Overview
Recent Article – January 22, 2013
[Elleanor] Chin says one way to create greater
cost predictability is for lawyers to recognize
the elements of e-discovery projects (as
defined by the EDRM model) that remain
largely consistent from matter to matter and
“reuse existing technology, workflows, and
infrastructure and to conserve the cost of legal
decision making, for executing the big picture
strategy.”
Modernizing E-Discovery Planning and Budgeting
http://www.exterro.com/e-discovery-beat/2013/01/22/modernizing-e-discovery-planning-and-budgeting/
By: Andrew Bartholomew
EDRM - Overview
Information Management
Information Management
Many issues can be better managed if this
stage is taken seriously and implemented
with consistent & sound practices.
This is THE STARTING POINT for the entire
process. Sound and comprehensive
information management strategies aid
organizations in the identification,
preservation, and collection steps of the
process and can lower the number of
documents that need to be preserved,
collected, reviewed and produced. This is
where more organizations can GET IT RIGHT.
Furthermore, risks and costs are reduced.
Information Management
Information Management
Has morphed into Information
Governance
Information governance,
records and information
management, and data
disposition policies are ways to
help lower costs and mitigate
risks for organizations.
Information Management
“Part of the reason eDiscovery is so
expensive is because companies have
so much data that serves no business
need. … Companies are going to
realize that it’s important to get their
information governance under control
to get rid of the data that has no
business need … in ways that will
improve the company's bottom line…”
— U.S. Magistrate Judge Andrew J. Peck, CGOC
Faculty Member, in a video interview courtesy of JD
Supra Law News, February 4, 2013.
Identification
Locating potential sources of ESI & determining its
scope, breadth & depth.
Identification
Identify individuals responsible for the resource.
Interview end-users who input data, and the personnel who perform
maintenance on the resource.
Custodian ESI may include:
E-mail
Personal storage on hardware devices (handheld devices, DVDs, thumb drives)
Allocated network storage
Private data storage (home computer, personal e-mail)
Data associated with social networking sites used by the custodian
Non-custodian ESI sources are those not held by a particular person.
Examples of non-custodian sources:
Databases
Wikis
Shared network storage locations
Preservation
Ensuring that ESI is protected against inappropriate
alteration or destruction.
Preservation
Preservation is the process of retaining documents,
including electronic documents, for legal purposes
and should include the suspension of normal
document retention/destruction practices and
policy.
That means that any data must be retained as it is,
status quo, either by copying the data to another
location or stopping all automated or manual
changes or destruction of the material.
It must also be determined whether there is a possibility of
the data being modified or deleted; if so, the data needs
to be preserved to another location.
Collection
Gathering ESI for further use in the e-discovery process (processing, review, etc.).
Once documents/files have been preserved (sometimes one and the
same), collection can begin
Transfer/acquisition of data for review
Includes: servers, PCs, Macs, Linux, Windows, iOS, Android, handheld
devices, flash/thumb drives, tablets, MP3 players, phone systems, backup
tapes, CD/DVD, databases (financial, CRM, ERP), structured/unstructured
data, cloud/social networking sites
Proper planning and careful implementation can reduce time & money
spent
Ensures integrity of evidence
Proper collection can guard against future disputes (discovery about
discovery – causes unneeded rancor between parties)
Process must be defensible, proportionate, efficient, auditable, and
targeted.
May impact and expand the scope of the discovery process
Collection costs can be significant
Collection
A reasonable collection strategy must address
what ESI should be collected, when, and how
What: The total corpus of potentially collectible
ESI will usually have been defined during the
process of formulating the internal preservation
directive/litigation hold. Usually consists of four
main categories of data locations:
1. Individual employee files
2. Department/group files
3. Enterprise databases
4. Backup Media
Collection
When: Not all data identified for preservation
needs to be collected right away. Some data
may never need to be collected. Collecting all
data that has been preserved may unnecessarily
inflate costs and overwhelm the case team with
irrelevant data
How: Once the timing of collection from a data
location has been decided, the team must
assess what level of forensic defensibility should
be employed for the collection
Collection
Normal collection processes generally involve
straightforward copying of the ESI as it exists on
the system, in a way that maintains the integrity
of the metadata
A forensic protocol must ensure that the process is
carried out in a way that will produce reliable
information consistently, so the individual
conducting the collection can testify
The protocol must also provide for a means of
verifying the integrity of the work that has been
done by maintaining an untouched mirror copy of
the inspected materials
Collection
Maintaining Integrity of Metadata…
The single most important thing that can be done
is to use a software or hardware write blocker.
Collection
Metadata
System Metadata - Data about the architecture
of the system
File Metadata - Data about the data in a specific
file that is recorded internal to that file
Collection
Acquisition is actually the proper term for
collecting electronic data. In digital forensics,
examiners refer to the copying of data as
acquiring to avoid any confusion that might be
caused by using “copying”, since copying
doesn’t imply that the copy was made in a
forensically sound manner.
Collection
Tools Used During Collection:
Write Blocker
LEO
Suites
Task Specific
Software
Hardware
Collection
Forensically Defensible Collection – a forensically sound
collection will preserve all potentially relevant metadata
that may be of use to the trial team in its claims. This
collection type utilizes a “write-blocker” to prevent
alteration of source media when a device is attached to
retrieve the data.
Maintains rigorous chain-of-custody controls that
document all collection steps, from initial access to the
point of storage or processing.
Ensures that nothing about the data is altered or
degraded
A collection by a third-party vendor will often be the best
method.
Typical of a targeted collection
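The chain-of-custody controls described above can be pictured as an append-only log of every step taken with an item. The sketch below is only an illustration: the `CustodyEvent` and `CustodyLog` names, the field names, and the sample identifiers are hypothetical and are not part of the EDRM or of any particular collection tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class CustodyEvent:
    """One documented step in the handling of a piece of evidence."""
    item_id: str    # hypothetical evidence identifier
    action: str     # e.g. "accessed", "imaged", "transferred", "stored"
    handler: str    # person who performed the step
    timestamp: str  # UTC time the step was recorded (ISO 8601)


@dataclass
class CustodyLog:
    """Append-only list of custody events."""
    events: list = field(default_factory=list)

    def record(self, item_id, action, handler):
        """Add one step to the log, stamped with the current UTC time."""
        event = CustodyEvent(item_id, action, handler,
                             datetime.now(timezone.utc).isoformat())
        self.events.append(event)
        return event

    def history(self, item_id):
        """Return every recorded step for one item, in the order logged."""
        return [e for e in self.events if e.item_id == item_id]


log = CustodyLog()
log.record("HDD-001", "accessed", "J. Doe")
log.record("HDD-001", "imaged", "J. Doe")
log.record("HDD-001", "stored", "R. Roe")
for event in log.history("HDD-001"):
    print(event.timestamp, event.action, event.handler)
```

Recording who did what and when, from initial access to storage, is what lets the person who performed the collection describe it later.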
Collection
Forensic Collection – a forensic copy of a hard
drive will include every byte of data on that
drive, including data in unallocated space and
slack space. Forensic inspection of a party’s
computer system is rarely necessary.
Because forensic collections are much more
invasive and inclusive, there is a greater risk of
disclosure of information that is either irrelevant to
the matter or protected by privilege claims. The
forensic protocol must therefore take steps to
mitigate risks and protect the producing party.
Collection
A decision about the degree of forensic
defensibility is required for every ESI collection. This
decision must be made on an individual basis
depending on the cost, accessibility, and needs of
the case.
The software & process used must, at least, be
capable of write protecting the files during the
collection process and maintaining the integrity of
both the system and file metadata associated with
each file/document
One constant is the need to have detailed and
complete documentation of the critical decisions
and actions made during the collection process
Collection
Whether or not a file server should be forensically
collected depends on the nature of the
investigation. More often than not, collecting the
active data and relevant network shares is
appropriate
If extracting an event, log, intrusion, or other
time-critical event, forensic imaging of the entire
server may be necessary
Collection
Collection can be accomplished by:
The Client – Corporate/IT Personnel
Custodians – Potential dangers when
custodians/clients try to collect their own data –
especially when seeking consistency and unbiased
process, e.g. 10, 25, 50 custodians and a delete key.
Outside Law Firm
Vendor
Not Reasonably Accessible
Balancing Test:
Cost of converting data into more accessible
format
Cost to review the data for responsiveness,
privilege, or other concerns
Business disruption and other internal costs
Other issues to address:
Relevance of data residing on the source
Overall litigation value of the data at issue
Other means to get information
Collection
Sources of ESI
Shared network resources are resources, files, or
other data shared throughout the network being
examined, such as
E-Mail servers
Document Servers
File Servers
Other resources shared across the network
Collection
Other sources
Cloud/web-based storage and E-Mail (e.g. Gmail,
Yahoo, Box, Dropbox, Facebook…)
Absent a subpoena or court order, it is nearly
impossible to collect the data held by an ISP
Flash, temporary, and ephemeral data storage
(e.g., thumb/external drives leave data droppings)
Social Networking applications
Databases (reports v. exporting the data)
Collection
Structured v. Unstructured data
Differences & Specifics
Structured Data - Information with a high degree of
organization
Relies on users
Legal Hold at application level
Unstructured Data
No identifiable structure
Potential large number of users
May be largely duplicative
How it is applied to e-Discovery
Structured Data – e-Discovery expenses are IT & User costs
for identification, Collection, and Legal Hold
Unstructured Data – Costs are for Processing, Analysis &
Review
Collection
Cost Factors & Considerations
Travel to different locations to have personnel on-site
to perform collection
Whether the collection is performed by use of an
automated script that can run remotely or without
manual operation
Custodian interviews at the time of the collection
may raise initial costs, but are more efficient in the
long run since such interviews will likely be
needed eventually
Forensic collection requires the use of different, more
complicated techniques, and the collected data
will need extra handling during processing and
review
Collection
Cost Factors & Considerations
Impacted by the volume of data (megabytes, gigabytes,
terabytes, petabytes, exabytes, etc.) that needs to be
collected
Human review can be the most time-consuming
and expensive part of the entire e-discovery
process, even when using Technology Assisted
Review; the volume to review grows with the
amount of data collected
Controlling, Monitoring, and being able to justify a
sound stepped approach to limit the data being
collected (custodians, data range, etc.)
Collection
Collection
Quality Control
Validating that all ESI has been collected. In general, over-inclusive
collections, coupled with repeatable, documented, and defensible
methods to cull and search ESI will be most effective at validating the
collection of ESI
Courts are increasingly sensitive to the costs of electronic discovery
and the concept of proportionality, which should be taken into
account when assessing the scope of the collection
In some cases, the use of software tools will aid in validating the
collection of ESI. Failure to use commonly accepted methods and
technologies may expose the client to additional risk
In addition, each piece of digital data can generate a unique value,
known as a HASH VALUE. Commonly used hash formats are “MD5”
and “SHA-1”. If a dispute arises about the integrity of a piece of
information, the hash value of the collected copy can be compared
with the original's hash value
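The hash comparison described above can be sketched with Python's standard `hashlib` module. This is a minimal illustration, not a forensic tool: the function names `file_hashes` and `same_content` and the temporary sample files are assumptions made for the example.

```python
import hashlib
import os
import tempfile


def file_hashes(path, chunk_size=1024 * 1024):
    """Return (md5, sha1) hex digests of a file, reading it in chunks."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def same_content(original_path, copy_path):
    """True when both files produce identical MD5 and SHA-1 values."""
    return file_hashes(original_path) == file_hashes(copy_path)


# Demonstration with two throwaway files that hold the same bytes.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.bin")
    copy = os.path.join(tmp, "copy.bin")
    for path in (original, copy):
        with open(path, "wb") as fh:
            fh.write(b"collected evidence")
    print(same_content(original, copy))
```

Changing even one byte of the copy produces different hash values, which is what makes the comparison useful when integrity is disputed.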
Collection
Other commonly used tools and devices for
collection
Faraday Bags
Inventory & Tracking System
Check-in & Check-out Procedures
Cameras and Video Recording
Collection
Tips
When wrongdoing is suspected, don’t “take a quick peek” at
a computer without forensic collection
Don’t delay to preserve a device
Don’t assume that all devices are the same as PCs
Always document the process
Don’t assume that the device is not encrypted
Don’t try to save time/money by using traditional file copy
methods
Don’t process everything at one time that was collected
Test and sample search terms and expressions against the
dataset
Examine foreign language types – make sure you have a solid
understanding of the data
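The tip about testing and sampling search terms can be illustrated with a simple hit count per term. The sketch below assumes the sample documents are already available as plain text; the function name `term_hit_counts`, the sample texts, and the terms are all hypothetical.

```python
import re


def term_hit_counts(documents, terms):
    """Count how many documents contain each term (case-insensitive, whole word)."""
    counts = {}
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts[term] = sum(1 for text in documents if pattern.search(text))
    return counts


# Hypothetical sample of extracted document text.
sample = [
    "Please review the merger agreement before Friday.",
    "Lunch on Friday?",
    "The MERGER timeline has changed.",
]
print(term_hit_counts(sample, ["merger", "agreement", "invoice"]))
```

A term that hits nearly every sampled document, or none at all, is a signal to revisit it before running it against the full dataset.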
Processing
Reducing the volume of ESI and
converting it, if necessary, to forms
more suitable for review & analysis.
Native Format
Documents in native format:
Have not been converted in any way from their
original form
Will appear and behave exactly as they did at the
point of creation
If produced in native form, no costs incurred to
convert into another format
Contain full metadata, which often includes
privileged or sensitive information (subject, author,
date, tracking changes, etc.)
Imaged Format
Documents in imaged format:
Equivalent to printing a document and creating a
static page image
Can be time-consuming, expensive to process
Can lead to loss of information useful to requesting
party, i.e. the loss of metadata
Metadata
Metadata, which is a part of all types of ESI, exists
in fields that can be used to populate a load file
database created by the requesting party.
Examples of metadata fields are:
Names (author, sender, recipient, blind
recipients)
Dates (create date, sent, received, modified)
Subject (primarily for e-mail)
Document type
“Text” (searchable field containing the text or
body of the document itself)
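To show how metadata fields populate a load file, the sketch below writes one delimited row per document. It is an assumption-laden illustration: the column names, the pipe delimiter, and the `build_load_file` function are hypothetical, and real review platforms each define their own load file layout.

```python
import csv
import io

# Hypothetical column set based on the metadata fields listed above.
FIELDS = ["DocID", "Author", "Sender", "Recipient", "DateSent",
          "Subject", "DocType", "Text"]


def build_load_file(records):
    """Serialize metadata records to pipe-delimited text, one row per document."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="|",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # fields missing from a record are left empty
    return buffer.getvalue()


records = [{
    "DocID": "DOC0001",
    "Sender": "a@example.com",
    "Recipient": "b@example.com",
    "DateSent": "2013-01-22",
    "Subject": "Budget",
    "DocType": "E-mail",
    "Text": "See attached.",
}]
print(build_load_file(records))
```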
Processing
Assessment
Assessment is a critical first step in the workflow as
it allows the processing team to ensure that the
processing phase is aligned with the overall e-
discovery strategy, identify any processing
optimizations that may result in substantive cost
savings and minimize the risks associated with
processing. A critical aspect of this step is to
ensure that the processing methodology will yield
the expected results in terms of the effort, time
and costs, as well as expected output data
streams.
Processing
Preparation
During assessment a determination is made as to
which classes of data need to be moved forward
through processing. At that point there may be a
number of activities required to enable handling
and reduction of that data.
Processing
Selection
One of the primary reasons for “processing” data
in an e-discovery project is so that a reasonable
selection can be made of data that should be
moved forward into an attorney review stage
Processing
Output
The data that has been selected to move
forward to review is transformed into any number
of formats depending on requirements of the
downstream review platforms, or in certain
circumstances simply passed on to a review
platform in its existing format; or it may be
exported in a native format.
Processing
Overall Analysis / Validation
Throughout the four phases of processing there
are opportunities to analyze the data or results of
certain sub-processes to ensure that overall
results are what was intended, or that decisions
as to the handling of the data are valid and
appropriate.
Processing
Overall Quality Control
Validation is the testing of results to ensure that
appropriate high level processing and selection
decisions have been made, and ensuring that
ultimate results match the intent of the discovery
team. Quality Control (“QC”) involves testing to
see that specific technical processes were
performed as expected, regardless of what the
results show.
Processing
Overall Reporting
To meet the needs of project management,
status reporting, exception reporting, chain of
custody, and defensibility, it is important that
processing systems track the work performed on
all items submitted to processing.
Processing
Collected ESI must first be entered into an
appropriate software program or tool with
processing ability
Regardless of who processes the data, it is
imperative that the resulting data sets are
reviewed and that the process is validated
The processing software must provide logs of
what was accomplished and what failed during
processing.
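The requirement that processing software log what was accomplished and what failed can be sketched as a loop that records both outcomes. The `process_items` function and the `extract_text` stand-in are hypothetical and only illustrate the idea of an exception report.

```python
def process_items(items, handler):
    """Run handler on each item and log successes and failures separately."""
    log = {"processed": [], "failed": []}
    for item in items:
        try:
            handler(item)
            log["processed"].append(item)
        except Exception as exc:
            # Keep the reason so the failure can be reported and followed up.
            log["failed"].append((item, str(exc)))
    return log


def extract_text(name):
    """Hypothetical stand-in for a processing step; rejects encrypted files."""
    if name.endswith(".enc"):
        raise ValueError("encrypted file, cannot extract text")
    return "text of " + name


result = process_items(["memo.docx", "ledger.enc", "notes.txt"], extract_text)
print(result["processed"])
print(result["failed"])
```

The failed list is the part that matters for validation: every item submitted to processing ends up in exactly one of the two lists.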
Processing
Tools Used for Processing
PC/Server-Based
Cloud-Based
Vendor-Based
Processing
Methods for limiting volume include:
Culling to exclude particular document types
De-duplication
Elimination of system files
Application of search terms and date limitations
Processing
Culling
Processing methods must account for and remove
irrelevant data
Before data is indexed for processing, it can be
culled by the following criteria:
Remove all files of file types deemed to have no
evidentiary value
Remove documents with certain file paths
Eliminate files that fall below a size threshold
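The three culling criteria above (file type, file path, size threshold) can be expressed as a single filter. This is a sketch under stated assumptions: the excluded extensions, path prefixes, and minimum size are placeholder values a case team would set, and `FileEntry` and `keep` are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class FileEntry:
    path: str  # location of the file in the collected data
    size: int  # size in bytes


# Placeholder criteria; real values depend on the matter.
EXCLUDED_EXTENSIONS = {".exe", ".dll", ".tmp"}
EXCLUDED_PATH_PREFIXES = ("/Windows/", "/Program Files/")
MIN_SIZE_BYTES = 1


def keep(entry):
    """Return True when a file survives all three culling criteria."""
    if PurePosixPath(entry.path).suffix.lower() in EXCLUDED_EXTENSIONS:
        return False
    if entry.path.startswith(EXCLUDED_PATH_PREFIXES):
        return False
    return entry.size >= MIN_SIZE_BYTES


files = [
    FileEntry("/Users/jdoe/report.docx", 48213),
    FileEntry("/Windows/system32/kernel32.dll", 700000),
    FileEntry("/Users/jdoe/empty.txt", 0),
    FileEntry("/Program Files/app/readme.txt", 1200),
]
kept = [f.path for f in files if keep(f)]
print(kept)
```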
Processing
De-duplication:
The process of removing exact copies of the same
message or file from a data set, thus reducing the
number of files that need to be reviewed.
Within-custodian
Across-custodian
“Near duplicates” – slight changes to a document;
different hash values
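Exact-duplicate removal is commonly done by comparing hash values, which also shows why near duplicates survive: any change to the content changes the hash. The sketch below is illustrative only; the `deduplicate` function, the use of SHA-1, and the sample documents are assumptions.

```python
import hashlib


def deduplicate(documents, across_custodians=True):
    """Keep the first copy of each exact duplicate.

    documents: iterable of (custodian, name, content_bytes) tuples.
    across_custodians=False removes duplicates only within one custodian's data.
    """
    seen = set()
    unique = []
    for custodian, name, content in documents:
        digest = hashlib.sha1(content).hexdigest()
        key = digest if across_custodians else (custodian, digest)
        if key not in seen:
            seen.add(key)
            unique.append((custodian, name))
    return unique


docs = [
    ("alice", "memo.txt", b"Q1 results"),
    ("alice", "memo copy.txt", b"Q1 results"),
    ("bob", "memo.txt", b"Q1 results"),
    ("bob", "memo v2.txt", b"Q1 results (draft)"),  # near duplicate
]
print(deduplicate(docs))                           # across-custodian
print(deduplicate(docs, across_custodians=False))  # within-custodian
```

Across-custodian de-duplication removes more files, while within-custodian de-duplication preserves the fact that each custodian held a copy.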
Processing
Culling Methods
Deduplication
DeNISTing
Paths
Size
No evidentiary value
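DeNISTing generally means removing known operating-system and application files by matching file hashes against the reference list published by NIST's National Software Reference Library. The sketch below does not use the real list: the `denist` function and the one-entry `known` set are hypothetical stand-ins that only show the matching step.

```python
import hashlib


def denist(files, known_hashes):
    """Drop files whose SHA-1 digest appears in a set of known-file hashes.

    files: dict mapping file name to content bytes.
    known_hashes: set of lowercase SHA-1 hex digests (stand-in for a reference list).
    """
    return {name: data for name, data in files.items()
            if hashlib.sha1(data).hexdigest() not in known_hashes}


# Hypothetical stand-in for a reference hash list of known system files.
system_file = b"operating system library"
known = {hashlib.sha1(system_file).hexdigest()}

collected = {"kernel.dll": system_file, "contract.docx": b"draft agreement"}
print(sorted(denist(collected, known)))
```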
Processing
Deduplication
DeNISTing
Processing
• Budget based on assumptions from actual data
– Client should have a good idea of custodian data
– Know the data being worked with, e.g. E-mail will
have a much different volume vs.
databases/spreadsheets
– Having more time permits greater cost control, &
consistency
– Open communications and discussions with
opposition to agree on scope and methods
– Collecting all data that has been preserved may
inflate costs unnecessarily
Analysis
Evaluating ESI for content & context, including key
patterns, topics, people & discussion.
Analysis
Analysis
Fact Finding
Includes:
• Information Management
• Litigation Readiness
• Data Assessment
• Collection
Analysis
Process Analysis
Validation/Quality Assurance
Includes:
• Testing
• Documentation
Review
Evaluating ESI for relevance & privilege.
Privilege issues
Review methods
Budgeting and costs
Review: Privilege Issues
We review to:
Distinguish relevant from irrelevant
Protect privileged material
Attorney-client communications
Attorney work product
Review: Privilege Issues
Waiver of privilege
Clawback agreements
Agreement that inadvertent production of
privileged material will not constitute a waiver
Quick peek agreements
No effort to weed out privileged material up
front
Federal Rule of Evidence 502
Generally establishes that inadvertent
production will not result in waiver
Encourages use of protective orders including
clawback agreements
Review: Review Methods
Coding
Responsive or non-responsive
Privileged
Confidentiality
“Key” documents
Basic linear review
Concept searching
Clustering (uses linguistic, latent semantic technologies)
E.g., when searching the term “diamond,” clustering will
allow you to distinguish between “baseball” diamond and
diamond “ring.”
Predictive coding
Technology Assisted Review
…or Predictive Coding
…or Computer Assisted Review
…or Intelligent Review
…or ???? Review
EDRM - CARRM
EDRM’s Computer Assisted Review Reference
Model
6/20/2013 OLP - eDiscovery Certification Course
Review & Analysis:
Budgeting and Costs
Discovery costs may well be the largest budget
item, other than trial
Since few cases ever get to trial, discovery is
often the single most expensive part of any
litigation matter
Review:
Budgeting and Costs
Understand the cost drivers
Number of custodians
Volume of ESI each custodian will handle
Review of ESI
Create a budget of the estimated costs as early
as possible
All assumptions should be stated explicitly in the
budget so that variances can be noted and the
client can adjust expectations accordingly
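A budget built from explicit assumptions can be as simple as multiplying the cost drivers together. The arithmetic below is a sketch only: every figure is a placeholder rather than a benchmark, and the `BudgetAssumptions` fields and `estimate_review_cost` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BudgetAssumptions:
    custodians: int              # number of custodians
    gb_per_custodian: float      # collected volume per custodian, in GB
    docs_per_gb: int             # documents per GB after processing
    cull_rate: float             # fraction removed before review (0 to 1)
    docs_per_reviewer_hour: int  # review speed
    hourly_rate: float           # reviewer cost per hour


def estimate_review_cost(a):
    """Return (documents to review, review hours, review cost)."""
    total_docs = a.custodians * a.gb_per_custodian * a.docs_per_gb
    docs_to_review = total_docs * (1 - a.cull_rate)
    hours = docs_to_review / a.docs_per_reviewer_hour
    return docs_to_review, hours, hours * a.hourly_rate


# Every figure below is a placeholder, not a benchmark.
assumptions = BudgetAssumptions(custodians=10, gb_per_custodian=2.0,
                                docs_per_gb=5000, cull_rate=0.5,
                                docs_per_reviewer_hour=50, hourly_rate=60.0)
docs, hours, cost = estimate_review_cost(assumptions)
print(f"{docs:.0f} documents, {hours:.0f} hours, ${cost:,.2f}")
```

Keeping the assumptions in one place makes a variance easy to trace: if the custodian count doubles, the estimate can be rerun and the difference explained to the client.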
Review:
Budgeting and Costs
The complexity of the case will have a direct
impact on the cost of e-discovery
Complexity of the coding schema (number of
tags the reviewers will be applying)
Sophistication of the privilege issues presented
by the facts of the case
Number of passes of review that are
anticipated
The most efficient way to organize a review is
with numerous decisions during a single pass
review rather than through separate review
phases of the same material
Production
Delivering ESI to others in appropriate forms &
using appropriate delivery mechanisms.
Production
Parties should agree on a form of production at
the earliest stage of discovery.
Under Rule 34, the requesting party may specify
a format to which the producing party may
object and offer an alternative format.
Rule 34 of the FRCP states that the format must
be either the form in which it is ordinarily
maintained in the usual course of business or a
reasonably usable form.
Production
Native format
The form in which the document is maintained in the
system where it was created
Reasonably usable formats
Any imaged format of the ESI such as TIFF or PDF
Should include metadata
Production: Native Format
Documents in native format:
Have not been converted in any way from their
original form
Will appear and behave exactly as they did at the
point of creation
If produced in native form, incur no cost to convert
into another format
Contain full metadata, which often includes
privileged or sensitive information (subject, author,
date, tracking changes, etc.)
Production: Imaged Format
Documents in imaged format:
Equivalent to printing a document and creating a
static page image
Can be time-consuming, expensive to process
Can lead to loss of information useful to requesting
party, i.e. the loss of metadata
Production: Metadata
Metadata, which is a part of all types of ESI, exists
in fields that can be used to populate a load file
database created by the requesting party.
Examples of metadata fields are:
Names (author, sender, recipient, blind
recipients)
Dates (create date, sent, received, modified)
Subject (primarily for e-mail)
Document type
“Text” (searchable field containing the text or
body of the document itself) –
TIP: “Text” field needs to be removed when
redacting
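The reason behind the tip is that the extracted text still contains the words hidden by the redaction on the image. A minimal sketch of handling this in a load file record follows; the `apply_redaction` function and the `Redacted` flag are hypothetical, and the record layout is an assumption.

```python
def apply_redaction(record, redacted_text=""):
    """Return a copy of a load-file record with its extracted text replaced.

    The original "Text" value still holds the unredacted words, so it is
    swapped for the text of the redacted version (empty by default).
    """
    updated = dict(record)
    updated["Text"] = redacted_text
    updated["Redacted"] = "Y"  # hypothetical flag for tracking
    return updated


record = {"DocID": "DOC0002", "Subject": "Settlement",
          "Text": "Privileged advice: settle for less."}
produced = apply_redaction(record)
print(produced)
```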
Documenting Production
ESI productions should include correspondence,
production shipments, confirmation and shipping
receipts, and a tracking log showing:
What material was produced
On which type of storage media (CD, DVD, hard
drive)
How it was transmitted
Documenting Production
The production media should be subject to
quality-control checks to:
Assure completeness
Show lack of corruption
Conform with production format (as agreed upon in
the parties’ 26(f) discovery plan)
Documentation of these processes should be
kept to show timely and accurate compliance
with production requests.
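The completeness and corruption checks above can be sketched by comparing what is on the production media against a manifest of expected hash values. This is an illustration under assumptions: the manifest format, the `qc_production` function, and the sample file names are hypothetical.

```python
import hashlib


def qc_production(manifest, produced):
    """Compare produced files against a manifest of expected SHA-1 values.

    manifest: dict of file name -> expected SHA-1 hex digest.
    produced: dict of file name -> bytes actually on the production media.
    Returns (missing, corrupted, unexpected) as sorted lists of names.
    """
    missing = sorted(set(manifest) - set(produced))
    unexpected = sorted(set(produced) - set(manifest))
    corrupted = sorted(
        name for name in set(manifest) & set(produced)
        if hashlib.sha1(produced[name]).hexdigest() != manifest[name]
    )
    return missing, corrupted, unexpected


manifest = {
    "DOC0001.tif": hashlib.sha1(b"page one").hexdigest(),
    "DOC0002.tif": hashlib.sha1(b"page two").hexdigest(),
    "DOC0003.tif": hashlib.sha1(b"page three").hexdigest(),
}
produced = {
    "DOC0001.tif": b"page one",
    "DOC0002.tif": b"page tw0",  # altered content
    "DOC0004.tif": b"extra",     # not in the manifest
}
print(qc_production(manifest, produced))
```

Saving the result of each check alongside the tracking log gives the documentation the slide calls for.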
Presentation
Displaying ESI before audiences (at depositions,
hearings, trials, etc.), especially in native & near-native
forms, to elicit further information, validate existing
facts or positions, or persuade an audience.
Overall Tips
Consult FRCP and local rules of pertinent jurisdiction
Stay organized and keep complete records, specifically about critical
decisions and actions during the processes
Track what was done, by whom, when & how it was done
Maintain specific routine practices across cases/projects to increase
efficiency and ensure critical steps are not missed
IT IS NOT IF PROCESSES/ACTIONS WILL BE SCRUTINIZED…
…BUT WHEN
BE PREPARED!
EDRM Umbrella
EDRM’s Computer Assisted Review
Reference Model
The EDRM Talent Task Matrix
EDRM Model Code of Conduct
Information Governance
Reference Model