Most organizations hoard electronic files and fail to destroy them in a legally defensible manner, even when business needs and the law allow it. How do you tackle the monster problem of over-retention of electronic information? In this session, Rich shows how to develop and execute the four most important steps in defensible disposition: the Defensible Disposition Policy, Assessment Plan, Technology Plan, and Disposition Plan. He'll also outline business case development and tool selection.
The Good, The Bad, and The Ugly of Defensible Disposition
1. #AIIM14
The Good, the Bad, and the Ugly of Defensible Disposition
Richard Medina
Co-founder and Principal Consultant, Doculabs | doculabs.com
rmedina@doculabs.com | richardmedinadoculabs.com
@richarddoculabs
2. #AIIM14
Issues
1. The problem
§ The sky is falling again
2. Break it into two problems
§ Day-forward versus historical content
3. How to address historical content
§ A defensible disposition methodology
4. Analysis and classification technology
§ Should you use it? Does it work?
5. Doing the Assessment
§ Approaches and results
3. #AIIM14
Issues (agenda repeated; see slide 2)
4. #AIIM14
The Problem is Over-Retention
Organizations have been over-retaining electronic information and failing to dispose of it in a legally defensible manner when business and law will allow.
The range of approaches:
• Retaining everything forever
• Disposing of everything immediately
• Having employees make classification decisions
• Having technology make classification decisions
• Hybrid with technology and people
5. #AIIM14
Why Over-Retention is the Problem
§ Organizations keep non-required electronic content forever because:
1. Classifying content (to determine what to keep and what to purge) is manual and expensive
2. Content worth preserving is mixed with content that should be purged
3. Legal (and others) are afraid of wrongfully deleting materials (spoliation)
4. Additional storage is inexpensive, which makes it easy for corporations to buy more storage and defer addressing the problem
6. #AIIM14
Issues (agenda repeated; see slide 2)
7. #AIIM14
Recommendations for Day-forward
§ Addressing day-forward information lifecycle management (ILM) is much easier than addressing historical content
§ Even though addressing it messes with employees' day-to-day business activities
§ Day-forward: Initiate ILM practices on a "day-forward" basis first, so any new content created or saved is assigned a disposition period
§ Disposition horizons should begin to influence where content is stored (as users discover that materials saved in the "wrong" system will be purged)
§ Guidance: Provide employees with explicit guidance for the acceptable use of available tools for dynamic content and their associated retention periods
§ For example, retain non-records for 3 years; retain official records per the retention schedule
§ Historical: For historical content, analyze the feasibility of content analytics and autoclassification
§ Recognize that cleaning up TBs of content can take years. So conduct the analysis in 2014, begin the cleanup effort in earnest by 2015, and eliminate a large portion of dated content by 2016
8. #AIIM14
Guidance Example for Day-forward

System/Repository: Personal Network Drives ("P" drives)
Recommended Retention Period:
• Provide each user with personal drive space of a limited size for their storage, for as long as the user is employed

System/Repository: Shared Network Drives ("G" drives)
Recommended Retention Period:
• Make them read-only (which means no network storage for collaboration; content will have to go into an ECM system)
• Exceptions include applications or systems that need to use network storage

System/Repository: ECM System
Recommended Retention Period:
1. Default for non-records: retained for 3 years
2. Default for non-records that have long-term value: retained for 7 years
3. Official records: retained per the retention schedule

System/Repository: Social Community Sites
Recommended Retention Period:
• No documents stored in communities (only links to documents in the ECM system)
• Consider retention periods for non-document content (e.g., 3 years)
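The guidance table above could be captured as machine-readable rules that a provisioning or governance script consults. This is a minimal sketch; the repository keys and period strings are assumptions, not part of any product:

```python
# Day-forward retention guidance as a lookup table (sketch).
# Keys and encodings are illustrative assumptions; "per retention
# schedule" defers to the organization's official schedule.
RETENTION_RULES = {
    "personal_drive": "while employed",
    "shared_drive": "read-only (no new content)",
    "ecm_non_record": "3 years",
    "ecm_non_record_long_term": "7 years",
    "ecm_official_record": "per retention schedule",
    "social_community": "links only; non-document content 3 years",
}

def retention_for(repository, default="per retention schedule"):
    """Look up the recommended retention for a repository class."""
    return RETENTION_RULES.get(repository, default)
```

Defaulting unknown repositories to the official schedule is the conservative choice: unclassified locations never get a shorter period than records would.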
9. #AIIM14
Issues (agenda repeated; see slide 2)
10. #AIIM14
What's the Purpose of Your DD Methodology?
§ You must satisfy 4 demands:
1. Regulatory retention requirements
2. Hold retention requirements
3. Business retention requirements
4. Cost impact of anything you do
§ What you do has impact:
1. What you do
2. Effects of what you do
§ You can do 2 things:
1. Sort
2. Dispose
§ Your mission stated two ways:
§ Your mission is to satisfy your retention demands (1-3) while minimizing bad cost impact to yourself (4)
§ Your mission is to maximize good cost impact (4) while satisfying your retention requirements (1-3)
11. #AIIM14
It's Based on Reasonableness
§ To determine what "satisfy your retention demands" really means for you, use the Principle of Reasonableness and act in Good Faith
§ "Courts do not ask, expect or necessarily reward organizations for perfection. Courts do expect, however, that whatever information management tactics an organization undertakes are appropriate to how that particular entity is situated (size, financial resources, regulatory and litigation profile, etc.)." (Jim McGann and Julie Colgan, "Implement a defensible deletion strategy to manage risk and control costs", Inside Counsel)
12. #AIIM14
Your DD Methodology Has 4 Parts
1. Defensible Disposition Policy
§ It's your design specification, your business rules for DD, your decision tree
§ Specifies very clearly the objectives that your methodology will fulfill. It states clearly what you mean by your retention requirements and what you mean by reasonable costs when you are trying to fulfill your retention requirements.
2. Technology Approach
§ For Sorting and Disposing
§ You must use technology; it's not an option
13. #AIIM14
Your DD Methodology Has 4 Parts
3. Assessment (Sorting) Plan
§ Do the legwork and look at what's there
§ What information and systems you're assessing
§ Your processing rules (decision plan)
§ It will be flexible
4. Disposition Plan
§ Evaluate your assessment results using your DD Policy
§ Dispose (which ranges from keeping forever to deleting right now, with many options in between)
§ Refine your DD Policy (1) and continue as needed
14. #AIIM14
Issues (agenda repeated; see slide 2)
15. #AIIM14
There's an Awesome Business Case
§ 50 TB = ~200 million documents (average of 250 KB per document)
§ The following table illustrates the time and effort required to classify 200 million documents

Classification Technique: Manual Classification
Classification Rate: 10 seconds per document
Pricing: $35/hr.
Total Cost to Classify: $20 million

Classification Technique: Auto Classification (with 95% machine and 5% human classified, via offshore labor)
Classification Rate: Less than 1 second per document
Pricing: $.005 per document for machine processing and $5/hr. for those that require manual classification
Total Cost to Classify: $2 million

§ ... if the technology works
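The slide's cost figures can be sanity-checked with a few lines of arithmetic. This sketch uses the slide's stated rates and assumes the 5% human-reviewed share also takes 10 seconds per document (the slide does not state a review rate):

```python
# Sanity-check of the classification business case on slide 15.
# Inputs from the slide: 200M documents; manual review at 10 s/doc
# and $35/hr; auto-classification at $0.005/doc for the 95% handled
# by machine, plus offshore review at $5/hr for the remaining 5%.
DOCS = 200_000_000

def manual_cost(docs, secs_per_doc=10, rate_per_hr=35.0):
    """Cost of classifying every document by hand."""
    return docs * secs_per_doc / 3600 * rate_per_hr

def auto_cost(docs, machine_share=0.95, per_doc=0.005,
              secs_per_doc=10, offshore_rate=5.0):
    """Machine cost for the bulk, plus offshore labor for the rest."""
    machine = docs * machine_share * per_doc
    human = docs * (1 - machine_share) * secs_per_doc / 3600 * offshore_rate
    return machine + human

print(f"Manual: ${manual_cost(DOCS) / 1e6:.1f}M")  # ~$19.4M, the slide's "$20 million"
print(f"Auto:   ${auto_cost(DOCS) / 1e6:.2f}M")    # ~$1.1M, within the slide's "$2 million"
```

Under these assumptions the manual path costs roughly $19.4M and the hybrid path roughly $1.1M, consistent with the slide's $20M vs. $2M (the slide's figure presumably includes overhead beyond raw labor and machine time).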
16. #AIIM14
Analysis and Classification Technologies
§ Many different kinds of technology vendors are addressing analysis, classification, and disposition
§ File Analytics, Content Analytics, Content Classification, ECM, E-discovery, Search, Capture, DLP, Storage Management
§ Products, hosted solutions, service providers
§ IBM/Stored IQ, HP/Autonomy, EMC Kazeon, SAS, Kofax, Equivio, Rational Retention, Recommind, Index Engines, and others
§ Most have a sweet spot where they will succeed
§ But it's highly dependent on just about every factor you can think of
§ E.g., your business purposes, your ECM environment, your "information architecture", your document types and their complexity and volume, the value and risk of the documents, your success criteria, etc.
17. #AIIM14
Sidebar: How Many of Them Work

Before:
<server XXX, drive G:> Forecast summary_121008.doc

After:
Record = no
Age = 2.5 years
Document type = departmental forecast
Keywords = forecast, 2008, draft
Status = delete
Confidence = 9.2 (out of 10)

1. Analyze the content and review the retention schedule
2. Establish classification rules and train the systems with examples
3. Crawlers and recognition engines evaluate the content and generate a classification
4. For content where a high machine confidence factor exists, content is automatically tagged and then staged for migration to the appropriate system or disposition
5. For content with low confidence factors, documents are routed to clerical staff (onshore or offshore) for manual classification
6. The results of the manual identification are fed back into the automated algorithms to "teach" the systems better classification

Client Validation: Throughout the process, results and samples are routed to records management and legal professionals within the firm for validation and confirmation
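Steps 3-6 above form a confidence-thresholded, human-in-the-loop routing pattern. A minimal sketch, with a stand-in classifier and an assumed threshold (the slide shows confidence on a 0-10 scale but does not name a cutoff):

```python
# Sketch of the sidebar's workflow (steps 3-6): a classifier emits a
# label with a confidence score; high-confidence items are tagged
# automatically, low-confidence items go to manual review, and the
# reviewed examples are collected as retraining feedback.
CONFIDENCE_THRESHOLD = 8.0  # assumed cutoff, on the slide's 0-10 scale

def route(documents, classify, manual_review):
    """Split docs into auto-tagged items and manually reviewed feedback."""
    auto_tagged, feedback = [], []
    for doc in documents:
        label, confidence = classify(doc)           # step 3: machine evaluation
        if confidence >= CONFIDENCE_THRESHOLD:
            auto_tagged.append((doc, label))        # step 4: tag and stage
        else:
            corrected = manual_review(doc)          # step 5: clerical staff
            feedback.append((doc, corrected))       # step 6: retraining data
    return auto_tagged, feedback
```

The feedback list is what makes the loop improve over time: each manual decision becomes a labeled training example for the next classification pass.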
18. #AIIM14
Issues (agenda repeated; see slide 2)
19. #AIIM14
Assessment Approaches
§ There are three categories of attributes that can be used to determine what a file is:
1. Environmental attributes around the file (e.g., file location, ownership)
2. File attributes about the file (e.g., file type, age, author)
3. Content attributes within the file (e.g., keywords, character strings, word proximity, word density)
§ Various techniques and technologies, along with business rules, can be used to determine what a file is, and whether it is eligible for disposition
§ E.g., a DOC file created over 5 years ago and not accessed for a year may be purged
§ This type of purging could be done after giving users adequate notice ("move it or lose it" or "hold" for 90 days, then delete)
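The example rule above (an old, unaccessed DOC file becomes a purge candidate) can be sketched directly against filesystem metadata. The thresholds come from the slide; treating the older of mtime/ctime as a creation proxy is an assumption, since true creation time is not portable across filesystems:

```python
# Sketch of the slide's rule: flag .doc files created over 5 years
# ago and not accessed in the past year as disposition candidates.
import time
from pathlib import Path

FIVE_YEARS = 5 * 365 * 24 * 3600  # seconds
ONE_YEAR = 365 * 24 * 3600

def disposition_candidates(root):
    """Yield .doc paths eligible for purge under the age/access rule."""
    now = time.time()
    for path in Path(root).rglob("*.doc"):
        st = path.stat()
        created = min(st.st_mtime, st.st_ctime)  # best-available creation proxy
        if now - created > FIVE_YEARS and now - st.st_atime > ONE_YEAR:
            yield path
```

In practice a real tool would combine this with the notice workflow the slide describes: stage the candidates, notify owners, and delete only after the 90-day "move it or lose it" window.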
20. #AIIM14
#1: Environmental Attributes (around a file)

Attribute: Ownership
Evaluation Technique: Access Controls
Tool(s) Used: Content Analytics, Data Loss Prevention, Storage Management
Example: Permissions within LDAP list people and infer department or function
How Used: Large collections of files can be assessed en masse based on access controls

Attribute: Location
Evaluation Technique: File Path
Tool(s) Used: Content Analytics, Data Loss Prevention, Storage Management
Example: G:/accounting/july2004/temp
How Used: Stranded and orphaned locations are often easily eliminated
21. #AIIM14
#2: File Attributes (about a file)

Attribute: Duplicate
Evaluation Technique: Hash Algorithm
Tool(s) Used: Content Analytics
Example: Exact duplicates
How Used: Exact duplicates can be easily eliminated

Attribute: File Type
Evaluation Technique: Extension or MIME type
Tool(s) Used: Content Analytics
Examples: .TMP, .MP3
How Used: To identify file types that should not exist in a corporate setting

Attribute: Near Duplicate
Evaluation Technique: Block Read
Tool(s) Used: Content Analytics
Example: Near duplicates
How Used: Near duplicates must be assessed in the context of other attributes

Attribute: Metadata
Evaluation Technique: Properties
Tool(s) Used: Content Analytics
Examples: Age; Author; Security Profile (Confidential)
How Used: To determine old materials, or materials authored by individuals that have left the organization; typically these attributes must be combined with other attributes via a rule to take action; use filename properties to determine type

Attribute: File Name
Evaluation Technique: Character Strings
Tool(s) Used: Content Analytics
Examples: GL-USDIST31_093098.xls; FORMUB92_SMITH
How Used: Determine whether a file was system-generated vs. human-generated; documents based on a specific form number can easily be identified
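The "Duplicate / Hash Algorithm" row above is the simplest of these techniques to implement: hash every file's bytes and group identical digests. A minimal sketch (SHA-256 is an assumed choice; any collision-resistant hash works):

```python
# Group files by content hash so exact duplicates can be listed
# for elimination, per the "Duplicate" row of the attribute table.
import hashlib
from collections import defaultdict
from pathlib import Path

def find_exact_duplicates(root):
    """Return groups of paths whose file contents are byte-identical."""
    by_hash = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Note that this only catches exact duplicates; the table's "Block Read" / near-duplicate row requires similarity techniques (shingling, fuzzy hashing) and, as the slide says, assessment in the context of other attributes.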
22. #AIIM14
#3: Content Attributes (within a file)

Attribute: Key Word
Evaluation Technique: Character Strings
Tool(s) Used: Content Analytics; Classification Module
Examples: "Enron", "Guarantee"
How Used: To determine if a document is on Hold via a word list per the hold request

Attribute: Character or Word Patterns
Evaluation Technique: "Classification" (pattern matching)
Tool(s) Used: Classification Module
Examples: Word proximity; word frequency; "Privileged"
How Used: To determine the category in which a document may fit

Attribute: Identification of PII
Tool(s) Used: Content Analytics; DLP
Examples: SS#, Credit card #
How Used: Regular Expression (RegEx) lists; determined entities for hold, security, IP, PHI, PII, DLP
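The PII row boils down to regular-expression lists like those DLP tools apply. A minimal sketch; these patterns are illustrative only (production DLP patterns are far stricter, e.g. Luhn-checking card numbers and handling more SSN formats):

```python
# Illustrative RegEx lists for the "Identification of PII" row:
# scan text for SSN-like and credit-card-like strings.
import re

PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def scan_for_pii(text):
    """Return the names of PII categories whose pattern matches."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]
```

Files that match would be routed to the hold/security/PHI/PII workflows rather than purged, which is why content attributes matter even in a disposition-focused assessment.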
23. #AIIM14
Assessment Results

Preservation Findings
Unnecessary File Types (executables, non-business pictures, movies, etc.): 13 to 15%
Duplicates: 15 to 20%
Near Duplicates: 9 to 30%

Risk Findings
Files with PII: 10 to 16%
Files with Sample Keywords: 3 to 5%

Operational Findings
Files 10 years or older: 7 to 11%
Files accessed within the last 18 months: 25 to 35%

Findings are not mutually exclusive (i.e., a duplicate file could also be aged)
24. #AIIM14
Assessment Summary Findings

Technique       | Status                                   | % of Total | Total
Analytics       | Unnecessary                              | 20%        | 500 TB (.5 PB)
Classification  | Record                                   | 8%         | 200 TB (.2 PB)
Classification  | Non-Record, Business Reference           | 28%        | 700 TB (.7 PB)
Classification  | Evaluated, Staged for Disposition (2016) | 44%        | 1,100 TB (1.1 PB)
Total           |                                          | 100%       | 2,500 TB (2.5 PB)

Enterprise Impact
Total that could be disposed: 20% of 2.5 PB
Enterprise Implications: .5 PB removed @ $5,000,000 per PB
Savings: $2,500,000 per year in storage expense
25. #AIIM14
Assessment Implications
§ Given the results, $2.5 million in storage expense could be saved annually on the disposition of historic content, resulting in $12.5 million over 5 years
§ Going forward with newly created content, if similar techniques are applied, the savings grow to $34.8 million over 5 years
§ The current cost projections are based on the historical content growth rate of 30% per year
§ The expected cost projections are based on a content growth rate of 26% per year

@ $5,000,000 per PB      | 2012  | 2013  | 2014  | 2015  | 2016*  | Total
Current Storage (PB)     | 2.5   | 3.25  | 4.23  | 5.49  | 7.14   |
Current Cost ($M)        | 12.5  | 16.3  | 21.1  | 27.5  | 35.7   | 113.0
Expected Storage (PB)    | 2     | 2.52  | 3.18  | 4.00  | 3.94   |
Expected Cost ($M)       | 10    | 12.6  | 15.9  | 20.0  | 19.7   | 78.2
Total Savings ($M)       | 2.5   | 3.65  | 5.25  | 7.46  | 16.00  | 34.8

*In 2016, the 1.1 PB (44% of content) from the 2012 historical content assessment can be disposed
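The projection table's arithmetic can be reproduced in a few lines, under the slide's stated assumptions ($5M per PB per year, 30% growth on the current path, 26% growth from a 2 PB post-cleanup base, and 1.1 PB disposed in 2016):

```python
# Reproduce the 5-year storage cost projection from slide 25.
COST_PER_PB = 5.0  # $M per PB per year (slide assumption)

def project(start_pb, growth, years):
    """Storage volume per year under compound annual growth."""
    sizes = [start_pb]
    for _ in range(years - 1):
        sizes.append(sizes[-1] * (1 + growth))
    return sizes

current = project(2.5, 0.30, 5)    # 2012..2016, no cleanup
expected = project(2.0, 0.26, 5)   # after historical cleanup
expected[-1] -= 1.1                # 1.1 PB staged content disposed in 2016

current_cost = sum(current) * COST_PER_PB    # ~$113.0M
expected_cost = sum(expected) * COST_PER_PB  # ~$78.2M
savings = current_cost - expected_cost       # ~$34.8M
```

Running this reproduces the table's totals to within rounding, which is a useful check that the deck's $34.8M five-year savings figure follows directly from its growth-rate and cost-per-PB assumptions.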
26. #AIIM14
Conclusions
1. The business case for disposition is strong
§ Costs, risks, and benefits
2. Information governance must be addressed in phases
§ Starting today, the program will take years to mature
§ Set expectations accordingly
3. You should probably address day-forward ILM before tackling historical content
4. Recognize that manual classification is not an option
5. The technologies are immature and varied, but you can be successful by matching the techniques and technologies to the kinds of files you want to target
6. Your DD methodology has 4 main parts: DD Policy, Technology Approach, Assessment Plan, Disposition Plan
27. #AIIM14
Thank You
Richard Medina
Co-founder and Principal Consultant, Doculabs | doculabs.com
rmedina@doculabs.com | richardmedinadoculabs.com
@richarddoculabs
28. www.aiim.org/infochaos
Do YOU understand the business challenge of the next 10 years? This ebook from AIIM President John Mancini explains.