Monitoring Global IT Systems for Scientific Insights
1. The Grid Observatory Initiative develops a scientific view of the dynamics and usage of globalized IT systems by monitoring and analyzing the EGI grid. The overall goal is to create a full-fledged digital curation process, with its four components: preservation, validation, indexation and knowledge building.
As the largest non-profit globalized system worldwide and with demanding scientific users, the EGI infrastructure is one of the most exciting artificial complex systems. With extensive monitoring facilities already in place, it offers an unprecedented opportunity to observe and to understand the computing practices within the e-Science community.
Grid and cloud share a common paradigm: they are globalized at a large scale. As such, the data collected and the knowledge built from analyzing EGI concern cloud modeling as well. Ongoing work integrates monitoring data from the StratusLab cloud.
The Grid Observatory is an open collaboration, keen to foster dialog and partnerships with others in the relevant areas of computer science and engineering. The Laboratoire de Recherche en Informatique and the Laboratoire de l'Accélérateur Linéaire, from CNRS and University Paris-Sud, along with Imperial College London, operate data production. The initiative is supported by France-Grilles, INRIA and CNRS.
A trove of experimental data: www.grid-observatory.org
The first role of the Grid Observatory is to preserve the monitoring data, normally discarded after operational usage, and to make them available to the wider scientific community. Through its web portal, the Grid Observatory offers public access to a repository of grid traces to observe e-Science practice and infrastructure.
• EGI provides an accessible approximation of the current and future requirements of e-Science users.
• Grid status and middleware activity are recorded. These can be explored for a wide range of motivations, from operational usage, e.g. improving performance, to scientific research, e.g. testing classification methods for fault detection.
The Grid Observatory follows Tim Berners-Lee's recommendation for Raw Data Now. It exemplifies the Big Data challenges: semantic organization, provenance, interoperability, and next-generation analytics. Emerging technologies such as Linked Open Data will be explored to further address those challenges.
The Green Computing Observatory
The Grid Observatory offers extensive traces of energy consumption. Because green IT is becoming an increasingly urgent need and also because there was no existing EGI monitoring tool, this action has its own name: the Green Computing Observatory.
• The traces integrate motherboard-level monitoring with information on computing, networking, storage, and cooling.
• Acquisition exploits the de facto standards IPMI and Ganglia.
• Integration is based on an ontology of IT system measurements, including virtual machines, developed by University Picardie Jules Verne.
From applied to fundamental research
Research exploiting the monitoring data should demonstrate verifiable and positive impact on production systems.
• Beyond-power-law and non-stationary behavior are pervasive. With sequential testing, segmentation and adaptive on-line clustering, we advanced fault detection and parsimonious model building.
• Efficient autonomic policies must combine a priori knowledge and on-line adaptation, but reference interpretations are most often missing. Data-driven topic modeling in the spirit of text mining, and heterogeneous data integration with Statistical Relational Learning help to build intelligible representations.
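To make the idea of sequential testing for fault detection concrete, here is a minimal sketch of a one-sided CUSUM test, a classical sequential change-detection scheme. It is offered as a generic illustration only; the text does not say which sequential test the Grid Observatory work used. The function name, the parameters `target_mean`, `slack` and `threshold`, and the failure-rate series are all hypothetical.

```python
def cusum_alarm(samples, target_mean, slack, threshold):
    """Return the index of the first sample at which a one-sided CUSUM
    statistic exceeds `threshold`, or None if it never does."""
    statistic = 0.0
    for index, value in enumerate(samples):
        # Accumulate evidence of an upward shift; never go below zero.
        statistic = max(0.0, statistic + (value - target_mean - slack))
        if statistic > threshold:
            return index
    return None


# Hypothetical job failure rates: stable at first, then shifted upward.
failure_rates = [0.02, 0.03, 0.01, 0.02, 0.03, 0.20, 0.25, 0.22, 0.30]
alarm_index = cusum_alarm(failure_rates, target_mean=0.02, slack=0.01,
                          threshold=0.3)
print(alarm_index)
```

The `slack` term keeps small fluctuations around the nominal rate from accumulating, while `threshold` trades detection delay against false alarms. Non-stationary traces would need `target_mean` to be re-estimated over time, which is where segmentation comes in.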
2. Digital curation
The overall goal of the Grid Observatory is to create a full-fledged digital curation process, with its four components.
Establishing and developing a long-term repository of digital assets for current and future reference.
The Grid Observatory has operated since October 2008. It continuously records and publishes various traces. An essential achievement is to cover the complete scope of grid middleware and user activity, beyond particular aspects such as job lifecycle or failure events, and including, for instance, logging of the Information System (BDII).
Providing digital asset search and retrieval facilities to scientific communities through a gateway.
The middleware traces are currently made available only in raw format, on a weekly basis. Much remains to be done in the direction of a more semantic organization. The Green Computing Observatory data are organized along an XML schema associated with the measurement ontology. All are available through the Grid Observatory portal.
Tackling the issues of good data creation and management, and of interoperability, through formal ontology building.
The Grid Observatory most often builds on EGI and gLite monitoring, and thus benefits from their collective effort in middleware development and EMI standardization. The Green Computing Observatory builds on IPMI and Ganglia. Calibration of IPMI measurements is made possible by PDU (Power Distribution Unit) measurements. The Green Computing Observatory participates in the COST action IC0804, Energy efficiency in large scale distributed systems.
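As an illustration of how PDU readings could be used to calibrate IPMI readings, the sketch below fits a linear correction (gain and offset) by ordinary least squares. This is an assumption made for the sake of example: the text does not describe the calibration model actually used, and the pairing of one IPMI reading with one PDU reading, the function name and the sample values in watts are all hypothetical.

```python
def fit_linear(ipmi_watts, pdu_watts):
    """Least-squares fit of pdu ~ gain * ipmi + offset."""
    count = len(ipmi_watts)
    mean_x = sum(ipmi_watts) / count
    mean_y = sum(pdu_watts) / count
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in ipmi_watts)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(ipmi_watts, pdu_watts))
    gain = sxy / sxx
    offset = mean_y - gain * mean_x
    return gain, offset


# Hypothetical paired readings taken at the same instants, in watts.
ipmi_readings = [100.0, 150.0, 200.0, 250.0]
pdu_readings = [115.0, 170.0, 225.0, 280.0]

gain, offset = fit_linear(ipmi_readings, pdu_readings)


def calibrate(ipmi_value):
    """Map a raw IPMI reading onto the PDU scale using the fitted line."""
    return gain * ipmi_value + offset
```

In practice a PDU outlet may feed several machines, so real calibration would first have to aggregate the IPMI readings of all machines behind one outlet.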
Adding value to data by generating new sources of information and knowledge through semantic, statistical and Machine Learning based inference.
The general framework for the Grid Observatory is to turn it into a social intelligence system that pools scientific and engineering expertise, in order to build gradually more integrated models of the European e-infrastructures, and to define and validate autonomic-oriented policies addressing their operational challenges.
More information:
• The Green Computing Observatory: a data curation approach for green IT. 9th IEEE Int. Conf. on Dependable, Autonomic and Secure Computing.
• The Grid Observatory. 11th IEEE/ACM Int. Symp. on Cluster, Cloud and Grid Computing.
Towards Open Linked Data ***
Data are accessible on the web through the portal; the only protection implemented is against malicious usage. All formats are machine readable and open: ASCII, XML, SQL, LDIF. RDF and Linked RDF are the next step.
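To give a feel for what moving from LDIF to RDF could involve, here is a minimal sketch that turns one LDIF entry into subject, predicate, object triples. It is not the Grid Observatory's conversion: the base URI, the function name, and the sample entry with its attribute names are hypothetical, and the parser deliberately ignores LDIF features such as folded lines and base64 values.

```python
def ldif_entry_to_triples(entry_text, base="http://example.org/grid/"):
    """Convert a single simple LDIF entry into (subject, predicate, object)
    triples, using the entry's dn as the subject."""
    subject = None
    triples = []
    for line in entry_text.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, value = line.partition(": ")
        if name == "dn":
            subject = value
        else:
            # Each attribute becomes a predicate under the hypothetical base URI.
            triples.append((subject, base + name, value))
    return triples


# Hypothetical entry, loosely in the style of an information-system record.
sample_entry = """
dn: siteName=EXAMPLE-SITE,o=grid
waitingJobs: 12
runningJobs: 48
"""

triples = ldif_entry_to_triples(sample_entry)
for triple in triples:
    print(triple)
```

A real conversion would map attribute names onto terms of a published vocabulary rather than an ad hoc base URI, which is what makes the resulting data linkable.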
Selected contributions from the Grid Observatory initiative and its users
Fault detection and diagnosis, smart probing
Distributed Monitoring with Collaborative Prediction. 12th IEEE/ACM Int. Symp. on Cluster, Cloud and Grid Computing.
Toward Autonomic Grids: Analyzing the Job Flow with Affinity Streaming. 15th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining.
Optimization of jobs submission on the EGEE production grid: modeling faults using workload. Journal of Grid Computing, 8(2).
Grid models
Characterizing e-science file access behavior via latent Dirichlet allocation. 4th IEEE/ACM Int. Conf. on Utility and Cloud Computing.
Towards non-stationary Grid models. Journal of Grid Computing, 9(4).
Autonomic Quality of Service and Green Computing
Multiobjective reinforcement learning for responsive grids. Journal of Grid Computing, 8(3).
Autonomic policy adaptation using decentralized online clustering. 7th IEEE/ACM Int. Conf. on Autonomic Computing.