2. The problem with big data, and a solution
The problem:
“New reference architectures will include both big data and enterprise
data warehouses”
[IDC, 19 January 2012]
Two worlds: structured and unstructured data (plus external data
sources, documents stored in structured databases, etc.)
Siloes create issues with management, integration, etc.
The solution:
Linked data – a single reference point for all data in the enterprise
#CloudCamp 1 UNCLASSIFIED
3. Some history
Fixed structure
Difficult to change schema
Simple reporting capabilities
Complex to create new reports
4. Some history
Completed transactions transferred to separate database for analysis
“Data warehouse”
Better reporting, data mining, etc.
Still highly structured
Data is historical
May be aggregated
5. The smart guys
Real-time update of completed transactions
Transactions moved to data warehouse upon completion
Smaller transactional database
Allows for alerts to be generated when specific conditions are met and action taken
6. A third “data silo”
Masses of unstructured/semi-structured data being processed in NoSQL databases
May or may not be transferred to/from structured databases
Time-consuming and inefficient
Three types of data, each with its own limitations and management considerations
8. Linked Data
Tie records together – even from separate data sets
We can express these links as triples with a specific grammar: subject, predicate, object
Build up a graph to show machine-readable data in human-readable form
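As a rough illustration of the idea on this slide, the sketch below stores triples as plain Python tuples and merges two separate data sets into one graph. The identifiers (customer:42, placedOrder and so on) and the helper names build_graph and describe are invented for this example and do not come from the talk; a real deployment would more likely use RDF with URIs and a triple store.

```python
from collections import defaultdict

# Hypothetical records from two separate data sets (all names are made up).
crm_triples = [
    ("customer:42", "hasName", "Alice Example"),
    ("customer:42", "placedOrder", "order:1001"),
]
warehouse_triples = [
    ("order:1001", "containsProduct", "product:7"),
    ("product:7", "hasName", "Widget"),
]


def build_graph(*triple_sets):
    """Merge triple sets into one graph: subject -> list of (predicate, object)."""
    graph = defaultdict(list)
    for triples in triple_sets:
        for subject, predicate, obj in triples:
            graph[subject].append((predicate, obj))
    return graph


def describe(graph, subject):
    """Render everything known about a subject as readable statements."""
    return [f"{subject} {predicate} {obj}" for predicate, obj in graph.get(subject, [])]


graph = build_graph(crm_triples, warehouse_triples)
for line in describe(graph, "customer:42"):
    print(line)
```

Because the order identifier appears in both data sets, the two sets of records end up connected in the same graph without either source having to change its own schema.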
9. Then add lots more data…
Source: http://lod-cloud.net/
Each node is itself another graph (zoom in)
10. Aren’t we missing a trick?
Use linked data as the optimal reference source
Broker of all data sources
Single view on structured and unstructured data
Bring in external sources too
Mapping, interconnecting, indexing and feeding
In real time
Query linked data to derive new value from old
Infer relationships
Gain new insights
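To make "query" and "infer relationships" a little more concrete, here is a minimal sketch using plain Python tuples as triples. The data, the predicate names and the rule (a customer who placed an order containing a product has purchased that product) are assumptions made up for illustration; in practice this kind of work is usually done with a query language such as SPARQL over a triple store.

```python
# Hypothetical linked data drawn from several sources (all names are made up).
triples = [
    ("customer:42", "placedOrder", "order:1001"),
    ("order:1001", "containsProduct", "product:7"),
    ("customer:42", "mentionedIn", "document:review-3"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def infer_purchases(triples):
    """Derive new 'purchased' links by chaining two existing relationships."""
    inferred = set()
    for customer, _, order in match(triples, predicate="placedOrder"):
        for _, _, product in match(triples, subject=order, predicate="containsProduct"):
            inferred.add((customer, "purchased", product))
    return inferred


print(match(triples, subject="customer:42"))
print(infer_purchases(triples))
```

The inferred "purchased" link is not stored in any single source; it only appears once the sources are viewed together, which is the sense in which new value is derived from old data.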
12. About the author
Mark Wilson, Strategy Manager, Fujitsu
Mark is an analyst working within Fujitsu’s UK and
Ireland Office of the CTO, providing thought
leadership both internally and to customers,
shaping business and technology strategy. He has
17 years' experience of working in the IT industry,
12 of which have been with Fujitsu. Mark has a
background in leading large IT infrastructure
projects with customers in the UK, mainland
Europe and Australia. He has a degree in
Computer Studies from the University of
Glamorgan. Mark is also active in social media and
won the Individual IT Professional (Male) award in
the 2010 Computer Weekly IT Blog Awards. Mark
may be found on Twitter @markwilsonit.
If you would like to comment on the topics in this
presentation, Mark would welcome your feedback,
by email to mark.a.wilson@uk.fujitsu.com.
Editor's notes
Everyone’s talking about big data but the bulk of the conversation seems to focus on a new level of business intelligence and an ever-increasing volume of data organised into OLTP, OLAP and NoSQL siloes. In this talk, Mark Wilson puts forward a view that the real value comes not from the big data itself but from how we can employ linked data concepts to integrate structured, unstructured and semi-structured data sets, and then use this unified data source to derive new value.