Architecture Design Series
Mich Talebzadeh
London August 17, 2016
Road Map for Careers in Big Data
Author
About the Author:
Mich Talebzadeh
Big Data and RDBMS Senior Technical Architect
E-mail: mich@Peridale.co.uk
https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2?
https://talebzadehmich.wordpress.com/
All rights reserved.
No part of this publication may be reproduced in any form, or by any means, without the prior written permission of the copyright holder.
© 2016 Mich Talebzadeh Roadmap for Careers in Big Data
The Big Picture
Get familiar with Big Data in General
The buzzwords and the emerging new products are expanding at an
amazing pace
Gain an understanding of the Big Data space
Relate your existing (often relational) knowledge to Big Data
Build upon your existing knowledge and experience
Discover where Big Data fits
Gain an insight into the Hadoop core
Make it easier to move to Big Data if you wish
Trends
Back in 2007, I responded to an article, “The Oracle DBMS is
a ‘legacy technology’”, on techtarget.com.
I stated that the argument is not so much that relational
database management systems “should be considered
legacy technology” because they are more than a quarter of
a century old, but has more to do with the philosophy and the
design approach that they were created for.
Simply put, today’s systems and users deal much more with
non-transactional (read) activity than with transactional
(write/update) activity.
Does that mean that RDBMSs are legacy systems? I don’t
think so. All it means is that there is a limit to how far one
can take the relational model.
So what is the solution in a business world that is
growing increasingly distributed and has to deal with the
increased use of heterogeneous systems and proprietary
databases across different levels of business?
We need to deploy tools that fit the purpose; relying on a
one-engine philosophy (say, an RDBMS) is no longer
viable.
Move from internal company data to information from
multiple sources
Deal with both transactional and non-transactional data
The RDBMS landscape is changing. Newer versions are
adapting to the new landscape
Shift from Data at rest to Data in motion
The world is saving more data than ever before
The ability to collect and analyze such large amounts of
data has given rise to Big Data technology
Big Data is a trendy term these days, and if you are in I.T.
and don't know about it, then you are missing the
buzz.
Just as we have tools to collect and analyze transactional
data, we need tools to collect and analyze vast quantities
of data
If time is of the essence in the transactional world, it is
equally becoming of the essence when dealing with vast amounts of data
Big Data Meaning and Terminology
Big Data is a term for data sets that are so large
or complex that traditional data processing
applications are inadequate.
The challenges include analysis, capture, data
curation, search, sharing, storage, transfer,
visualization and querying.
The term often refers to the use of predictive
analytics, user behaviour analytics (sentiment
analysis), or certain other advanced data analytics
methods that extract value from the vast amount of
data
Big Data is not only about the volume.
Much of the data is received in real
time and is most valuable at the time of
arrival.
Examples:
− Detect changes in share prices as soon as
possible
− A service operator wants to
detect failures from logs within
seconds
− A news site may want to train its
model to show users content
that they are interested in.
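The log-monitoring example can be sketched as a sliding time window over a stream of events. This is only an illustration in plain Python, not a real streaming engine: the function name, the (timestamp, level) event format, the 5-second window and the threshold of 3 errors are all assumptions made for the example.

```python
from collections import deque

def detect_failure_bursts(events, window_seconds=5, threshold=3):
    """Yield a timestamp whenever at least `threshold` ERROR events
    have arrived within the last `window_seconds` seconds.

    `events` is an iterable of (timestamp_in_seconds, level) pairs,
    assumed to arrive in time order (a hypothetical log format).
    """
    recent_errors = deque()  # timestamps of errors still inside the window
    for timestamp, level in events:
        if level != "ERROR":
            continue
        recent_errors.append(timestamp)
        # Evict errors that have fallen out of the time window.
        while timestamp - recent_errors[0] > window_seconds:
            recent_errors.popleft()
        if len(recent_errors) >= threshold:
            yield timestamp

# Made-up sample stream: two isolated errors, then a burst of three.
sample_events = [
    (0, "INFO"), (1, "ERROR"), (2, "ERROR"),
    (10, "ERROR"), (11, "ERROR"), (12, "ERROR"), (13, "INFO"),
]
alerts = list(detect_failure_bursts(sample_events))
```

Because each event is examined as it arrives, the alert is raised at the moment the third error lands, rather than after a later batch job has run.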
Big Data Benefits
Big Data can lead to more confident decision
making, and better decisions can result in greater
operational efficiency, cost reduction and reduced
risk.
Analysis of such data sets can find new
correlations to "spot business trends, prevent
diseases, combat crime and so on".
Some common sources of Big Data are shown
below
What sort of Data are we dealing with
The current estimate is that around 20-30% of available
data is stored in structured format and the rest is
unstructured.
Unstructured refers to data that either does not
have a pre-defined data model or is not organized in a
pre-defined manner.
Big Data deployment and usage:
− Primarily built on open source
− Distributed data - data physically distributed across a
network of commodity hardware, also known as a cluster
− Business Intelligence - structured queries
− Analytics and data mining - for data from multiple sources
and of variable types
− Cloud Computing - access to large pools of computing
power available as needed
− Networked devices
− Data from Sensors - more sensors producing more data
more frequently
− The Internet of Things - machine-generated data read and
used by other machines
− Deploying both relational and NoSQL (not only SQL) type
storage
Big Data Fundamentals: the Hadoop core consists of
− Hadoop Distributed File System (HDFS) on a cluster
− MapReduce - the default engine for processing data in parallel
across the HDFS cluster
− YARN - a general-purpose resource manager for the
resources in the cluster
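To make the map-reduce idea concrete, here is a minimal single-process sketch of the classic word count in plain Python. It does not use the Hadoop API; the function names are illustrative only, and on a real cluster the map and reduce steps would run in parallel on the nodes that hold the data.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["big data", "Big Data tools"])
```

The point of the split is that each map call needs only its own line and each reduce call needs only the values for its own key, which is what lets the work be spread across many machines.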
Data Warehouses
− Hive - the most versatile and capable of the many SQL or
SQL-like ways of accessing data on Hadoop
− Pig - similar in purpose to Hive, but uses a procedural scripting language (Pig Latin)
NoSQL Databases - In general, a NoSQL database relaxes the
rigid requirements of an RDBMS (meaning ACID properties), so
one can implement new usage models that require lower latency
and higher levels of scalability. NoSQL databases come in four
major types:
Big Data Databases
Key/Value Database - the simplest type of database; it
can store any kind of digital object. Each object is assigned a
key and is retrieved using the same key. However, analytic
capabilities are limited. Examples of key/value databases
include Redis, Riak and Oracle NoSQL Database.
− There is no schema for the value returned by the key;
it is a heap of data
− You can query only by key, not by the rest of the
content
− All queries return the value as a lump of data
− There is no way to return just part of the value
Use Cases:
− Session information in web applications
− The shopping cart for an online buyer
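The properties above can be sketched with a toy in-memory store. This is an illustration in plain Python, not the API of Redis, Riak or any other product; the class name, the key format and the session contents are made up for the example.

```python
import json

class KeyValueStore:
    """Toy key/value store: values are kept as opaque bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key, obj):
        # The store does not interpret the value; it is just a blob.
        self._data[key] = json.dumps(obj).encode("utf-8")

    def get(self, key):
        # The only access path is the key, and the whole value comes back.
        return json.loads(self._data[key].decode("utf-8"))

store = KeyValueStore()
store.put("session:42", {"user": "alice", "cart": ["book", "pen"]})

# To read just the cart we must fetch the entire value and pick
# the field out in the application.
session = store.get("session:42")
cart = session["cart"]
```

This is why session data and shopping carts fit well: the application always knows the key and always wants the whole object.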
NoSQL Database types
Document Stores - These are a type of key/value store
in which documents are stored in recognizable formats
and are accompanied by metadata. Because of the
consistent formats and the metadata, you can perform
search and analysis without first retrieving the
documents.
Examples of document stores include MongoDB,
MarkLogic and Couchbase.
Brief walk-through
The Relational vs. Document-Oriented Data Model
In the relational model, records are normalised across multiple tables,
using primary key and foreign key constraints.
The upside of normalisation is that there should be no duplication of data
in the database.
The downside is that a change in a single record can mean holding locks
on a number of records on different tables as defined by the transaction to
ensure a change does not leave the database in an inconsistent state.
This complex web of interrelationships is what makes it so difficult to
distribute relational data across multiple servers and can lead to
performance challenges both reading and writing data.
Big data is all about scaling out (horizontally). Granted, this may result in
duplication of data. However, using more storage in exchange for better
application performance and the ability to easily distribute workloads
across machines is now a better choice for many applications.
What is a Document Data Model?
Use of the term “document” is a bit confusing. A
document-oriented database really has nothing to do
with “documents” in the classical sense of the word.
It does not mean books, letters or articles. Rather, a
document in this case refers to a data record that is
self-describing as to the data elements it contains.
XML documents, HTML documents and JSON
documents are examples of “documents” in this
context. An example of a record will look like below
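As a hypothetical illustration of such a self-describing record, the sketch below builds a JSON document for an application error. The field names and values are invented for this example and are not taken from the original slide.

```python
import json

# A hypothetical error-log document. Every field carries its own name,
# and related details are nested inside the record instead of being
# split out into separate tables.
error_document = {
    "_id": "err-0001",  # made-up document ID
    "timestamp": "2016-08-17T10:15:30Z",
    "severity": "ERROR",
    "application": {"name": "billing", "host": "server01"},
    "message": "Connection to database timed out",
    "tags": ["database", "timeout"],
}

# Serialise to JSON text and read it back: the record needs no external
# schema or lookup table to be understood.
serialised = json.dumps(error_document, indent=2)
restored = json.loads(serialised)
print(serialised)
```

In a normalised relational design the application details and tags would typically live in their own tables, joined back to the error row by foreign keys.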
As can be seen, the data is de-normalized. Each record contains a
complete set of information related to the error without external
reference. The records are self-contained. This makes it very easy
to move the entire record to another server – all the information
simply comes along with it. There is no concern about having parts
of the record in other tables left behind.
Only the self-contained record (document) needs to be updated
when changes are made. In the NoSQL world the document ID is
the one and only key to a document.
The document ID is roughly equivalent to a primary key in a
relational database. Usually an ID can appear only once in a
database. Different NoSQL solutions have different names for
the containers that hold documents: buckets, collections, tables, etc.
A collection is roughly equivalent to a table, and a document to a row.
Use Cases:
− Enterprise data
− Event logging applications
− Content management systems
− Web analytics and real-time analytics
− E-Commerce applications
Columnar Databases – These provide varying degrees of row
and column structure. Each row in the column family has a unique
key plus as many columns as are needed to hold the relevant information.
Unlike relational columnar databases such as Teradata or SAP Sybase
IQ, there is no requirement for every row to have the same
columns. You can use a columnar database to perform more
advanced queries on big data. Examples of columnar databases
include HBase, Cassandra and Accumulo.
Use Cases:
− Distribution of data across multiple nodes
− Count and categorize data; sorting, counting for a time frame,
example Web Analytics
− Event logging applications
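The idea that each row has a unique key but its own set of columns can be sketched with nested dictionaries. This is a plain-Python illustration, not the HBase or Cassandra API; the row keys, column names and helper functions are assumptions made for the example.

```python
# A toy "column family": row key -> {column name: value}.
# The two rows deliberately hold different columns.
column_family = {
    "user:1001": {"name": "Alice", "email": "alice@example.com"},
    "user:1002": {"name": "Bob", "last_login": "2016-08-17", "country": "UK"},
}

def get_column(row_key, column):
    """Return one column of one row, or None if that row lacks the column."""
    return column_family.get(row_key, {}).get(column)

def rows_with_column(column):
    """Return the keys of all rows that actually store the given column."""
    return sorted(key for key, columns in column_family.items() if column in columns)

alice_country = get_column("user:1001", "country")
rows_with_email = rows_with_column("email")
```

A missing column costs nothing to store, which is the difference from a relational table where every row carries every column, even if only as NULL.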
Graph Databases – These store networks of objects that are linked using relationship
attributes. For example, the objects could be people in a social network who are related as
friends, colleagues or strangers. You can use a graph database to map and quickly analyze
very complex networks. Examples of graph databases include AllegroGraph and Neo4j.
Use Cases:
− Who is associated with whom
− Fraud detection
− Criminal or Terrorist Cell identification
− Profile analysis in Social media
− Routing and dispatch systems
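The "who is associated with whom" question is, at its core, a graph traversal. The sketch below uses a breadth-first search over an in-memory adjacency list in plain Python; it is not the query language of any graph database, and the people and friendships are made up for the example.

```python
from collections import deque

# Hypothetical social network: person -> list of direct friends.
friendships = {
    "Ann": ["Bob", "Cid"],
    "Bob": ["Ann", "Dee"],
    "Cid": ["Ann"],
    "Dee": ["Bob", "Eve"],
    "Eve": ["Dee"],
}

def degrees_of_separation(graph, start, target):
    """Breadth-first search: number of hops from start to target,
    or None if the two are not connected."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        if person == target:
            return distance
        for neighbour in graph.get(person, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None

hops = degrees_of_separation(friendships, "Ann", "Eve")
```

In a relational database the same question would need one self-join per hop, which is why graph stores suit this kind of query.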
Mich’s Journey:
It takes a while to unlearn habits. But do not worry.
You need to adjust the way you think about data and
modelling. By understanding alternatives you will be
able to make more efficient use of your existing
knowledge as well. After all, the tool best suited for
the job will leave you with the least headache. If you
know more tools, you can choose better!
Advice
As Big Data becomes bigger, companies will increasingly rely
on Hadoop and the leading NoSQL databases.
You are not going to see NoSQL databases unseating
RDBMSs for established workloads so much as taking control
of an increasing share of new workloads.
Big Data, in some shape or form, is gradually creeping into
organisations, often in the shape of a NoSQL database.
Whether you are a developer, architect, analyst or Database
Administrator, you are going to come across it and get
involved in it.
The rise of NoSQL databases
As Big Data databases find a space, it is becoming
important to have the tools to deal with these
distributed databases.
Most of these tools can be used against RDBMS as
well.
Fibre optics and gigabit networks have, by and large, resolved
the latency issues.
I will be talking about one star tool in the Big Data
landscape, called Apache Spark.
The Rise of tools
Apache Spark is a fast and general purpose engine for large-
scale data processing.
It is written mostly in Scala, and provides APIs for Scala,
Java, Python and R.
It is fully compatible with the Hadoop Distributed File System
(HDFS) and can run in the cloud.
It extends Hadoop’s core functionality by providing
in-memory cluster computation, among other things.
It has its own standalone cluster manager and can work with
the YARN resource manager as well.
It is heavily used in finance (Value at Risk), health care, fraud
detection and streaming data.
Apache Spark
Spark combines SQL, streaming, and complex analytics.
Spark powers a stack of libraries including SQL and DataFrames,
MLlib for machine learning, GraphX, and Spark Streaming. You can
combine these libraries seamlessly in the same application.
Spark supports a dialect similar to ANSI SQL, including analytic
functions.
You can also write the same code using Scala and functional
programming.
This code tells me what I spent over the past six years using my
debit card (bank transactions downloaded in CSV
format), for each hashtag (Sainsbury, John Lewis, Waitrose etc.)
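The shape of such an analysis can be sketched without a Spark cluster. The example below is not Spark code: it uses Python's built-in sqlite3 module in place of Spark SQL and a functional fold in place of Scala, purely to show the same aggregation written both ways. The CSV layout, column names and amounts are invented for the illustration.

```python
import csv
import io
import sqlite3
from functools import reduce

# Made-up sample of downloaded bank transactions in CSV format.
CSV_DATA = """date,description,amount
2016-01-05,SAINSBURY,25.50
2016-01-09,WAITROSE,40.00
2016-02-11,SAINSBURY,14.50
2016-03-02,JOHN LEWIS,120.00
"""

rows = [
    (record["description"], float(record["amount"]))
    for record in csv.DictReader(io.StringIO(CSV_DATA))
]

# Style 1: SQL. Load the rows into a table and aggregate with GROUP BY.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (retailer TEXT, amount REAL)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
sql_totals = dict(
    conn.execute(
        "SELECT retailer, SUM(amount) FROM transactions GROUP BY retailer"
    )
)
conn.close()

# Style 2: functional. Fold the rows into a dictionary of running totals.
def add_row(totals, row):
    retailer, amount = row
    return {**totals, retailer: totals.get(retailer, 0.0) + amount}

functional_totals = reduce(add_row, rows, {})
```

Both styles produce the same totals per retailer; which one to use is largely a matter of what the team already knows.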
I have used Scala and functional programming in spark-shell to
produce the same results.
Big Data Skills in Demand
SQL programming, Hadoop, Linux,
Scala, Java, Python and R are the most
in-demand skills in positions that mention
big data as a requirement.
In 2015 and 2016 we are seeing an
exponential rise in demand for Spark
with Scala and R skills used in functional
programming and Machine Learning.
Further Information
There is a wealth of information on the Internet
Big Data University
http://bigdatauniversity.com
My articles on LinkedIn
https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2?
Apache Software Foundation
http://www.apache.org
Other available sources
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 

Road Map for Careers in Big Data

  • 1. Architecture Design Series: Road Map for Careers in Big Data. Mich Talebzadeh, London, August 17, 2016.
  • 2. About the Author: Mich Talebzadeh, Big Data and RDBMS Senior Technical Architect. E-mail: mich@Peridale.co.uk https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2? https://talebzadehmich.wordpress.com/ All rights reserved. No part of this publication may be reproduced in any form, or by any means, without the prior written permission of the copyright holder. © 2016 Mich Talebzadeh, Roadmap for Careers in Big Data.
  • 3. The Big Picture
      − Get familiar with Big Data in general: the buzzwords and the emerging new products, which are expanding at an amazing pace
      − Gain an understanding of the Big Data space
      − Relate your existing (often relational) knowledge to Big Data
      − Build upon your existing knowledge and experience
      − Discover where Big Data fits
      − Gain an insight into the Hadoop core
      − Make it easier to move to Big Data if you wish
  • 4. Trends: Back in 2007, I responded to an article, “The Oracle DBMS is a ‘legacy technology’”, on techtarget.com. I stated that the argument is not so much that relational database management systems “should be considered legacy technology” because they are more than a quarter of a century old, but has more to do with the philosophy and the design approach that they were created for. Simply put, today’s systems and users deal much more with non-transactional (read) activity than with transactional (write/update) activity. Does that mean that RDBMSs are legacy systems? I don’t think so. All it means is that there is a limit to how far one can take the relational model.
  • 5. Trends: So what is the solution in a business world that is growing increasingly distributed and has to deal with the increased use of heterogeneous systems and proprietary databases across different levels of the business?
      − We need to deploy tools that fit the purpose; relying on a one-engine philosophy (say, an RDBMS) is no longer viable
      − Move from internal company data to information from multiple sources
      − Deal with both transactional and non-transactional data
      − The RDBMS landscape is changing, and the newer versions are adapting themselves to the new landscape
  • 6. Trends
      − Shift from data at rest to data in motion
      − The world is saving more data than before
      − The ability to collect and analyze such large amounts of data has given rise to Big Data technology
      − Big Data is a trendy term these days, and if you are in I.T. and don't know about it, you are missing the buzz
      − Just as we have tools to collect and analyze transactional data, we need tools to collect and analyze vast quantities of data
      − If time is of the essence in the transactional world, it is equally becoming of the essence when dealing with vast amounts of data
  • 7. Big Data Meaning and Terminology: Big Data is a term for data sets that are so large or complex that traditional data processing applications are inadequate. The challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization and querying. The term often refers to the use of predictive analytics, user behaviour analytics (sentiment analysis), or certain other advanced data analytics methods that extract value from the vast amount of data.
  • 8. Big Data Meaning and Terminology: Big Data is not only about volume. Much of the data is received in real time and is most valuable at the time of arrival. Examples:
      − Detecting share prices as soon as possible
      − A service operator wants to detect failures from logs within seconds
      − A news site may want to train its model to show users content they are interested in
  • 9. Big Data Benefits: Big Data can lead to more confident decision making, and better decisions can result in greater operational efficiency, cost reduction and reduced risk. Analysis of this type of data set can find new correlations to "spot business trends, prevent diseases, combat crime and so on". Some common sources of Big Data are shown below.
  • 10. What sort of Data are we dealing with
  • 11. What sort of Data are we dealing with: The current estimate is that around 20-30% of available data is stored in a structured format and the rest is unstructured. Unstructured refers to data that either does not have a pre-defined data model or is not organized in a pre-defined manner. Big Data deployment and usage:
      − Primarily built on open source
      − Distributed data: data physically distributed across a network using commodity hardware, aka a cluster
      − Business Intelligence: structured queries
  • 12. What sort of Data are we dealing with: Big Data deployment and usage:
      − Analytics and data mining: for data from multiple sources and of variable types
      − Cloud computing: access to large pools of computing power, available as needed
      − Networked devices
      − Data from sensors: more sensors producing more data more frequently
      − The Internet of Things: machine-generated data read and used by other machines
      − Deploying both relational and NoSQL (not only SQL) type storage
  • 13. Big Data Databases. Big Data Fundamentals: the Hadoop core consists of
      − Hadoop Distributed File System (HDFS) on a cluster
      − MapReduce: the default engine to search and retrieve data across the HDFS cluster
      − YARN: a general-purpose resource manager that manages resources in the cluster
    Data Warehouses
      − Hive: the most versatile and capable of the many SQL or SQL-like ways of accessing data on Hadoop
      − Pig: similar to Hive, but based on a procedural-style language
    NoSQL Databases: In general, a NoSQL database relaxes the rigid requirements of an RDBMS (meaning ACID properties), so one can implement new usage models that require lower latency and higher levels of scalability. NoSQL databases come in four major types:
  • 14. NoSQL Database types. Key/Value Database: this is the simplest type of database and can store any kind of digital object. Each object is assigned a key and is retrieved using the same key. However, analytic capabilities are limited. Examples of key/value databases include Redis, Riak and Oracle NoSQL Database.
      − There is no schema for the value returned by the key; it is a heap of data
      − You can query only by key, not by the rest of the content
      − All queries return the value as a lump of data; there is no way to return just part of the value
    Use Cases:
      − Session information in web applications
      − The shopping cart for an online buyer
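As a rough illustration of the key/value access pattern described on this slide, here is a minimal in-memory sketch in Python. The `KeyValueStore` class and the session payload are hypothetical; they are not the API of Redis, Riak or Oracle NoSQL Database, only a way to show that lookups go by key and that the value comes back as one opaque lump.

```python
import json
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory key/value store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # The store never inspects the value; it is an opaque lump of bytes.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Lookup is by key only and always returns the whole value.
        return self._data.get(key)


store = KeyValueStore()

# Use case from the slide: session information for a web application.
session = {"user": "alice", "cart": ["book", "pen"]}
store.put("session:42", json.dumps(session).encode("utf-8"))

# The whole value comes back; the application has to decode it itself
# to reach a single field such as the shopping cart.
raw = store.get("session:42")
cart = json.loads(raw)["cart"] if raw is not None else []
print(cart)
```

Because the store cannot see inside the value, any filtering by user or cart content would have to happen in the application after the full value is fetched.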
  • 15. NoSQL Database types. Document Stores: these are a type of key/value store in which documents are stored in recognizable formats and are accompanied by metadata. Because of the consistent formats and the metadata, you can perform search and analysis without first retrieving the documents. Examples of document stores include MongoDB, MarkLogic and Couchbase.
  • 16. The Relational vs. Document-Oriented Data Model: brief walk-through
  • 17. The Relational vs. Document-Oriented Data Model
  • 18. The Relational vs. Document-Oriented Data Model: In the relational model, records are normalised across multiple tables, using primary key and foreign key constraints. The upside of normalisation is that there should be no duplication of data in the database. The downside is that a change to a single record can mean holding locks on a number of records in different tables, as defined by the transaction, to ensure that the change does not leave the database in an inconsistent state. This complex web of interrelationships is what makes it so difficult to distribute relational data across multiple servers, and it can lead to performance challenges when both reading and writing data. Big Data is all about scaling out (horizontally). Granted, this may result in duplication of data. However, using more storage in exchange for better application performance and the ability to easily distribute workloads across machines is now a better choice for many applications.
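The trade-off described on this slide can be sketched with Python's built-in `sqlite3` module. The `customers` and `orders` tables and their contents are made up for illustration: the normalised form needs a join across two tables to read one customer's orders, while the denormalised record carries everything in one place.

```python
import sqlite3

# Normalised form: two tables linked by a primary key / foreign key pair.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " item TEXT NOT NULL)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "book"), (11, 1, "pen")],
)

# Reading one customer's orders needs a join across both tables.
rows = conn.execute(
    "SELECT c.name, o.item FROM customers c"
    " JOIN orders o ON o.customer_id = c.id"
    " WHERE c.id = ? ORDER BY o.id",
    (1,),
).fetchall()

# Denormalised form: everything about the customer travels in one record,
# at the cost of repeating the name wherever the customer appears.
record = {"customer": "Alice", "orders": [item for _, item in rows]}
print(rows)
print(record)
conn.close()
```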
  • 19. The Relational vs. Document-Oriented Data Model: What is a Document Data Model? Use of the term “document” is a bit confusing. A document-oriented database really has nothing to do with “documents” in the classical sense of the word. It does not mean books, letters or articles. Rather, a document in this case refers to a data record that is self-describing as to the data elements it contains. XML documents, HTML documents and JSON documents are examples of “documents” in this context. An example record is shown on the next slide.
  • 20. The Relational vs. Document-Oriented Data Model: As can be seen, the data is de-normalized. Each record contains a complete set of information related to the error without external reference. The records are self-contained. This makes it very easy to move the entire record to another server: all the information simply comes along with it. There is no concern about having parts of the record in other tables left behind.
  • 21. The Relational vs. Document-Oriented Data Model: Only the self-contained record (document) needs to be updated when changes are made. In the NoSQL world the document ID is the one and only key to a document. The document ID is roughly equivalent to a primary key in a relational database, and usually an ID can appear only once in a database. Different NoSQL solutions have different names for the container that holds documents: buckets, collections, tables, etc. A Collection is equivalent to a Table and a Document to a Row. Use Cases:
      − Enterprise data
      − Event logging applications
      − Content management systems
      − Web analytics and real-time analytics
      − E-Commerce applications
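To make the document model concrete, the sketch below treats a collection as a Python dictionary keyed by document ID. The error record, its field names and the helper functions are hypothetical, since the example record on the original slide was an image; the point is only that a self-contained document can be moved or updated without touching any other record.

```python
import json

# A collection maps a document ID to a self-describing record.
# Field names below are made up for illustration.
errors = {
    "err-001": {
        "severity": "high",
        "message": "disk full",
        "host": {"name": "db01", "rack": "r7"},
        "tags": ["storage", "prod"],
    }
}


def move_document(doc_id, source, target):
    """Move a whole document between collections (for example, servers)."""
    # Nothing else has to be looked up: the record carries all its data.
    target[doc_id] = source.pop(doc_id)


def update_field(collection, doc_id, field, value):
    """Update one field; only this single document is touched."""
    collection[doc_id][field] = value


other_server = {}
move_document("err-001", errors, other_server)
update_field(other_server, "err-001", "severity", "resolved")
print(json.dumps(other_server["err-001"], sort_keys=True))
```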
  • 22. NoSQL Database types. Columnar Databases: these provide varying degrees of row and column structure. Each row in the column family has a unique key plus as many columns as are needed to hold the relevant information. Unlike relational columnar databases such as Teradata or SAP Sybase IQ, there is no requirement for every row to have the same columns. You can use a columnar database to perform more advanced queries on big data. Examples of columnar databases include HBase, Cassandra and Accumulo. Use Cases:
      − Distribution of data across multiple nodes
      − Counting and categorizing data, such as sorting and counting for a time frame (for example, web analytics)
      − Event logging applications
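The row-key-plus-flexible-columns idea can be sketched in plain Python as a dictionary of dictionaries. This is only a model of the data layout, not the API of HBase, Cassandra or Accumulo; the row keys, column names and the `count_by_column` helper are assumptions made for the example.

```python
from collections import Counter

# Column family modelled as {row_key: {column_name: value}}.
# Rows share a family but need not share the same columns.
page_views = {
    "2016-08-17T10:00:01#u1": {"url": "/home", "browser": "Firefox"},
    "2016-08-17T10:00:05#u2": {"url": "/jobs", "referrer": "search"},
    "2016-08-17T11:30:00#u1": {"url": "/home"},
}


def count_by_column(rows, column, key_prefix=""):
    """Count values of one column over rows whose key starts with key_prefix."""
    counts = Counter()
    for key, columns in rows.items():
        if key.startswith(key_prefix) and column in columns:
            counts[columns[column]] += 1
    return counts


# Web analytics use case: hits per URL during the 10:00 hour.
hits = count_by_column(page_views, "url", key_prefix="2016-08-17T10")
print(dict(hits))
```

Putting the timestamp at the front of the row key is one possible design that makes "count for a time frame" a simple prefix scan; real systems offer their own key design guidance.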
  • 23. NoSQL Database types. Graph Databases: these store networks of objects that are linked using relationship attributes. For example, the objects could be people in a social network who are related as friends, colleagues or strangers. You can use a graph database to map and quickly analyze very complex networks. Examples of graph databases include AllegroGraph and Neo4j. Use Cases:
      − Who is associated with whom
      − Fraud detection
      − Criminal or terrorist cell identification
      − Profile analysis in social media
      − Routing and dispatch systems
    (Public domain image hosted by Wikipedia.)
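The "who is associated with whom" use case can be sketched with an adjacency list and a breadth-first search in plain Python. The people, relationships and the `associates` function are invented for the example and do not reflect the query language of any particular graph database.

```python
from collections import deque

# Edges carry a relationship attribute, as in the social-network example.
edges = [
    ("Ann", "Bob", "friend"),
    ("Bob", "Cat", "colleague"),
    ("Cat", "Dan", "friend"),
    ("Eve", "Fay", "friend"),
]

# Build an undirected adjacency list: person -> [(neighbour, relationship)].
graph = {}
for a, b, rel in edges:
    graph.setdefault(a, []).append((b, rel))
    graph.setdefault(b, []).append((a, rel))


def associates(start, max_hops):
    """Return everyone reachable from start within max_hops, with hop counts."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if seen[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour, _rel in graph.get(person, []):
            if neighbour not in seen:
                seen[neighbour] = seen[person] + 1
                queue.append(neighbour)
    del seen[start]
    return seen


print(associates("Ann", 2))
```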
  • 24. Advice. Mich’s Journey: It takes a while to unlearn habits. But do not worry. You need to adjust the way you think about data and modelling. By understanding alternatives you will be able to make more efficient use of your existing knowledge as well. After all, the tool best suited for the job will leave you with the least headache. If you know more tools, you can choose better!
  • 25. The rise of NoSQL databases: As Big Data becomes bigger, companies will increasingly rely on Hadoop and the leading NoSQL databases. You are not going to see NoSQL databases unseating RDBMSs for established workloads so much as taking control of an increasing share of new workloads. Big Data in some form and shape, often a NoSQL one, is gradually creeping into organisations. Whether you are a developer, architect, analyst or Database Administrator, you are going to come across it and get involved in it.
  • 26. The Rise of tools: As Big Data databases find a space, it is becoming important to have the tools to deal with these distributed databases. Most of these tools can be used against RDBMSs as well. Fibre optics and Gigabit networks have by and large resolved the latency issues. I will be talking about one star tool in the Big Data landscape called Apache Spark.
  • 27. Apache Spark: Apache Spark is a fast and general-purpose engine for large-scale data processing. It is written mostly in Scala, and provides APIs for Scala, Java, Python and R. It is fully compatible with the Hadoop Distributed File System (HDFS) and can run in the cloud. It extends Hadoop’s core functionality by providing in-memory cluster computation, among other things. It has its own standalone cluster manager and can work with the YARN resource manager as well. It is heavily used in finance (VAR), health care, fraud detection and streaming data.
  • 28. Apache Spark: Spark combines SQL, streaming, and complex analytics. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.
  • 29. Apache Spark: Spark supports a dialect similar to ANSI SQL, including analytic functions. You can also write the same code using Scala and functional programming. The code on this slide tells me what I spent over the past six years through my debit card at my bank (bank transactions downloaded in CSV format), for each hashtag (Sainsbury, John Lewis, Waitrose etc.).
  • 30. Apache Spark: I have used Scala and functional programming in Spark-shell to produce the same results below:
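The code shown on these two slides is not part of the transcript, so the following is only a plain-Python stand-in for the kind of aggregation described: total debit-card spend per merchant tag, computed from CSV transactions in a functional style. It does not use Spark, and the column names, merchant list and sample rows are all assumptions made for illustration.

```python
import csv
import io
from functools import reduce

# Made-up sample in place of the downloaded bank statement.
SAMPLE_CSV = """date,description,debit
2016-01-05,SAINSBURY LONDON,25.50
2016-01-09,JOHN LEWIS OXFORD ST,120.00
2016-02-11,SAINSBURY LONDON,14.50
2016-03-02,WAITROSE CANARY WHARF,30.00
"""

# Hypothetical merchant tags to look for in each description.
MERCHANTS = ["SAINSBURY", "JOHN LEWIS", "WAITROSE"]


def tag(description):
    """Return the first known merchant name found in the description."""
    return next((m for m in MERCHANTS if m in description), "OTHER")


def add_row(totals, row):
    """Fold one transaction into the running total for its merchant."""
    key = tag(row["description"])
    return {**totals, key: totals.get(key, 0.0) + float(row["debit"])}


rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
spend = reduce(add_row, rows, {})

# Print the merchants with the highest spend first.
for merchant, total in sorted(spend.items(), key=lambda kv: -kv[1]):
    print(f"{merchant}: {total:.2f}")
```

The fold over rows mirrors the group-and-sum pattern that a SQL `GROUP BY` or a functional reduce-by-key would express in Spark, but on a single machine and a tiny input.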
  • 31. Big Data Skills in Demand: SQL programming, Hadoop, Linux, Scala, Java, Python and R are the most in-demand skills in positions that mention Big Data as a requirement. In 2015 and 2016 we are seeing an exponential rise in demand for Spark with Scala and R skills, used in functional programming and machine learning.
  • 34. Further Information: Tons of information on the Internet
      − Big Data University: http://bigdatauniversity.com
      − My articles on LinkedIn: https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2?
      − Apache Software Foundation: http://www.apache.org
      − Other available sources