Big Data using NoSQL Technologies

Amit Kr. Singh
Senior Developer, Ericsson


December 14, 2012
My Background

    Part of Java and Open Source Practice Area.

    Driving technology initiatives in LockBox project.

    Part of System-X development team.

    Contributing in JOSP Competence Development & Training.
Big Data
Ericsson defines:
“People, devices and things are constantly generating massive volumes of data.
At work people create data, as do children at home, students at school, people
and things on the move, as well as objects that are stationary. Devices and
sensors attached to millions of things take measurements from their
surroundings, providing up-to-date readings over the entire globe – data to be
stored for later use by countless different applications.”
Big Data
IBM defines:
“Every day, we create 2.5 quintillion bytes of data — so much that 90% of the
data in the world today has been created in the last two years alone. This data
comes from everywhere: sensors used to gather climate information, posts to
social media sites, digital pictures and videos, purchase transaction records, and
cell phone GPS signals to name a few. This data is big data.”
Big Data
Wikipedia defines:
Big data is a collection of data sets so large and complex that it becomes difficult
to process using on-hand database management tools. The challenges include
capture, curation, storage, search, sharing, analysis and visualization.
Big Data
Why so many definitions? I am really confused.
Big Data
In simple terms, big data is a set of technology advances that have made
capturing and analyzing data at high scale and speed vastly more efficient.
Twitter
Facebook
Six insights from Facebook's former
Head of Big Data
    Analytics on 900M users
    25 PB of compressed data (125 PB uncompressed)

    New technologies have shifted the conversation from “what data to store” to
    “what can we do with more data”.

    Simplify data analytics for end users.

    More users mean data analytics systems have to be more robust.

    Social networking works for Big Data.

    No single infrastructure can solve all Big Data problems.

    Building software is hard, but running a service is even harder.
Big Data in Retail
Big Data in IT
Big Data in Customer Services
Big Data Dimensions
The Three Vs of Big Data
Volume – big data comes in one size, XXL, and conventional storage systems
cannot handle these volumes.


Velocity – data needs to be used quickly to maximize business benefit before
the value of the information is lost.


Variety – data can be structured, unstructured, semi-structured or a mix of all
three. It comes in many forms including text, audio, video, click streams and log
files.
Big Data Technologies
Big-data technologies are usually engineered from the bottom up with two things
in mind: scale and availability. Most solutions are distributed in nature and
introduce new programming models for working with large volumes of data.


Technologies such as Not only SQL (NoSQL), characterized by their non-adherence
to the RDBMS model, are used in a wide variety of industry applications.
These technologies have the flexibility to handle Big Data.
Scalability
Scalability refers to the ability of an application or product to grow as
demand warrants. The base concept is consistent: the ability of a business or
technology to accept increased volume without degrading the service it provides.


    Scale horizontally (scale out)

    Scale vertically (scale up)
Scalability
Scale vertically (scale up)
Extra capacity can be obtained by adding more hardware to a specific computer
or by moving applications to larger computers – a process known as vertical
scaling. One limitation of this approach is the risk of outgrowing the capacity of
the largest computer; this will eventually affect cost. Vendor lock-in is a potential
risk, and vertically scaled solutions can become prohibitively expensive.
Scalability
Scale horizontally (scale out)
Adding computers in parallel can also increase capacity. This approach is known
as horizontal scaling, and Big Data technologies tend to favor it because a
cluster can keep growing by adding nodes. Systems that are built in this way are
more flexible, and because commodity computers can be operated together in
parallel, the risk associated with single-vendor solutions is reduced. Horizontal
scaling is also a natural fit for cloud deployments.
Availability
Availability is a guarantee that every request receives a response
about whether it was successful or failed.

Users want their systems (Facebook, Twitter, telecom applications, etc.) to be
ready to serve them at all times. If a user cannot access the system, it is said
to be unavailable. Generally, the term downtime is used to refer to periods when
a system is unavailable.
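
The relationship between downtime and availability can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes availability is measured as uptime divided by total observed time, and the function names are made up for this example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of the observed time during which the system could serve users."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


def allowed_downtime_hours(target: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime budget for a period, given a target availability such as 0.999."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be between 0 and 1")
    return (1.0 - target) * period_hours


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        hours = allowed_downtime_hours(target)
        print(f"{target:.2%} availability allows {hours:.2f} hours of downtime per year")
```

Under these assumptions, a 99.9% target leaves a budget of about 8.76 hours of downtime per year.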
NoSQL
What NoSQL databases can do:

    Serve as an online processing database, so that it becomes the primary
    datasource/operational datastore for online applications.

    Use data stored in primary source systems for real-time, batch analytics, and
    enterprise search operations.

    Handle “big data” use cases that involve data velocity, variety, volume, and
    complexity.

    Excel at distributed database and multi-data center operations.

    Offer a flexible schema design that can be changed without downtime or
    service disruption.

    Accommodate structured, semi-structured, and non-structured data.

    Easily operate in the cloud and exploit the benefits of cloud computing.
Is NoSQL replacing the RDBMS?
The answer is both yes and no: the choice between the two depends on the
use case.


Many NoSQL databases relax or omit the ACID guarantees that an RDBMS provides.
Applications that depend on transaction support (banking, airlines, etc.) will
continue to work with RDBMS, while social media applications, which mostly deal
with unstructured data, will look at alternative NoSQL solutions. A hybrid
architecture may also prove beneficial, leveraging the strengths of both RDBMS
and NoSQL.
Is NoSQL replacing the RDBMS?
However, many enterprises are choosing to leave some legacy RDBMS systems
in place, while directing new development towards NoSQL databases. This is
especially the case when the applications in question demand high write
throughput, need flexible schema designs, process large volumes of data, and
are distributed in nature.


Technology aside, another reason many new development and/or migration
efforts are being directed towards NoSQL databases is the high cost of legacy
RDBMS software compared with NoSQL software. In general, NoSQL software costs
a fraction of what vendors such as IBM and Oracle charge for their databases.
RDBMS & Big Data
Tactics to extend the useful scope of RDBMS technology

    Sharding

    Denormalizing

    Distributed caching
Sharding
If the data for an application will not fit on a single server or, more likely, if a
single server is incapable of maintaining the I/O throughput required to serve
many users simultaneously, then a tactic known as sharding is frequently
employed.


Database sharding is the process of splitting up a database across multiple
machines to improve the scalability of an application.
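
A minimal sketch of how an application might route a record to a shard, assuming simple hash-based sharding. The server names and the `shard_for` helper are hypothetical, and real deployments often use range-based or directory-based schemes instead.

```python
import hashlib
from typing import Sequence

# Hypothetical shard servers; in practice these would be connection settings.
SHARDS = ("db-0.example.internal", "db-1.example.internal", "db-2.example.internal")


def shard_for(key: str, shards: Sequence[str] = SHARDS) -> str:
    """Pick the shard that owns `key` by hashing it and taking a modulo."""
    if not shards:
        raise ValueError("at least one shard is required")
    # A stable hash is needed: Python's built-in hash() of str varies between runs.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for user_id in ("user:1", "user:2", "user:3"):
        print(user_id, "->", shard_for(user_id))
```

The same key always maps to the same server, so reads and writes for one user land on one machine. The modulo step is also the weak point: changing the number of shards changes the mapping for most keys.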
Sharding
This does work to spread the load but there are some undesirable
consequences to the approach.


    When you fill a shard, you have to change the sharding strategy in the
    application itself. For example, placing user profile information on one database
    server, friend lists on another and a third for user generated content like photos
    and blogs. The main problem with this approach is that if the site experiences
    additional growth then it may be necessary to further shard a feature specific
    database across multiple servers.


    You lose some of the most important benefits of the relational model. You can’t
    do “joins” across shards. In addition, you can’t do cross-node locking when
    making updates.
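
To illustrate the loss of cross-shard joins, the sketch below performs the join in application code instead. The two "shards" are plain in-memory structures standing in for separate database servers, and all names and data are invented for the example.

```python
# Hypothetical shard 1: user profiles, keyed by user id.
profile_shard = {
    1: {"name": "Asha"},
    2: {"name": "Ben"},
}

# Hypothetical shard 2: user-generated content, stored on another server.
photo_shard = [
    {"user_id": 1, "photo": "beach.jpg"},
    {"user_id": 1, "photo": "hills.jpg"},
    {"user_id": 2, "photo": "city.jpg"},
    {"user_id": 9, "photo": "orphan.jpg"},  # no matching profile
]


def photos_with_owner_names(profiles, photos):
    """Application-side equivalent of `photos JOIN profiles ON user_id`."""
    joined = []
    for row in photos:
        owner = profiles.get(row["user_id"])
        if owner is None:
            # No database enforces foreign keys across shards,
            # so the application has to cope with orphan rows itself.
            continue
        joined.append({"name": owner["name"], "photo": row["photo"]})
    return joined


if __name__ == "__main__":
    for row in photos_with_owner_names(profile_shard, photo_shard):
        print(row)
```

In a real system each lookup would be a network round trip to a different server, and nothing guarantees that the two shards are read at a consistent point in time.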
Denormalizing
Denormalization is the process of attempting to optimize the read performance of
a database by adding redundant data or by grouping data. In some cases,
denormalization is a means of addressing performance or improving
scalability in relational database software.


    Most of the time, denormalization is application-specific and needs to be
re-evaluated if the application changes.

    Denormalization can increase the size of tables.
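
A small sketch of the trade-off, using Python's built-in `sqlite3` module with an in-memory database. The table and column names are invented for illustration; the point is that the denormalized table answers the read without a join, at the price of duplicated data and extra work on writes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, title TEXT NOT NULL);
    -- Denormalized copy: the author's name is repeated in every post row.
    CREATE TABLE posts_denorm (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        author_name TEXT NOT NULL,
        title TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO posts VALUES (10, 1, 'Hello'), (11, 2, 'Big Data'), (12, 1, 'NoSQL');
    INSERT INTO posts_denorm
        SELECT p.id, p.user_id, u.name, p.title
        FROM posts p JOIN users u ON u.id = p.user_id;
    """
)

# Normalized read: needs a join.
normalized = conn.execute(
    "SELECT p.title, u.name FROM posts p JOIN users u ON u.id = p.user_id ORDER BY p.id"
).fetchall()

# Denormalized read: a single-table scan returns the same rows.
denormalized = conn.execute(
    "SELECT title, author_name FROM posts_denorm ORDER BY id"
).fetchall()


def rename_user(connection, user_id, new_name):
    """A rename must now update every redundant copy of the name as well."""
    with connection:  # commit both updates together, or roll both back
        connection.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
        connection.execute(
            "UPDATE posts_denorm SET author_name = ? WHERE user_id = ?", (new_name, user_id)
        )
```

The `rename_user` helper shows why denormalized designs are application-specific: every write path has to know where the redundant copies live.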
Distributed Caching
Another tactic used to extend the useful scope of RDBMS technology is to
employ distributed caching technologies, such as Memcached. Today,
Memcached is a key ingredient in the data architecture behind 18 of the 20
largest (by user count) Web applications, including Google, Wikipedia, Twitter,
YouTube and Facebook.


Memcached “sits in front” of an RDBMS system, caching recently accessed data
in memory and storing that data across any number of servers or virtual
machines. When an application needs access to data, rather than going directly
to the RDBMS, it first checks Memcached to see if the data is available there; if it
is not, then the database is read by the application and stored in Memcached for
quick access next time it is needed.
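
The read path described above is often called the cache-aside pattern. The sketch below is a minimal illustration, not Memcached client code: a plain dictionary stands in for the cache, and `FAKE_DB`, `load_user` and `CacheAside` are hypothetical names.

```python
# Stand-in for the RDBMS; a real application would run a SQL query instead.
FAKE_DB = {"user:1": {"name": "Asha"}, "user:2": {"name": "Ben"}}
db_reads = []  # records every key that actually hit the "database"


def load_user(key):
    db_reads.append(key)
    return FAKE_DB[key]


class CacheAside:
    """Check the cache first; on a miss, read the database and populate the cache."""

    def __init__(self, load_from_db):
        self._cache = {}  # stand-in for a Memcached cluster
        self._load_from_db = load_from_db
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self._load_from_db(key)
        self._cache[key] = value  # store for quick access next time
        return value

    def invalidate(self, key):
        """Call after a database write so stale data is not served."""
        self._cache.pop(key, None)


if __name__ == "__main__":
    cache = CacheAside(load_user)
    cache.get("user:1")  # miss: goes to the database
    cache.get("user:1")  # hit: served from memory
    print("hits:", cache.hits, "misses:", cache.misses, "db reads:", db_reads)
```

Note that the application, not the database, is responsible for keeping the cache consistent, which is why the `invalidate` step exists.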
Distributed Caching
Memcached and similar distributed caching technologies used for this purpose
are not magic and can even create problems of their own:


  Memcached was designed to accelerate the reading of data by storing it in
main memory, not to store data permanently. If a server is powered off or
otherwise fails, or if memory fills up, data is lost.


  It is another tier to manage. Inserting another tier of infrastructure into
the architecture to address some (but not all) of the failings of RDBMS
technology in the modern interactive software use case creates its own set of
problems: more capital costs, more operational expense, more points of failure
and more complexity.
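
The "memory fills up, data is lost" point can be shown with a tiny bounded cache. This is a sketch under the assumption of least-recently-used eviction; `BoundedCache` is a hypothetical class for illustration, not part of any caching product.

```python
from collections import OrderedDict


class BoundedCache:
    """In-memory cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)  # mark as most recently used
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # silently drop the oldest entry

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # a read also counts as a use
        return self._items[key]


if __name__ == "__main__":
    cache = BoundedCache(capacity=2)
    cache.set("a", 1)
    cache.set("b", 2)
    cache.get("a")     # "a" is now the most recently used entry
    cache.set("c", 3)  # cache is full, so "b" is evicted
    print(cache.get("b"))  # None: the data is gone unless the database still has it
```

Because entries can vanish at any time, the cache can never be the system of record; the database behind it has to remain the source of truth.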
NoSQL Technologies
Sharding, denormalizing, distributed caching and other tactics are all attempts
to paper over one simple fact: RDBMS technology is a forced fit for modern
interactive software systems. Because vendors of RDBMS technology have little
incentive to disrupt a technology generating billions of dollars for them
annually, a few application developers at Google (Bigtable) and Amazon (Dynamo)
took the initiative and developed NoSQL database technologies.
NoSQL Characteristics:

  No schema required. Data can be inserted in a NoSQL database without first
defining a rigid database schema. As a corollary, the format of the data being
inserted can be changed at any time, without application disruption. This
provides immense application flexibility, which ultimately delivers substantial
business flexibility.


  Auto-sharding. A NoSQL database automatically spreads data across servers,
without requiring applications to participate. Servers can be added or removed
from the data layer without application downtime. Most NoSQL databases also
support data replication, storing multiple copies of data across the cluster, and
even across data centers, to ensure high availability and support disaster
recovery.
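
One technique that makes it possible to add or remove servers without reshuffling all the data is consistent hashing, which the Dynamo design popularized. The sketch below is a simplified illustration, assuming a hash ring with virtual nodes; the `HashRing` class is hypothetical, replication is left out, and individual NoSQL products differ in how they actually partition data.

```python
import bisect
import hashlib


def _position(value: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent-hash ring: each key belongs to the next node clockwise."""

    def __init__(self, nodes=(), vnodes=64):
        self._vnodes = vnodes  # virtual nodes per server smooth out the distribution
        self._ring = []        # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_position(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("the ring has no nodes")
        # First ring entry at or after the key's position, wrapping around at the end.
        index = bisect.bisect_left(self._ring, (_position(key),))
        return self._ring[index % len(self._ring)][1]


if __name__ == "__main__":
    ring = HashRing(["server-a", "server-b", "server-c"])
    keys = [f"user:{i}" for i in range(1000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("server-d")
    moved = sum(1 for k in keys if ring.node_for(k) != before[k])
    print(f"{moved} of {len(keys)} keys moved after adding a server")
```

When a server is added, only the keys that now fall to the new server move; every other key stays where it was. With plain modulo hashing, by contrast, most keys would change servers.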
NoSQL Characteristics:

 Distributed query support. “Sharding” an RDBMS can reduce, or eliminate in
certain cases, the ability to perform complex data queries. NoSQL database
systems retain their full query expressive power even when distributed across
hundreds or thousands of servers.


 Integrated caching. To reduce latency and increase sustained data throughput,
advanced NoSQL database technologies transparently cache data in system
memory. This behavior is transparent to the application developer and the
operations team, in contrast to RDBMS technology where a caching tier is
usually a separate infrastructure tier that must be developed to, deployed on
separate servers, and explicitly managed by the ops team.
Research activities in Big Data

 The White House has recently announced a national "Big Data Initiative" for
improving the ability to extract knowledge and insights from large and complex
collections of digital data. This initiative will help the US government in
scientific discovery, environmental and biomedical research, education, and
national security.


  NASA is working on a number of innovative approaches to advancing Big Data,
including the Lunar Mapping and Modeling Activity.
Reference

  www.couchbase.com

  www.datastax.com

  www.forbes.com/sites/davefeinleib/2012/07/09/the-3-is-of-big-data/

  www.slideshare.net/bigdatalandscape/big-data-trends

  www.kunocreative.com/blog/bid/76907/Big-Data-Made-Simple-What-Marketers-Need-to-Know
Big Data using NoSQL Technologies

Mais conteúdo relacionado

Mais procurados

Big Data & Data Science
Big Data & Data ScienceBig Data & Data Science
Big Data & Data ScienceBrijeshGoyani
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project LifecycleJason Geng
 
Building A Bi Strategy
Building A Bi StrategyBuilding A Bi Strategy
Building A Bi Strategylarryzagata
 
Snowflake + Power BI: Cloud Analytics for Everyone
Snowflake + Power BI: Cloud Analytics for EveryoneSnowflake + Power BI: Cloud Analytics for Everyone
Snowflake + Power BI: Cloud Analytics for EveryoneAngel Abundez
 
Big Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewBig Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewSivashankar Ganapathy
 
Introduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & HadoopIntroduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & HadoopSavvycom Savvycom
 
Overview of Artificial Intelligence in Cybersecurity
Overview of Artificial Intelligence in CybersecurityOverview of Artificial Intelligence in Cybersecurity
Overview of Artificial Intelligence in CybersecurityOlivier Busolini
 
MICROSOFT POWER BI PPT.pptx
MICROSOFT POWER BI PPT.pptxMICROSOFT POWER BI PPT.pptx
MICROSOFT POWER BI PPT.pptxridazulquarnain
 
Data Science Powerpoint Presentation Slides
Data Science Powerpoint Presentation SlidesData Science Powerpoint Presentation Slides
Data Science Powerpoint Presentation SlidesSlideTeam
 
Column oriented database
Column oriented databaseColumn oriented database
Column oriented databaseKanike Krishna
 
Introduction to power BI
Introduction to power BIIntroduction to power BI
Introduction to power BIRamar Bose
 
Power bi-dashboard-in-a-day-diad-mumbai-2019
Power bi-dashboard-in-a-day-diad-mumbai-2019Power bi-dashboard-in-a-day-diad-mumbai-2019
Power bi-dashboard-in-a-day-diad-mumbai-2019Priyanka Khanadali
 
Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Yaman Hajja, Ph.D.
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analyticsSSaudia
 
Big Data Analytics Powerpoint Presentation Slide
Big Data Analytics Powerpoint Presentation SlideBig Data Analytics Powerpoint Presentation Slide
Big Data Analytics Powerpoint Presentation SlideSlideTeam
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentalsrjain51
 

Mais procurados (20)

Big Data & Data Science
Big Data & Data ScienceBig Data & Data Science
Big Data & Data Science
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project Lifecycle
 
Building A Bi Strategy
Building A Bi StrategyBuilding A Bi Strategy
Building A Bi Strategy
 
Snowflake + Power BI: Cloud Analytics for Everyone
Snowflake + Power BI: Cloud Analytics for EveryoneSnowflake + Power BI: Cloud Analytics for Everyone
Snowflake + Power BI: Cloud Analytics for Everyone
 
Big Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewBig Data - Applications and Technologies Overview
Big Data - Applications and Technologies Overview
 
Big data
Big dataBig data
Big data
 
Introduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & HadoopIntroduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & Hadoop
 
Overview of Artificial Intelligence in Cybersecurity
Overview of Artificial Intelligence in CybersecurityOverview of Artificial Intelligence in Cybersecurity
Overview of Artificial Intelligence in Cybersecurity
 
MICROSOFT POWER BI PPT.pptx
MICROSOFT POWER BI PPT.pptxMICROSOFT POWER BI PPT.pptx
MICROSOFT POWER BI PPT.pptx
 
Data Science Powerpoint Presentation Slides
Data Science Powerpoint Presentation SlidesData Science Powerpoint Presentation Slides
Data Science Powerpoint Presentation Slides
 
Big data
Big dataBig data
Big data
 
Power bi
Power biPower bi
Power bi
 
Column oriented database
Column oriented databaseColumn oriented database
Column oriented database
 
Introduction to power BI
Introduction to power BIIntroduction to power BI
Introduction to power BI
 
Power bi-dashboard-in-a-day-diad-mumbai-2019
Power bi-dashboard-in-a-day-diad-mumbai-2019Power bi-dashboard-in-a-day-diad-mumbai-2019
Power bi-dashboard-in-a-day-diad-mumbai-2019
 
Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analytics
 
Big Data Analytics Powerpoint Presentation Slide
Big Data Analytics Powerpoint Presentation SlideBig Data Analytics Powerpoint Presentation Slide
Big Data Analytics Powerpoint Presentation Slide
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentals
 

Semelhante a Big Data using NoSQL Technologies

Big data with hadoop
Big data with hadoopBig data with hadoop
Big data with hadoopAnusha sweety
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptalmaraniabwmalk
 
Big data - what, why, where, when and how
Big data - what, why, where, when and howBig data - what, why, where, when and how
Big data - what, why, where, when and howbobosenthil
 
Big Data and Enterprise Data - Oracle -1663869
Big Data and Enterprise Data - Oracle -1663869Big Data and Enterprise Data - Oracle -1663869
Big Data and Enterprise Data - Oracle -1663869Edgar Alejandro Villegas
 
hadoop seminar training report
hadoop seminar  training reporthadoop seminar  training report
hadoop seminar training reportSarvesh Meena
 
Ieee-no sql distributed db and cloud architecture report
Ieee-no sql distributed db and cloud architecture reportIeee-no sql distributed db and cloud architecture report
Ieee-no sql distributed db and cloud architecture reportOutsource Portfolio
 
The Growth Of Data Centers
The Growth Of Data CentersThe Growth Of Data Centers
The Growth Of Data CentersGina Buck
 
GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017Jeremy Maranitch
 
Big data management
Big data managementBig data management
Big data managementzeba khanam
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundationshktripathy
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in detailsMahmoud Yassin
 
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringBig Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringIRJET Journal
 

Semelhante a Big Data using NoSQL Technologies (20)

Big data with hadoop
Big data with hadoopBig data with hadoop
Big data with hadoop
 
Big Data
Big DataBig Data
Big Data
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
Big data - what, why, where, when and how
Big data - what, why, where, when and howBig data - what, why, where, when and how
Big data - what, why, where, when and how
 
Big Data and Enterprise Data - Oracle -1663869
Big Data and Enterprise Data - Oracle -1663869Big Data and Enterprise Data - Oracle -1663869
Big Data and Enterprise Data - Oracle -1663869
 
hadoop seminar training report
hadoop seminar  training reporthadoop seminar  training report
hadoop seminar training report
 
Hadoop
HadoopHadoop
Hadoop
 
Ieee-no sql distributed db and cloud architecture report
Ieee-no sql distributed db and cloud architecture reportIeee-no sql distributed db and cloud architecture report
Ieee-no sql distributed db and cloud architecture report
 
AtomicDBCoreTech_White Papaer
AtomicDBCoreTech_White PapaerAtomicDBCoreTech_White Papaer
AtomicDBCoreTech_White Papaer
 
The Growth Of Data Centers
The Growth Of Data CentersThe Growth Of Data Centers
The Growth Of Data Centers
 
GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017
 
Datos iO Product Overview
Datos iO Product OverviewDatos iO Product Overview
Datos iO Product Overview
 
MongoDB
MongoDBMongoDB
MongoDB
 
Big data management
Big data managementBig data management
Big data management
 
Big Data & Hadoop
Big Data & HadoopBig Data & Hadoop
Big Data & Hadoop
 
Big data analysis concepts and references
Big data analysis concepts and referencesBig data analysis concepts and references
Big data analysis concepts and references
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundations
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in details
 
Big Data przt.pptx
Big Data przt.pptxBig Data przt.pptx
Big Data przt.pptx
 
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringBig Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and Storing
 

Último

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 

Último (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Injustice - Developers Among Us (SciFiDevCon 2024)

Big Data using NoSQL Technologies

  • 1. Big Data using NoSQL Technologies Amit Kr. Singh Senior Developer, Ericsson December 14, 2012
  • 2. My Background  Part of Java and Open Source Practice Area.  Driving technology initiatives in LockBox project.  Part of System-X development team.  Contributing in JOSP Competence Development & Training.
  • 4. Big Data Ericsson defines: “People, devices and things are constantly generating massive volumes of data. At work people create data, as do children at home, students at school, people and things on the move, as well as objects that are stationary. Devices and sensors attached to millions of things take measurements from their surroundings, providing up-to-date readings over the entire globe – data to be stored for later use by countless different applications.”
  • 5. Big Data IBM defines: “Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few. This data is big data.”
  • 6. Big Data Wikipedia defines: Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools. The challenges include capture, curation, storage, search, sharing, analysis and visualization.
  • 7. Big Data Why so many definitions? I am really confused.
  • 8. Big Data In simple words: a set of technology advances that have made capturing and analyzing data at high scale and speed vastly more efficient.
  • 11. Six insights from Facebook's former Head of Big Data Analytics on 900M users 25PB of compressed data (125PB uncompressed)  New technologies have shifted the conversation from “what data to store” to “what can we do with more data”.  Simplify data analytics for end users.  More users means data analytics systems have to be more robust.  Social networking works for Big Data.  No single infrastructure can solve all Big Data problems.  Building software is hard, but running a service is even harder.
  • 12. Big Data in Retail
  • 14. Big Data in Customer Services
  • 16. The Three Vs of Big Data Volume – big data comes in one size, XXL, and conventional storage often cannot handle these volumes. Velocity – data needs to be used quickly to maximize business benefit before the value of the information is lost. Variety – data can be structured, unstructured, semi-structured or a mix of all three. It comes in many forms including text, audio, video, click streams and log files.
  • 17. Big Data Technologies Big-data technologies are usually engineered from the bottom up with two things in mind: scale and availability. Most solutions are distributed in nature and introduce new programming models for working with large volumes of data. Technologies such as Not only SQL (NoSQL), characterized by non-adherence to the RDBMS model, are used in a wide variety of industry applications. These technologies have the flexibility to handle Big Data.
  • 18. Scalability Scalability refers to the ability of an application or product to increase in size as demand warrants. The base concept is consistent – the ability for a business or technology to accept increased volume without impacting the business settings.  Scale horizontally (scale out)  Scale vertically (scale up)
  • 19. Scalability Scale vertically (scale up) Extra capacity can be obtained by adding more hardware to a specific computer or by moving applications to larger computers – a process known as vertical scaling. One limitation of this approach is the risk of outgrowing the capacity of the largest computer; this will eventually affect cost. Vendor lock-in is a potential risk, and vertically scaled solutions can become prohibitively expensive.
  • 20. Scalability Scale horizontally (scale out) Adding computers in parallel can also increase capacity. This approach is known as horizontal scaling, and Big Data technologies tend to favor it because it supports network expansion. Systems that are built in this way are more flexible, and because commodity computers can be operated together in parallel, the risk associated with single vendor solutions is reduced. Horizontal scaling is also a natural fit for the cloud, where machines can be added on demand.
  • 21. Availability Availability is a guarantee that every request receives a response about whether it was successful or failed. Users want their systems (Facebook, Twitter, Telecom app, etc) to be ready to serve them at all times. If a user cannot access the system, it is said to be unavailable. Generally, the term downtime is used to refer to periods when a system is unavailable.
  • 22. NoSQL What NoSQL databases can do:  Serve as an online processing database, so that it becomes the primary datasource/operational datastore for online applications.  Use data stored in primary source systems for real-time, batch analytics, and enterprise search operations.  Handle “big data” use cases that involve data velocity, variety, volume, and complexity.  Excel at distributed database and multi-data center operations.  Offer a flexible schema design that can be changed without downtime or service disruption.  Accommodate structured, semi-structured, and unstructured data.  Easily operate in the cloud and exploit the benefits of cloud computing.
  • 23. Is NoSQL replacing the RDBMS? The answer is both yes and no, because the choice between the two depends on the use case. Most NoSQL databases do not offer the full ACID guarantees of an RDBMS. Applications which depend on transaction support (banking, airlines, etc.) will continue to work with RDBMS, while social media applications which mostly deal with unstructured data will look at alternative NoSQL solutions. A hybrid architecture may also prove beneficial, leveraging the power of both RDBMS and NoSQL.
  • 24. Is NoSQL replacing the RDBMS? However, many enterprises are choosing to leave some legacy RDBMS systems in place, while directing new development towards NoSQL databases. This is especially the case when the applications in question demand high write throughput, need flexible schema designs, process large volumes of data, and are distributed in nature. Technology aside, another reason many new development and/or migration efforts are being directed towards NoSQL databases is the high cost of legacy RDBMS vendors versus NoSQL software. In general, NoSQL software costs a fraction of what vendors such as IBM and Oracle charge for their databases.
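The transaction support mentioned for banking-style applications can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for an RDBMS; the `accounts` table, the account names, and the `transfer` helper are assumptions made up for this example, not part of the original slides.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; undo everything on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

def balances(conn):
    """Return {account name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical sample data for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()
```

A failed transfer leaves both balances untouched, because the debit is rolled back together with the rest of the transaction. That all-or-nothing behavior is what such applications rely on.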
  • 25. RDBMS & Big Data Tactics to extend the useful scope of RDBMS technology  Sharding  Denormalizing  Distributed caching
  • 26. Sharding If the data for an application will not fit on a single server or, more likely, if a single server is incapable of maintaining the I/O throughput required to serve many users simultaneously, then a tactic known as sharding is frequently employed. Database sharding is the process of splitting up a database across multiple machines to improve the scalability of an application.
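As a rough sketch of the idea, the snippet below routes each key to one of several servers by hashing it. The server names, the `shard_for` function, and the in-memory `ShardedStore` class are hypothetical; a real deployment would hold database connections where this example holds dictionaries.

```python
import hashlib

SHARDS = ["db-0", "db-1", "db-2"]  # hypothetical server names

def shard_for(key, shards=SHARDS):
    """Pick a shard deterministically from a hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]

class ShardedStore:
    """Toy key-value store in which each shard is a separate dictionary."""

    def __init__(self, shards=SHARDS):
        self.shards = list(shards)
        self.data = {name: {} for name in self.shards}

    def put(self, key, value):
        self.data[shard_for(key, self.shards)][key] = value

    def get(self, key):
        return self.data[shard_for(key, self.shards)].get(key)
```

The application, not the database, decides where each row lives here, which is the source of the drawbacks that sharding an RDBMS brings.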
  • 27. Sharding This does spread the load, but the approach has some undesirable consequences.  When you fill a shard, you have to change the sharding strategy in the application itself. For example, a site might place user profile information on one database server, friend lists on another, and user-generated content such as photos and blogs on a third. The main problem with this approach is that if the site grows further, it may be necessary to shard a feature-specific database across multiple servers again.  You lose some of the most important benefits of the relational model. You can’t do “joins” across shards. In addition, you can’t do cross-node locking when making updates.
  • 28. Denormalizing Denormalization is the process of attempting to optimize the read performance of a database by adding redundant data or by grouping data. In some cases, denormalization is a means of addressing performance or improving scalability in relational database software.  Most of the time denormalization is application-specific and needs to be re-evaluated if the application changes.  Denormalization can increase the size of tables.
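One way to see why changing the sharding strategy is costly: under simple hash-modulo placement (an assumption for this sketch, not something the slides prescribe), adding a single server changes the home shard of most keys. The helper names below are hypothetical.

```python
import hashlib

def shard_index(key, shard_count):
    """Map a key to a shard number with hash-modulo placement."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

def moved_fraction(keys, old_count, new_count):
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_index(k, old_count) != shard_index(k, new_count)
    )
    return moved / len(keys)

keys = [f"user:{i}" for i in range(10_000)]
fraction = moved_fraction(keys, 3, 4)  # going from 3 shards to 4
```

With a uniform hash, a key stays put only when its hash gives the same remainder modulo 3 and modulo 4, which happens for about a quarter of keys. The remaining three quarters would have to be migrated.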
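The trade-off can be sketched with Python's built-in sqlite3 module. The table and column names are invented for this illustration: `posts_denorm` stores the author's name redundantly so that reads avoid a join, at the cost of a second write whenever a name changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    -- Denormalized copy: author_name duplicates users.name on every post.
    CREATE TABLE posts_denorm (
        id INTEGER PRIMARY KEY, user_id INTEGER, author_name TEXT, title TEXT
    );
    INSERT INTO users VALUES (1, 'asha'), (2, 'lena');
    INSERT INTO posts VALUES (10, 1, 'Big Data'), (11, 2, 'NoSQL');
    INSERT INTO posts_denorm VALUES
        (10, 1, 'asha', 'Big Data'), (11, 2, 'lena', 'NoSQL');
""")

def titles_normalized(conn):
    """Read path on the normalized schema: needs a join."""
    return conn.execute(
        "SELECT u.name, p.title FROM posts p "
        "JOIN users u ON u.id = p.user_id ORDER BY p.id"
    ).fetchall()

def titles_denormalized(conn):
    """Read path on the denormalized table: a single-table scan."""
    return conn.execute(
        "SELECT author_name, title FROM posts_denorm ORDER BY id"
    ).fetchall()

def rename_user(conn, user_id, new_name):
    """Write path: the redundant copy must be updated as well."""
    with conn:
        conn.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
        conn.execute(
            "UPDATE posts_denorm SET author_name = ? WHERE user_id = ?",
            (new_name, user_id),
        )
```

If the application later changes how names are displayed or stored, `rename_user` and the redundant column both have to be revisited, which is the application-specific maintenance burden noted above.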
  • 29. Distributed Caching Another tactic used to extend the useful scope of RDBMS technology is to employ distributed caching technologies, such as Memcached. Today, Memcached is a key ingredient in the data architecture behind 18 of the top 20 largest (by user count) Web applications, including Google, Wikipedia, Twitter, YouTube and Facebook. Memcached “sits in front” of an RDBMS system, caching recently accessed data in memory and storing that data across any number of servers or virtual machines. When an application needs access to data, rather than going directly to the RDBMS, it first checks Memcached to see if the data is available there; if it is not, then the database is read by the application and stored in Memcached for quick access next time it is needed.
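The read path described here (check the cache first, fall back to the database on a miss, then populate the cache) can be sketched as follows. A plain dictionary stands in for Memcached and `SlowDatabase` stands in for the RDBMS. Both classes are hypothetical, and no real Memcached client API is used.

```python
class SlowDatabase:
    """Stand-in for the RDBMS; counts how often it is actually read."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.rows.get(key)

class CacheAside:
    """Read-through helper: try the cache first, then the database."""

    def __init__(self, database):
        self.database = database
        self.cache = {}  # assumption: a dict plays the role of Memcached

    def get(self, key):
        if key in self.cache:              # cache hit: no database access
            return self.cache[key]
        value = self.database.read(key)    # cache miss: read the database
        if value is not None:
            self.cache[key] = value        # store for quick access next time
        return value
```

After the first `get` for a key, later reads of that key are served from memory and the database read counter stops increasing.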
  • 31. Distributed Caching Memcached and similar distributed caching technologies used for this purpose are not magic and can even create problems of their own:  Memcached was designed to accelerate the reading of data by storing it in main memory, but it was not designed to store data permanently. If a server is powered off or otherwise fails, or if memory fills up, data is lost.  It is another tier to manage. Inserting another tier of infrastructure into the architecture to address some (but not all) of the failings of RDBMS technology in the modern interactive software use case can create its own set of problems: more capital costs, more operational expense, more points of failure and more complexity.
  • 32. NoSQL Technologies Sharding, denormalizing, distributed caching and other tactics are all attempts to paper over one simple fact: RDBMS technology is a forced fit for modern interactive software systems. Because vendors of RDBMS technology have little incentive to disrupt a technology generating billions of dollars for them annually, application developers at companies such as Google (Bigtable) and Amazon (Dynamo) took the initiative and developed NoSQL database technologies.
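The data-loss point can be made concrete with a toy bounded cache. The sketch assumes least-recently-used eviction and models a server restart as clearing memory. The `BoundedCache` class is hypothetical and far simpler than Memcached itself.

```python
from collections import OrderedDict

class BoundedCache:
    """Toy in-memory cache with a fixed capacity and LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def set(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)          # mark as most recently used
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used

    def get(self, key):
        if key not in self.entries:
            return None                        # evicted, lost, or never stored
        self.entries.move_to_end(key)
        return self.entries[key]

    def restart(self):
        """Simulate a power-off: everything held in memory is gone."""
        self.entries.clear()
```

Because an entry can disappear at any time, the cache cannot be the system of record. Every value must still be recoverable from the database behind it.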
  • 33. NoSQL Characteristics:  No schema required. Data can be inserted in a NoSQL database without first defining a rigid database schema. As a corollary, the format of the data being inserted can be changed at any time, without application disruption. This provides immense application flexibility, which ultimately delivers substantial business flexibility.  Auto-sharding. A NoSQL database automatically spreads data across servers, without requiring applications to participate. Servers can be added or removed from the data layer without application downtime. Most NoSQL databases also support data replication, storing multiple copies of data across the cluster, and even across data centers, to ensure high availability and support disaster recovery.
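A minimal sketch of the "no schema required" point: documents with different fields live side by side in one collection, and a new field can appear without any migration step. The `DocumentCollection` class and the sample fields are invented for this illustration and do not mirror any particular product's API.

```python
class DocumentCollection:
    """Toy schema-less collection: each document is just a dictionary."""

    def __init__(self):
        self.documents = {}

    def insert(self, doc_id, document):
        # No schema check: any set of fields is accepted.
        self.documents[doc_id] = dict(document)

    def find(self, **criteria):
        """Return ids of documents whose fields match all criteria."""
        return [
            doc_id
            for doc_id, doc in self.documents.items()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]

users = DocumentCollection()
users.insert("u1", {"name": "Asha"})
# A later version of the application starts storing extra fields.
users.insert("u2", {"name": "Lena", "city": "Delhi", "tags": ["admin"]})
```

Older documents simply lack the newer fields, so application code that reads them needs to handle missing values.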
  • 34. NoSQL Characteristics:  Distributed query support. “Sharding” an RDBMS can reduce, or eliminate in certain cases, the ability to perform complex data queries. NoSQL database systems retain their full query expressive power even when distributed across hundreds or thousands of servers.  Integrated caching. To reduce latency and increase sustained data throughput, advanced NoSQL database technologies transparently cache data in system memory. This behavior is transparent to the application developer and the operations team, in contrast to RDBMS technology where a caching tier is usually a separate infrastructure tier that must be developed to, deployed on separate servers, and explicitly managed by the ops team.
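How a query might run across several servers can be sketched as scatter-gather: send the same filter to every shard in parallel, then merge the partial results. This is a general illustration under assumed data shapes (rows as dictionaries with a `score` field), not a description of how any specific NoSQL database implements distributed queries.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def query_shard(rows, min_score):
    """Run the filter on one shard and return matches, highest score first."""
    matches = (row for row in rows if row["score"] >= min_score)
    return sorted(matches, key=lambda row: row["score"], reverse=True)

def scatter_gather(shards, min_score, limit):
    """Query every shard in parallel, then merge the sorted partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(
            pool.map(lambda rows: query_shard(rows, min_score)[:limit], shards)
        )
    merged = heapq.merge(*partials, key=lambda row: row["score"], reverse=True)
    return list(merged)[:limit]

# Hypothetical data spread over two shards.
shards = [
    [{"user": "a", "score": 5}, {"user": "b", "score": 9}],
    [{"user": "c", "score": 7}, {"user": "d", "score": 1}],
]
top = scatter_gather(shards, min_score=2, limit=2)
```

Each shard only needs to return its own top `limit` rows, so the coordinator merges a small amount of data regardless of how large the shards are.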
  • 35. Research activities in Big Data  The White House has recently announced a national "Big Data Initiative" for improving the ability to extract knowledge and insights from large and complex collections of digital data. This initiative will help the US government in scientific discovery, environmental and biomedical research, education, and national security.  NASA is working on a number of innovative approaches to advancing Big Data, including the Lunar Mapping and Modeling Activity.
  • 36. References  www.couchbase.com  www.datastax.com  www.forbes.com/sites/davefeinleib/2012/07/09/the-3-is-of-big-data/  www.slideshare.net/bigdatalandscape/big-data-trends  www.kunocreative.com/blog/bid/76907/Big-Data-Made-Simple-What-Marketers-Need-to-Know