A Beginner’s guide to Big Data, Analytics and Cloud.
I have been hearing the buzzwords in the title for a couple of years now, and I was lucky to get a
couple of opportunities to work with them over the past six months or so through my own consultancy.
That work made me read a few books and articles on these subjects, which got me curious to know more.
I make no claim to being an expert in these areas, but I have gained a fair understanding that I
thought I would share here purely for educational purposes, so that everyone who hears these
buzzwords knows what they are about and can hold a good conversation around them. I must thank the
references below, as most of the writing here is a highly edited summary of the details found in
them, kept at the layman’s level. Consider this a crisp primer for the uninitiated, from both the
technology and the consumption angle.
Business Process Outsourcing (BPO), which made India its right-sourcing capital during the first
decade of this century, has slowly moved on to the shores of the Philippines and Vietnam. Now
everyone seems to be talking about the Analytics and Cloud wave that has hit India, either through
typical right-sourcing to analytics companies here or through captive analytics centers set up by big
multinational companies. I certainly see more value being created in this wave than in the BPO wave,
and it looks like India can establish its credentials using its early-mover advantage. Big names like
HCL and IBM have won major contracts to maintain the entire IT departments of Fortune 500 companies
on a cloud model, and this is a growing area.
The digital economy (online sales) is ready to surpass the physical economy. Nowadays, all
organizations are asking what their customers want and what they generally do. With your private
information on social media not exactly as private as you think, they want to know who your friends
are and what they like, who influences you and whom you influence. They have large quantities of
data from which to analyze this information and design a product, or a feature of a product, that
you are bound to be happy with. Companies are thinking on their feet, in real time, and are quick to
react to the feedback you have given them; at least, good companies strive to do this and are placing
their bets in this direction. It is no longer a group of customers or a cross-section of people that
they want to study: they want to know YOU as an individual customer. Oh, how lovely you feel!
Analytics is being used in both B2C and B2B, but the former is more challenging than the latter
because predicting an end consumer’s buying behavior, which is usually emotional and irregular, is
tough. Businesses buy or consume in a more regular and rational fashion, usually following a
well-known process. What makes B2C modelling much harder is that the data is more complex due to its
volume and variety, as more than half of it is ‘unstructured’.
Definitions
Big Data is simply ‘data so large that it cannot be processed by conventional methods’.
McKinsey defines Big Data as large data sets that cannot be captured or analyzed by typical database
software tools. So today’s Big Data may not be tomorrow’s Big Data, as the tools will have caught up
enough to analyze it; but the Big Data of tomorrow will be orders of magnitude larger than today’s,
so the same problem remains. Hence it is safe to say that we are just at the beginning of an
explosion of a DATA world. Big Data is all about the Internet of Things, social and mobile put
together.
The industry has defined Big Data across three V characteristics, Volume, Variety and Velocity, and
sometimes a fourth gets added: Veracity. Volume is measured by the sheer size of the data, variety
refers to the assortment of data (structured vs unstructured) and velocity to the speed at which the
data gets created or processed. Veracity is about the accuracy of these huge data sets, the trust
behind the data sources and how to remove the noise to arrive at useful information that makes good
business sense. The source of data can be either machine generated or human generated.
Machine-generated data includes sensor data, web log data and transaction data, which are structured,
and satellite images, scientific data and multimedia, which are unstructured. Human-generated data
includes survey input data and click-stream data, which are structured, and emails, social media data
and SMS, which are unstructured. Each of these unstructured data types can be an analytics domain in
its own right, like text analytics, and lots of research is going into them. Structured data is
usually stored in some sort of table in an RDBMS and can be queried using SQL.
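To make the last point concrete, here is a minimal sketch of structured data sitting in a table and
being queried with SQL. It uses Python's built-in sqlite3 module with an in-memory database; the
table name, column names and rows are hypothetical and invented purely for illustration.

```python
import sqlite3

# Hypothetical table, columns and rows, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer TEXT, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("asha", "Chennai", 1200.0),
        ("asha", "Chennai", 300.0),
        ("ravi", "Mumbai", 4500.0),
    ],
)

# Every row has the same columns, so one declarative query can aggregate them.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM transactions "
    "GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

print(rows)  # [('asha', 1500.0), ('ravi', 4500.0)]
```

Unstructured data such as emails or images has no such fixed set of columns, which is why it needs
its own techniques before a query like this becomes possible.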
Traditionally we knew only ‘structured’ data, the kind that can be put into a database. For the
past few years, thanks to the explosion of social media and smartphones, we have had ‘unstructured’
data in the form of text (emails, SMS), multimedia (audio, video), (A)GPS and other location-based
data, data from sensors, etc., that seems to be exploding daily. This ‘unstructured’ data is becoming
less private because you like to share it across social platforms, and corporations want to have a
strong, direct relationship with you based on it. They want to do everything they can to acquire new
customers and to retain and cross-sell to existing customers. If you sneeze, the corporations catch a
cold: this is how close they get to you.
Analytics is the way in which corporations handle this complexity and speed in data to arrive at
business value that gives them a competitive advantage. Analytics is just an interface between this
large data and the business model. It uses mathematics to derive meaning from data. Much of analytics
has its roots in Google, Yahoo and Amazon, who are considered pioneers in the field and in the
technology being used. In the earlier days, companies worked only on ‘samples’, smaller subsets of
the data, discarded all the outliers and made some predictions. Nowadays, with the availability of
affordable storage, networking and computing power, and even pay-as-you-use models, all the generated
data gets analyzed to arrive at deeper and broader insights. Since decisions are becoming more
data-centric, it is imperative that there is proper transformation and cultural change across the
corporation in terms of people, process and strong leadership.
Big Data analytics has moved from being descriptive (using statistics on past information, as in
Business Intelligence, to understand what happened), to inquisitive (why it happened), to predictive
(using past information to predict future outcomes, as in data mining and forecasting, for what is
likely to happen), to prescriptive (using past information to direct future results, as in
optimization, to arrive at what should happen). The world has moved from models created from small
‘samples’ to using ALL the data to create more complex models and simulate evolving scenarios. The
outcomes of this information management, in the form of reports, dashboards or animated
visualizations, get escalated to the senior leadership team to arrive at qualified decisions, which
become the baseline for the way ahead for corporations. The talent required to do all this modelling
is essentially a combination of data scientists with a good maths, statistics and technology
background and business managers with good economics, behavioral science and social skills.
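The step from descriptive to predictive analytics can be illustrated with a tiny sketch. The monthly
sales figures below are hypothetical, and the straight-line fit (ordinary least squares) is only one
simple forecasting technique that I have assumed for the example; real models are far richer.

```python
from statistics import mean

# Hypothetical monthly sales figures, invented purely for illustration.
months = [1, 2, 3, 4, 5, 6]
sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]

# Descriptive: summarise what already happened.
average_sales = mean(sales)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Predictive: extrapolate the fitted trend to the next, unseen month.
slope, intercept = fit_line(months, sales)
forecast_month_7 = slope * 7 + intercept

print(average_sales, forecast_month_7)
```

A prescriptive step would go one further and search for the action (a price, a stock level) that
gives the best forecast outcome.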
Cloud is just a means of providing shared computing resources on a pay-as-you-go basis. In IT jargon
it is often referred to as XaaS, where X can stand for I, H, P, S, etc. IT services are seen as
utilities and one pays only for the time the resources are being used, hence cloud is also referred
to as utility computing. Infrastructure as a Service (IaaS, also called Hardware as a Service, HaaS)
is the most common of all cloud services and delivers computing resources on a rental basis. Platform
as a Service (PaaS) integrates tools and middleware with IaaS to provide a comprehensive, consistent
platform. Software as a Service (SaaS) is an application that is created and hosted by the developer
in a multi-client mode and sits on top of a PaaS or an IaaS. Cloud is essential for Big Data, be it
private (owned and operated by the organization itself), public (owned and operated by a vendor) or
hybrid (a combination of both). Examples of IaaS would be the Amazon EC2 compute service and
RightScale, of PaaS would be Microsoft Azure and of SaaS would be a CRM like Salesforce.com. Google
has also introduced Data as a Service (DaaS), where one can use the cloud to store and retrieve data.
Cloud computing still has some nagging issues of security, privacy and standardization (or lack
thereof), which are slowly falling into place, and the old IT organization and the CIO role are being
transformed to take this new paradigm into account.
Technology
There are many Big Data technologies in use, but the most common today is the Apache Hadoop
framework, an open-source platform for both storage and processing of all data variants. The two
critical components of Hadoop are the Hadoop Distributed File System (HDFS), used for storage, and
MapReduce, which does the analysis on the data, both in a distributed fashion.
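MapReduce was originally designed by Google; it distributes a problem across many machines and later
aggregates the results, in batch mode. Google developed the Google File System (GFS) as its
distributed storage system, from which Hadoop derived HDFS; Google’s Bigtable, a structured store
built on top of GFS, was the model for Hadoop’s HBase.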
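The map, shuffle and reduce steps can be sketched in a few lines of plain Python using the classic
word-count problem. This is a single-process illustration of the idea only, not the Hadoop API; the
function names and the two input splits are my own assumptions. In a real cluster each split would be
mapped on a different machine and the reducers would also run in parallel.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: aggregate the list of values collected for one key."""
    return key, sum(values)


# Two hypothetical input splits standing in for blocks of a large file.
splits = ["big data big analytics", "cloud and big data"]

mapped = [pair for split in splits for pair in map_phase(split)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts["big"], counts["data"])  # 3 2
```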
Hardware, networking and storage have become more affordable and are constantly getting cheaper,
enabling distributed computing in a big way. Cloud gives you all of these through a
subscription-based service, with no upfront capital or maintenance costs.
Open source software is key; it was made more prevalent by Google through its Android mobile OS and
is the way forward for any new technology to be embraced quickly. An ecosystem builds up around the
open source efficiently and quickly, and is thus able to deliver all sorts of solutions at very low
cost.
Smaller companies seem to be more agile than the big software vendors in delivering a solution for a
customer need, and this is creating competition where size does not matter. Software has moved from a
classic licensing model, to a royalty-based model, to an annual-fee-based model, benefiting the end
user, who always has the latest version to work with.
Distributed computing is a fundamental technology that allows independent computing resources to be
networked together seamlessly across a huge geographical area so that they look like one single
coherent environment. The shared resources can range from computing entities to memory, networks and
storage, but they all have to work together to execute a program. Over the years, distributed
computing has evolved from mainframe computing, where one large computer with multiple processors and
massive I/O capacity was used for batch and transactional processing, to cluster computing, where
several cheap commodity machines were connected by a high-bandwidth network and controlled by
specific software tools for parallel computing, to grid computing, an evolution of clusters in which
a grid is an aggregation of geographically dispersed clusters connected by the Internet and users can
‘consume’ resources just like any other utility.
Distributed computing can be regarded as a superset of parallel computing, the latter implying a
tightly coupled system of mostly homogeneous components sharing the same physical memory. Distributed
computing encompasses all architectures that use heterogeneous computing elements that are not
necessarily co-located. The difference between the two is getting blurred, as the terms are often
used loosely to mean the same thing: both are used to perform multiple activities in parallel. Since
in Big Data the data complexity is high due to its volume, variety and distribution, and the
computational complexity may also be high, distributed heterogeneous computing fits well for
statistical models and simulations. Cloud technologies support Big Data well by providing large
computing resources on demand, large storage for keeping the data and frameworks for optimized
processing of large amounts of data.
The foundation of cloud computing is virtualization, which separates resources and services from the
underlying physical system. This logical split can happen at the server level, through a thin
software layer over the hardware that contains a virtual machine monitor (VMM) or hypervisor; at the
application level, to make applications OS independent; at the memory level, where memory gets
decoupled from the server; at the networking level, through software that makes a pool of
connectivity available; or at the storage level. The abstraction that virtualization gives exposes
only the relevant information and hides the details that are not relevant, and it makes applications
portable across different hardware and software environments. Although not the same thing, this
software abstraction is loosely similar to the green-font ‘XTerm’ hardware terminals used by DEC and
Sun during the 1980s, which front-ended the servers at the back that did the computing. The most
common technologies used here are Xen, VMware and Microsoft Hyper-V.
Applications
Analytics has become prevalent in some key areas now and is slowly changing the way we do business:
Financial (Banking and Insurance): perhaps the most prevalent users of analytics, and early adopters as well
Credit Card Fraud: As the customer transacts in real time, the transaction record is validated
against the customer’s records and past transactions, travel schedules (obtained from the travel
sites where the booking was made) and the place of the transaction to identify any abnormal
activity. Rules are set for each customer based on his/her history, and each transaction gets
checked against them. If a transaction is believed to be ‘suspicious’, additional ‘verification’
steps are added to the transaction to make it more secure.
Credit Risk analysis: Banks want to play safe and ensure they can recover the loan from their
customers, so they look at the past credit history against your name to see if you are a ‘safe bet’.
Thanks to credit rating agencies like Crisil, which do this as their main line of business,
information on credit transactions of all kinds is available to banks and lenders to verify the
details before they disburse a loan or give you a credit card or line of credit.
Insurance Risk analysis: Right now, your vehicle insurance premium is based on the city you live
in, the risk of your neighborhood, the driving points against you and prior claims made. In the USA,
a few insurance companies are setting premiums for INDIVIDUAL customers, customized to them as a
pay-as-you-drive insurance policy. The onboard telematics sends feeds to your insurer on your
braking and acceleration habits, the distance you travel and the roads you frequently travel on
(using GPS). Thanks to this sensor data, a higher premium is charged for more irresponsible driving.
This serves two purposes: it makes insurance companies more profitable and it also improves one’s
driving habits. A shining example of not only where the ‘rubber meets the road’ but also where the
‘engine meets the wallet’!
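The per-customer rule check described under Credit Card Fraud above can be sketched as follows. This
is only a toy illustration under my own assumptions: the profile fields, the two rules and the
threshold factor of 3 are hypothetical, and real fraud systems use many more signals and statistical
models rather than two hand-written rules.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerProfile:
    """Hypothetical per-customer history that the rules are built from."""
    home_city: str
    typical_max_amount: float
    booked_destinations: set = field(default_factory=set)


def suspicious_reasons(profile, city, amount, amount_factor=3.0):
    """Return the list of rules a transaction breaks (empty means it looks normal)."""
    reasons = []
    # Rule 1: the place must be the home city or a destination with a known booking.
    if city != profile.home_city and city not in profile.booked_destinations:
        reasons.append("unexpected location")
    # Rule 2: the amount must not be far above what this customer usually spends.
    if amount > amount_factor * profile.typical_max_amount:
        reasons.append("unusually large amount")
    return reasons


profile = CustomerProfile("Chennai", 5000.0, {"Singapore"})

normal = suspicious_reasons(profile, "Singapore", 1200.0)
flagged = suspicious_reasons(profile, "Lagos", 20000.0)

print(normal)   # []
print(flagged)  # ['unexpected location', 'unusually large amount']
```

A non-empty result is where the extra ‘verification’ step would be triggered before the transaction
is approved.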
Healthcare
The biggest bang for the buck for analytics, in my opinion, would be in two areas: healthcare, which
impacts everyone’s life, and retail, to understand customers better. Healthcare today comes at a cost
and is heavily dependent on the facilities of the hospital or clinic you are getting treated in and
the knowledge of the doctor attending to you. Healthcare is a critical industry, like power, where
the government needs to ensure that it is affordable to all its citizens and, at the same time, the
best available to all.
For all this to happen, a good start would be a health record of the patient that is available
electronically across the nation and the globe. It would carry the patient’s history of ailments,
conditions, surgeries and medications, and the results of regular health check-ups; this is the
Electronic Health Record (EHR) available in the US and other countries. The second requirement would
be the availability of data on all clinical trials that are in progress or already FDA-approved,
side-effects data for all medications, data on diseases prevalent in certain parts of the world and,
of course, the insurance data of the individual. With these two together, any doctor from anywhere in
the world can give guidance on the best and optimum cure and care for the patient, the best medicine
from any pharma company for a particular condition and the best insurance plan for an individual and
his family based on the risks they carry. Data now drives most of these integrated decisions, along
with the doctor’s experience in suggesting a remedy; compare that with yesteryear, when such data
would not have been available. This progresses further into telemedicine, where a soldier injured in
battle lies in an operating tent with medical gadgets streaming data to experienced doctors sitting
elsewhere, who guide the surgical procedure and get him out of danger quickly.
Retail
All your purchase patterns and transactions are being collected and analyzed carefully to send you
targeted advertisements with e-coupons, to help companies do location-based marketing, to get data
on departing customers (where they are going and why), to manage the effectiveness of an ad campaign
and to learn details of acquired customers to improve cross-selling. The better companies know their
customers, the better their sales will be in an industry with thin margins.
The other areas in retail where big data analytics is already in use are inventory management,
logistics optimization, merchandise assortment and pricing optimization, fraud and loss prevention,
and vendor rationalization.
Classic examples of analytics are Amazon’s “you may also want” prompts and Netflix’s “what your
friends thought” movie suggestions, both of which show good results for the retailer.
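One simple way to produce a “you may also want” style prompt is to count which items are bought
together. The sketch below shows that idea only; it is not how Amazon or Netflix actually do it, and
the baskets and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, invented purely for illustration.
baskets = [
    {"phone", "case", "charger"},
    {"phone", "case"},
    {"phone", "charger"},
    {"laptop", "mouse"},
    {"phone", "case", "screen guard"},
]

# Count how often each ordered pair of items appears in the same basket.
co_counts = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1


def also_bought(item, top_n=2):
    """Return the items most often bought together with `item`."""
    scores = Counter(
        {other: n for (first, other), n in co_counts.items() if first == item}
    )
    return [other for other, _ in scores.most_common(top_n)]


print(also_bought("phone"))  # ['case', 'charger']
```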
Travel
Many travel sites collect the log files from all the searches made by users and, based on your
stated preferences, strive to increase their booking ratio. They would also have data from text
analytics on your TripAdvisor reviews; based on what you like and do not like, and on your past
history on their site and other sites, they will be able to offer optimized flight and hotel options
that take into account the budget and time inputs you have given.
Transport
Volvo, along with Sweden’s transportation department, is using a cloud service for car-to-car
communication to warn drivers of icy and slippery roads ahead, thus making safety a priority. They
collect data from the sensors (ESPs) fitted inside their cars; the ESP stabilizes the car and also
sends signals about hazardous road conditions through the mobile network to the cloud. This
real-time information is shared with the cars behind that are about to use the same road, so that
they are warned in advance about the actual condition of the road. This information complements any
blanket weather warning that drivers automatically receive.
Media
A major part of advertising is the reach and conversion that one gets through any form of media, be
it mobile, TV, Web or classic print. Advertising is what brings money to the media houses. Despite
the numerous ads that appear on any website, only a few get clicked and only a small percentage of
those clicks actually turn into a purchase. The marketing world is always challenged with how an ad
can be made more effective so that the hit ratio increases. Now, with digital cable and dish TV
clearly revealing your viewing patterns, your online purchases and shop transactions revealing your
buying patterns, websites holding a history of your visits in some format, the operator knowing which
value-added services you have enrolled in and the world knowing what paper you read, all of these
combined through analytics would clearly describe a ‘path-to-purchase’ pattern that enables the media
houses to focus their ads appropriately. It will not be too long before ads customized to your likes
stream into your TV or mobile.
We already have news websites that automatically customize the page you view based on your
interests, as this data has already been collected and analyzed from your previous visits to the
website.
Pharmaceuticals
The business problems that get tackled here through analytics are classified into three buckets:
• Sales and Marketing, to understand sales force effectiveness and resource optimization, market
assessment and competitive analysis
• Research and Development, for clinical trials and reporting to the FDA, safety analysis for the
product, and licensing
• Pricing and Contracting, for inventory and logistics management, and for setting up contracts,
buybacks, rebates, etc.
Other applications are prevalent too, some of which you use daily without being aware that they are
cloud based: Google Docs, Gmail and Yahoo Mail; wearable health devices with sensors that routinely
monitor a patient’s vital data and feed it back to the hospital or doctor, who can act immediately on
any anomalies; gene profiling and protein structure modelling done on community clouds run by
research institutions; satellite image processing, now used by several countries for natural
disaster management; opinion polls during elections; online document storage like Dropbox or Apple’s
iCloud; social networking sites like Facebook and Twitter; online gaming; and casino gambling
predictions.
Transformation in the future
How do you feel if some complex tool used by a company predicts your next behavior with reasonable
accuracy? How can companies use the data you provide and analyze it to make you BUY? How can
healthcare be more focused on your particular problem and provide the best care at the cost you want?
How do you get the best travel package for you and your family, based on your likes and dislikes,
that would enhance the memories of the trip? How can your insurance be tailor-made for you based on
your own defensive driving habits and your history of no claims? How can banks give you the best
bang for your buck by automatically understanding your financial goals and delivering a better return
for you as a privileged customer? How can airlines make you fly with them frequently by enhancing
your particular travel experience every time?
Big Data and its associated analytics are used to take on each customer one at a time and enhance
their experience. We can still take the old route and use the 80/20 rule, which says that one can
draw 80% of the effective conclusions and decisions from the top 20% of the overall customer data.
The choice is clear.
REFERENCES:
(i) Big Data, Big Analytics – Michael Minelli et al., Wiley, 2013
(ii) Big Data For Dummies – Judith Hurwitz et al., Wiley, 2013
(iii) Mastering Cloud Computing – Rajkumar Buyya et al., McGraw Hill, 2013
Many thanks to the reviewers of this blog for their valuable feedback: Vishoo, Venki and John, all
of whom come from either an analytics or an e-commerce background.

Mais conteúdo relacionado

Mais procurados

Whitepaper: Big Data 101 - Creating Real Value from the Data Lifecycle - Happ...
Whitepaper: Big Data 101 - Creating Real Value from the Data Lifecycle - Happ...Whitepaper: Big Data 101 - Creating Real Value from the Data Lifecycle - Happ...
Whitepaper: Big Data 101 - Creating Real Value from the Data Lifecycle - Happ...Happiest Minds Technologies
 
White_Paper_Beyond_Visualisation copy
White_Paper_Beyond_Visualisation copyWhite_Paper_Beyond_Visualisation copy
White_Paper_Beyond_Visualisation copyTania Mushtaq
 
Whitebook on Big Data
Whitebook on Big DataWhitebook on Big Data
Whitebook on Big DataViren Aul
 
In the Dark? Understanding Big Data & AI: Talent Acquisition Strategies for 2018
In the Dark? Understanding Big Data & AI: Talent Acquisition Strategies for 2018In the Dark? Understanding Big Data & AI: Talent Acquisition Strategies for 2018
In the Dark? Understanding Big Data & AI: Talent Acquisition Strategies for 2018Yoh Staffing Solutions
 
Oea big-data-guide-1522052
Oea big-data-guide-1522052Oea big-data-guide-1522052
Oea big-data-guide-1522052Gilbert Rozario
 
The implications of Big Data for BTS and COS
The implications of Big Data for BTS and COSThe implications of Big Data for BTS and COS
The implications of Big Data for BTS and COSGeorge Kershoff
 
SB106 -- Social Business in the Context of IBM's Overall Strategy
SB106 -- Social Business in the Context of IBM's Overall StrategySB106 -- Social Business in the Context of IBM's Overall Strategy
SB106 -- Social Business in the Context of IBM's Overall StrategyArthur Fontaine
 
Data foundation for analytics excellence
Data foundation for analytics excellenceData foundation for analytics excellence
Data foundation for analytics excellenceMudit Mangal
 
Analytical thinking 16 - October 2012
Analytical thinking 16 - October 2012Analytical thinking 16 - October 2012
Analytical thinking 16 - October 2012Charlotte Skornik
 
Big Data Startups - Top Visualization and Data Analytics Startups
Big Data Startups - Top Visualization and Data Analytics StartupsBig Data Startups - Top Visualization and Data Analytics Startups
Big Data Startups - Top Visualization and Data Analytics Startupswallesplace
 

Mais procurados (19)

Whitepaper: Big Data 101 - Creating Real Value from the Data Lifecycle - Happ...
Whitepaper: Big Data 101 - Creating Real Value from the Data Lifecycle - Happ...Whitepaper: Big Data 101 - Creating Real Value from the Data Lifecycle - Happ...
Whitepaper: Big Data 101 - Creating Real Value from the Data Lifecycle - Happ...
 
The 25 Predictions About The Future Of Big Data
The 25 Predictions About The Future Of Big DataThe 25 Predictions About The Future Of Big Data
The 25 Predictions About The Future Of Big Data
 
White_Paper_Beyond_Visualisation copy
White_Paper_Beyond_Visualisation copyWhite_Paper_Beyond_Visualisation copy
White_Paper_Beyond_Visualisation copy
 
Big data Readiness white paper
Big data  Readiness white paperBig data  Readiness white paper
Big data Readiness white paper
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Whitebook on Big Data
Whitebook on Big DataWhitebook on Big Data
Whitebook on Big Data
 
In the Dark? Understanding Big Data & AI: Talent Acquisition Strategies for 2018
In the Dark? Understanding Big Data & AI: Talent Acquisition Strategies for 2018In the Dark? Understanding Big Data & AI: Talent Acquisition Strategies for 2018
In the Dark? Understanding Big Data & AI: Talent Acquisition Strategies for 2018
 
Making sense of consumer data
Making sense of consumer dataMaking sense of consumer data
Making sense of consumer data
 
Oea big-data-guide-1522052
Oea big-data-guide-1522052Oea big-data-guide-1522052
Oea big-data-guide-1522052
 
Big data unit i
Big data unit iBig data unit i
Big data unit i
 
The state of the Big Data market
The state of the Big Data marketThe state of the Big Data market
The state of the Big Data market
 
Buyer's guide to strategic analytics
Buyer's guide to strategic analyticsBuyer's guide to strategic analytics
Buyer's guide to strategic analytics
 
Big data assignment
Big data assignmentBig data assignment
Big data assignment
 
The implications of Big Data for BTS and COS
The implications of Big Data for BTS and COSThe implications of Big Data for BTS and COS
The implications of Big Data for BTS and COS
 
SB106 -- Social Business in the Context of IBM's Overall Strategy
SB106 -- Social Business in the Context of IBM's Overall StrategySB106 -- Social Business in the Context of IBM's Overall Strategy
SB106 -- Social Business in the Context of IBM's Overall Strategy
 
Data foundation for analytics excellence
Data foundation for analytics excellenceData foundation for analytics excellence
Data foundation for analytics excellence
 
Analytical thinking 16 - October 2012
Analytical thinking 16 - October 2012Analytical thinking 16 - October 2012
Analytical thinking 16 - October 2012
 
Big Data Startups - Top Visualization and Data Analytics Startups
Big Data Startups - Top Visualization and Data Analytics StartupsBig Data Startups - Top Visualization and Data Analytics Startups
Big Data Startups - Top Visualization and Data Analytics Startups
 
The dawn of Big Data
The dawn of Big DataThe dawn of Big Data
The dawn of Big Data
 

Destaque

Business intelligence primer
Business intelligence primerBusiness intelligence primer
Business intelligence primerKarthick S
 
Web Analytics Primer
Web Analytics PrimerWeb Analytics Primer
Web Analytics PrimerChad Richeson
 
Technology Primer: Learn How Analytics Can Sort the Wheat From the Chaff and ...
Technology Primer: Learn How Analytics Can Sort the Wheat From the Chaff and ...Technology Primer: Learn How Analytics Can Sort the Wheat From the Chaff and ...
Technology Primer: Learn How Analytics Can Sort the Wheat From the Chaff and ...CA Technologies
 
Introduction to business analytics
Introduction to business analyticsIntroduction to business analytics
Introduction to business analyticsAna Canhoto
 
Data Modeling PPT
Data Modeling PPTData Modeling PPT
Data Modeling PPTTrinath
 
A Primer in Startup Analytics
A Primer in Startup AnalyticsA Primer in Startup Analytics
A Primer in Startup AnalyticsGeorge Voulgaris
 

Destaque (8)

Business intelligence primer
Business intelligence primerBusiness intelligence primer
Business intelligence primer
 
Web Analytics Primer
Web Analytics PrimerWeb Analytics Primer
Web Analytics Primer
 
Technology Primer: Learn How Analytics Can Sort the Wheat From the Chaff and ...
Technology Primer: Learn How Analytics Can Sort the Wheat From the Chaff and ...Technology Primer: Learn How Analytics Can Sort the Wheat From the Chaff and ...
Technology Primer: Learn How Analytics Can Sort the Wheat From the Chaff and ...
 
Introduction to business analytics
Introduction to business analyticsIntroduction to business analytics
Introduction to business analytics
 
Data models
Data modelsData models
Data models
 
Dbms models
Dbms modelsDbms models
Dbms models
 
Data Modeling PPT
Data Modeling PPTData Modeling PPT
Data Modeling PPT
 
A Primer in Startup Analytics
A Primer in Startup AnalyticsA Primer in Startup Analytics
A Primer in Startup Analytics
 

BEGINNER'S GUIDE TO BIG DATA, ANALYTICS AND CLOUD

I have been hearing the buzzwords in the title for a couple of years now, and over the past six months or so I was lucky to get a couple of opportunities to work on them through my own consultancy. That made me read a few books and articles on these subjects, which got me curious to know more. I certainly make no claim to being an expert in these areas, but I have gained a fair understanding that I thought I would share here purely for educational purposes, so that everyone who hears these buzzwords knows what they are about and can hold a good conversation around them. I must thank the references below, as most of the writing here is a heavily edited summary of the details found in them, kept at the layman's level. Consider this a crisp primer for the uninitiated, from both the technology and the consumption angle.

Business Process Outsourcing (BPO), which made India the right-sourcing capital of the world during the first decade of this century, has slowly moved on to the shores of the Philippines and Vietnam. Now everyone seems to be talking about the Analytics and Cloud wave that has hit India, either through typical right-sourcing to analytics companies here or through captive analytics centers for big multinational companies. I see the value created in this wave as higher than in the BPO wave, and it looks like India can establish its credentials using its early-mover advantage. Big names like HCL and IBM have won major contracts to maintain the entire IT departments of Fortune 500 companies on a cloud model, and this is a growing area.

The digital economy (sales) is ready to surpass the physical economy. Nowadays, all organizations are asking what their customers want and what they generally do. With your private information on social media not exactly as private as you think, they want to know who your friends are and what they like.
Who influences you, and whom do you influence? They have large quantities of data from which to analyze this information, and they use it to design a product, or a feature of a product, that you are bound to be happy with. Companies are thinking on their feet, in real time, and are quick to react to the feedback you give them; at least, good companies strive to do this and are placing their bets in this direction. It is no longer a group of customers or a cross-section of people that they want to study; they want to know YOU as an individual customer. Oh, how lovely you feel!

Analytics is used in both B2C and B2B, but the former is more challenging than the latter because predicting an end consumer's buying behavior, which is usually emotional and irregular, is tough. Businesses buy or consume in a more regular and rational fashion, usually through a well-known process. What makes B2C modelling much harder is that the data is more complex due to its volume and variety, as more than half of it is 'unstructured'.

Definitions

Big Data is simply 'data that is so large that it cannot be processed by conventional methods'. McKinsey defines Big Data as large data sets that cannot be captured or analyzed by typical database software tools. So today's Big Data may not be tomorrow's Big Data, because the tools will have caught up by then; but tomorrow's Big Data will be orders of magnitude larger than today's, so the same problem remains. Hence it is safe to say that we are just at the beginning of an explosion of the data world. Big Data is all about the Internet of Things, social and mobile put together.

The industry has defined Big Data across three V characteristics, Volume, Variety and Velocity, and sometimes a fourth gets added: Veracity. Volume is measured by the sheer size of the data, variety refers to the assortment of data (structured vs. unstructured), and velocity is the speed at which the data gets created or processed. Veracity is about the accuracy of these huge data sets, the trust behind the data sources, and how to remove the noise to arrive at useful information that makes good business sense.

Data can be machine generated or human generated. Machine-generated data includes structured data such as sensor data, web log data and transaction data, and unstructured data such as satellite images, scientific data and multimedia. Human-generated data includes structured data such as survey input and click-stream data, and unstructured data such as emails, social media data and SMS. Each kind of unstructured data can be an analytics domain in its own right, like text analytics, and a lot of research is going into them. Structured data is usually stored in tables in an RDBMS and can be queried through SQL.

Traditionally we knew only 'structured' data, the kind that can be put into a database. For the past few years, thanks to the explosion of social media and smartphones, we have 'unstructured' data in the form of text (emails, SMS), multimedia (audio, video), (A)GPS and other location-based data, data from sensors, etc., and it is growing explosively every day. This 'unstructured' data is becoming less private because you like to share it across social platforms, and corporations want to build a strong, direct relationship with you based on it. They want to do everything they can to acquire new customers, and to retain and cross-sell to existing customers.
If you sneeze, the corporations catch a cold; this is how close they get to you.

Analytics is the way in which corporations handle this complexity and speed in data to arrive at business value that gives them a competitive advantage. Analytics is an interface between these large data sets and the business model. It uses mathematics to derive meaning from data. Most analytics has its roots in Google, Yahoo and Amazon, who are considered pioneers in this field and in the technology used. In the earlier days, analysts worked only on 'samples', a smaller subset of the data, discarded all the outliers, and made some predictions. Nowadays, with affordable storage, networking and computing power, and even pay-as-you-use models, all the generated data gets analyzed to arrive at deeper and broader insights. Since decisions are becoming more data-centric, it is imperative that there is proper transformation and cultural change across the corporation in terms of people, process and strong leadership.

Big Data analytics has moved from being descriptive (using statistics on past information, i.e. Business Intelligence, to understand what happened), to inquisitive (why it happened), to predictive (using past information to predict future outcomes, i.e. data mining and forecasting of what is likely to happen), to prescriptive (using past information to direct future results, i.e. optimization to arrive at what should happen). The world has moved from models created from small 'samples' to using ALL the data to create more complex models and simulate evolving scenarios. All these outcomes of information management, in the form of reports, dashboards or animated visualizations, get up-levelled to the senior leadership team, who arrive at qualified decisions that become the baseline for the way ahead for the corporation. The talent required to do all this modelling is essentially a combination of data scientists with a good background in maths, statistics and technology, and business managers with good economics, behavioral science and social skills.

Cloud is a means of providing shared computing resources on a pay-as-you-go basis. In IT jargon it is often referred to as XaaS, where X can stand for I, H, P, S, etc. IT services are seen as utilities and one pays only for the time the resources are used, hence cloud is also referred to as utility computing. Infrastructure as a Service (IaaS, also called Hardware as a Service, HaaS) is the most common of all cloud services and delivers computing resources on a rental basis. Platform as a Service (PaaS) integrates tools and middleware with IaaS to provide a comprehensive, consistent platform. Software as a Service (SaaS) is an application created and hosted by the developer in multi-client mode, sitting on top of a PaaS or an IaaS. Cloud is essential for Big Data, whether it is private (owned and operated by the organization itself), public (owned and operated by a vendor) or hybrid (a combination of both). Examples of IaaS are Amazon EC2 (the Elastic Compute Cloud service) and RightScale; an example of PaaS is Microsoft Azure; and an example of SaaS is a CRM like Salesforce.com. Google has also introduced Data as a Service (DaaS), where one can use the cloud to store and retrieve data. Cloud computing still has some nagging issues of security, privacy and standardization (or lack thereof), which are slowly falling into place, and the old IT organization and the CIO role are being transformed to take this new paradigm into account.
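The descriptive and predictive stages of analytics described earlier in this section can be illustrated with a toy example. The sketch below is only an illustration under stated assumptions: the monthly sales figures are made up, and the function names `describe` and `forecast_next` are hypothetical, not part of any product mentioned in this primer. The descriptive step summarizes what happened (an average); the predictive step fits a straight-line trend by least squares and extends it one period ahead.

```python
from statistics import mean

def describe(sales):
    """Descriptive analytics: summarize what already happened."""
    return mean(sales)

def forecast_next(sales, periods_ahead=1):
    """Predictive analytics: fit a least-squares line and extrapolate it."""
    n = len(sales)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    xs = list(range(n))                      # period index 0, 1, 2, ...
    x_mean, y_mean = mean(xs), mean(sales)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, sales))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + periods_ahead)

if __name__ == "__main__":
    monthly_sales = [10, 12, 14, 16]         # made-up figures for illustration
    print("average so far:", describe(monthly_sales))
    print("forecast for next month:", forecast_next(monthly_sales))
```

Real predictive models are of course far richer than a straight line, but the division of labor is the same: one step looks back, the other looks forward.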
Technology

There are many Big Data technologies in use, but the most common today is the Apache Hadoop framework, an open-source platform for both storage and processing of all data variants. The two critical components of Hadoop are the Hadoop Distributed File System (HDFS), used for storage, and MapReduce, which does the analysis on the data, both in a distributed fashion. MapReduce was originally designed by Google; it distributes the problem and later aggregates the results in batch mode. Google developed the Google File System (GFS) as its distributed storage system, from which Hadoop derived HDFS.

Hardware, networking and storage have become more affordable and are constantly getting cheaper, enabling distributed computing in a big way. Cloud gives you all of these through a subscription-based service, with no upfront capital or maintenance costs.

Open-source software is key. It was made prevalent by Google through its Android mobile OS, and it is the way forward for new technologies to be embraced quickly: an ecosystem builds up around the open source efficiently and quickly, and is thus able to deliver all sorts of solutions at very low cost. Smaller companies seem to be more agile than the big software vendors in delivering a solution for a customer need, and this is creating competition where size does not matter. Software has moved from a classic licensing model, to a royalty-based model, to an annual-fee model, benefiting the end user, who always has the latest updated version to work with.
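The way MapReduce distributes a problem and later aggregates the results, as described above, can be sketched with the classic word-count example. This is a minimal single-machine illustration in plain Python, not Hadoop's actual API: the function names `map_phase`, `shuffle` and `reduce_phase` are assumptions chosen for clarity, and the "nodes" are simulated by an ordinary loop.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as happens between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce step: aggregate the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(documents):
    # On a real cluster each document (or file block) would be mapped on a
    # different machine; here that distribution is simulated by a plain loop.
    emitted = []
    for document in documents:
        emitted.extend(map_phase(document))
    return reduce_phase(shuffle(emitted))

if __name__ == "__main__":
    print(word_count(["big data", "Big cloud"]))
```

Because each map call looks at only one document and each reduce call at only one key, both steps can be spread across many cheap machines, which is what makes the pattern suitable for batch processing of very large data sets.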
Distributed computing is a fundamental technology that allows independent computing resources to be networked seamlessly across a huge geographical area so that they look like one single coherent environment. The shared resources can range from computing entities to memory, networks and storage, but they all have to work together to execute a program. Over the years, distributed computing has evolved through three stages. Mainframe computing used one large computer with multiple processors and massive I/O operations for batch and transactional processing. Cluster computing connected several cheap commodity machines through a high-bandwidth network, controlled by specific software tools for parallel computing. Grid computing is an evolution of clusters in which the grid is an aggregation of geographically dispersed clusters connected by the Internet, and users can 'consume' resources just like any other utility.

Distributed computing can be regarded as a superset of parallel computing, the latter implying a tightly coupled system of mostly homogeneous components that share the same physical memory. Distributed computing encompasses all architectures that use heterogeneous computing elements that are not necessarily co-located. The difference between the two is getting blurred, as the two terms are used loosely to mean the same thing: both are used to perform multiple activities in parallel. Since in Big Data the data complexity is high due to its volume, variety and distribution, and the computational complexity may also be high, distributed heterogeneous computing fits well for statistical models and simulations. Cloud technologies support Big Data well by providing large computing resources on demand, large storage for keeping the data, and frameworks for optimized processing of large amounts of data.
The foundation of cloud computing is Virtualization that separates the resources and services from the underlying physical system- here again, this logical split can happen at the server end through a thin software layer inserted into the hardware that contains a virtual machine monitor (VMM) or Hypervisor, at the application level to make it OS independent, at the memory level where the memory gets decoupled from the server, at the networking level through a SW that just makes a pool of connectivity available or at the storage level – this level of abstraction that virtualization gives just provides the relevant information needed and hides the exact details which may not be relevant, and makes applications portable across different hardware and software environment. Although not meaning the same, this software abstraction is more or less similar to the green-font HW machines called ‘XTerm’ used by DEC and SUN during the 1980s that front-ended for their servers there were at the back for computing. The most common technologies used here are Xen, VmWare and Microsoft Hyper-V. Applications Analytics has become prevalent in some key areas now and is slowly changing the way we do business: Financial-Banking and Insurance – perhaps the prevalent users of analytics and early adopters as well Credit Card Fraud: The transaction record of the customer is validated against the customer records and his/her past transactions, their travel schedules (getting access to travel sites from where they did the booking) and place of transaction to identify if there is any abnormal activity, as they are transacting in real-time. There are certain rules set for each customer based on
his/her history that the transaction gets checked against. If a transaction is believed to be ‘suspicious’, additional verification steps are added to the transaction to make it more secure.

Credit Risk analysis: Banks want to play safe and ensure they can recover the loan from their customers, so they look at the past credit history against your name to see if you are a ‘safe bet’. Thanks to credit rating agencies like Crisil, which do this as their main line of business, information on credit transactions of all kinds is available to banks and lenders, who can verify the details before disbursing a loan or giving you a credit card or line of credit.

Insurance Risk analysis: Right now, your vehicle insurance premium is based on the city you live in, the risk of your neighborhood, the driving points against you and the prior claims you have made. In the USA, a few insurance companies are setting premiums for INDIVIDUAL customers, customized to them as a pay-as-you-drive insurance policy. Onboard telematics sends feeds to your insurer on your braking and acceleration habits, the distance you travel, and the roads you frequently travel on (using GPS). Thanks to this sensor data, a higher premium is charged for more irresponsible driving. This serves two purposes: it makes insurance companies more profitable and it also improves one's driving habits. A shining example of not only where the ‘rubber meets the road’ but also where the ‘engine meets the wallet’!

Healthcare

The biggest bang for the buck for analytics, in my opinion, would be in two areas: healthcare, which impacts everyone's life, and retail, to understand customers better. Healthcare today comes at a cost and is heavily dependent on the facilities of the hospital or clinic where you are being treated, and on the knowledge of the doctor attending to you.
Healthcare is a critical industry, like power, where the government needs to ensure that it is affordable to all citizens and, at the same time, that the best care available reaches everyone. For this to happen, a good start would be a patient health record available electronically across the nation and the globe. It would carry the patient's history of ailments, conditions, surgeries and medications, along with regular health check-up results. This is the Electronic Health Record (EHR) available in the US and other countries. The second requirement would be the availability of data on all clinical trials that are in progress or already FDA-approved, side-effect data for all medications, data on diseases prevalent in certain parts of the world and, definitely, the insurance data of the individual. With these two together, any doctor from anywhere in the world can give guidance on the best and optimum cure and care for the patient, the best medicine from any pharma company for a particular condition, and the best insurance plan for an individual and his family based on the risks they carry. Data now drives most of these integrated decisions, along with the doctor's experience in suggesting a remedy; compare this with yesteryear, when such data would not have been available. This progresses further into telemedicine, where a soldier injured in battle lies in an operating tent with medical gadgets streaming data to experienced doctors sitting elsewhere, who guide the surgical procedure and get him out of danger quickly.
Retail

All your purchase patterns and transactions are being collected and analyzed carefully: to send you targeted advertisements with e-coupons, to help companies do location-based marketing, to get data on customers who are leaving (where they are going and why), to measure the effectiveness of an ad campaign, and to learn the details of acquired customers to improve cross-selling. The better retailers know their customers, the better their sales will be in an industry with thin margins. The other areas in retail where big data analytics is already in use are inventory management, logistics optimization, merchandise assortment and pricing optimization, fraud and loss prevention, and vendor rationalization. Classic examples of analytics are Amazon's “you may also want” prompts and Netflix's “what your friends thought” movie suggestions, both of which show good results for the retailer.

Travel

Many travel sites collect log files of all the searches made by users and, based on your stated preferences, strive to increase their booking ratio. They also have data from text analytics of your TripAdvisor reviews. Based on what you like and do not like, and on your past history on their site and other sites, they are able to offer optimized flight and hotel options that take into account the budget and time inputs you have given.

Transport

Volvo, along with Sweden's transportation department, is using a cloud service for car-to-car communication to warn drivers ahead of icy and slippery roads, thus making safety a priority. They collect data from the sensors (ESPs) fitted inside their cars: the ESP stabilizes the car and also sends signals of hazardous road conditions through the mobile network to the cloud.
This real-time information is shared with the cars behind that are about to use the same road, so that they are warned in advance about the actual condition of the road. This information complements any blanket weather warning that drivers automatically receive.

Media

A major part of advertising is the reach and conversion achieved through any form of media, be it mobile, TV, Web or classic print. Advertising is what brings money to the media houses. Despite the numerous ads that appear on any website, only a few get clicked, and only a small percentage of those clicks actually turn into a purchase. The marketing world is always challenged with how to make an ad more effective so that the hit ratio increases. Now, with digital cable and dish TV clearly revealing your viewing patterns, your online purchases and shop transactions revealing your buying patterns, websites holding a history of your visits in some format, the operator knowing which value-added services you have enrolled in, and the world knowing which paper you read, all of these combined through analytics would clearly describe a ‘path-to-purchase’ pattern to enable the media
houses to focus their ads appropriately. It will not be too long before ads customized to your likes stream into your TV or mobile. We already have news websites that customize your viewing page automatically based on your interests, as this data is already collected and analyzed from your previous visits to the website.

Pharmaceuticals

The business problems that get tackled here through analytics are classified into three buckets:
• Sales and Marketing, to understand sales force effectiveness and resource optimization, market assessment and competitive analysis
• Research and Development, for clinical trials and reporting to the FDA, safety analysis for the product, and licensing
• Pricing and contracting, for inventory and logistics management, and for setting up contracts, buybacks, rebates etc.

The other applications that are prevalent, some of which you use daily without being aware that they are cloud based, are: Google Docs, Gmail and Yahoo Mail; wearable health devices with sensors that routinely monitor vital patient data and feed it back to the hospital or doctor, who can act immediately on any anomalies; gene profiling and protein structure modelling done using community clouds from research institutions; satellite image processing, now used by several countries for natural disaster management; opinion polls during elections; online document storage like Dropbox or iCloud by Apple; all the social networking sites like Facebook and Twitter; online gaming; and casino gambling predictions.

Transformation in the future

How would you feel if some complex tool used by a company predicted your next behavior with reasonable accuracy? How can companies use the data you provide and analyze it to make you BUY? How can healthcare be more focused on your particular problem and provide the best care at the cost you want?
How can you get the best travel package for you and your family, based on your likes and dislikes, that would enhance the memories of the trip? How can your insurance be tailor-made for you based on your own defensive driving habits and your history of no claims? How can banks give you the best bang for your buck by automatically understanding your financial goals and delivering a better return for you as a privileged customer? How can airlines make you fly with them frequently by enhancing your particular travel experience every time?

Big Data and its associated analytics are used to take on each customer one at a time and enhance their experience. We can still take the old route and use the 80/20 rule, which says that one can draw roughly 80% of the effective conclusions and decisions from the top 20% of the overall customer data. The choice is clear.

REFERENCES:
(i) Big Data, Big Analytics by Michael Minelli et al., Wiley, 2013
(ii) Big Data For Dummies by Judith Hurwitz et al., Wiley, 2013
(iii) Mastering Cloud Computing by Rajkumar Buyya et al., McGraw Hill, 2013

Many thanks to the reviewers of this blog for their valuable feedback: Vishoo, Venki and John, all of them from either an analytics or an e-commerce background.