Big Data and Hadoop
Team 2: Stephen Allegretto, Jeffery Daly,
Christopher Rizza, Matthew Urdan
WHAT IS BIG DATA?
WHAT IS HADOOP?
WHAT CAN BIG DATA DO FOR BUSINESS?
Business Transformation
Predictive Analytics
Understanding Markets and Customers
Business Processes
TRANSFORMATION OF BUSINESS PROCESSES
Big Data
UPS Route Optimization
Major Cost Savings
TRANSFORMATION OF BUSINESS UNDERSTANDING
PREDICTIVE ANALYTICS
CHALLENGES TO UTILIZING BIG DATA
DATA CHALLENGES
Redundancy
Quality
Scalability
Availability
Discovery
PROCESS CHALLENGES
Data Capture: Must Capture Data and Make it Useful to End Users
Data Cleanup: Data Needs to be Sorted and Cleaned Up Prior to Analysis
Data Analysis: Analysis Yields Valuable Information, but Must be Shared
SECURITY AND PRIVACY CHALLENGES
“The security challenge for Big Data lies in providing an effective
security model across the life cycle of the process without impeding
Volume, Variety and Velocity or compromising the rest of the
information estate” (Morton, 2014).
3 RISKS TO BIG DATA ASSETS
RISK
Information Life Cycle
Data Provenance
Technology Unknowns
PRIVACY CONCERNS
SO WHERE DOES HADOOP COME IN?
Open Source Framework
Hadoop Common
HDFS
Hadoop YARN
MapReduce
HADOOP AND DATA PROCESSING
HADOOP AS A BUSINESS
CHALLENGES IN HADOOP UTILIZATION
Implementation Difficulties
Many Pieces
Partitioning
Priorities
Difficulty Balancing
THE FUTURE OF BIG DATA
CONCLUSION
References
• All images used in this presentation are copyright free and fully licensed from Adobe Stock Images.
• Akerkar, Rajendra. Big Data Computing. N.p.: Boca Raton : CRC, n.d. Arnold Bernhard Library Database. Web. 15 Sept. 2015.
• Bappalige, S. (2014, August 26). An introduction to Apache Hadoop for big data. Retrieved September 16, 2015, from http://opensource.com/life/14/8/intro-apache-hadoop-big-data
• Bertolucci, J. (2013, November 19). How to explain Hadoop to non-geeks. Retrieved September 16, 2015, from Information Week: http://www.informationweek.com/big-data/software-platforms/how-to-explain-hadoop-to-non-geeks/d/d-id/899721
• Chen, Min, Shiwen Mao, Yin Zhang, and Victor Chung Ming Leung. Big Data: Related Technologies, Challenges and Future Prospects. N.p.: Cham : Springer International : Imprint: Springer, 2014. Arnold Bernhard Library Database. Web. 15 Sept. 2015.
• Clancy, H. (2015, January 5). Predictive analytics, a potent prescription for health care. Retrieved September 14, 2015, from Fortune: http://fortune.com/2015/01/05/predictive-analytics-health-care/
• Collins, Keith. "A Quick Guide to the Worst Corporate Hack Attacks." Bloomberg.com. Bloomberg, 18 Mar. 2015. Web. 17 Sept. 2015.
• Davenport, T. H., & Dyche, J. (2013). Big data in big companies. SAS Institute. International Institute for Analytics.
• Duan, L., & Xiong, Y. (2015, March 19). Big data analytics and business analytics. Journal of Management Analytics, 2(1), 1-21.
• IBM. (2015). What is Hadoop? Retrieved September 16, 2015, from IBM: http://www-01.ibm.com/software/data/infosphere/hadoop/
• IBM Software. (2015). Making the case for big data and Hadoop in the enterprise. Retrieved September 16, 2015, from IBM: http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=BK&infotype=PM&appname=SWGE_IM_DD_USEN&htmlfid=IMM14161USEN&attachment=IMM14161USEN.PDF#loaded
• McGinn, J. (2015, February 17). The future of data potential is here. Retrieved September 16, 2015, from IBM Big Data Hub: http://www.ibmbigdatahub.com/blog/future-data-potential-here
• Morton, John. Big Data: Opportunities and Challenges. N.p.: Swindon : BCS, The Chartered Institute for IT, 2014. Arnold Bernhard Library Database. Web. 14 Sept. 2015.
• Newman, D. (2015, February 24). Big Data: Why Facebook Knows Us Better Than Our Therapist. Retrieved September 14, 2015, from Forbes: http://www.forbes.com/sites/danielnewman/2015/02/24/big-data-why-facebook-knows-us-better-than-our-therapist/
• Noyes, K. (2014, July 25). The shortest distance between two points? At UPS, it's complicated. Retrieved September 14, 2015, from Fortune: http://fortune.com/2014/07/25/the-shortest-distance-between-two-points-at-ups-its-complicated/
• Oliver, A. (2015, July 2). Big data, big challenges: Hadoop in the enterprise. Retrieved September 16, 2015, from http://www.infoworld.com/article/2943252/application-development/the-challenges-of-deploying-hadoop-in-the-enterprise.html
• SAS Institute. (2015). What is big data? Retrieved September 16, 2015, from SAS: http://www.sas.com/en_us/insights/big-data/what-is-big-data.html
• Top 6 Hadoop Vendors providing Big Data Solutions in Open Data Platform. (2015, April 8). Retrieved September 16, 2015, from http://www.dezyre.com/article/-top-6-hadoop-vendors-providing-big-data-solutions-in-open-data-platform/93
• Vera-Baquero, A., Palacios, R. C., Stantchev, V., & Molloy, O. (2015). Leveraging big-data for business process analytics. The Learning Organization. Emerald Group Publishing Limited.

Team 2 Big Data Presentation

Editor's Notes

  1. The business world is filled with acronyms, buzzwords, new theories, approaches and technologies. Among the most important of these, both widely written about and increasingly utilized, is Big Data.
  2. “Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. Big Data can be characterized by the 3 Vs: Variety, Volume and Velocity. Big data may be as important to business – and society – as the Internet has become. Why? More data may lead to more accurate analyses” (SAS Institute, 2015, para 1). However, analyzing Big Data requires tremendous computing power. Hadoop makes Big Data accessible.
  3. “Apache™ Hadoop® is an open source software project that enables distributed processing of large data sets across clusters of servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance.” (IBM, 2015, para 1). Through MapReduce, Hadoop can analyze extremely large data sets quickly because MapReduce brings the software to the data, rather than relying on the time-consuming process of moving vast amounts of data to the software. This presentation will examine Big Data and Hadoop in detail, especially as they apply to business applications.
  4. Quite simply, Big Data has the power to transform any business organization in three significant ways. Big Data can transform a company’s business processes. Big Data can transform a company’s understanding of its market and its customers. Finally, Big Data can give any company better forecasting tools through predictive analytics.
  5. Big Data provides opportunities to improve efficiencies on scales not previously envisioned (Vera-Baquero, Palacios, Stantchev, & Molloy, 2015). For example, in a project named Orion, UPS leverages Big Data to create the most efficient route possible for its carrier trucks (Noyes, 2014). UPS determined that the enormous effort needed to optimize its routes would pay huge dividends: the company estimates that a reduction of one mile per day per driver could save it as much as $50M annually.
  6. Big Data can transform a company’s understanding of its position in its market and create complete profiles of its customers. Through descriptive analytics applied to Big Data sources, organizations are able to develop a more robust and complete understanding of their market niche. In the past, limited data generation, limited storage capacity, and limitations on processing power forced organizations to sample populations in small doses (Duan & Xiong, 2015). Today those historic limits have largely been lifted. For example, Facebook examines what used to be disparate and disconnected data points to form a more complete view of the individual (Newman, 2015). Facebook data points include a person’s friends, family, photos, companies liked, posts, comments, shared content and much more. Facebook uses this information to target advertising to the individual user. Not surprisingly, companies are willing to pay large amounts of money to learn from this complete customer profile.
  7. Another strength of Big Data is predictive analytics. The ability to store and analyze great amounts of data allows companies to learn quickly from past experiences and apply those lessons to future situations (Duan & Xiong, 2015). Amara Health Analytics, for example, is focused on the early diagnosis of sepsis, a disease that is notoriously difficult to identify. Amara searches the data points from previous diagnoses for commonalities, uses them to predict in real time which patients currently being monitored in hospitals might be at risk, and alerts clinicians accordingly (Clancy, 2015).
  8. But before Big Data can transform businesses, the data has to be collected and formatted into a usable form. Consequently, there are three major challenges to the utilization of Big Data: the challenges of data itself, process challenges and management challenges.
  9. Data challenges include redundancy, data discovery, data quality, data availability, and scalability, any of which can make implementing Big Data difficult. Many databases suffer from redundancy, and one key challenge is reducing it throughout the database. Redundancy reduction and data compression can lower the cost of the entire database in the long run. The difficulty lies in noticing redundancies and avoiding them, because they clutter a database and waste money and time. Finding available data that is useful can also be a challenge for companies, and once data is found it can be even harder to determine whether it is beneficial and of good quality. Scalability is another challenge: “the analytical system of big data must support present and future datasets. The analytical algorithm must be able to process increasingly expanding and more complex datasets” (Chen, 2014). Infrastructure limitations affect all companies in some way. As hardware ages it is likely to become unreliable: “Companies can’t afford to lose data that they gathered in the past years” (Adrian, 2013), so overcoming this challenge is critical.
  10. Process challenges entail capturing data and making it useful. Data is useless if it is not interpreted and analyzed. “It can take significant trial and error to find the right model for analysis” (Akerkar, n.d.). “IT specialists say that they spend more time trying to “clean up” the data than they are analyzing it. Sorting and cleaning up data is a challenge that is hardly overcome” (Adrian, 2013). There are ways to speed up the process, but if they are not implemented properly, sorting through big data can cause delays for a company. Once the analysis has produced results, it can be difficult to share the information in the right manner and with the right people, yet getting the results to the right people is imperative. Big data provides valuable information that can be beneficial if shared. Sadly, a number of companies do not want to share information for reasons other than security. “Regarding companies, this is a challenge that most refuse to overcome” (Adrian, 2013).
  11. A significant challenge for management is securing a company’s data. Data administrators face multiple security challenges when protecting large amounts of data, and management has to focus on data security and privacy to ensure databases are used and protected properly. Like all other forms of information technology, big data is subject to misuse and criminal activity. “Big data can be misused through abuse of privilege by those with access to the data and analysis tools; curiosity may lead to unauthorized access and information may be deliberately leaked. Mistakes can also cause problems where corner-cutting could lead to disclosure or incorrect analysis” (Morton, 2014). Managers need control over the databases to ensure they are protected against intruders and unauthorized users. “The security challenge for big data lies in providing an effective security model across the life cycle of the process without impeding volume, variety and velocity or compromising the rest of the information estate” (Morton, 2014).
  12. Three major risks to a company’s big data assets are the information life cycle, data provenance, and technology unknowns. The information life cycle is different when big data is involved. In certain cases the owner of the data might not be known. In other cases it is unknown what type of useful information might be discovered, even after analysis. Data provenance is another security concern that managers need to pay attention to. Big data might not come from a reliable source, and it might be compiled from a number of different areas. “Big data involves absorbing and analyzing large amounts of data that may have originated outside the organization that is using it. If you don’t control the data creation and collection process, then how can you be sure of the data source and the integrity of the data?” (Morton, 2014). The final risk involves technology unknowns. The technology designed and used to process big data is focused on "massive scalability." Security has not been the main focus, which can lead to problems in the long run. Focusing on security is vital to protecting sensitive data.
  13. Along with the issues of data security come the challenges of data privacy. Databases contain large amounts of personal information. Management has to be able to use the information for the company’s benefit while making sure it stays private and out of the hands of criminals. “The challenges are: ensuring that data are used correctly (abiding by its intended uses and relevant laws), tracking how the data are used, transformed, derived, etc., and managing its lifecycle” (Akerkar, n.d.). Home Depot and Target are examples of large, well-known companies that have had sensitive data stolen in massive data breaches. “Many data warehouses contain sensitive data such as personal data. There are legal and ethical concerns with accessing such data. So the data must be secured and access controlled” (Akerkar, n.d.). The way data is accessed and secured needs to be constantly monitored.
  14. Again, Apache Hadoop is “an open source software framework for storage and large scale processing of data-sets on clusters of commodity hardware” (Bappalige, 2014). The fact that it is open source means that its code is free and accessible to all programmers and users to edit, comment on and improve. Hadoop has a framework composed of several modules: 1) Hadoop Common, which contains libraries and utilities needed by other Hadoop modules; 2) Hadoop Distributed File System (HDFS), a distributed file system that stores data on commodity machines (‘off the shelf’ devices that use large numbers of already-available computing components) and provides high aggregate bandwidth across the cluster; 3) Hadoop YARN, which is a resource-management platform that manages cluster resources and schedules user applications; and 4) Hadoop MapReduce, which is a programming model for large scale data processing (Bappalige, 2014). Overall, Hadoop is a very powerful and versatile tool that can help meet the challenges of dealing with large amounts of data distributed across clusters of machines.
  15. At the most basic level, Hadoop is able to store many files that, individually, are larger than a single PC’s capacity. The benefits for businesses that need to store large amounts of data are readily apparent. Utilization of Hadoop clusters easily “removes the constraints [companies] had on storing and processing data” (Bertolucci, 2013). Hadoop helps to store large amounts of data, but it is also very efficient at processing data. Bertolucci compares conventional processing to opening a very large file on a PC, which can take an extremely long time. This is because, in most cases, data flows to the software for processing; Hadoop instead brings the software to the data, which allows it to process extremely large amounts of data very quickly. Just these two basic principles of storing and processing data can help a business become more efficient almost overnight and at very little cost.
  16. Because of the success that Hadoop has had in the market since its beginnings in the mid-2000s, a whole new industry has emerged from its creation. Allied Market Research calls this “Hadoop-as-a-Service,” which they anticipate will grow to $50.2 billion by the year 2020 (Top 6 Hadoop Vendors providing Big Data Solutions in Open Data Platform, 2015). Many large companies have become Hadoop vendors, including Amazon, IBM, and Microsoft. These companies have helped package Hadoop and distribute it among users. For example, IBM Hadoop users can “easily set up and move data to Hadoop clusters in approximately 30 minutes with data processing rates of 60 cents per cluster per hour” (Top 6 Hadoop Vendors providing Big Data Solutions in Open Data Platform, 2015). The benefit is that customers are able to get to market rapidly. IBM also incorporates advanced Big Data analytics by “harnessing the power of Hadoop.”
  17. Despite Hadoop’s rapid growth and increasing popularity, the technology is still in its developmental stages when it comes to management and deployment tools. Additionally, installation and implementation are time consuming. Andrew Oliver, a Strategic Developer, cites four challenges that companies face when attempting to centralize Hadoop: 1) Hadoop isn’t a single thing, meaning that many pieces make up Hadoop as a whole and each piece is packaged and implemented separately; 2) diverse workloads make balancing systems difficult; 3) partitioning, which presents an issue when differentiating between streaming jobs and batch jobs because they require different levels of service (this can result in the need for multiple Hadoop clusters, which would need to be managed separately); and finally, 4) priorities, which Oliver explains as a situation where requiring a certain amount of resources does not guarantee that your company or organization will receive them, because of the way the data is stored (Oliver, 2015). Overall, there is not a large selection of solutions to these challenges, but they are slowly being developed and will aid in the deployment and maintenance of Hadoop within larger organizations.
  18. We are only just beginning to realize the potential of Big Data. Jennifer McGinn of the IBM Big Data and Analytics Hub hints at the tip of the iceberg of what we will be able to accomplish with Big Data and the analytical capabilities of Hadoop: Nothing will ever be out of stock because companies will be able to better predict what we want and where we want to buy it (McGinn, 2015, para 5). Cars, trucks and equipment won’t break down as often because predictive maintenance will tell you when and where to get things fixed before they break (McGinn, 2015, para 5). Roads will be free from potholes because sensors will know where they are and tell crews to fix them (McGinn, 2015, para 5). The common flu won’t stand a chance of spreading because healthcare workers will be able to track outbreaks and treat them on the spot (McGinn, 2015, para 5).
  19. “Big data—data from many sources, of varying formats, both structured and unstructured—means different things in different industries. But as different as their needs and usage of big data may be, there is one commonality among all industries: the opportunity to plumb big data for better, more informed perspectives on their customers, products, partners, competitors and strategies. As organizations begin to explore the possibilities enabled by big data and analytics, they need new ways to store and access data—fast. Apache Hadoop provides an answer to that challenge” (IBM Software, 2015, p. 3).
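
Notes 3, 14 and 15 describe MapReduce as bringing the software to the data. The following is a minimal sketch of that idea in plain Python, and it does not use Hadoop's actual API. The node names, the `BLOCKS_BY_NODE` dictionary and the function names are all hypothetical, chosen only for illustration. Each simulated node counts words in the lines it already holds, and only the small per-node summaries are merged.

```python
from collections import Counter
from functools import reduce

# Hypothetical cluster layout: each "node" holds one block of a larger file.
# In a real deployment HDFS decides block placement; this dict only imitates it.
BLOCKS_BY_NODE = {
    "node-1": ["big data needs big storage", "hadoop stores data"],
    "node-2": ["hadoop processes data where it is stored"],
    "node-3": ["big clusters process big data"],
}


def map_phase(lines):
    """Runs where the lines live: count the words of one block locally."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_phase(partial_counts):
    """Merge the compact per-node summaries into one global result."""
    return reduce(lambda total, part: total + part, partial_counts, Counter())


def word_count(blocks_by_node):
    # Only the small Counter objects are combined; the raw lines never leave
    # their simulated node.
    partials = [map_phase(lines) for lines in blocks_by_node.values()]
    return reduce_phase(partials)


if __name__ == "__main__":
    totals = word_count(BLOCKS_BY_NODE)
    for word, count in totals.most_common(3):
        print(word, count)
```

In this sketch the map step is the only code that touches raw data, which is why moving the computation to each block is cheaper than shipping every block to one machine. A real Hadoop job would add scheduling, shuffling by key and fault tolerance, all of which are omitted here.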