Without the right data management strategy, investments in Internet of Things (IoT) can yield limited results. Apache Hadoop has emerged as a key architectural component that can help make sense of IoT data, enabling never before seen data products and solutions.
IDC predicts the worldwide Internet of Things (IoT) market will grow from $655.8 billion in 2014 to $1.7
trillion in 2020 with a compound annual growth rate (CAGR) of 16.9%. The installed base of IoT
endpoints will grow from 10.3 billion in 2014 to more than 29.5 billion in 2020 with a CAGR of 19.2%.
Data is the key to IoT – the value lies in the ability to gain insights from all of this data.
IDC’s Digital Universe study predicts the world's data will amount to 44 zettabytes by 2020, with 10% of it coming from the Internet of Things.
The amount of data on the planet is set to grow 10-fold in the next six years to 2020 from around 4.4 zettabytes to 44ZB. That’s according to IDC’s annual Digital Universe study, which also predicted that, by 2020, the amount of information produced by machines, the so-called internet of things, will account for about 10% of data on earth.
SLIDE 3: Takeaway — It’s all meaningless unless you can make sense of all that data
POV: Hadoop is a central component for success with IoT
Action (General action step): Re-architect now to prepare for IoT deluge
Benefit: Get ahead of the IoT wave – stop talking, start doing
Talking points: While the Internet of Things holds a lot of promise, it’s all for naught unless you can actually do something with the data that is the byproduct of IoT itself. The data deluge that comes with it requires a modernized approach to data management infrastructure, one that accounts for the new requirements to store, process, and analyze enormous volumes of data. Organizations that start from the architecture out are generally the most successful. Re-architecting now, at the onset of your IoT journey, will put you in a position to capitalize on all that data and deliver meaningful results. As we’ll discuss throughout this presentation, Hadoop is a central pillar of that modernized information architecture.
According to a recent McKinsey analysis, less than 1 percent of the data being generated by the 30,000 sensors on an offshore oil rig is currently used to make decisions. And of the data that are actually used—for example, in manufacturing automation systems on factory floors—most are used only for real-time control or anomaly detection.
< 1% of data is currently utilized, mostly for anomaly detection or real-time control; more can be used for optimization and prediction algorithms. 99% of the data is not being utilized, analyzed or leveraged for business decision making. ~40% of all data is never stored; remainder is stored locally in silos for a short period but not utilized for business analytics.
A critical challenge is how to use the flood of data generated by IoT devices for prediction and optimization – knowing what to do with the data, such as predicting a machine failure before it happens.
Where IoT data are being used, they are often used only for anomaly detection or real-time control, rather than for optimization or prediction, which is where much additional value can be derived. For example, in manufacturing, an increasing number of machines are “wired,” but this instrumentation is used primarily to control the tools or to send alarms when it detects something out of tolerance. The data from these tools are often not analyzed (or even collected in a place where they could be analyzed), even though the data could be used to optimize processes and head off disruptions.
Interoperability is critical to generating maximum value from IoT applications. McKinsey estimates that situations in which two or more IoT systems must work together can account for about 40 percent of the total value that can be unlocked by the Internet of Things. For example, interoperability would significantly improve performance by combining sensor data from different machines and systems to provide decision makers with an integrated view of performance across an entire factory or oil rig.
There is a lot of focus on consumer IoT – Nest thermostats, Fitbit wearables, and watches – but there is a huge IoT ecosystem outside the consumer frame.
In fact, over 70% of the economic value in IoT will be generated in industrial or B2B IoT settings.
For example, manufacturing is a huge area, where things like predictive maintenance and operations optimization can revolutionize the industry. Using real-time data to predict and prevent breakdowns can reduce downtime by 50 percent, and supply chain optimization with a real-time view of the supply chain can drive down costs by 50 percent.
Energy & Utilities:
Smart buildings – in industrial zones, office parks, shopping malls, airports, or seaports – where IoT can help reduce the cost of energy and building maintenance by up to 30 percent.
Healthcare: With connected devices and real-time monitoring, IoT can cut the costs of chronic disease treatment by as much as 50 percent.
Connected cars can reduce low-speed crashes by 80 percent.
McKinsey estimates that B2B uses can generate nearly 70 percent of potential value enabled by IoT –
Healthcare: IoT consists of technology mash-ups: device integration, software, networks, and analytics. Two very
different examples underline the spectrum of change within healthcare IoT: from niche to ubiquitous
and from expected to unexpected. The deployment in a UK hospital's pediatric unit of McLaren
Applied Technologies analytics technology, normally used for racing cars, for early warning of patient
deterioration, is an example of how the industry can borrow innovation from other sectors. Google's
ramp up of healthcare activity comes as no surprise; its nanoparticle project, designed to detect early
signs of disease, exemplifies the growing raft of both tech-driven innovation and the special attention
devoted to healthcare.
IoT will be a key component of new mega-vendor healthcare strategies and partnerships. GE
Healthcare's $1bn R&D budget for its campaign against cancer, announced in 2011, marries
advanced cancer diagnostics, molecular imaging, and advanced tech for biopharmaceuticals and
cancer research. Philips' rapprochement with Amazon Web Services (AWS), centered on the use of
AWS's IoT platform to expand the capabilities of its HealthSuite platform, comes hot on the heels of
IBM's launch of Watson Health and its string of healthcare analytics acquisitions.
Caterpillar
VIMS – Vehicle Information Management System
This is an IoT application. It collects sensor data from a subset of the Caterpillar fleet and is used for performance analysis, defect detection, and similar tasks, mainly feeding downstream BI and analysis applications.
This use case was originally developed by Cloudera PS and has been extended and supported by Caterpillar
Lockheed Martin
One of the major projects is the Orion Multi-Purpose Crew Vehicle, which is designed for long-duration, human-rated deep space exploration. Orion will transport humans to interplanetary destinations beyond low Earth orbit, such as asteroids, the moon, and eventually Mars, and return them safely back to Earth.
With human lives on the line, along with millions of dollars of highly technical equipment, testing of all the systems and subsystems for the Orion space systems is long and exhaustive. Simulating the space environment on the ground and testing the system is a lengthy and expensive effort – requiring perfect results for each system and its subsystems. These tests generate hundreds of megabytes per second of telemetry data that in very short time becomes petabytes of data that needs to be managed, analyzed, and leveraged to validate healthy functioning of all the systems.
For Orion, the telemetry is produced in a variety of simulation and test environments, which include at least seven different labs across the US.
In simulating the real mission environment, test telemetry data is streamed to the testing system. This telemetry data contains mission-critical health information about equipment and the test’s status. Knowing whether or not the test is progressing correctly helps the test conductors decide whether to continue or modify a test scenario – a test that may take weeks to accomplish.
Our work with the Orion Space capsule takes this streaming test data and saves it to a Hadoop-based cluster supporting high data rate ingest. Advanced analytics can be run on the streaming data to check for expected or indeterminate patterns. This method of data analytics for system testing in an online environment opens up new opportunities for the test conductors to significantly reduce the risk of missing critical test parameters. It also creates a highly cost-effective and productive test environment.
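The basic form of this pattern checking can be sketched as follows. This is an illustrative sketch only – the channel names and expected ranges are invented for the example, not actual Orion telemetry parameters:

```python
# Hypothetical sketch: validating streaming test telemetry against expected
# per-channel ranges. Channel names and limits are illustrative, not real
# Orion parameters.

EXPECTED_RANGES = {
    "cabin_pressure_kpa": (95.0, 105.0),
    "battery_temp_c": (-10.0, 45.0),
}

def classify_reading(channel, value):
    """Return 'ok', 'out_of_range', or 'unknown_channel' for one sample."""
    limits = EXPECTED_RANGES.get(channel)
    if limits is None:
        return "unknown_channel"
    low, high = limits
    return "ok" if low <= value <= high else "out_of_range"

def scan_stream(samples):
    """Classify a stream of (channel, value) samples and collect anomalies."""
    return [(c, v) for c, v in samples if classify_reading(c, v) != "ok"]
```

In practice these checks would run continuously against the high-rate ingest on the Hadoop cluster, but the logic of flagging expected versus indeterminate values is the same.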
Orion’s first test: Orbited the Earth twice, traveling approximately 3,600 miles above the Earth’s surface
15 times farther than the International Space Station.
Generated more than 80% of the return velocity experienced during a reentry from the moon, which allows engineers to model expected reentries from future missions in deep space.
Orion’s next mission (EM-1) in 2018
2 weeks instead of 4 hours
4 times as many computers
Twice as many instruments
Subsystems that support Human Flight!
One of the fastest growing cities in the UK, Milton Keynes has to support that expansion within local infrastructure constraints, while meeting stretching expenditure and carbon reduction targets. Joining forces with The Open University, BT and other partners, Milton Keynes Council formed a Smart City collaboration to rise to those challenges.
There are around 25,000 parking spaces in Milton Keynes and forecasts suggest that perhaps as many as 12,000 more may be needed by 2020. Brian Matthews, head of transport at Milton Keynes Council, says: “If we don’t act soon, parking in Milton Keynes will become a big problem. But we know that around 7,000 existing spaces are empty at any one time and, in some cases, this is because people don’t know where to find them.” Better utilisation of existing parking spaces will save the Council a substantial sum. “It costs around £15,000 to create a new parking bay,” says Brian Matthews. “If we built new ones when there are 7,000 unused we could be wasting truly significant amounts of money.”
A pilot was launched to manage the use of short-term parking spaces at Milton Keynes railway station. Designed by specialist technology provider Deteq working with BT, it involved installing sensors in each of the parking bays. Bonded to the tarmac, they’re powered by lithium-ion batteries with a lifespan of over four years. Detecting the arrival and departure of a vehicle, the sensors send information wirelessly to lamppost-mounted, solar-powered repeaters. These aggregate the data and transmit it over the internet to the MK Data Hub, which is currently hosted by BT. There it’s processed and the resulting analysis made available on the Milton Keynes Council public information dashboard, as well as via a browser that displays bay status as red (occupied) or green (free) via an overlay on Google Maps.
The prize from full deployment will be a capital saving of at least £105 million, with reduced fuel use and vehicle emissions.
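The aggregation step behind the red/green dashboard can be illustrated with a toy sketch. The bay IDs and event format here are assumptions for illustration, not Deteq/BT's actual implementation:

```python
# Illustrative sketch: folding arrival/departure events from bay sensors
# into the red/green occupancy view shown on the Council dashboard.

def bay_status(events, all_bays):
    """events: iterable of (bay_id, 'arrival' | 'departure').
    Returns {bay_id: 'red' (occupied) | 'green' (free)}."""
    status = {bay: "green" for bay in all_bays}  # all bays start free
    for bay, event in events:
        status[bay] = "red" if event == "arrival" else "green"
    return status

def free_count(status):
    """Number of free bays – the figure that helps avoid building new ones."""
    return sum(1 for s in status.values() if s == "green")
```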
30-70% Drop in the price of MEMS sensors in past five years – McKinsey Research
Diverse data types – from intermittent sensor readings of temperature and pressure to real-time location data or streaming live videos for video analytics
Given the flexible, scalable nature of cloud-based infrastructure and the fact that machine data often originates off premises, we expect a lot of IoT data to be stored and processed in the cloud. The ideal IoT data platform can be deployed either on premises or in a public, hybrid, or private cloud environment. It should be possible to administer the platform via both a web-based interface and API calls.
A gateway collects, aggregates, and optionally processes the data generated by the devices. It can also accept and route commands sent from the backend to the respective device. The gateway is responsible for authenticating and authorizing the devices that participate in the workflow, and it ensures secure communication between the devices and the centralized command center. The gateway is capable of dealing with multiple protocols and data formats.
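Those gateway responsibilities might be sketched as follows. The token store, device IDs, and payload formats here are illustrative assumptions, not any vendor's actual API:

```python
# Minimal gateway sketch: authenticate a device, normalize payloads arriving
# in different formats, and buffer readings before forwarding upstream.
import json

AUTHORIZED_TOKENS = {"sensor-01": "t0k3n-a", "sensor-02": "t0k3n-b"}

def authenticate(device_id, token):
    return AUTHORIZED_TOKENS.get(device_id) == token

def normalize(payload, fmt):
    """Accept JSON or simple 'key=value' CSV payloads; return a dict."""
    if fmt == "json":
        return json.loads(payload)
    if fmt == "kv":
        return dict(pair.split("=") for pair in payload.split(","))
    raise ValueError("unsupported format: %s" % fmt)

def ingest(device_id, token, payload, fmt, buffer):
    """Authenticate, normalize, and buffer one message.
    Returns True if the message was accepted."""
    if not authenticate(device_id, token):
        return False
    buffer.append({"device": device_id, "data": normalize(payload, fmt)})
    return True
```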
Some Pointers:
One of the scarcest resources in many IoT environments is likely to be network bandwidth, either because it is simply not available or because it is expensive. The volume, complexity, and growing geographical dispersal of IoT data calls for a database that is optimized to handle the type of data that IoT devices will generate.
Support for real-time analytics and events
The ability to analyze sensor data in real time is key to putting the information to work. The technology should also support declarative event triggers, to avoid the need to manually code threshold checks or event monitoring.
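Declarative event triggers of the kind described might look like this sketch; the rule set and metric names are illustrative, not a specific product's API:

```python
# Sketch of declarative event triggers over a sensor stream: rules are
# declared as data rather than hand-coded threshold checks scattered
# through application logic.

RULES = [
    # (metric, predicate, event name to fire)
    ("temperature", lambda v: v > 80.0, "overheat_alert"),
    ("vibration", lambda v: v > 5.0, "vibration_alert"),
]

def evaluate(reading):
    """reading: dict of metric -> value. Returns the list of fired events."""
    fired = []
    for metric, predicate, event in RULES:
        value = reading.get(metric)
        if value is not None and predicate(value):
            fired.append(event)
    return fired
```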
Fast, Easy, Secure
An enterprise data hub can store unlimited data, cost-effectively and reliably, for as long as you need, and lets users access that data in a variety of ways. Data can be collected, stored, processed, explored, modeled, and served in one unified platform. It’s connected to the systems you already rely on.
Cloudera’s enterprise data hub, powered by Apache Hadoop, the popular open source distributed data platform, is differentiated in several crucial areas. We provide:
Leading query performance.
The enterprise management and governance that you require of all of your mission-critical infrastructure.
Comprehensive, transparent, compliance-ready security at the core.
An open source platform that is also built of open standards – projects that are supported by multiple vendors to ensure sustainability, portability, and compatibility.
Our platform runs in your choice of environment, whether on-premises or in the cloud.
===
Cheat Sheet version: Our enterprise data hub is:
One place for unlimited data
Accessible to anyone
Connected to the systems you already depend on
Secure, governed, managed & compliant
Built on open source and open standards
Deployed however you want
Coupled with the support and enablement you need to succeed.
Important Note: Our EDH emphasizes “unified analytics” over “unified data”: It’s not practical or probable that customers will actually unify all their data. Much of it lives in the cloud or on storage (e.g. Isilon), in remote datacenters, is of uncertain value vs. cost of moving it to a hub, or security mandates preclude collocation. We enable customers to gather unlimited data, while bringing diverse processing and analytics to that data.
How Cloudera’s EDH fits into the IoT Ecosystem
Can ingest data from multiple sources including real-time streaming sensor data
You can combine the sensor data with data from other internal and external sources to drive business insights
You can deploy EDH on premises (in your data center) or in hybrid cloud environments and still manage it centrally
And you can serve and analyze the data in a number of different ways - integrate it with existing BI solutions, do search or machine learning or integrate it with real time applications
Streaming Ingest – Kafka & Flume plus Data Pipeline Visualization with our partner Streamsets
Kudu – Real-time updates and appends to data – ideal for querying streaming data as it lands
Streaming Data Processing - Spark – Cloudera leadership in Spark
Batch data processing – HDFS/ Hbase Capabilities
Centralized Cluster Mgmt – Unified Monitoring & Troubleshooting with Cloudera Manager
Deployment Flexibility – Easily deploy to the cloud (Cloudera Director) – High availability
Hybrid Portable Deployment – Deploy a cluster in AWS & Google Cloud and effectively manage the Clusters with the same interface
Security Features – Adding security features for Kafka plus RecordService – Unified access policies irrespective of access framework
Build integrations between Kafka and the Gateway to push data back to the sensors
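A toy end-to-end sketch of this pipeline follows, with an in-memory queue standing in for a Kafka topic and a micro-batch aggregation standing in for a Spark Streaming job. All names are illustrative; a real deployment would use the actual Kafka/Spark/Kudu APIs:

```python
# Toy ingest -> process pipeline: produce() plays the role of a Kafka
# producer, process_batch() the role of a Spark Streaming micro-batch
# that computes per-sensor averages before serving them downstream.
from collections import defaultdict, deque

topic = deque()  # stands in for a Kafka topic partition

def produce(sensor_id, value):
    topic.append((sensor_id, value))

def process_batch():
    """Drain the topic and compute per-sensor averages for this micro-batch."""
    sums, counts = defaultdict(float), defaultdict(int)
    while topic:
        sensor_id, value = topic.popleft()
        sums[sensor_id] += value
        counts[sensor_id] += 1
    return {s: sums[s] / counts[s] for s in sums}
```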
Tesla
Onboard sensors capture 30,000+ signal types
• Component data – how fast, voltages, temperatures, work performed
• Operational metrics – how many times charging port has been opened / closed, air conditioner operation metrics …
What components and software were installed?
Opower is a Utility Analytics company that provides 360-degree views into energy usage patterns and similar household comparisons to help consumers save energy.
Challenge: With the advent of smart meters and ever-growing utility data streams, Opower recognized the need to capture, store and analyze this data in order to help consumers save energy.
Solution: Opower built an analytical application on Cloudera Enterprise, leveraging Apache HBase, to bring together utility consumption data along with weather data, consumer behavior data, and other disparate sources of information.
Benefit: By pulling together, processing, analyzing, and then presenting information to homeowners, Opower is helping more than four million homes save hundreds of millions of dollars on their energy bills.
Assicurazioni Generali
https://www.michaeljfox.org/foundation/publication-detail.html?id=555&category=7
The Michael J. Fox Foundation and Intel Join Forces to Improve Parkinson's Disease Monitoring and Treatment through Advanced Technologies
August 13, 2014
The Michael J. Fox Foundation for Parkinson’s Research (MJFF) and Intel Corporation announced today a collaboration aimed at improving research and treatment for Parkinson’s disease — a neurodegenerative brain disease second only to Alzheimer’s in worldwide prevalence. The collaboration includes a multiphase research study using a new big data analytics platform that detects patterns in participant data collected from wearable technologies used to monitor symptoms. This effort is an important step in enabling researchers and physicians to measure progression of the disease and to speed progress toward breakthroughs in drug development.
“Nearly 200 years after Parkinson’s disease was first described by Dr. James Parkinson in 1817, we are still subjectively measuring Parkinson’s disease largely the same way doctors did then,” said Todd Sherer, PhD, CEO of The Michael J. Fox Foundation. “Data science and wearable computing hold the potential to transform our ability to capture and objectively measure patients’ actual experience of disease, with unprecedented implications for Parkinson’s drug development, diagnosis and treatment.”
“The variability in Parkinson’s symptoms creates unique challenges in monitoring progression of the disease,” said Diane Bryant, senior vice president and general manager of Intel’s Data Center Group. “Emerging technologies can not only create a new paradigm for measurement of Parkinson’s, but as more data is made available to the medical community, it may also point to currently unidentified features of the disease that could lead to new areas of research.”
Tracking an Invisible Enemy
For nearly two decades, researchers have been refining advanced genomics and proteomics techniques to create increasingly sophisticated cellular profiles of Parkinson’s disease pathology. Advances in data collection and analysis now provide the opportunity to expand the value of this wealth of molecular data by correlating it with objective clinical characterization of the disease for use in drug development.
The potential to collect and analyze data from thousands of individuals on measurable features of Parkinson’s, such as slowness of movement, tremor and sleep quality, could enable researchers to assemble a better picture of the clinical progression of Parkinson’s and track its relationship to molecular changes. Wearables can unobtrusively gather and transmit objective, experiential data in real time, 24 hours a day, seven days a week. With this approach, researchers could go from looking at a very small number of data points and burdensome pencil-and-paper patient diaries collected sporadically to analyzing hundreds of readings per second from thousands of patients and attaining a critical mass of data to detect patterns and make new discoveries.
MJFF and Intel initiated a study earlier this year to evaluate the usability and accuracy of wearable devices for tracking agreed physiological features from participants and to develop a big data analytics platform to collect and analyze the data. The participants (16 Parkinson’s patients and nine control volunteers) wore the devices during two clinic visits and at home continuously over four days.
Bret Parker, 46, of New York, is living with Parkinson’s and participated in the study. “I know that many doctors tell their patients to keep a log to track their Parkinson’s,” said Parker. “I am not a compliant patient on that front. I pay attention to my Parkinson’s, but it’s not everything I am all the time. The wearables did that monitoring for me in a way I didn’t even notice, and the study allowed me to take an active role in the process for developing a cure.”
Intel data scientists are now correlating the collected data to clinical observations and patient diaries to gauge the devices’ accuracy, and are developing algorithms to measure symptoms and disease progression.
Later this year, Intel and MJFF plan to launch a new mobile application that enables patients to report their medication intake as well as how they are feeling. The effort is part of the next phase of the study to enable medical researchers to study the effects of medication on motor symptoms via changes detected in sensor data from wearable devices.
Collecting, Storing and Analyzing the Data
To analyze the volume of data — more than 300 observations per second from each patient — Intel developed a big data analytics platform that integrates a number of software components including Cloudera® CDH* — an open-source software platform that collects, stores, and manages data. The data platform is deployed on a cloud infrastructure optimized on Intel® architecture, allowing scientists to focus on research rather than the underlying computing technologies. The platform supports an analytics application developed by Intel to process and detect changes in the data in real time. By detecting anomalies and changes in sensor and other data, the platform can provide researchers with a way to measure the progression of the disease objectively.
In the near future, the platform could store other types of data such as patient, genome and clinical trial data. In addition, the platform could enable other advanced techniques such as machine learning and graph analytics to deliver more accurate predictive models that researchers could use to detect change in disease symptoms. These advances could provide unprecedented insights into the nature of Parkinson’s disease, helping scientists measure the efficacy of new drugs and assisting physicians with prognostic decisions.
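The kind of change detection described – flagging samples that deviate sharply from a recent baseline – can be sketched as a rolling z-score. The window size and threshold here are assumptions for illustration, not Intel's actual algorithm:

```python
# Illustrative change detection over one wearable sensor channel: flag
# samples more than `threshold` standard deviations from the rolling mean.
from collections import deque
import math

def detect_changes(samples, window=5, threshold=3.0):
    """Return [(index, value), ...] for samples far outside the baseline."""
    recent = deque(maxlen=window)
    flagged = []
    for i, value in enumerate(samples):
        if len(recent) == recent.maxlen:
            mean = sum(recent) / len(recent)
            var = sum((x - mean) ** 2 for x in recent) / len(recent)
            std = math.sqrt(var)
            if std > 0 and abs(value - mean) / std > threshold:
                flagged.append((i, value))
        recent.append(value)
    return flagged
```

At 300+ observations per second per patient, the production version of such logic runs distributed across the cluster, but the statistical idea is the same.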
Shared Commitment to Open-Access Data
MJFF and Intel share a commitment to increasing the rate of progress made possible by open access to data. The organizations aim to share data with the greater Parkinson’s community of physicians and researchers as well as invite them to submit their own de-identified patient and subject data for analysis. Teams may also choose to contribute de-identified patient data for inclusion in broader, population-scale studies.
The Foundation has previously made de-identified data and bio-samples from its sponsored studies available to qualified researchers, including from individuals with a Parkinson’s-implicated mutation in their LRRK2 gene. MJFF has also opened access to resources from its landmark biomarker study the Parkinson’s Progression Markers Initiative (PPMI) since it launched in 2010. Parkinson’s scientists around the world have downloaded PPMI data more than 235,000 times to date.
Every Hadoop platform gives you scalability and flexibility. Cloudera makes Hadoop fast, easy, and secure.
Trap Questions:
Spark: What matters to you in supporting Spark and Hadoop?
Impala: How many BI users will you have? What additional budget have you allocated for Hive?
Kudu: How do you plan to address operational data warehouse / time series use cases?
Cloudera Navigator Optimizer: How do you know what data should be in Hadoop vs existing systems?
Trap Questions:
Cloudera Manager: How much downtime are you willing to accept during an upgrade? What if your operations tools fail during an outage? What does your team need to debug critical and latent issues?
Cloudera Director: Where is your data being created? How do you plan to manage across environments? Are you prepared to train staff on both Amazon and on-premises Hadoop platforms?
Expert Support: How can a core R&D group simultaneously respond to frequent customer issues and also build a culture of innovation? [only Cloudera has a back-line support team to address issues without bringing in R&D]
Trap Questions:
Navigator Encrypt/KeyTrustee: What is the impact of an information leak from intermediate MR results, log files, etc?
Sentry/RecordService: How are you planning to secure access to sensitive data across Hive and Spark?
Navigator: Do your governance needs extend beyond Hive?
Manager: How will you keep end users from damaging your production environments?