Smarter Big Data Strategies
- Girish Khanzode
Companies from across sectors are experiencing exponential growth in data, thanks to new content generated by social
interactions, rich media and a variety of devices. This vast amount of digital data is getting created through emails, instant
messaging, surveys, videos, images, RFID tags, web text, blogs, geo-location devices, collaboration platforms like Twitter
and Facebook and so many other sources. This data, when combined with in-house legacy data, is a potential goldmine of
opportunity for organizations of all types.
www.infosys.com
The scientific and creative analysis of this large, complex data in real time can generate deeper insights, offering
360-degree perspectives on customer sentiment and behavior. Companies can respond to market trends
dynamically, improve operational efficiencies and gain significant competitive advantage. Smarter analytics,
machine learning and intelligent algorithms can help discover new patterns, which in turn can lead to the
identification of further patterns, and replace intuition-based management decision-making with decisions
driven by facts. With proper data analysis in place, many companies are now able to answer questions that
were never asked before.
Clearly, successful Big Data management can radically transform organizations. A retailer using Big Data
can improve operating margin by more than half; US healthcare can save more than $300 billion every year;
consumer goods and service companies can create fine-grained customer segments in real time, in order
to improve the precision of targeting promotions and advertising; healthcare companies can discover new
treatments faster; and investors can predict stock market events with higher accuracy.
Challenges of Big Data Initiatives
Although Big Data has its promise, it has its perils too. Companies trying to
leverage it can face significant challenges. Studies indicate that more than
80% of Fortune 500 organizations will fail to take advantage of Big Data
by 2015. Failure on this front represents a serious risk to a company and
can disrupt its business. Conversely, smarter execution can propel the
organization into a new trajectory of growth by attracting more customers,
improving sales margins, introducing newer products and services much faster,
and achieving higher satisfaction levels and loyalty of existing customers.
The size, speed, complexity and diversity of Big Data can push the capabilities
of traditional data management technologies to an extreme and, in most
cases, cause them to fail. This challenge is further compounded by the
need to manage data in context and in real time. The collection, storage,
processing, analysis and visualization of this data can overwhelm existing IT
infrastructure. Timeliness, privacy and shortage of relevant skillsets are other
impediments in implementation.
Companies could look at the following 10 practical strategies to successfully
leverage Big Data:
1. Top Business Needs as Primary Drivers
Because data storage costs are falling continuously, companies have a tendency to store excessive data
for future use. However, they need to avoid this practice since the costs of data collection, storage and
analysis can rise quickly given the rapid velocity of data growth.
There is a risk of readily available or easily acquirable data becoming the driver of Big Data strategy.
Instead, the strategy should be driven by the data’s potential to add value, for instance by solving major
pain points or yielding a healthy return on investment.
The first step should be to identify a set of key questions targeted at areas that the company wants to grow.
For example, an online business might be focused on fine grained customer segmentation, cross-selling,
strengthening multi-channel reach or improving its recommendation intelligence, whereas a manufacturing
unit might be looking at improving product designs. These are the types of questions that should drive Big Data
implementation strategies. These questions should then be mapped to a clear set of business requirements,
which are critical for identifying timelines and resource needs (such as skillsets). The implementation process should be
iterative in nature to ensure that it meets the needs of continuously evolving key questions, enables the collection
of the right data and helps garner the intended insights.
2. Criticality of the Right Skillset
Big Data projects require different skillsets compared to traditional IT needs. Companies require
data scientists, managers and engineers with expertise in multiple domains like computers, business
operations, machine learning, statistics, analytics, advanced mathematics and visualization tools.
Data scientists should be able to formulate models and perform data mining, spot patterns and
associations, and create appropriate logic that can process data into business decisions.
Data managers should be conversant with business operations and capable of asking the right questions for
generating business insights, mapping results to formulate business strategy and creating recommendations.
Data engineers should be able to design, develop and maintain applications in Big Data environments,
program visualization tools and dashboards, and maintain the infrastructure to perform analytics.
Equally important are creativity and the ability to leverage data to improve business growth.
Since Big Data is a recent phenomenon, there is a shortage of trained and qualified data professionals. This
shortage will continue to exist in the next few years considering the huge demand. Lack of suitable talent will
prove to be a major hindrance to Big Data strategy implementation. Supplementing hiring with appropriate training
and redeployment of existing staff can mitigate some part of this risk.
3. Optimizing Storage Needs
In theory, more data means better analysis; however, we live in a real world with limitations, where
challenges such as the cost of data storage, manipulation and computational power rise with volume.
Big Data, helped by the tendency of data to proliferate quickly, can force traditional data platforms to
scale beyond the levels they were designed for. Beyond petabytes of data, current warehouse
infrastructures become uneconomical.
It is important to note that more data does not automatically mean higher accuracy and, in some cases,
may even introduce noise that can obscure weaker patterns. It also increases the risk of false discoveries,
resulting in insights that will not yield positive results.
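The risk of false discoveries can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes that each candidate pattern is checked with an independent statistical test at a significance level alpha of 0.05. Neither the function names nor these assumptions come from the paper. It shows how quickly the chance of at least one spurious "insight" grows with the number of patterns tested, and one common, conservative correction (Bonferroni).

```python
def chance_of_false_discovery(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across independent tests.

    Assumes every test is independent and no real effect exists,
    which is a simplification made for illustration.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the overall error rate near alpha."""
    return alpha / num_tests


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, round(chance_of_false_discovery(n), 4), bonferroni_threshold(n))
```

Under these assumptions, scanning more variables for patterns without tightening the threshold makes a false discovery close to certain, which is one reason more data does not automatically mean more accurate insight.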
Growth in processing power and the drop in memory prices are making it possible to build
data on the fly and process it in-memory. This strategy reduces the need for very large storage capacity. When
storing Big Data, it is also important to remember that replication systems can introduce security vulnerabilities
and RAID at petabyte scale can lead to data loss.
For efficient processing, data must be split and stored in different segments based on its value, sensitivity and
costs involved. The most valuable data should be housed inside the corporate data warehouses, less valuable
data on cheaper commodity storage like Cloud, and the rest should be put within analytical tools. When results
are desired, all of this data can be pulled together dynamically and analysis can be performed on-the-fly. Metadata
needs special attention since it is growing at twice the rate of other data. Companies, especially those in the
starting phases of the Big Data drive, should set up a clear set of rules and guidelines detailing which data should
be retained or archived, and for how long.
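The tiering and retention rules described above could be written down as simple, reviewable code. The following is a minimal sketch, not a prescribed implementation: the 1-to-10 value score, the thresholds, the tier names and the choice to keep all sensitive data in the warehouse are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical tier names, following the three locations named in the text.
WAREHOUSE = "corporate_warehouse"
CLOUD = "cloud_storage"
ANALYTICS_TOOL = "analytics_tool"


@dataclass
class Dataset:
    name: str
    business_value: int  # assumed scoring scale: 1 (low) to 10 (high)
    sensitive: bool      # contains personal or confidential data
    age_days: int


def choose_tier(ds: Dataset) -> str:
    """Pick a storage tier from value and sensitivity (assumed thresholds)."""
    if ds.sensitive or ds.business_value >= 8:
        return WAREHOUSE
    if ds.business_value >= 4:
        return CLOUD
    return ANALYTICS_TOOL


def retention_action(ds: Dataset,
                     archive_after_days: int = 365,
                     delete_after_days: int = 1825) -> str:
    """Decide whether to retain, archive or delete, based on age alone."""
    if ds.age_days >= delete_after_days:
        return "delete"
    if ds.age_days >= archive_after_days:
        return "archive"
    return "retain"


if __name__ == "__main__":
    clicks = Dataset("clickstream", business_value=5, sensitive=False, age_days=400)
    print(choose_tier(clicks), retention_action(clicks))
```

Keeping such rules in one place makes it easier to review and adjust them as storage costs and business priorities change.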
4. Scaling the Infrastructure
Due to greater variety and volume, the acquisition of Big Data needs infrastructure capable of
supporting flexible data structures, very high transaction volumes and the ability to process queries
in a distributed environment, along with delivering predictable latency after a query is fired.
While network performance is critical, communication paths increase significantly with the number
of nodes in a cluster. The transfer of a larger dataset requires higher networking bandwidth and WAN
optimization technologies.
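A rough calculation illustrates both points. The sketch below assumes a full mesh in which every node may talk to every other node, and an idealized link with no protocol overhead. Both are simplifications, and the function names are hypothetical.

```python
def pairwise_paths(nodes: int) -> int:
    """Distinct node-to-node paths in a full mesh of `nodes` servers."""
    return nodes * (nodes - 1) // 2


def transfer_seconds(dataset_bytes: int, bandwidth_bits_per_sec: int) -> float:
    """Idealized transfer time, ignoring latency and protocol overhead."""
    return dataset_bytes * 8 / bandwidth_bits_per_sec


if __name__ == "__main__":
    for n in (4, 10, 100):
        print(n, "nodes:", pairwise_paths(n), "paths")
    # One terabyte (10**12 bytes) over a 1 Gbit/s link.
    print(transfer_seconds(10**12, 10**9), "seconds")
```

Under these assumptions the number of paths grows roughly with the square of the node count, and moving one terabyte over a 1 Gbit/s link takes more than two hours, which is why bandwidth planning matters as clusters and datasets grow.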
A multi-node cluster using HDFS can create high levels of traffic across the network since Hadoop spreads
the data across the member servers of the cluster. Direct-attached storage (DAS) can help create islands of
information that can be processed by analytics applications, but it impairs data and resource sharing with other
servers. While SANs offer better throughput and scalability, local storage is cheaper and performs better overall.
Storage appliances designed for Hadoop and Big Data analytics are another option.
A decision on a Big Data storage solution must take into account space requirements, data growth, frequency
of analytics execution and type of data processed. All these factors coupled with security, allocated budget and
processing time should drive Big Data investments.
5. Ensuring Data Quality
While collecting Big Data, a significant amount of garbage can creep in. Poor-quality data can result
in faulty analysis, especially when finding outliers. With massive amounts of data getting generated
from machines and sensors, the potential for pollution in data goes up exponentially, driven by factors
like transmission errors, incorrect device calibrations, inaccurate device measurement methods or poor
device performance under peak loads. Stringent quality control and inspection mechanisms, along with
good data governance, are critical to reduce data ‘obesity’ and derive insights that are correct.
Data typically becomes less valuable to the business as it ages. Time-based data retention policies
can play a significant role in preserving data quality in analysis. In addition, hygiene techniques like
quality maintenance, profiling, standardization, ensuring consistency and integration, along with rules-driven
testing should be part of the Big Data strategy.
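The hygiene techniques listed above (standardization, rules-driven testing and age-based filtering) might look like the following in practice. This is a minimal sketch for a hypothetical temperature sensor feed; the field names, valid range and 30-day age limit are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical limits for an assumed temperature sensor feed.
VALID_RANGE_C = (-40.0, 125.0)
MAX_AGE = timedelta(days=30)


def standardize(reading: dict) -> dict:
    """Normalize field formats so later rules see consistent data."""
    return {
        "device_id": str(reading["device_id"]).strip().upper(),
        "temp_c": float(reading["temp_c"]),
        "ts": reading["ts"],
    }


def quality_issues(reading: dict, now: datetime) -> list:
    """Apply simple rules and return a list of problems found."""
    issues = []
    if not reading["device_id"]:
        issues.append("missing device_id")
    low, high = VALID_RANGE_C
    if not low <= reading["temp_c"] <= high:
        issues.append("temperature out of range")
    if now - reading["ts"] > MAX_AGE:
        issues.append("stale reading")
    return issues


def clean(readings, now):
    """Split readings into accepted records and rejected (record, issues) pairs."""
    accepted, rejected = [], []
    for raw in readings:
        try:
            record = standardize(raw)
        except (KeyError, TypeError, ValueError):
            rejected.append((raw, ["malformed record"]))
            continue
        problems = quality_issues(record, now)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [{"device_id": " a1 ", "temp_c": "21.5", "ts": now}]
    print(clean(sample, now))
```

Keeping the rejected records together with the reason for rejection supports the governance goal: quality problems can be traced back to a device or transmission path rather than silently dropped.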
6. Maximizing User Adoption
At the end of the day, the success of Big Data initiatives will be measured by the widespread usage
of analytic applications by business users. It will depend on their ability to easily create data sets
that fit their needs, and their ability to feed these to analytical tools developed as part of the Big Data
initiative, without the help of corporate IT, to build insights in real time.
Growth and maturity in Cloud and appliances, coupled with the arrival of newer analytics tools, are resulting
in users focusing more on business value than on underlying technologies. Important qualities like system
performance, scalability, availability, user experience and manageability will be critical to the adoption
of Big Data applications within the organization. In addition, making these applications accessible from
multiple device types will improve user adoption significantly, considering that the bring-your-own-device
trend is gaining traction.
7. Importance of Data Access
Big Data’s value creation potential depends on users’ ability to seamlessly access data for analysis.
Typically, data such as customer records resides across multiple departments, geographies or silos, thereby
creating obstacles to its sharing and aggregation. This is a problem when companies want to integrate
external data acquired from third parties with their own corporate data pool to create insights. This lack
of a centralized, customer-focused view can hinder the organization’s ability to exploit Big Data. An effective
enterprise data access strategy must include interoperable data models, transactional data architectures,
interoperability standards, analytical architecture, security and compliance.
8. Faster Response to Market Conditions
Successful Big Data processing depends on the speed of data acquisition and analysis. Big Data
systems should be quickly adaptable to changing market realities on the ground and not constrained
by traditional long application development cycles that can run for many months and beyond.
Big Data comes in several forms, such as device / sensor and scientific information, bar codes, vehicle
telematics, surgery videos, stock market trades, x-rays, telephonic conversations, contracts, advertisements,
spreadsheets, audit trails and so on.
As the type or source of data changes, it should be easy to adapt the implementation to the new data, and
such changes should be delivered in short cycles of two to three months. Overall, the whole philosophy
of analytic solution implementations should be driven by the fact that Big Data will continue to evolve in all
aspects and Big Data applications must be able to respond in the shortest possible time to reap the rewards and
keep the analysis relevant.
9. Building Appropriate Technology Ecosystem
The management of Big Data typically involves predictive analysis, natural language processing,
image analysis or advanced statistical techniques such as discrete choice modeling and mathematical
optimization. This requires technologies that are quite different from the traditional ones.
Big Data solutions should focus on processing data in a manner that avoids costly movement of large
volumes of data, apart from the need to handle very high data flow rates and a large variety of formats.
Apache Hadoop is utilized to deliver analytics solutions in distributed and massively parallel environments
running on a cluster of commodity hardware to filter and capture high-velocity incoming streams while
keeping the data on the original data storage clusters, and providing fault tolerance and scalability. The
Hadoop Distributed File System (HDFS) is commonly deployed for distributed storage of Big Data.
NoSQL databases trade integrity guarantees for high scalability and are well suited for dynamic data
structures involving heterogeneous data. These database systems can capture all data without categorizing
and parsing, which is useful in the collection and storage of data like social media.
Generally, NoSQL solutions need to be combined with SQL solutions in order to meet the manageability and
security requirements of enterprises. Custom MapReduce programs are required for parallel execution on the
distributed data nodes. A tool like Apache Giraph is better suited to fulfill specialized needs like social graph analysis,
because it can extract insight from complicated social relationships for customer marketing and retention campaigns.
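To give a feel for what a custom MapReduce program involves, the sketch below imitates the map, shuffle and reduce phases inside a single Python process. It does not use the Hadoop API. The hashtag-counting task, the function names and the sample posts are all hypothetical, and a real job would run the map and reduce steps in parallel on the data nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record: str):
    """Map: emit a (hashtag, 1) pair for every hashtag in one post."""
    for token in record.lower().split():
        if token.startswith("#") and len(token) > 1:
            yield token, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def run_job(partitions):
    """Run map over every record of every partition, then shuffle and reduce."""
    mapped = chain.from_iterable(
        map_phase(record) for part in partitions for record in part
    )
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Each inner list stands in for the data held on one node.
    posts = [["Loving the new phone #launch #tech"], ["#Launch day queues"]]
    print(run_job(posts))
```

Because each map call sees only one record and each reduce call sees only one key, the work can be split across nodes without moving the raw data, which is the property that makes the model suit distributed storage.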
However, deriving insights using these new technologies requires significant programming efforts and skills to
interpret the storage logic used and perform analysis. Specialized needs can create new challenges such as the
lack of support for complex query patterns in the case of NoSQL databases. Further complications could arise from
the distributed nature of processing along with the demand for results in real-time with context considerations.
The Big Data strategy must pay careful attention to all these aspects while zeroing in on Big Data products and
solutions along with other important factors like their interoperability and standards.
10. Avoiding Security and Privacy Pitfalls
Big Data is breaking traditional barriers to the flow of data, with large amounts of data getting digitized and
traveling across boundaries. This can create issues for data portability, security, privacy, compliance,
intellectual property and liability. With more data getting stored on external Cloud as an inexpensive
alternative, concerns around security and privacy are growing.
Since Big Data involves processing customer information, organizations should ensure confidentiality of
personally identifiable and sensitive data. Data protection policies and tools like data masking must be
used to protect personal and corporate sensitive data to avoid costly consequences like loss of customer
and stakeholder faith, brand erosion, liabilities and fines. Data privacy laws differ across countries and Big
Data processing efforts should ensure that these privacy regulations are adhered to.
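Data masking can take several forms. The sketch below shows two common ones under stated assumptions: partial masking of display fields, and keyed hashing (pseudonymization) of an identifier so that records can still be joined for analysis. The field names, the secret key and the masking formats are hypothetical, and any real deployment would need to follow the privacy regulations that apply to it.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a key management system.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash that is stable across records."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_email(email: str) -> str:
    """Keep the first character and the domain, hide the rest."""
    local, _, domain = email.partition("@")
    if not domain:
        return "***"
    return local[:1] + "***@" + domain


def mask_card(number: str) -> str:
    """Show only the last four digits of a card number."""
    digits = [c for c in number if c.isdigit()]
    if len(digits) <= 4:
        return "*" * len(digits)
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


def mask_record(record: dict) -> dict:
    """Return a masked copy, leaving the original record untouched."""
    masked = dict(record)
    masked["customer_id"] = pseudonymize(record["customer_id"])
    masked["email"] = mask_email(record["email"])
    masked["card_number"] = mask_card(record["card_number"])
    return masked


if __name__ == "__main__":
    print(mask_record({
        "customer_id": "C-1001",
        "email": "jane.doe@example.com",
        "card_number": "4111 1111 1111 1111",
        "city": "Pune",
    }))
```

Analysts working on the masked copy can still segment customers and join datasets on the pseudonymized identifier, while the directly identifying values stay out of the analytics environment.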
Big Data analytics is getting so advanced that sometimes it can create insights that the customer is not
aware of. Companies must be careful while issuing personalized recommendations based on analytics of
the vast amounts of individual data they possess, because in some cases it can make customers uncomfortable.
Organizations should make sensitive data accessible on a “need to know” basis and ensure adequate data
security. Companies should deploy tools and technologies like multifactor authentication, VPNs, intranet
firewalls, biometric systems and threat monitoring suites in order to protect valuable data assets. Recent studies
indicate that security breaches cost companies $204 per compromised customer record. Since data can quickly
proliferate or combine easily with other data, and can be used by multiple persons, it is necessary to institute
policies addressing intellectual property issues and liabilities to safeguard the organization.
About the Author
Girish Khanzode
Products & Platforms Innovator for Futuristic Technologies, Infosys
Girish is a veteran in Enterprise Software Product design and development with more than 20 years of professional
experience. He has built and led large product engineering teams to deliver highly complex products in multiple
domains, covering the entire product life cycle. Currently, he is engaged in innovating and building next-generation
products and platforms in emerging technology areas like Enterprise Data Security and Privacy, Collaboration
technologies, Digital Workplace, Social Analytics, Smart Cities, Big Data and Internet of Things. Girish holds an M.Tech.
degree in Computer Engineering and a bachelor’s degree in Electrical Engineering.