The value of our data
1. The Value of Big Data
Robin Basham,
Director Integrated Audit, Ellie Mae, Inc.
CISA, CGEIT, CRISC, M.Ed, M.IT, VRP, CRP, HISP
Prepared for ISACA SV and IMA Palo Alto
With reference to “Infonomics: The Practice of Information Economics” by Doug Laney and
“Big Data for the Masses: The Unique Challenge of Big Data Integration,” a Talend White
Paper
2. AGENDA
The characteristics of
Big Data – It’s just data
◦ Limits and Benefits in use
◦ Why we use Big Data
◦ How we use Big Data
Structured v.
Unstructured Data
◦ Web 3.0
◦ So, is it big or just more
BI?
Overview of new
technologies
◦ Hiring these skills and
creating these skills
◦ Simply, what they do, how
they fit into any solution
◦ Complexity and
interpretation risk, you get
what you pay for
Is Social Data on the
Balance Sheet?
◦ Risks in using social data
◦ Problems caused by
investing in social data
◦ Gartner guidance to
question data on the
balance sheet
3. As Director, Integrated Audit at Ellie Mae, Robin Basham is
accountable for creating and using a GRC
program and for conducting SOX, SOC, ISMS and
various program-specific audits, including an
FDIC examination.
She is the creator of Facilitated Compliance
Management Software (4Point GRC) and
founder of EnterpriseGRC Solutions and
Phoenix Business and Systems Process, Inc.
An ISACA SV Conference Director, ITPreneurs
partner, and board advisor for Holistic
Information Security Practitioners, she provides
Cloud Security & Virtualization Controls
Management training in the San Francisco
Bay Area. She’s known for successful GRC
implementations, supplying overall design,
development and training to companies ranging
from start-up to Fortune 500. She is past
president of the Association for Certified Green
Technology Auditors (ACGTA), a frequent
committee contributor to the ISACA Silicon
Valley Chapter, liaison to the ITSMF SV
chapter, and a frequent participant in the Cloud
Security Alliance local chapter. EnterpriseGRC
Solutions was recently added to the Cloud
Credential Council and named to the
certification committee of The Holistic
Information Security Practitioner Institute
(HISPI). EnterpriseGRC Solutions® is an active
sponsor of the Information Systems Audit and
Control Association (ISACA®), listed as a
corporate sponsor and many-time CobiT®
trainer for the ITGI.
Visit http://enterprisegrc.com
Robin Basham, M.ED, M.IT, CISA, CGEIT, CRISC,
ACC, CRP, VRP, and HISP, founder EnterpriseGRC
Solutions Inc.®
Director Integrated Audit, Ellie Mae, Inc.
24. Cloud will create 14 Million Jobs by
2014
[Figure: social media actions (Twitter tweet, Digg, LinkedIn, Share, Login, Questionnaire, Like, Bury) shown alongside the labels New Threats, New Fraud and New Markets]
35. Risks in Life Logging - ENISA
R1 – Breach of privacy
R2 – Inappropriate secondary use of data
R3 – Malicious attacks on smart devices
increase as their value to authenticate
individuals and store personal data increases
R4 – Compliance with and enforcement of
data protection legislation made more difficult
R5 – Discrimination and exclusion
R6 – Monitoring, cyber-stalking, child
grooming and “friendly” surveillance
R7 – Unanticipated changes in citizens’
behavior and creation of an “obedient” citizen
R8 – Poor decision making / inability to make
decisions
R9 – Psychological harm
R10 – Physical theft of property or private
information from home environment
R11 – Reduction of choices available to
individuals as consumers and user lock-in
R12 – Decrease of productivity
To log or not to log? - Risks and benefits
of emerging life-logging applications
http://www.enisa.europa.eu/activities/risk-management/emerging-and-future-risk/deliverables/life-logging-risk-assessment/to-log-or-not-to-log-risks-and-benefits-of-emerging-life-logging-applications
Audience is told to gain more insight from the source document. http://info.talend.com
2.1. Recommendation Engine
For years, organizations such as Amazon, Facebook and Google have used recommendation engines to match and recommend products, people and
advertisements to users based on analysis of user profile and behavioral data. These problems were some of the first tackled by big data and have helped
develop the technology into what it is today.
2.2. Marketing Campaign Analysis
The more information made available to a marketer, the more granular the targets that can be identified and messaged. Big data makes it possible to analyze massive amounts of data in ways that were just not possible with traditional relational solutions. Marketers are now better able to identify a target audience and the “right” person for the “right” products and service offerings. Big data allows marketing teams to evaluate large volumes from new data sources, like click-stream data and call detail records, to increase the accuracy of analysis.
2.3. Customer Retention and Churn Analysis
An increase in products per customer typically equates to reduced churn, and many organizations have large-scale efforts to improve this key performance indicator. However, analysis of customers and products across lines of business is often difficult, as format and governance issues restrict these efforts. Some enterprises are able to load this data into a Hadoop cluster to perform wide-scale analysis and identify patterns that indicate which customers are most likely to leave for a competing vendor or, better yet, which customers are more likely to expand their relationship with the company. Action can then be taken to save or incent these customers.
2.4. Social Graph Analysis
There are users and there are “super” users in any social network or community, and it is difficult to identify these key influencers within these groups. With big
data, social networking data is mined to identify the participants who exert the most influence over others inside social networks. This helps enterprises ascertain
the “most important” customers, who may or may not be the customers with the most products or spend that we traditionally identify with business analytics.
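One simple way to picture the mining step described above is to count how many distinct participants interact with each user. The sketch below is an illustration only: the edge list, the user names and the `rank_influencers` helper are all hypothetical, and in-degree is just one of many possible influence measures, not the method the white paper prescribes.

```python
from collections import Counter

# Hypothetical interaction data: (participant, person they responded to).
interactions = [
    ("ann", "dev"), ("bob", "dev"), ("cat", "dev"),
    ("dev", "eve"), ("ann", "eve"), ("bob", "cat"),
    ("ann", "dev"),  # repeated interaction, counted once below
]

def rank_influencers(edges):
    """Rank users by in-degree: how many distinct users interact with them."""
    distinct_edges = set(edges)  # ignore repeats between the same pair
    in_degree = Counter(target for _, target in distinct_edges)
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(in_degree.items(), key=lambda item: (-item[1], item[0]))

ranking = rank_influencers(interactions)
print(ranking)
```

Note that the top-ranked user here is the one most others respond to, which need not be the customer with the most products or spend.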
2.5. Capital Markets Analysis
Whether looking for broad economic indicators, specific market indicators, or sentiments concerning a specific company or its stocks, there is a wealth of data
available to analyze in both traditional and new media sources. While basic keyword analysis and entity extraction have been in use for years, the combination
of this old data with new sources such as Twitter and other social media sources provides great detail about public opinion… in near real-time. Today, most
financial institutions are using some sort of sentiment analysis to gauge public opinion about their company, market, or the economy as a whole.
2.6. Predictive Analytics
Within capital markets, analysts have used advanced algorithms for correlations and probability calculations against current and historical data to predict markets
as standard practice. The large amounts of historical market data, and the speed at which new data needs to be evaluated (e.g. complex derivatives valuations)
make this a big data problem. The ability to perform these calculations faster and on commodity hardware makes big data a reliable substitute for the relatively slow
and expensive legacy approach.
2.7. Risk Management
Advanced, aggressive organizations seek to mitigate risk with continuous risk management and broader analysis of risk factors across wider sets of data.
Further, there is mounting pressure to increase the speed at which this is analyzed despite a growing volume of data. Big data technologies are growing in popularity to
solve this issue as they parallelize data access and computation. Whether it is cross-party analysis or the integration of risk and finance, risk-adjusted returns and
P&L require that growing amounts of data be integrated from multiple, standalone departments across the firm, and accessed and analyzed on the fly.
2.8. Rogue Trading
Deep analytics that correlate accounting data with position tracking and order management systems can provide valuable insights that are not available using
traditional data management tools. In order to identify these issues, an immense amount of near real time data needs to be crunched from multiple, inconsistent
sources. This computationally intense function can now be accomplished using big data technologies.
2.9. Fraud Detection
Correlating data from multiple, unrelated sources has the potential to catch fraudulent activities. Consider for instance the potential of correlating Point
How do we establish ROI?
Should we be defining one side of data, as opposed to a two way communication?
Strategy by customer segment has implications to privacy and identity
The need to qualify social data is the basis of entirely new services
If those services are new, how will we validate control over the interpretation of those results?
MapReduce as a framework
MapReduce enables big data technology such as Hadoop to function. For instance, the Hadoop File System (HDFS) uses these components to persist data, execute functions against it and then find results. NoSQL databases such as MongoDB and Cassandra use the functions to store and retrieve data for their respective services. Hive uses this framework as the baseline for a data warehouse.
How Hadoop Works
Hadoop was born because existing approaches were inadequate to process huge amounts of data. Hadoop was built to address the challenge of indexing the entire World Wide Web every day. Google developed a paradigm called MapReduce in 2004, and Yahoo! eventually started Hadoop as an implementation of MapReduce in 2005 and released it as an open source project in 2007. Much like any operating system, Hadoop has the basic constructs needed to perform computing: it has a file system, a language to write programs, a way of managing the distribution of those programs over a distributed cluster, and a way of accepting the results of those programs. Ultimately the goal is to create a single result set.
With Hadoop, big data is distributed into pieces that are spread over a series of nodes running on commodity hardware. In this structure the data is also replicated several times on different nodes to secure against node failure. The data is not organized into the relational rows and columns expected in traditional persistence. This lends itself to storing structured, semi-structured and unstructured content.
There are four types of nodes involved within HDFS. They are:
Name Node: a facilitator that provides information on the location of data. It knows which nodes are available, where in the cluster certain data resides, and which nodes have failed.
Secondary Node: a backup to the Name Node.
Job Tracker: coordinates the processing of the data using MapReduce.
Slave Nodes: store data and take direction from the Job Tracker.
A Job Tracker is the entry point for a “map job” or process to be applied to the data. A map job is typically a query written in Java and is the first step in the MapReduce process. The Job Tracker asks the Name Node to identify and locate the data needed to complete the job. Once it has this information it submits the query to the relevant named nodes. Any required processing of the data occurs within each named node, which provides the massively parallel characteristic of MapReduce. When each node has finished processing, it stores the results. The client then initiates a “Reduce” job. The results are then aggregated to determine the “answer” to the original query. The client then accesses these results on the file system and can use them for whatever purpose.
Pig
The Apache Pig project is a high-level data-flow programming language and execution framework for creating MapReduce programs used with Hadoop. The abstract language for this platform is called Pig Latin. It abstracts the programming into a notation that makes MapReduce programming similar to SQL programming for RDBMS systems. Pig Latin is extended using UDFs (User Defined Functions), which the user can write in Java and then call directly from the language.
Hive
Apache Hive is a data warehouse infrastructure built on top of Hadoop (originally by Facebook) for providing data summarization, ad-hoc query, and analysis of large datasets. It provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. It eases integration with business intelligence and visualization tools.
HBase
HBase is a non-relational database that runs on top of the Hadoop file system (HDFS). It is columnar and provides fault-tolerant storage and quick access to large quantities of sparse data. It also adds transactional capabilities to Hadoop, allowing users to conduct updates, inserts and deletes. It is used by Facebook to serve its messaging systems and is used heavily by eBay as well.
HCatalog
HCatalog is a table and storage management service for data created using Apache Hadoop. It allows interoperability across data processing tools such as Pig, MapReduce, Streaming, and Hive, and provides a shared schema and data type mechanism.
Flume
Flume is a system of agents that populate a Hadoop cluster. These agents are deployed across an IT infrastructure to collect data and integrate it back into Hadoop.
Oozie
Oozie coordinates jobs written in multiple languages such as MapReduce, Pig and Hive. It is a workflow system that links these jobs and allows specification of order and dependencies between them.
Mahout
Mahout is a data mining library that implements popular algorithms for clustering and statistical modeling in MapReduce.
Sqoop
Sqoop is a set of data integration tools that allow Hadoop to exchange data with traditional relational databases and data warehouses.
NoSQL (Not only SQL)
NoSQL refers to a large class of data storage mechanisms that differ significantly from the well-known, traditional relational data stores (RDBMS). These technologies implement their own query language and are typically built on advanced programming structures for key/value relationships, defined objects, tabular methods or tuples. The term is often used to describe the wide range of data stores classified as big data. Some of the major flavors adopted within the big data world today include Cassandra, MongoDB, NuoDB, Couchbase and VoltDB.
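To make the map and reduce steps described in these notes more concrete, here is a minimal word-count sketch. It runs in a single Python process and does not use Hadoop or any Hadoop API; the function names and the two sample text chunks are assumptions for illustration. On a real cluster each chunk would live on a different node and the map step would run in parallel on those nodes.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group emitted values by key so each key can be reduced independently."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: aggregate the values for each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}

# Hypothetical data; each string stands in for a block stored on one node.
chunks = ["big data is just data", "data is an asset"]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts)
```

The single result set produced at the end corresponds to the “answer” that the client reads back from the file system in the description above.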
This presentation took much longer for far fewer answers than other topics.
My alarm was in the cross usage of my LinkedIn data, which I pay for, and had not considered as information “for sale”.
Just reading the news I have engaged in “Life logging”
Will these jobs be in the US?
I am on the fence about buying a Lenovo too. Was it really “unsafe”? I wonder why?
(America Disclose Act)
If the US Government knows that they can buy this information from Facebook, then they probably don’t need to care if in court they win or lose. The question for us as citizens is, do we want any entity to have this much information? Should the phone companies be able to provide this intel to anyone?
Automation for the purpose of review means that the content is not “opinion”
Yes, I am plugging McAfee. They earned it.
Webutation has responded in writing that they are actively questioning their continued affiliation with WOT
Defamation of character, long a cause of action at law, involves false, defamatory statements made to a third party which cause damages. Some statements can be privileged, that is, immune from suit (for example, statements made in court cases). Slander is when the statements are oral. When the statements are written, they are libel. Related causes of action include public disclosure of private facts and portrayal in a false light. It is more difficult for public figures to sue for defamation of character because they have thrust themselves into the public eye.
Infonomics: The Practice of Information Economics By Douglas Laney Gartner, Inc.
Today it’s very likely that you and your other business and IT leaders regularly talk about information as one of your most valuable assets. But do you value or manage information like one? Consider your company’s well-honed supply chain and asset management practices for physical assets, or your financial management and reporting discipline. Do you have similar accounting and asset management practices in place for your “information assets?” Not likely, but no worries, few do.
When considering how to put information to work for your organization, it’s important to go beyond thinking and talking about information as an asset, to actually valuing and treating it as one. This is the basis of the new theory and emerging discipline of Infonomics which provides organizations a foundation and methods for quantifying information asset value and formal information asset management practices. Infonomics posits that information should be considered a new asset class in that it has measurable economic value and other properties that qualify it to be accounted for and administered as any other recognized type of asset—and that there are significant strategic, operational and financial reasons for doing so.
The Value of Information
Although information arguably meets accounting standards criteria for an asset, and more specifically, further litmus tests for an intangible asset, it is not found on public companies’ balance sheets. Regardless of what our 75-year-old accounting standards dictate, if you’re not quantifying information’s value then you’re not likely to be generating or demonstrating sufficient value from it. Nor are you reaping any of the other potential benefits from quantifying information’s value.
While involving the CFO in valuing your company’s data may be premature, doing so may also assist him or her in demonstrating overall corporate wealth and health to the board and investors. Even non-economic indicators of information value, quality and performance can help IT organizations and businesses set a course for better managing and leveraging information. In fact, organizations that are intent on becoming more information-centric, as well as those that have altogether information-based business models, should make it a critical function to audit the actual and potential value of their information assets.
Why Put a Value on Information?
We generally talk about the concept of information in either purely technical or strictly contextual terms. Information is something to be created, captured, updated, stored, moved, arranged, integrated and ultimately accessed, used (or ignored) and retired. Beyond its technical manifestation, however, information means something. It has context, particularly when applied. It is a message, an event, or a unit of knowledge.
Yet information isn’t actually any of those things. Rather, it is merely symbolic of them — a proxy. While the meaning of information ultimately drives business processes and decisions, it is the increasingly efficient, neat and compact way with which we can technically represent information that allows its near-unfettered flow and accumulation. Therefore, it is both information’s meaning and physical representation that combine to improve business process performance, decision making, and innovation.
Organizations whose business and IT leaders recognize this cycle and the growing importance of information are better positioned to take advantage of it. Information should no longer be seen merely as an operations by-product to be managed, or even as just a business resource to be leveraged, but it should be seen as an enterprise asset to be valued. Leading organizations in nearly every industry — including retail, financial services, manufacturing, life sciences and telecommunications — recognize information’s benefits, sometimes even above some traditional assets, in generating revenue.
Information is a Unique Asset
Business leaders must recognize that there are things in the business world that financial assets can’t buy, physical assets can’t perform, and humans can’t process. In supply chain and customer relationship domains, for example, many businesses would rather forgo cash from business partners in favor of a cache of information. Businesses that do a better job of compiling, managing and making available their information assets are more-valued business partners. Not only are information-based transactions fast becoming a means to avoid the tax man, they are also a way to conceal certain business activity from public disclosure and thereby from the prying eyes of competitors.
Once the corner store clerk knew his customers’ buying habits, family, financial situation and personal interests. Today, this familiarity must be approximated on a grander scale in a global online information-based marketplace. Information assets and analytics have become the necessary, albeit impersonal, substitute for this personal touch. Only by considering information as a true enterprise asset are organizations better positioned to manage and deploy it with the same discipline as traditional assets — leading to vast improvements in the realized value of information.
Just as with conventional assets, information is increasingly amassed as a resource to generate tangible benefits, although primarily via improved decision making and process performance. However, business leaders are often tripped-up by the common misconception that information only has value when applied — i.e., data has no value when sitting idly in a database. This simply is not so. Just as physical inventory sitting on a shelf in a warehouse has discernible value, so do idle information assets. The difference is between realized value and the accounting definition of an asset’s value—which takes into consideration its probable future economic benefit.
Organizations that treat idle information, or so-called “dark data”, as anything less than having potential economic benefit will find themselves at increased competitive disadvantage. All information has a probable future economic benefit. IT and business leaders need to keep this in mind when considering business strategies and options for acquiring, administering and applying information. This probability varies based on a number of factors, including but not limited to its completeness, accuracy, consistency, timeliness and business process relevance. And like any asset, information’s value depends upon the organization’s capacity to deploy it.
The courts are split and insurers are confounded about recognizing information as a form of property. Recent legal rulings against insurers’ denied claims related to data loss or destruction sometimes have confirmed electronic data as a physical property—and other times not. This has prompted insurers to offer electronic data insurance and/or specifically exclude data in property and casualty policies. In addition to a lack of accounting visibility, many businesses also are at increasing financial and legal risk of having insufficient insurance coverage for data-related misappropriation, mishaps or misconduct.
Where Are Information Assets on the Balance Sheet?
In interviews with three dozen nonfinancial business executives we conducted at industry events over the past year, nearly 80% believe that their company’s information asset value is represented under Goodwill, Other Intangibles or elsewhere on the balance sheet. However, despite meeting all the criteria of an intangible asset, information surprisingly is entirely absent as an asset class on the balance sheet. Even among enterprises whose core business is the buying and selling of information (e.g., TransUnion, Experian, Dun & Bradstreet, IMS Health, A.C. Nielsen and SymphonyIRI Group), information assets are nowhere to be found on their balance sheets.
Public companies are not required to inventory, quantify or assess the value of their information assets. Yet these assets are either their primary source of revenue generation, or increasingly and materially contribute to their top line. Even intangible assets, such as copyrights, patents and brand, are reported in financial statements. Therefore, the growing disparity between corporate book values and market values is in large part due to the undisclosed value of information assets. A case in point is the yawning gap between Facebook’s near $100 billion market valuation and its book value of under $7 billion. Because Facebook is a pure information-based business, this suggests that its off-balance sheet information assets, generated by nearly a billion unassuming, unpaid information workers, ostensibly are worth more than $90 billion. This figure represents the investing public’s present-value expectations of Facebook’s ability to monetize this data.
While some companies may claim that information is not possible to quantifiably value, valuation models for other similarly non-depleting balance sheet intangibles are straightforward enough to apply.
Reasons to Acknowledge and Account for Information as an Asset
Regardless of what financial reporting standards may dictate, quantifying the value of information assets offers a range of benefits to your enterprise:
Measuring the Value of Your Information
First is the maxim that “you can’t manage what you don’t measure.” Enterprises regularly invest in data collection, management and access technologies, resources and projects with only a vague estimation of the economic benefit this information will deliver. Without mapping the probable usage of data to actual business processes, projects and technologies can hardly be justified, nor can ROI be computed.
Understanding Your True Information ROI
Similarly, organizations spend significant slices of their IT budget on information management. Only by measuring the comparative value that information delivers before and after this investment can they determine the ROI on information management initiatives. Otherwise these investments are perceived and recorded only as a sunk cost.
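The before-and-after comparison can be sketched as a simple calculation. The formula below is the ordinary ROI ratio applied to information value; the function name and every dollar figure are hypothetical, and how an organization actually arrives at the "value before" and "value after" numbers is exactly the valuation problem the article raises.

```python
def information_roi(value_before, value_after, investment):
    """Return ROI as a fraction: (gain in value - investment) / investment."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    gain = value_after - value_before
    return (gain - investment) / investment

# Hypothetical figures: estimated value delivered by the information
# before and after an information management initiative.
roi = information_roi(
    value_before=2_000_000,
    value_after=2_900_000,
    investment=600_000,
)
print(f"ROI on the initiative: {roi:.0%}")
```

Without the two value estimates the same spending can only be recorded as a cost, which is the point made above.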
Securing Your Information
Consider the millions of dollars organizations spend securing information assets. These solutions should be budgeted in direct relation to the probable economic value of losing or misappropriating data, but are not. They proceed on the broad assumption that information security solutions merely cost less than the probable economic loss over their life span. How does an organization know whether it’s spending too much or too little on information security without quantifying the information’s value?
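One way to frame the closing question is to compare security spending with the probable economic loss. The sketch below assumes the probable loss can be approximated as information value multiplied by an annual probability of loss; that simplification, the function names and all figures are assumptions for illustration, not a method given in the article.

```python
def expected_annual_loss(information_value, loss_probability):
    """Probable yearly loss: quantified value times the chance of losing it."""
    if not 0.0 <= loss_probability <= 1.0:
        raise ValueError("loss_probability must be between 0 and 1")
    return information_value * loss_probability

def spend_to_loss_ratio(annual_security_spend, information_value, loss_probability):
    """Ratio above 1 means spending exceeds the probable loss it guards against."""
    loss = expected_annual_loss(information_value, loss_probability)
    if loss == 0:
        raise ValueError("expected loss is zero; ratio is undefined")
    return annual_security_spend / loss

# Hypothetical figures for a single information asset.
ratio = spend_to_loss_ratio(
    annual_security_spend=750_000,
    information_value=10_000_000,
    loss_probability=0.05,
)
print(f"Spend is {ratio:.2f}x the expected annual loss")
```

The calculation cannot be run at all until the information's value has been quantified, which is why the article treats valuation as a prerequisite for sizing the security budget.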
Influencing Your Corporate Valuation
At some point during a business’ lifetime, it inevitably encounters opportunities to be acquired. The ability to claim and receive a premium valuation based on a formal quantification of information asset value can translate into found money. Conversely, a business with strong skills in monetizing information assets may find that potential acquirees with a wealth of underutilized information assets are a sweet deal.
Assessing Contractual Risks
Most legal contracts fail to specify whether parties are indemnified for the misuse, damage or loss of electronic data. Although some courts have begun to side with the notion of e-data as “tangible property,” protracted litigation is no fun, and assessed damages are still a judicial art. Understanding the value of information that potentially falls under the dominion of any contract can help companies better assess a contract’s value and risks, and to set damage limits.
Borrowing Against Information
Over the next decade, Gartner foresees the potential of information being used to collateralize loans. Particularly for information-centric businesses and others that have demonstrable (if not auditable) valuation models for information assets, the ability to obtain lines of credit against the value of their customer database, for example, will become a viable option. Ultimately “information banks” will emerge to handle information-related safekeeping, access, transactions and investments.
Bartering With Information
Even as early as the 1990s, officials from the Internal Revenue Service were acutely aware of the practice of bartering with information. Nearly two decades later, revenue service organizations around the world still have no answer for this gray market created by information as a currency. The simple act of swiping one’s grocery store loyalty card for free food has evolved into a way of doing business among large enterprises. If information is a currency in this new “infoconomy,” then it should be translatable into common economic terms. Organizations that have valid methods for quantifying their (and others’) information value are in a stronger negotiating position with business partners, and better poised to innovate around information-based bartering than those without such methods.
Selling Information
Information can be directly monetizable as a viable product itself. Just as once the lumber industry discarded shavings but now profits from this byproduct (e.g., artificial logs, mulch and particle board), most enterprises generate a wealth of data that may be more valuable to others than to their own organizations. Packaging the data and testing the market leads to new revenue sources for forward-thinking business leaders. As this practice becomes more prevalent, we anticipate the emergence of a vibrant information marketplace industry for commercial data assets.
For now, information assets cannot be formally recognized according to accounting standards, but in time that may change. Regardless, we believe it is a good practice for businesses to begin internally valuing their information assets so they can manage and wield them more effectively. Supplemental balance sheets that include information asset value can be an effective tool for planning, measuring and demonstrating information management maturity. They will also give CEOs a better overall picture of corporate value; give CFOs a tool for gauging investments and performance; and give CIOs, who manage and help business units brandish information as a strategic asset, a seat at the strategy table.
Accounting treatments aside, if you’re not measuring the actual benefits generated from information assets versus their potential, then you’re in a poor position to recognize and close that gap. That is, you are possibly incurring greater “inventory carrying costs” for information than the economic value they’re generating. Especially in this era of Big Data, that differential manifests as an ever-ominous IT budget line item. But at the same time it represents a massive opportunity and clear impetus to aggressively identify and implement ways to monetize your information assets, both indirectly and directly.
Doug Laney is a research vice president for Gartner, where he covers business analytics solutions and projects, performance management, and data-governance-related issues.
Challenge the ROI
See if there are controllers from any of the top 18 companies in the audience
Replace anonymous with protected. There is no such thing as anonymous.