Alation Centralizes Enterprise Data Knowledge by
Employing Machine Learning and Crowd Sourcing
Transcript of a sponsored discussion on how Alation makes data actionable by keeping it up-to-date and accessible using innovative means.
Listen to the podcast. Find it on iTunes. Get the mobile app. Sponsor: Hewlett
Packard Enterprise.
Dana Gardner: Hello, and welcome to the next edition of the HPE Discover Podcast Series.
I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this
ongoing discussion on IT innovation -- and how it’s making an impact on
people’s lives.
Our next big-data case study discussion focuses on the Tower of
Babel problem for data and how Alation maps across data disparity while
employing machine learning and crowdsourcing to help centralize data
knowledge.
We'll explore how Alation makes data actionable by keeping data up to date
and accessible using such innovative means as experts and systems.
To learn more about how enterprises and small companies alike can access more data for better
analytics, please join me in welcoming our guest. We're here with Stephanie McReynolds, Vice-President of Marketing at Alation in Redwood City, California. Welcome.
Stephanie McReynolds: Thank you, Dana. Glad to be here.
Gardner: We're glad to have you. I've heard of crowdsourcing for many things, and machine
learning is more and more prominent with big-data activities, but I haven't necessarily seen them
together. How did that come about? How do you, and why do you need to, employ both machine
learning and experts in crowdsourcing?
McReynolds: Traditionally, we've looked at data as a technology problem. At least over the last
5-10 years, we’ve been pretty focused on new systems like Hadoop for storing and processing
larger volumes of data at a lower cost than databases could traditionally support. But what we’ve
overlooked in the focus on technology is the real challenge of how to help organizations use the
data that they have to make decisions. If you look at what happens when organizations go to
apply data, there's often a gap between the data we have available and what decision-makers are
actually using to make their decisions.
There was a study that came out within the last couple of years that showed that about 56 percent
of managers have data available to them, but they're not using it. So, there's a human gap there.
Data is available, but managers aren't successfully applying data to business
decisions, and that’s where real return on investment (ROI) always comes
from. Storing the data, that’s just an insurance policy for future use.
The concept of crowdsourcing data, or tapping into experts around the data,
gives us an opportunity to bring humans into the equation of establishing trust
in data. Machine-learning techniques can be used to find patterns and clean
the data. But to really trust data as a foundation for decision making, human experts are needed to add business context and show how data can be used
and applied to solving real business problems.
Gardner: Usually, when you're employing people like that, it can be expensive and doesn't scale
very well. How do you manage the fit-for-purpose approach to crowdsourcing where you're
doing a service for them in terms of getting the information that they need and you want to
evaluate that sort of thing? How do you balance that?
Using human experts
McReynolds: The term "crowdsourcing" can be interpreted in many ways. The approach that
we’ve taken at Alation is that machine learning actually provides a foundation for tapping into
human experts.
We go out and look at all of the log data in an organization -- in particular, what queries are being used to access data in databases or Hadoop file structures. That creates a foundation of knowledge so that the machine can learn to identify what data would be useful to catalog or to enrich with human experts in the organization. That's essentially a way to prioritize how to tap into the number of humans that you have available to help create context around that data.
That’s a great way to partner with machines, to use humans for what they're good for, which is
establishing a lot of context and business perspective, and use machines for what they're good
for, which is cataloging the raw bits and bytes and showing folks where to add value.
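Alation's actual pipeline isn't public, but the prioritization idea McReynolds describes can be sketched in a few lines: mine the query log, count how often each table is touched and by how many distinct people, and use that ranking to decide where human experts should add context first. The table-extraction regex and the log format below are illustrative assumptions, not Alation's implementation.

```python
import re
from collections import Counter

def extract_tables(sql):
    """Very rough extraction of table names following FROM/JOIN keywords."""
    return [m.lower() for m in re.findall(r"(?:from|join)\s+([\w.]+)", sql, re.IGNORECASE)]

def rank_tables(query_log):
    """Rank tables by usage in the query log.

    query_log: iterable of (user, sql_text) pairs.
    Returns a list of (table, query_count, distinct_users),
    heaviest-used tables first -- the ones worth enriching first.
    """
    counts = Counter()
    users = {}
    for user, sql in query_log:
        for table in extract_tables(sql):
            counts[table] += 1
            users.setdefault(table, set()).add(user)
    return sorted(
        ((t, c, len(users[t])) for t, c in counts.items()),
        key=lambda row: (-row[1], -row[2]),
    )

# Hypothetical log entries.
log = [
    ("ana", "SELECT * FROM sales.orders o JOIN sales.customers c ON o.cid = c.id"),
    ("ben", "SELECT count(*) FROM sales.orders"),
    ("ana", "SELECT * FROM staging.tmp_load"),
]
print(rank_tables(log))  # sales.orders ranks first: 2 queries, 2 users
```

A real deployment would parse SQL properly and read logs from each source system's audit tables, but the ranking step is the same shape.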
Gardner: What are some of the business trends that are driving your customers to seek you out
to accomplish this? What's happening in their environments that requires this unique approach of
the best of machine and crowdsourcing and experts?
McReynolds: There are two broader industry trends that have converged and created a space for
a company like Alation. The first is just the immense volume and variety of data that we have in
our organizations. If it weren’t the case that we're adding additional data storage systems into our
enterprises, there wouldn't be a good groundwork laid for Alation. But perhaps more interesting is a second trend, around self-service business intelligence (BI).
So as we're increasing the number of systems that we're using to store and access data, we're also
putting more weight on typical business users to find value in that data and trying to make that as
self-service a process as possible. That’s created a perfect storm for a system like Alation,
which helps catalog all the data in the organization and make it more accessible for humans to
interpret in accurate ways.
Gardner: And we often hear in the big data space the need to scale up to massive amounts, but it
appears that Alation is able to scale down. You can apply these benefits to quite small companies.
How does that work when you're able to help a very small organization with some typical use
cases in that size organization?
McReynolds: Even smaller organizations, or younger organizations, are beginning to drive their
business based on data. Take an organization like Square, which is a great brand name in the
financial services industry, but it’s not a huge organization in and of itself, or Inflection or
Invoice2go, which are also Alation customers.
We have many customers that have data analyst teams that maybe start with 5 people or 20
people. We also have customers like eBay that have closer to a thousand analysts on staff. What
Alation provides to both of those very different sizes of organizations is a centralized place,
where all of the information around their data is stored and made accessible.
Even if you're only collaborating with three to five analysts, you need that ability to share your
queries, to communicate on which queries addressed which business problems, which tables
from your Vertica database were appropriate for that, and maybe what Hive tables on your
Hadoop implementation you could easily join to those Vertica tables. That type of conversation is
just as relevant in a 5-person analytics team as it is in a 1000-person analytics team.
Gardner: Stephanie, if I understand it correctly, you have a fairly horizontal capability that
could apply to almost any company and almost any industry. Is that fair, or is there more
specialization or customization that you apply to make it more valuable, given the type of
company or type of industry?
Generalized technology
McReynolds: The technology itself is a generalized technology. Our founders come from
backgrounds at Google and Apple, companies that have developed very generalized computing
platforms to address big problems. So the way the technology is structured is general.
The organizations that are going to get the most value out of an Alation implementation are those
that are data-driven organizations that have made a strategic investment to use analytics to make
business decisions and incorporate that in the strategic vision for the company.
So even if we're working with very small organizations, they are organizations that make data
and the analysis of data a priority. Today, it’s not every organization out there. Not every mom-and-pop shop is going to have an Alation instance in their IT organization.
Gardner: Fair enough. Those data-driven organizations have a real benefit to gain by doing this well, and they also, as I understand it, want to get as much data involved as possible,
regardless of its repository, its type, the silo, the platform, and so forth. What is it that you've had
to do to be able to satisfy that need for disparity and variety across these data types? What was
the challenge for being able to get to all the types of data that you can then apply your value to?
McReynolds: At Alation, we see the variety of data as a huge asset, rather than a challenge. If
you're going to segment the customers in your organization, every event and every interaction
with those customers becomes relevant to understanding who that individual is and how you
might be able to personalize offerings, marketing campaigns, or product development to those
individuals.
That does put some burden on our organization, as a technology organization, to be able to
connect to lots of different types of databases, file structures, and places where data sits in an
organization.
So we focus on being able to crawl those source systems, whether they're places where data is
stored or whether they're BI applications that use that data to execute queries. A third important
data source for us that may be a bit hidden in some organizations is all the human information
that’s created, the metadata that’s often stored in Wiki pages, business glossaries, or other
documents that describe the data that’s being stored in various locations.
We actually crawl all of those sources and provide an easy way for individuals to use that
information on data within their daily interactions. Typically, our customers are analysts who are
writing SQL queries. All of that context about how to use the data is surfaced to them
automatically by Alation within their query-writing interface so that they can save anywhere
from 20 percent to 50 percent of the time it takes them to write a new query during their day-to-day jobs.
Gardner: How is your solution architected? Do you take advantage of cloud when appropriate?
Are you mostly on-premises, using your own data centers, some combination, and where might
that head to in the future?
Agnostic system
McReynolds: We're a young company. We were founded about three years ago and we
designed the system to be agnostic as to where you want to run Alation. We have customers who
are running Alation in concert with Redshift in the public cloud. We have customers that are
financial services organizations that have a lot of personally identifiable information (PII) data
and privacy and security concerns, and they are typically running an on-premise Alation
instance.
We architected the system to be able to operate in different environments and have an ability to
catalog data that is both in the cloud and on-premise at the same time.
The way that we do that from an architectural perspective is that we don’t replicate or store data
within Alation systems. We use metadata to point to the location of that data. For any analyst
who's going to run a query from our recommendations, that query is getting pushed down to the
source systems to run on-premise or on the cloud, wherever that data is stored.
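The architecture she describes -- keep only metadata, and push the query itself down to whichever system owns the data -- can be sketched roughly as below. The class and connector names are invented for illustration; real connectors would be JDBC/ODBC clients for Vertica, Redshift, Hive, and so on.

```python
class Catalog:
    """Metadata-only catalog: records where data lives, never the data itself."""

    def __init__(self):
        self._sources = {}  # dataset name -> (location label, executor callable)

    def register(self, name, location, executor):
        self._sources[name] = (location, executor)

    def run(self, name, sql):
        """Push the query down to the system that holds the dataset."""
        location, executor = self._sources[name]
        return executor(sql)

# Stand-in executors for an on-premises warehouse and a cloud store.
def vertica_exec(sql):
    return f"vertica ran: {sql}"

def redshift_exec(sql):
    return f"redshift ran: {sql}"

catalog = Catalog()
catalog.register("sales.orders", "on-prem Vertica", vertica_exec)
catalog.register("web.events", "cloud Redshift", redshift_exec)
print(catalog.run("web.events", "SELECT count(*) FROM events"))
# -> redshift ran: SELECT count(*) FROM events
```

Because only pointers and context live in the catalog, the same instance can span cloud and on-premises sources without copying data or creating a second compliance surface.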
Gardner: And how did HP Vertica come to play in that architecture? Did it play a role in the
ability to be agnostic as you describe it?
McReynolds: We use HP Vertica in one portion of our product that allows us to provide
essentially BI on the BI that’s happening. Vertica is used as a fundamental component of our
reporting capability called Alation Forensics that is used by IT teams to find out how queries are
actually being run on data source systems, which backend database tables are being hit most
often, and what that says about the organization and those physical systems.
It gives the IT department insight. Day-to-day, Alation is typically more of a business person’s tool for interacting with data.
Gardner: We've heard from HPE that they expect a lot more of that IT department-specific ops-efficiency role and use case to grow. Do you have any sense of what some of the benefits have
been from your IT organization to get that sort of analysis? What's the ROI?
McReynolds: The benefits of an approach like Alation include getting insight into the behaviors
of individuals in the organization. What we’ve seen at some of our larger customers is that they
may have dedicated themselves to a data-governance program where they want to document
every database and every table in their system, hundreds of millions of data elements.
Using the Alation system, they were able to identify within days the rank-order priority list of
what they actually need to document, versus what they thought they had to document. The cost
savings comes from taking a very data-driven realistic look at which projects are going to
produce value to a majority of the business audience, and which projects maybe we could hold
off on or spend our resources more wisely.
One team that we were working with found that about 80 percent of their tables hadn't been used
by more than one person in the last two years. In that case, if only one or two people are using
those systems, you don't really need to document those systems. That individual or those two
individuals probably know what's there. Spend your time documenting the 10 percent of the
system that everybody's using and that everyone is going to receive value from.
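That kind of triage falls straight out of the usage statistics: given distinct-user counts per table over an observation window, split the catalog into tables worth documenting now and tables that can wait. A toy illustration -- the threshold and the numbers here are invented:

```python
def triage_tables(usage, min_users=2):
    """Split tables into document-now vs defer, based on how many distinct
    users queried each table during the observation window."""
    document = [t for t, users in usage.items() if users >= min_users]
    defer = [t for t, users in usage.items() if users < min_users]
    # Document the most heavily shared tables first.
    return sorted(document, key=lambda t: -usage[t]), defer

# Hypothetical distinct-user counts over a two-year window.
usage = {
    "sales.orders": 340,
    "sales.customers": 120,
    "staging.tmp_load": 1,
    "legacy.archive_2009": 0,
}
document, defer = triage_tables(usage)
print(document)  # ['sales.orders', 'sales.customers']
print(defer)     # ['staging.tmp_load', 'legacy.archive_2009']
```

The governance payoff McReynolds describes is exactly this: the `defer` list can be most of the catalog, and the documentation budget goes to the short `document` list everyone actually touches.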
Where to go next
Gardner: Before we close out, any sense of where Alation could go next? Is there another use
case or application for this combination of crowdsourcing and machine learning, tapping into all
the disparate data that you can and information including the human and tribal knowledge?
Where might you go next in terms of where this is applicable and useful?
McReynolds: If you look at what Alation is doing, it's very similar to what Google did for the Internet -- cataloging all of the webpages available to individuals and serving them up in meaningful ways. That's a huge vision for Alation, and we're just in the early part of that journey, to be honest. We'll continue to move in that direction of cataloging data for an enterprise and making all of the information stored in that organization easily searchable, findable, and usable.
Gardner: Well, very good. I'm afraid we will have to leave it there. We've been examining how
Alation maps across disparate data while employing machine learning and crowdsourcing to help
centralize and identify data knowledge. And we've learned how Alation makes data actionable by
keeping it up-to-date and accessible using innovative means.
So a big thank you to our guest. We've been joined by Stephanie McReynolds, Vice-President of
Marketing at Alation in Redwood City, California. Thank you so much, Stephanie.
McReynolds: Thank you. It was a pleasure to be here.
Gardner: And a big thank you as well to our audience for joining us for this big data innovation
case study discussion.
I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HPE-sponsored discussions. Thanks again for listening, and come back next time.
Transcript of a sponsored discussion on how Alation makes data actionable by keeping it up-to-date and accessible using innovative means. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.
You may also be interested in:
	 •	 Intralinks Uses Hybrid Cloud to Blaze a Compliance Trail Across the Regulatory Minefield of Data Sovereignty
	 •	 Redmonk analysts on best navigating the tricky path to DevOps adoption
	 •	 DevOps by design--A practical guide to effectively ushering DevOps into any
organization
	 •	 Need for Fast Analytics in Healthcare Spurs Sogeti Converged Solutions Partnership
Model
	 •	 HPE's composable infrastructure sets stage for hybrid market brokering role
	 •	 Nottingham Trent University Elevates Big Data's role to Improving Student Retention in
Higher Education
	 •	 Forrester analyst Kurt Bittner on the inevitability of DevOps
	 •	 Agile on fire: IT enters the new era of 'continuous' everything
	 •	 Big data enables top user experiences and extreme personalization for Intuit TurboTax
	 •	 Feedback loops: The confluence of DevOps and big data
	 •	 IoT brings on development demands that DevOps manages best, say experts
	 •	 Big data generates new insights into what’s happening in the world's tropical ecosystems
	 •	 DevOps and security, a match made in heaven

Mais conteúdo relacionado

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 

Destaque

Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 

Destaque (20)

PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
 

Alation Centralizes Enterprise Data Knowledge by Employing Machine Learning and Crowd Sourcing

  • 1. Alation Centralizes Enterprise Data Knowledge by Employing Machine Learning and Crowd Sourcing Transcript of a sponsored discussion on how Alation makes data actionable by keeping it up-to- date and accessible using innovative means. Listen to the podcast. Find it on iTunes. Get the mobile app. Sponsor: Hewlett Packard Enterprise. Dana Gardner: Hello, and welcome to the next edition of the HPE Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing discussion on IT innovation -- and how it’s making an impact on people’s lives. Our next big-data case study discussion focuses rather on the Tower of Babel problem for data and how Alation maps across data disparity while employing machine learning and crowdsourcing to help centralize data knowledge. We'll explore how Alation makes data actionable by keeping data up to date and accessible using such innovative means as experts and systems. To learn more about how enterprises and small companies alike can access more data for better analytics, please join me in welcoming our guest. We're here with Stephanie McReynolds, Vice- President of Marketing at Alation in Redwood City, California. Welcome. Embed the HPE Big Data OEM Software Stephanie McReynolds: Thank you, Dana. Glad to be here. Gardner: We're glad to have you. I've heard of crowdsourcing for many things, and machine learning is more-and-more prominent with big-data activities, but I haven't necessarily seen them together. How did that come about? How do you, and why do you need to, employ both machine learning and experts in crowdsourcing? McReynolds: Traditionally, we've looked at data as a technology problem. At least over the last 5-10 years, we’ve been pretty focused on new systems like Hadoop for storing and processing larger volumes of data at a lower cost than databases could traditionally support. 
But what we’ve overlooked in the focus on technology is the real challenge of how to help organizations use the data that they have to make decisions. If you look at what happens when organizations go to Gardner
  • 2. apply data, there's often a gap between the data we have available and what decision-makers are actually using to make their decisions. There was a study that came out within the last couple of years that showed that about 56 percent of managers have data available to them, but they're not using it . So, there's a human gap there. Data is available, but managers aren't successfully applying data to business decisions, and that’s where real return on investment (ROI) always comes from. Storing the data, that’s just an insurance policy for future use. The concept of crowdsourcing data, or tapping into experts around the data, gives us an opportunity to bring humans into the equation of establishing trust in data. Machine-learning techniques can be used to find patterns and clean the data. But to really trust data as a foundation for decision making human experts are needed to add business context and show how data can be used and applied to solving real business problems. Gardner: Usually, when you're employing people like that, it can be expensive and doesn't scale very well. How do you manage the fit-for-purpose approach to crowdsourcing where you're doing a service for them in terms of getting the information that they need and you want to evaluate that sort of thing? How do you balance that? Using human experts McReynolds: The term  "crowdsourcing" can be interpreted in many ways. The approach that we’ve taken at Alation is that machine learning actually provides a foundation for tapping into human experts. We go out and look at all of the log data in an organization. In particular, what queries are being used to access data and databases or Hadoop file structures. That creates a foundation of knowledge so that the machine can learn to identify what data would be useful to catalog or to enrich with human experts in the organization. 
That's essentially a way to prioritize how to tap into the number of humans that you have available to help create context around that data. That’s a great way to partner with machines, to use humans for what they're good for, which is establishing a lot of context and business perspective, and use machines for what they're good for, which is cataloging the raw bits and bytes and showing folks where to add value. Gardner: What are some of the business trends that are driving your customers to seek you out to accomplish this? What's happening in their environments that requires this unique approach of the best of machine and crowdsourcing and experts? McReynolds
  • 3. McReynolds: There are two broader industry trends that have converged and created a space for a company like Alation. The first is just the immense volume and variety of data that we have in our organizations. If it weren’t the case that we're adding additional data storage systems into our enterprises, there wouldn't be a good groundwork laid for Alation, but I think more interestingly perhaps is a second trend and that is around self-service business intelligence (BI). So as we're increasing the number of systems that we're using to store and access data, we're also putting more weight on typical business users to find value in that data and trying to make that as self-service a process as possible. That’s created this perfect storm for a system like Alation which helps catalog all the data in the organization and make it more accessible for humans to interpret in accurate ways. Gardner: And we often hear in the big data space the need to scale up to massive amounts, but it appears that Alation is able to scale down. You can apply these benefits to quite small companies. How does that work when you're able to help a very small organization with some typical use cases in that size organization? McReynolds: Even smaller organizations, or younger organizations, are beginning to drive their business based on data. Take an organization like Square, which is a great brand name in the financial services industry, but it’s not a huge organization in and of itself, or Inflection or Invoice2go, which are also Alation customers. We have many customers that have data analyst teams that maybe start with 5 people or 20 people. We also have customers like eBay that have closer to a thousand analysts on staff. What Alation provides to both of those very different sizes of organizations is a centralized place, where all of the information around their data is stored and made accessible. 
Even if you're only collaborating with three to five analysts, you need the ability to share your queries, to communicate about which queries addressed which business problems, which tables from your Vertica database were appropriate for that, and maybe which Hive tables on your Hadoop implementation you could easily join to those Vertica tables. That type of conversation is just as relevant in a 5-person analytics team as it is in a 1,000-person analytics team.

Gardner: Stephanie, if I understand it correctly, you have a fairly horizontal capability that could apply to almost any company and almost any industry. Is that fair, or is there more specialization or customization that you apply to make it more valuable, given the type of company or type of industry?

Generalized technology

McReynolds: The technology itself is a generalized technology. Our founders come from backgrounds at Google and Apple, companies that have developed very generalized computing platforms to address big problems. So the way the technology is structured is general.
The organizations that get the most value out of an Alation implementation are data-driven organizations that have made a strategic investment to use analytics in business decisions and have incorporated that into the strategic vision of the company. So even when we work with very small organizations, they are organizations that make data, and the analysis of data, a priority. Today, that's not every organization out there; not every mom-and-pop shop is going to have an Alation instance in its IT organization.

Gardner: Fair enough. Those data-driven organizations have a real benefit to gain by doing this well, and, as I understand it, they also want to get as much data involved as possible, regardless of its repository, its type, the silo, the platform, and so forth. What have you had to do to satisfy that need for disparity and variety across these data types? What was the challenge in getting to all the types of data you can then apply your value to?

McReynolds: At Alation, we see the variety of data as a huge asset, rather than a challenge. If you're going to segment the customers in your organization, every event and every interaction with those customers becomes relevant to understanding who that individual is and how you might be able to personalize offerings, marketing campaigns, or product development for those individuals.

That does put some burden on our organization, as a technology organization, to be able to connect to lots of different types of databases, file structures, and places where data sits in an organization. So we focus on being able to crawl those source systems, whether they're places where data is stored or BI applications that use that data to execute queries.
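In rough sketch form, a crawler of the kind McReynolds describes might iterate over registered connectors, one per source system, and fold what each one finds into a single catalog. All of the names below are invented for illustration; this is not Alation's actual code or API.

```python
# Hypothetical sketch: cataloging heterogeneous sources. Each connector
# knows how to list the tables (or reports) in its system; the crawler
# folds them into one flat, searchable catalog.

class Connector:
    def __init__(self, system, tables):
        self.system = system
        self._tables = tables        # stand-in for a live connection

    def list_tables(self):
        return list(self._tables)

def crawl(connectors):
    """Build a catalog of (system, table) entries across all sources."""
    catalog = []
    for c in connectors:
        for t in c.list_tables():
            catalog.append({"system": c.system, "table": t})
    return catalog

catalog = crawl([
    Connector("vertica", ["public.orders"]),
    Connector("hive", ["logs.clicks"]),
    Connector("tableau", ["weekly_sales_dashboard"]),  # a BI app as a source
])
print(len(catalog))   # 3
```

The point of the design is that adding a new kind of source only means writing one more connector; the catalog itself stays uniform.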
A third important data source for us, one that may be a bit hidden in some organizations, is all the human information that's created: the metadata that's often stored in wiki pages, business glossaries, or other documents describing the data stored in various locations. We crawl all of those sources as well and provide an easy way for individuals to use that information within their daily interactions with data.

Typically, our customers are analysts who are writing SQL queries. All of that context about how to use the data is surfaced to them automatically by Alation within their query-writing interface, so that they can save anywhere from 20 percent to 50 percent of the time it takes to write a new query in their day-to-day jobs.

Gardner: How is your solution architected? Do you take advantage of cloud when appropriate? Are you mostly on-premises, using your own data centers, some combination, and where might that head in the future?
Agnostic system

McReynolds: We're a young company. We were founded about three years ago, and we designed the system to be agnostic as to where you want to run Alation. We have customers who are running Alation in concert with Redshift in the public cloud. We have financial services customers that hold a lot of personally identifiable information (PII) and have privacy and security concerns; they typically run an on-premises Alation instance. We architected the system to operate in different environments and to catalog data that is both in the cloud and on-premises at the same time.

The way we do that, architecturally, is that we don't replicate or store data within Alation systems. We use metadata to point to the location of that data. When an analyst runs a query from our recommendations, that query is pushed down to the source system, on-premises or in the cloud, wherever the data is stored.

Gardner: And how did HP Vertica come to play in that architecture? Did it play a role in the ability to be agnostic, as you describe it?

McReynolds: We use HP Vertica in one portion of our product that allows us to provide, essentially, BI on the BI that's happening. Vertica is a fundamental component of our reporting capability, called Alation Forensics, which IT teams use to find out how queries are actually being run on data source systems, which back-end database tables are being hit most often, and what that says about the organization and those physical systems. It gives the IT department insight; day-to-day, Alation is typically more of a business person's tool for interacting with data.

Gardner: We've heard from HP that they expect that IT-department-specific ops-efficiency role and use case to grow. Do you have any sense of the benefits your customers' IT organizations get from that sort of analysis? What's the ROI?
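Before turning to ROI, the "metadata, not data" design just described can be sketched loosely in code: the catalog stores only pointers (owning system plus table location), and a recommended query is pushed down to whichever system owns the data. Every name here is hypothetical; this is an illustration of the pattern, not Alation's real architecture.

```python
# Illustrative pushdown sketch: Alation-style catalogs keep pointers,
# never copies of the data itself.

CATALOG = {
    "sales.orders": {"system": "vertica", "location": "public.orders"},
    "web.clicks":   {"system": "hive",    "location": "logs.clicks"},
}

# Stand-ins for real database drivers; a production system would hold
# live connections here.
EXECUTORS = {
    "vertica": lambda sql: f"[vertica] ran: {sql}",
    "hive":    lambda sql: f"[hive] ran: {sql}",
}

def run_pushdown(table_key, sql_template):
    """Resolve the catalog pointer and execute on the owning system."""
    entry = CATALOG[table_key]
    sql = sql_template.format(table=entry["location"])
    return EXECUTORS[entry["system"]](sql)

print(run_pushdown("sales.orders", "SELECT count(*) FROM {table}"))
# [vertica] ran: SELECT count(*) FROM public.orders
```

Because only pointers live in the catalog, the same code path works whether the owning system is in the cloud or on-premises.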
McReynolds: The benefits of an approach like Alation include getting insight into the behaviors of individuals in the organization. What we've seen at some of our larger customers is that they may have dedicated themselves to a data-governance program in which they want to document every database and every table in their systems, hundreds of millions of data elements.

Using Alation, they were able to identify within days a rank-ordered priority list of what they actually needed to document, versus what they thought they had to document. The cost savings come from taking a very data-driven, realistic look at which projects will produce value for a majority of the business audience, and which could be put on hold so resources are spent more wisely.
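The usage-driven triage McReynolds describes can be sketched minimally: rank tables by how many distinct people queried them recently, and only put widely used tables on the documentation list. The query-log format below is invented for illustration and is not how Alation actually models usage.

```python
# A minimal sketch of usage-based documentation triage.
from collections import defaultdict

def documentation_priority(query_log, min_users=2):
    """query_log: iterable of (table, user) pairs from recent history."""
    users_by_table = defaultdict(set)
    for table, user in query_log:
        users_by_table[table].add(user)
    worth_documenting = [
        t for t, users in users_by_table.items() if len(users) >= min_users
    ]
    # Most-used tables first.
    return sorted(worth_documenting, key=lambda t: -len(users_by_table[t]))

log = [("orders", "ana"), ("orders", "raj"), ("orders", "li"),
       ("legacy_tmp", "ana")]
print(documentation_priority(log))   # ['orders']  (legacy_tmp stays undocumented)
```

A single-user table drops off the list on the theory that its one user already knows what's in it, which is exactly the argument made in the interview.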
One team we were working with found that about 80 percent of their tables hadn't been used by more than one person in the last two years. In that case, if only one or two people are using a table, you don't really need to document it; that individual, or those two individuals, probably know what's there. Spend your time documenting the 10 percent of the system that everybody is using and that everyone will receive value from.

Where to go next

Gardner: Before we close out, any sense of where Alation could go next? Is there another use case or application for this combination of crowdsourcing and machine learning, tapping into all the disparate data and information you can, including the human and tribal knowledge? Where might you go next in terms of where this is applicable and useful?

McReynolds: If you look at what Alation is doing, it's very similar to what Google did for the Internet: cataloging all of the web pages available to individuals and serving them up in meaningful ways. That's a huge vision for Alation, and to be honest, we're just in the early part of that journey. We'll continue to move in that direction, cataloging data for the enterprise and making all of the information stored in an organization easily searchable, findable, and usable.

Gardner: Well, very good. I'm afraid we will have to leave it there. We've been examining how Alation maps across disparate data while employing machine learning and crowdsourcing to help centralize and identify data knowledge. And we've learned how Alation makes data actionable by keeping it up to date and accessible using innovative means.

So a big thank you to our guest. We've been joined by Stephanie McReynolds, Vice President of Marketing at Alation in Redwood City, California. Thank you so much, Stephanie.

McReynolds: Thank you. It was a pleasure to be here.
Gardner: And a big thank you as well to our audience for joining us for this big-data innovation case study discussion. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HPE-sponsored discussions. Thanks again for listening, and come back next time.
Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.