Article

Enabling Big Data Analytics with Modeling Workbench
Authors: Ravishankar Rajagopalan and Dhanesh Padmanabhan
Data Science Infrastructure (DSI) Team
Data Sciences Group
[24]7 Innovation Labs
Bangalore, India

[24]7 Inc. accumulates several gigabytes of data from web, mobile, chat and IVR
channels every day. Innovation Labs (iLabs), the technology division of [24]7,
provides predictive analytics solutions that improve customer experience. The
Data Sciences Group (DSG) of iLabs is primarily responsible for developing
statistical and machine learning models that predict customer intent. These
models are used to offer contextual chat or self-serve applications on the web
channel and contextual IVR menus on the IVR channel. They reduce the time a
customer needs to locate the information they are seeking, which improves the
overall experience.

The data scientists at DSG must analyze enormous amounts of data to develop new
insights and models that accurately predict customer intent. Customer behavior
keeps evolving and the business landscape of our customers keeps changing, so
the models need constant improvement. This requires continual model monitoring
and regular model updates. The Data Science Infrastructure (DSI) team is
primarily responsible for building scalable analytics products. These products
equip the data scientists with tools to quickly analyze data, develop models and
monitor model performance. Modeling Workbench is one such tool developed by DSI.

What is the Modeling Workbench?
Modeling Workbench is one of the products that DSI conceptualized and developed
in collaboration with the Platform Engineering (PE) team of iLabs. It is
currently being piloted for the web channel. Workbench is a web-based tool that
lets data scientists analyze millions of online customer journeys, develop
quick insights and build models at scale for improved online predictive
targeting. Workbench is expected to support Exploratory Data Analysis (EDA),
model building and validation, and simulation. Model deployment and model
monitoring are supported by other internal tools developed at iLabs. Feedback
from the production systems drives model improvements.

Figure: Modeling Life Cycle (labels: Development, Production, Big Data, Exploratory Data Analysis, Model Building, Model Validation, Model Simulation, Model Deployment, Model Monitoring)

Follow [24]7 India
www.247-inc.com
EDA is the process of applying standard statistical procedures, such as
univariate and bivariate analysis, to extract the variables (features) relevant
to the problem at hand. One example of such a problem is predicting an online
user's purchase intent. These features are then used for model building. Model
building and validation involve implementing several advanced statistical and
machine learning algorithms and picking the best-performing model. Simulation
is used to understand how the model behaves in real time. These phases are
iterative, and a data scientist typically goes through several iterations to
identify the most effective model.
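
The following Python sketch is a rough illustration of the univariate and bivariate analyses described above. It summarizes one numeric feature and computes the purchase rate for each level of a categorical feature. The record layout and the field names (`pages_viewed`, `device`, `purchased`) are hypothetical. The workbench itself runs its analyses as SQL or R inside its database, not in plain Python like this.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical customer journeys: two candidate features and a binary
# target (1 = the visitor made a purchase). None marks a missing value.
journeys = [
    {"pages_viewed": 3, "device": "mobile", "purchased": 0},
    {"pages_viewed": 12, "device": "desktop", "purchased": 1},
    {"pages_viewed": 7, "device": "desktop", "purchased": 0},
    {"pages_viewed": None, "device": "mobile", "purchased": 0},
    {"pages_viewed": 15, "device": "desktop", "purchased": 1},
    {"pages_viewed": 9, "device": "mobile", "purchased": 1},
]


def univariate_summary(records, field):
    """Univariate view of one numeric field: count, missing rate, range, mean."""
    values = [r[field] for r in records if r[field] is not None]
    return {
        "count": len(values),
        "missing_rate": 1 - len(values) / len(records),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


def target_rate_by_level(records, field, target="purchased"):
    """Bivariate view: share of positive targets within each level of a field."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for r in records:
        totals[r[field]] += 1
        positives[r[field]] += r[target]
    return {level: positives[level] / totals[level] for level in totals}


summary = univariate_summary(journeys, "pages_viewed")
rates = target_rate_by_level(journeys, "device")
print(summary)
print(rates)
```

A feature whose levels show clearly different target rates is a candidate input for model building; in this toy data, desktop and mobile differ. At the scale the workbench targets, the same aggregation would run inside the database rather than over an in-memory list.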

The workbench provides customized data analytics functionality at the click of
a button and is expected to save data scientists considerable time and effort.
Because it is highly scalable, the workbench can be used to analyze more than
100 million customer journeys in a few minutes. The workbench also incorporates
best practices for the different phases of modeling and facilitates the
standardization of analyses across DSG.

Benefits of Modeling Workbench
Productivity: reduce the time to analyze data and build models by 50-75%
Scalability: build and simulate models with millions of customer journeys in a few minutes
Standardization: standardize model building and analysis

What is the Technology behind the Workbench?
In the past, data scientists at [24]7 used relational databases together with
statistical modeling and data mining software such as R and Python to analyze
data. They wrote custom SQL scripts against the relational databases to prepare
the datasets. They then moved the prepared datasets to separate computing
infrastructure, where R and Python scripts were used for analysis and model
building. This traditional approach severely limits the size of the data one
can analyze, because most statistical modeling software is memory-bound and
typically loads the full dataset into memory.
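
The limitation above depends on where the computation happens. The Python sketch below contrasts the two patterns on a small table:

- pulling every row out of the database and aggregating in client memory;
- letting the database run the aggregation and return only the summary rows.

An in-memory SQLite database stands in for the analytical database purely for illustration. The table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite is only a stand-in for the analytical database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journeys (device TEXT, purchased INTEGER)")
sample = [("mobile", 0), ("mobile", 0), ("mobile", 1), ("desktop", 1)] * 2500
conn.executemany("INSERT INTO journeys VALUES (?, ?)", sample)


def rate_client_side(conn):
    """Traditional pattern: move all rows to the client, aggregate in memory."""
    fetched = conn.execute("SELECT device, purchased FROM journeys").fetchall()
    totals, positives = {}, {}
    for device, purchased in fetched:
        totals[device] = totals.get(device, 0) + 1
        positives[device] = positives.get(device, 0) + purchased
    return {d: positives[d] / totals[d] for d in totals}, len(fetched)


def rate_in_database(conn):
    """Pushdown pattern: the database aggregates; only summary rows move."""
    fetched = conn.execute(
        "SELECT device, AVG(purchased) FROM journeys GROUP BY device"
    ).fetchall()
    return dict(fetched), len(fetched)


client_rates, client_rows = rate_client_side(conn)
db_rates, db_rows = rate_in_database(conn)
print(client_rows, db_rows)
```

Both functions return the same rates. However, the first transfers 10,000 rows to the client, while the second transfers only 2. That gap grows with the data, which is why running analyses where the data lives scales better.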

Figure: The Modeling Workbench Architecture (components: Weblogs, Big Data Stack, Columnar DB, Workbench Backend, Java Front End, Data Scientists)

The workbench solves these issues by connecting users, through a central
web-based application, to an analytical database built on distributed columnar
database technology. The workbench exposes a standard set of analyses that
execute as server-side SQL or R scripts running directly on the columnar
database. This tight integration of R with columnar database technology allows
for scalable data analytics without the need to move data.
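
Columnar storage helps make in-database analytics cheap, because an aggregate over one column needs to read only that column. The toy Python sketch below stores the same hypothetical session records in two ways, a row layout and a column layout. It then counts how many stored values each layout touches when computing the mean of one field. The sketch ignores compression, distribution across nodes and vectorized execution, which real columnar databases also rely on.

```python
from statistics import mean

# Hypothetical session records in row-oriented form (one tuple per session).
FIELDS = ("session_id", "device", "pages_viewed", "purchased")
ROWS = [
    ("s1", "mobile", 3, 0),
    ("s2", "desktop", 12, 1),
    ("s3", "desktop", 7, 0),
    ("s4", "mobile", 9, 1),
]


def to_columnar(rows, fields):
    """Pivot row tuples into one list per column, as a column store would."""
    return {name: [row[i] for row in rows] for i, name in enumerate(fields)}


def mean_from_rows(rows, fields, column):
    """Row layout: each whole record is scanned to pick out one field."""
    idx = fields.index(column)
    values_read = 0
    picked = []
    for row in rows:
        values_read += len(row)  # the full record is read
        picked.append(row[idx])
    return mean(picked), values_read


def mean_from_columns(columns, column):
    """Column layout: only the requested column is scanned."""
    values = columns[column]
    return mean(values), len(values)


COLUMNS = to_columnar(ROWS, FIELDS)
row_mean, row_reads = mean_from_rows(ROWS, FIELDS, "pages_viewed")
col_mean, col_reads = mean_from_columns(COLUMNS, "pages_viewed")
print(row_mean, row_reads, col_mean, col_reads)
```

With four fields, the column layout reads a quarter of the values that the row layout reads. Weblog tables usually have many more fields, so the ratio is typically larger.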

The distributed columnar database obtains its data from Hive tables, in which
weblogs are transformed daily using Python MapReduce scripts run within Hive.
The workbench itself is a Java-based web application that accesses the data in
the distributed columnar database remotely. The analyses performed by data
scientists are cached in an application database powered by MongoDB. This
ensures quick retrieval of the results of previously saved analyses. The saved
analyses can be shared across the team for effective collaboration.
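
One way to achieve the quick retrieval described above is to key each saved analysis on a canonical form of its definition. Re-running the same analysis with the same parameters then returns the stored result instead of recomputing it. The Python sketch below assumes that an analysis is identified by a type name plus a parameter dictionary. All of the following are hypothetical and not the workbench's actual design:

- the class name `SavedAnalyses`;
- the key scheme;
- the placeholder analysis function;
- the plain dict standing in for the MongoDB collection.

```python
import hashlib
import json


def analysis_key(analysis_type, params):
    """Deterministic key: the same analysis and parameters give the same key,
    regardless of the order in which the parameters were supplied."""
    spec = {"type": analysis_type, "params": params}
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SavedAnalyses:
    """Hypothetical cache of analysis results. A dict stands in for a
    MongoDB collection, where the key could serve as the document _id."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, analysis_type, params, compute):
        key = analysis_key(analysis_type, params)
        if key in self._store:
            self.hits += 1
            return self._store[key]["result"]
        self.misses += 1
        result = compute(**params)
        self._store[key] = {"type": analysis_type, "params": params, "result": result}
        return result


def slow_bivariate(field, day_from, day_to):
    """Hypothetical placeholder for an expensive in-database analysis."""
    return {"field": field, "days": day_to - day_from + 1}


cache = SavedAnalyses()
first = cache.get_or_compute(
    "bivariate", {"field": "device", "day_from": 1, "day_to": 31}, slow_bivariate
)
# Same parameters in a different order: served from the cache.
second = cache.get_or_compute(
    "bivariate", {"day_to": 31, "field": "device", "day_from": 1}, slow_bivariate
)
```

The key depends only on the analysis definition. A teammate who requests the same saved analysis therefore gets the cached result, which also supports the sharing described above.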

Modeling Workbench provides a scalable analytics platform for quickly crunching
data, generating useful insights, and building advanced statistical and machine
learning models. The current version supports the analysis of web channel data.
Future versions are expected to include natural language processing, text
analytics and speech analytics for data obtained from [24]7’s chat and IVR
platforms.

About the Authors
Dhanesh Padmanabhan leads the Data Science Infrastructure team within the
[24]7 Data Sciences Group (DSG). He is responsible for developing the analytics
infrastructure and the prediction platform for DSG. He has 10 years of
experience in marketing analytics at R&D, KPO and consulting companies,
including General Motors R&D, HP Analytics and Marketics Technologies (now
WNS). He holds a Ph.D. in Mechanical Engineering from the University of Notre
Dame.

Ravishankar Rajagopalan is a Principal Analytics Consultant in the [24]7 Data
Science Infrastructure (DSI) team. He is the DSG (Data Sciences Group) lead for
the Modeling Workbench project. Before joining [24]7, he worked with Mu Sigma
and with the Advanced Analytics team at GE Power and Water. He holds a Ph.D. in
Applied Statistics from The Ohio State University.
