This document discusses data science and related topics. It summarizes that data science involves deriving knowledge from large, structured and unstructured data using techniques like data mining, machine learning, and big data analytics. It provides examples of industries that use these approaches for applications such as fraud detection, sales predictions, and recommendations. The document also outlines Deteo's data science service offerings and expertise in areas like recommendation systems, machine learning, and analyzing structured and unstructured data using tools like Hadoop, R, and Python.
2. Data Science
Data science is the process of deriving valuable knowledge from "Big Data" consisting
of structured, unstructured or semi-structured data that large enterprises produce.
3. Big Data
Big data is a set of techniques and technologies that operate on data sizes
beyond the ability of commonly used software tools to capture and manage within a
tolerable elapsed time.
4. Data Mining
Data mining is a process that analyzes a large amount of data to find new
and hidden information that improves business efficiency. Various industries
have adopted data mining in their mission-critical business processes
to gain competitive advantages and help the business grow.
5. Machine Learning
Machine Learning is a process that gives computers the ability to learn without being
explicitly programmed.
Examples: spam filtering, recommendation systems, sales predictions.
6. Business domains
Any kind of data analysis is based on two major components:
technical tools and domain expertise. Deteo has significant practical
experience in the following industries, proven by long-term
cooperation with customers from:
• Banking sector
• Insurance
• Human resource management
• IT and Telecom
• Accounting
• Retail
7. Business challenges we can address
New possibilities for growth depend on the ability to analyze, predict and make
decisions based on existing data related to customers and the market:
Retail
• Market basket analysis to show which combinations of products or services
were purchased or consumed together. This makes it possible to promote and
optimize products and maximize profit.
• Analyze customer retention and loyalty based on recent purchase activity.
• Data mining helps detect fraudulent behavior with credit card or online
transactions
• Clustering/Segmentation for targeted marketing
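As a rough illustration of the market basket analysis mentioned above, the following sketch counts how often pairs of products appear in the same transaction and reports support and confidence for each pair. The baskets, product names, and the function name `pair_rules` are invented for this example; a production system would more likely run a dedicated algorithm such as Apriori or FP-Growth on real purchase data.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Return (a, b, support, confidence of a -> b) for frequent item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support >= min_support:
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    # Highest confidence first, then alphabetical for a stable order.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical baskets, invented for this example.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["butter", "milk"],
]

for a, b, support, confidence in pair_rules(baskets):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Here support is the share of baskets that contain both items, and confidence is the share of baskets with the first item that also contain the second.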
8. Business challenges we can address
Bank and Insurance
• Detect risky behavior of customers
• Claim prediction based on information available from previous events
• Fraud detection
eCommerce
• Collaborative filtering and recommendation systems that make automatic
predictions about the interests of users by collecting preference and taste
information from many similar users of such systems.
• Mining social networks can be applied both to targeted marketing and to
sentiment analysis
• Intranet search to find information and answer questions based on
data available within corporate or organizational networks
• Analysis of streaming/online data to prepare information for further processing
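To make the collaborative filtering item above more concrete, here is a minimal user-based sketch: it measures how similar other users are to a target user (cosine similarity over their ratings) and scores unseen items by similarity-weighted ratings. The users, items, ratings, and the function names `cosine` and `recommend` are assumptions made for illustration, not a description of any particular Deteo system.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def recommend(ratings, user, top_n=3):
    """Rank items the user has not rated by similarity-weighted ratings of others."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical ratings on a 1-5 scale, invented for this example.
ratings = {
    "alice": {"laptop": 5, "mouse": 4},
    "bob": {"laptop": 5, "mouse": 5, "keyboard": 4},
    "carol": {"laptop": 1, "monitor": 5},
}

print(recommend(ratings, "alice"))
```

Because bob's ratings resemble alice's far more than carol's do, the item only bob rated ends up ranked above the item only carol rated.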
10. Approach
As part of our Data Science service offering we are able to carry out the
following activities:
• Comprehensive review of customers’ current business, plans and systems
• Recommendations on connecting data science tools and approaches to
customers’ existing business and IT infrastructure
• Perform Data Analysis
• Data Visualization and Advanced Reporting
• Support and Maintenance or Solution Hand Over
11. Initiation
• Project initiation
• Team setup
• Define business needs
Analysis
• Define business goals in technical metrics
• Analyze current infrastructure
• Analyze existing data
• Analyze level of data sensitivity
• Develop required algorithms
• Validate algorithms on a small portion of data
Data Mining
• Prepare required infrastructure
• Perform data masking of sensitive data
• Run data mining algorithms
Results Analysis
• Root-cause analysis
• Risk assessment
• Recommendations to fix
Reporting
• Transform mined data into graphics, charts and tables understandable for stakeholders
• Plan meeting where prepared reports are presented
Hand Over
• Prepare knowledge transfer plan
• Prepare technical and business documentation
• Provide training for customer’s experts
• Hand over developed solution to customer
Iteration cycle: 3-6 weeks
Regular status meetings
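The "data masking of sensitive data" step in the Data Mining phase could look roughly like the sketch below, which replaces sensitive fields with a keyed hash so that records can still be joined and counted without exposing the original values. The field names, the secret key, and the helper names `mask_value` and `mask_record` are hypothetical; the actual masking rules depend on the sensitivity analysis done for each customer.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key, stored outside the data set
SENSITIVE_FIELDS = {"name", "email"}         # hypothetical field names


def mask_value(value: str) -> str:
    """Replace a value with a keyed hash so equal inputs map to equal tokens."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def mask_record(record: dict) -> dict:
    """Return a copy of the record with sensitive fields pseudonymized."""
    return {
        key: mask_value(str(val)) if key in SENSITIVE_FIELDS else val
        for key, val in record.items()
    }


# Invented record for illustration only.
record = {"name": "Jane Doe", "email": "jane@example.com", "claims": 2}
print(mask_record(record))
```

A keyed hash (HMAC) is used here rather than a plain hash so that someone without the key cannot recompute tokens from guessed values.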
13. Case study: Car insurance
Business challenge
We received historical data on car accidents from an insurance company covering the
last 5 years. The data was anonymized, so it contained no personal information. The
customer asked us to analyze this data. The assumption was that insurance risk was not
equal for different groups of cars.
Our solution
Using the Microsoft cloud technology stack for data analysis, we ran several
experiments and identified groups of cars with equal risk probability. Based on this
information the customer was able to adjust its insurance fee card: for two car groups
the insurance fee was decreased by 10%, which made the customer's offering more
attractive on the market.
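The kind of comparison behind this case study can be sketched as a claim-frequency calculation per car group. The example below is plain Python rather than the Microsoft cloud tooling used in the project, and the group names, exposure figures, and accident counts are invented; the real data set is not part of this document.

```python
from collections import defaultdict


def claim_rate_by_group(policies):
    """Return {group: accidents per insured car-year}.

    Each row is (group, car_years, accidents).
    """
    exposure = defaultdict(float)
    accidents = defaultdict(int)
    for group, car_years, n_accidents in policies:
        exposure[group] += car_years
        accidents[group] += n_accidents
    return {g: accidents[g] / exposure[g] for g in exposure if exposure[g] > 0}


# Invented figures for illustration only.
policies = [
    ("compact", 1000.0, 40),
    ("compact", 500.0, 20),
    ("suv", 800.0, 64),
    ("sports", 200.0, 30),
]

rates = claim_rate_by_group(policies)
overall = sum(p[2] for p in policies) / sum(p[1] for p in policies)
for group, rate in sorted(rates.items(), key=lambda kv: kv[1]):
    label = "below" if rate < overall else "at or above"
    print(f"{group}: {rate:.3f} accidents per car-year ({label} overall {overall:.3f})")
```

Groups whose rate sits clearly below the overall rate are the natural candidates for a lower fee; a real analysis would also check that each group has enough data for the difference to be statistically meaningful.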
14. Case study: Log analysis
Business challenge
We received unstructured logs from a server farm that recorded server
and service activity. The idea was to analyze them, find the most
problematic servers, and investigate the reasons.
Our solution
Using the Apache Hadoop technology stack, we loaded and processed
about 500 GB of text files. As a result, we identified the servers that failed
most often and determined the most probable preconditions of the
faults.
The next step is to implement online log processing and analysis in order
to predict server or service faults.
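The failure count at the core of this case study follows the classic map/reduce pattern: emit one record per failure line, then sum per server. The sketch below shows that pattern in plain Python on a handful of lines; the log line format, server names, and timestamps are assumptions made for illustration, and the actual project ran on a Hadoop cluster rather than a single process.

```python
import re
from collections import Counter

# Hypothetical log line format: "<timestamp> <server> <LEVEL> <message>"
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+(INFO|WARN|ERROR)\s+(.*)$")


def map_line(line):
    """Map step: emit (server, 1) for every ERROR line, nothing otherwise."""
    match = LINE_RE.match(line)
    if match and match.group(3) == "ERROR":
        yield match.group(2), 1


def count_failures(lines):
    """Reduce step: sum the emitted counts per server."""
    totals = Counter()
    for line in lines:
        for server, one in map_line(line):
            totals[server] += one
    return totals


# Invented sample lines; malformed lines are skipped silently.
sample_logs = [
    "2015-03-01T10:00:00 srv-01 INFO service started",
    "2015-03-01T10:05:12 srv-02 ERROR disk write failed",
    "2015-03-01T10:06:40 srv-02 ERROR service crashed",
    "2015-03-01T10:07:02 srv-03 ERROR timeout",
    "malformed line",
]

for server, failures in count_failures(sample_logs).most_common():
    print(server, failures)
```

Keeping the map and reduce steps as separate functions mirrors how the same logic would be split across mapper and reducer tasks in a distributed job.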
15. Data science
• Recommendation systems
• Machine learning
• Visualization
• Data Mining
Stream processing
• IBM InfoSphere Streams
• Oracle Real-Time Decisions
• Apache Storm in MS Azure
NoSQL databases
• MongoDB
• Cassandra
• Neo4j
Hadoop based infrastructure
• Microsoft HDInsight
• Oracle Big Data Appliance
• IBM InfoSphere BigInsights
Tools
• Hadoop, Spark, Hive, Pig
• Azure
• R, Python, Java
Vendors
• Oracle, Microsoft, IBM
• Apache
• QlikView, Tableau
When data becomes a real problem because of its size and variety, it’s time for Big Data solutions
16. Trainings and certifications
Deteo’s data science team has completed the following training courses and certifications:
Coursera
• Machine Learning
• Mining Massive Datasets
• Computing for Data Analysis
• R Programming
Stanford University (online)
• Statistical Learning
Other
• Hadoop: Map Reduce and Big Data
• MongoDB for Developers
• MongoDB for DBAs
17. Interested in learning more about our capabilities?
Please contact us at contact@deteo.info