5. What is the problem being solved?
Network threats are growing ...
6. What is the problem being solved?
• 2 types of threats – Internal ( Social Unrest & Watch List ) & External ( Hackers )
External hackers, Internal activists
11. SOCs can't see threat patterns … running BLIND
• Being Blind = Risk
• Cannot be blind to patterns anymore
• The capability to “see” patterns previously not seen
• Network activity and behaviour – Firewalls , routers
• Saves lives, provides social stability – Watch List (WL) chatter !
12. Capability to remove “data blindfolds” to “SEE” behavioural patterns key to security
MACHINE DATA KEY TO UNCOVERING SECURITY PATTERNS !
13. What are some “behavioural signatures” ?
1. Sudden increase in YouTube uploads @ night
2. Viral rate of propagation of MMS videos
14. So what does the data look like ?
National content filtering log – 1 billion events/day !
15. Decoding National content filtering logs
Decoding 7 components of the Netsweeper log entry
1329031890 http://photogallery.indiatimes.com/photo/4686985.cms 94.200.107.14 94.200.0.0 Du_Public_IP_Address 0 37
1. Timestamp (EPOCH time stamp)
2. URL requested
3. Source IP
4. Client subnet
5. Client group name
6. Denied flag (0 allowed, 1 denied)
7. URL category (description tbd); 50 categories in the system: Education, Pornography, Phishing, Criminal Skills etc
“23” - related to “Pornography”
“45” - related to “GENERAL”
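The seven-field layout can be turned into a small parser. This is a minimal sketch, assuming the fields are whitespace-separated exactly as in the sample entry and that the epoch value is in seconds (UTC). The names `FilterLogEntry`, `parse_entry` and `KNOWN_CATEGORIES` are illustrative, not part of Netsweeper, and only the two category codes named on the slide are mapped.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Only the two category codes named on the slide; the other codes are not given there.
KNOWN_CATEGORIES = {23: "Pornography", 45: "General"}


@dataclass
class FilterLogEntry:
    timestamp: datetime   # field 1, epoch seconds converted to UTC (assumption)
    url: str              # field 2
    source_ip: str        # field 3
    client_subnet: str    # field 4
    client_group: str     # field 5
    denied: bool          # field 6: 0 allowed, 1 denied
    category: int         # field 7

    @property
    def category_name(self) -> str:
        return KNOWN_CATEGORIES.get(self.category, "unknown")


def parse_entry(line: str) -> FilterLogEntry:
    """Split one log line into its seven components."""
    parts = line.split()
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}")
    epoch, url, source_ip, subnet, group, denied, category = parts
    return FilterLogEntry(
        timestamp=datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        url=url,
        source_ip=source_ip,
        client_subnet=subnet,
        client_group=group,
        denied=(denied == "1"),
        category=int(category),
    )


SAMPLE = ("1329031890 http://photogallery.indiatimes.com/photo/4686985.cms "
          "94.200.107.14 94.200.0.0 Du_Public_IP_Address 0 37")
entry = parse_entry(SAMPLE)
print(entry.timestamp.isoformat(), entry.denied, entry.category_name)
```

At 1 billion events/day a real pipeline would parse in bulk rather than line by line; the sketch only shows the field mapping.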
16. Expand to ingest variety of watched events
OS logs: File Delete Events, User Login Failure Events, Root access Failures, 2 Sigma events
Database logs: Table Drop Events, Table Delete Events, Column Drop Events, Critical Proc recompilation
Application logs: Critical tsn value changes, Master data changes, App login failures, Login at unusual time windows
Web server logs: Search for specific keywords, 2 Sigma event for URLs, Decomp tree - failed requests, Login Failure
CDR logs: Dropped call frequency, Watch List inbound/outbound, Cut calls - poor connection, Call Failure event frequency
IPR logs: Timeout event frequency, Swarm event detected, Dropped IP calls frequency, Failed IP call frequency
Router logs: SMS Capacity events, Unusual SMS traffic events, User defined router events, Compliance related router events
Firewall logs: Odd hour Unsuccessful logins, X happens Y times in Z time, User defined firewall events, Compliance oriented firewall events
Frequency of login failures high in certain pockets
Recency of late night events noticed in certain pockets
Certain corridors experiencing high dropped calls
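A watched event of the form “X happens Y times in Z time” can be sketched as a sliding-window counter per key. This is a minimal illustration, assuming events arrive in timestamp order. The class name `FrequencyRule` and the sample users and thresholds are hypothetical.

```python
from collections import defaultdict, deque


class FrequencyRule:
    """Fire when `threshold` events for one key fall within `window_seconds`."""

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # key -> timestamps still inside the window

    def observe(self, key: str, timestamp: float) -> bool:
        window = self._recent[key]
        window.append(timestamp)
        # Drop events that have aged out of the window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) >= self.threshold


# Hypothetical rule: 3 login failures by the same user within 60 seconds.
rule = FrequencyRule(threshold=3, window_seconds=60)
events = [("alice", 0), ("alice", 20), ("bob", 25), ("alice", 50), ("alice", 200)]
alerts = [(user, ts) for user, ts in events if rule.observe(user, ts)]
print(alerts)
```

Keying the window by user, subnet or tower is what lets the same rule surface the “certain pockets” and “certain corridors” patterns.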
17. Converting raw data into Actionable Intelligence
INTEGRATED EVENT 360 REPOSITORY
SENSE & RESPOND LAYER
LOG FILE INGESTION
MACHINE LEARNING ALGORITHMS ON GRANULAR LOG EVENT DATA
INFER INTENT FROM PATTERNS AND CREATE EVENT PROFILES
LOAD RISK / BEHAVIOR PROFILE TO RULES ENGINE DB
INTERCEPT OR OFFLINE REVIEW OF EVENTS
CONSOLIDATE & REVIEW EVENT INTERCEPTS TO ASSESS EVENT RULE EFFECTIVENESS
MEASURE PATTERN RULE EFFECTIVENESS - TRUE POSITIVE / FALSE POSITIVES
CASE MANAGEMENT WORKFLOW
TELECOM SWITCHES
OTHER DEVICES
• CDR LOG FILES
• IP LOG FILES
• MISC LOG FILES
Holistic Value Chain
BIG DATA REPOSITORY
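The “true positive / false positive” measurement step can be illustrated with a per-rule precision figure computed from reviewed intercepts. This is a sketch under assumptions: the review outcome is a simple boolean per intercept, and the rule names and sample reviews are made up for illustration.

```python
from collections import Counter


def rule_precision(reviewed_intercepts):
    """Return {rule_id: true positives / all reviewed intercepts} for each rule.

    `reviewed_intercepts` is an iterable of (rule_id, is_true_positive) pairs.
    """
    hits, totals = Counter(), Counter()
    for rule_id, is_true_positive in reviewed_intercepts:
        totals[rule_id] += 1
        if is_true_positive:
            hits[rule_id] += 1
    return {rule_id: hits[rule_id] / totals[rule_id] for rule_id in totals}


# Hypothetical review outcomes from the case management workflow.
reviews = [
    ("late_night_logins", True),
    ("late_night_logins", False),
    ("late_night_logins", True),
    ("dropped_calls", False),
]
precision = rule_precision(reviews)
print(precision)
```

Rules whose precision stays low after review are candidates for retuning or removal from the rules engine.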
19. What's the problem we are trying to solve ?
• Travellers are “signalling” to us through the behaviour they exhibit
• The OTA is unable to sense and respond to these varied behaviours
20. Why is it important to solve this problem ?
• Impacts the look-to-book ratio
• Increase revenue from cross sell
26. Does Srikanth have a bias towards any airline ?
Those small clicks reveal a lot !
27. So who is Srikanth?
Do we 'know' him ?
What's his behavioural DNA ?
Key vectors ?
Early bird ( days = 21 )
Price insensitive ( click % = 89 %)
Prefers American Airlines
Most valuable customer ( Decile-1 )
Intra visit interval = 17 days
Visit dispersion = 12 % International
Churn propensity = 0
Bargain hunter = No ( 3 % coupon)
Roadie = Yes ( 28000 miles per qtr )
Sentiment index = 73 %
28. How do we respond in real time to Srikanth's experience and the behavioural patterns we’ve seen ?
• If Srikanth is a high value customer
• If he does not book within 8 min window
• In real time route to high performing agent
• Short circuit the queue
• Extra 10 % discount since he is vulnerable
• If search response time velocity is trending downward
• Signal to beef up infrastructure
• Optimise code base
• Property recommendations
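The first rule above (high value customer, no booking within the 8 min window) could be expressed roughly as follows. This is a sketch only: it assumes “high value” means Decile-1, and the `Session` fields and action names are hypothetical placeholders rather than a real OTA API.

```python
from dataclasses import dataclass
from typing import List

BOOKING_WINDOW_MINUTES = 8    # from the slide
EXTRA_DISCOUNT_PERCENT = 10   # from the slide
HIGH_VALUE_DECILE = 1         # assumption: "high value" means Decile-1


@dataclass
class Session:
    customer_decile: int
    minutes_since_first_search: float
    has_booked: bool


def next_actions(session: Session) -> List[str]:
    """Return the real-time actions to trigger for this session."""
    high_value = session.customer_decile <= HIGH_VALUE_DECILE
    stalled = (not session.has_booked
               and session.minutes_since_first_search > BOOKING_WINDOW_MINUTES)
    if high_value and stalled:
        return [
            "route_to_high_performing_agent",
            "short_circuit_queue",
            f"offer_extra_discount_{EXTRA_DISCOUNT_PERCENT}_percent",
        ]
    return []


print(next_actions(Session(customer_decile=1, minutes_since_first_search=9.5, has_booked=False)))
```

The infrastructure rule (search response time trending downward) would be a separate check on system metrics rather than on the customer session.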
30. What is the problem being solved?
• Internal watch lists
• Can we get e signals in their behavior ?
Call patterns ?
SMS patterns ?
YouTube upload patterns ?
Watched countries ?
Intra watch list chatter ?
Late night communication behavior ?
• Watch list activity intelligence takes 6 weeks
• Bring it down to < 1 day
• Enhance it to make it real time
31. Why is it important to solve this problem ?
• Threat signals are there in telecom and communication logs
• Saves lives !
• Ensures national security !
32. Under the hood
• Remote Authentication Dial-In User Service (RADIUS) provides authentication, authorization and accounting for network access.
• When a user wants to access the Internet, they first have to give their user credentials (in most cases username and password) to a local RADIUS client.
33. Deconstructing RADIUS logs
The IP address of the NAS ( Network Access Server ) that is sending the request
The framed address to be configured for the user
3 time stamps
User Identity
34. RADIUS logs, Netsweeper logs, Subscriber database
Rich security intelligence !
Triangulate from 3 event data pools
35. Co-relating fragmented telecom log files - Info model
Netsweeper log entry: 1329031890 http://photogallery.indiatimes.com/photo/4686985.cms 94.200.107.14 94.200.0.0 Du_Public_IP_Address 0 37
Netsweeper log: URL accessed, Date/time (Day, Week), Client IP address
RADIUS logs: Username, NAS Port Id, Framed IP address
Subscriber Database: Customer type (Enterprise, Residential), Customer ethnicity (Asian, European), Post paid (Yes, No), Status (?)
Access/Device: Smart Phone, Desktop, iPad, Others (business rule to derive access device to be elicited from SME)
Customer browse location: Dubai, ? (location mapping business logic to be elicited from SME)
URL Type: Gaming sites, News sites, Social Networking, Blogs, P2P sites, VPN/VOIP, Others
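One way to picture the co-relation across the three data pools is a join on shared keys. The sketch below rests on assumptions the slides do not spell out: that the Netsweeper client IP equals the RADIUS framed IP for the session active at request time, that two of the RADIUS time stamps mark session start and stop, and that the RADIUS username keys the subscriber database. All field names and sample records are hypothetical.

```python
def triangulate(filter_events, radius_sessions, subscribers):
    """Attach a username and subscriber profile to each content filtering event."""
    enriched = []
    for event in filter_events:
        # Find the RADIUS session that held this IP when the request was made.
        session = next(
            (s for s in radius_sessions
             if s["framed_ip"] == event["client_ip"]
             and s["start"] <= event["timestamp"] <= s["stop"]),
            None,
        )
        username = session["username"] if session else None
        profile = subscribers.get(username, {})
        enriched.append({**event, "username": username, **profile})
    return enriched


# Hypothetical sample records, one per data pool.
filter_events = [
    {"timestamp": 1329031890, "client_ip": "94.200.107.14",
     "url": "http://photogallery.indiatimes.com/photo/4686985.cms"},
    {"timestamp": 1329031900, "client_ip": "94.200.1.1", "url": "http://example.com/"},
]
radius_sessions = [
    {"username": "subscriber-001", "framed_ip": "94.200.107.14",
     "start": 1329030000, "stop": 1329035000},
]
subscribers = {"subscriber-001": {"customer_type": "Residential", "post_paid": True}}

enriched = triangulate(filter_events, radius_sessions, subscribers)
print(enriched[0]["username"], enriched[0]["customer_type"])
```

The linear scan over sessions is only for readability; at national log volumes the same join would run in a distributed store keyed by IP and time.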
36. Calls to watched countries
Intra Watch list Chatter velocity is high
Call patterns reveal malicious intent
37. Watch List Recommender Data Product
Modeling unique behavioural signature
Legend: Entity on watch list; NOT on watch list but high level of interactions
Are people ‘n’ degrees away from the watch list performing 2 sigma activity across multiple call dimensions (SMS, voice, conference) and other behavioural activity ?
CDR fields: From BTN, To TN, Date/Time, Duration, Call type, Approximate tower location which carried the call
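The two ingredients of the recommender question, “n degrees away from the watch list” and “2 sigma activity”, can be sketched separately. This is an illustration under assumptions: calls are treated as an undirected graph built from CDR From/To pairs, and “2 sigma” is read as exceeding the entity's own historical mean by more than two standard deviations. Function names and sample numbers are hypothetical.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev


def degrees_from_watch_list(calls, watch_list, max_degree):
    """Breadth-first search over the call graph; returns {number: degree}."""
    neighbours = defaultdict(set)
    for caller, callee in calls:
        neighbours[caller].add(callee)
        neighbours[callee].add(caller)
    degree = {number: 0 for number in watch_list}
    queue = deque(watch_list)
    while queue:
        current = queue.popleft()
        if degree[current] == max_degree:
            continue  # do not expand beyond the requested depth
        for other in neighbours[current]:
            if other not in degree:
                degree[other] = degree[current] + 1
                queue.append(other)
    return degree


def is_two_sigma(history, latest):
    """True when `latest` exceeds the historical mean by more than 2 std devs."""
    sigma = pstdev(history)
    return sigma > 0 and latest > mean(history) + 2 * sigma


# Hypothetical CDR pairs (From BTN, To TN) and daily call counts.
calls = [("W1", "A"), ("A", "B"), ("B", "C"), ("X", "Y")]
degree = degrees_from_watch_list(calls, watch_list=["W1"], max_degree=2)
spike = is_two_sigma(history=[10, 12, 11, 9, 10], latest=30)
print(degree, spike)
```

A recommender would then surface entities with a small non-zero degree whose activity is flagged on one or more call dimensions.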
40. Mobile funnel data
Analyzing mobile sub-channel behavioural shift to drive revenues for a leading online travel company
41. What's the problem being solved ?
• More applications becoming mobile
• There is a dip in transaction completion rate
• Friction points and hot spots exist
• No way to “see” these hot spots and patterns
49. Graph analysis to monitor money transmission patterns
• Each account can be modelled as a node in a graph
• Behaviour across nodes can be analyzed
• Proxy behaviours can be easily discerned
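As a rough sketch of the account-as-node idea, transfers can be loaded as directed edges and each node's inflow compared with its outflow. The definition of “proxy behaviour” used here (an account that forwards at least 90 % of what it receives) is an assumption made for illustration, not something the slides define, and the accounts and amounts are made up.

```python
from collections import defaultdict


def pass_through_accounts(transfers, min_ratio=0.9):
    """Flag accounts that forward at least `min_ratio` of what they receive.

    `transfers` is an iterable of (source_account, target_account, amount).
    """
    received = defaultdict(float)
    sent = defaultdict(float)
    for source, target, amount in transfers:
        sent[source] += amount
        received[target] += amount
    flagged = []
    for account, total_in in received.items():
        total_out = sent.get(account, 0.0)
        if total_in > 0 and min_ratio <= total_out / total_in <= 1.0:
            flagged.append(account)
    return sorted(flagged)


# Hypothetical transfers: P receives from A and B, then forwards almost everything to Z.
transfers = [
    ("A", "P", 1000.0),
    ("B", "P", 500.0),
    ("P", "Z", 1450.0),
    ("Z", "M", 100.0),
]
suspects = pass_through_accounts(transfers)
print(suspects)
```

A graph database would let the same question be asked over multi-hop paths instead of single nodes.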
51. Lesson-1 : Think “Polyglot persistence”
Logical Business Model: Asset, Sensor, Parameters, Asset tags, Sensor tags, Events
Column family ( HBase/Cassandra ): Heavy duty write workloads
Document db ( MongoDB ): Photos, Videos, text
Graph db ( Neo4j ): Inter relationships
RDBMS ( Oracle ): Low velocity self service
“Different strokes for different folks”
52. Lesson-2 : Think “pattern extraction”
1. Collaborative filtering
2. Text Mining
3. Scoring Models ( Logistic etc )
Embedding one ML process can help SPOT patterns not previously seen
53. Lesson-3 : Think “Baby steps”
• 60-90 day Hadoop Sandbox
• Build quick wins to build momentum
• Pick a few low hanging use cases to demonstrate impact
No Big Bang !
54. Lesson-4 : Think “Data Products”
• Data Product = “Action an end user takes”
• EXAMPLE
• Watch List recommender vs tons of “feel good” graphs
• Next best action vs lots of dials, graphs
Focus on Outcomes more than Analysis
55. Lesson-5 : Think “MVP-Minimum Viable Product”
• Minimalist ... Key is to start simple
• Only core features ... No bells and whistles
• Get feedback from early adopters and enrich features
56. How can Big Data co-exist with existing DW solutions ?
Big Data, Existing DW
57. Lesson-6 : Gracefully Co-exist
Existing DW
OSS, BSS, CRM
ETL
Existing BI tools
Radius logs, IP traffic logs, Comments
File copy / Bulk load / Agent based
Operational App Integration
58. Lesson-7 : Think “Biz backward … NOT Tech forward”
1. What is the business problem you are solving ? Tightly framed ?
2. Why is it important to solve this problem ?
3. What happens if we don't solve this problem ?
4. Is status quo an option ?
5. Is the business pain acknowledged ?
6. How would the end user “feel” when the product is deployed ?
7. Are budgets allocated ?
8. What is the actual use case to solve the pain ?
Connect with business @ a deeper level !