We all want to analyse and visualise streaming data for real-time operational insights into our applications and infrastructure, and to make more informed decisions. But streaming analytics are hard at scale. Before you realize it, you can end up with a very sophisticated architecture with many moving parts that you need to secure, monitor, and scale independently. In this session, you will learn how to work with real-time data, from ingestion to visualisation and monitoring, at any scale, by leveraging the managed services provided by AWS.
We are trusted by companies for which cyber security is absolutely critical:
• 5/5 top UK banks
• 3/5 top US banks
• 3/5 top Singapore banks
• 4/5 top South African banks
• 5/5 top Nordic banks
Endpoint protection → new cyber security solutions
F-Secure:
• Founded in 1988
• 1,600+ employees
• Listed on NASDAQ OMX Helsinki
• ~30 offices around the globe
• Revenue of €190 million in 2018
• 100,000+ corporate customers and tens of millions of consumer customers
DETECT ATTACKS IN MINUTES WITHOUT DROWNING IN ALERTS
Detection funnel (average numbers from a customer organization with ~1,300 endpoints):
• 2 billion data events/month – collected by endpoint sensors, network sensors, and decoy sensors
• 900,000 suspicious events – real-time behavioral analysis of the raw data events, supported by AI and machine learning (training set: true/false positive decisions by the hunters)
• 25 detections – detections of which the customer was notified, after threat hunters have analyzed the machine-filtered detections
• 15 real threats – the customer confirmed that these were real threats
Analysis pipeline: event enrichment → host & user profiling → anomaly detection → detection significance analysis
4 minutes (for slides 22 and 23)
Hopefully the value of data streaming is very clear at this stage. However, it is important to note that companies face many challenges as they attempt to build out real-time data streaming capabilities and embark on generating real-time analytics.
Data streams are difficult to set up, tricky to scale, hard to make highly available, complex to integrate into broader ecosystems, error-prone and complex to manage over time, and can become very expensive to maintain. These challenges have often been reason enough for many companies to shy away from such projects.
At AWS, our core focus over the last five years has been to build a solution that removes these challenges.
The AWS solution is easy to set up and use; offers high availability and durability (data is replicated across three Availability Zones by default); is fully managed and scalable, reducing the complexity of managing the system over time and of scaling as demand increases; and comes with seamless integration into other core AWS services such as Elasticsearch for log analytics, S3 for data lake storage, Redshift for data warehousing, Lambda for serverless processing, and so on. Finally, with AWS you only pay for what you use, making the solution very cost effective.
Purpose of the slide – breaks out the top 6 benefits of the service.
Supports Open-Source APIs and Tools - We provide an open-source-compatible version of Elasticsearch. If you are currently running self-managed Elasticsearch, you can easily migrate it to the service. We take care of managing the cluster (the undifferentiated heavy lifting), and you can continue to use the same open-source tools and APIs you are already using.
Easy to Use - You use the console, SDK, or CLI to easily create a cluster; we then do the work of deploying the cluster and making it available via an endpoint that you can access through a REST API (see the sketch after this list).
Scalable - We make it very easy to scale your clusters. With just a few commands we will seamlessly deploy a new cluster for you, and you can continue to run uninterrupted.
Secure - We provide a number of different security options. You can use IAM and VPC to secure access to your cluster.
Highly Available - We provide 100% data redundancy across two Availability Zones.
Tightly Integrated with Other AWS Services - On the ingest side you can easily send CloudWatch Logs to Amazon ES, you can use Kinesis Data Firehose to stream data to Amazon ES, and we also offer integration with AWS IoT. For cluster creation, CloudFormation also supports Amazon ES.
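For illustration, a minimal boto3 sketch of creating a domain programmatically (the domain name, version, instance type, and sizes below are placeholder assumptions, not recommendations):

    import boto3

    es = boto3.client('es')  # Amazon Elasticsearch Service control plane

    # Create a small two-node, zone-aware domain; once the domain is
    # active, its endpoint can be used with standard REST tooling.
    response = es.create_elasticsearch_domain(
        DomainName='demo-logs',  # hypothetical name
        ElasticsearchVersion='6.3',
        ElasticsearchClusterConfig={
            'InstanceType': 'm4.large.elasticsearch',
            'InstanceCount': 2,
            'ZoneAwarenessEnabled': True,  # redundancy across two AZs
        },
        EBSOptions={'EBSEnabled': True, 'VolumeType': 'gp2', 'VolumeSize': 20},
    )
    print(response['DomainStatus']['ARN'])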
Purpose of the slide – Gives them the confidence that they are not alone if they use Amazon ES regardless of their vertical.
Key takeaway is that Amazon Elasticsearch Service usage is not isolated to a few verticals, or high-tech companies. Almost all enterprises today are using some form of log analytics and operational monitoring to ensure the success of their business.
4 minutes
So finally – I would like to introduce the AWS services that we have built to enable real-time analytics for our customers. The Kinesis family consists of three core services for data streaming (note we also have a fourth service, Amazon Kinesis Video Streams, which enables customers to stream and analyze video and audio in real time; although we are not covering it today, it is a very exciting capability).
Kinesis Data Streams enables customers to capture and store data
Kinesis Data Analytics allows customers to build real-time applications in SQL or Java (with fully-managed Flink)
And Kinesis Data Firehose enables customers to load streaming data into data stores, data lakes, and data warehouses, and is a very effective way of performing ETL on continuous, high-velocity data. We will go into the details of these services tomorrow during Damian Wylie's session.
Finally, we are very excited to highlight the latest service, announced at re:Invent 2018 and currently in public preview, which has already achieved a run rate of $5 million. Amazon Managed Streaming for Kafka (Amazon MSK) is a fully managed service for Apache Kafka, a highly popular open-source framework for data streaming. Customers who choose Kafka currently manage clusters either on premises or on EC2, with many of the challenges we spoke about earlier. With Amazon MSK, customers can now lift and shift their existing workloads and get the full benefits of a fully managed service, where clusters are set up automatically and can be created or torn down on demand. This is a very exciting opportunity this year, and if you hear of any customers who use Apache Kafka, do mention Amazon MSK and convince them to give it a go.
Another huge advantage of these four services is that they give our customers the flexibility to choose the right streaming technology depending on their use case, needs, and preferences. Damian will discuss this in depth tomorrow, but we are certainly excited to be able to offer our customers choice in this space.
We often get questions from customers regarding when to use Amazon Kinesis Data Streams or Amazon Kinesis Data Firehose.
Amazon Kinesis Data Streams is for use cases that require custom processing, per incoming record, with sub-1 second processing latency, and a choice of stream processing frameworks
Amazon Kinesis Data Firehose is for use cases that require zero administration, ability to use existing analytics tools based on Amazon S3, Amazon Redshift, and Amazon ES, and a data latency of 60 seconds or higher
In many cases customers leverage both services: KDS for real-time event processing, and KDF to load the streaming data into data stores for more thorough analysis.
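To make the difference concrete, a hedged boto3 sketch of writing the same event to each service (the stream and delivery-stream names are hypothetical):

    import json
    import boto3

    kinesis = boto3.client('kinesis')
    firehose = boto3.client('firehose')

    event = json.dumps({'user': 'u-123', 'action': 'click'})

    # KDS: you choose a partition key, and your consumers read and
    # process records themselves with sub-second latency.
    kinesis.put_record(
        StreamName='clickstream',  # hypothetical stream
        Data=event.encode('utf-8'),
        PartitionKey='u-123',
    )

    # KDF: no partition key; Firehose batches records and delivers them
    # to the configured destination (S3, Redshift, Amazon ES) on its own.
    firehose.put_record(
        DeliveryStreamName='clickstream-to-s3',  # hypothetical delivery stream
        Record={'Data': (event + '\n').encode('utf-8')},
    )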
3 minutes
In order to understand the basics of real-time analytics and data streaming, there are five core stages to understand.
Firstly, the source of data – essentially, where is the data coming from? Mobile apps, web clickstreams, application logs, IoT devices, smart devices, and so on.
Data then needs to be ingested into the stream. This requires a solution that can reliably capture data coming from hundreds of thousands of devices into one stream for analysis, and scale as it does so. Damian will dig into some of the details of this tomorrow.
Data is then stored in the order it was received for a set duration of time, and can be replayed repeatedly during this window.
As the data sits in the stream, it can be processed by real-time applications to generate real-time analytics, execute real-time ETL, and deliver the continuous data to an end destination such as a data lake (S3, then analyzed with Athena), a data warehouse (Redshift), or other databases such as DynamoDB.
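To make the store and process stages concrete, a minimal boto3 consumer sketch that reads new records from one shard (the stream name and the single-shard assumption are mine):

    import time
    import boto3

    kinesis = boto3.client('kinesis')

    # Find the first shard of the (hypothetical) stream and start
    # reading only records that arrive from now on.
    stream = kinesis.describe_stream(StreamName='clickstream')
    shard_id = stream['StreamDescription']['Shards'][0]['ShardId']
    iterator = kinesis.get_shard_iterator(
        StreamName='clickstream',
        ShardId=shard_id,
        ShardIteratorType='LATEST',
    )['ShardIterator']

    while True:
        batch = kinesis.get_records(ShardIterator=iterator, Limit=100)
        for record in batch['Records']:
            # Real-time processing / ETL would happen here before results
            # are delivered to S3, Redshift, DynamoDB, etc.
            print(record['Data'])
        iterator = batch['NextShardIterator']
        time.sleep(1)  # stay within the per-shard read throughput limits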
Expected questions:
What is time-based seek?
How exactly does replay work?
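On those two questions: a time-based seek positions a shard iterator at a point in time within the stream's retention window, and replay is simply reading forward from there. Continuing the consumer sketch above (same client and shard_id), a hedged illustration:

    from datetime import datetime, timedelta

    # AT_TIMESTAMP starts the iterator at records added at or after the
    # given time; reading then proceeds with get_records exactly as for
    # live consumption, so the same code replays historical data.
    replay_iterator = kinesis.get_shard_iterator(
        StreamName='clickstream',
        ShardId=shard_id,
        ShardIteratorType='AT_TIMESTAMP',
        Timestamp=datetime.utcnow() - timedelta(hours=6),  # replay last 6 hours
    )['ShardIterator']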
Highlight that we have earned the trust of some of the most demanding industries, such as finance
Explain the multiple reasons why banks focus on security, and what’s driving them to do more today
Explain how broadly we do business with them, and how we can help them
Share the first story here: a notable case from a renowned bank (anonymous, of course), with an incident response & forensics angle.
UNMATCHED NETWORK VISIBILITY BY NETWORK, DECOY & ENDPOINT SENSORS
We have the sensors collecting the relevant data and sending it to our cloud. We apply real-time behavioral analytics and big data analytics to process the data, looking for anomalies from two perspectives: known bad behavior and unknown bad behavior. All anomalies are raised to our experts in the Rapid Detection Center, who further verify them and alert the customer or partner in less than 30 minutes when something critical is discovered. Threat Analysts then walk the customer or partner through the necessary steps to contain and remediate the threat.
Alerts fall into two high-level categories:
High-severity alerts, which are critical, e.g. a strong indication of an ongoing breach. The customer/partner is alerted via phone and email, the case is verified, and the customer's/partner's critical incident response process is initiated when needed.
Medium/low-severity alerts, which are non-critical. Typically these are spy/adware or other potentially unwanted programs discovered on employee PCs.
With more details:
Your organization – what is deployed:
Endpoint sensors: Windows (7 and later, 2008 R2 and later), Mac (macOS 10.11 El Capitan and macOS 10.12 Sierra), Linux (CentOS 6 and 7, RHEL 6 and 7, Debian 7 and 8)
We collect behavioral metadata – not the contents of, for example, document files.
Collected data and privacy considerations are described in the Privacy Policy.
We collect roughly 5 MB of data per typical Windows office user per day; use this to calculate the network impact (e.g. an organization with ~1,300 endpoints generates about 6.5 GB/day, roughly 0.6 Mbit/s averaged over 24 hours).
We are constantly working to reduce the amount of data collected
Honeypots (decoy sensors) are a very low-noise way to build traps for an attacker. Once someone accesses a honeypot, the access is immediately correlated with information from the other sensors to filter out false alarms. If there is a clear pattern suggesting malicious behavior, the customer is alerted.
Honeypots are built on top of Linux and include the necessary components to mimic critical assets. We provide several predefined flavors of honeypot, based on popular setups:
*NIX web server (HTTP, HTTPS, MySQL, SSH)
*NIX ftp (FTP, SSH)
*NIX VoIP server (SIP, SSH)
Windows server (SMB, MSSQL, TFTP)
Windows workstation (SMB services)
It is possible to configure a honeypot with any combination of the services mentioned above.
Threat intelligence – Global and industry specific
All threat intelligence (internal and external) has been connected and implemented into the core of RDS, which is an AI-assisted threat hunting platform. RDS has been built as a natively global, high-performance, low-latency cloud service. The following describes at a high level how it works:
When a sensor is installed, it looks for signs of compromise and then starts collecting relevant behavioral data
Data is sent from the sensors and received by data ingestion front ends (distributed globally)
Received data goes through a very low latency enrichment process, where additional information, for example file/URL reputation, is added to the events
After enrichment, events go through a real-time detection engine that looks for anomalies based on observed behavior; this is done at multiple levels, with data correlation where needed
If a detection is triggered, a baseliner algorithm (machine learning based) is run to filter out potential false positives
If the baseliner's verdict is malicious, the detection is raised to the RDC to be taken further by RDS threat analysts
Depending on the length of the observed behavior, steps 1 to 3 typically take from under a minute to a few minutes.
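Purely as an illustration of the enrich-detect-filter pattern described above (a toy Python sketch, not F-Secure's actual code; the reputation feed, field names, and baseline logic are all invented):

    # Toy sketch of an enrich -> detect -> baseline-filter step.
    REPUTATION = {'evil.example': 'malicious'}  # invented reputation feed

    def enrich(event):
        # Enrichment: attach e.g. URL reputation to the raw event.
        event['url_reputation'] = REPUTATION.get(event.get('url_host'), 'unknown')
        return event

    def anomalous(event, baseline):
        # Anomaly check: flag host/process pairs never observed before.
        key = (event['host'], event['process'])
        first_seen = key not in baseline
        baseline.add(key)
        return first_seen

    baseline = set()
    event = enrich({'host': 'ws-042', 'process': 'powershell.exe',
                    'url_host': 'evil.example'})
    if anomalous(event, baseline) and event['url_reputation'] == 'malicious':
        print('raise detection to RDC analysts:', event)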
We can utilize industry/customer specific detection algorithms depending on the need
Once the data has been analyzed by real-time detection, it goes into big data storage (where possible we always use pseudonymized data)
We utilize stored data for threat hunting in the following ways:
Data is analyzed automatically by various algorithms (for example statistical analytics and machine learning) to find new anomalies, which are then further analyzed and correlated either by other algorithms or by the Threat Analysts. We use both organization-specific and global analytics when applicable.
We conduct threat hunting driven by data science, where new algorithms are tested against sets of data to discover previously unknown threats. The insights gained are transferred into new detection algorithms and new competences for Threat Analysts.
We also use the data to improve our false positive rates and to improve performance, e.g. by collecting less data from the sensors
Rapid Detection Center
At the core of RDS are the cyber security experts. We have three types of skills available:
Threat Analysts (24/7) act as the first level. They constantly monitor the service and hunt for threats. Once they get an indication that something suspicious is happening, they first verify the case by collecting the necessary evidence, then decide on the priority; if the priority is high, the customer/partner is alerted immediately with actionable intelligence. If the case is non-critical, it is written up with remediation guidance and sent to the customer/partner. Threat Analysts also keep the customer/partner up to date on any ongoing investigations.
Incident Responders (24/7). IR personnel are typically involved in complex cases where the customer/partner is not able to manage the case internally. Incident Responders help the customer remotely or on site to contain and remediate the case, and with evidence gathering for legal purposes. We offer both experienced case leaders and technical incident responders. We have also worked together with law enforcement and know how to collect evidence that can be used in court. RDS has been designed so that it can be deployed during an IR case as a threat hunting service, to quickly gain visibility when the customer network has already been breached.
Forensic experts. We are one of very few organizations globally who can handle a very wide range of forensic tasks, ranging from internal networks to deep reverse engineering of unique malware samples. This allows us to handle even the most complicated nation-state attacks and the investigations that follow a breach attempt.
F-Secure first applied machine learning over 10 years ago (2008) in a malware detection engine called Hydra. Other client-side components followed, including DeepGuard, BlackLight, and Gemini, all leveraging a machine-learning-based malware detection engine used in conjunction with client-side behavioral analysis logic. In 2017 F-Secure launched its AI Center of Excellence, which currently applies techniques such as reinforcement learning, GANs, and federated learning.
F-Secure is also an active member of the pan-European SHERPA project, which works to better understand adversarial attacks against machine learning and potential malicious uses of machine learning.
Note: A technical service manager (TSM) is mandatory whenever the reseller partner is not responsible for deployments and first-line support.
The TSM supports customer deployments, ensuring settings are configured correctly, and supports the customer in production together with customer care.
The TSM monitors service levels (product, support), drives feature adoption, and acts as a local escalation point.
This is the Processing Section
Are KDS and KDF the only streams that KDA can work with (with MSK on the roadmap)? Can the output be sent to any of the consumers on slide 17?
Would KDA ever be replaced by another consumer completely? If so, why, and for which use cases? What is the standard architecture here: KDS/KDF -> KDA -> Lambda/ES/EMR -> S3/Redshift/DynamoDB? If so, we should talk about multiple consumers working in a workflow to execute effectively across many use cases.
Is this specific to Java? Why is it in this section? What are the points we are making here? Two processes, one for S3 and one for DynamoDB? keyBy, window, and filter need explanations.
RabbitMQ? Are these the only three sources? What is the message here?
Destinations could be a stream/messaging queue, a data lake/warehouse/database or an analytical service such as Elasticsearch. Anything else? (again is this specific to the Java application?)
We support the majority of the ANSI SQL:2011 standard.
Some customers don’t have normalized or easy to structure data in their streams. These capabilities provide mechanisms to transform data ahead of SQL code (pre-processing).
Some customers don’t have normalized or easy to structure data in their streams. These capabilities provide mechanisms to transform data ahead of SQL code (pre-processing).
Proofs of concept typically take less than a day.
We have a rich amount of content on the website, including case studies, webinars, many blog posts, and technical documentation.
Thank you for listening and I would now like to hand over to Ajit to provide some more insight into sales opportunities.