Massive amounts of data are being generated by sources such as cell phones, sensors, and web logs. This ambient data needs to be processed in real time to enable scenarios such as fraud detection, manufacturing process control, and network monitoring. SQL Server StreamInsight provides a platform for processing data streams with low-latency queries, enabling near real-time analytics and action. Key capabilities include filtering, correlating, and aggregating events over windows using a LINQ-like declarative query language.
Making sense of ambient data with SQL Server StreamInsight
1. Petabytes for Peanuts! Making sense of “Ambient Data”: SQL Server StreamInsight. Ing. Eduardo Castro, PhD, Comunidad Windows. ecastro@grupoasesor.net, http://ecastrom.blogspot.com
2. Key Takeaways… Massive shift in how we process data; incredible data volumes; remaking how we discover; changing the scientific method; reducing latency & impedance; extreme scale data processing; stream processing (several views); from “programs” to “queries”; what’s up with this “anti-SQL” stuff anyhow?
3. “Free” Storage Power. 1982: storage cost ~$2000, transfer time 1 day. 1997: storage cost ~$1.00, transfer time ½ hour. 2009: storage cost ~0.1¢, transfer time 8 sec.
4. Ambient Data? Over 84 percent of Americans have cell phones, according to Steve Largent, president and CEO of CTIA. Two trillion minutes were used in 2007, an 18 percent increase over 2006 talk times. More than 48 billion text messages were sent in the month of December 2007, an average of 1.6 billion messages per day. The rate of text messaging represented a 157 percent increase over December 2006 texting. Source: http://www.clickz.com/3628985 Text message traffic in US: 160GB / day, 58TB / year. Voice traffic in US (GSM encoding): 200PB / year.
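The traffic figures on this slide can be sanity-checked with a few lines of arithmetic. The sketch below is a rough reconstruction, not the presenter's calculation: the 100 bytes per message is simply the value implied by 160GB over 1.6 billion messages, and the 13 kbit/s rate is an assumption based on the GSM full-rate codec.

```python
# Back-of-envelope check of the slide's traffic figures (decimal units).
GB, TB, PB = 10**9, 10**12, 10**15

messages_per_day = 1.6e9        # from the slide
bytes_per_message = 100         # assumption: implied by 160 GB / 1.6e9 messages
text_bytes_per_day = messages_per_day * bytes_per_message
text_bytes_per_year = text_bytes_per_day * 365

voice_minutes_per_year = 2e12   # from the slide
gsm_bits_per_second = 13_000    # assumption: GSM full-rate codec bitrate
voice_bytes_per_year = voice_minutes_per_year * 60 * gsm_bits_per_second / 8

print(f"text:  {text_bytes_per_day / GB:.0f} GB/day, "
      f"{text_bytes_per_year / TB:.1f} TB/year")
print(f"voice: {voice_bytes_per_year / PB:.0f} PB/year")
```

Under these assumptions the text figures match the slide, and the voice estimate comes out slightly under the slide's rounded 200PB / year.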
5. The Old World: Data volumes constrained by human typing speed. App & data formed a closed system (App, DB). Assume 200M people in US typing 8 hr / day @ 10K keystrokes / hour: 2TB/hr or ~6PB / year.
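The typing-speed estimate works out as follows. This is a small sketch of the arithmetic only; one byte per keystroke is an assumption, since the slide does not state it.

```python
# Upper bound on human-typed data volume, using the slide's inputs.
TB, PB = 10**12, 10**15

people = 200e6                 # typists in the US (from the slide)
keystrokes_per_hour = 10_000   # from the slide
bytes_per_keystroke = 1        # assumption: one byte per keystroke
hours_per_day = 8              # from the slide

bytes_per_hour = people * keystrokes_per_hour * bytes_per_keystroke
bytes_per_year = bytes_per_hour * hours_per_day * 365

print(f"{bytes_per_hour / TB:.0f} TB/hr, {bytes_per_year / PB:.2f} PB/year")
```

The yearly total lands just under 6PB, which is where the slide's "~6PB / year" comes from.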
6. The Old New World: Available data exploded. Diagram labels: Available Data; Questions to Answer; What data should we throw out?; Design Schema; Design ETL; DW; Nirvana!; What if we have a new question?
7. The New World of Abundant Data: Save All Available Data; Hypothesize; Theorize; Test; New Question to Answer; Algorithmic Processing; Run “query” over data…; Exploit Correlation…; Correlation is Enough!; Analyze reduced data. The CMS front end of the Large Hadron Collider records 1TB/sec! http://blogs.discovermagazine.com/cosmicvariance/2006/09/27/lhc-factoids/ Interesting read: The Petabyte Age: Because More Isn't Just More — More Is Different http://www.wired.com/science/discoveries/magazine/16-07/pb_intro
8. Analyze, Model, Monitor: (1) Event stream both stored and processed. (2) Analysis produces event correlation models. (3) Models installed in event processing engine. (4) Produce real-time alerts and action. Diagram labels: Event Stream; Analysis; Correlation Model; Event Processing Engine; Alerts & Action.
9. Extreme Scale Data Processing: (1) Traditional Data Warehouse: Sources, ETL, DW, Analysis / Reporting; majority of data filtered or discarded. (2) Extreme Scale Data Processing: Non-traditional Sources, DW, Analysis / Reporting, Analysis; all data retained and reprocessed.
10. SQL Server 2008 R2 – StreamInsight Technology: Data volumes are exploding with event data streaming from sources such as RFID, sensors and web logs. The size and frequency of the data make it challenging to store for data mining and analysis. The ability to monitor, analyze and make business decisions in near real time.
11. SQL Server StreamInsight: SQL Server StreamInsight’s ability to derive insights from data streams and act in near real time provides significant business benefits. Some of the possible scenarios include: algorithmic trading and fraud detection for financial services; industrial process control (chemicals, oil and gas) for manufacturing; electric grid monitoring and advanced metering for utilities; click stream web analytics; network and data center system monitoring.
14. Key Concepts: Events: represent the user payload along with temporal characteristics. Streams: sequence of events; flows into (one or more) standing queries in the StreamInsight engine. Queries: operate on event streams; apply desired semantics on events. Adapters: convert custom data from event sources to / from StreamInsight events.
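How the four concepts fit together can be sketched in a few lines of plain Python. This is an illustration of the roles only, not the StreamInsight API: the `Event` class, the adapter and query function names, the CSV line format, and the one-microsecond duration given to point events are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class Event:
    start: datetime   # temporal characteristics: validity interval
    end: datetime
    payload: Dict     # user payload


def csv_input_adapter(lines: Iterable[str]) -> Iterator[Event]:
    """Input adapter: turn raw 'timestamp,sensor,value' lines into events."""
    for line in lines:
        ts, sensor, value = line.strip().split(",")
        start = datetime.fromisoformat(ts)
        # Assumption: a point event lasts one smallest time unit.
        end = start + timedelta(microseconds=1)
        yield Event(start, end, {"sensor": sensor, "value": float(value)})


def high_value_query(stream: Iterable[Event], threshold: float) -> Iterator[Event]:
    """Standing query: pass through events whose value exceeds the threshold."""
    return (e for e in stream if e.payload["value"] > threshold)


def list_output_adapter(stream: Iterable[Event]) -> List[str]:
    """Output adapter: convert result events into strings for a consumer."""
    return [f"{e.start.isoformat()} {e.payload['sensor']} {e.payload['value']}"
            for e in stream]


raw = [
    "2009-11-01T10:00:00,meter-1,48.5",
    "2009-11-01T10:00:01,meter-2,71.2",
    "2009-11-01T10:00:02,meter-1,90.0",
]
alerts = list_output_adapter(high_value_query(csv_input_adapter(raw), 60.0))
print(alerts)
```

Because each stage is a generator, events flow through the query one at a time, which loosely mirrors the idea of a standing query over a stream rather than a one-off query over stored data.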
15. What is CEP? Complex Event Processing (CEP) is the continuous and incremental processing of event streams from multiple sources based on declarative query and pattern specifications with near-zero latency. Diagram labels: Event; request; response; input stream; output stream.
16. Event Processing Scenarios. Chart axes: Latency vs. Aggregate Data Rate (Events/sec). Chart labels: Relational Database Applications; Operational Analytics Applications, Logistics, etc.; Data Warehousing Applications; Web Analytics Applications; Manufacturing Applications; Financial Trading Applications; Monitoring Applications; CEP Target Scenarios.
17. Use Case: Customer Segmentation. Analysis of click streams on MSN.com: web server log streamed into StreamInsight. Categorizing user behavior based on URL: click targets, search keywords. Segmentation of user IDs into markets. Adapting navigational structure and ad placement in real time. Patterns over time windows: user first clicks PageA, then PageB, then PageC within X seconds. High performance requirements: millions of online users; low latency (seconds); possible late events.
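The "PageA, then PageB, then PageC within X seconds" pattern can be illustrated with a small per-user state machine. This is a plain-Python sketch, not how StreamInsight expresses the pattern: the function name, the tuple layout, and the sample data are hypothetical, other clicks are allowed between the pattern's pages, and the input is assumed to arrive in timestamp order (so the late events mentioned on the slide are not handled).

```python
from collections import defaultdict


def detect_sequence(clicks, pattern, within_seconds):
    """Find users who click the pages in `pattern` in order within a time limit.

    clicks: iterable of (timestamp_seconds, user_id, page), sorted by timestamp.
    Returns a list of (user_id, first_click_ts, last_click_ts).
    """
    if len(pattern) < 2:
        raise ValueError("pattern needs at least two pages")
    partial = defaultdict(list)  # user_id -> [(start_ts, next_index), ...]
    matches = []
    for ts, user, page in clicks:
        still_open = []
        for start_ts, idx in partial[user]:
            if ts - start_ts > within_seconds:
                continue  # this partial match ran out of time
            if page == pattern[idx]:
                idx += 1
                if idx == len(pattern):
                    matches.append((user, start_ts, ts))
                    continue
            still_open.append((start_ts, idx))
        if page == pattern[0]:
            still_open.append((ts, 1))  # a new partial match starts here
        partial[user] = still_open
    return matches


clicks = [
    (0, "u1", "PageA"),
    (2, "u2", "PageA"),
    (3, "u1", "PageB"),
    (5, "u1", "PageC"),
    (9, "u2", "PageB"),
    (20, "u2", "PageC"),  # 18 s after u2's PageA: too late
]
found = detect_sequence(clicks, ["PageA", "PageB", "PageC"], within_seconds=10)
print(found)
```

Only `u1` completes the sequence inside the 10-second limit; `u2` reaches PageC after its partial match has expired.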
19. Use Case: NBC Sunday Night Football. Diagram labels (numbered callouts 1 to 4): Telemetry Receiver; StreamInsight; Listener Adapter; GeoTag and group by region; count total events, count session starts, count active sessions; SQL Adapter; PerfCounter Adapter.
20. Use Case: Data Center Power Consumption. Diagram labels (numbered callouts 1 to 3): ETW Input Adapter; Power Meter Input Adapter; Query (three instances); Complex Aggregations / Correlations; Central time series archive; Visualize Process Information.
21. Challenges: How do I … detect interesting patterns? reason about temporal semantics? correlate data? aggregate data? avoid writing custom imperative code? create a runtime environment for continuous and event-driven processing? As a developer, I need a platform!
22. Query Expressiveness: selection of events (filter); calculations on the payload (project); correlation of streams (join); stream partitioning (group and apply); aggregation (sum, count, …) over event windows; ranking over event windows (topK).
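The operators listed above can be illustrated on the contents of a single event window. The sketch below uses plain Python rather than StreamInsight's query language; the sample readings, the field names, and the `racks` lookup are hypothetical, and the join is simplified to a lookup against reference data instead of a temporal join between two streams.

```python
from collections import defaultdict

# Events that fall into one window (hypothetical power readings, in watts).
readings = [
    {"id": "m1", "w": 250.0},
    {"id": "m2", "w": 500.0},
    {"id": "m1", "w": 750.0},
    {"id": "m3", "w": 50.0},
]
racks = {"m1": "rack-A", "m2": "rack-B"}  # data to correlate with

# Selection (filter): keep readings of at least 100 W.
filtered = [r for r in readings if r["w"] >= 100.0]

# Calculation on the payload (project): convert watts to kilowatts.
projected = [{"id": r["id"], "kw": r["w"] / 1000.0} for r in filtered]

# Correlation (join): attach the rack for each meter that has one.
joined = [dict(r, rack=racks[r["id"]]) for r in projected if r["id"] in racks]

# Partitioning and aggregation (group and apply, sum) over the window.
by_rack = defaultdict(list)
for r in joined:
    by_rack[r["rack"]].append(r["kw"])
totals = {rack: sum(values) for rack, values in by_rack.items()}

# Ranking (topK with K = 1): the rack drawing the most power.
top1 = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:1]
print(totals, top1)
```

A stream engine applies the same steps continuously, re-evaluating them as each window closes instead of once over a fixed list.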
23. Query Expressiveness: Projection; Filter; Correlation (Join); Aggregation over windows; Group and Aggregate. Example (group and aggregate over a tumbling window):
    var result = from e in inputStream
                 group e by e.id into eachGroup
                 from win in eachGroup.TumblingWindow(TimeSpan.FromSeconds(10))
                 select new { eachGroup.Key, avg = win.Avg(e => e.W) };
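To make the semantics of the grouped tumbling-window query concrete, here is a minimal plain-Python emulation. It is not the StreamInsight API: the function name `tumbling_avg`, the tuple layout, and the choice to align windows to multiples of the window size starting at time 0 are assumptions for illustration, and a real engine would emit results incrementally instead of after reading all input.

```python
from collections import defaultdict


def tumbling_avg(events, window_seconds):
    """Average `w` per id within fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, id, w).
    Returns {(window_start, id): average_w}.
    """
    acc = defaultdict(lambda: [0.0, 0])  # (window_start, id) -> [sum, count]
    for ts, key, w in events:
        # Each event belongs to exactly one window: [start, start + size).
        window_start = (ts // window_seconds) * window_seconds
        slot = acc[(window_start, key)]
        slot[0] += w
        slot[1] += 1
    return {k: total / count for k, (total, count) in sorted(acc.items())}


events = [
    (1, "a", 10.0),
    (4, "a", 20.0),
    (7, "b", 5.0),
    (12, "a", 40.0),
    (19, "b", 15.0),
]
averages = tumbling_avg(events, window_seconds=10)
print(averages)
```

Events at 1 s and 4 s for id `a` share the first 10-second window and are averaged together, while the event at 12 s falls into the next window and is reported separately.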
24. Conclusion: CEP platform & API; event-triggered, fast computation; API for adapters, queries, applications; declarative LINQ; flexible adapter API; extensible; supportability.
Data volumes are exploding with event data streaming from sources such as RFID, sensors and web logs across industries including manufacturing, financial services and utilities. The size and frequency of the data make it challenging to store for data mining and analysis. The ability to monitor, analyze and act on the data in motion provides a significant opportunity to make more informed business decisions in near real time.
NBC Sunday Night Football: live streaming through Silverlight. Rich client experience, multiple camera angles. Needed: track, monitor, and analyze user behavior, based on Silverlight media analytics.