4. What is Big Data?
❏ The 3 Vs
❏ Velocity
❏ Volume
❏ Variety
Image Source: http://akrayasolutions.com/big-data/
5. Where does it originate from?
• Machine logs
• Social media
• Archives
• Traffic information
• Weather data
• Sensor data (IoT)
6. What do I do with it?
Create intelligence:
• Should I take an umbrella to work today?
• What is the best route to go back home?
• What are the current market trends?
• Are my servers running healthily?
7. Protocols and formats used to publish data..
• HTTP
• MQTT
• Zigbee
• Thrift
• Avro
• ProtoBuf
8. How to store the data?
• Relational databases
• Block data stores
-> HDFS
• Column oriented
-> HBase
-> Cassandra
• Document based
-> MongoDB
-> CouchDB
• In-Memory
-> VoltDB
Figure: CAP triangle (A = Availability, C = Consistency, P = Partition tolerance)
9. How to analyse data?
• Two options:
-> Batch processing: Schedule data processing jobs
and receive the processed data later
-> Real-time processing: The queries are executed
and the results are retrieved instantly
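The difference between the two options can be sketched in a few lines of Python. This is only an illustration: the event data, `batch_totals` and `RunningTotals` are hypothetical names made up for this example and do not come from any particular product.

```python
from collections import defaultdict

# Hypothetical sales events: (user, quantity)
EVENTS = [("alice", 2), ("bob", 1), ("alice", 3)]


def batch_totals(events):
    """Batch style: a scheduled job reads the whole stored data set at once."""
    totals = defaultdict(int)
    for user, quantity in events:
        totals[user] += quantity
    return dict(totals)


class RunningTotals:
    """Real-time style: the result is updated as each event arrives."""

    def __init__(self):
        self.totals = defaultdict(int)

    def on_event(self, user, quantity):
        self.totals[user] += quantity
        # The up-to-date answer is available immediately after the event.
        return self.totals[user]
```

In the batch case the answer is only as fresh as the last job run; in the real-time case every incoming event yields a current answer, at the cost of keeping state in memory.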
10. Analysing data..
• Batch processing
-> Apache Hadoop: Map/Reduce processing system
and a distributed file system
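To make the Map/Reduce idea concrete, here is a minimal single-process sketch in plain Python. It does not use the Hadoop API; the function names and the record layout (`brand`, `quantity`) are assumptions chosen for illustration. Hadoop runs the same three phases distributed across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit (key, value) pairs for one input record."""
    yield record["brand"], record["quantity"]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over an in-memory list."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```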
11. Analysing data..
• Batch processing - Data Warehouse
-> Apache Hive - Hadoop based framework for working
on large scale data stores with SQL-like queries
INSERT OVERWRITE TABLE UserTable
SELECT userName,
       COUNT(DISTINCT orderID),
       SUM(quantity)
FROM PhoneSalesTable
WHERE version = "1.0.0"
GROUP BY userName;
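To see what the SELECT part of the Hive query above computes, the same aggregation can be tried on a small in-memory table. The sketch below uses Python's built-in `sqlite3` module rather than Hive, so it only illustrates the query semantics; the column types and the `summarise_sales` helper are assumptions, since the slides do not show the table schema.

```python
import sqlite3


def summarise_sales(rows, version="1.0.0"):
    """Per user: number of distinct orders and total quantity for one version."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema; the slides only name the columns used in the query.
    conn.execute(
        "CREATE TABLE PhoneSalesTable "
        "(userName TEXT, orderID TEXT, quantity INTEGER, version TEXT)"
    )
    conn.executemany("INSERT INTO PhoneSalesTable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        "SELECT userName, COUNT(DISTINCT orderID), SUM(quantity) "
        "FROM PhoneSalesTable WHERE version = ? "
        "GROUP BY userName ORDER BY userName",
        (version,),
    ).fetchall()
    conn.close()
    return result
```

Two rows that share an `orderID` count as one order but both contribute to the quantity sum, which is the effect of `COUNT(DISTINCT ...)` next to `SUM(...)`.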
16. Big Data Architecture with WSO2..
• Data Streams
{
'name':'phone.retail.shop',
'version':'1.0.0',
'nickName': 'Phone_Retail_Shop',
'description': 'Phone Sales',
'metaData':[
{'name':'clientType','type':'STRING'}
],
'payloadData':[
{'name':'brand','type':'STRING'},
{'name':'quantity','type':'INT'},
{'name':'total','type':'INT'},
{'name':'user','type':'STRING'}
]
}
The common stream format used in both CEP and BAM. The stream
definition contains the stream name, the version and the other attributes
that make up the stream.
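A stream definition like the one above can be used to check events before they are published. The following Python sketch is not part of WSO2 BAM or CEP; `stream_id`, `validate_event` and the `TYPE_MAP` table are hypothetical helpers, and the `name:version` identifier format is an assumption made for this example.

```python
STREAM_DEFINITION = {
    "name": "phone.retail.shop",
    "version": "1.0.0",
    "nickName": "Phone_Retail_Shop",
    "description": "Phone Sales",
    "metaData": [{"name": "clientType", "type": "STRING"}],
    "payloadData": [
        {"name": "brand", "type": "STRING"},
        {"name": "quantity", "type": "INT"},
        {"name": "total", "type": "INT"},
        {"name": "user", "type": "STRING"},
    ],
}

# Assumed mapping from stream attribute types to Python types.
TYPE_MAP = {"STRING": str, "INT": int}


def stream_id(definition):
    """Build an identifier from name and version (assumed format)."""
    return f"{definition['name']}:{definition['version']}"


def validate_event(definition, meta, payload):
    """Return a list of problems; an empty list means the event matches."""
    problems = []
    for section, values in (("metaData", meta), ("payloadData", payload)):
        for attr in definition[section]:
            name = attr["name"]
            if name not in values:
                problems.append(f"{section}.{name} is missing")
            elif not isinstance(values[name], TYPE_MAP[attr["type"]]):
                problems.append(f"{section}.{name} should be {attr['type']}")
    return problems
```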
17. Big Data Architecture with WSO2..
• WSO2 BAM
-> Data Receiver - High performance binary format data
publishing with Apache Thrift, shared with WSO2 CEP
-> Data Storage - Cassandra as a highly scalable data store
-> Data Analyzer - Hive based batch processing
18. Big Data Architecture with WSO2..
• WSO2 BAM..
-> Activity Monitoring: Implemented using a custom indexing
mechanism to instantly search for events of a specific activity in
the system
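The slides do not describe how the custom index is built. As a rough illustration of why an index makes the lookup fast, here is a minimal Python sketch that maps an activity ID to the positions of its events; the `ActivityIndex` class and the event layout are hypothetical and are not BAM's actual implementation.

```python
from collections import defaultdict


class ActivityIndex:
    """Keeps events in arrival order plus an index from activity ID to positions."""

    def __init__(self):
        self._events = []
        self._by_activity = defaultdict(list)

    def add(self, activity_id, event):
        # Record the position of the new event under its activity ID.
        self._by_activity[activity_id].append(len(self._events))
        self._events.append(event)

    def find(self, activity_id):
        """Return the events of one activity without scanning all events."""
        return [self._events[i] for i in self._by_activity.get(activity_id, [])]
```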
19. Big Data Architecture with WSO2..
• WSO2 BAM..
-> Incremental Data Processing - Customized Hive to support
incremental data processing:
@Incremental(name="salesAnalysis", tables="PhoneSalesTable")
SELECT brandName,
       COUNT(DISTINCT orderID),
       SUM(quantity)
FROM PhoneSalesTable
WHERE version = "1.0.0"
GROUP BY brandName;
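The idea behind incremental processing is that each run only reads the rows that arrived since the previous run and merges them into the stored result. The Python sketch below illustrates this with a simple position marker; it is not how the customized Hive is implemented, and `IncrementalSalesAnalysis` is a hypothetical name. Only the `SUM(quantity)` part is shown, because a distinct count would also need the set of order IDs seen so far to be kept.

```python
class IncrementalSalesAnalysis:
    """Keeps a marker of processed rows and a running sum per brand."""

    def __init__(self):
        self.processed = 0           # number of rows already handled
        self.quantity_by_brand = {}  # result carried over between runs

    def run(self, table):
        """Process only rows added since the last run; return how many were read."""
        new_rows = table[self.processed:]
        for brand, quantity in new_rows:
            self.quantity_by_brand[brand] = (
                self.quantity_by_brand.get(brand, 0) + quantity
            )
        self.processed = len(table)
        return len(new_rows)
```

A second run over a table that gained one row reads just that row, while the totals still cover the whole table.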
20. Big Data Architecture with WSO2..
• WSO2 CEP
-> Same data receiver as BAM: this is the point where the same
event is sent to both servers, to BAM for batch processing and
to CEP for real-time processing of the same data streams
-> Real-time in-memory processing, based on the WSO2 Siddhi
engine, with data adapters for receiving and sending events in
different data formats and over different transports, e.g. XML,
JSON, Text, HTTP, JMS, SMTP
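The combination of an input adapter and in-memory processing can be sketched as follows. This is plain Python and not the Siddhi query language or API; `json_adapter` and `LengthWindowSum` are hypothetical names, and the window that keeps the last N events is only one example of the kind of real-time computation such an engine performs.

```python
import json
from collections import deque


def json_adapter(text):
    """Input adapter: turn a JSON message into an event dictionary."""
    return json.loads(text)


class LengthWindowSum:
    """Sums the quantity over the most recent `size` events, held in memory."""

    def __init__(self, size):
        # deque with maxlen drops the oldest value when the window is full.
        self.window = deque(maxlen=size)

    def on_event(self, event):
        self.window.append(event["quantity"])
        return sum(self.window)
```

Each incoming message is converted by the adapter and passed to `on_event`, which returns the current window total straight away.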