1
© Jerome Kehrli @ niceideas.ch
Introduction to
Modern Software Architecture
2
Part I – Software Architecture Models
1.1 Introduction to Software Architecture
1.2 Our illustration example
1.3 The Kruchten 4 + 1 View Model
1.4 The OCTO Matrix Approach
Part II - Modern Architectures
2.1 Big Data
2.2 The Death of Moore's Law
2.3 The CAP Theorem
2.4 NoSQL / NewSQL
2.5 Hadoop
2.6 Data Lake
2.7 Streaming Architecture
2.8 Lambda Architecture
2.9 Big Data 2.0 & Kubernetes
2.10 Microservices Architecture
Part III - Takeaways
Agenda
3
1.1 Introduction to Software Architecture
4
Definitions 1/3
A software system's architecture is the set of principal design decisions about the system
Software architecture is the blueprint for a system's construction and evolution
Design decisions encompass the following aspects of the system under development
Structure,
Behaviour,
Interactions,
Non-functional properties
Taylor 2010
"Principal" implies a degree of importance that grants a design decision
"architectural status".
This implies that not all design decisions are architectural. As such, these do not
necessarily impact a system's architecture.
How one defines principal depends on what the stakeholders define as the system
goals.
5
Definitions 2/3
An architecture is
the set of significant decisions about the organization of a software system,
the selection of the structural elements and their interfaces by which the system is
composed
together with their behavior as specified in the collaborations among those elements,
the composition of these structural and behavioral elements into progressively larger
subsystems,
and the architectural style that guides this organization, these elements and their
interfaces, their collaborations, and their composition.
RUP – Rational Unified Process
6
Definitions 3/3
In most successful software projects, the expert developers working on that project have a
shared understanding of the system design. This shared understanding is called
‘architecture’. This understanding includes how the system is divided into components and how
the components interact through interfaces.
Ralph Johnson
Architecture is about stuff that’s hard to change later
Neal Ford
Architecture is about the important stuff
Martin Fowler
7
Sidenotes
Any organization that designs a system (defined broadly) will produce a design whose structure
is a copy of the organization's communication structure.
Melvin E. Conway (Conway's law)
... all models are approximations. Essentially, all models are wrong, but some are useful.
However, the approximate nature of the model must always be borne in mind...
George Box
8
Software Architecture is
A Process : to design a high-level solution
A Product : schemas, models, documentation, prototypes
Means : frameworks, libraries, middleware, etc. to ease implementation
of large systems
A Reality : the working software or Information System
My View
9
Different Kinds of Architecture
Enterprise Architecture
Enterprise Architecture defines the way the enterprise uses several applications.
Metaphor : City Planning / City Map
Focus : Strategy / Business
Some Key Concerns:
- Uncover operational gaps
- Understand data-dependencies across the IT landscape
- Understand interactions between Solutions / Applications
- Streamline the application landscape for optimal performance
- Decommission legacy solutions
- Eliminate redundancies
- Identify and avoid tech risks
Solution / Application Architecture
Application architecture defines the various pieces that compose an application.
Metaphor : Building / House Architecture
Focus : Technology / Functional
Some Key Concerns:
- Define a best-fit solution for identified problems
- Ensure the solution meets functional and non-functional requirements
- Understand how the application supports business capabilities
- Understand functional fit, technical fit and risks
- Implement technical processes for application development
10
Architecture or Design
Spectrum: Architecture, Design, Implementation (from Abstraction to Fine Granularity / Reality)
Architecture
- Process of creating high-level structures of a software system
- Converts the software characteristics into a high-level structure
- Micro-services, serverless, streaming, lambda are some software architecture patterns
- Helps define the high-level structure of the software system
Design
- Process of creating a form of specification of a software artifact that helps implement the software
- Describes all units of a software system to support coding
- Creational, structural and behavioural are some types of software design patterns
- Helps implement the software
11
2 different visions of architecture
12
1.2 Our illustration example
13
Example – product vision canvas – RIA Organizer
14
Example – Story Map - RIA Organizer
15
1.3 The Kruchten 4 + 1 View Model
16
Philippe Kruchten defined a 4+1 Views Model to capture the description of Software
Architecture into multiple complementary views
in 1995 when he was working for Rational Software Corp.
The 4+1 views model is an information organization framework; it consists of logical,
process, development, and physical knowledge of an application, and end-user perspective
information.
A view is an aspect (subpart) of information.
A notation is a way of representing information.
The 4 + 1 Kruchten Views Model
Philippe Kruchten, Architectural Blueprints—The “4+1” View Model of Software Architecture
The “4+1” view model is rather “generic”: other notations and tools can be used, other design
methods can be used, especially for the logical and process decompositions, but we have
indicated the ones we have used with success.
Conceptual / Logic Physical / Operational
Non-functional
Functional
Logical / Structural View Implementation / Development View
Process / Behaviour View Deployment / Physical View
The logical view is concerned with the functionality
that the system provides to end-users.
UML Diagrams used to represent the logical view
include Class diagram, Communication diagram,
Sequence diagram.
The development view illustrates a system from
a programmer's perspective and is
concerned with software management. This
view is also known as the implementation view.
It uses the UML Component diagram to
describe system
components.
UML Diagrams used to
represent the development view
include the Package diagram.
The process view deals with the
dynamic aspects of the system,
explains the system processes and
how they communicate, and focuses on the runtime
behavior of the system. The process view addresses
concurrency, distribution, integrators, performance,
and scalability, etc. UML Diagrams to represent
process view include the Activity diagram.
The physical view depicts the system
from a system engineer's
point-of-view.
It is concerned with the topology of
software components on the physical layer, as
well as communication between these
components.
This view is also known as the deployment view.
UML Diagrams used to represent physical view
include the Deployment diagram.
Use Case / Scenario View
The description of an architecture is illustrated using a
small set of use cases, or scenarios which become a
fifth view. The scenarios describe sequences of
interactions between objects and / or processes. They
are used to identify architectural elements and to
illustrate and validate the architecture design. They also
serve as a starting point for tests of an architecture
prototype. UML Diagram(s) used to represent the
scenario view include the Use case diagram.
Conceptual / Logic Physical / Operational
Non-functional
Functional
Process / Behaviour View
Perspective: System Integrators
Stage: Design
Focus: Process decomposition
Concerns: Performances, Scalability,
Throughput, Synchronization, Concurrency
Artifacts:
- Sequence Diagrams / Activity Diagrams
- Communication / interactions diagrams
- State Machine Diagrams
- Timing Diagrams
Logical / Structural View
Perspective: End Users , Business
Analysts
Stage: Requirements Analysis
Focus: Components / Objects / Services
Model - Decomposition
Concerns: Functionality
Artifacts:
- Functions Schema
- Class / Objects Diagram
- (composite) Structure Diagram
- State Machine
Implementation / Development View
Perspective: Developers,
Designers
Stage: Design
Focus: Subsystem
decomposition
Concerns: Software /
Configuration Management
Artifacts:
- Components Diagram
- Package Diagram
Deployment / Physical View
Perspective: System Engineers
Stage: Design
Focus: Software mapping to
Hardware (deployment)
Concerns: System Topology,
Delivery, Installation,
Communication
Artifacts:
- Deployment diagram
- Network / Cluster topology
(not UML)
Use Case / Scenario View
Perspective: End User
Stage: Putting it all together
Focus: Understandability , usability
Concerns: Feature Decomposition
Artifacts:
- Use-case diagrams
- User Stories (not UML)
- Story Maps (not UML)
19
RIAO Logical View
20
RIAO Process View – Send Email
21
RIAO Process View – Fetch new Emails
22
RIAO Implementation View
23
RIAO Physical View
[Deployment diagram. Zones: Internet, Internal Network. Layers: Presentation, Processing / Business, Integration.
User Computer (User OS): Web browser running the RIAO UI, with Loc. Storage.
RIA Server (Debian Linux, FirewallD, SystemD): Apache Proxy, RIAO Backend on Tomcat (Spring Boot), Open JDK 11 / JVM.
Courier Server (Courier / Debian).
Kubernetes Cluster (Docker, Debian Linux): MongoDB Nodes, K8s service Locator.
Link labels: HTTPS, HTTP, SMTP, POP3.]
24
1.4 The OCTO Matrix Approach
25
In 2010, OCTO Technology designed a matrix that presents a 360° overview of
most, if not all, of the questions, concerns and aspects that need to be answered and
addressed when defining a Software Architecture
The OCTO Architecture Matrix
The questions and concerns are
related to different levels of
architecture:
Functional
Application
Technical
System
They are grouped into different perspectives:
Security
Usage
Services
Data
Exchanges
Security Usage Services Data Exchanges
Procedures / Specifications Schema / Models / Catalogs Technical Documentation
Functional
Perspective CONFORMITY
Procedures and rules aimed at
ensuring the security of the
components
Code of conduct, role rights, user
groups, documented procedures,
disaster recovery plans,
geographical accesses strategy, …
USAGE
Use cases per persona: Customer,
Partners, Advisors, etc.
User profiles, User experience,
work ergonomics, internet and new
channels strategy, digital strategy,
Business cases, Business Use
Cases, etc.
FUNCTIONS
Functions and management
rules within the company
Functional architecture schema,
Functional map, operational
processes, management rules,
calculation rules, user guides,
etc.
INFORMATION
Information handled within the
company
Data architecture, Data
governance, functional dictionary,
data models, data modeling rules,
etc.
PROCESSES AND EXCHANGES
Processes and exchanges
internal and with partners.
Data flows, modeling, internal and
external exchanges, functional
workflows, operational data
workflows, etc.
Application
Perspective
SECURITY
Components aimed at ensuring the
security of the Information System
Authentication, Identification,
Authorization, management of
credentials and access rights,
provisioning, audit trails, security
procedures referential, etc.
USAGE FLOW
Applications / Modules /
Components accessed by users
Business Software Components
accessed by users, internet /
intranet / mobile portals, call
center, BI systems, mail and
internal services, etc.
PROCESSING
Components implementing the
processing of the IS
Application schema and map,
processes model reference,
Business Model, micro-services,
management rules model
reference, etc.
DATA SILOS
Data repositories, data
referentials, etc.
Data referential, Data dictionary,
Data warehouse, Data model
reference, Document model
reference, audit trails, archives,
etc.
DATA FLOWS
Data exchanges processes and
means
Interoperability dictionary,
exchanges standards and formats,
document edition, API reference,
external APIs, etc.
Technical
Perspective
SECURITY FRAMEWORK
Technical means and components
implementing security principles
Authentication mechanisms, Rights
management components,
confidentiality protocols and
means, cryptographic means, etc.
GUI FRAMEWORK
Technical means and components
providing the GUI and user tools
GUI technologies, Reporting
tools, GUI design model, tools and
standards, etc.
SERVICES FRAMEWORK
Technical means for executing the
services
Technological and app server
stacks, processing middleware(s),
technical layers, frameworks,
toolkits, libraries, rules engines,
external components, etc.
DATA FRAMEWORK
Technical means for accessing
and storing data
DBMS, LDAP technologies,
Data modeling tools,
etc.
EXCHANGES FRAMEWORK
Technical means for exchanges
EAI, ESB, ETL, Workflow engines,
API frameworks and libraries, file
transfers, flow design, etc.
System
Perspective
SYSTEM SECURITY
Physical means and tools
implementing the network and
security
LAN, WAN, remote access, VPN,
firewall, DMZ, proxy, I/O hub,
journals, supervision tools,
authentication dongles, etc.
USER DEVICES AND MEANS
User equipment (PC, IP phone,
tablets, ...)
Communication means, user
computer, Office servers, remote
access middleware and software,
etc.
PROCESSING
INFRASTRUCTURE
Processing servers
and middleware
Servers, Datacenters, Load
balancers, Proxies, Clusters,
Monitoring Systems, Clouds, SLAs,
DRP, etc.
STORAGE INFRASTRUCTURE
Data Servers and Middleware
Data Servers, SAN, Archiving,
Robots, Storage Servers, RAID,
etc.
EXCHANGES
INFRASTRUCTURE
Middleware and Tools
Exchange Servers, Clustering
Middleware, Big Data Engines,
SLAs, Transfer Monitors, Service
Contract, DRP, Flow Management,
Replay, Monitoring, etc.
Source : https://fr.slideshare.net/OCTOTechnology/2012-pdj-banque-du-futur-2020
© OCTO Technology
27
RIAO Functional Architecture
[Functional architecture diagram. Rows: Business / Entry Points; User Interactions; Services & Functions. Columns: Global App., Email Application, Calendar Application, Contact App.
Blocks shown include: Login, User Management, Search, Search Mgmt.; Email Management, Folder Management, Attachment Management, Email Search, Email IO, Email Display / Edition, Folder Display / Edition; Calendar management, Appointment Management, Calendar Search, Appointment Mapping, Calendar Display / Edition, Appointment Display / Edition; Contact management, Contact Search, Contact Display / Edition; Text, HTML and RTF Composition and Display.]
28
RIAO Application Architecture
[Application architecture diagram. Layers: Presentation, Business, Integration, Data / Exchanges. Applications: Email Application, Calendar App., Contact App.
RIAO UI: Login Page, Main Page, Profile Edition, Search Page, Folder View / Edit, Email Compos., Email View / Edit, Calendar View / Edit, Appointment Compos., Appointment View / Edit, Contact View / Edit, Text / HTML / RTF Compos., RTF Display, client-side Models (Email, Calendar, Contact), Email Controller, Calendar Controller, Contact Ctr., Search Control., User Ctrl., Loc. Storage.
REST API, APIs and Process Orchestration.
RIAO Backend: Search Service, User Service, Email Service, Calendar Service, Contact Service; Mgmt. modules (Search, User, Folder, Email, Calendar, Appointment, Contact, Attachment); Email Search, Appointment Search, Contact Search; Models (User, Folder, Email, Calendar, Appointment, Contact, Attachment); Email Synchronization, Email IO (Fetch, Send), Appointment Mapping, Deleg., CRUD.
Data stores: GridFS (Attachment files), Folder Model, Email Docs., Appointment Docs., Calendar Model, Contact Documents, User Model.
External systems: SMTP Server, POP3 Store.]
29
RIAO Technical Architecture
[Technical architecture diagram. Tiers: User tier, Processing tier, Integration Tier.
Web browser / RIAO UI: Main Page, Views, Forms, Models, UI Controllers, Obj./JSON Map., JQuery, CKEditor, Bootstrap, Local Store, Sess. Ckie., JAX-RS / HTTPS.
Apache Proxy: SSL Cert., HTTP.
RIAO Back.: Java VM, Spring Boot / Tomcat 8 Runtime, JAX-RS / HTTP, JSON / Object Mapping, Business Services, Business managers, DAOs, IO Management, MongoDB Client, SMTP Client, POP3 Client, Spring Security, Spring Framework, Apache Commons, Linux Debian.
Courier / Debian: SMTP, POP3.]
30
RIAO System Architecture
[Deployment diagram. Zones: Internet, Internal Network. Layers: Presentation, Processing / Business, Integration.
User Computer (User OS): Web browser running the RIAO UI, with Loc. Storage.
RIA Server (Debian Linux, FirewallD, SystemD): Apache Proxy, RIAO Backend on Tomcat (Spring Boot), Open JDK 11 / JVM.
Courier Server (Courier / Debian).
Kubernetes Cluster (Docker, Debian Linux): MongoDB Nodes, K8s service Locator.
Link labels: HTTPS, HTTP, SMTP, POP3.]
31
2.1 Big Data
32
The era of power
Cray 2 / 1985 / ~1.9 GigaFlops
Samsung S6 / 2015 / ~30 GigaFlops
Source : https://pages.experts-exchange.com/processing-power-compared
33
Origins of Big Data : the web giants !
34
Data deluge
5 exabytes of data (5 billion gigabytes) were generated from the first measurements until 2003.
In 2011, this quantity was generated in 2 days.
In 2018, this quantity was generated in 2 minutes.
Source: https://www.emc.com/collateral/analyst-reports/idc-the-digital-universe-in-2020.pdf
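The figures above can be turned into data rates with a few lines of arithmetic. This is a minimal sketch, assuming decimal units (1 exabyte = 1 billion gigabytes, as on the slide); the function and variable names are illustrative only.

```python
GB_PER_EXABYTE = 10**9  # decimal units: 1 EB = 1 billion GB


def rate_gb_per_second(exabytes: float, seconds: float) -> float:
    """Average generation rate, in gigabytes per second."""
    return exabytes * GB_PER_EXABYTE / seconds


rate_2011 = rate_gb_per_second(5, 2 * 24 * 3600)  # 5 EB in 2 days
rate_2018 = rate_gb_per_second(5, 2 * 60)         # 5 EB in 2 minutes
speedup = rate_2018 / rate_2011                   # 2 days / 2 minutes

print(f"2011: {rate_2011:,.0f} GB/s, 2018: {rate_2018:,.0f} GB/s, ratio: {speedup:,.0f}x")
```

The ratio between the two rates is simply the ratio between the two durations (2 days versus 2 minutes).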
35
Our architectures are 30 years old !
[Diagram of a traditional enterprise information system, in two halves.
Operational Information System: Corporate Operational Data (Operational / Live, Audit / Logs, Archived Data, …); Operational Application Space (Online Business Applications, Batch Business Applications, Monitoring / Operation Applications); Internal GUI Space and External GUI Space with a DMZ (Web Apps, Desktop Apps, Mobile Apps).
Analytical Information System / Business Intelligence: Ext. Data, ETL, Staging Database (Cleaning / Cleansing / Enrichment / Remapping), ETL, Datawarehouse (Storage, Historize, Query), ETL, Data Marts, Reporting / Analytics / Querying.]
36
2.2 The Death of Moore's Law
37
Moore's law
“The number of transistors and
resistors on a chip doubles every
24 months”
- Gordon Moore, 1965
38
Technical capacities evolution
For 40 years, IT component capabilities grew exponentially
Moore's law!
Source :
http://radar.oreilly.com/2011/08/building-data-startups.html
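To make the exponential growth concrete, here is a small sketch of the arithmetic, assuming the 24-month doubling period quoted on the previous slide. The function name is illustrative.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Multiplicative growth after `years`, with one doubling per period."""
    return 2 ** (years / doubling_period_years)


# 40 years at one doubling every 2 years means 20 doublings.
print(f"Growth over 40 years: x{growth_factor(40):,.0f}")
```

Twenty doublings give a factor of roughly one million, which is why the curve only looks manageable on a logarithmic scale.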
39
Storage cost evolution
While the unit cost is decreasing…
[Chart: cost per gigabyte of Hard Drive and RAM storage, 1975 to 2015, on a logarithmic scale from $0.01 to $10,000,000. Annotated points: 1982, 5M$/GB; 2012, 5$/GB.]
Source : http://www.mkomo.com/cost-per-gigabyte
40
41
Disk throughput evolution
Issue : throughput grows much more slowly than capacity
How can we read and write ever more data through a pipe that is, relative to the data volume, ever thinner?
Gain : x100 000
Capacity Gain:
x 10’000
In 15 years
Throughput Gain:
x 50
In 15 years
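The consequence of these two gains can be worked out directly: the time needed to read a full disk is capacity divided by throughput, so it grows by the ratio of the two gains. A minimal sketch, using only the two figures from the slide (the function name is an assumption for illustration):

```python
def full_scan_time_factor(capacity_gain: float, throughput_gain: float) -> float:
    """How much longer a full read of the disk takes after both gains apply."""
    return capacity_gain / throughput_gain


factor = full_scan_time_factor(capacity_gain=10_000, throughput_gain=50)
print(f"Reading a whole disk takes about {factor:.0f}x longer than 15 years earlier")
```

This is the quantitative motivation for spreading data over many disks and reading them in parallel.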
42
New architectures and paradigms
Key Idea #1
Since the data is too big to fit on one computer, distribute it among many computers (partitioning / sharding)!
Key Idea #2
Run transactions and computations in parallel on multiple (many!) nodes, and scale the grid of CPU, RAM and HDD up to the multi-datacenter level
Key Idea #3
Move the code to the data node, not the data to the computing node
(Data tier revolution)
43
2.3 The CAP Theorem
44
The early days of digital data …
Before 1960, the data within a Computer Information
System was mostly stored in rather flat files
(sometimes indexed) manipulated by top-level software
systems.
Directly using flat files was cumbersome and painful…
Various needs emerged at the time :
Data isolation
Access efficiency
Data integrity
Reducing the time required to develop brand new
applications
→ Something else was required …
A bit of history …
45
The relational model rules for 40 years !
Enter the Relational Model …
1969 / Edgar F. Codd - RDBMS
Entities as Tables & associations
The relational model reduces redundancy to optimize disk space usage
At the time of its creation
Disk storage was very expensive and limited
The volume of data in Information Systems was rather small
→ avoid redundancy to optimize disk space usage, thanks to guarantees of :
Structure: using normal forms and modeling techniques
Coherence: using transaction principles and mechanisms
E.g. an Exam Grade management app :
To display the subject of a student on their profile screen, one needs to
1. Extract the personal data from the “student” table
2. Fetch the subject ID from the relation table
3. Read the subject title from the “subject” table.
Why, oh why, separate these 2 kinds of information, since in 95% of the use
cases around this data both will always be used together ?!?
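The three-step lookup from the exam grade example can be sketched with Python's built-in sqlite3 module. The table names, column names and sample data below are assumptions made for illustration; only the three-table structure (student, relation table, subject) comes from the slide. The last lines show the denormalized alternative that the closing question hints at.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subject (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE student_subject (student_id INTEGER, subject_id INTEGER);
    INSERT INTO student VALUES (1, 'Alice');
    INSERT INTO subject VALUES (10, 'Mathematics');
    INSERT INTO student_subject VALUES (1, 10);
""")


def profile(student_id: int) -> dict:
    """Normalized model: the profile needs all three tables, joined at read time."""
    row = conn.execute(
        """
        SELECT s.name, sub.title
        FROM student s
        JOIN student_subject ss ON ss.student_id = s.id
        JOIN subject sub ON sub.id = ss.subject_id
        WHERE s.id = ?
        """,
        (student_id,),
    ).fetchone()
    return {"name": row[0], "subject": row[1]}


# Denormalized alternative: the data that is read together is stored together.
student_document = {"_id": 1, "name": "Alice", "subject": "Mathematics"}

print(profile(1))
print(student_document)
```

The normalized form stores each subject title once, at the price of two joins per profile display; the document form repeats the title in every student record but answers the same question with a single read.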
46
The mid and late 2000’s were times of major changes in the IT landscape
Hardware capabilities significantly increased
eCommerce and internet trade, in general, exploded
Some internet companies, the so-called “Web giants” (Yahoo!, Facebook, Google, Amazon,
eBay, Twitter, …), pushed traditional databases to their limits. Those databases are by
design hard to scale
With relational DBMSes, the only way to improve performance is by scaling up, i.e. getting
bigger servers (more CPU, more RAM, more disk, …). One eventually hits a hard limit
imposed by the current technology
The origins of NoSQL
Faster
More storage
More reliable
Investments
Hard limit
From a certain point,
investments yield little
improvement
Database server
Scaling up:
47
By rethinking the architecture of databases, those companies were able to make
them scale at will, by adding more servers to clusters instead of upgrading the
servers.
The servers are not made of expensive, high-end hardware; they are qualified as
commodity servers (or commodity hardware)
The origins (cont’d)
Faster
More storage
More reliable
Investments
Power grows linearly
with the number of
servers (linear
scalability)
Scaling out:
Database cluster
48
This is the essence of Big Data !
With most NoSQL databases, the data is not stored in one place (i.e. on one server). It is distributed
among the nodes of the cluster. When created, an object A is assigned to a node in the cluster. This is
called sharding – the amount of data assigned to a node is called a shard (also called partition)
Having more cluster nodes implies a higher risk of having some nodes crash, or a network outage splitting
the cluster in two. For this reason, and to avoid data loss, objects are also replicated across the cluster
The number of copies, called replicas, can be tuned. 3 replicas is a common figure
Data distribution
[Diagram: four objects A, B, C and D, each stored as three replicas spread over the cluster nodes.]
The objects may move, as nodes crash or new nodes join the cluster, ready to take charge of some of the
objects. Such events are usually handled automatically by the cluster; the operation of shuffling objects
around to keep a fair repartition of data is called rebalancing
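Sharding, replication and rebalancing as described above can be sketched in a few lines. This is a deliberately naive illustration: the placement rule (hash modulo the node count, then the next nodes in sorted order) and all names are assumptions, not the algorithm of any particular database. Real systems typically use consistent hashing or similar schemes so that fewer objects move when the cluster changes.

```python
import hashlib


def replica_nodes(key: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Pick distinct nodes for `key`: a primary chosen by hash, then its successors."""
    ordered = sorted(nodes)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(ordered)
    count = min(replicas, len(ordered))
    return [ordered[(start + i) % len(ordered)] for i in range(count)]


cluster = ["node-1", "node-2", "node-3", "node-4"]
before = {key: replica_nodes(key, cluster) for key in "ABCD"}

# node-2 crashes: recomputing the placement on the surviving nodes is a
# (very crude) form of rebalancing.
survivors = [node for node in cluster if node != "node-2"]
after = {key: replica_nodes(key, survivors) for key in "ABCD"}

for key in "ABCD":
    print(key, before[key], "->", after[key])
```

Each object keeps three replicas before and after the crash, so the loss of one node never removes the last copy of an object.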
49
The CAP Theorem
Consistency
All clients see exactly the same
data at the same time, even in the
presence of an update (ACID
Properties)
Availability
The system continues
to operate and all
clients can see “a
version” of a replica,
even in the presence of
node failure
Partition-
tolerance
The system continues to
operate even when the
system is partitioned (some
nodes are unavailable)
AC CP
AP
Not
Possible
Availability
The cluster is available if a
request made by a client is always
acknowledged by the system, i.e.
it is guaranteed to be taken into
account
That doesn’t mean that the
request is processed
immediately. It may be put on
hold. An available system will
at a minimum acknowledge it
Client
Request
Acknowledgement
?
Partition tolerance
Partition tolerance holds
if a cluster can withstand a
partition, i.e. if it continues to
operate when one or several
nodes disappear (node crashes,
network equipment failures, etc.)
Partition tolerance is related to
availability and consistency, but
it is still different. It states that
the system continues to
function internally (e.g. ensuring
data distribution and
replication), whatever its
interactions with a client
Consistency
Consistency refers to the fact that all replicas
of an entity, identified by a key in the
database, have the same value
whatever the node queried
old version
new version
new version
new version
Client
Update
50
The previous 3 properties, Consistency, Availability and Partition tolerance, are not independent. The CAP
theorem - or Brewer’s theorem - states that a distributed system cannot guarantee all 3 properties at the
same time
This is a theorem. That means it is formally true, but in practice it is less severe than it seems
The system or a client can often choose CA, AP or CP according to the context, and “walk” along the chosen
edge by appropriate tuning
Partition splits happen, but they are rare events (hopefully)
Rule of thumb
Traditional relational DBMSes are CA or CP – consistency is a must, in case of a problem either bring the
cluster down or split it and expect heavy synchronization later
Many NoSQL DBMSes are AP – availability is a must, and with big clusters failures happen so better live with
it. Consistency is only eventual
The CAP theorem
Consistency
Availability Partition-
tolerance
AC CP
AP
Not
Possible
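The trade-off in the rule of thumb above can be illustrated with a toy replica that is cut off from its peers. This is a sketch under simplifying assumptions: a single value, a boolean partition flag, and "AP" / "CP" used as plain labels. It does not model any real database.

```python
class Replica:
    """One node holding a copy of a value, possibly cut off from its peers."""

    def __init__(self, mode: str, value: str) -> None:
        self.mode = mode          # "AP" or "CP" (illustrative labels)
        self.value = value
        self.partitioned = False  # True while the node cannot reach its peers

    def read(self) -> str:
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse rather than risk returning stale data.
            raise RuntimeError("unavailable during partition")
        # Availability first: answer with the local copy, even if it may be stale.
        return self.value


def try_read(replica: Replica) -> str:
    """Return the value read, or a description of the failure."""
    try:
        return replica.read()
    except RuntimeError as error:
        return f"error: {error}"


ap_node = Replica("AP", "old version")
cp_node = Replica("CP", "old version")
ap_node.partitioned = cp_node.partitioned = True

print("AP node:", try_read(ap_node))
print("CP node:", try_read(cp_node))
```

During the partition, the AP node keeps answering with a possibly outdated value, while the CP node gives up availability to avoid serving it.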
51
This is essential !
Consistency refers to the fact that all replicas of an entity, identified by a key in
the database, have the same value whatever the node queried
With many NoSQL databases, the preferred working mode is AP and all-the-time
consistency is sacrificed.
Favoring performance, updates take a little time to propagate across the cluster. When
an entity’s value has just been created or modified, there is a short span during which
the entity is not consistent.
However the cluster guarantees that it will eventually be, when replication has
occurred. This is called eventual consistency
Eventual Consistency
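The short inconsistency window described above can be simulated with a toy cluster in which a write is acknowledged by one replica and propagated to the others later. The class and method names are hypothetical, and replication is triggered by hand here, whereas a real cluster would run it in the background.

```python
class Cluster:
    """Toy cluster: writes land on one replica and reach the others later."""

    def __init__(self, replica_count: int = 3) -> None:
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # (replica_index, key, value) updates not yet applied

    def write(self, key: str, value: str) -> None:
        # The write is acknowledged as soon as the first replica has it.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key: str, replica_index: int):
        return self.replicas[replica_index].get(key)

    def replicate(self) -> None:
        # Apply all queued updates, in order.
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


cluster = Cluster()
cluster.write("A", "old version")
cluster.replicate()

cluster.write("A", "new version")
stale = cluster.read("A", replica_index=2)  # read before replication has run
cluster.replicate()
fresh = cluster.read("A", replica_index=2)  # read after replication has run

print(stale, "->", fresh)
```

Between the write and the replication step, a client querying the third replica still sees the previous value; once replication has run, all replicas agree again.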
52
2.4 NoSQL / NewSQL
53
A NoSQL (originally referring to "non-SQL" or "non-relational") database provides a mechanism for storage
and retrieval of data that is modeled in means other than the tabular relations used in relational databases.
Such databases have existed since the late 1960s, but the name "NoSQL" was only coined in the early 21st
century, triggered by the needs of Web 2.0 companies.
NoSQL databases are increasingly used in Big Data and Real-Time Web applications.
NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query
languages or sit alongside SQL databases in polyglot-persistent architectures.
NoSQL / NewSQL
The fundamental idea behind NoSQL is as follows:
because of the need to distribute data (Big Data), the Web giants have abandoned the whole idea of
ACID transactions (only eventual consistency is possible)
So if we drop ACID Transactions - which we always deemed to be so fundamental - why wouldn't we
challenge all the rest - the relational model and table structure?
Wikipedia - https://en.wikipedia.org/wiki/NoSQL
54
For data fundamentally structured as tabular data and of a
manageable size, the relational model fits.
For instance:
Accounting Data
Customer information
But some other data are modeled in a much more complex way
Geospatial data
Molecular models
Some underlying notions there are fundamentally not relational
Hierarchical data
Several levels of interconnections
In addition, some data models are highly volatile and require
flexibility over time
Information available at the time of the creation of the model is
sometimes incomplete
Or their inherent structure changes over time
The relational model is not well suited for data experiencing constant structural changes
The relational model is not always well suited
55
NoSQL Database Types : 4 families
Document-oriented
(e.g. MongoDB, ElasticSearch)
Key/Value pairs
(e.g. Redis)
Graph
(e.g. Neo4J)
Column-family aka BigTable
(e.g. Cassandra)
56
NoSQL Database Types
Key/Value pairs (e.g. Redis)
One key has one (and only one) value
The value type is not specified (Object value)
Values may have different types
Issue : difficult to fit a model in this modeling pattern
Column-family aka BigTable (e.g. Cassandra)
Row = a set of columns
Sorted vertical storage
Each row in a table is uniquely identified by a key
For a given row, the contents of a column can thus be seen as a hash table
with arbitrary (key, value) pairs
The column-family model looks a bit like the relational model
Operations
Query by key or set of keys
Queries on secondary indexes are allowed
Selection of the resulting columns
Document-oriented (e.g. MongoDB, ES)
Documents are structured data in the form of
hierarchical trees (sub-documents)
Data can be of various types
Strings, numbers, arrays
Documents are self-describing
They contain meta-data about the structure and the
corresponding values
Several storage formats for the document
XML, JSON, BSON
In this model, objects are documents, i.e. trees of
values
Each document has a root and attributes
Attribute values are scalars (integers, strings), lists
or other objects
Each object has a unique ID, a conventional
property whose value serves as a key
Objects are organized into collections. Objects in the
same collection don’t need to have the same schema
– there is no mandatory structure
Graph (e.g. Neo4J)
Based on the interconnection of data (contrary to the other NoSQL
solutions which do not support relations)
Data are not only linked to nodes but also to edges (property graph)
57
Examples of NoSQL data models
Document-oriented (e.g. MongoDB)
{ '_id': 123456,
  'type': 'product',
  'name': 'computer',
  'features': {
    'cpu_GHz': 3,
    'ram_GB': 8,
    'brand': 'Dell'
  }
}
{ '_id': 123457,
  'type': 'product',
  'name': 'blender',
  'features': {
    'rpm': 10000,
    'voltage': '220V 50 Hz'
  }
}
{ '_id': 123458,
  'type': 'user',
  'login': 'choupi92',
  'password': 'AZnx403==',
  'shopping_history': [...]
}
OBJECTS
Key/Value pairs (e.g. Redis)
obj_123456 "type=product;name=computer;cpu_GHz=3;…"
obj_123457 "type=product;name=blender;rpm=10000;…"
obj_123458 "type=user;login=choupi92;password=…"
Graph (e.g. Neo4J)
[Graph diagram: nodes choupi92, computer, blender, hightech and kitchen; two of the edges are labelled "category".]
Column-family aka BigTable (e.g. Cassandra)
PRODUCTS
_id    | name     | features
123456 | computer | cpu_GHz=3, ram_GB=8, brand=Dell
123457 | blender  | rpm=10000, voltage=220V 50 Hz
USERS
_id    | authent                            | shopping_history
123458 | login=choupi92, password=AZnx403== | 08/09/13=…, 10/09/13=…
…
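The difference between the document and key/value representations of the same product can be shown in a few lines of Python. This is only a sketch: the helper names are invented for illustration, and the "key=value;..." encoding simply mimics the strings shown in the key/value example above.

```python
# Document model: the store understands the structure (nested fields, types).
product_document = {
    "_id": 123456,
    "type": "product",
    "name": "computer",
    "features": {"cpu_GHz": 3, "ram_GB": 8, "brand": "Dell"},
}


def to_key_value(doc: dict) -> tuple[str, str]:
    """Key/value model: one key, one opaque string value."""
    flat = {k: v for k, v in doc.items() if k not in ("_id", "features")}
    flat.update(doc.get("features", {}))
    value = ";".join(f"{k}={v}" for k, v in flat.items())
    return f"obj_{doc['_id']}", value


def from_value(value: str) -> dict:
    """The application, not the store, has to parse the value back."""
    return dict(pair.split("=", 1) for pair in value.split(";"))


key, value = to_key_value(product_document)
print(key, value)
print(from_value(value))
```

Note that the round trip loses both the nesting and the types (the CPU frequency comes back as a string), which illustrates why fitting a rich model into key/value pairs is listed as difficult.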
58
59
What is NewSQL ?
NewSQL refers to relational databases that have adopted some NoSQL traits, thus exposing
a relational data model and SQL interfaces to distributed, high volume databases
NewSQL, contrary to NoSQL, enables an application to keep
The relational view on the data
The SQL query language
Response times suited to transactional processing
Some were built from scratch (e.g. VoltDB), others are built on top of a NoSQL data store (e.g. SQLFire,
backed by GemFire, a key/value store)
The current trend is for some proven NoSQL databases, like Cassandra, to offer a thin SQL interface,
achieving the same purpose
Generally speaking, the frontier between NoSQL and NewSQL is a bit blurry… SQL compliance is often
sought after, as the key to integrating legacy SQL software (ETL, reporting) with modern No/NewSQL
databases
NewSQL?
60
2.5 Hadoop
61
Hadoop is an Open Source Platform providing
A distributed, scalable and fault tolerant storage system as a grid
Initially, a single parallelism paradigm : MapReduce to reuse the storage nodes as processing nodes
Since Hadoop V2 and YARN, other parallelization paradigms can be implemented on Hadoop
Schemaless, and optimized for sequential write-once, read-many access
Querying and processing DSL (Hive, Pig)
Hadoop ?
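The MapReduce paradigm mentioned above can be sketched in plain Python with the classic word count. This is not Hadoop code: the function names and the in-memory "partitions" are assumptions for illustration, standing in for data blocks that Hadoop would process on the storage nodes themselves.

```python
from collections import defaultdict


def map_phase(line: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs) -> dict:
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(partitions: list[list[str]]) -> dict:
    # Each partition could be mapped independently, on the node that stores it.
    mapped = [pair for partition in partitions
              for line in partition
              for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


partitions = [["big data big clusters"], ["data lake"]]
counts = word_count(partitions)
print(counts)
```

Because the map step only looks at one line at a time and the reduce step only at one key at a time, both can run in parallel across nodes, which is what lets the storage nodes double as processing nodes.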
Hadoop comes in
different distributions
Apache Foundation
Cloudera
HortonWorks
MapR
IBM
…
Hadoop’s origins
Initiated by Doug Cutting, leader of Lucene
Based on Google’s publications about their
indexing system (GFS / Map Reduce / BigTable )
Official Apache project since 2009
Hadoop was primarily intended for Big Data Analytics
Nowadays Hadoop can be an infrastructure for much more
Microservices architecture (Hadoop V3)
Real-time Architectures
62
Hadoop Distribution
Hadoop overview
Distributed storage
MapReduce processing engine /
Parallel Computing Framework
Querying Orchestration
Machine learning /
Processing
IS
integration
Supervision
and
Management
Reporting
(Core)
63
Hadoop Distribution
Hadoop is an ecosystem
Hadoop
Console
Manager
(Core)
64
Hadoop Architecture
[Diagram: client applications, a master node with a secondary master node, and three slave nodes]
Client Applications (x3)
Master Node: YARN Resource Manager, HDFS Name Node, Map Reduce Job Tracker, HDFS Meta Data, YARN Meta Data
Secondary Master Node
Slave Node 1: HDFS Data Node, Map Reduce Task Tracker, YARN Node Manager, App Master, App Container; R1, R2; P1
Slave Node 2: HDFS Data Node, Map Reduce Task Tracker, YARN Node Manager, App Master, App Container; R1, R3; P2
Slave Node 3: HDFS Data Node, Map Reduce Task Tracker, YARN Node Manager, App Master, App Container; R2, R3; P3
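The R1 / R2 / R3 labels on the three slave nodes suggest that each piece of data is stored on two different nodes. Assuming that reading of the diagram is correct, the following sketch shows how such a layout can be produced and why it tolerates the loss of any single node. `place_replicas` and `survives_failure` are hypothetical helpers; real HDFS placement is rack-aware and more involved.

```python
from itertools import combinations

def place_replicas(blocks, nodes, replication=2):
    """Assign each block to `replication` distinct nodes, cycling over node combinations."""
    placements = list(combinations(nodes, replication))
    layout = {node: [] for node in nodes}
    for index, block in enumerate(blocks):
        for node in placements[index % len(placements)]:
            layout[node].append(block)
    return layout

def survives_failure(layout, failed_node):
    """True if every block is still available once `failed_node` is lost."""
    all_blocks = {block for blocks in layout.values() for block in blocks}
    remaining = {
        block
        for node, blocks in layout.items()
        if node != failed_node
        for block in blocks
    }
    return remaining == all_blocks

layout = place_replicas(["R1", "R2", "R3"], ["node1", "node2", "node3"])
print(layout)
print(all(survives_failure(layout, node) for node in layout))
```

With three blocks, three nodes and two copies per block, the layout matches the labels shown on the slide.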
65
2.6 Data Lake
66
Vision of a data lake
With the continued growth in scope and scale of analytics applications using Hadoop and other data
sources, the vision of an enterprise data lake can become a reality.
In a practical sense, a data lake is characterized by three key attributes:
Collect everything. A data lake contains all data, both raw sources over extended periods of time as well as
any processed data → big volumes
Dive in anywhere. A data lake enables users across multiple business units to refine, explore and enrich data
on their terms → the analytical structures are not known a priori
Flexible access. A data lake enables multiple data access patterns across a shared infrastructure: batch,
interactive, online, search, in-memory and other processing engines.
As a result, a data lake delivers maximum scale and insight with the lowest possible friction and cost.
Data lake
A data lake is a system or repository of data stored in its natural/raw format
It is usually a single store of data including raw copies of source system data, sensor data, social data
etc. and transformed data used for tasks such as reporting, visualization, advanced analytics and
machine learning.
It can include structured data from relational databases, semi-structured data (CSV, logs, XML,
JSON), unstructured data (emails, documents, PDFs) and binary data (images, audio, video).
Wikipedia - https://en.wikipedia.org/wiki/Data_lake
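Storing data in its raw format implies that structure is applied when the data is read, not when it is written. The sketch below illustrates that idea under simple assumptions: `raw_store` is a hypothetical in-memory stand-in for the lake, and the sample payloads are invented for the example.

```python
import csv
import io
import json

# Raw payloads kept exactly as received (hypothetical sample data).
raw_store = [
    {"format": "json", "payload": '{"customer": "alice", "amount": 12.5}'},
    {"format": "csv", "payload": "customer,amount\nbob,7.0\nalice,3.5"},
]

def read_records(entry):
    """Apply a structure at read time, depending on the raw format."""
    if entry["format"] == "json":
        yield json.loads(entry["payload"])
    elif entry["format"] == "csv":
        for row in csv.DictReader(io.StringIO(entry["payload"])):
            yield {"customer": row["customer"], "amount": float(row["amount"])}

def total_per_customer(store):
    """One possible analysis; others can be added later without re-ingesting."""
    totals = {}
    for entry in store:
        for record in read_records(entry):
            name = record["customer"]
            totals[name] = totals.get(name, 0.0) + record["amount"]
    return totals

totals = total_per_customer(raw_store)
print(totals)
```

Because the raw payloads are never altered, a new analysis only needs a new reader, which is what "dive in anywhere" relies on.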
67
Datalake Application Architecture
[Diagram: data flows from ingestion, through the data lake, to publication]
INGESTION: Databases, Raw files, Application logs, External Data / Open APIs, Events / Messages
DATA LAKE (storage): Unstructured Data Storage, Semi-structured data storage (NoSQL), Structured Data storage (e.g. relational)
DATA LAKE (processing): Interactive Querying, Analytics / Processing, Flow Processing, Machine Learning
PUBLICATION: Enterprise DWH, Operational System, Query / Reporting, APIs / Services, Events / messages
68
2.7 Streaming Architecture
69
Definition
A real-time system is an event-driven system that is available, scalable and stable, able
to take decisions (actions) with a latency defined as … below the interval between events
In a streaming architecture …
Historical data is regularly and consistently updated with live data
Live data is available to the end user
Both types of data (historical and live) are not necessarily presented consistently to the
end user
Both sets of data can have their own screens or even applications
A consistent view on both sets of data would be proposed by Lambda Architecture (next topic in
this presentation)
Streaming Architectures
70
Streaming Architecture
[Diagram: components of a streaming architecture]
Capture: Structured Events, Unstructured Events
Transactional Applications, BPM, ESB
Complex Event Processing Engine: in-memory states and calculations (time window, operators, rules); Event/Condition/Action, stream-based querying, multi-dimensional analysis, …; latency: 100 ms; output: decision / action
Rules edition GUI
Cache / Distributed Cache
Reference Data, DWH, Services Querying
Event History
Real-time Data GUI
Historical Data GUI
71
Streaming Architecture
[Diagram: same components as on the previous slide, annotated with the stakes listed below]
Stakes:
- Latency management (< 100 ms)
- Throughput (10,000 msg/sec)
- Memory consumption
- Balancing and replication
- Fault tolerance
- State coherence
- What about lost events?
- Initialization from historical data
Stakes:
- Dynamic GUIs
- Data exploration along axes and criteria
- Real-time GUI: event-driven, of the « web-push » type
Stakes:
- High read performance with respect to latency
- Good cache management
Stakes:
- High capacity
- High write performance
- High performance when querying historical data
- Flexible design abilities
Stakes:
- « WYSIWYG » editor, usable by business users
- « Hot » updates of rules
- Backtesting
Stakes:
- Throughput (10,000 msg/sec)
- Fault tolerance: message retries?
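The in-memory time window and the Event/Condition/Action pattern shown in the diagram can be sketched in a few lines. This is an illustrative toy, not a Complex Event Processing engine: `SlidingWindow`, `process`, the threshold and the sample stream are all assumptions made for the example.

```python
from collections import deque

class SlidingWindow:
    """Keeps the events of the last `size_s` seconds in memory."""

    def __init__(self, size_s):
        self.size_s = size_s
        self.events = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        self.events.append((timestamp, value))
        # Evict the events that fell out of the time window.
        while self.events and self.events[0][0] <= timestamp - self.size_s:
            self.events.popleft()

    def total(self):
        return sum(value for _, value in self.events)

def process(stream, window, threshold):
    """Event/Condition/Action: update state on each event, test a rule, emit an action."""
    actions = []
    for timestamp, value in stream:
        window.add(timestamp, value)              # event
        if window.total() > threshold:            # condition
            actions.append((timestamp, "ALERT"))  # action
    return actions

# Hypothetical stream of (timestamp in seconds, amount) events.
stream = [(0, 40), (2, 30), (5, 50), (20, 10)]
actions = process(stream, SlidingWindow(size_s=10), threshold=100)
print(actions)
```

Keeping only the window in memory is what bounds memory consumption, one of the stakes listed above; the evicted events are the ones an Event History store would have to keep.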
72
2.8 Lambda Architecture
73
Real-Time Analytics
What if I want real-time analytics ?
• Most data analytics tools are batch processing solutions!
• So what happens with updates occurring while a batch is running?
• What happens between two of its executions ?
Objectives:
• Take all the data into account
• Be able to answer any kind of request
• Fault-tolerance
• Robustness to evolutions, errors
• Scalability !
• Low latency for writing AND reading
[Timeline diagram along a Time axis: PROCESSED DATA, followed by DATA THAT CAME AFTER THE START OF THE CURRENT BATCH; the spans are labeled as holding roughly a few minutes to a few hours of data]
74
λ (Lambda) Architecture
To Real-Time Analytics with Near-Real-Time background statistics and models
[Diagram: the DATA STREAM feeds two layers whose results are consolidated by a serving layer]
BATCH LAYER: CONSISTENT BATCH ANALYTICS ON COMPREHENSIVE DATA → STORAGE OF PRE-COMPUTED RESULTS / VIEWS OF THE DATA
SPEED LAYER: REAL-TIME / STREAMING ANALYTICS ON INCREMENTAL DATA → STORAGE OF INCREMENTAL RESULTS / VIEWS OF THE DATA
SERVING LAYER: AGGREGATION, MERGING AND CONSOLIDATION → QUERYING AND REPORTING TOOL
Final latency < 1 second
The batch layer is responsible for consistency and long-term data storage
The speed layer only analyzes the required time window
The gap between the last batch execution and the latest real-time data → only the most recent data.
Both layers produce the same output (unlike usual streaming architectures)
The serving layer provides a consolidated view of both results
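The division of work between the three layers can be made concrete with a small sketch. It assumes a page-view counting use case and hypothetical function names (`batch_view`, `speed_view`, `serve`); the only point is that both layers produce the same kind of output, so the serving layer can merge them.

```python
from collections import Counter

def batch_view(events, batch_end):
    """Batch layer: recompute the view from all events up to the last batch run."""
    return Counter(e["page"] for e in events if e["ts"] <= batch_end)

def speed_view(events, batch_end):
    """Speed layer: only cover the gap since the last batch run."""
    return Counter(e["page"] for e in events if e["ts"] > batch_end)

def serve(batch, speed):
    """Serving layer: merge both views into one consolidated answer."""
    return batch + speed

# Hypothetical event history; `ts` is a logical timestamp.
events = [
    {"ts": 1, "page": "home"},
    {"ts": 2, "page": "cart"},
    {"ts": 3, "page": "home"},
    {"ts": 4, "page": "home"},
]
batch_end = 2  # the last batch run covered events up to ts == 2

view = serve(batch_view(events, batch_end), speed_view(events, batch_end))
print(view)
```

Once the next batch run completes, its view supersedes the speed view for the period it covers, and the speed layer starts again from the new gap.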
75
λ (Lambda) Architecture
Many solutions for all components
[Diagram: same layers as on the previous slide, annotated with candidate technologies. Surviving labels: D3.js, HighCharts, Tableau, Storm DRPC, Java API, Flink]
76
κ (Kappa) Architecture
Recent stream processing technologies make the batch layer less necessary
[Diagram: a single streaming layer feeds the serving layer]
UNIFIED STREAMING LAYER / TECHNOLOGY: DATA STREAM → REAL-TIME / STREAMING ANALYTICS ON INCREMENTAL DATA → STORAGE OF INCREMENTAL RESULTS / VIEWS OF THE DATA, with RELOAD OF PREVIOUS RESULTS / VIEWS OF THE DATA
SERVING LAYER: AGGREGATION, MERGING AND CONSOLIDATION → QUERYING AND REPORTING TOOL
Final latency < 1 second
Kappa architecture is a streaming-first architecture deployment pattern
With the most recent stream processing technologies (Kafka Streams, Flink, etc.), the interest and relevance of the batch
layer tend to diminish. The streaming layer matches the computation abilities of the batch layer (ML, statistics, etc.) and
stores data as it processes it.
A batch layer would only be needed to kick-start the system on historical data (Flink can do that)
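In a Kappa-style setup, the same streaming code handles both live updates and the reprocessing of history, provided the event stream can be replayed. The sketch below assumes a hypothetical `EventLog` class standing in for a durable, replayable stream; it does not use any real Kafka or Flink API.

```python
class EventLog:
    """Append-only log that can be re-read from any offset (stand-in for a durable stream)."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read(self, from_offset=0):
        return iter(self._events[from_offset:])

def build_view(log, from_offset=0, view=None):
    """Single processing path: fold (account, amount) events into balances."""
    view = dict(view or {})
    for account, amount in log.read(from_offset):
        view[account] = view.get(account, 0) + amount
    return view

log = EventLog()
for event in [("a", 10), ("b", 5), ("a", -3)]:
    log.append(event)

live_view = build_view(log)                                 # initial load
log.append(("b", 2))                                        # a new live event
live_view = build_view(log, from_offset=3, view=live_view)  # incremental update
rebuilt_view = build_view(log)                              # full replay, same code path
print(live_view, rebuilt_view)
```

Replaying the log from offset 0 plays the role the batch layer has in a Lambda architecture, without a second code base.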
77
2.9 Big Data 2.0 & Kubernetes
78
Big Data 2.0
[Timeline: 2011, 2012, 2014]
Nowadays, in 2021:
With Hadoop 3, these 3 technologies (Hadoop, Mesos and Kubernetes) tend to converge towards the same possibilities. Hadoop 3 supports
deploying jobs as Docker containers, just as Mesos and K8s do
Mesos and Kubernetes can use alternatives to HDFS such as Ceph, GlusterFS, Minio (and of course Amazon,
Azure, …), etc.
However, Kubernetes (and/or technologies based on Kubernetes) is emerging as a market standard for the
Operational IS, just as Hadoop remains a market standard for the Analytical IS
79
Kubernetes is an Open Source Platform providing
Automated deployment, scaling, failover and management of software applications across clusters
of nodes
Management of application runtime components as Docker containers and application units as Pods
Multiple common services required for service location, distributed volume management, etc. (pretty
much everything one requires to deploy applications on a Big Data cluster)
Kubernetes
Kubernetes is emerging as a
standard
Cloud Operating System
Many distributions
PKS (Pivotal Container Service)
Red-Hat OpenShift
Canonical Kubernetes
Google / AWS / Azure …
…
Kubernetes origins
Based on Google Borg, (one of) Google's
initial cluster management system(s)
Released as an open-source project by
Google in 2014
First official release in 2015
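Automated scaling and failover rest on a simple idea: a control loop repeatedly compares the desired state with the observed state and corrects the difference. The sketch below shows one pass of such a loop in plain Python. It is an illustration only: `reconcile`, the pod names and the health check are hypothetical and do not reflect the actual Kubernetes API.

```python
def reconcile(desired_replicas, running_pods, healthy):
    """One pass of a control loop: return the pods that should run afterwards."""
    # Failover: drop the pods that are no longer healthy.
    pods = [pod for pod in running_pods if healthy(pod)]
    # Scale up: start replacement pods until the desired count is reached.
    next_id = 0
    while len(pods) < desired_replicas:
        name = f"app-{next_id}"
        if name not in pods:
            pods.append(name)
        next_id += 1
    # Scale down: stop surplus pods.
    return pods[:desired_replicas]

running = ["app-0", "app-1", "app-2"]

# app-1 has crashed: the loop replaces it to get back to 3 replicas.
after_failure = reconcile(3, running, healthy=lambda pod: pod != "app-1")

# The desired count drops to 2: the loop removes one pod.
after_scale_down = reconcile(2, running, healthy=lambda pod: True)

print(after_failure, after_scale_down)
```

Because the loop is driven by the desired state rather than by individual commands, running it again when nothing has changed leaves the set of pods untouched.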
80
Kubernetes Architecture
[Diagram: client applications, a master node with an optional secondary master node for HA, and three worker nodes]
Client Applications (x3)
Access: kubectl, Port Forward, Load Balancer, Controller
Master Node (Secondary Master Node [HA]): API Server, Control Plane, Etcd Key-Value Store, Controller Manager
Node 1: Kubelet, Kube-Proxy, cAdvisor, Docker, KubeMQ; PODs running Apps; Volumes: Ceph (CR1, CR2), Gluster (GR1, GR3)
Node 2: Kubelet, Kube-Proxy, cAdvisor, Docker, KubeMQ; PODs running Apps; Volumes: Ceph (CR2, CR3), Gluster (GR1, GR2)
Node 3: Kubelet, Kube-Proxy, cAdvisor, Docker, KubeMQ; PODs running Apps; Volumes: Ceph (CR1, CR3), Gluster (GR2, GR3)
81
2.10 Microservices Architecture
82
Microservice architecture – a variant of the Service-Oriented Architecture (SOA) structural style – arranges an application
as a collection of loosely-coupled services. In a microservices architecture, services are fine-grained and the protocols are
lightweight. Its characteristics are as follows:
Services in a microservices architecture (MSA) are small in size, messaging-enabled, bounded by contexts,
autonomously developed, independently deployable, decentralized and built and released with automated
processes.
Services are often processes that communicate over a network to fulfill a goal using technology-agnostic protocols such
as HTTP.
Services are organized around business capabilities.
Services can be implemented using different programming languages, databases, hardware and software environment,
depending on what fits best.
Microservices Architecture
Origins of Micro-services:
As early as 2005, Peter Rodgers introduced the
term "Micro-Web-Services" during a presentation
at the Web Services Edge conference.
The architectural style name was really adopted
in 2012
Kubernetes democratized the architectural
approach
The two big players in this field are Spring
Cloud and Kubernetes
A Microservices-based architecture has the following properties:
Lends itself to a continuous delivery software development
process. A change to a small part of the application
requires rebuilding and redeploying only one or a small
number of services.
Adheres to principles such as fine-grained interfaces (to
independently deployable services), business-driven
development (e.g. domain-driven design).
Wikipedia - https://en.wikipedia.org/wiki/Microservices
Martin Fowler
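The statement that services are processes communicating over a network with technology-agnostic protocols such as HTTP can be illustrated with a minimal sketch using only the Python standard library. The contact service, its `/contacts/<id>` route and its data are hypothetical; a real service would add error handling, security and service discovery.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

CONTACTS = {"1": {"name": "Alice"}}  # hypothetical in-memory data owned by the service

class ContactHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The last path segment is used as the contact identifier.
        contact = CONTACTS.get(self.path.rsplit("/", 1)[-1])
        body = json.dumps(contact or {"error": "not found"}).encode()
        self.send_response(200 if contact else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the example quiet
        pass

def start_service():
    server = HTTPServer(("127.0.0.1", 0), ContactHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def fetch_contact(port, contact_id):
    """Client side: any language with an HTTP client could do the same."""
    connection = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        connection.request("GET", f"/contacts/{contact_id}")
        return json.loads(connection.getresponse().read())
    finally:
        connection.close()

server = start_service()
contact = fetch_contact(server.server_port, "1")
missing = fetch_contact(server.server_port, "42")
server.shutdown()
server.server_close()
print(contact, missing)
```

The caller only depends on the HTTP contract, which is what allows each service to pick its own language, database and release cycle.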
83
Microservices Architecture
[Diagram: client applications, a master node, and three nodes hosting the microservices]
Client Applications (x3)
Master Node: API Gateway, Service Catalog / Discovery, Management / Orchestration
Each Node: Node Mgmt., Service Proxy, Execution middleware, MQ, Distributed Storage
Distributed Storage per node: R1 R2 / R1 R3 / R2 R3
Microservices: Service A, Service B, Service C, Service D, Service E
Other labels: Static Content
84
Ask yourself : do you need microservices ?
Microservices are NOT Big Data ! [co-local processing]
You don’t need microservices or Kubernetes to benefit from Docker
You’re not scaling anything with synchronous calls
Don’t do microservices unless:
You need independent service-level scalability (vs. storage / processing scalability – Big Data)
You need a strong SOA - Service-Oriented Architecture
You need independent services lifecycle management
Challenges
Distributed caching vs reloading the world all over again
Not all applications are fit for asynchronous communications (WYCIWYG)
Identifying the proper granularity for services
Enterprise architecture view is too big
Application architecture view is too fine
RIA Organizer : good candidates would be : EmailService, CalendarService, ContactService, SearchService
Data consistency without distributed transactions. Applications need to be designed with this in mind.
Weighing the overall memory and performance waste
A Spring Boot stack + JVM + Linux Docker base for every single service ?
HTTP calls in between layers ?
Microservices discussion
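The remark about synchronous calls can be illustrated by letting services exchange messages through a queue instead of calling each other directly. The sketch below uses Python's in-process `queue` module as a stand-in for a message broker; the service roles borrow the EmailService and CalendarService names from the slide, and everything else is an assumption made for the example.

```python
import queue
import threading

def email_worker(inbox, sent):
    """Hypothetical EmailService consumer: handles messages at its own pace."""
    while True:
        message = inbox.get()
        if message is None:  # sentinel: stop this worker
            break
        sent.append(f"mail to {message['to']}")

inbox = queue.Queue()
sent = []

# Consumers can be scaled independently of the producer.
workers = [threading.Thread(target=email_worker, args=(inbox, sent)) for _ in range(2)]
for worker in workers:
    worker.start()

# Hypothetical CalendarService side: publish the message and move on,
# without waiting for the mails to be sent.
for attendee in ["alice", "bob", "carol"]:
    inbox.put({"to": attendee})

# Shut down: one sentinel per worker, then wait for them to finish.
for _ in workers:
    inbox.put(None)
for worker in workers:
    worker.join()

print(sorted(sent))
```

The trade-off named on the slide still applies: the producer no longer knows when, or in which order, the messages are handled, so not every application is a good fit.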
85
3. Takeaways
86
The strong frontier between Operational IS and Analytical IS is vanishing
NoSQL, Streaming, Lambda and Kappa architectures are increasingly spilling over into the
Operational IS and as such provide a common ground for operational processes and
analytical processes.
Historically strong on the BI side, Hadoop (V3) nowadays fits the needs of the
Operational IS well, while Kubernetes can be useful for the Analytical IS
Kubernetes (also Mesos, etc.) is a cloud Operating System, but not only that (distribution,
scaling → run your cloud locally)
Don’t do Micro-Services unless you need Micro-Services … otherwise just do services :-)
Final notes …
[Diagram: the boundary between the Operational Information System and BI, crossed out]
87
Thanks for listening

Introduction to Modern Software Architecture

  • 1. 1 © Jerome Kehrli @ niceideas.ch Introduction to Modern Software Architecture
  • 2. 2 Part I – Software Architecture Models 1.1 Introduction to Software Architecture 1.2 Our illustration example 1.3 The Kruchten 5 + 1 View Model 1.4 The OCTO Matrix Approach Part II - Modern Architectures 2.1 Big Data 2.2 The Death of the Moore Law 2.3 The CAP Theorem 2.4 NoSQL / NewSQL 2.5 Hadoop 2.6 Data Lake 2.7 Streaming Architecture 2.8 Lambda Architecture 2.9 Big Data 2.0 & Kubernetes 2.10 Microservices Architecture Part III - Takeaways Agenda
  • 3. 3 1.1 Introduction to Software Architecture
  • 4. 4 Definitions 1/3 A software system's architecture is the set of principal design decisions about the system Software architecture is the blueprint for a system's construction and evolution Design decisions encompass the following aspects of the system under development Structure, Behaviour, Interactions, Non-functional properties Taylor 2010 "Principal” implies an a degree of importance that grants a design decision an "architectural status". This implies that not all design decisions are architectural. As such, these do not necessarily impact a system's architecture. How one defines principal depends on what the stakeholders define as the system goals.
  • 5. 5 Definitions 2/3 An architecture is the set of significant decisions about the organization of a software system, the selection of the structural elements and their interfaces by which the system is composed together with their behavior as specified in the collaborations among those elements, the composition of these structural and behavioral elements into progressively larger subsystems, and the architectural style that guides this organization, these elements and their interfaces, their collaborations, and their composition. RUP – Rational Unified Process
  • 6. 6 Definitions 3/3 In most successful software projects, the expert developers working on that project have a shared understanding of the system design. This shared understanding is called ‘architecture’. This understanding includes how the system is divided into components and how the components interact through interfaces. Architecture is about stuff that’s hard to change later Ralph Johnson Neal Ford Architecture is about the important stuff Martin Fowler
  • 7. 7 Sidenotes Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure. Melvin E. Conway (Conway's law) ... all models are approximations. Essentially, all models are wrong, but some are useful. However, the approximate nature of the model must always be borne in mind... George Box
  • 8. 8 Software Architecture is A Process : to design a high-level solution A Product : schemas, models, documentation, prototypes Means : frameworks, libraries, middleware, etc. to ease implementation of large systems A Reality : the working software or Information System My View
  • 9. 9 Different Kinds of Architecture Enterprise Architecture Solution / Application Architecture Enterprise Architecture defines the way the enterprise uses several applications. Metaphor : City Planning / City Map Focus : Strategy / Business Some Key Concerns: - Uncover operational gaps - Understand data-dependencies across the IT landscape - Understand Interactions between Solutions / Applications - Streamline the application landscape for optimal performance - Decommission legacy solutions - Eliminate redundancies - Identify and avoid tech risks Application architecture defines the various pieces that compose an application Metaphor : Building / House Architecture Focus : Technology / Functional Some Key Concerns: - Define a best-fit solution for identified problems - Ensure solution meets functional and non-functional requirements - Understand how application supports business capabilities - Understand functional fit, technical fit and risks - Implement technical processes for Application development
  • 10. 10 Architecture or Design Architecture Design Implementation Abstraction Fine Granularity / Reality Process of creating High-level structures of a software system Converts the software characteristics into a high-level structure Micro-services, serverless, streaming, lambda are some software architecture patterns Helps define high-level structure of the software system Process of creating a form of specification of a software artifact that helps implement the software Describes all units of a software system to support coding Creational, structural and behavioural are some types of software design-patterns Helps implement the software
  • 11. 11 2 different visions of architecture
  • 13. 13 Example – product vision canvas – RIA Organizer
  • 14. 14 Example – Story Map - RIA Organizer
  • 15. 15 1.3 The Kruchten 4 + 1 View Model
  • 16. 16 The 4 + 1 Kruchten Views Model In 1995, while working for Rational Software Corp., Philippe Kruchten defined a 4+1 Views Model to capture the description of a Software Architecture in multiple complementary views. The 4+1 views model is an information organization framework; it consists of logical, process, development, and physical knowledge of an application, and end-user perspective information. A view is an aspect (subpart) of information. A notation is a way of representing information. The “4+1” view model is rather “generic”: other notations and tools can be used, other design methods can be used, especially for the logical and process decompositions, but we have indicated the ones we have used with success. Philippe Kruchten, Architectural Blueprints—The “4+1” View Model of Software Architecture
  • 17. Conceptual / Logic Physical / Operational Non-functional Functional Logical / Structural View Implementation / Development View Process / Behaviour View Deployment / Physical View The logical view is concerned with the functionality that the system provides to end-users. UML Diagrams used to represent the logical view include Class diagram, Communication diagram, Sequence diagram. The development view illustrates a system from a programmer's perspective and is concerned with software management. This view is also known as the implementation view. It uses the UML Component diagram to describe system components. UML Diagrams used to represent the development view include the Package diagram. The process view deals with the dynamic aspects of the system, explains the system processes and how they communicate, and focuses on the runtime behavior of the system. The process view addresses concurrency, distribution, integrators, performance, and scalability, etc. UML Diagrams to represent process view include the Activity diagram. The physical view depicts the system from a system engineer's point-of-view. It is concerned with the topology of software components on the physical layer, as well as communication between these components. This view is also known as the deployment view. UML Diagrams used to represent physical view include the Deployment diagram. Use Case / Scenario View The description of an architecture is illustrated using a small set of use cases, or scenarios which become a fifth view. The scenarios describe sequences of interactions between objects and / or processes. They are used to identify architectural elements and to illustrate and validate the architecture design. They also serve as a starting point for tests of an architecture prototype. UML Diagram(s) used to represent the scenario view include the Use case diagram.
  • 18. Conceptual / Logic Physical / Operational Non-functional Functional Process / Behaviour View Perspective: System Integrators Stage: Design Focus: Process decomposition Concerns: Performances, Scalability, Throughput, Synchronization, Concurrency Artifacts: - Sequence Diagrams / Activity Diagrams - Communication / interactions diagrams - State Machine Diagrams - Timing Diagrams Logical / Structural View Perspective: End Users , Business Analysts Stage: Requirements Analysis Focus: Components / Objects / Services Model - Decomposition Concerns: Functionality Artifacts: - Functions Schema - Class / Objects Diagram - (composite) Structure Diagram - State Machine Implementation / Development View Perspective: Developers, Designers Stage: Design Focus: Subsystem decomposition Concerns: Software / Configuration Management Artifacts: - Components Diagram - Package Diagram Deployment / Physical View Perspective: System Engineers Stage: Design Focus: Software mapping to Hardware (deployment) Concerns: System Topology, Delivery, Installation, Communication Artifacts: - Deployment diagram - Network / Cluster topology (not UML) Use Case / Scenario View Perspective: End User Stage: Putting it all together Focus: Understandability , usability Concerns: Feature Decomposition Artifacts: - Use-case diagrams - User Stories (not UML) - Story Maps (not UML)
  • 20. 20 RIAO Process View – Send Email
  • 21. 21 RIAO Process View – Fetch new Emails
  • 23. 23 User Computer RIA Server Tomcat (Spring Boot) RIAO Physical View Apache Proxy Web browser RIAO UI RIAO Backend HTTPS Courier Server Courier / Debian POP3 SMTP HTTP (User OS) Debian Linux MongoDB Node MongoDB Node Mongo Node Docker Debian Linux Integration Processing / Business Presentation FirewallD Open JDK 11 / JVM Loc. Storage Internet Internal Network SystemD Kubernetes Cluster K8s service Locator
  • 24. 24 1.4 The OCTO Matrix Approach
  • 25. 25 The OCTO Architecture Matrix In 2010, OCTO Technology designed a matrix that presents a 360° overview of most, if not all, questions, concerns and aspects that need to be answered and addressed when defining a Software Architecture. The questions and concerns are related to different levels of architecture: Functional, Application, Technical, System. They group different perspectives: Security, Usage, Services, Data, Exchanges.
  • 26. Security Usage Services Data Exchanges Procedures / Specifications Schema / Models / Catalogs Technical Documentation Functional Perspective CONFORMITY Procedures and rules aimed at ensuring the security of the components Code of conduct, role rights, user groups, documented procedures, disaster recovery plans, geographical accesses strategy, … USAGE Use cases per persona: Customer, Partners, Advisors, etc. User profiles, User experience, work ergonomics, internet and new channels strategy, digital strategy, Business cases, Business Use Cases, etc. FUNCTIONS Functions and management rules within the company Functional architecture schema, Functional map, operational processes, management rules, calculation rules, user guides, etc. INFORMATION Information handled within the company Data architecture, Data governance, functional dictionary, data models, data modeling rules, etc. PROCESSES AND EXCHANGES Processes and exchanges internal and with partners. Data flows, modeling, internal and external exchanges, functional workflows, operational data workflows, etc. Application Perspective SECURITY Components aimed at ensuring the security of the Information System Authentication, Identification, Authorization, management of credentials and access rights, provisioning, audit trails, security procedures referential, etc. USAGE FLOW Applications / Modules / Components accessed by users Business Software Components accessed by users, internet / intranet / mobile portals, call center, BI systems, mail and internal services, etc. PROCESSING Components implementing the processing of the IS Application schema and map, processes model reference, Business Model, micro-services, management rules model reference, etc. DATA SILOS Data repositories, data referentials, etc. Data referential, Data dictionary, Data warehouse, Data model reference, Document model reference, audit trails, archives, etc. 
DATA FLOWS Data exchanges processes and means Interoperability dictionary, exchanges standards and formats, document edition, APIs reference, External APIs, etc. SECURITY FRAMEWORK Technical means and components implementing security principles Authentication mechanisms, Rights management components, confidentiality protocols and means, cryptographic means etc. Technical Perspective GUI FRAMEWORK Technical means and components providing the GUI and user tools GUI technologies, Reporting tools, GUI design model, tools and standards, etc. SERVICES FRAMEWORK Technical means for executing the services Technological and app server stacks, processing middleware(s), technical layers, frameworks, toolkits, libraries, rules engines, external components, etc. DATA FRAMEWORK Technical means for accessing and storing data DBMS, LDAP technologies, Data modeling tools, etc. EXCHANGES FRAMEWORK Technical means for exchanges EAI, ESB, ETL, Workflow engines, API frameworks and libraries, file transfers, flow design, etc. System Perspective SYSTEM SECURITY Physical means and tools implementing the network and security LAN, WAN, remote access, VPN, firewall, DMZ, proxy, I/O hub, journals, supervision tools, authentication dongles, etc. USER DEVICES AND MEANS User equipment (PC, IP phone, tablets, ...) Communication means, user computer, Office servers, remote access middleware and software, etc. PROCESSING INFRASTRUCTURE Processing servers and middleware Servers, Datacenters, Load balancers, Proxies, Clusters, Monitoring Systems, Clouds, SLAs, DRP, etc. STORAGE INFRASTRUCTURE Data Servers and Middleware Data Servers, SAN, Archiving, Robots, Storage Servers, RAID, etc. EXCHANGES INFRASTRUCTURE Middleware and Tools Exchange Servers, Clustering Middleware, Big Data Engines, SLAs, Transfer Monitors, Service Contract, DRP, Flow Management, Replay, Monitoring, etc. Source : https://fr.slideshare.net/OCTOTechnology/2012-pdj-banque-du-futur-2020 © OCTO Technology
  • 27. 27 RIAO Functional Architecture Email Management Contact management Email Search Search Email Display / Edition Folder Management Global App. Email Application Appointment Display / Edition Calendar Display / Edition Calendar Application Folder Display / Edition Contact Display / Edition Calendar management Appointment Management Contact Search Calendar Search Contact App. Login User Management Appointment Mapping Business / Entry Points User Interactions Services & Functions Mgmt. Search Text Compos. HTML Comp. RTF Compos. Text Display HTML Disp. RTF Display Text Compos. HTML Comp. RTF Compos. Text Display HTML Disp. RTF Display Attachment Management Email IO
  • 28. 28 GridFS Folder Model Email Docs. Attachment files Appointment Docs. Calendar Model Contact Documents User Model SMTP Server POP3 Store RIAO Backend RIAO UI RIAO Application Architecture Search Email IO Appointment Mapping User Model Folder Model Email Model Calendar Model Appointm. Model Contact Model Attachem. Model Appointm. Search Email Search Contact Search Search Mgmt. User Mgmt. Folder Mgmt. Email Mgmt. Calendar Mgmt. Appointm. Mgmt. Contact Mgmt. Attachem. Mgmt. Email Synchronization Deleg. Search Service User Service Email Service Calendar Service Contact Service Login Page Profile Edition Folder View / Edit Email Compos. Email View / Edit Calendar View / Edit Appoint. Compos. Appoint. View / Edit Contact View / Edit Contact Model CRUD Fetch Send Search Page REST API Data / Exchanges Integration Business Presentation APIs and Process Orchestration CRUD RTF Display RTF Compos. HTML Compos. Deleg. Email Application Calendar App. Contact App Text Compos Email Model Calendar Model Email Controller Calendar Controller Contact Ctr. Search Control. User Ctrl. Loc. Storage Main Page
  • 29. 29 User tier Proces- sing tier Integration Tier Web browser RIAO UI RIAO Back. RIAO Technical Architecture HTTP UI Controllers JAX-RS / HTTPS Java VM Apache Proxy Business managers MongoDB Client Views JQuery CKEditor Bootstrap Business Services DAOs SMTP Client POP3 Client Courier / Debian IO Management SMTP POP3 Spring Boot / Tomcat 8 Runtime Forms Models Linux Debian Spring Security Spring Framework Apache Commons SSL Cert. Local Store Sess. Ckie. JAX-RS / HTTP Main Page Obj./JSON Map. JSON / Object Mapping
  • 30. 30 User Computer RIA Server Tomcat (Spring Boot) RIAO System Architecture Apache Proxy Web browser RIAO UI RIAO Backend HTTPS Courier Server Courier / Debian POP3 SMTP HTTP (User OS) Debian Linux MongoDB Node MongoDB Node Mongo Node Docker Debian Linux Integration Processing / Business Presentation FirewallD Open JDK 11 / JVM Loc. Storage Internet Internal Network SystemD Kubernetes Cluster K8s service Locator
  • 32. 32 The era of power Cray 2 / 1985 / ~1.9 GigaFlops Samsung S6 / 2015 / ~30 GigaFlops Source : https://pages.experts-exchange.com/processing-power-compared
  • 33. 33 Origins of Big Data : the web giants !
  • 34. 34 Data deluge 5 exabytes of data (5 billion gigabytes) were generated from the first measurements until 2003. In 2011, this quantity was generated in 2 days. In 2018, this quantity was generated in 2 minutes. Source: https://www.emc.com/collateral/analyst-reports/idc-the-digital-universe-in-2020.pdf
  • 35. 35 Our architectures are 30 years old ! Corporate Operational Data Internal GUI Space Operational / Live Audit / Logs Archived Data … Ext. Data Staging Database … ETL ETL Datawarehouse Storage Cleaning / Cleansing / Enrichment / Remapping Historize Query ETL Reporting / Analytics / Querying Data Mart Data Mart Data Mart Operational Application Space Online Business Applications Batch Business Applications Monitoring / Operation Applications External GUI Space DMZ Web Apps Desktop Apps Web Apps Mobile Apps Operational Information System Analytical Information System / Business Intelligence
  • 36. 36 2.2 The death of the Moore Law
  • 37. 37 The Moore law “The number of transistors and resistors on a chip doubles every 24 months” - Gordon Moore, 1965
  • 38. 38 Technical capacities evolution For 40 years, IT component capabilities grew exponentially: the Moore law! Source : http://radar.oreilly.com/2011/08/building-data-startups.html
  • 39. 39 Storage cost evolution While the unit cost is decreasing… [Chart: cost per gigabyte in $ (logarithmic scale, 0.01 $ to 10,000,000 $) for Hard Drive and RAM, 1975 to 2015. Annotations: 1982: 5M$/GB; 2012: 5$/GB] Source : http://www.mkomo.com/cost-per-gigabyte
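The two annotations on the chart above are enough for a rough back-of-the-envelope estimate of how fast the unit cost fell. The sketch below only uses the two figures quoted on the slide (5M$/GB in 1982, 5$/GB in 2012); the variable names are illustrative.

```python
import math

# Figures quoted on the slide (cost per gigabyte).
cost_1982 = 5_000_000.0  # $/GB in 1982
cost_2012 = 5.0          # $/GB in 2012
years = 2012 - 1982

factor = cost_1982 / cost_2012     # overall price drop over the period
halvings = math.log2(factor)       # how many times the price was halved
halving_period = years / halvings  # average number of years per halving

print(f"price divided by {factor:,.0f} in {years} years")
print(f"about {halvings:.1f} halvings, one every {halving_period:.2f} years")
```

Under these two data points, the cost per gigabyte was halved roughly every year and a half.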
  • 41. 41 Disk throughput evolution Issue : throughput grows much more slowly than capacity. How to read/write more and more data through a pipe that is, relative to the volume, ever thinner? Gain : x100 000 Capacity Gain: x 10’000 In 15 years Throughput Gain: x 50 In 15 years
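The consequence of these two gains can be worked out directly: the time needed to read a whole disk is capacity divided by throughput, so it grows by the ratio of the two gains. The sketch below uses the slide's x10'000 and x50 figures; the one-minute baseline is a purely hypothetical starting point.

```python
capacity_gain = 10_000  # capacity gain in 15 years (slide figure)
throughput_gain = 50    # throughput gain in 15 years (slide figure)

# Full-scan time = capacity / throughput, so it grows by the ratio of the gains.
scan_time_growth = capacity_gain / throughput_gain

# Hypothetical baseline: a disk that could once be read end to end in 1 minute.
baseline_minutes = 1.0
scan_hours_now = baseline_minutes * scan_time_growth / 60

print(f"a full scan takes {scan_time_growth:.0f}x longer")
print(f"{baseline_minutes:.0f} min then, about {scan_hours_now:.1f} h now")
```

This is why reading the data in parallel from many disks, rather than from one bigger disk, becomes necessary.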
  • 42. 42 New architectures and paradigms Key Idea #1: Since the data is too big to fit on one computer, distribute it among many computers (partitioning / sharding)! Key Idea #2: Run transactions and computations in parallel on multiple (many!) nodes and scale the grid of CPU, RAM and HDD at the multi-datacenter level. Key Idea #3: Move the code to the data node, not the data to the computing node (Data tier revolution)
  • 43. 43 2.3 The CAP Theorem
  • 44. 44 A bit of history … The early days of digital data … Before 1960, the data within a Computer Information System was mostly stored in rather flat files (sometimes indexed) manipulated by top-level software systems. Directly using flat files was cumbersome and painful… Various needs emerged at the time : Data isolation Access efficiency Data integrity Reducing the time required to develop brand new applications → Something else was required …
  • 45. 45 The relational model ruled for 40 years ! Enter the Relational Model … 1969 / Edgar F. Codd - RDBMS Entities as Tables & associations The relational model reduces redundancy to optimize disk space usage. At the time of its creation: Disk storage was very expensive and limited The volume of data in the Information Systems was rather small → avoid redundancy to optimize disk space usage, thanks to guarantees of : Structure: using normal design forms and modeling techniques Coherence: using transaction principles and mechanisms E.g. an Exam Grade management app : to display the subject of a student on their profile screen, one needs to 1. Extract the personal data from the “student” table 2. Fetch the subject id from the relation table 3. Read the subject title from the “subject” table. Why, oh why, separate these 2 kinds of information when, in 95% of the use cases around these data, both will always be used together ?!?
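The three-step lookup from the exam grade example can be sketched with Python's built-in sqlite3 module. The "student" and "subject" table names come from the slide; the relation table name (student_subject), the column names and the sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subject (id INTEGER PRIMARY KEY, title TEXT);
    -- hypothetical relation table linking students to subjects
    CREATE TABLE student_subject (student_id INTEGER, subject_id INTEGER);
    INSERT INTO student VALUES (1, 'Alice');
    INSERT INTO subject VALUES (10, 'Mathematics');
    INSERT INTO student_subject VALUES (1, 10);
""")

# The three steps of the slide become two joins in a single query.
row = conn.execute("""
    SELECT s.name, sub.title
    FROM student AS s
    JOIN student_subject AS ss ON ss.student_id = s.id
    JOIN subject AS sub ON sub.id = ss.subject_id
    WHERE s.id = ?
""", (1,)).fetchone()
print(row)

# Denormalized alternative: both pieces of information stored together,
# at the price of repeating the subject title in every student record.
profile = {"_id": 1, "name": "Alice", "subject": "Mathematics"}
print(profile["name"], profile["subject"])
```

The normalized form stores each subject title once; the denormalized record answers the profile screen with a single read, which is the trade-off the slide questions.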
  • 46. 46 The origins of NoSQL The mid and late 2000’s were times of major changes in the IT landscape: Hardware capabilities significantly increased. eCommerce and internet trade, in general, exploded. Some internet companies, the so-called “Web giants” (Yahoo!, Facebook, Google, Amazon, Ebay, Twitter, …), pushed traditional databases to their limits. Those databases are by design hard to scale. With relational DBMSes, the only way to improve performance is by scaling up, i.e. getting bigger servers (more CPU, more RAM, more disk, …). One eventually hits a hard limit imposed by the current technology. Scaling up: Database server Faster More storage More reliable Investments Hard limit From a certain point, investments yield little improvement
  • 47. 47 By rethinking the architecture of databases, those companies were able to make them scale at will, by adding more servers to clusters instead of upgrading the servers. The servers are not made of expensive, high-end hardware; they are qualified as commodity servers (or commodity hardware) The origins (cont’d) Faster More storage More reliable Investments Power grows linearly with the number of servers (linear scalability) Scaling out: Database cluster
  • 48. 48 This is the essence of Big Data ! Data distribution With most NoSQL databases, the data is not stored in one place (i.e. on one server). It is distributed among the nodes of the cluster. When created, an object A is assigned to a node in the cluster. This is called sharding – the amount of data assigned to a node is called a shard (also called partition). Having more cluster nodes implies a higher risk of having some nodes crash, or a network outage splitting the cluster in two. For this reason, and to avoid data loss, objects are also replicated across the cluster. The number of copies, called replicas, can be tuned. 3 replicas is a common figure. A B C D A A B B C C D D The objects may move, as nodes crash or new nodes join the cluster, ready to take charge of some of the objects. Such events are usually handled automatically by the cluster; the operation of shuffling objects around to keep a fair distribution of data is called rebalancing
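Sharding and replication as described above can be illustrated with a minimal sketch: a hash of the object key picks a primary node, and the following nodes hold the replicas. The function name, the node names and the modulo-based placement are assumptions for illustration; real clusters typically use more elaborate schemes (for example consistent hashing) to limit how much data moves during rebalancing.

```python
import hashlib

def replica_nodes(key: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Return the nodes holding `key`: a primary chosen by hash, then its successors."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    count = min(replicas, len(nodes))  # cannot have more replicas than nodes
    return [nodes[(start + i) % len(nodes)] for i in range(count)]

nodes = ["node-1", "node-2", "node-3", "node-4"]  # hypothetical cluster
for key in ["A", "B", "C", "D"]:
    print(key, "->", replica_nodes(key, nodes))

# Naive rebalancing: when a node crashes, placements are recomputed on the survivors.
survivors = [n for n in nodes if n != "node-2"]
print("A after crash ->", replica_nodes("A", survivors))
```

Because the placement is a pure function of the key and the node list, any client can locate an object without a central directory.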
  • 49. 49 The CAP Theorem Consistency All clients see exactly the same data at the same time, even in the presence of an update (ACID Properties) Availability The system continues to operate and all clients can see “a version” of a replica, even in the presence of node failure Partition-tolerance The system continues to operate even when the system is partitioned (some nodes are unavailable) AC CP AP Not Possible Availability The cluster is available if a request made by a client is always acknowledged by the system, i.e. it is guaranteed to be taken into account. That doesn’t mean that the request is processed immediately. It may be put on hold. An available system will at a minimum acknowledge it Client Request Acknowledgement ? Partition tolerance Partition Tolerance is verified if a cluster can withstand a partition, i.e. if it continues to operate when one or several nodes disappear (nodes crash, network equipment down, etc.). Partition tolerance is related to availability and consistency, but it is still different. It states that the system continues to function internally (e.g. ensuring data distribution and replication), whatever its interactions with a client Consistency Consistency refers to the fact that all replicas of an entity, identified by a key in the database, have the same value whatever the node queried old version new version new version new version Client Update
  • 50. 50 The previous 3 properties, Consistency, Availability and Partition tolerance, are not independent. The CAP theorem - or Brewer’s theorem - states that a distributed system cannot guarantee all 3 properties at the same time This is a theorem. That means it is formally true, but in practice it is less severe than it seems The system or a client can often choose CA, AP or CP according to the context, and “walk” along the chosen edge by appropriate tuning Partition splits happen, but they are rare events (hopefully) Rule of thumb Traditional relational DBMSes are CA or CP – consistency is a must, in case of a problem either bring the cluster down or split it and expect heavy synchronization later Many NoSQL DBMSes are AP – availability is a must, and with big clusters failures happen so better live with it. Consistency is only eventual The CAP theorem Consistency Availability Partition- tolerance AC CP AP Not Possible
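One common way to "walk along an edge" by tuning, assumed here and not taken from the slides, is quorum replication as found in Dynamo-style stores: with N replicas, a write waits for W acknowledgements and a read queries R replicas. When R + W > N, every read quorum overlaps every write quorum, so a read reaches at least one up-to-date replica. The helper name below is illustrative.

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True when every read quorum intersects every write quorum (R + W > N)."""
    return r + w > n

N = 3  # number of replicas, the common figure quoted earlier in the deck
for w, r in [(1, 1), (2, 2), (3, 1), (1, 3)]:
    if quorums_overlap(N, w, r):
        mode = "reads reach at least one up-to-date replica"
    else:
        mode = "eventual consistency only"
    print(f"N={N} W={w} R={r}: {mode}")
```

Low W and R favour availability and latency; raising them trades some availability for stronger consistency.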
  • 51. 51 Eventual Consistency This is essential ! Consistency refers to the fact that all replicas of an entity, identified by a key in the database, have the same value whatever the node queried. With many NoSQL databases, the preferred working mode is AP and all-the-time consistency is sacrificed. Favoring performance, updates take a little time to propagate across the cluster. When an entity’s value has just been created or modified, there is a short span during which the entity is not consistent. However the cluster guarantees that it will eventually be, when replication has occurred. This is called eventual consistency
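The short inconsistency window can be made visible with a toy simulation. The class below is a deliberately simplified, single-process sketch (the class and method names are invented for illustration): a write is acknowledged by one replica immediately and reaches the others only when replication runs.

```python
from collections import deque

class ToyCluster:
    """Toy AP cluster: a write hits one replica, then propagates asynchronously."""

    def __init__(self, replica_count: int = 3):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()  # (replica_index, key, value) updates not yet applied

    def write(self, key, value):
        self.replicas[0][key] = value             # acknowledged immediately
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))  # replicated later

    def read(self, key, replica_index):
        return self.replicas[replica_index].get(key)

    def replicate(self):
        """Apply all pending updates, in order, to the other replicas."""
        while self.pending:
            i, key, value = self.pending.popleft()
            self.replicas[i][key] = value

cluster = ToyCluster()
cluster.write("user:42", "old version")
cluster.replicate()
cluster.write("user:42", "new version")
stale = cluster.read("user:42", replica_index=2)  # read before replication
cluster.replicate()
fresh = cluster.read("user:42", replica_index=2)  # read after replication
print(stale, "->", fresh)
```

Between the second write and the second replication, a client querying replica 2 still sees the previous value; once replication has occurred, all replicas agree again.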
  • 52. 52 2.4 NoSQL / NewSQL
  • 53. 53 NoSQL / NewSQL A NoSQL (originally referring to "non-SQL" or "non-relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Such databases have existed since the late 1960s, but the name "NoSQL" was only coined in the early 21st century, triggered by the needs of Web 2.0 companies. NoSQL databases are increasingly used in Big Data and Real-Time Web applications. NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages or sit alongside SQL databases in polyglot-persistent architectures. Wikipedia - https://en.wikipedia.org/wiki/NoSQL The fundamental idea behind NoSQL is as follows: because of the need to distribute data (Big Data), the Web giants have abandoned the whole idea of ACID transactions (only eventual consistency is possible). So if we drop ACID Transactions, which we always deemed to be so fundamental, why wouldn't we challenge all the rest: the relational model and table structure?
  • 54. 54 The relational model is not always well suited For data fundamentally structured as tabular data and of a manageable size, the relational model fits. For instance: Accounting Data Customer information But some other data are modeled in a much more complex way: Geospatial data Molecular models Some underlying notions there are fundamentally not relational: Hierarchical data Several levels of interconnections In addition, some data models have a high volatility and require flexibility over time: Information available at the time of the creation of the model is sometimes incomplete Or their inherent structure changes over time The relational model is not well suited for data experiencing constant structural changes
  • 55. 55 NoSQL Database Types : 4 families Document-oriented (e.g. MongoDB, ElasticSearch) Key/Value pairs (e.g. Redis) Graph (e.g. Neo4J) Column-family aka BigTable (e.g. Cassandra)
  • 56. 56 NoSQL Database Types Key/Value pairs (e.g. Redis): One key has one (and only one) value. The value type is not specified (Object value). Values may have different types. Issue : difficult to fit a model in this modeling pattern. Column-family aka BigTable (e.g. Cassandra): Row = a set of columns. Sorted vertical storage. Operations: query by key or set of keys, queries on secondary indexes, selection of the returned columns. The column-family model looks a bit like the relational model. For a given row, the contents of a column can thus be seen as a hash table with arbitrary (key, value) pairs. Each row in a table is uniquely identified by a key. Document-oriented (e.g. MongoDB, ES): Documents are structured data in the form of hierarchical trees (sub-documents). Data can be of various types: strings, numbers, arrays. Documents are self-supporting: they contain meta-data about the structure and the corresponding values. Several storage formats for the document: XML, JSON, BSON. In this model, objects are documents, i.e. trees of values. Each document has a root and attributes. Attribute values are scalars (integers, strings), lists or other objects. Each object has a unique ID, a conventional property whose value serves as a key. Objects are organized into collections. Objects in the same collection don’t need to have the same schema: there is no mandatory structure. Graph (e.g. Neo4J): Based on the interconnection of data (contrary to the other NoSQL solutions which do not support relations). Data are not only linked to nodes but also to edges (property graph)
  • 57. 57 Examples of NoSQL data models Document-oriented (e.g. MongoDB) OBJECTS { '_id': 123456, 'type': 'product', 'name': 'computer', 'features': { 'cpu_GHz': 3, 'ram_GB': 8, 'brand': 'Dell' } } { '_id': 123457, 'type': 'product', 'name': 'blender', 'features': { 'rpm': 10000, 'voltage': '220V 50 Hz' } } { '_id': 123458, 'type': 'user', 'login': 'choupi92', 'password': 'AZnx403==', 'shopping_history': [...] } Key/Value pairs (e.g. Redis) obj_123456 = “type=product;name=computer;cpu_GHz=3;…” obj_123457 = “type=product;name=blender;rpm=10000;…” obj_123458 = “type=user;login=choupi92;password=…” Graph (e.g. Neo4J) Nodes: choupi92, computer, blender, hightech, kitchen; edges labelled “category”. Column-family aka BigTable (e.g. Cassandra) PRODUCTS (_id | name | features): 123456 | computer | cpu_GHz=3, ram_GB=8, brand=Dell; 123457 | blender | rpm=10000, voltage=220V 50 Hz. USERS (_id | authent | shopping_history | …): 123458 | login=choupi92, password=AZnx403== | 08/09/13=…, 10/09/13=…
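To compare how the same product is reached in each family, the slide's sample data can be mimicked with plain Python structures. This is only an in-memory sketch, not the API of MongoDB, Redis, Cassandra or Neo4J; the key/value string keeps only the fields visible on the slide, and the graph edges (product to category) are an assumed reading of the diagram.

```python
# Document model: one self-describing tree per object.
document = {"_id": 123456, "type": "product", "name": "computer",
            "features": {"cpu_GHz": 3, "ram_GB": 8, "brand": "Dell"}}

# Key/value model: an opaque value behind a single key (only the visible fields kept).
key_value = {"obj_123456": "type=product;name=computer;cpu_GHz=3"}

# Column-family model: row key -> column family -> arbitrary (column, value) pairs.
column_family = {123456: {"name": {"name": "computer"},
                          "features": {"cpu_GHz": 3, "ram_GB": 8, "brand": "Dell"}}}

# Graph model: nodes plus labelled edges (edge direction is an assumption).
graph_nodes = {"choupi92", "computer", "blender", "hightech", "kitchen"}
graph_edges = [("computer", "category", "hightech"), ("blender", "category", "kitchen")]

def parse_value(raw: str) -> dict:
    """The application, not the store, must decode a key/value payload."""
    return dict(pair.split("=", 1) for pair in raw.split(";"))

cpu_doc = document["features"]["cpu_GHz"]              # typed value inside the tree
cpu_kv = parse_value(key_value["obj_123456"])["cpu_GHz"]  # comes back as a string
cpu_cf = column_family[123456]["features"]["cpu_GHz"]
categories = [dst for src, label, dst in graph_edges
              if src == "computer" and label == "category"]

print(cpu_doc, repr(cpu_kv), cpu_cf, categories)
```

The key/value store returns an untyped string that the application has to parse, while the document and column-family structures expose the attribute directly, and only the graph answers relationship questions natively.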
  • 59. 59 NewSQL? What is NewSQL ? NewSQL refers to relational databases that have adopted some of the NoSQL genes, thus exposing a relational data model and SQL interfaces to distributed, high volume databases. NewSQL, contrary to NoSQL, enables an application to keep: The relational view on the data The SQL query language Response times suited to transactional processing Some were built from scratch (e.g. VoltDB), others are built on top of a NoSQL data store (e.g. SQLFire, backed by GemFire, a key/value store). The current trend is for some proven NoSQL databases, like Cassandra, to offer a thin SQL interface, achieving the same purpose. Generally speaking, the frontier between NoSQL and NewSQL is a bit blurry… SQL compliance is often sought as the key to integrating legacy SQL software (ETL, reporting) with modern No/NewSQL databases
  • 61. 61 Hadoop ? Hadoop is an Open Source Platform providing: A distributed, scalable and fault tolerant storage system as a grid Initially, a single parallelism paradigm : MapReduce, to reuse the storage nodes as processing nodes Since Hadoop V2 and YARN, other parallelization paradigms can be implemented on Hadoop Schemaless storage optimized for sequential write-once, read-many access Querying and processing DSLs (Hive, Pig) Hadoop comes in different distributions: Apache Foundation Cloudera HortonWorks MapR IBM … Hadoop’s origins: Initiated by Doug Cutting, leader of Lucene Based on Google’s publications about their indexing system (GFS / Map Reduce / BigTable ) Official Apache project since 2009 Hadoop was primarily intended for Big Data Analytics. Nowadays Hadoop can be an infrastructure for much more: Microservices architecture (Hadoop V3) Real-time Architectures
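The MapReduce paradigm mentioned above can be sketched in a few lines of plain Python. This is a single-process illustration of the three phases (map, shuffle, reduce) on a word count, not Hadoop's Java API; the function names and the sample blocks are invented for the example. On a real cluster, each block would be mapped on the node that stores it.

```python
from collections import defaultdict
from itertools import chain

def map_phase(block: str):
    """Map: emit a (word, 1) pair for every word of one data block."""
    for word in block.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)

# Hypothetical data blocks, as they might be spread over three storage nodes.
blocks = ["big data big clusters", "data lake", "big lake"]

mapped = chain.from_iterable(map_phase(block) for block in blocks)
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Only the small (word, count) pairs travel between phases, which is the "move the code to the data" idea from earlier in the deck.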
  • 62. 62 Hadoop Distribution Hadoop overview Distributed storage MapReduce processing engine / Parallel Computing Framework Querying Orchestration Machine learning / Processing IS integration Supervision and Management Reporting (Core)
  • 63. 63 Hadoop Distribution Hadoop is an ecosystem Hadoop Console Manager (Core)
  • 64. 64 Hadoop Architecture Client Applications Client Applications Client Applications Slave Node HDFS Data Node Map Reduce Task Tracker YARN Node Manager App Master App Container R1 R2 P1 Secondary Master Node Master Node YARN Resource Manager HDFS Name Node Map Reduce Job Tracker HDFS Meta Data YARN Meta Data Slave Node HDFS Data Node Map Reduce Task Tracker YARN Node Manager App Master App Container R1 R3 P2 Slave Node HDFS Data Node Map Reduce Task Tracker YARN Node Manager App Master App Container R2 R3 P3
  • 66. 66 Vision of a data lake With the continued growth in scope and scale of analytics applications using Hadoop and other data sources, the vision of an enterprise data lake can become a reality. In a practical sense, a data lake is characterized by three key attributes: Collect everything. A data lake contains all data, both raw sources over extended periods of time as well as any processed data → big volumes Dive in anywhere. A data lake enables users across multiple business units to refine, explore and enrich data on their terms → you don’t know the analytical structures a priori Flexible access. A data lake enables multiple data access patterns across a shared infrastructure: batch, interactive, online, search, in-memory and other processing engines. As a result, a data lake delivers maximum scale and insight with the lowest possible friction and cost. Data lake A data lake is a system or repository of data stored in its natural/raw format. It is usually a single store of data including raw copies of source system data, sensor data, social data etc. and transformed data used for tasks such as reporting, visualization, advanced analytics and machine learning. It can include structured data from relational databases, semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and binary data (images, audio, video). Wikipedia - https://en.wikipedia.org/wiki/Data_lake
  • 67. 67 Datalake Application Architecture Unstructured Data Storage Semi-structured data storage (NoSQL) Structured Data storage (e.g. relational) Interactive Querying Analytics / Processing Flow Processing Machine Learning Databases Raw files Application logs External Data / Open APIs Events / Messages Enterprise DWH Operational System Query / Reporting APIs / Services Events / messages DATA LAKE INGESTION PUBLICATION
  • 69. 69 Definition A real-time system is an event-driven system that is available, scalable and stable, able to take decisions (actions) with a latency defined as … below the frequency of events In a streaming architecture … Historical data is regularly and consistently updated with live data Live data is available to the end user Both types of data (historical and live) are not necessarily presented consistently to the end user Both sets of data can have their own screens or even their own applications A consistent view on both sets of data would be proposed by Lambda Architecture (next topic in this presentation) Streaming Architectures
  • 70. 70 Complex Event Processing Engine decision / action Transactional Applications BPM, ESB Capture Streaming Architecture In memory states and Calculations: Time window, operators, rules Rules edition GUI Cache / Distributed Cache latency : 100 ms Event/Condition/Action Stream-based querying multi-dimen. Analysis … Real-time Data GUI Historical Data GUI Structured Events Unstructured Events Reference Data, DWH, Services Querying Event History
  • 71. 71 Complex Event Processing Engine decision / action Transactional Applications BPM, ESB Capture Streaming Architecture In memory states and Calculations: Time window, operators, rules Rules edition GUI Cache / Distributed Cache latency : 100 ms Event/Condition/Action Stream-based querying multi-dimen. Analysis … Real-time Data GUI Historical Data GUI Structured Events Unstructured Events Reference Data, DWH, Services Querying Event History Stakes : - Latency Management ( < 100 ms ) - Throughput ( 10’000 msg / sec ) - Memory Consumption - Balancing and Replication - Fault Tolerance - State coherence - What about lost events ? - Init from historical data Stakes : - Dynamic GUIs - Data exploration following axes and criteria - Real-time GUI : event-driven, of type « web-push » Stakes : - High read performance with respect to latency - Good cache management Stakes : - High capacity - High write performance - High historical data querying performance - Flexible design abilities Stakes : - « WYSIWYG » editor, usable by business users - « Hot » updates of rules - Backtesting Stakes : - Throughput ( 10’000 msg / sec ) - Fault tolerance : messages retry ?
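The "in-memory states and calculations: time window, operators, rules" box and the Event/Condition/Action pattern can be illustrated with a minimal Python sketch. Everything here is hypothetical and chosen for illustration only: the `Event` fields, the `SlidingWindowRule` class, the 10-second window and the threshold of 100 are assumptions, not part of the original slides or of any CEP product.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since an arbitrary origin (assumed unit)
    amount: float


class SlidingWindowRule:
    """Event/Condition/Action rule evaluated over an in-memory time window."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.window = deque()  # events currently inside the time window
        self.total = 0.0       # running sum, maintained incrementally
        self.actions = []      # recorded (timestamp, total) decisions

    def on_event(self, event):
        # Event: add the new event to the in-memory state.
        self.window.append(event)
        self.total += event.amount
        # Evict events that have fallen out of the time window.
        limit = event.timestamp - self.window_seconds
        while self.window and self.window[0].timestamp <= limit:
            self.total -= self.window.popleft().amount
        # Condition: the windowed sum exceeds the threshold.
        if self.total > self.threshold:
            # Action: here we only record the decision.
            self.actions.append((event.timestamp, self.total))


rule = SlidingWindowRule(window_seconds=10, threshold=100)
for ts, amount in [(0.0, 60.0), (5.0, 30.0), (8.0, 20.0), (20.0, 50.0)]:
    rule.on_event(Event(ts, amount))
```

Keeping the sum incrementally, rather than recomputing it over the whole window for each event, is one way to address the latency and memory stakes listed on the slide; fault tolerance and state recovery are deliberately left out of this sketch.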
  • 73. 73 Real-Time Analytics What if I want real-time analytics ? • Most data analytics tools are batch processing solutions! • So what happens with updates occurring while a batch is running? • What happens between two of its executions ? Objectives: • Take all the data into account • Be able to answer any kind of request • Fault-tolerance • Robustness to evolutions, errors • Scalability ! • Low latency for writing AND reading PROCESSED DATA DATA THAT CAME AFTER THE START OF THE CURRENT BATCH Time More or less a few minutes to a few hours of data A few minutes to a few hours of data
  • 74. 74 λ (Lambda) Architecture CONSISTENT BATCH ANALYTICS ON COMPREHENSIVE DATA REAL-TIME / STREAMING ANALYTICS ON INCREMENTAL DATA DATA STREAM STORAGE OF PRE-COMPUTED RESULTS / VIEWS OF THE DATA STORAGE OF INCREMENTAL RESULTS / VIEWS OF THE DATA To Real-Time Analytics with Near-Real-Time background statistics and models SPEED LAYER BATCH LAYER Final latency < 1 second QUERYING AND REPORTING TOOL AGGREGATION, MERGING AND CONSOLIDATION SERVING LAYER The batch layer is responsible for consistency and data storage on the long term The speed layer only analyzes the required time-window The gap between the last batch execution and the latest real-time data → only most recent data. Both layers produce the same output (unlike usual streaming architectures) The serving layer provides a consolidated view on both results
  • 75. 75 λ (Lambda) Architecture CONSISTENT BATCH ANALYTICS ON COMPREHENSIVE DATA REAL-TIME / STREAMING ANALYTICS ON INCREMENTAL DATA DATA STREAM STORAGE OF PRE-COMPUTED RESULTS / VIEWS OF THE DATA STORAGE OF INCREMENTAL RESULTS / VIEWS OF THE DATA Many solutions for all components SPEED LAYER BATCH LAYER QUERYING AND REPORTING TOOL AGGREGATION, MERGING AND CONSOLIDATION SERVING LAYER D3.js HighCharts Tableau Storm DRPC Java API Flink
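The interplay of the three Lambda layers can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not a reference implementation: the page-view counting use case, the event fields `ts` and `page`, and the names `run_batch`, `SpeedLayer` and `serve` are all hypothetical.

```python
from collections import Counter


def run_batch(master_dataset, cutoff_ts):
    """Batch layer: recompute the full view from all events up to cutoff_ts."""
    return Counter(e["page"] for e in master_dataset if e["ts"] <= cutoff_ts)


class SpeedLayer:
    """Speed layer: incremental view of the events newer than the last batch."""

    def __init__(self):
        self.events = []
        self.view = Counter()

    def on_event(self, event):
        self.events.append(event)
        self.view[event["page"]] += 1

    def discard_up_to(self, cutoff_ts):
        # Once a batch run covers these events, drop them from the speed view.
        self.events = [e for e in self.events if e["ts"] > cutoff_ts]
        self.view = Counter(e["page"] for e in self.events)


def serve(batch_view, speed_view):
    """Serving layer: merge both views into one consolidated answer."""
    return batch_view + speed_view


events = [
    {"ts": 1, "page": "home"},
    {"ts": 2, "page": "cart"},
    {"ts": 3, "page": "home"},
    {"ts": 4, "page": "home"},
]
master_dataset = []
speed = SpeedLayer()
for event in events:
    master_dataset.append(event)  # long-term storage (batch layer input)
    speed.on_event(event)         # real-time path

batch_view = run_batch(master_dataset, cutoff_ts=2)  # last batch covered ts <= 2
speed.discard_up_to(2)
result = serve(batch_view, speed.view)
```

Because both layers produce the same kind of output (here, a counter per page), the serving layer can merge them with a plain addition; the merged result equals what a batch run over all the data would have produced.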
  • 76. 76 κ (Kappa) Architecture REAL-TIME / STREAMING ANALYTICS ON INCREMENTAL DATA DATA STREAM RELOAD OF PREVIOUS RESULTS / VIEWS OF THE DATA STORAGE OF INCREMENTAL RESULTS / VIEWS OF THE DATA Recent Stream Processing Technologies render the batch layer less required UNIFIED STREAMING LAYER / TECHNOLOGY Final latency < 1 second QUERYING AND REPORTING TOOL AGGREGATION, MERGING AND CONSOLIDATION SERVING LAYER Kappa architecture is a streaming-first architecture deployment pattern With most recent Stream Processing technologies (Kafka Streams, Flink, etc.) the interest and relevance of the batch layer tend to diminish. The streaming layer matches the computation abilities of the batch layer (ML, statistics, etc.) and stores data as it processes it. A batch layer would only be needed to kick-start the system on historical data (Flink can do that)
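The core idea of Kappa, a single streaming code path that also handles the reload of previous results, can be sketched as follows. This is a toy illustration under assumptions: the log is a plain Python list standing in for a durable, replayable stream, and `StreamJob` is a hypothetical name, not an API of Kafka Streams or Flink.

```python
from collections import Counter


class StreamJob:
    """One streaming job: consumes an append-only log and maintains a view."""

    def __init__(self):
        self.view = Counter()
        self.offset = 0  # position of the next record to read in the log

    def poll(self, log):
        # Process only the records appended since the last poll.
        for record in log[self.offset:]:
            self.view[record["page"]] += 1
        self.offset = len(log)


log = [{"page": "home"}, {"page": "cart"}]
live_job = StreamJob()
live_job.poll(log)            # live processing of the first records
log.append({"page": "home"})
live_job.poll(log)            # picks up only the new record

# Rebuilding the view (new code version, lost state, ...) is a replay of
# the same log from offset 0 through the same streaming code path.
replayed_job = StreamJob()
replayed_job.poll(log)
```

The replayed job ends up with the same view as the live one, which is why no separate batch implementation of the computation is needed in this pattern.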
  • 77. 77 2.9 Big Data 2.0 & Kubernetes
  • 78. 78 Big Data 2.0 2012 2011 2014 Nowadays in 2021 : With Hadoop 3, these 3 technologies tend to converge towards the same capabilities. Hadoop 3 supports deploying jobs as Docker containers just as Mesos and K8s do Mesos and Kubernetes can use alternatives to HDFS such as Ceph, GlusterFS, Minio, (of course Amazon, Azure, …) etc. However, Kubernetes (and/or technologies based on Kubernetes) is emerging as a market standard for the Operational IS just as Hadoop remains a market standard for the Analytical IS
  • 79. 79 Kubernetes is an Open Source Platform providing Automated software applications deployment, scaling, failover and management across a cluster of nodes Management of application runtime components as Docker containers and application units as Pods Multiple common services required for service location, distributed volume management, etc. (pretty much everything one requires to deploy applications on a Big Data cluster) Kubernetes Kubernetes is emerging as a standard Cloud Operating System Many distributions PKS (Pivotal Container Service) Red-Hat OpenShift Canonical Kubernetes Google / AWS / Azure … … Kubernetes origins Based on Google Borg, (one of) Google’s initial cluster management system(s) Released as an Open-Source component by Google in 2014 First official release in 2015
  • 80. 80 Kubernetes Architecture Client Applications Client Applications Client Applications (Secondary Master Node [HA]) (Master Node) API Server Control Plane Etcd Key – Value Store Controller Manager Kubectl Port Forward Load Balanc. Controller Node Kubelet App App App App App POD POD Volumes CR1 CR2 GR1 GR3 Ceph Gluster Kube-Proxy Docker Node App App App App App POD POD Volumes CR2 CR3 GR1 GR2 Ceph Gluster Docker Node App App App App App POD POD Volumes CR1 CR3 GR2 GR3 Ceph Gluster Docker cAdvisor Kubelet Kube-Proxy cAdvisor Kubelet Kube-Proxy cAdvisor KubeMQ KubeMQ KubeMQ
  • 82. 82 Microservice architecture – a variant of the Service-Oriented Architecture (SOA) structural style – arranges an application as a collection of loosely-coupled services. In a microservices architecture, services are fine-grained and the protocols are lightweight. Its characteristics are as follows: Services in a microservices architecture (MSA) are small in size, messaging-enabled, bounded by contexts, autonomously developed, independently deployable, decentralized and built and released with automated processes. Services are often processes that communicate over a network to fulfill a goal using technology-agnostic protocols such as HTTP. Services are organized around business capabilities. Services can be implemented using different programming languages, databases, hardware and software environments, depending on what fits best. Microservices Architecture Origins of Micro-services: As early as 2005, Peter Rodgers introduced the term "Micro-Web-Services" during a presentation at the Web Services Edge conference. The architectural style name was really adopted in 2012 Kubernetes democratized the architectural approach The two big players in this field are Spring Cloud and Kubernetes A Microservices-based architecture has the following properties: Lends itself to a continuous delivery software development process. A change to a small part of the application requires rebuilding and redeploying only one or a small number of services. Adheres to principles such as fine-grained interfaces (to independently deployable services), business-driven development (e.g. domain-driven design). Wikipedia - https://en.wikipedia.org/wiki/Microservices Martin Fowler
  • 83. 83 Microservices Architecture Client Applications Client Applications Client Applications Master Node API Gateway Service Catalog / Discovery Management / Orchestration Node Node Mgmt. Execution middleware Service Proxy Node Node Distributed Storage R1 R2 Distributed Storage R1 R3 Distributed Storage R2 R3 Execution middleware Execution middleware Service B Service C Service A Service D Service E Microservices Node Mgmt. Service Proxy Node Mgmt. Service Proxy MQ MQ MQ Static Content
  • 84. 84 Ask yourself : do you need microservices ? Microservices are NOT Big Data ! [co-local processing] You don’t need microservices or Kubernetes to benefit from Docker You’re not scaling anything with synchronous calls Don’t do microservices unless: You need independent service-level scalability (vs. storage / processing scalability – Big Data) You need a strong SOA - Service-Oriented Architecture You need independent service lifecycle management Challenges Distributed caching vs reloading the world all over again Not all applications are fit for asynchronous communications (WYCIWYG) Identifying the proper granularity for services Enterprise architecture view is too big Application architecture view is too fine RIA Organizer : good candidates would be : EmailService, CalendarService, ContactService, SearchService Data consistency without distributed transactions. Applications need to be designed with this in mind. Weighing the overall memory and performance waste A Spring Boot stack + JVM + Linux Docker base for every single service ? HTTP calls in between layers ? Microservices discussion
  • 86. 86 The strong frontier between Operational IS and Analytical IS vanishes NoSQL, Streaming, Lambda and Kappa architectures are increasingly overflowing to the Operational IS and as such provide a common ground for operational processes and analytical processes. Historically strong on the BI side, Hadoop (V3) nowadays fits the needs of the Operational IS well, while Kubernetes can be useful on the Analytical IS Kubernetes (also Mesos, etc.) is a cloud Operating System, but not only (distribution, scaling → run your cloud locally) Don’t do Micro-Services unless you need Micro-Services … otherwise just do services :-) Final notes … Operational Information System BI X

Editor's Notes

  1. Motivation : On one side, the operational IS with its 3-tier model and the analytical IS with its D-1 push model. On the other side, micro-services used indiscriminately. Take a step back and understand what the technology enables and brings. First, briefly go through architecture description models and introduce a tool that has accompanied me for many years in my work as an architect → Agenda
  2. Typically, the Architectural Design decisions are related to key aspects : Structural : Typically, "The architectural elements should be organized like this ..." Behavioural : For instance, "Data processing, storage and visualization will be performed in strict sequence." Interaction : For instance, "Communication among all system elements should occur only using event notification." Non-functional : For instance, "The system's reliability will be ensured by replicating modules."
  3. A process to design a high-level solution : a process that unfortunately is not documented on Wikipedia. Understanding this process comes from experience, but it is supported by the two tools we will see in a moment. A product : the description of a system's architecture. It cannot be a single diagram. It is often several diagrams, sometimes the same one several times with slightly varying perspectives, plus functional and non-functional specifications, technical documentation, etc. Means : technical foundations, technical or functional libraries, middleware, etc. But above all it is a reality. A system's architecture is defined first and foremost by the running system → and the architect is the person who builds that system, not the person who draws diagrams in an office
  4. Enterprise Architecture vs Application Architecture Enterprise architecture identifies how the different applications of an information system behave together, whereas application architecture identifies how the different components behave within an application. The best image to understand this is to consider enterprise architecture as the map of a city, while the architecture of an application would be the plan of a building. There are differences between these two professions, such as the challenges to address, the scope and the topics covered. But there are also great similarities, such as the tools available to describe them and the questions to ask in order to identify the decision elements
  5. … Architecture is not quite design and design is not quite architecture. But the boundary between these two worlds is subtle and above all blurry. This boundary also depends on the perspective, on its interpretation within a team, etc. Neal Ford : “Architecture is about stuff that’s hard to change later” That speaks to me. For me, architecture stops at the structuring decisions, functional as well as non-functional, about the product to build or the information system as a whole. The elements that can be changed later, that can be refactored, are design, not architecture.
  6. Logical View … - Features and functional breakdown => identify the functional blocks and how they materialize. Describe or materialize the relationships between functional blocks. For me, the logical view is intimately tied to the story map, even if granularity and cardinality may vary. Process View … Concretely, we try to identify how the technical and functional blocks interact with each other to deliver the expected features. To do so, we take into account functional constraints but also non-functional ones (performance, scalability, distribution, etc.) Implementation View … The developer's view, where we want to see packages and stereotypes, but also address source code management concerns. From my point of view, this is the only view of the Kruchten model that, in the era of IntelliJ, git and Maven, is perhaps no longer entirely relevant, and we will see an alternative approach in a moment. Physical View … This is really the system architecture … the one where software and system components are placed on the machines on which the application is deployed. Scenario View Show how all the elements of the previous views work together to deliver the features. More and more, the scenario view is derived from the story map … or it is even dropped completely in favour of a description of the user stories. => I will not dwell on it further => A lot of documentation on Kruchten's views and the 4 + 1 View Model can be found online => Give a few examples of views and the associated design
  7. … - Group functional components by business/backend or presentation/UI category. Use a colour code for the functional family. Show the most important associations. Show layers : this is a choice, not necessarily relevant for functional architecture. Also decide to show a few technical components because they implement important functional elements. In the end, I decided to produce a diagram that lets me : Present a functional breakdown of the software components. Communicate how these components will carry the essential features : edit an email, display an email, save an email, send an email, etc.
  8. … Kruchten takeaways. - Kruchten's 4 + 1 views formalize the perspectives to describe in software architecture. An interesting tool that is still relevant (except perhaps for the implementation view …). My criticism would be : Many people have gone to great lengths discussing the formalism when studying Kruchten. The formalism is of no interest … circles --- ASCII art … A good tool for doing architecture must help ask the right questions. Propose another tool. I dislike the implementation view : architecture is an abstract formalism for communicating, not necessarily something that strives to describe a technical reality. Finally, the formalism of the 4 + 1 view model (based on UML) naturally tends to spill over from architecture into design (at the application level)
  9. Consumerization : new information technologies emerge first in the consumer market and then spread into businesses This is a change compared to the previous situation Companies used to have better servers/desktops/applications/... than those employees could buy at home Now, new solutions emerge every month : companies can't keep up New trend : employees are hired with their devices and their applications → BYOD trend : employees are more comfortable and more efficient with their own devices Same power in an iPad now as in a Cray a few years back This consumerization can be found in infrastructures too and is an enabler for the consumer market A direct consequence of the consumerization: use of a mix of professional and personal tools by employees (Office Suite, Gmail, Google+, Twitter, Facebook, Dropbox, Evernote, ...) Nowadays several companies are still blocking their employees' access to these tools (private banks). Tomorrow, that won’t be possible anymore. People are used to being connected all the time, with highly efficient devices on highly responsive services, everywhere and for all kinds of uses.
  10. The revolution came from the web giants. They had to find technical answers to business challenges like : GGL : Index the whole web, and keep the response time to any query below one second - or how to keep the search free for the user ? LINK : how to understand how millions of users use their website ? AMZ : how to build a product recommendation engine for millions of customers, on millions of products ? EBAY : how to search in eBay ads, even with misspellings ?
  11. From the time we started estimating and measuring the amount of produced data until 2003, 5 exabytes (5 billion gigabytes) were produced. In 2011, that quantity was generated in 2 days (think of Facebook, Twitter, Google search logs, financial transaction logs, etc.) In 2014, this quantity is generated in 10 minutes. Not only do we generate more and more data We have the means and the technology to analyze, exploit and mine it and extract meaningful business insights The data generated by the company’s own systems can be a very interesting source of information regarding customer behaviours, profiles, trends, desires, etc. But also external data, Facebook, Twitter logs, etc. Twitter story : Uber car transportation system in Paris. A driver refused to carry a customer because the customer was gay. That customer tweeted his misadventure. The driver got excluded by Uber only a few hours later. Instead of harming Uber’s reputation, the story rather gave it credit. Just an example of how a company can get significant advantages by monitoring social network feeds
  12. For a long time, the increasing volume of data to be handled was not an issue The volume of data rises, the number of users rises The processing abilities rise as well, sometimes even more See Moore's law above This model has held for a very long time. Costs are going down, computing capacities are rising, one simply needs to buy a new machine to absorb the load increase. This is especially true in the mainframe world There wasn’t even any need to make the architecture of the systems (COBOL, etc.) evolve for 30 years Even outside the mainframe world The architecture patterns and styles we are using in the operational IS world haven’t really evolved for the last 15 years Despite new technologies such as Web, Web 2.0, Java, etc. of course I’m just speaking about architecture and styles The analytical systems architecture hasn’t evolved for the last 20 years So everything’s fine ? No ! As we’ll see, at least two problems emerged relatively recently
  13. 1st concern : the throughput We are able to store more and more data, no problem Yet we are more and more unable to manipulate this data efficiently Specifically, fetching all the data on a computation machine to process it is becoming more and more difficult
  14. One challenge : how to handle the massive computation needs / massive amount of data ? -> New architecture and paradigms are required 3 ideas …
  15. Availability Availability (or lack thereof) is a property of the database cluster. The cluster is available if a request made by a client is always acknowledged by the system, i.e. it is guaranteed to be taken into account That doesn’t mean that the request is processed immediately. It may be put on hold. An available system will at a minimum acknowledge it Practically speaking, availability is usually measured as a percentage. For instance, 99.99% availability means that the system is unavailable at most 0.01% of the time, that is, at most 53 min per year Partition tolerance Partition Tolerance is verified if a system made of several interconnected nodes can withstand a partition of the cluster; if it continues to operate when one or several nodes disappear. This happens when nodes crash or when a piece of network equipment is shut down, taking a whole portion of the cluster away Partition tolerance is related to availability and consistency, but it is still different. It states that the system continues to function internally (e.g. ensuring data distribution and replication), whatever its interactions with a client Consistency When talking about distributed databases, like NoSQL, consistency has a meaning that is somewhat more precise than in the relational context It refers to the fact that all replicas of an entity, identified by a key in the database, have the same value whatever the node queried With many NoSQL databases, updates take a little time to propagate across the cluster. When an entity’s value has just been created or modified, there is a short span during which the entity is not consistent. However the cluster guarantees that it will eventually be, when replication has occurred. This is called eventual consistency
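The "at most 53 min per year" figure follows directly from the availability percentage. A minimal Python sketch of the arithmetic, assuming a 365-day year (the function name is hypothetical):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes(availability_percent):
    """Maximum yearly downtime allowed by a given availability percentage."""
    return MINUTES_PER_YEAR * (100 - availability_percent) / 100


four_nines = max_downtime_minutes(99.99)   # about 52.56 minutes per year
three_nines = max_downtime_minutes(99.9)   # about 525.6 minutes per year
```

Each additional "nine" divides the allowed downtime by ten, which is why 99.99% rounds up to roughly 53 minutes while 99.9% allows close to nine hours.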
  16. GFS / Map Reduce – 2002 / BigTable 2006
  17. E.g. data deposit / data reservoir, or data hub
  18. World of operational decision-making. Potentially many rules, which must evolve frequently. Sending everything back to the development team (MOE) for 3 months is out of the question : we must move fast = a business-side analyst must be able to change them (= no development), imagine new rules and simulate them on historical data (backtesting)
  19. World of operational decision-making. Potentially many rules, which must evolve frequently. Sending everything back to the development team (MOE) for 3 months is out of the question : we must move fast = a business-side analyst must be able to change them (= no development), imagine new rules and simulate them on historical data (backtesting)
  20. Next, we will look at HDFS and MapReduce in detail