1. An Overview of Modern
Scalable Web Development
Septeni Technology
tung_nt
&
Let's take a tour of modern tech trends
2. Agenda
• Motivation and Challenges
• The evolution of software architecture
• Big Data
• AI - Machine Learning
• Cloud Computing
• Septeni Techstack
3. Motivation & Challenges
• 90 percent of the data in the world today has been created in the
last two years alone, at a rate of 2.5 quintillion (2.5 × 10^18) bytes
of data every day (*)
• Fast and concurrent (real-time or near real-time)
• Resilient (~100% uptime)
• Large-scale
(*) According to a report from IBM Marketing Cloud (2016)
5. Reactive System design
principles
• Responsive: answers in a timely manner, even in the face of failure
• Elastic: stays responsive under varying load
• Resilient: expects failure and handles it both in code and at the
system level
• Message-driven: components communicate through asynchronous
messages, which suits a distributed environment
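The message-driven principle above can be sketched with a bounded in-process queue. This is only an illustration: the `producer`, `consumer`, and `run_pipeline` names and the order messages are hypothetical, and a real distributed system would put a message broker between the two sides.

```python
import asyncio


async def producer(queue: asyncio.Queue, orders: list[str]) -> None:
    """Publish messages without waiting for them to be processed."""
    for order in orders:
        await queue.put(order)  # blocks only while the queue is full
    await queue.put(None)  # sentinel: no more messages


async def consumer(queue: asyncio.Queue) -> list[str]:
    """Handle messages one at a time as they arrive."""
    processed = []
    while True:
        message = await queue.get()
        if message is None:
            break
        processed.append(f"handled:{message}")
    return processed


async def run_pipeline(orders: list[str]) -> list[str]:
    # A bounded queue gives back-pressure: a fast producer must wait
    # for a slow consumer instead of exhausting memory.
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)
    _, processed = await asyncio.gather(producer(queue, orders), consumer(queue))
    return processed


result = asyncio.run(run_pipeline(["order-1", "order-2", "order-3"]))
print(result)
```

The producer and consumer never call each other directly; they only share the queue, so either side can be replaced or scaled without changing the other.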
8. Problems with monolithic
architecture
• Pros:
– Simple to develop
– Simple to test
– Simple to deploy
• Cons:
- Hard to scale (too large and
complex)
- Leads to a "Big Ball of Mud"
- Is a barrier to adopting new
technologies
9. The evolution of software
architecture
• Scale-up vs scale-out (vertical vs horizontal scaling)
• MVC → Monolith → Distributed → Service-oriented (SOA, microservices)
10. Service-Oriented Architecture
• Pros:
– Tackles complexity in large-scale
systems
– Easy to scale out (scalability)
– Distributed and container-friendly
– Services are developed, tested, and deployed independently
– …
• Cons:
- System testing is much more
complex
- Not suitable for small applications
11. The Traditional Microservices Architecture
Components:
• Load balancer
• API Gateway
• Service Discovery
• Independent, self-contained
services with communication
endpoints (REST API,
messaging)
• …
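A minimal sketch of how the first three components fit together, assuming an in-memory registry and round-robin balancing. All class names, addresses, and routes here are hypothetical and stand in for real infrastructure.

```python
import itertools


class ServiceRegistry:
    """Hypothetical in-memory service discovery: service name -> addresses."""

    def __init__(self) -> None:
        self._instances: dict[str, list[str]] = {}

    def register(self, name: str, address: str) -> None:
        self._instances.setdefault(name, []).append(address)

    def lookup(self, name: str) -> list[str]:
        return list(self._instances.get(name, []))


class RoundRobinBalancer:
    """Load balancer that hands out the known instances in turn."""

    def __init__(self, addresses: list[str]) -> None:
        if not addresses:
            raise LookupError("no instances registered")
        self._cycle = itertools.cycle(addresses)

    def next_instance(self) -> str:
        return next(self._cycle)


class ApiGateway:
    """Maps a URL prefix to a service, then asks its balancer for an instance."""

    def __init__(self, registry: ServiceRegistry, routes: dict[str, str]) -> None:
        # Simplification: instances are looked up once, at start-up.
        # A real gateway would refresh them from service discovery.
        self._balancers = {
            prefix: RoundRobinBalancer(registry.lookup(service))
            for prefix, service in routes.items()
        }

    def route(self, path: str) -> str:
        for prefix, balancer in self._balancers.items():
            if path.startswith(prefix):
                return balancer.next_instance()
        raise LookupError(f"no route for {path}")


registry = ServiceRegistry()
registry.register("orders", "10.0.0.1:8080")
registry.register("orders", "10.0.0.2:8080")
gateway = ApiGateway(registry, {"/orders": "orders"})

picked = [gateway.route("/orders/42") for _ in range(3)]
print(picked)
```

Scaling out the hypothetical `orders` service then only means registering another address; callers keep using the same gateway path.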
13. Concepts
• Data Warehouse & Data Mart
• OLTP vs OLAP
• HDFS, MapReduce
• Big Data architecture
– Batch processing
– Real-time processing
14. Data Warehouse
• A database designed for querying and analyzing data
• Characteristics:
‣ Subject-oriented
‣ Integrated
‣ Time-variant
‣ Non-volatile
‣ Separated from operational databases
• Schema:
‣ Star
‣ Snowflake
‣ Galaxy
15. Data Mart
• A data mart is a subset of the data warehouse
• Is usually oriented to a specific business line or team
• Improves end-user response time
• Types:
1. Dependent: created from an existing data warehouse.
2. Independent: data is extracted from internal or external data
sources (or both).
3. Hybrid: combines data from an existing data warehouse and
other operational source systems.
16. Why Data Warehouse?
• Make better business decisions:
– Develop data-driven strategies
– Make decisions by consulting the facts
• Quick access to the organization's
historical activities:
– Evaluate initiatives that were
successful, or unsuccessful, in
the past
17. OLAP vs OLTP
OLAP - Online analytical processing:
• Data Warehouse
• Historical processing
• Used to analyze the business
• Schemas: Star, Snowflake, Galaxy
• Contains historical data
• Highly flexible
OLTP - Online transaction processing:
• Operational Database
• Day-to-day processing
• Used to run the business
• Schemas: Entity Relationship Model
• Contains current data
• High performance
18. Building a Data Warehouse
(aka Data Warehousing)
Building a data warehouse involves the following steps:
1. Extract the data from different data sources.
2. Transform the data.
3. Load the data into the dimensional database.
Extract - Transform - Load (ETL) Task
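The three steps can be sketched as three small functions. The CSV content, the cleaning rules, and the `fact_orders` table are assumptions made for the example, not part of any particular ETL tool.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an operational system.
RAW_CSV = """order_id,customer,amount,currency
1, Alice ,10.50,usd
2,BOB,3.00,USD
3,carol,,usd
"""


def extract(text: str) -> list[dict]:
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: drop incomplete rows, normalize names and currency codes."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip rows with a missing amount
        cleaned.append(
            (
                int(row["order_id"]),
                row["customer"].strip().title(),
                float(row["amount"]),
                row["currency"].upper(),
            )
        )
    return cleaned


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, currency TEXT)"
    )
    with conn:
        conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
print(loaded)
```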
19. Problems with traditional data warehousing
• Only handles structured data (relational or non-relational)
• Processing is based on the schema-on-write concept
• Top-down approach (data is extracted according to requirements)
• Suitable for small data volumes; too expensive for large data
volumes
21. What is HDFS and MapReduce?
• Hadoop Distributed File System (HDFS):
The file system Hadoop uses to store data across the machines
of a cluster
• MapReduce:
A processing technique and programming model for distributed
computing
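The MapReduce programming model can be illustrated with the classic word count, here run in a single process. This is a sketch of the model only, not of the Hadoop API; in Hadoop the map and reduce phases run in parallel on many machines and the shuffle moves data between them.

```python
from collections import defaultdict


def map_phase(document: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


counts = word_count(["big data big", "data lake"])
print(counts)
```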
22. Why Hadoop and Data Lake?
• Handles semi-structured (JSON, XML, Avro) and unstructured
data (plain text)
• Schema-on-read
• Uses an analytics engine (Hadoop)
• Bottom-up approach
• Data hoarding
– all data has potential value
• Handles large data volumes
26. An example of a real-life ML system
Flow:
1. Manage data
2. Train models
3. Evaluate models
4. Deploy models
5. Make predictions
6. Monitor predictions
Uber Michelangelo - ML End to End Platform
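The six steps of the flow can be walked through with a toy model in plain Python. This is not Michelangelo's API: the data, the one-feature threshold classifier, and every function name below are assumptions chosen to keep the sketch self-contained.

```python
import json

# 1. Manage data: (feature, label) pairs, split into train and test sets.
data = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1), (2.5, 0), (6.5, 1)]
train, test = data[:6], data[6:]


def train_model(samples):
    """2. Train: put the threshold midway between the two class means."""
    class0 = [x for x, y in samples if y == 0]
    class1 = [x for x, y in samples if y == 1]
    mean0 = sum(class0) / len(class0)
    mean1 = sum(class1) / len(class1)
    return {"threshold": (mean0 + mean1) / 2}


def predict(model, x):
    """5. Make predictions: label 1 if the feature exceeds the threshold."""
    return int(x > model["threshold"])


def evaluate(model, samples):
    """3. Evaluate: fraction of held-out samples predicted correctly."""
    return sum(predict(model, x) == y for x, y in samples) / len(samples)


model = train_model(train)
accuracy = evaluate(model, test)

# 4. Deploy: serialize the model, as if shipping it to a serving host.
deployed = json.loads(json.dumps(model))

# 6. Monitor predictions: log every served prediction for later inspection.
prediction_log = []


def serve(x):
    label = predict(deployed, x)
    prediction_log.append((x, label))
    return label


serve(5.0)
serve(1.5)
print(model, accuracy, prediction_log)
```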
27. Roles and skills in an ML project
• Software Engineer:
– Builds systems that collect data, avoid
bottlenecks, and let ML algorithms
scale well with increasing volumes of
data
– Deploys and integrates ML models into the system
• Applied ML Engineer:
– Strong knowledge of ML frameworks
(TensorFlow, scikit-learn, PyTorch,
Caffe…) and ML algorithms, used to tune
hyperparameters and train new models
• Core ML Engineer:
– Models, visualizes, and evaluates data,
and monitors it
• Data scientist:
– Analyzes data in order to tell a story
29. Cloud Computing Types
• Infrastructure as a Service (IaaS):
– Virtualized hardware resources as a service
• Platform as a Service (PaaS):
– Virtualized OS, runtime, middleware, etc. as a service
• Software as a Service (SaaS):
31. Why Cloud Computing?
• Easy to scale
• Reliability
• On-demand cost
• Security
• Focus on the application
Cloud computing economies of scale.
32. Most popular cloud providers
• Amazon Web Services (AWS)
• Google Cloud Platform (GCP)
• Microsoft Azure
• IBM Cloud
• Oracle Cloud
• …