9. Trend-II: Integration at Scale (Isolation has a cost !!!)
(World Wide) Sensor Web
(Feng Zhao)
Future Combat Systems
Ubiquitous embedded devices
• Large-scale networked embedded systems
• Seamless integration with the physical environment
Complex system with global integration
10. Trend-III: Evolution: Man vs. Machine
The exponential proliferation of embedded devices (courtesy of Moore’s Law) is NOT
matched by a corresponding increase in human ability to consume information !
Increase in Machine Autonomy !!!
12. Confluence of Technologies
CPS: the confluence of
Trend-1: Sensing & Actuation
Trend-2: Communication & Networking
Trend-3: Computation & Control
A cyber-physical system (CPS) refers to a tightly integrated system that is engineered with a
collection of technologies, and is designed to drive an application in a principled manner.
14. Functional Blocks of CPS
Enormous SCALE: both in space and time
15. Casting CPS Technology into Application Requirement
Use Case: Adaptive Lighting in Road Tunnels
Problem: Control the tunnel lighting levels in a manner that ensures continuity of light conditions
from the outside to the inside (or vice-versa) such that drivers do not perceive the tunnel as too
bright or dark.
Solution: Design a system that is able to account for the change in light intensity (i.e., detect physical
conditions and interpret), and adjust the illumination levels of the tunnel lamps (i.e., respond) till a
point along the length of the tunnel where this change is indiscernible to the drivers (i.e., reason and
control in an optimal manner).
16. Casting CPS Technology into Application Requirement
Use Case: Smart Buildings/Homes
Problem: How to make buildings/homes (both new and existing) ‘smarter’ ?
• Energy efficient
• Damage prevention
• Increased comfort
17. Beaming from CPS to IoT : The SCALE is even BIGGER !!!
[Figure: cyber components (C1 ... Cn) coupled to physical components (P1 ... Pn) across the cyber world and the physical world, interconnected via the Internet into a Network of Things (NoT)]
IoT = CPS + People ‘in-the-loop’ (that act as sensors, actuators, controllers)
IoT = CPS + Hybrid (tight and loose) sense of control
18. CPS & IoT
Gives us the ability to look more broadly (SCALE), deeply (PRECISION) and
over extended periods of time at the physical world
As a result, our interactions with the physical world have increased !!!
Example of a Killer APP: Navigation System
19. Navigation System - I
Context Service Example
Current Location -> Local business
21. Navigation System - III
Context Service Example
Current Location -> Local business and directions
+ Time -> Tracks businesses in driving direction
+ History -> Personalized directions ("Take 520 East")
22. Navigation System - IV
Context Service Example
Current Location -> Local business and directions
+ Time -> Tracks businesses in driving direction
+ History -> Personalized directions
+ Community -> Tourist recommendation ("35% of people pick the scenic route")
23. Navigation System - V
Context Service Example
Current Location -> Local business and directions
+ Time -> Tracks businesses in driving direction
+ History -> Personalized directions
+ Community -> Tourist recommendation
+ Push -> Alerts, triggers, reminders (e.g., "Alert: bad traffic, consider an alternate route")
25. IoT: Vision and Value Proposition
Vision:
Build a ubiquitous society where everyone ("people") and everything ("systems,
machines, equipment and devices") is immersively connected.
Value Proposition:
Connected “Things” will provide utility to “People”
Digital shadow of “People” will provide value to the “Enterprise”
28. The FORTUNE TELLER or NOT …
IIoT and Industry 4.0 is ALL about re-imagination !!!
Improve flexibility, reliability and time to market/scale
Improve customer intimacy and profitability
Improve revenue and market position
29. Is the Internet of Things disruptive?
OR
Is it a repackaging of known technologies
and making them a little better?
What is your take ?
31. High-level Functional Architecture

NATURE of INGESTED DATA (DATA):
DATA @ REST (VOLUME): archival/static data (TBs) in data stores
DATA @ MOTION (VELOCITY): streaming data
DATA @ MANY FORMS (VARIETY): structured/unstructured, text, multimedia, audio, video
DATA @ DOUBT (VERACITY): data with uncertainty that may be due to incompleteness, missing points, etc.

NATURE of ANALYSIS (DATA -> KNOWLEDGE):
DISCOVERY: What do we have ?
DESCRIPTIVE: What has happened ?
PREDICTIVE: What could happen ?
PRESCRIPTIVE: What are the best outcomes ?
33. Functional Architecture Layers and their Key Physical Attributes

Physical Attribute: Field Devices (with sensing, compute and actuation HW)
Functionality: Sense, Actuate, Control

Physical Attribute: Gateway with last-mile connectivity (PAN, HAN, FAN, NAN, CAN, WAN, etc.)
Functionality: Connection Management, Routing

Physical Attribute: Data Storage
Functionality: Ingestion, Semantics, Transformation

Physical Attribute: Common Service Functions
Functionality: Interoperability, Security, Access Control

Physical Attribute: Business Logic & Related Functions
Functionality: Business Logic, Orchestration

Physical Attribute: Users
Functionality: Input, Output, Transform
34. Recap: Functional Architecture
Service-Oriented Approach
Application & Business Architecture
Describes the service strategy and the organizational, functional, process, information and geographic aspects of the environment, based on the strategic goals and strategic drivers
Information Systems Architecture
Describes information/data structure and semantics, and the types and sources of data necessary to support various smart applications
Data Access Architecture
Describes technical components (software, hardware), access technology and data aggregation policies
Information Security, characterized by:
• Availability
• Integrity
• Confidentiality
Interoperability, characterized by:
• Syntactic
• Semantic
41. Background: Wireless Sensor Networks (WSN)
Consist of many embedded units called sensor nodes, motes, etc.:
Sensors (and actuators)
Small microcontroller
Limited memory
Radio for wireless communication
Power source (often battery)
Communication-centric systems
Motes form networks and, in a one-hop or multi-hop fashion, transport sensor data to a base station
43. WSN Node: Core Features
Limited Energy Reserves: the PREMIUM resource
Radio under MAC control
[Figure: node block diagram: RISC microcontroller, memories in the KBytes range, low-power radio]
44. Sensor Web: Field Device Stack
L1: PHY
L2: MAC
L3: NETWORK
L4: ROUTING
L5: TRANSPORT
L6: APP
Do we need a LAYERED approach @
the Field Device level ?
46. (Popular) Short and Medium Range Low Power Wireless Technology
| Technology | Standard Body | Frequency Band | Max Range | Max Data Rate | Max Power | Network Type |
| Bluetooth | Bluetooth SIG | 2.4 GHz ISM | 100 m | 1-3 Mbps | 1 W | WPAN |
| Bluetooth Smart | IoT Interconnect | 2.4 GHz ISM | 35 m | 1 Mbps | 10 mW | WPAN |
| ZigBee | IEEE 802.15.4, ZigBee Alliance | 2.4 GHz ISM | 160 m | 250 Kbps | 100 mW | Star, Mesh |
| Wi-Fi | IEEE 802.11 g/n/ac/ad | 2.4/5/60 GHz | 100 m | 6-780 Mbps; 6 Gbps @ 60 GHz | 1 W | Star, Mesh |
| Z-Wave | Z-Wave | 908 MHz | 30 m | 100 Kbps | 1 mW | Star, Mesh |
| ANT+ | ANT Alliance | 2.4 GHz | 100 m | 1 Mbps | 1 mW | Star, Mesh |
| RuBee | IEEE 1902.1, IEEE 1902.2 | 131 kHz | 5 m | 1.2 Kbps | 40-50 nW | P2P |
47. Low Power Wide Area Networking Technology
| Technology | Standards/Governing Body | Frequency Band | Max Range | Max Data Rate | Topology | Devices / Access Point |
| Weightless | - | SubGHz ISM, TV whitespaces | 2-5 km (urban) | 200 bps - 100 Kbps; W: 1 Kbps - 10 Mbps | Star | Unlimited |
| LoRaWAN | LoRa Alliance | 433/780/868/915 MHz ISM | 2.5-15 km | 0.3-50 Kbps | Star | 1 million |
| SigFox | SigFox | Ultra narrow band | 30-50 km (rural), 3-10 km (urban) | 100 bps | Star | 1 million |
| WiFi LowPower | IEEE P802.11ah | SubGHz | 1 km (outdoor) | 150-340 Kbps | Star, Tree | - |
| Dash7 | Dash7 Alliance | 433/868/915 MHz | 2 km | 9.6/56/167 Kbps | Star, Tree | - |
| LTE Cat-0 | 3GPP R-13 | Cellular | 2.5-5 km | 200 Kbps | Star | > 20,000 |
| UMTS (3G), HSDPA/HSUPA | 3GPP | Cellular | 27 km, 10 km | 0.73-56 Mbps | Star | Hundreds per cell |
56. IEEE 802.15.4: Quick Facts
IEEE 802.15.4
Offers physical and media access control layers for low-speed, low-power wireless personal
area networks (WPANs)
16 non-overlapping channels, spaced 5 MHz apart, occupying frequencies 2405-2480 MHz
Provides a physical layer data rate of 250 kbps
Shares the same frequency band as IEEE 802.11 and Bluetooth
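The channel plan above reduces to one line of arithmetic: channel k (numbered 11-26 in the 2.4 GHz band) sits at 2405 + 5(k - 11) MHz. A minimal Python sketch:

```python
# IEEE 802.15.4 2.4 GHz channel plan: 16 channels numbered 11..26,
# spaced 5 MHz apart, occupying 2405-2480 MHz.
def channel_center_mhz(k: int) -> int:
    if not 11 <= k <= 26:
        raise ValueError("2.4 GHz band channels are numbered 11..26")
    return 2405 + 5 * (k - 11)

print([channel_center_mhz(k) for k in (11, 18, 26)])  # [2405, 2440, 2480]
```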
58. IEEE 802.15.4: Device Classes
Full Function Device (FFD)
Any topology
PAN coordinator capable
Talks to any other device
Implements complete protocol set
Reduced Function Device (RFD)
Reduced protocol set
Very simple implementation
Cannot become a PAN coordinator
Limited to leaf roles in more complex topologies
59. IEEE 802.15.4: Topology Types
Star Topology
All nodes communicate via the central PAN coordinator
Leafs may be any combination of FFD and RFD devices
The PAN coordinator usually has a reliable power source
Peer-to-Peer Topology
Nodes can communicate via the central PAN coordinator
and via additional point-to-point links
Extension of the pure star topology
Cluster Tree Topology
Leafs connect to a network of coordinators (FFDs)
One of the coordinators serves as the PAN coordinator
Clustered star topologies are an important case
(e.g., each hotel room forms a star in a HVAC system)
61. IEEE 802.15.4: Frame Formats
Beacon Frames
Broadcasted by the coordinator to organize the network
Command Frames
Used for association, disassociation, data and beacon requests, conflict notification, . . .
Data Frames
Carrying user data
Acknowledgement Frames
Acknowledges successful data transmission (if requested)
62. Link Layer Protocols
L6: APP
L5: TRANSPORT
L4: ROUTING
L3: NETWORK
L2: MAC <- IEEE 802.15.4
L1: PHY <- IEEE 802.15.4
63. Why do we need MAC ?
Wireless channel is a shared medium
Radios, within the communication range of each other and operating in the same
frequency band, interfere with each other's transmissions
Interference -> Collision -> Packet Loss -> Retransmission -> Increase in net energy
The role of MAC
Co-ordinate access to and transmission over the common, shared (wireless) medium
Can traditional MAC methods be directly applied to WSN ?
Control -> often decentralized
Data -> low load but convergecast communication pattern
Links -> highly volatile/dynamic
Nodes/Hops -> Scale is much larger
Energy is the BIGGEST concern
Network longevity, reliability, fairness, scalability and latency
are more important than throughput
MAC is Crucial !!!
64. MAC Family
Reservation
(Scheduled, Synchronous)
Contention
(Unscheduled, Asynchronous)
Reservation-based
Nodes access the channel based on a schedule
Examples: TDMA
Limits collisions, idle listening, overhearing
Bounded latency, fairness, good throughput (in loaded traffic conditions)
Saves node power by putting nodes to sleep until needed
Low idle listening
Dependencies: time synchronization and knowledge of network topology
Not flexible under conditions of node mobility, node redeployment and node death:
complicates schedule maintenance
Contention-based
Nodes compete (in probabilistic coordination) to access the channel
Examples: ALOHA (pure & slotted), CSMA
Time synchronization “NOT” required
Robust to network changes
High idle listening and overhearing overheads
Taxonomy
66. Collisions
Node(s) is/are within the range of nodes that are transmitting at the same time -> retransmissions
Overhearing
The receiver of a packet is not the intended receiver of that packet
Overhead
Arising from control packets such as RTS/CTS
E.g.: exchange of RTS/CTS induces high overheads in the range of 40-75% of the channel capacity
Idle Listening
Listening to possible traffic that is not sent
Most significant source of energy consumption
| Function | Protocols |
| Reduce Collisions | CSMA/CA, MACA, Sift |
| Reduce Overheads | CSMA/ARC |
| Reduce Overhearing | PAMAS |
| Reduce Idle Listening | PSM |
Causes of Energy Consumption
67. Low-power, Constrained Field Devices MAC Family
Scheduled
(periodic, high-load traffic)
Common Active Periods
(medium-load traffic)
Preamble Sampling
(rare reporting events)
68. Build a schedule for all nodes
Time schedule
no collisions
no overhearing
minimized idle listening
bounded latency, fairness, good throughput (in loaded traffic conditions)
BUT: how to set up and maintain the schedule ?
| Function | Protocols |
| Canonical Solution | TSMP, IEEE 802.15.4 |
| Centralized Scheduling | Arisha, PEDAMACS, BitMAC, G-MAC |
| Distributed Scheduling | SMACS |
| Localization-based Scheduling | TRAMA, FLAMA, uMAC, EMACs, PMAC |
| Rotating Node Roles | PACT, BMA |
| Handling Node Mobility | MMAC, FlexiMAC |
| Adapting to Traffic Changes | PMAC |
| Receiver-Oriented Slot Assignment | O-MAC |
| Using Different Frequencies | PicoRadio, Wavenis, f-MAC, Multichannel LMAC, MMSN, Y-MAC, Practical Multichannel MAC |
| Other Functionalities | LMAC, AI-LMAC, SS-TDMA, RMAC |
Scheduled MAC Protocols
69. Time Synchronized Mesh Protocol (TSMP): Overview
Goal: High end-to-end reliability
Major Components
time synchronized communication (medium access)
TDMA-based: uses timeslots and time frames
Synchronization is achieved by exchanging offset information (and not by
beaconing strategies)
frequency hopping (medium access)
automatic node joining and network formation (network)
redundant mesh routing (network)
secure message transfer (network)
Limitations
Complexity in infrastructure-less networks
Scaling is a challenge: finding a collision-free schedule is a two-hop coloring problem
Reduced flexibility to adapt to dynamic topologies
70. Nodes define common active/sleep periods
active period -> communication, where nodes contend for the channel
sleep period -> saving energy
need to maintain a common time reference across all nodes
| Function | Protocols |
| Canonical Solution | SMAC |
| Increasing Flexibility | TMAC, E2MAC, SWMAC |
| Minimizing Sleep Delay | Adaptive listening, nanoMAC, DSMAC, FPA, DMAC, Q-MAC |
| Handling Mobility | MSMAC |
| Minimizing Schedules | GSA |
| Statistical Approaches | RL-MAC, U-MAC |
| Using Wake-up Radio | RMAC, E2RMAC |
Common Active Period MAC Protocols
71. Goal: reduce energy consumption, while supporting good scalability and collision
avoidance
Major Components
periodic listen and sleep
Copes with idle listening: uses a scheme of active (listen) and sleep periods
Active periods are fixed; sleep periods depend on a predefined duty-cycle parameter
Synchronization is used to form virtual clusters of nodes on the same sleep schedule
Schedules coordinate nodes to minimize additional latency
collision and overhearing avoidance
Adopts a contention-based scheme
In-channel signaling is used to put each node to sleep when its neighbor is transmitting to
another node; thus, avoids the overhearing problem but does not require an additional
channel
message passing
Small packets transmitted in bursts
RTS/CTS reserves the channel for the whole burst duration rather than for each packet;
hence unfair at the per-hop MAC level
Sensor MAC (S-MAC): Overview
72. Periodic Listen and Sleep
Each node goes to sleep for some time, and then wakes up and listens to see if any other
node wants to talk to it. During sleep, the node turns off its radio, and sets a timer to awake
itself later.
Maintain Schedules
Maintain Synchronization
S-MAC - I
73. Collision and Overhearing Avoidance
Adopts a contention based scheme
Collision Avoidance
Overhearing Avoidance
Basic Idea
A node can go to sleep whenever its neighbor is talking with another node
Who should sleep?
The immediate neighbors of sender and receiver
How do they know when to sleep?
By overhearing RTS or CTS
How long should they sleep?
Network Allocation Vector (NAV)
Message Passing
How to transmit a long message?
Transmit it as a single long packet
Easily corrupted
Transmit as many independent packets
Higher control overhead & longer delay
Divide into fragments, but transmit all in burst
S-MAC - II
74. Adaptive duty cycle: duration of the active period is no longer fixed but varies according
to traffic
Prematurely ends an active period if no traffic occurs for a duration of TA
Timeout MAC (TMAC): Overview
75. Preamble Sampling MAC Protocols
Goal: minimize idle listening -> minimize energy consumption
Operation
Node periodically wakes up, turns the radio on and checks the channel
Wakeup time is fixed (the time spent sampling RSSI)
"Check interval" is variable
If energy is detected, the node powers up in order to receive the packet
Node goes back to sleep
If a packet is received
After a timeout
Preamble length matches the channel "check interval"
No explicit synchronization required
Noise floor estimation used to detect channel activity during LPL
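A sketch of the receiver-side operation described above. The radio object is a made-up stub (on/off, rssi_above_noise_floor and receive are hypothetical names), so this shows the control flow only, not any real radio API:

```python
import random

class StubRadio:
    """Hypothetical radio, defined only to make the sketch runnable."""
    def on(self): pass
    def off(self): pass
    def rssi_above_noise_floor(self): return random.random() < 0.1
    def receive(self, timeout): return b"sensor-reading"

CHECK_INTERVAL_S = 0.5  # the variable "check interval" between channel samples

def lpl_receiver_step(radio):
    # Periodic wakeup: turn the radio on and sample the channel.
    radio.on()
    if radio.rssi_above_noise_floor():
        # Energy detected: stay powered up to receive the packet.
        pkt = radio.receive(timeout=0.01)
        radio.off()  # back to sleep after reception (or after a timeout)
        return pkt
    radio.off()  # no energy: sleep again until the next check interval
    return None

print(lpl_receiver_step(StubRadio()))
```

A sender must transmit a preamble at least as long as CHECK_INTERVAL_S, so that one of these brief samples is guaranteed to land inside it.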
76. Preamble Sampling MAC Protocols
| Function | Protocols |
| Canonical Solution | Preamble-Sampling ALOHA, Preamble-Sampling CSMA, Cycled Receiver, LPL, Channel Polling |
| Improving CCA | BMAC |
| Adaptive Duty Cycle | EA-ALPL |
| Reducing Preamble Length by Packetization | X-MAC, CSMA-MPS, TICER, WOR, MH-MAC, DPS-MAC, CMAC, GeRAF, 1-hopMAC, RICER, SpeckMAC-D, MX-MAC |
| Reducing Preamble Length by Piggybacking Synchronization Information | WiseMAC, RATE EST, SP, SyncWUF |
| Using Separate Channels | STEM |
| Avoiding Unnecessary Reception | MFP, 1-hopMAC |
Drawbacks:
Costly collisions: a longer preamble leads to a higher probability of collision in applications with considerable traffic
Limited duty cycle: the "check interval" cannot be arbitrarily increased, since that implies a longer preamble
Overhearing problem: the target receiver has to wait for the full preamble before receiving the data packet, so the per-hop latency is lower-bounded by the preamble length; over a multi-hop path, this latency can accumulate to become quite substantial
77. Goals:
Simple and predictable; Effective collision avoidance by improving CCA
Tolerant to changing RF/networking conditions
Low power operation; Scalable to large numbers of nodes; Small code size and RAM usage
CCA
MAC must accurately determine if channel is clear
Need to tell what is noise and what is a signal
Ambient noise is prone to environmental changes
BMAC solution: ‘software automatic gain control’
Signal strength samples taken when channel is assumed to be free – When?
immediately after transmitting a packet
when the data path of the radio stack is not receiving valid data
Samples go in a FIFO queue (sliding window)
Median added to an EWMA (exponentially weighted moving average with decay α) filter
Once the noise floor is established (What is a good estimate?), a TX request starts monitoring
RSSI from the radio
CCA: Thresholding vs. Outlier Detection
Common approach: take single sample, compare to noise floor
Large number of false negatives
BMAC: search for outliers in RSSI
If a sample has significantly lower energy than the noise floor during the sampling period, then
channel is clear
Berkeley MAC (BMAC): Overview
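The "software automatic gain control" above can be sketched directly: samples taken when the channel is assumed free enter a FIFO window, the window median feeds an EWMA, and CCA then searches for outliers below the floor. The window size, decay alpha and outlier margin here are illustrative guesses, not BMAC's tuned values:

```python
from collections import deque
from statistics import median

class NoiseFloorEstimator:
    """Sketch of BMAC-style noise floor tracking and outlier-based CCA."""
    def __init__(self, window=10, alpha=0.06, margin_db=3.0):
        self.window = deque(maxlen=window)  # FIFO queue (sliding window)
        self.alpha = alpha                  # EWMA decay
        self.margin_db = margin_db          # illustrative outlier margin
        self.noise_floor = None

    def update(self, rssi_dbm):
        # Called with samples taken just after a transmission, or when the
        # radio's data path is not receiving valid data (channel assumed free).
        self.window.append(rssi_dbm)
        m = median(self.window)
        if self.noise_floor is None:
            self.noise_floor = m
        else:
            self.noise_floor = (1 - self.alpha) * self.noise_floor + self.alpha * m

    def channel_clear(self, samples):
        # Outlier search: one sample well below the floor => channel clear,
        # which avoids the false negatives of single-sample thresholding.
        return any(s < self.noise_floor - self.margin_db for s in samples)

est = NoiseFloorEstimator()
for s in [-95, -94, -96, -95, -93]:
    est.update(s)
print(est.channel_clear([-90, -99]))  # True: -99 dBm is an idle-channel outlier
```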
78. BMAC
[Figure: CCA output trace (0 = busy, 1 = clear); a packet arrives between 22 and 54 ms; single-sample thresholding produces several false 'busy' signals]
79. XMAC: Overview
Series of short preamble packets, each containing target address information
Minimizes the overhearing problem
Reduces latency and energy consumption
Strobed preamble: pauses in the series of short preamble packets
The target receiver can shorten the strobed preamble via an early ACK
Small pauses between preamble packets permit the target receiver to send an early ACK
Reduces latency for the case where the destination is awake before the preamble completes
Non-target receivers that overhear the strobed preamble can go back to sleep immediately
Preamble period must be greater than the sleep period
Reduces per-hop latency and energy
80. Wireless Sensor (Wise) MAC: Overview
WiseMAC uses a scheme that learns the sampling schedule of direct neighbors and exploits
this knowledge to minimize the wake-up preamble length
ACK packets, in addition to carrying the acknowledgement for a received data packet, also carry
information about the next sampling time of that node
Node keeps a table of the sampling time offsets of all its usual destinations up-to-date
Node transmits a packet just at the right time, with a wake-up preamble of minimized size
81. Wireless Sensor (Wise) MAC: I
How does the system cope with Clock drifts ?
Clock drifts may make the transmitter lose accuracy about the receiver’s wakeup time.
Transmitter uses a preamble that is just long enough to make up for the estimated maximum clock
drift.
The length of the preamble used in this case depends on clock drifts: the smaller the clock drift, the
shorter the preamble the transmitter has to use.
What if the node has no information about the wakeup time of a neighbor node ?
Node uses a full-length preamble
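A minimal sketch of the resulting preamble-length rule, assuming the WiseMAC bound of 4·theta·L (both clocks may drift by up to theta in either direction since the schedule was learned L seconds ago), capped by the full sampling period:

```python
def wisemac_preamble_s(drift_ppm: float, since_last_sync_s: float,
                       sampling_period_s: float) -> float:
    # Uncertainty on the neighbor's wakeup grows with mutual clock drift;
    # a full-length preamble (one sampling period) is the worst case.
    theta = drift_ppm * 1e-6
    return min(4 * theta * since_last_sync_s, sampling_period_s)

# 30 ppm clocks, schedule learned 10 minutes ago, 500 ms sampling period:
print(wisemac_preamble_s(30, 600, 0.5))  # 0.072 s instead of a full 0.5 s
```

The smaller the drift (or the fresher the schedule information), the shorter the preamble the transmitter has to use, exactly as stated above.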
82. Hybrid Protocols
| Function | Protocols |
| Flexible MAC Structure | IEEE 802.15.4 |
| CSMA inside TDMA Slots | ZMAC |
| Minimizing Convergecast Effect | Funneling MAC, MH-MAC |
| Slotted and Sampling | SCP |
| Receiver-based Scheduling | Crankshaft |
84. IEEE 802.15.4 MAC: Overview
Two different channel access methods
Beacon-Enabled duty-cycled mode (typically, used in FFD networks)
Non-Beacon Enabled mode (aka Beacon Disabled mode)
85. IEEE 802.15.4 Beacon Enabled Mode
CAP: Contention Access Period | CFP: Collision Free Period | GTS: Guaranteed Time Slot
Nodes listen to the Beacon and check IF a GTS is reserved
If YES: remain powered off until the GTS is scheduled
If NO: perform CSMA/CA during the CAP
Synchronization
Sync with Tracking Mode
Sync with Non Tracking Mode
90. IP over IEEE 802.15.4
L6: APP
L5: TRANSPORT
L4: ROUTING <- IPv6 over IEEE 802.15.4
L3: NETWORK <- IPv6 over IEEE 802.15.4
L2: MAC
L1: PHY
91. Field Devices: Network Topology Planning
STAR topologies are the easiest to set up and manage
STAR will simplify the network design; if there is just 1-hop communication between the field devices and the gateway, then the need for a "routing layer" on the stack of the field devices may not arise ... thereby making the stack more energy efficient and lightweight.
TREE and MESH are also interesting concepts, but they are very tedious to manage.
92. IPv6 over IEEE 802.15.4 (6LoWPAN)
Benefits of IP over 802.15.4 (RFC 4919)
The pervasive nature of IP networks allows use of existing infrastructure
IP-based technologies already exist, are well-known, and proven to be working
Open and freely available specifications vs. closed proprietary solutions
Tools for diagnostics, management, and commissioning of IP networks already exist
IP-based devices can be connected readily to other IP-based networks, without the need
for intermediate entities like translation gateways or proxies
93. 6LoWPAN Challenge
Header Size Calculation
IPv6 header is 40 octets, UDP header is 8 octets
802.15.4 MAC header can be up to:
25 octets (null security)
25+21=46 octets (AES-CCM-128)
With the 802.15.4 frame size of 127 octets, the following space left for application data:
127-25-40-8 = 54 octets (null security)
127-46-40-8 = 33 octets (AES-CCM-128)
IPv6 MTU Requirements
IPv6 requires that links support an MTU of 1280 octets
Link-layer fragmentation / reassembly is needed
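The payload budget above is easy to verify programmatically; a minimal sketch of the arithmetic:

```python
IEEE802154_FRAME = 127                          # maximum frame size (octets)
MAC_HDR = {"null": 25, "AES-CCM-128": 25 + 21}  # MAC header per security mode
IPV6_HDR, UDP_HDR = 40, 8

def app_payload(security: str) -> int:
    """Octets left for application data in one uncompressed frame."""
    return IEEE802154_FRAME - MAC_HDR[security] - IPV6_HDR - UDP_HDR

print(app_payload("null"))         # 54
print(app_payload("AES-CCM-128"))  # 33
```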
94. 6LoWPAN Overview (RFC 4944)
Overview
An adaptation layer allowing transport of IPv6 packets over 802.15.4 links
Uses 802.15.4 in unslotted CSMA/CA
Based on IEEE standard 802.15.4-2003
Fragmentation / reassembly of IPv6 packets
Compression of IPv6 and UDP/ICMP headers
Mesh routing support (mesh under)
Low processing / storage costs
95. 6LoWPAN Dispatch Codes
All 6LoWPAN encapsulated datagrams are prefixed by an encapsulation header stack
Each header in the stack starts with a header type field followed by zero or more
header fields
96. 6LoWPAN Frame Formats
Uncompressed IPv6/UDP (worst case scenario)
Dispatch code (01000001 in binary) indicates no compression
Up to 54 / 33 octets left for payload with a max. size MAC header with null / AES-CCM-128
security
The ratio of header information to application payload is obviously really bad
97. 6LoWPAN Frame Formats
Compressed Link-local IPv6/UDP (best case scenario)
Dispatch code (01000010 in binary) indicates HC1 compression
HC1 compression may indicate HC2 compression follows
This shows the maximum compression achievable for link-local addresses (does not work
for global addresses)
Any non-compressible header fields are carried after the HC1 or HC1/HC2 tags (partial
compression)
98. Header Compression, Fragmentation & Reassembly
Compression Principles (RFC 4944)
Omit any header fields that can be calculated from the context, send the remaining
fields unmodified
Nodes do not have to maintain compression state (stateless compression)
Support (almost) arbitrary combinations of compressed / uncompressed header fields
Fragmentation Principles (RFC 4944)
IPv6 packets too large to fit into a single 802.15.4 frame are fragmented
A first fragment carries a header that includes the datagram size (11 bits) and a
datagram tag (16 bits)
Subsequent fragments carry a header that includes the datagram size, the datagram
tag, and the offset (8 bits)
Time limit for reassembly is 60 seconds
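A bookkeeping sketch of this fragmentation scheme. Headers are kept as dictionaries rather than serialized bit fields, the per-fragment payload figure is illustrative, and offsets are kept 8-octet aligned as RFC 4944 requires:

```python
def fragment(ipv6_packet: bytes, link_payload: int = 96):
    size, tag = len(ipv6_packet), 0x0001          # tag identifies the datagram
    chunk = link_payload - (link_payload % 8)     # offsets count 8-octet units
    frags, pos = [], 0
    while pos < size:
        data = ipv6_packet[pos:pos + chunk]
        hdr = {"datagram_size": size, "datagram_tag": tag}
        if pos > 0:                               # subsequent fragments only
            hdr["datagram_offset"] = pos // 8
        frags.append((hdr, data))
        pos += len(data)
    return frags

# A 1280-octet IPv6 packet (the required MTU) yields 14 fragments here:
print(len(fragment(bytes(1280))))
```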
100. How “Lossy” is Lossy ?
LLN Link Characteristics:
High BER
Frequent packet drops
High instability
LLN failures are frequent and
usually transient
101. Routing Protocol for Low-power and Lossy Networks (RPL): Key Highlights
RPL :
Highly modular
(Core + Additional) modules
Designed specifically for “lossy” networks
Under-reacts to LLN link changes
Agnostic to underlying link layer technology
Is a proactive IPv6 distance vector protocol
Builds a Destination Oriented Directed Acyclic Graph (DODAG) based on an objective
Supports many-to-one, one-to-many, point-to-point communication
Supports different LLN application requirements
Urban (RFC 5548)
Industrial (RFC 5673)
Home (RFC 5826)
Building (RFC 5867)
102. RPL builds DODAGs
DODAG: set of vertices connected by directed edges with no directed cycles
In contrast to trees, DODAGs offer redundant paths
RPL supports multiple RPL Instances
Concept similar to multi-topology routing (MTR) as done in OSPF
Allows a node to join multiple DODAGs according to different Objective Functions (OF)
There can be multiple DODAGs within a RPL instance
A node can, therefore, belong to multiple RPL instances
Identifications:
DODAG -> {RPLInstanceID}
Unique identity of DODAG: {RPLInstanceID, DODAGID}
RPL: DODAG and Instances
103. RPL: DODAG and Instances
Traffic moves either up towards the DODAG root or down towards the DODAG leafs
DODAG Properties
Many-to-one communication: upwards
One-to-many communication: downwards
Point-to-point communication: upwards-downwards
RPL Instance Properties
RPL Instance has an optimization objective
Multiple RPL Instances with different optimization objectives can coexist
A typical example would be an energy-efficient topology for background traffic along with
a low-latency topology for delay-sensitive alarms.
104. RPL: Terminology
A node’s Rank defines the node’s individual position
relative to other nodes with respect to a DODAG root.
The scope of Rank is a DODAG Version.
Route Construction
Up routes towards nodes of decreasing rank (parents)
Down routes towards nodes of increasing rank
Nodes inform parents of their presence and reachability to
descendants
Source route for nodes that cannot maintain down routes
Forwarding Rules
All routes go upwards and/or downwards along a
DODAG
When going up, always forward to lower rank when possible; may forward to a sibling if no lower rank exists
When going down, forward based on down routes
Once a non-root node selects its parent set, it can use the following table to convert the path cost of a parent into its Rank:
| Node/link Metric | Rank |
| Hop-Count | Cost |
| Latency | Cost/65536 |
| ETX | Cost |
105. RPL: Control Messages
DAG Information Object (DIO)
A DIO carries information that allows a node to discover an RPL Instance, learn its
configuration parameters and select DODAG parents
DAG Information Solicitation (DIS)
A DIS solicits a DODAG Information Object from an RPL node
Destination Advertisement Object (DAO)
A DAO propagates destination information upwards along the DODAG
106. RPL: DODAG Construction
Construction
Nodes periodically send link-local multicast DIO messages
Stability or detection of routing inconsistencies influence the rate of DIO messages
Nodes listen for DIOs and use their information to join a new DODAG, or to maintain an
existing DODAG
Nodes may use a DIS message to solicit a DIO
Based on information in the DIOs the node chooses parents that minimize path cost to the
DODAG root
Essentially a distance vector routing protocol with ranks to prevent count-to-infinity problems
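A toy sketch of the parent-selection step described above. The message layout, the ETX-to-rank conversion and the rank step are invented for illustration; RFC 6550 delegates these details to the Objective Function:

```python
MIN_HOP_RANK_INCREASE = 256   # illustrative rank step, not a normative value

def on_dio(dios):
    """dios: list of (neighbor_id, advertised_rank, link_etx) tuples."""
    best = None
    for neighbor, rank, link_etx in dios:
        # Path cost to the DODAG root via this candidate parent.
        path_cost = rank + link_etx * MIN_HOP_RANK_INCREASE
        if best is None or path_cost < best[1]:
            best = (neighbor, path_cost)
    parent, my_rank = best        # rank strictly grows away from the root,
    return parent, my_rank        # which prevents count-to-infinity loops

print(on_dio([("A", 256, 1.2), ("B", 512, 1.0)]))  # picks parent "A"
```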
107. Application Layer Protocols
L6: APP <- CoAP
L5: TRANSPORT
L4: ROUTING <- IPv6 over IEEE 802.15.4
L3: NETWORK <- IPv6 over IEEE 802.15.4
L2: MAC
L1: PHY
108. Constrained Application Protocol CoAP: Key Features
CoAP (RFC 7252):
Web transfer protocol (coap://) for use with constrained nodes and networks
Based on RESTful protocol design minimizing the complexity of mapping with HTTP
Asynchronous transaction model
Default bound to UDP, and optionally to DTLS
Low header overhead and parsing complexity
URI and content-type support
Subset of MIME types and HTTP response codes
Has GET, POST, PUT, DELETE methods
109. CoAP: Transaction Model
CoAP is organized as two sub-layers over the transport:
Request/Response sub-layer: RESTful interaction
Message sub-layer: reliability
Transport: UDP (+ DTLS)
Base Messaging
Simple message exchange between endpoints
Confirmable or Non-Confirmable message answered by Acknowledgment or Reset
message
REST Semantics
REST Request/Response piggybacked on CoAP messages
Method, Response code and Options (URI, content-type, etc.,)
110. CoAP: Message Format
Header (4 Bytes)
Ver - Version (1)
T – Message type (Confirmable, Non-Confirmable, Acknowledgment, Reset)
TKL – Token length, if any, number of token bytes after the header
Code – Request method (1-10), Response code (40-255)
Message ID – Identifier for matching response
Token (0-8 Bytes)
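The fixed 4-byte header above packs directly with Python's struct module; a minimal sketch that builds a Confirmable GET header (the message ID and token are arbitrary):

```python
import struct

COAP_VER, CON, GET = 1, 0, 0x01   # type 0 = Confirmable; code 0.01 = GET

def coap_header(msg_type: int, code: int, message_id: int,
                token: bytes = b"") -> bytes:
    # First octet: Ver (2 bits) | T (2 bits) | TKL (4 bits).
    first = (COAP_VER << 6) | (msg_type << 4) | len(token)
    # Then Code (1 octet) and Message ID (2 octets), then the token bytes.
    return struct.pack("!BBH", first, code, message_id) + token

hdr = coap_header(CON, GET, 0x1234, token=b"\xab")
print(hdr.hex())  # 41011234ab
```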
118. SEEDLING @ UNSW, Sydney
URL : http://cgi.cse.unsw.edu.au/~sensar/seedling/Seedling.html
Objective:
1. Showcase a basic prototype of a WSN system in precision agriculture
2. Understand sensornet deployment challenges
3. Increase the interest of high-school students in ICT
119. Factors CRITICAL to the SUCCESS of Deployments
Choosing a radio transceiver that gave low-power, long-range links
A robust MAC protocol
Simple network topology and planning
Easy network reconfiguration
Simple uniform data representation
Early adoption of solar power for sensor networks
122. Wireless Communication Links: Power is NOT Energy
Low POWER is not the same as low ENERGY
[Figure: power vs. time for two transmissions with energies E1 and E2]
Message passing / time to transmit ALSO governs energy:
Transmit as a single long packet: easily corrupted
Transmit as many independent packets: higher control overhead & longer delay
Divide into fragments, but transmit all in a burst
124. Wireless Communication Links: "Longer the Better"
Reduced hop counts help to obtain better PRR with fewer field devices
[Figure: Configuration 1 vs. Configuration 2]
126. A Routing Layer can be AVOIDED with Smart Network Planning
If a single hop (with a long link) suffices for the purpose, then a routing layer may not be required ... saving ENERGY
With routing: App | Transport | Routing | IP | IP Adaptation | MAC | PHY
Without routing: App | Transport | IP | IP Adaptation | MAC | PHY
128. Low Power, Long Links are "GREY"
Approximately 70% of low power, long range links are GREY
(i.e., neither good nor bad)
Very difficult to predict link behavior
129. Characterizing Low Power Links – Tx Variation
Tx power variation can happen … 7dB is a large variation
133. Why do we need MAC ?
Wireless channel is a shared medium
Radios, within the communication range of each other and operating in the same
frequency band, interfere with each other's transmissions
Interference -> Collision -> Packet Loss -> Retransmission -> Increase in net energy
The role of MAC
Co-ordinate access to and transmission over the common, shared (wireless) medium
Can traditional MAC methods be directly applied to WSN ?
Control -> often decentralized
Data -> low load but convergecast communication pattern
Links -> highly volatile/dynamic
Nodes/Hops -> Scale is much larger
Energy is the BIGGEST concern
Network longevity, reliability, fairness, scalability and latency
are more important than throughput
MAC is Crucial … Design/Choose it Carefully !!!
134. MAC Family
Reservation
(Scheduled, Synchronous)
Contention
(Unscheduled, Asynchronous)
Reservation-based
Nodes access the channel based on a schedule
Examples: TDMA
Limits collisions, idle listening, overhearing
Bounded latency, fairness, good throughput (in loaded traffic conditions)
Saves node power by putting nodes to sleep until needed
Low idle listening
Dependencies: time synchronization and knowledge of network topology
Not flexible under conditions of node mobility, node redeployment and node death:
complicates schedule maintenance
Contention-based
Nodes compete (in probabilistic coordination) to access the channel
Examples: ALOHA (pure & slotted), CSMA
Time synchronization “NOT” required
Robust to network changes
High idle listening and overhearing overheads
MAC Taxonomy
139. The FORTUNE TELLER or NOT …
Low power, long range communication is a very different ball game
compared to standard communication technologies.
Many attributes that inherently are known to work in regular communications
will “shock you” in low-power communications.
Take inspiration from the tons of WSN deployments that have studied these
artifacts rather than hypothesizing “again”.
149. Interoperability via Data Semantics: IEEE 1451 + IEEE 2700 ?
The IEEE 1451 (TEDS) is a well established
standard in industrial automation to achieve
plug-n-play capability with the help of
electronic datasheets.
TEDS is the electronic version of the data sheet
that is used to configure a sensor.
TEDS brings forward the concept that if the data
sheet is electronic and can be readily accessed
upon sensor discovery, it would be possible to
configure the sensor automatically.
This is analogous to the operation of plugging a
mouse, keyboard, or monitor into a computer
and using them without any kind of manual
configuration.
TEDS enables self-configuration of the system
by self-identification and self-description of
sensors and actuators (i.e., plug-and-play).
IEEE 2700 is a sensor calibration standard.
151. The Data to Knowledge Pipeline
Cyber & physical space entities (data sources) -> Edge ("little" data infra) -> Global infra ("big" data infra)
Pipeline: Data Ingestion -> Data Analysis -> Applications (decision making with knowledge)

NATURE of INGESTED DATA
DATA @ REST (VOLUME): archival/static data (TBs) in data stores
DATA @ MOTION (VELOCITY): streaming data
DATA @ MANY FORMS (VARIETY): structured/unstructured, text, multimedia, audio, video
DATA @ DOUBT (VERACITY): data with uncertainty that may be due to incompleteness, missing points, etc.

NATURE of DATA ANALYSIS
DESCRIPTIVE: What has happened ?
DIAGNOSTIC: Why did this happen ?
PREDICTIVE: What could happen ?
PRESCRIPTIVE: What are the best outcomes ?
152. Nature of Data Analysis
Value grows with skill, from hindsight and insight (insights into the PAST) to foresight (insights into the FUTURE):
Descriptive: "WHAT has happened ?" (DASHBOARD)
Diagnostic: "WHY did this happen ?"
Predictive: "WHAT could happen ?" (FORECAST)
Prescriptive: "WHAT should we do ?" (ACTIONS, RULES, RECOMMENDATIONS)
Information optimization increases from descriptive to prescriptive
153. Example: Energy Analysis for a PV Microgrid
Descriptive: What is the total energy, instantaneous energy and power, etc., …?
Diagnostic: Why is the panel temperature decreasing, when the solar irradiance is high and the wind
speed is very low ?
Predictive: Can I forecast the plant output for tomorrow, or can I generate 4kWh net energy ?
Prescriptive: What actions should be undertaken for the plant to reach 4kW energy generation capacity
from its current 2 kW ?
154. Example: Self Health Monitoring of Multi-rotor MAV
Descriptive: What is the total input power (voltage and current), thrust, vibration and ego-noise profiles,
and motor/propeller unit RPM ?
Diagnostic: Why is the THRUST not increasing with increasing RPM ?
Predictive: What is the success probability of the upcoming mission, given the flight and structural
health history ?
Prescriptive: What actions should be taken for increasing the success probability of the upcoming mission
from 75% to 90% ?
155. Machine/System Intelligence …
Depending on the type and quality of analytics, machines/systems could manifest themselves into:
Informed Systems — Systems That Know/Aware
Adaptive Systems — Systems That Learn
Cognitive Systems — Systems That Reason and Plan
158. ML computational methods / algorithms :
LEARN information directly from data, “without” relying on predetermined models
FIND natural patterns in data, which help to generate insights for better decisions and
predictions
ML teaches Machines to do what “naturally” comes to
Humans and Animals
“LEARN from EXPERIENCE”
159. ML Techniques

SUPERVISED
Develop a predictive model, based on evidence (both input and output data)
When to use ?
When you want to train a model to make a prediction.
When you have existing <input, output> data for the response that you are trying to predict.

CLASSIFICATION
Predicts discrete responses (e.g., email: genuine vs. spam; tumor: cancerous vs. benign)
When to use ?
When you are working with data that can be tagged or categorized.

REGRESSION
Predicts continuous responses (e.g., changes in temperature; fluctuations in power demand)
When to use ?
When you are working with data ranges, and want to predict trends.

UNSUPERVISED
Group and interpret data, based only on input data (without labels)
When to use ?
When you want to train a model to find a good internal representation.
When you want to explore your data, but don't yet have a specific goal, or are not sure what information the data contains.
When you want to reduce the dimensions of your data.

CLUSTERING
Finds hidden patterns or groupings (e.g., object recognition)
160. Selecting the Right Algorithm

ML TECHNIQUES
SUPERVISED -> CLASSIFICATION: Support Vector Machines, Discriminant Analysis, Naive Bayes, Nearest Neighbor
SUPERVISED -> REGRESSION: Linear Regression, Ensemble Methods, Decision Trees, Neural Networks
UNSUPERVISED -> CLUSTERING: K-Means, K-Medoids, Fuzzy C-Means, Hierarchical, Gaussian Mixture

Is it TRIAL and ERROR ?
Is it a trade-off between:
Speed of training
Memory usage
Predictive accuracy on new data
Transparency / interpretability
(how easily can you understand the reasons for an algorithm's predictions)
Using larger training datasets often yields models that generalize well to new data
162. ML Workflow: Feature Derivation
The number of features that could be derived is limited only by our imagination !!!
Sensor data
Extract signal properties from raw sensor data
Peak analysis (frequency, power, etc.,)
Pulse and transition analysis (rise time, fall time, settling time, etc.,)
Spectral analysis (power, bandwidth, frequency & its span, etc.,)
Image/Video data
Extract features such as edge locations, resolution, color …
Bag of visual words (create a histogram of local image features : edges, corners, blobs, etc.,)
Histogram of oriented gradients
Minimum eigenvalue (detect corner locations in images)
Edge detection (identify points where the degree of brightness changes sharply)
Transactional data
Calculate derived features that enhance the information in the data
Time decomposition (break timestamps down into components such as day and month)
Aggregate value calculation (create higher-level features such as total number of times a
particular event occurred)
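A small sketch of deriving sensor-signal features along these lines, using NumPy/SciPy on a synthetic 50 Hz signal (the chosen features and thresholds are illustrative):

```python
import numpy as np
from scipy.signal import find_peaks

fs = 1000.0                                  # sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) + 0.1 * np.random.randn(t.size)

# Peak analysis: count prominent peaks in the raw signal.
peaks, _ = find_peaks(x, height=0.5)

# Spectral analysis: dominant frequency and a simple power summary.
spectrum = np.abs(np.fft.rfft(x)) ** 2
freqs = np.fft.rfftfreq(x.size, 1 / fs)
features = {
    "n_peaks": len(peaks),
    "dominant_freq_hz": float(freqs[np.argmax(spectrum)]),  # ~50 Hz
    "mean_power": float(spectrum.mean()),
}
print(features)
```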
164. k-Nearest Neighbor (kNN)
How it Works ?
Categorizes data points based on the classes of their nearest neighbors in the dataset ("guilty by association").
Motivating insight: data points near each other tend to be similar.
Non-parametric: does not make any assumptions regarding the distribution of the data.
Metric for nearest neighbor: distance, either Euclidean (most popular), city block, Chebychev, correlation, cosine, etc.
Choose K to be ODD for a clear majority.
Best Used :
When you want a method that has no training phase (often called a lazy learner).
When response time, memory and space are of lesser concern (you need to store not just the algorithm, but also the training data).
When you can accept a less smart algorithm that can be fooled by irrelevant inputs (i.e., is less robust to noise).
When you need a simple algorithm to establish benchmark learning rules.
[Figure: decision boundaries for K = 1 vs. K = 15; larger K gives smoother, more defined boundaries]
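A minimal kNN sketch with scikit-learn; a stock dataset stands in for real data, and K = 15 is odd, as suggested above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=15, metric="euclidean")
knn.fit(X_tr, y_tr)          # "lazy learner": fit only stores the data
print(knn.score(X_te, y_te)) # accuracy of majority-vote classification
```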
165. Logistic Regression
How it Works ?
Fits a model that can predict the probability of a binary response belonging to one class or the other.
Best Used :
When the dependent variable is BINARY.
When data can be clearly separated by a single, linear boundary.
When a baseline is needed for evaluating more complex classification methods.
y = 1 / (1 + e^(-(β0 + β1x)))
[Figure: scatter of binary y vs. x with the fitted logistic curve]
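A short sketch of fitting this model with scikit-learn on synthetic data drawn from known coefficients (β0 = 0.5, β1 = 2.0, chosen arbitrarily):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
p = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))                    # true probabilities
y = (rng.uniform(size=p.shape) < p).ravel().astype(int)   # binary response

clf = LogisticRegression().fit(x, y)
print(clf.intercept_, clf.coef_)         # estimates of beta0 and beta1
print(clf.predict_proba([[0.0]])[0, 1])  # P(y=1 | x=0), ~ sigmoid(0.5)
```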
166. Support Vector Machines
How it Works ?
Classifies data by finding the linear decision boundary (hyperplane) that separates all data points of one class from those of the other class.
When the data is linearly separable: the best hyperplane is the one with the largest margin between the two classes.
When the data is not linearly separable: use a kernel transform to map the nonlinearly separable data into higher dimensions, where a linear decision boundary can be found; use a loss function to penalize points on the wrong side of the hyperplane.
Best Used :
When data has exactly two classes.
multiclass classification can be performed with a divide-and-conquer approach
When data is complex, has high dimensionality, and is nonlinearly separable.
When data is limited.
When you need a classifier that's simple, easy to interpret, and accurate.
When a fast response is needed.
[Figure: maximum-margin hyperplane with support vectors]
167. Neural Networks
How it Works ?
Consists of highly connected networks of neurons, which relate (map) the inputs to the desired
outputs.
The network is trained by iteratively modifying the strengths (i.e., weights) of the connections so
that given inputs map to the correct response.
Best Used :
When modeling highly nonlinear systems.
When computation cost is of lesser concern.
When model interpretability is not a key concern* (... however, there is work on interpreting each layer and on suggesting how many neurons are needed, so networks can be made interpretable; they can also now handle time information ...)
When there could be unexpected changes in your input data* (... for which the network has to be deep, with a large number of neurons ...)
168. Naïve Bayes
How it Works ?
Based on Bayes Probability Theorem, it assumes that the presence of a particular feature in a class
is unrelated to the presence of any other feature.
Classifies new data based on the highest probability of its belonging to a particular class.
c = HYPOTHESIS (class)
x = EVIDENCE (predictor variable / new data point)
P(c) = probability of the hypothesis before getting the evidence
P(c|x) = probability of the hypothesis after getting the evidence
Best Used :
When assumption of feature independence holds TRUE; it can easily outperform other well known
techniques with lesser training data.
When the model is expected to encounter scenarios that weren’t in the training data.
When CPU and memory resources are a limiting factor* (… although for likelihood estimation, a
dataset is needed …).
When you want a method that doesn’t overfit.
When you want a method that can update itself with continuous new data.
When you need a classifier that’s easy to interpret.
P(c|x) = P(x|c) P(c) / P(x)
Posterior = Likelihood ratio x Prior
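A minimal Gaussian Naive Bayes sketch with scikit-learn (the stock dataset is a stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
nb = GaussianNB().fit(X, y)      # per-class likelihoods, features assumed independent
print(nb.predict_proba(X[:1]))   # posteriors P(c|x) for the first sample
print(nb.predict(X[:1]))         # class with the highest posterior
# nb.partial_fit(...) supports incremental updates as new data arrives.
```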
169. Discriminant Analysis
How it Works ?
Classifies data by finding linear combinations of features.
Assumes that different classes generate data based on Gaussian distributions.
Training a discriminant analysis model involves finding the parameters for a Gaussian distribution for
each class.
The distribution parameters are used to calculate boundaries, which can be linear or quadratic
functions; and these boundaries are used to determine the class of new data.
Best Used …
When memory usage during training is a concern.
When you need a model that is fast to predict.
When you need a simple model that is easy to interpret.
170. Decision Trees
How it Works ?
Represents a procedure for classifying categorical data based on its attributes.
Decides which attribute to test at a node by determining the "best" way to separate the data (splitting point):
pick the attribute that has the highest information gain.
[Figure: a decision tree for the concept buys_computer, indicating whether a customer at AllElectronics is likely to purchase a computer. Each internal (nonleaf) node represents a test on an attribute; each leaf node represents a class (buys_computer = yes or buys_computer = no)]
Best Used :
When handling large datasets.
When there is a need to ignore redundant variables, and handle missing data elegantly* (… missing
data should be small …).
When memory usage needs to be minimized.
When decision traceability is needed.
171. Bagged and Boosted Decision Trees
How do Bagging and Boosting get N leaners ?
Trees are simple, but often produce noisy (bushy) or weak (stunted) classifiers.
In these ensemble methods, several “weaker” decision trees are combined into a “stronger” ensemble.
Why are the data elements weighted ?
172. Bagged and Boosted Decision Trees
How does the classification stage work ?
173. Bagged and Boosted Decision Trees
| Similarities | Differences |
| Both are ensemble methods to get N learners from 1 learner | Bagging: builds N learners independently. Boosting: tries to add new models that do well where previous models fail |
| Both generate several training data sets by random sampling | Bagging: no weighting strategy. Boosting: determines weights for the data to tip the scales in favor of the most difficult cases |
| Both make the final decision by averaging the N learners (or taking the majority of them) | Bagging: an equally weighted average. Boosting: a weighted average (i.e., more weight to those with better performance on training data) |
| Both are good at reducing variance and provide higher stability | Bagging: may solve the over-fitting problem, but may not reduce bias. Boosting: may increase the over-fitting problem, but tries to reduce bias |
Best Used :
When there is a need to minimize prediction variance
Boosting > Random Forests > Bagging > Single Tree
175. Linear/Non-linear/Gaussian Process Regression
How it Works ?
Describes a continuous response variable as a linear/non-linear/Gaussian process function.
Linear regression : Best Used …
When you need an algorithm that is easy to interpret and fast to fit.
When you need a baseline for evaluating other, more complex regression models.
Non-linear regression : Best Used …
When data has strong nonlinear trends, and cannot be easily transformed into a linear space.
When you need to fit custom models to the data.
Gaussian Process regression (Kriging) : Best Used …
When interpolation needs to be performed in the presence of uncertainty.
[Figure: example fits for linear regression, non-linear regression, and kriging]
176. SVM Regression / Regression Tree
SVM Regression: How it Works ?
Works the same as SVM classification algorithms, but is modified to be able to predict a
continuous response.
Instead of finding a hyperplane that separates data, it finds a model that deviates from the
measured data by a value no greater than a small amount, with parameter values that are as
small as possible (to minimize sensitivity to error).
SVM Regression: Best Used :
For high-dimensional data, where there will be a large number of predictor variables.
When data is limited and the number of predictor variables is large.
Regression Tree: How it Works ?
Works the same as decision trees, but is modified to be able to predict a continuous response.
Regression Tree : Best Used :
When predictors are categorical (discrete) or behave nonlinearly.
178. Clustering Analysis
Data is partitioned into groups (or clusters) based on some measure of similarity or shared characteristic. Clusters are formed so that objects in the same cluster are very similar and objects in different clusters are very distinct.
Hard Clustering: each data point belongs to only ONE cluster
Soft Clustering: each data point can belong to MORE than ONE cluster
Data grouping KNOWN: use cluster evaluation to look for the "best" number of groups for a given clustering algorithm
Data grouping UNKNOWN: search for possible clusters (e.g., with Self-Organizing Maps (SOM) or Hierarchical Clustering)
179. Common Hard Clustering Algos: k-Means / k-Medoids
k-Means: How it Works ?
Partitions data into k number of mutually exclusive clusters.
The fitment of a point into a cluster is determined by the distance from that point to
the cluster’s center.
k-Medoids: How it Works ?
Similar to k-means, but with the requirement that the cluster centers coincide with
points in the data.
Best Used :
When the number of clusters is known.
For fast clustering of categorical data
To scale to large data sets
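A minimal k-means sketch with scikit-learn on two synthetic blobs (k = 2 is known here):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),   # blob near (0, 0)
               rng.normal(3, 0.5, (50, 2))])  # blob near (3, 3)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)                     # one center per blob
print(km.predict([[0.2, -0.1], [2.9, 3.2]]))   # nearest-center assignment
```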
180. Hierarchical Clustering & SOM
Hierarchical: How it Works ?
Produces nested sets of clusters by analyzing similarities between pairs of points and
grouping objects into a binary, hierarchical tree.
Hierarchical : Best Used :
When advance knowledge of data clusters is missing
When you want visualization to guide your selection
SOM: How it Works ?
Neural-network-based clustering that transforms a dataset into a topology-preserving 2D map
SOM: Best Used :
To visualize high-dimensional data in 2D or 3D
181. Possible Modes with Unsupervised Learning
End goal is unsupervised learning: Large Data -> Unsupervised Learning -> Data Clusters (results)
Preprocessing step for supervised learning: Large Data -> Unsupervised Learning -> Lower Dimensional Data (feature selection) -> Supervised Learning Model
183. Improving Models
Model improvement in learning means:
increasing its accuracy
increasing predictive power
preventing over-fitting (ambiguity between data and noise)
increasing model parsimony
Essentially, reduces errors in learning due to noise, bias and variance
Feature Selection
Identifying the most relevant features, which provide the best predictive power.
Can be done by adding or removing features and checking the effect of each change on model performance.
Feature Transformation
Recasting existing features into new features using techniques such as: principal component
analysis, nonnegative matrix factorization, and factor analysis.
Hyperparameter Tuning
It is the process of identifying the set of parameters that provide the best model.
It controls how a ML algorithm fits the model to the data.
A model is only as good as the features selected to train on !!!
184. Feature Selection
Especially useful:
when dealing with high-dimensional data
when the dataset contains a large number of features and a limited number of
observations
Reducing the feature space saves storage and computation time
Makes the result easier to understand
Stepwise Regression
Sequentially adding or removing features until there is no improvement in prediction accuracy.
Sequential Feature Selection
Iteratively adding or removing predictor variables and evaluating the effect of each change on the
performance of the model.
Regularization
Using shrinkage estimators to remove redundant features by reducing their weights (coefficients)
to zero.
Neighborhood Component Analysis (NCA)
Finding the weight each feature has in predicting the output, so that the features with lower
weights can be discarded.
185. Feature Transformation
Feature transformation is a form of dimensionality reduction
Principal Component Analysis (PCA)
Performs a linear transformation on the data, so that most of the variance or information in your
high-dimensional dataset is captured by the first few principal components.
The first principal component will capture the most variance, followed by the second principal
component, and so on.
Nonnegative Matrix Factorization
Used when model terms must represent nonnegative quantities, such as physical quantities.
Factor Analysis
Identifies underlying correlations between variables in the dataset to provide a representation in
terms of a smaller number of unobserved latent factors, or common factors.
Shows the relationship between variables, so that variables (or features) that are not highly
correlated can be removed.
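A minimal PCA sketch with scikit-learn; the synthetic data has one inflated direction, so the first component captures most of the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[:, 0] *= 5.0                        # inflate variance along one direction

pca = PCA(n_components=3).fit(X)
print(pca.explained_variance_ratio_)  # decreasing share per component
print(pca.transform(X).shape)         # (200, 3): reduced representation
```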
186. Feature Transformation & Hyper-parameter Tuning
Begin by setting parameters based on a “best guess” of the outcome.
Goal is to find the “best possible” values - that would yield the best model.
As the parameters are adjusted and model performance begins to improve, note which parameters are effective and which still require tuning.
Three common parameter tuning methods are:
Bayesian optimization
Grid search
Gradient-based optimization
Hyperparameter tuning is an iterative process
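A minimal grid-search sketch with scikit-learn; the estimator and the parameter grid are arbitrary examples:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1, 1.0]}

search = GridSearchCV(SVC(), grid, cv=5)  # 5-fold CV per parameter combination
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```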
187. Choosing the Right Model ?
Why is it so hard to get right?
Each model has its own strengths and weaknesses in a given scenario.
No established set of rules/guidelines.
Closely tied to business case, and understanding of what needs to be accomplished.
What can you do to choose the right model?
How much data do you have and is it continuous?
What type of data is it?
What are you trying to accomplish?
How important is it to visualize the process?
How much detail do you need?
Is storage a limiting factor?
Is response time a limiting factor ?
Is computation cost a limiting factor ?
188. Model Over-fitting
Overfitting means that the model is so closely aligned to training data sets that it does
not know how to respond to new situations.
Why is overfitting difficult to avoid?
often the result of insufficient/inaccurate
information about the scenario.
How do you avoid overfitting?
using appropriate training data.
training data needs to accurately reflect the complexity
and diversity of the data the model will be expected to work with.
use regularization
penalizes large parameters to help keep the model from relying too heavily on individual
data points and becoming too rigid
control the smoothness of fit
Has the form: [Error + λf(θ)], where f(θ) grows larger as the components of (θ) grow
larger and λ represents the strength of the regularization
λ decides how much you want to protect against overfitting
if λ=0, you aren’t looking to correct for overfitting at all
perform model cross-validation
partitions a dataset and uses a subset to train the algorithm and the remaining data for
testing
common techniques: k-fold | holdout
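A short sketch tying regularization and cross-validation together with scikit-learn: Ridge regression has the [Error + λf(θ)] form, with λ exposed as alpha and f(θ) the squared norm of the coefficients, and 5-fold cross-validation scores each setting (the data and alpha values are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = X[:, 0] * 3.0 + rng.normal(scale=0.5, size=100)  # one informative feature

for alpha in (0.01, 1.0, 100.0):  # small alpha: little overfitting protection
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5)  # k-fold CV
    print(alpha, scores.mean().round(3))
```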
190. The FORTUNE TELLER or NOT …
A general rule-of-thumb:
Training - to generate the MODEL - is an expensive operation
Estimation - using the derived MODEL - is lightweight
Intelligence (derived through LEARNING) on Embedded systems:
On-device training MAY NOT be a good strategy
It may be better to offload it to a resourceful device
On-device estimations using the derived model MAY be a good strategy
There are EXCEPTIONS to this rule !!!
Sequential versions of many commonly used learning algorithms have
been developed (K-means, etc.), and are part of the stream
processing suite.
191. Acknowledgment and References
This short course on IoT has been compiled from various online resources,
text books, and research papers on this topic.
While Prasant may not be able to “correctly” recollect the right sources, he –
nevertheless – requests all viewers to drop a note, if they come across any
discrepancies in this regard.
1. MAC Essentials:
A. Bachir, M. Dohler, T. Watteyne and K. K. Leung, "MAC Essentials for Wireless Sensor Networks," IEEE Communications Surveys & Tutorials, vol. 12, no. 2, pp. 222-248, Second Quarter 2010.
2. Machine Learning:
https://in.mathworks.com/campaigns/products/offer/machine-learning-with-matlab.html