9. Trend-II: Integration at Scale (Isolation has a cost !!!)
(World Wide) Sensor Web
(Feng Zhao)
Future Combat Systems
Ubiquitous embedded devices
• Large-scale networked embedded systems
• Seamless integration with the physical environment
Complex system with global integration
10. Trend-III: Evolution: Man vs. Machine
The exponential proliferation of embedded devices (courtesy of Moore’s Law) is NOT
matched by a corresponding increase in human ability to consume information !
Increase in Machine Autonomy !!!
12. Confluence of Technologies
CPS: the confluence of
Trend-1: Sensing & Actuation
Trend-2: Communication & Networking
Trend-3: Computation & Control
A cyber-physical system (CPS) refers to a tightly integrated system that is engineered with a
collection of technologies, and is designed to drive an application in a principled manner.
14. Functional Blocks of CPS
Enormous SCALE: both in space and time
15. Casting CPS Technology into Application Requirement
Use Case: Adaptive Lighting in Road Tunnels
Problem: Control the tunnel lighting levels in a manner that ensures continuity of light conditions
from the outside to the inside (or vice-versa) such that drivers do not perceive the tunnel as too
bright or dark.
Solution: Design a system that is able to account for the change in light intensity (i.e., detect physical
conditions and interpret), and adjust the illumination levels of the tunnel lamps (i.e., respond) till a
point along the length of the tunnel where this change is indiscernible to the drivers (i.e., reason and
control in an optimal manner).
16. Casting CPS Technology into Application Requirement
Use Case: Smart Buildings/Homes
Problem: How to make buildings/homes (both new and existing) ‘smarter’ ?
• Energy efficient
• Damage prevention
• Increased comfort
17. Beaming from CPS to IoT : The SCALE is even BIGGER !!!
[Figure: cyber components (C1 ... Cn) coupled to physical components (P1 ... Pn) across the cyber world and the physical world, interconnected via the Internet into a Network of Things (NoT)]
IoT = CPS + People ‘in-the-loop’ (that act as sensors, actuators, controllers)
IoT = CPS + Hybrid (tight and loose) sense of control
18. CPS & IoT
Gives us the ability to look more broadly (SCALE), deeply (PRECISION) and
over extended periods of time at the physical world
As a result, our interactions with the physical world have increased !!!
Example of a Killer APP: Navigation System
19. Navigation System - I
Context Service Example
Current Location -> Local business
21. Navigation System - III
Context Service Example
Current Location -> Local business and directions
+ Time -> Tracks businesses in driving direction
+ History -> Personalized directions ("Take 520 East")
22. Navigation System - IV
Context Service Example
Current Location -> Local business and directions
+ Time -> Tracks businesses in driving direction
+ History -> Personalized directions
+ Community -> Tourist recommendation ("35% of people pick the scenic route")
23. Navigation System - V
Context Service Example
Current Location -> Local business and directions
+ Time -> Tracks businesses in driving direction
+ History -> Personalized directions
+ Community -> Tourist recommendation
+ Push -> Alerts, triggers, reminders (e.g., "Alert: bad traffic, consider an alternate route")
25. IoT: Vision and Value Proposition
Vision:
Build a ubiquitous society where everyone ("people") and everything ("systems,
machines, equipment and devices") is immersively connected.
Value Proposition:
Connected “Things” will provide utility to “People”
Digital shadow of “People” will provide value to the “Enterprise”
28. The FORTUNE TELLER or NOT …
IIoT and Industry 4.0 is ALL about re-imagination !!!
Improve flexibility, reliability and time to market/scale
Improve customer intimacy and profitability
Improve revenue and market position
29. Is the Internet of Things disruptive?
OR
Is it a repackaging of known technologies
and making them a little better?
What is your take ?
31. High-level Functional Architecture

NATURE of INGESTED DATA (DATA):
DATA @ REST (VOLUME): archival/static data (TBs) in data stores
DATA @ MOTION (VELOCITY): streaming data
DATA @ MANY FORMS (VARIETY): structured/unstructured, text, multimedia, audio, video
DATA @ DOUBT (VERACITY): data with uncertainty that may be due to incompleteness, missing points, etc.

NATURE of ANALYSIS (DATA -> KNOWLEDGE):
DISCOVERY: What do we have ?
DESCRIPTIVE: What has happened ?
PREDICTIVE: What could happen ?
PRESCRIPTIVE: What are the best outcomes ?
33. Functional Architecture Layers and their Key Physical Attributes

Physical Attribute: Field Devices (with sensing, compute and actuation HW)
Functionality: Sense, Actuate, Control

Physical Attribute: Gateway with last-mile connectivity (PAN, HAN, FAN, NAN, CAN, WAN, etc.)
Functionality: Connection Management, Routing

Physical Attribute: Data Storage
Functionality: Ingestion, Semantics, Transformation

Physical Attribute: Common Service Functions
Functionality: Interoperability, Security, Access Control

Physical Attribute: Business Logic & Related Functions
Functionality: Business Logic, Orchestration

Physical Attribute: Users
Functionality: Input, Output, Transform
34. Recap: Functional Architecture
Service-Oriented Approach
Application & Business Architecture
Describes the service strategy and the organizational, functional, process, information and geographic aspects of the environment, based on the strategic goals and strategic drivers
Information Systems Architecture
Describes information/data structure and semantics, and the types and sources of data necessary to support various smart applications
Data Access Architecture
Describes technical components (software, hardware), access technology and data aggregation policies
Information Security, characterized by:
• Availability
• Integrity
• Confidentiality
Interoperability, characterized by:
• Syntactic
• Semantic
41. Background: Wireless Sensor Networks (WSN)
Consist of many embedded units called sensor nodes, motes, etc.:
Sensors (and actuators)
Small microcontroller
Limited memory
Radio for wireless communication
Power source (often battery)
Communication-centric systems
Motes form networks and, in a one-hop or multi-hop fashion, transport sensor data to a base station
43. WSN Node: Core Features
Limited Energy Reserves: the PREMIUM resource
Radio under MAC control
[Figure: node block diagram: RISC microcontroller, memories in the KBytes range, low-power radio]
44. Sensor Web: Field Device Stack
L1: PHY
L2: MAC
L3: NETWORK
L4: ROUTING
L5: TRANSPORT
L6: APP
Do we need a LAYERED approach @
the Field Device level ?
46. (Popular) Short and Medium Range Low Power Wireless Technology
| Technology | Standard Body | Frequency Band | Max Range | Max Data Rate | Max Power | Network Type |
| Bluetooth | Bluetooth SIG | 2.4 GHz ISM | 100 m | 1-3 Mbps | 1 W | WPAN |
| Bluetooth Smart | IoT Interconnect | 2.4 GHz ISM | 35 m | 1 Mbps | 10 mW | WPAN |
| ZigBee | IEEE 802.15.4, ZigBee Alliance | 2.4 GHz ISM | 160 m | 250 Kbps | 100 mW | Star, Mesh |
| Wi-Fi | IEEE 802.11 g/n/ac/ad | 2.4/5/60 GHz | 100 m | 6-780 Mbps; 6 Gbps @ 60 GHz | 1 W | Star, Mesh |
| Z-Wave | Z-Wave | 908 MHz | 30 m | 100 Kbps | 1 mW | Star, Mesh |
| ANT+ | ANT Alliance | 2.4 GHz | 100 m | 1 Mbps | 1 mW | Star, Mesh |
| RuBee | IEEE 1902.1, IEEE 1902.2 | 131 kHz | 5 m | 1.2 Kbps | 40-50 nW | P2P |
47. Low Power Wide Area Networking Technology
| Technology | Standards/Governing Body | Frequency Band | Max Range | Max Data Rate | Topology | Devices / Access Point |
| Weightless | - | SubGHz ISM, TV whitespaces | 2-5 km (urban) | 200 bps - 100 Kbps; W: 1 Kbps - 10 Mbps | Star | Unlimited |
| LoRaWAN | LoRa Alliance | 433/780/868/915 MHz ISM | 2.5-15 km | 0.3-50 Kbps | Star | 1 million |
| SigFox | SigFox | Ultra narrow band | 30-50 km (rural), 3-10 km (urban) | 100 bps | Star | 1 million |
| WiFi LowPower | IEEE P802.11ah | SubGHz | 1 km (outdoor) | 150-340 Kbps | Star, Tree | - |
| Dash7 | Dash7 Alliance | 433/868/915 MHz | 2 km | 9.6/56/167 Kbps | Star, Tree | - |
| LTE Cat-0 | 3GPP R-13 | Cellular | 2.5-5 km | 200 Kbps | Star | > 20,000 |
| UMTS (3G), HSDPA/HSUPA | 3GPP | Cellular | 27 km, 10 km | 0.73-56 Mbps | Star | Hundreds per cell |
56. IEEE 802.15.4: Quick Facts
IEEE 802.15.4
Offers physical and media access control layers for low-speed, low-power wireless personal
area networks (WPANs)
16 non-overlapping channels, spaced 5 MHz apart, occupying frequencies 2405-2480 MHz
Provides a physical layer data rate of 250 kbps
Shares the same frequency band as IEEE 802.11 and Bluetooth
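The channel plan above reduces to one line of arithmetic: channel k (numbered 11-26 in the 2.4 GHz band) sits at 2405 + 5(k - 11) MHz. A minimal Python sketch:

```python
# IEEE 802.15.4 2.4 GHz channel plan: 16 channels numbered 11..26,
# spaced 5 MHz apart, occupying 2405-2480 MHz.
def channel_center_mhz(k: int) -> int:
    if not 11 <= k <= 26:
        raise ValueError("2.4 GHz band channels are numbered 11..26")
    return 2405 + 5 * (k - 11)

print([channel_center_mhz(k) for k in (11, 18, 26)])  # [2405, 2440, 2480]
```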
58. IEEE 802.15.4: Device Classes
Full Function Device (FFD)
Any topology
PAN coordinator capable
Talks to any other device
Implements complete protocol set
Reduced Function Device (RFD)
Reduced protocol set
Very simple implementation
Cannot become a PAN coordinator
Limited to leaf roles in more complex topologies
59. IEEE 802.15.4: Topology Types
Star Topology
All nodes communicate via the central PAN coordinator
Leafs may be any combination of FFD and RFD devices
The PAN coordinator usually has a reliable power source
Peer-to-Peer Topology
Nodes can communicate via the central PAN coordinator
and via additional point-to-point links
Extension of the pure star topology
Cluster Tree Topology
Leafs connect to a network of coordinators (FFDs)
One of the coordinators serves as the PAN coordinator
Clustered star topologies are an important case
(e.g., each hotel room forms a star in a HVAC system)
61. IEEE 802.15.4: Frame Formats
Beacon Frames
Broadcasted by the coordinator to organize the network
Command Frames
Used for association, disassociation, data and beacon requests, conflict notification, . . .
Data Frames
Carrying user data
Acknowledgement Frames
Acknowledges successful data transmission (if requested)
62. Link Layer Protocols
L6: APP
L5: TRANSPORT
L4: ROUTING
L3: NETWORK
L2: MAC <- IEEE 802.15.4
L1: PHY <- IEEE 802.15.4
63. Why do we need MAC ?
Wireless channel is a shared medium
Radios, within the communication range of each other and operating in the same
frequency band, interfere with each other's transmissions
Interference -> Collision -> Packet Loss -> Retransmission -> Increase in net energy
The role of MAC
Co-ordinate access to and transmission over the common, shared (wireless) medium
Can traditional MAC methods be directly applied to WSN ?
Control -> often decentralized
Data -> low load but convergecast communication pattern
Links -> highly volatile/dynamic
Nodes/Hops -> Scale is much larger
Energy is the BIGGEST concern
Network longevity, reliability, fairness, scalability and latency
are more important than throughput
MAC is Crucial !!!
64. MAC Family
Reservation
(Scheduled, Synchronous)
Contention
(Unscheduled, Asynchronous)
Reservation-based
Nodes access the channel based on a schedule
Examples: TDMA
Limits collisions, idle listening, overhearing
Bounded latency, fairness, good throughput (in loaded traffic conditions)
Saves node power by putting nodes to sleep until needed
Low idle listening
Dependencies: time synchronization and knowledge of network topology
Not flexible under conditions of node mobility, node redeployment and node death:
complicates schedule maintenance
Contention-based
Nodes compete (in probabilistic coordination) to access the channel
Examples: ALOHA (pure & slotted), CSMA
Time synchronization “NOT” required
Robust to network changes
High idle listening and overhearing overheads
Taxonomy
66. Collisions
Node(s) is/are within the range of nodes that are transmitting at the same time -> retransmissions
Overhearing
The receiver of a packet is not the intended receiver of that packet
Overhead
Arising from control packets such as RTS/CTS
E.g.: exchange of RTS/CTS induces high overheads in the range of 40-75% of the channel capacity
Idle Listening
Listening to possible traffic that is not sent
Most significant source of energy consumption
| Function | Protocols |
| Reduce Collisions | CSMA/CA, MACA, Sift |
| Reduce Overheads | CSMA/ARC |
| Reduce Overhearing | PAMAS |
| Reduce Idle Listening | PSM |
Causes of Energy Consumption
67. Low-power, Constrained Field Devices MAC Family
Scheduled
(periodic, high-load traffic)
Common Active Periods
(medium-load traffic)
Preamble Sampling
(rare reporting events)
68. Build a schedule for all nodes
Time schedule
no collisions
no overhearing
minimized idle listening
bounded latency, fairness, good throughput (in loaded traffic conditions)
BUT: how to set up and maintain the schedule ?
| Function | Protocols |
| Canonical Solution | TSMP, IEEE 802.15.4 |
| Centralized Scheduling | Arisha, PEDAMACS, BitMAC, G-MAC |
| Distributed Scheduling | SMACS |
| Localization-based Scheduling | TRAMA, FLAMA, uMAC, EMACs, PMAC |
| Rotating Node Roles | PACT, BMA |
| Handling Node Mobility | MMAC, FlexiMAC |
| Adapting to Traffic Changes | PMAC |
| Receiver-Oriented Slot Assignment | O-MAC |
| Using Different Frequencies | PicoRadio, Wavenis, f-MAC, Multichannel LMAC, MMSN, Y-MAC, Practical Multichannel MAC |
| Other Functionalities | LMAC, AI-LMAC, SS-TDMA, RMAC |
Scheduled MAC Protocols
69. Time Synchronized Mesh Protocol (TSMP): Overview
Goal: High end-to-end reliability
Major Components
time synchronized communication (medium access)
TDMA-based: uses timeslots and time frames
Synchronization is achieved by exchanging offset information (and not by
beaconing strategies)
frequency hopping (medium access)
automatic node joining and network formation (network)
redundant mesh routing (network)
secure message transfer (network)
Limitations
Complexity in infrastructure-less networks
Scaling is a challenge: finding a collision-free schedule is a two-hop coloring problem
Reduced flexibility to adapt to dynamic topologies
70. Nodes define common active/sleep periods
active period -> communication, where nodes contend for the channel
sleep period -> saving energy
need to maintain a common time reference across all nodes
| Function | Protocols |
| Canonical Solution | SMAC |
| Increasing Flexibility | TMAC, E2MAC, SWMAC |
| Minimizing Sleep Delay | Adaptive listening, nanoMAC, DSMAC, FPA, DMAC, Q-MAC |
| Handling Mobility | MSMAC |
| Minimizing Schedules | GSA |
| Statistical Approaches | RL-MAC, U-MAC |
| Using Wake-up Radio | RMAC, E2RMAC |
Common Active Period MAC Protocols
71. Goal: reduce energy consumption, while supporting good scalability and collision
avoidance
Major Components
periodic listen and sleep
Copes with idle listening: uses a scheme of active (listen) and sleep periods
Active periods are fixed; sleep periods depend on a predefined duty-cycle parameter
Synchronization is used to form virtual clusters of nodes on the same sleep schedule
Schedules coordinate nodes to minimize additional latency
collision and overhearing avoidance
Adopts a contention-based scheme
In-channel signaling is used to put each node to sleep when its neighbor is transmitting to
another node; thus, avoids the overhearing problem but does not require an additional
channel
message passing
Small packets transmitted in bursts
RTS/CTS reserves the channel for the whole burst duration rather than for each packet;
hence unfair at the per-hop MAC level
Sensor MAC (S-MAC): Overview
72. Periodic Listen and Sleep
Each node goes to sleep for some time, and then wakes up and listens to see if any other
node wants to talk to it. During sleep, the node turns off its radio, and sets a timer to awake
itself later.
Maintain Schedules
Maintain Synchronization
S-MAC - I
73. Collision and Overhearing Avoidance
Adopts a contention based scheme
Collision Avoidance
Overhearing Avoidance
Basic Idea
A node can go to sleep whenever its neighbor is talking with another node
Who should sleep?
The immediate neighbors of sender and receiver
How do they know when to sleep?
By overhearing RTS or CTS
How long should they sleep?
Network Allocation Vector (NAV)
Message Passing
How to transmit a long message?
Transmit it as a single long packet
Easily corrupted
Transmit as many independent packets
Higher control overhead & longer delay
Divide into fragments, but transmit all in burst
S-MAC - II
74. Adaptive duty cycle: duration of the active period is no longer fixed but varies according
to traffic
Prematurely ends an active period if no traffic occurs for a duration of TA
Timeout MAC (TMAC): Overview
75. Preamble Sampling MAC Protocols
Goal: minimize idle listening -> minimize energy consumption
Operation
Node periodically wakes up, turns the radio on and checks the channel
Wakeup time is fixed (the time spent sampling RSSI)
"Check interval" is variable
If energy is detected, the node powers up in order to receive the packet
Node goes back to sleep
If a packet is received
After a timeout
Preamble length matches the channel "check interval"
No explicit synchronization required
Noise floor estimation used to detect channel activity during LPL
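A sketch of the receiver-side operation described above. The radio object is a made-up stub (on/off, rssi_above_noise_floor and receive are hypothetical names), so this shows the control flow only, not any real radio API:

```python
import random

class StubRadio:
    """Hypothetical radio, defined only to make the sketch runnable."""
    def on(self): pass
    def off(self): pass
    def rssi_above_noise_floor(self): return random.random() < 0.1
    def receive(self, timeout): return b"sensor-reading"

CHECK_INTERVAL_S = 0.5  # the variable "check interval" between channel samples

def lpl_receiver_step(radio):
    # Periodic wakeup: turn the radio on and sample the channel.
    radio.on()
    if radio.rssi_above_noise_floor():
        # Energy detected: stay powered up to receive the packet.
        pkt = radio.receive(timeout=0.01)
        radio.off()  # back to sleep after reception (or after a timeout)
        return pkt
    radio.off()  # no energy: sleep again until the next check interval
    return None

print(lpl_receiver_step(StubRadio()))
```

A sender must transmit a preamble at least as long as CHECK_INTERVAL_S, so that one of these brief samples is guaranteed to land inside it.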
76. Preamble Sampling MAC Protocols
| Function | Protocols |
| Canonical Solution | Preamble-Sampling ALOHA, Preamble-Sampling CSMA, Cycled Receiver, LPL, Channel Polling |
| Improving CCA | BMAC |
| Adaptive Duty Cycle | EA-ALPL |
| Reducing Preamble Length by Packetization | X-MAC, CSMA-MPS, TICER, WOR, MH-MAC, DPS-MAC, CMAC, GeRAF, 1-hopMAC, RICER, SpeckMAC-D, MX-MAC |
| Reducing Preamble Length by Piggybacking Synchronization Information | WiseMAC, RATE EST, SP, SyncWUF |
| Using Separate Channels | STEM |
| Avoiding Unnecessary Reception | MFP, 1-hopMAC |
Drawbacks:
Costly collisions: a longer preamble leads to a higher probability of collision in applications with considerable traffic
Limited duty cycle: the "check interval" cannot be arbitrarily increased, since that implies a longer preamble
Overhearing problem: the target receiver has to wait for the full preamble before receiving the data packet, so the per-hop latency is lower-bounded by the preamble length; over a multi-hop path, this latency can accumulate to become quite substantial
77. Goals:
Simple and predictable; Effective collision avoidance by improving CCA
Tolerant to changing RF/networking conditions
Low power operation; Scalable to large numbers of nodes; Small code size and RAM usage
CCA
MAC must accurately determine if channel is clear
Need to tell what is noise and what is a signal
Ambient noise is prone to environmental changes
BMAC solution: ‘software automatic gain control’
Signal strength samples taken when channel is assumed to be free – When?
immediately after transmitting a packet
when the data path of the radio stack is not receiving valid data
Samples go in a FIFO queue (sliding window)
Median added to an EWMA (exponentially weighted moving average with decay α) filter
Once the noise floor is established (What is a good estimate?), a TX request starts monitoring
RSSI from the radio
CCA: Thresholding vs. Outlier Detection
Common approach: take single sample, compare to noise floor
Large number of false negatives
BMAC: search for outliers in RSSI
If a sample has significantly lower energy than the noise floor during the sampling period, then
channel is clear
Berkeley MAC (BMAC): Overview
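The "software automatic gain control" above can be sketched directly: samples taken when the channel is assumed free enter a FIFO window, the window median feeds an EWMA, and CCA then searches for outliers below the floor. The window size, decay alpha and outlier margin here are illustrative guesses, not BMAC's tuned values:

```python
from collections import deque
from statistics import median

class NoiseFloorEstimator:
    """Sketch of BMAC-style noise floor tracking and outlier-based CCA."""
    def __init__(self, window=10, alpha=0.06, margin_db=3.0):
        self.window = deque(maxlen=window)  # FIFO queue (sliding window)
        self.alpha = alpha                  # EWMA decay
        self.margin_db = margin_db          # illustrative outlier margin
        self.noise_floor = None

    def update(self, rssi_dbm):
        # Called with samples taken just after a transmission, or when the
        # radio's data path is not receiving valid data (channel assumed free).
        self.window.append(rssi_dbm)
        m = median(self.window)
        if self.noise_floor is None:
            self.noise_floor = m
        else:
            self.noise_floor = (1 - self.alpha) * self.noise_floor + self.alpha * m

    def channel_clear(self, samples):
        # Outlier search: one sample well below the floor => channel clear,
        # which avoids the false negatives of single-sample thresholding.
        return any(s < self.noise_floor - self.margin_db for s in samples)

est = NoiseFloorEstimator()
for s in [-95, -94, -96, -95, -93]:
    est.update(s)
print(est.channel_clear([-90, -99]))  # True: -99 dBm is an idle-channel outlier
```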
78. BMAC
[Figure: CCA output trace (0 = busy, 1 = clear); a packet arrives between 22 and 54 ms; single-sample thresholding produces several false 'busy' signals]
79. XMAC: Overview
Series of short preamble packets, each containing target address information
Minimizes the overhearing problem
Reduces latency and energy consumption
Strobed preamble: pauses in the series of short preamble packets
The target receiver can shorten the strobed preamble via an early ACK
Small pauses between preamble packets permit the target receiver to send an early ACK
Reduces latency for the case where the destination is awake before the preamble completes
Non-target receivers that overhear the strobed preamble can go back to sleep immediately
Preamble period must be greater than the sleep period
Reduces per-hop latency and energy
80. Wireless Sensor (Wise) MAC: Overview
WiseMAC uses a scheme that learns the sampling schedule of direct neighbors and exploits
this knowledge to minimize the wake-up preamble length
ACK packets, in addition to carrying the acknowledgement for a received data packet, also carry
information about the next sampling time of that node
Node keeps a table of the sampling time offsets of all its usual destinations up-to-date
Node transmits a packet just at the right time, with a wake-up preamble of minimized size
81. Wireless Sensor (Wise) MAC: I
How does the system cope with Clock drifts ?
Clock drifts may make the transmitter lose accuracy about the receiver’s wakeup time.
Transmitter uses a preamble that is just long enough to make up for the estimated maximum clock
drift.
The length of the preamble used in this case depends on clock drifts: the smaller the clock drift, the
shorter the preamble the transmitter has to use.
What if the node has no information about the wakeup time of a neighbor node ?
Node uses a full-length preamble
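A minimal sketch of the resulting preamble-length rule, assuming the WiseMAC bound of 4·theta·L (both clocks may drift by up to theta in either direction since the schedule was learned L seconds ago), capped by the full sampling period:

```python
def wisemac_preamble_s(drift_ppm: float, since_last_sync_s: float,
                       sampling_period_s: float) -> float:
    # Uncertainty on the neighbor's wakeup grows with mutual clock drift;
    # a full-length preamble (one sampling period) is the worst case.
    theta = drift_ppm * 1e-6
    return min(4 * theta * since_last_sync_s, sampling_period_s)

# 30 ppm clocks, schedule learned 10 minutes ago, 500 ms sampling period:
print(wisemac_preamble_s(30, 600, 0.5))  # 0.072 s instead of a full 0.5 s
```

The smaller the drift (or the fresher the schedule information), the shorter the preamble the transmitter has to use, exactly as stated above.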
82. Hybrid Protocols
| Function | Protocols |
| Flexible MAC Structure | IEEE 802.15.4 |
| CSMA inside TDMA Slots | ZMAC |
| Minimizing Convergecast Effect | Funneling MAC, MH-MAC |
| Slotted and Sampling | SCP |
| Receiver-based Scheduling | Crankshaft |
84. IEEE 802.15.4 MAC: Overview
Two different channel access methods
Beacon-Enabled duty-cycled mode (typically, used in FFD networks)
Non-Beacon Enabled mode (aka Beacon Disabled mode)
85. IEEE 802.15.4 Beacon Enabled Mode
CAP: Contention Access Period | CFP: Collision Free Period | GTS: Guaranteed Time Slot
Nodes listen to the Beacon and check IF a GTS is reserved
If YES: remain powered off until the GTS is scheduled
If NO: perform CSMA/CA during the CAP
Synchronization
Sync with Tracking Mode
Sync with Non Tracking Mode
90. IP over IEEE 802.15.4
L6: APP
L5: TRANSPORT
L4: ROUTING <- IPv6 over IEEE 802.15.4
L3: NETWORK <- IPv6 over IEEE 802.15.4
L2: MAC
L1: PHY
91. Field Devices: Network Topology Planning
STAR topologies are the easiest to set up and manage
STAR will simplify the network design; if there is just 1-hop communication between the field devices and the gateway, then the need for a "routing layer" on the stack of the field devices may not arise ... thereby making the stack more energy efficient and lightweight.
TREE and MESH are also interesting concepts, but they are very tedious to manage.
92. IPv6 over IEEE 802.15.4 (6LoWPAN)
Benefits of IP over 802.15.4 (RFC 4919)
The pervasive nature of IP networks allows use of existing infrastructure
IP-based technologies already exist, are well-known, and proven to be working
Open and freely available specifications vs. closed proprietary solutions
Tools for diagnostics, management, and commissioning of IP networks already exist
IP-based devices can be connected readily to other IP-based networks, without the need
for intermediate entities like translation gateways or proxies
93. 6LoWPAN Challenge
Header Size Calculation
IPv6 header is 40 octets, UDP header is 8 octets
802.15.4 MAC header can be up to:
25 octets (null security)
25+21=46 octets (AES-CCM-128)
With the 802.15.4 frame size of 127 octets, the following space left for application data:
127-25-40-8 = 54 octets (null security)
127-46-40-8 = 33 octets (AES-CCM-128)
IPv6 MTU Requirements
IPv6 requires that links support an MTU of 1280 octets
Link-layer fragmentation / reassembly is needed
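The payload budget above is easy to verify programmatically; a minimal sketch of the arithmetic:

```python
IEEE802154_FRAME = 127                          # maximum frame size (octets)
MAC_HDR = {"null": 25, "AES-CCM-128": 25 + 21}  # MAC header per security mode
IPV6_HDR, UDP_HDR = 40, 8

def app_payload(security: str) -> int:
    """Octets left for application data in one uncompressed frame."""
    return IEEE802154_FRAME - MAC_HDR[security] - IPV6_HDR - UDP_HDR

print(app_payload("null"))         # 54
print(app_payload("AES-CCM-128"))  # 33
```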
94. 6LoWPAN Overview (RFC 4944)
Overview
An adaptation layer allowing transport of IPv6 packets over 802.15.4 links
Uses 802.15.4 in unslotted CSMA/CA
Based on IEEE standard 802.15.4-2003
Fragmentation / reassembly of IPv6 packets
Compression of IPv6 and UDP/ICMP headers
Mesh routing support (mesh under)
Low processing / storage costs
95. 6LoWPAN Dispatch Codes
All 6LoWPAN encapsulated datagrams are prefixed by an encapsulation header stack
Each header in the stack starts with a header type field followed by zero or more
header fields
96. 6LoWPAN Frame Formats
Uncompressed IPv6/UDP (worst case scenario)
Dispatch code (01000001 in binary) indicates no compression
Up to 54 / 33 octets left for payload with a max. size MAC header with null / AES-CCM-128
security
The ratio of header information to application payload is obviously really bad
97. 6LoWPAN Frame Formats
Compressed Link-local IPv6/UDP (best case scenario)
Dispatch code (01000010 in binary) indicates HC1 compression
HC1 compression may indicate HC2 compression follows
This shows the maximum compression achievable for link-local addresses (does not work
for global addresses)
Any non-compressible header fields are carried after the HC1 or HC1/HC2 tags (partial
compression)
98. Header Compression, Fragmentation & Reassembly
Compression Principles (RFC 4944)
Omit any header fields that can be calculated from the context, send the remaining
fields unmodified
Nodes do not have to maintain compression state (stateless compression)
Support (almost) arbitrary combinations of compressed / uncompressed header fields
Fragmentation Principles (RFC 4944)
IPv6 packets too large to fit into a single 802.15.4 frame are fragmented
A first fragment carries a header that includes the datagram size (11 bits) and a
datagram tag (16 bits)
Subsequent fragments carry a header that includes the datagram size, the datagram
tag, and the offset (8 bits)
Time limit for reassembly is 60 seconds
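A bookkeeping sketch of this fragmentation scheme. Headers are kept as dictionaries rather than serialized bit fields, the per-fragment payload figure is illustrative, and offsets are kept 8-octet aligned as RFC 4944 requires:

```python
def fragment(ipv6_packet: bytes, link_payload: int = 96):
    size, tag = len(ipv6_packet), 0x0001          # tag identifies the datagram
    chunk = link_payload - (link_payload % 8)     # offsets count 8-octet units
    frags, pos = [], 0
    while pos < size:
        data = ipv6_packet[pos:pos + chunk]
        hdr = {"datagram_size": size, "datagram_tag": tag}
        if pos > 0:                               # subsequent fragments only
            hdr["datagram_offset"] = pos // 8
        frags.append((hdr, data))
        pos += len(data)
    return frags

# A 1280-octet IPv6 packet (the required MTU) yields 14 fragments here:
print(len(fragment(bytes(1280))))
```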
100. How “Lossy” is Lossy ?
LLN Link Characteristics:
High BER
Frequent packet drops
High instability
LLN failures are frequent and
usually transient
101. Routing Protocol for Low-power and Lossy Networks (RPL): Key Highlights
RPL :
Highly modular
(Core + Additional) modules
Designed specifically for “lossy” networks
Under-reacts to LLN link changes
Agnostic to underlying link layer technology
Is a proactive IPv6 distance vector protocol
Builds a Destination Oriented Directed Acyclic Graph (DODAG) based on an objective
Supports many-to-one, one-to-many, point-to-point communication
Supports different LLN application requirements
Urban (RFC 5548)
Industrial (RFC 5673)
Home (RFC 5826)
Building (RFC 5867)
102. RPL builds DODAGs
DODAG: set of vertices connected by directed edges with no directed cycles
In contrast to trees, DODAGs offer redundant paths
RPL supports multiple RPL Instances
Concept similar to multi-topology routing (MTR) as done in OSPF
Allows a node to join multiple DODAGs according to different Objective Functions (OF)
There can be multiple DODAGs within a RPL instance
A node can, therefore, belong to multiple RPL instances
Identifications:
DODAG -> {RPLInstanceID}
Unique identity of DODAG: {RPLInstanceID, DODAGID}
RPL: DODAG and Instances
103. RPL: DODAG and Instances
Traffic moves either up towards the DODAG root or down towards the DODAG leafs
DODAG Properties
Many-to-one communication: upwards
One-to-many communication: downwards
Point-to-point communication: upwards-downwards
RPL Instance Properties
RPL Instance has an optimization objective
Multiple RPL Instances with different optimization objectives can coexist
A typical example would be an energy-efficient topology for background traffic along with
a low-latency topology for delay-sensitive alarms.
104. RPL: Terminology
A node’s Rank defines the node’s individual position
relative to other nodes with respect to a DODAG root.
The scope of Rank is a DODAG Version.
Route Construction
Up routes towards nodes of decreasing rank (parents)
Down routes towards nodes of increasing rank
Nodes inform parents of their presence and reachability to
descendants
Source route for nodes that cannot maintain down routes
Forwarding Rules
All routes go upwards and/or downwards along a
DODAG
When going up, always forward to lower rank when possible; may forward to a sibling if no lower rank exists
When going down, forward based on down routes
Once a non-root node selects its parent set, it can use the following table to convert the path cost of a parent into its Rank:
| Node/link Metric | Rank |
| Hop-Count | Cost |
| Latency | Cost/65536 |
| ETX | Cost |
105. RPL: Control Messages
DAG Information Object (DIO)
A DIO carries information that allows a node to discover an RPL Instance, learn its
configuration parameters and select DODAG parents
DAG Information Solicitation (DIS)
A DIS solicits a DODAG Information Object from an RPL node
Destination Advertisement Object (DAO)
A DAO propagates destination information upwards along the DODAG
106. RPL: DODAG Construction
Construction
Nodes periodically send link-local multicast DIO messages
Stability or detection of routing inconsistencies influence the rate of DIO messages
Nodes listen for DIOs and use their information to join a new DODAG, or to maintain an
existing DODAG
Nodes may use a DIS message to solicit a DIO
Based on information in the DIOs the node chooses parents that minimize path cost to the
DODAG root
Essentially a distance vector routing protocol with ranks to prevent count-to-infinity problems
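A toy sketch of the parent-selection step described above. The message layout, the ETX-to-rank conversion and the rank step are invented for illustration; RFC 6550 delegates these details to the Objective Function:

```python
MIN_HOP_RANK_INCREASE = 256   # illustrative rank step, not a normative value

def on_dio(dios):
    """dios: list of (neighbor_id, advertised_rank, link_etx) tuples."""
    best = None
    for neighbor, rank, link_etx in dios:
        # Path cost to the DODAG root via this candidate parent.
        path_cost = rank + link_etx * MIN_HOP_RANK_INCREASE
        if best is None or path_cost < best[1]:
            best = (neighbor, path_cost)
    parent, my_rank = best        # rank strictly grows away from the root,
    return parent, my_rank        # which prevents count-to-infinity loops

print(on_dio([("A", 256, 1.2), ("B", 512, 1.0)]))  # picks parent "A"
```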
107. Application Layer Protocols
L6: APP <- CoAP
L5: TRANSPORT
L4: ROUTING <- IPv6 over IEEE 802.15.4
L3: NETWORK <- IPv6 over IEEE 802.15.4
L2: MAC
L1: PHY
108. Constrained Application Protocol CoAP: Key Features
CoAP (RFC 7252):
Web transfer protocol (coap://) for use with constrained nodes and networks
Based on RESTful protocol design minimizing the complexity of mapping with HTTP
Asynchronous transaction model
Default bound to UDP, and optionally to DTLS
Low header overhead and parsing complexity
URI and content-type support
Subset of MIME types and HTTP response codes
Has GET, POST, PUT, DELETE methods
109. CoAP: Transaction Model
CoAP is organized as two sub-layers over the transport:
Request/Response sub-layer: RESTful interaction
Message sub-layer: reliability
Transport: UDP (+ DTLS)
Base Messaging
Simple message exchange between endpoints
Confirmable or Non-Confirmable message answered by Acknowledgment or Reset
message
REST Semantics
REST Request/Response piggybacked on CoAP messages
Method, Response code and Options (URI, content-type, etc.,)
110. CoAP: Message Format
Header (4 Bytes)
Ver - Version (1)
T – Message type (Confirmable, Non-Confirmable, Acknowledgment, Reset)
TKL – Token length, if any, number of token bytes after the header
Code – Request method (1-10), Response code (40-255)
Message ID – Identifier for matching response
Token (0-8 Bytes)
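The fixed 4-byte header above packs directly with Python's struct module; a minimal sketch that builds a Confirmable GET header (the message ID and token are arbitrary):

```python
import struct

COAP_VER, CON, GET = 1, 0, 0x01   # type 0 = Confirmable; code 0.01 = GET

def coap_header(msg_type: int, code: int, message_id: int,
                token: bytes = b"") -> bytes:
    # First octet: Ver (2 bits) | T (2 bits) | TKL (4 bits).
    first = (COAP_VER << 6) | (msg_type << 4) | len(token)
    # Then Code (1 octet) and Message ID (2 octets), then the token bytes.
    return struct.pack("!BBH", first, code, message_id) + token

hdr = coap_header(CON, GET, 0x1234, token=b"\xab")
print(hdr.hex())  # 41011234ab
```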
118. SEEDLING @ UNSW, Sydney
URL : http://cgi.cse.unsw.edu.au/~sensar/seedling/Seedling.html
Objective:
1. Showcase a basic prototype of a WSN system in precision agriculture
2. Understand sensornet deployment challenges
3. Increase the interest of high-school students in ICT
119. Factors CRITICAL to the SUCCESS of Deployments
Choosing a radio transceiver that gave low-power, long-range links
A robust MAC protocol
Simple network topology and planning
Easy network reconfiguration
Simple uniform data representation
Early adoption of solar power for sensor networks
122. Wireless Communication Links: Power is NOT Energy
Low POWER is not the same as low ENERGY
[Figure: power vs. time for two transmissions with energies E1 and E2]
Message passing / time to transmit ALSO governs energy:
Transmit as a single long packet: easily corrupted
Transmit as many independent packets: higher control overhead & longer delay
Divide into fragments, but transmit all in a burst
124. Wireless Communication Links: "Longer the Better"
Reduced hop counts help to obtain better PRR with fewer field devices
[Figure: Configuration 1 vs. Configuration 2]
126. A Routing Layer can be AVOIDED with Smart Network Planning
If a single hop (with a long link) suffices for the purpose, then a routing layer may not be required ... saving ENERGY
With routing: App | Transport | Routing | IP | IP Adaptation | MAC | PHY
Without routing: App | Transport | IP | IP Adaptation | MAC | PHY
128. Low Power, Long Links are "GREY"
Approximately 70% of low power, long range links are GREY
(i.e., neither good nor bad)
Very difficult to predict link behavior
129. Characterizing Low Power Links – Tx Variation
Tx power variation can happen … 7dB is a large variation
133. Why do we need MAC ?
Wireless channel is a shared medium
Radios, within the communication range of each other and operating in the same
frequency band, interfere with each other's transmissions
Interference -> Collision -> Packet Loss -> Retransmission -> Increase in net energy
The role of MAC
Co-ordinate access to and transmission over the common, shared (wireless) medium
Can traditional MAC methods be directly applied to WSN ?
Control -> often decentralized
Data -> low load but convergecast communication pattern
Links -> highly volatile/dynamic
Nodes/Hops -> Scale is much larger
Energy is the BIGGEST concern
Network longevity, reliability, fairness, scalability and latency
are more important than throughput
MAC is Crucial … Design/Choose it Carefully !!!
134. MAC Family
Reservation
(Scheduled, Synchronous)
Contention
(Unscheduled, Asynchronous)
Reservation-based
Nodes access the channel based on a schedule
Examples: TDMA
Limits collisions, idle listening, overhearing
Bounded latency, fairness, good throughput (in loaded traffic conditions)
Saves node power by putting nodes to sleep until needed
Low idle listening
Dependencies: time synchronization and knowledge of network topology
Not flexible under conditions of node mobility, node redeployment and node death:
complicates schedule maintenance
Contention-based
Nodes compete (in probabilistic coordination) to access the channel
Examples: ALOHA (pure & slotted), CSMA
Time synchronization “NOT” required
Robust to network changes
High idle listening and overhearing overheads
MAC Taxonomy
139. The FORTUNE TELLER or NOT …
Low power, long range communication is a very different ball game
compared to standard communication technologies.
Many attributes that inherently are known to work in regular communications
will “shock you” in low-power communications.
Take inspiration from the tons of WSN deployments that have studied these
artifacts rather than hypothesizing “again”.
149. Interoperability via Data Semantics: IEEE 1451 + IEEE 2700 ?
The IEEE 1451 (TEDS) is a well established
standard in industrial automation to achieve
plug-n-play capability with the help of
electronic datasheets.
TEDS is the electronic version of the data sheet
that is used to configure a sensor.
TEDS brings forward the concept that if the data
sheet is electronic and can be readily accessed
upon sensor discovery, it would be possible to
configure the sensor automatically.
This is analogous to the operation of plugging a
mouse, keyboard, or monitor into a computer
and using them without any kind of manual
configuration.
TEDS enables self-configuration of the system
by self-identification and self-description of
sensors and actuators (i.e., plug-and-play).
IEEE 2700 is a sensor calibration standard.
151. The Data to Knowledge Pipeline
Cyber & physical space entities (data sources) -> Edge ("little" data infra) -> Global infra ("big" data infra)
Pipeline: Data Ingestion -> Data Analysis -> Applications (decision making with knowledge)

NATURE of INGESTED DATA
DATA @ REST (VOLUME): archival/static data (TBs) in data stores
DATA @ MOTION (VELOCITY): streaming data
DATA @ MANY FORMS (VARIETY): structured/unstructured, text, multimedia, audio, video
DATA @ DOUBT (VERACITY): data with uncertainty that may be due to incompleteness, missing points, etc.

NATURE of DATA ANALYSIS
DESCRIPTIVE: What has happened ?
DIAGNOSTIC: Why did this happen ?
PREDICTIVE: What could happen ?
PRESCRIPTIVE: What are the best outcomes ?
152. Nature of Data Analysis
Value grows with skill, from hindsight and insight (insights into the PAST) to foresight (insights into the FUTURE):
Descriptive: "WHAT has happened ?" (DASHBOARD)
Diagnostic: "WHY did this happen ?"
Predictive: "WHAT could happen ?" (FORECAST)
Prescriptive: "WHAT should we do ?" (ACTIONS, RULES, RECOMMENDATIONS)
Information optimization increases from descriptive to prescriptive
153. Example: Energy Analysis for a PV Microgrid
Descriptive: What is the total energy, instantaneous energy and power, etc., …?
Diagnostic: Why is the panel temperature decreasing, when the solar irradiance is high and the wind
speed is very low ?
Predictive: Can I forecast the plant output for tomorrow, or can I generate 4kWh net energy ?
Prescriptive: What actions should be undertaken for the plant to reach 4kW energy generation capacity
from its current 2 kW ?
154. Example: Self Health Monitoring of Multi-rotor MAV
Descriptive: What is the total input power (voltage and current), thrust, vibration and ego-noise profiles,
and motor/propeller unit RPM ?
Diagnostic: Why is the THRUST not increasing with increasing RPM ?
Predictive: What is the success probability of the upcoming mission, given the flight and structural
health history ?
Prescriptive: What actions should be taken for increasing the success probability of the upcoming mission
from 75% to 90% ?
155. Machine/System Intelligence …
Depending on the type and quality of analytics, machines/systems could manifest themselves into:
Informed Systems — Systems That Know/Aware
Adaptive Systems — Systems That Learn
Cognitive Systems — Systems That Reason and Plan
158. ML computational methods / algorithms :
LEARN information directly from data, “without” relying on predetermined models
FIND natural patterns in data, which help to generate insights for better decisions and
predictions
ML teaches Machines to do what “naturally” comes to
Humans and Animals
“LEARN from EXPERIENCE”
159. ML Techniques

SUPERVISED
Develop a predictive model, based on evidence (both input and output data)
When to use ?
When you want to train a model to make a prediction.
When you have existing <input, output> data for the response that you are trying to predict.

CLASSIFICATION
Predicts discrete responses (e.g., email: genuine vs. spam; tumor: cancerous vs. benign)
When to use ?
When you are working with data that can be tagged or categorized.

REGRESSION
Predicts continuous responses (e.g., changes in temperature; fluctuations in power demand)
When to use ?
When you are working with data ranges, and want to predict trends.

UNSUPERVISED
Group and interpret data, based only on input data (without labels)
When to use ?
When you want to train a model to find a good internal representation.
When you want to explore your data, but don't yet have a specific goal, or are not sure what information the data contains.
When you want to reduce the dimensions of your data.

CLUSTERING
Finds hidden patterns or groupings (e.g., object recognition)
160. Selecting the Right Algorithm

ML TECHNIQUES
SUPERVISED -> CLASSIFICATION: Support Vector Machines, Discriminant Analysis, Naive Bayes, Nearest Neighbor
SUPERVISED -> REGRESSION: Linear Regression, Ensemble Methods, Decision Trees, Neural Networks
UNSUPERVISED -> CLUSTERING: K-Means, K-Medoids, Fuzzy C-Means, Hierarchical, Gaussian Mixture

Is it TRIAL and ERROR ?
Is it a trade-off between:
Speed of training
Memory usage
Predictive accuracy on new data
Transparency / interpretability
(how easily can you understand the reasons for an algorithm's predictions)
Using larger training datasets often yields models that generalize well to new data
162. ML Workflow: Feature Derivation
The number of features that could be derived is limited only by our imagination !!!
Sensor data
Extract signal properties from raw sensor data
Peak analysis (frequency, power, etc.,)
Pulse and transition analysis (rise time, fall time, settling time, etc.,)
Spectral analysis (power, bandwidth, frequency & its span, etc.,)
Image/Video data
Extract features such as edge locations, resolution, color …
Bag of visual words (create a histogram of local image features : edges, corners, blobs, etc.,)
Histogram of oriented gradients
Minimum eigenvalue (detect corner locations in images)
Edge detection (identify points where the degree of brightness changes sharply)
Transactional data
Calculate derived features that enhance the information in the data
Time decomposition (break timestamps down into components such as day and month)
Aggregate value calculation (create higher-level features such as total number of times a
particular event occurred)
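A small sketch of deriving sensor-signal features along these lines, using NumPy/SciPy on a synthetic 50 Hz signal (the chosen features and thresholds are illustrative):

```python
import numpy as np
from scipy.signal import find_peaks

fs = 1000.0                                  # sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) + 0.1 * np.random.randn(t.size)

# Peak analysis: count prominent peaks in the raw signal.
peaks, _ = find_peaks(x, height=0.5)

# Spectral analysis: dominant frequency and a simple power summary.
spectrum = np.abs(np.fft.rfft(x)) ** 2
freqs = np.fft.rfftfreq(x.size, 1 / fs)
features = {
    "n_peaks": len(peaks),
    "dominant_freq_hz": float(freqs[np.argmax(spectrum)]),  # ~50 Hz
    "mean_power": float(spectrum.mean()),
}
print(features)
```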
164. k-Nearest Neighbor (kNN)
How it Works ?
Categorizes data points based on the classes of their nearest neighbors in the dataset ("guilty by association").
Motivating insight: data points near each other tend to be similar.
Non-parametric: does not make any assumptions regarding the distribution of the data.
Metric for nearest neighbor: distance, either Euclidean (most popular), city block, Chebychev, correlation, cosine, etc.
Choose K to be ODD for a clear majority.
Best Used :
When you want a method that has no training phase (often called a lazy learner).
When response time, memory and space are of lesser concern (you need to store not just the algorithm, but also the training data).
When you can accept a less smart algorithm that can be fooled by irrelevant inputs (i.e., is less robust to noise).
When you need a simple algorithm to establish benchmark learning rules.
[Figure: decision boundaries for K = 1 vs. K = 15; larger K gives smoother, more defined boundaries]
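A minimal kNN sketch with scikit-learn; a stock dataset stands in for real data, and K = 15 is odd, as suggested above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=15, metric="euclidean")
knn.fit(X_tr, y_tr)          # "lazy learner": fit only stores the data
print(knn.score(X_te, y_te)) # accuracy of majority-vote classification
```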
165. Logistic Regression
How it Works ?
Fits a model that can predict the probability of a binary response belonging to one class or the other.
Best Used :
When the dependent variable is BINARY.
When data can be clearly separated by a single, linear boundary.
When a baseline is needed for evaluating more complex classification methods.
y = 1 / (1 + e^(-(β0 + β1x)))
[Figure: scatter of binary y vs. x with the fitted logistic curve]
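A short sketch of fitting this model with scikit-learn on synthetic data drawn from known coefficients (β0 = 0.5, β1 = 2.0, chosen arbitrarily):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
p = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))                    # true probabilities
y = (rng.uniform(size=p.shape) < p).ravel().astype(int)   # binary response

clf = LogisticRegression().fit(x, y)
print(clf.intercept_, clf.coef_)         # estimates of beta0 and beta1
print(clf.predict_proba([[0.0]])[0, 1])  # P(y=1 | x=0), ~ sigmoid(0.5)
```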
166. Support Vector Machines
How it Works ?
Classifies data by finding the linear decision boundary (hyperplane) that separates all data points of one class from those of the other class.
When the data is linearly separable: the best hyperplane is the one with the largest margin between the two classes.
When the data is not linearly separable: use a kernel transform to map the nonlinearly separable data into higher dimensions, where a linear decision boundary can be found; use a loss function to penalize points on the wrong side of the hyperplane.
Best Used :
When data has exactly two classes.
multiclass classification can be performed with a divide-and-conquer approach
When data is complex, has high dimensionality, and is nonlinearly separable.
When data is limited.
When you need a classifier that's simple, easy to interpret, and accurate.
When a fast response is needed.
[Figure: maximum-margin hyperplane with support vectors]
167. Neural Networks
How it Works ?
Consists of highly connected networks of neurons, which relate (map) the inputs to the desired
outputs.
The network is trained by iteratively modifying the strengths (i.e., weights) of the connections so
that given inputs map to the correct response.
Best Used :
When modeling highly nonlinear systems.
When computation cost is of lesser concern.
When model interpretability is not a key concern* (... however, there is work on interpreting each layer and on suggesting how many neurons are needed, so networks can be made interpretable; they can also now handle time information ...)
When there could be unexpected changes in your input data* (... for which the network has to be deep, with a large number of neurons ...)
168. Naïve Bayes
How it Works ?
Based on Bayes Probability Theorem, it assumes that the presence of a particular feature in a class
is unrelated to the presence of any other feature.
Classifies new data based on the highest probability of its belonging to a particular class.
c = HYPOTHESIS (class)
x = EVIDENCE (predictor variable / new data point)
P(c) = probability of the hypothesis before getting the evidence
P(c|x) = probability of the hypothesis after getting the evidence
Best Used :
When assumption of feature independence holds TRUE; it can easily outperform other well known
techniques with lesser training data.
When the model is expected to encounter scenarios that weren’t in the training data.
When CPU and memory resources are a limiting factor* (… although for likelihood estimation, a
dataset is needed …).
When you want a method that doesn’t overfit.
When you want a method that can update itself with continuous new data.
When you need a classifier that’s easy to interpret.
P(c|x) = P(x|c) P(c) / P(x)
Posterior = Likelihood ratio x Prior
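A minimal Gaussian Naive Bayes sketch with scikit-learn (the stock dataset is a stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
nb = GaussianNB().fit(X, y)      # per-class likelihoods, features assumed independent
print(nb.predict_proba(X[:1]))   # posteriors P(c|x) for the first sample
print(nb.predict(X[:1]))         # class with the highest posterior
# nb.partial_fit(...) supports incremental updates as new data arrives.
```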
169. Discriminant Analysis
How it Works ?
Classifies data by finding linear combinations of features.
Assumes that different classes generate data based on Gaussian distributions.
Training a discriminant analysis model involves finding the parameters for a Gaussian distribution for
each class.
The distribution parameters are used to calculate boundaries, which can be linear or quadratic
functions; and these boundaries are used to determine the class of new data.
Best Used …
When memory usage during training is a concern.
When you need a model that is fast to predict.
When you need a simple model that is easy to interpret.
170. Decision Trees
How it Works ?
Represents a procedure for classifying categorical data based on its attributes.
Decides which attribute to test at a node by determining the "best" way to separate the data (splitting point):
pick the attribute that has the highest information gain.
[Figure: a decision tree for the concept buys_computer, indicating whether a customer at AllElectronics is likely to purchase a computer. Each internal (nonleaf) node represents a test on an attribute; each leaf node represents a class (buys_computer = yes or buys_computer = no)]
Best Used :
When handling large datasets.
When there is a need to ignore redundant variables, and handle missing data elegantly* (… missing
data should be small …).
When memory usage needs to be minimized.
When decision traceability is needed.
171. Bagged and Boosted Decision Trees
How do Bagging and Boosting get N leaners ?
Trees are simple, but often produce noisy (bushy) or weak (stunted) classifiers.
In these ensemble methods, several “weaker” decision trees are combined into a “stronger” ensemble.
Why are the data elements weighted ?
172. Bagged and Boosted Decision Trees
How does the classification stage work ?
173. Bagged and Boosted Decision Trees
| Similarities | Differences |
| Both are ensemble methods to get N learners from 1 learner | Bagging: builds N learners independently. Boosting: tries to add new models that do well where previous models fail |
| Both generate several training data sets by random sampling | Bagging: no weighting strategy. Boosting: determines weights for the data to tip the scales in favor of the most difficult cases |
| Both make the final decision by averaging the N learners (or taking the majority of them) | Bagging: an equally weighted average. Boosting: a weighted average (i.e., more weight to those with better performance on training data) |
| Both are good at reducing variance and provide higher stability | Bagging: may solve the over-fitting problem, but may not reduce bias. Boosting: may increase the over-fitting problem, but tries to reduce bias |
Best Used :
When there is a need to minimize prediction variance
Boosting > Random Forests > Bagging > Single Tree
175. Linear/Non-linear/Gaussian Process Regression
How it Works ?
Describes a continuous response variable as a linear/non-linear/Gaussian process function.
Linear regression : Best Used …
When you need an algorithm that is easy to interpret and fast to fit.
When you need a baseline for evaluating other, more complex regression models.
Non-linear regression : Best Used …
When data has strong nonlinear trends, and cannot be easily transformed into a linear space.
When you need to fit custom models to the data.
Gaussian Process regression (Kriging) : Best Used …
When interpolation needs to be performed in the presence of uncertainty.
[Figure: example fits for linear regression, non-linear regression, and kriging]
176. SVM Regression / Regression Tree
SVM Regression: How it Works ?
Works the same as SVM classification algorithms, but is modified to be able to predict a
continuous response.
Instead of finding a hyperplane that separates data, it finds a model that deviates from the
measured data by a value no greater than a small amount, with parameter values that are as
small as possible (to minimize sensitivity to error).
SVM Regression: Best Used :
For high-dimensional data, where there will be a large number of predictor variables.
When data is limited and the number of predictor variables is large.
Regression Tree: How it Works ?
Works the same as decision trees, but is modified to be able to predict a continuous response.
Regression Tree : Best Used :
When predictors are categorical (discrete) or behave nonlinearly.
178. Clustering Analysis
Data is partitioned into groups (or clusters) based on some measure of similarity or shared characteristic. Clusters are formed so that objects in the same cluster are very similar and objects in different clusters are very distinct.
Hard Clustering: each data point belongs to only ONE cluster
Soft Clustering: each data point can belong to MORE than ONE cluster
Data grouping KNOWN: use cluster evaluation to look for the "best" number of groups for a given clustering algorithm
Data grouping UNKNOWN: search for possible clusters (e.g., with Self-Organizing Maps (SOM) or Hierarchical Clustering)
179. Common Hard Clustering Algos: k-Means / k-Medoids
k-Means: How it Works ?
Partitions data into k number of mutually exclusive clusters.
The fitment of a point into a cluster is determined by the distance from that point to
the cluster’s center.
k-Medoids: How it Works ?
Similar to k-means, but with the requirement that the cluster centers coincide with
points in the data.
Best Used :
When the number of clusters is known.
For fast clustering of categorical data
To scale to large data sets
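A minimal k-means sketch with scikit-learn on two synthetic blobs (k = 2 is known here):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),   # blob near (0, 0)
               rng.normal(3, 0.5, (50, 2))])  # blob near (3, 3)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)                     # one center per blob
print(km.predict([[0.2, -0.1], [2.9, 3.2]]))   # nearest-center assignment
```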
180. Hierarchical Clustering & SOM
Hierarchical: How it Works ?
Produces nested sets of clusters by analyzing similarities between pairs of points and
grouping objects into a binary, hierarchical tree.
Hierarchical : Best Used :
When advance knowledge of data clusters is missing
When you want visualization to guide your selection
SOM: How it Works ?
Neural-network-based clustering that transforms a dataset into a topology-preserving 2D map
SOM: Best Used :
To visualize high-dimensional data in 2D or 3D
181. Possible Modes with Unsupervised Learning
End goal is unsupervised learning: Large Data -> Unsupervised Learning -> Data Clusters (results)
Preprocessing step for supervised learning: Large Data -> Unsupervised Learning -> Lower Dimensional Data (feature selection) -> Supervised Learning Model
183. Improving Models
Model improvement in learning means:
increasing its accuracy
increasing predictive power
preventing over-fitting (ambiguity between data and noise)
increasing model parsimony
Essentially, reduces errors in learning due to noise, bias and variance
Feature Selection
Identifying the most relevant features, which provide the best predictive power.
Can be done by adding or removing features and checking the effect of each change on model performance.
Feature Transformation
Recasting existing features into new features using techniques such as: principal component
analysis, nonnegative matrix factorization, and factor analysis.
Hyperparameter Tuning
It is the process of identifying the set of parameters that provide the best model.
It controls how a ML algorithm fits the model to the data.
A model is only as good as the features selected to train on !!!
184. Feature Selection
Especially useful:
when dealing with high-dimensional data
when the dataset contains a large number of features and a limited number of
observations
Reducing the feature space saves storage and computation time
Makes the result easier to understand
Stepwise Regression
Sequentially adding or removing features until there is no improvement in prediction accuracy.
Sequential Feature Selection
Iteratively adding or removing predictor variables and evaluating the effect of each change on the
performance of the model.
Regularization
Using shrinkage estimators to remove redundant features by reducing their weights (coefficients)
to zero.
Neighborhood Component Analysis (NCA)
Finding the weight each feature has in predicting the output, so that the features with lower
weights can be discarded.
185. Feature Transformation
Feature transformation is a form of dimensionality reduction
Principal Component Analysis (PCA)
Performs a linear transformation on the data, so that most of the variance or information in your
high-dimensional dataset is captured by the first few principal components.
The first principal component will capture the most variance, followed by the second principal
component, and so on.
Nonnegative Matrix Factorization
Used when model terms must represent nonnegative quantities, such as physical quantities.
Factor Analysis
Identifies underlying correlations between variables in the dataset to provide a representation in
terms of a smaller number of unobserved latent factors, or common factors.
Shows the relationship between variables, so that variables (or features) that are not highly
correlated can be removed.
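A minimal PCA sketch with scikit-learn; the synthetic data has one inflated direction, so the first component captures most of the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[:, 0] *= 5.0                        # inflate variance along one direction

pca = PCA(n_components=3).fit(X)
print(pca.explained_variance_ratio_)  # decreasing share per component
print(pca.transform(X).shape)         # (200, 3): reduced representation
```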
186. Feature Transformation & Hyper-parameter Tuning
Begin by setting parameters based on a “best guess” of the outcome.
Goal is to find the “best possible” values - that would yield the best model.
As the parameters are adjusted and model performance begins to improve, note which parameters are effective and which still require tuning.
Three common parameter tuning methods are:
Bayesian optimization
Grid search
Gradient-based optimization
Hyperparameter tuning is an iterative process
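A minimal grid-search sketch with scikit-learn; the estimator and the parameter grid are arbitrary examples:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1, 1.0]}

search = GridSearchCV(SVC(), grid, cv=5)  # 5-fold CV per parameter combination
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```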
187. Choosing the Right Model ?
Why is it so hard to get right?
Each model has its own strengths and weaknesses in a given scenario.
No established set of rules/guidelines.
Closely tied to business case, and understanding of what needs to be accomplished.
What can you do to choose the right model?
How much data do you have and is it continuous?
What type of data is it?
What are you trying to accomplish?
How important is it to visualize the process?
How much detail do you need?
Is storage a limiting factor?
Is response time a limiting factor ?
Is computation cost a limiting factor ?
188. Model Over-fitting
Overfitting means that the model is so closely aligned to training data sets that it does
not know how to respond to new situations.
Why is overfitting difficult to avoid?
often the result of insufficient/inaccurate
information about the scenario.
How do you avoid overfitting?
using appropriate training data.
training data needs to accurately reflect the complexity
and diversity of the data the model will be expected to work with.
use regularization
penalizes large parameters to help keep the model from relying too heavily on individual
data points and becoming too rigid
control the smoothness of fit
Has the form: [Error + λf(θ)], where f(θ) grows larger as the components of (θ) grow
larger and λ represents the strength of the regularization
λ decides how much you want to protect against overfitting
if λ=0, you aren’t looking to correct for overfitting at all
perform model cross-validation
partitions a dataset and uses a subset to train the algorithm and the remaining data for
testing
common techniques: k-fold | holdout
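A short sketch tying regularization and cross-validation together with scikit-learn: Ridge regression has the [Error + λf(θ)] form, with λ exposed as alpha and f(θ) the squared norm of the coefficients, and 5-fold cross-validation scores each setting (the data and alpha values are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = X[:, 0] * 3.0 + rng.normal(scale=0.5, size=100)  # one informative feature

for alpha in (0.01, 1.0, 100.0):  # small alpha: little overfitting protection
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5)  # k-fold CV
    print(alpha, scores.mean().round(3))
```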
190. The FORTUNE TELLER or NOT …
A general rule-of-thumb:
Training - to generate the MODEL - is an expensive operation
Estimation - using the derived MODEL - is lightweight
Intelligence (derived through LEARNING) on Embedded systems:
On-device training MAY NOT be a good strategy
It may be better to offload it to a resourceful device
On-device estimations using the derived model MAY be a good strategy
There are EXCEPTIONS to this rule !!!
Sequential versions of many commonly used learning algorithms have
been developed (K-means, etc.), and are part of the stream
processing suite.
191. Acknowledgment and References
This short course on IoT has been compiled from various online resources,
text books, and research papers on this topic.
While Prasant may not be able to “correctly” recollect the right sources, he –
nevertheless – requests all viewers to drop a note, if they come across any
discrepancies in this regard.
1. MAC Essentials:
A. Bachir, M. Dohler, T. Watteyne and K. K. Leung, "MAC Essentials for Wireless Sensor Networks," IEEE Communications Surveys & Tutorials, vol. 12, no. 2, pp. 222-248, Second Quarter 2010.
2. Machine Learning:
https://in.mathworks.com/campaigns/products/offer/machine-learning-with-matlab.html