Phil Day presented on Configured Things' data gateway for filtering real-time IoT sensor data using declarative policies. The gateway ingests data through MQTT or HTTPS and applies user-defined policies in Flux to control data visibility, quality, and filtering. Policies can dynamically define data scopes, quality metrics like aggregation, and filters. The policies are mapped to Flux queries to process the data in InfluxDB. This allows different stakeholders to securely access customized data streams from sensors in applications like smart cities.
Phil Day [Configured Things] | Policy-Driven Real-Time Data Filtering from IoT Sensors with Flux | InfluxData EMEA 2021
1. Phil Day
Director of Engineering
Policy-driven real-time data filtering from IoT sensors with Flux
2. | Agenda
• Background to Configured Things
• Data challenges in Connected Spaces such as Smart Cities
• The role of a data gateway, and how we built one with Flux
• Short Demo
3. | Configured Things
UK-based company, founded in 2018
We build declarative systems to securely configure smart spaces.
Alumni of the NCSC Cyber Accelerator and Smart Cities programs
4. | Declarative systems
Our Platform is a declarative, model-based configuration system.
We model and manage the changes as first order objects, where changes can
• come from multiple sources
• be added and removed in any order
• be constrained to specific areas of the overall model
• be authenticated, verified, and potentially signed
5. | Declarative systems
In a declarative system the interface is constrained to statements about the desired state of all or some part of the system.
When properly implemented they are:
• Simple to interact with
• Robust
• Easy to reason about from a security and audit perspective.
6. | Data Challenges in Smart Cities and other Connected Places
Smart Cities are formed of multiple co-operating systems.
Most existing systems focus on a single data owner, where sharing is a binary choice.
Data owners need to be able to control the content and quality of the data they share.
The drivers and policy for sharing data can change dynamically.
The policies and changes to policies need to be auditable.
7. | Configured Things Data Gateway
The Configured Things Data Gateway is a software appliance for providing controlled processing and sharing of streaming data.
Policy is defined via a declarative model, which configures and queries an underlying InfluxDB platform.
Policy can change dynamically at any time.
8. | Data Gateway Context
[Diagram. Labels: Sensors, Radio Network, IoT System, Sensor Owner (two instances), Data Stream, Data Stream Policy, Data Gateway with Policy (three instances), Controlled Data Streams]
9. | Data Gateway Requirements
• Policy to control data visibility & quality
• Support different policies, and map multiple users to each policy
• Provide real-time and historical data streams
• Support user defined filtering and analytics
10. | Architecture
[Diagram. Labels: IoT System; Ingest (MQTT, HTTPS); Collected Data; Policies (Flux) P1, P2, P3; Policy Mgr with Notify; Policy 1 Data, Policy 2 Data, Policy 3 Data; User Filters (Flux), five User Filter instances; Filter Mgr with Notify; User Data Streams]
11. | What can you define in a Policy?
Data Scope
What time period (Historical data)
Which measurements
Which metadata (device ID, device name, ...)
Which fields (some series can contain multiple values)
Data Quality
Aggregation (sample, sum, count, average, min, max), e.g. calculate the average over every 10-minute period
Resolution
Mapping (map value ranges to a new enumeration)
Policies and User Filters have the same format and capabilities
12. | Policy Definition (JSON)
{
  "period": "2h",                       // Limit data to last 2 hours
  "measurements": {
    "temp_ave": {                       // Derive a new measurement called “temp_ave”
      "from": "_Lora",                  // from the data source “_Lora”
      "source": "sim.temp",             // and the measurement “sim.temp”
      "filter": {
        "devName": "/Sim-Dev-1.*/",     // Match any source where devName starts “Sim-Dev-1”
        "devEUI": null                  // Include the metadata “devEUI”
      },
      "fields": ["1", "2"],             // Include just the fields “1” and “2”
      "window": {                       // Average the data over 40 seconds
        "duration": "40s",
        "function": "average"
      },
      "resolution": 1                   // Limit the resolution to 1 decimal place
    }
  }
}
13. | Policy Definition (Mapped to Flux)
{
  "period": "2h",                     // Set the retention period of the Policy bucket
  "measurements": {
    "temp_ave": {
      "from": "_Lora",                temp_ave = from(bucket: "_Lora")
                                        |> range(start: <start>, stop: <stop>)
      "source": "sim.temp",             |> filter(fn: (r) => r._measurement == "sim.temp"
      "filter": {
        "devName": "/Sim-Dev-1.*/",         and (r.devName =~ /Sim-Dev-1.*/)
        "devEUI": null
      },
      "fields": ["1", "2"],                 and (r._field == "1" or r._field == "2"))
                                        |> keep(columns: ["_time", "_field", "_value", "devName", "devEUI"])
      "window": {
        "duration": "40s",              |> aggregateWindow(every: 40s, fn: mean)
        "function": "average"           |> map(fn: (r) => ({ r with _time:
                                             experimental.addDuration(d: 20s, to: r._start)}))
      },                                |> drop(columns: ["_start", "_stop"])
      "resolution": 1                   |> map(fn: (r) => ({ r with _value: rnd(x: r._value, n: 1)}))
    }
  }                                     |> yield(name: "temp_ave")
}
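The mapping on this slide can be sketched as a small translator. This is only an illustration of the idea, not the gateway's actual code: the function name `policy_to_flux` and its parameters are hypothetical, and the sketch covers only the `from`, `source`, `filter`, `fields` and `window` keys (the timestamp shift and rounding steps are left out).

```python
def policy_to_flux(name, m, start="<start>", stop="<stop>"):
    """Render one policy measurement as a Flux query string (illustrative only)."""
    conds = [f'r._measurement == "{m["source"]}"']
    keep = ["_time", "_field", "_value"]
    for key, pattern in m.get("filter", {}).items():
        keep.append(key)                  # every filter key is kept as metadata
        if pattern is not None:           # null means "include, but do not match on it"
            conds.append(f"(r.{key} =~ {pattern})")
    fields = m.get("fields")
    if fields:
        alts = " or ".join(f'r._field == "{f}"' for f in fields)
        conds.append(f"({alts})")
    lines = [
        f'{name} = from(bucket: "{m["from"]}")',
        f"  |> range(start: {start}, stop: {stop})",
        "  |> filter(fn: (r) => " + " and ".join(conds) + ")",
        "  |> keep(columns: [" + ", ".join(f'"{c}"' for c in keep) + "])",
    ]
    window = m.get("window")
    if window:
        # Assumed name mapping: the policy says "average", Flux calls it "mean".
        fn = {"average": "mean"}.get(window["function"], window["function"])
        lines.append(f'  |> aggregateWindow(every: {window["duration"]}, fn: {fn})')
    return "\n".join(lines)


TEMP_AVE = {
    "from": "_Lora",
    "source": "sim.temp",
    "filter": {"devName": "/Sim-Dev-1.*/", "devEUI": None},
    "fields": ["1", "2"],
    "window": {"duration": "40s", "function": "average"},
}

print(policy_to_flux("temp_ave", TEMP_AVE))
```

Keeping the policy as data and generating the query from it is what lets a policy change take effect by regenerating the query, with no hand-written Flux to edit.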
14. | Handling time periods
The Flux query is run when the policy or filter is defined or updated, and then every time we get a notification that a new value has been added to the source.
[Diagram: timeline. Query period #1 is the initial query and spans t0 - policy.period to t0; queries #2, #3 and #4 follow the notifications at t1, t2 and t3. Result labels: #results > 0, #results > 0, #results = 0]
15. | Handling time periods with windows
When the policy includes a window we have to ensure we don’t leak data.
• Always align windows to the same boundary
• Only include full windows
[Diagram: timeline. Query period #1 is the initial query and spans t0 - policy.period to t0; queries #2, #3 and #4 follow the notifications at t1, t2 and t3. Result labels: #results > 0, #results > 0, #results = 0. Boundary label: 00:00]
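The two window rules above can be sketched as a range calculation. This is a minimal sketch under stated assumptions: times are integer seconds since the epoch, windows are aligned to multiples of the window length counted from time zero, and each notification-triggered query continues from where the previous one stopped. The function name `full_window_range` is hypothetical.

```python
def full_window_range(prev_stop, now, window):
    """Return (start, stop) covering only complete, boundary-aligned windows.

    All arguments are integer seconds. Returns None when no full window has
    closed since prev_stop, so a partial window is never exposed.
    """
    start = -(-prev_stop // window) * window   # round up to the next boundary
    stop = (now // window) * window            # round down to the last closed boundary
    return (start, stop) if stop > start else None


# With 40 s windows: at t=95 only the windows [0, 40) and [40, 80) are complete.
print(full_window_range(0, 95, 40))
# A notification at t=100 closes no new window, so there is nothing to query yet.
print(full_window_range(80, 100, 40))
```

Rounding the stop time down means a consumer only ever sees the aggregate of a finished window, never a partial one from which an individual raw value could be inferred.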
16. | Location Handling
Location data can have particular value and security considerations.
During ingest we calculate the S2 Cell ID for any point with location fields at a range of resolutions, e.g. Level 18 (27m), Level 16 (153m), Level 13 (850m), …
We add a metadata value for each calculated level.
Policy can then be used to filter only the level(s) a user is allowed to know
We are also looking at using the Flux Geo package to allow rules such as:
“All sensors within xxx meters of this location”
"filter": {
"S2_Cell_13": null,
17. | Policy Definition - Sampling
{
  "period": "2h",                     // Limit data to last 2 hours
  "measurements": {
    "temp_sample": {                  // Derive a new measurement called “temp_sample”
      "from": "_Lora",                // from the data source “_Lora”
      "source": "sim.temp",           // and the measurement “sim.temp”
      "window": {
        "duration": "5m",
        "function": "sample",         // Sample the values in 5 minute windows
        "frequency": 3,               // Return every 3rd point
        "limit": 4                    // with a limit of at most 4 points in each window
      }
    }
  }
}
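The effect of the sampling window can be sketched in plain Python. This is an illustration of the described behaviour, not the gateway's code: it assumes points are `(epoch_seconds, value)` pairs sorted by time, that windows align to multiples of the window length, and that "every 3rd point" counts from the first point in each window. The function name `sample_windows` is hypothetical.

```python
from collections import defaultdict


def sample_windows(points, window_s, frequency, limit):
    """Keep every `frequency`-th point per window, at most `limit` per window."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window_s].append((ts, value))   # key = window start
    sampled = []
    for start in sorted(buckets):
        sampled.extend(buckets[start][::frequency][:limit])
    return sampled


# One reading every 10 s for 10 minutes, i.e. two 5-minute windows of 30 points.
readings = [(t, 20.0 + t / 100) for t in range(0, 600, 10)]
for ts, value in sample_windows(readings, window_s=300, frequency=3, limit=4):
    print(ts, value)
```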
19. | Creating a default policy or filter
We can inspect a bucket and create a policy or filter definition which
matches all of the data in the bucket
filter: object of all of <key_name>: null
filter_values: object of arrays giving <key_values> for each key
fields: array of all <field_names>
A default filter is available to each user so they can inspect the data available to them from a policy, and modify it as needed.
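Deriving such a default definition can be sketched as a single pass over the records in a bucket. This is a minimal sketch, assuming each record is a flat dict that uses InfluxDB's reserved column names (`_time`, `_measurement`, `_field`, `_value`) and treats every other key as metadata. The function name `default_definition` is hypothetical.

```python
RESERVED = ("_time", "_measurement", "_field", "_value")


def default_definition(records):
    """Build a match-everything filter plus the values seen for each key."""
    filter_keys = {}      # <key_name>: None, i.e. include the key, match anything
    filter_values = {}    # <key_name>: [distinct values seen]
    fields = []           # distinct field names, in first-seen order
    for rec in records:
        if rec["_field"] not in fields:
            fields.append(rec["_field"])
        for key, value in rec.items():
            if key in RESERVED:
                continue
            filter_keys[key] = None
            seen = filter_values.setdefault(key, [])
            if value not in seen:
                seen.append(value)
    return {"filter": filter_keys, "filter_values": filter_values, "fields": fields}


records = [
    {"_time": 0, "_measurement": "sim.temp", "_field": "1", "_value": 21.3, "devName": "Sim-Dev-1"},
    {"_time": 0, "_measurement": "sim.temp", "_field": "2", "_value": 19.8, "devName": "Sim-Dev-2"},
]
print(default_definition(records))
```

A user can then narrow the result by replacing a `None` with a pattern chosen from `filter_values`, or by trimming the `fields` list.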