Abstract. Enterprise adoption of AI/ML services has significantly accelerated in the last few years. However, the majority of ML models are still developed with the goal of solving a single task, e.g., prediction, classification. In this talk, Debmalya Biswas will present the emerging paradigm of Compositional AI, also known as Compositional Learning. Compositional AI envisions seamless composition of existing AI/ML services, to provide a new (composite) AI/ML service, capable of addressing complex multi-domain use-cases. In an enterprise context, this enables reuse, agility, and efficiency in development and maintenance efforts.
2. Enterprise AI
Enterprise AI/ML use-cases are
pervasive.
Broadly categorized by the three core
AI/ML capabilities enabling them:
Natural Language Processing (NLP),
Computer Vision and Predictive Analytics
The majority of AI/ML models are still
developed with the goal of solving a
single task, e.g., prediction, classification.
3. Compositional AI Scenario
Consider the online Repair Service of a
luxury goods vendor.
The service consists of a Computer Vision
(CV) model capable of assessing the repairs
needed, given a picture of the product
uploaded by the customer.
[Figure: Repair Ordering Service, composed of the Product Repair Assessment CV Model and the Chatbot Ordering App]
The assessment is followed by an Ordering
Chatbot conversation that captures
additional details required to process the
user’s repair request, e.g., damage details,
username, contact details, etc.
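The Repair Service described above is a sequential composition: the output of the CV assessment feeds the ordering conversation. A minimal sketch of that flow follows. Both `assess_repair` and `ordering_chatbot` are hypothetical stand-ins (no real model or chatbot is invoked), and all field names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RepairAssessment:
    product_id: str
    damage_type: str  # label a real CV model would predict from the image


def assess_repair(image: bytes) -> RepairAssessment:
    """Hypothetical stand-in for the CV model; returns a fixed assessment."""
    return RepairAssessment(product_id="bag-001", damage_type="scratched leather")


def ordering_chatbot(assessment: RepairAssessment, answers: dict) -> dict:
    """Hypothetical stand-in for the chatbot; merges assessment and user details."""
    return {
        "product_id": assessment.product_id,
        "damage_type": assessment.damage_type,
        "damage_details": answers["damage_details"],
        "username": answers["username"],
        "contact": answers["contact"],
    }


def repair_service(image: bytes, answers: dict) -> dict:
    """Composite service: CV assessment followed by the ordering conversation."""
    return ordering_chatbot(assess_repair(image), answers)


order = repair_service(
    b"raw-image-bytes",
    {"damage_details": "scratch on the front flap",
     "username": "alice",
     "contact": "alice@example.com"},
)
```

The resulting `order` record holds both the image-derived assessment and the customer details, which is the combined data that the later scenarios reuse.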
4. Compositional AI Scenario (2)
In the future, when the enterprise is looking for models
to develop a Product Recommendation service, the
Repair Service is considered.
The data gathered by the Repair Service, i.e., the state of
products owned by the users (gathered by the CV
assessment model) together with their demographics
(gathered by the Ordering Chatbot), provides
additional training data for the Recommender Service.
Privacy policies may prevent these datasets from being
combined, so that they cannot be used to profile
customers: “data used for a different purpose than
originally intended”.
[Figure: The Repair Ordering Service (Product Repair Assessment CV Model + Chatbot Ordering App) holds [Damaged product images + Text description, Customer demographics]; the Product Recommendation Service uses [Products purchased + Demographics]]
5. Compositional AI Scenario (3)
The enterprise further wants to develop a CV
App to detect defective products during
manufacturing.
The Repair Service can help here as it has
labeled images of damaged products (with
the product damage descriptions provided
to the Chatbot acting as ‘labels’).
[Figure: The Repair Ordering Service (Product Repair Assessment CV Model + Chatbot Ordering App) holds [Damaged product images + Text description, Customer demographics]; the Manufacturing Defect Detection App uses [Damaged Product images + Text description]]
Training data is acquired by fusing data
gathered by two different AI/ML Services.
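The fusion step above can be sketched as a join between the records kept by the two services. This is a minimal illustration under stated assumptions: the shared request id, the record layouts, and the file paths are all hypothetical, and only the image and the damage description are carried over (demographics are deliberately left out).

```python
# Hypothetical records captured by the two services, keyed by a shared request id.
cv_records = {
    "req-1": "images/req-1.jpg",
    "req-2": "images/req-2.jpg",
    "req-3": "images/req-3.jpg",
}
chatbot_records = {
    "req-1": {"damage_details": "torn strap", "username": "alice"},
    "req-2": {"damage_details": "scratched dial", "username": "bob"},
    # req-3 has no conversation, so no label is available for its image
}


def fuse_training_data(images: dict, conversations: dict) -> list:
    """Inner join on request id, keeping (image path, damage description) pairs."""
    shared_ids = sorted(images.keys() & conversations.keys())
    return [(images[rid], conversations[rid]["damage_details"]) for rid in shared_ids]


training_set = fuse_training_data(cv_records, chatbot_records)
```

Images without a matching conversation are dropped rather than guessed at, so every pair in `training_set` has a customer-provided description acting as its label.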
6. Compositionality
Ability to form new
(composite) services by
combining the capabilities
of existing services.
The existing services may
themselves be composite,
leading to a hierarchical
composition.
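One way to picture hierarchical composition is to treat every service as a function and let a composite be a function too, so that it can be composed again. The sketch below assumes that simplification; `compose` and the three toy services are hypothetical names introduced only for illustration.

```python
from typing import Any, Callable

Service = Callable[[Any], Any]


def compose(*services: Service) -> Service:
    """Return a new service that applies the given services from left to right."""
    def composite(value: Any) -> Any:
        for service in services:
            value = service(value)
        return value
    return composite


# Toy atomic services.
def tokenize(text: str) -> list:
    return text.lower().split()


def count_tokens(tokens: list) -> int:
    return len(tokens)


def bucket(n: int) -> str:
    return "long" if n > 3 else "short"


# A composite service, and a second composite built on top of the first one.
text_length = compose(tokenize, count_tokens)
length_classifier = compose(text_length, bucket)
```

Because `text_length` has the same shape as an atomic service, `length_classifier` does not need to know that one of its parts is itself a composition.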
7. Prior-Art: Web Services Composition
WS-Composition enables reuse and
integration of existing (isolated)
applications in an enterprise.
Composition challenges: Discovery,
Matchmaking, Monitoring, Transactions
The BPEL specification is used to orchestrate
Web Services compositions.
* D. Biswas. Web Services Discovery and Constraints Composition. RR 2007: 73-87
* D. Biswas, K. Vidyasankar. Optimal Compensation for Hierarchical Web services
Compositions under Restricted Visibility. IEEE APSCC 2009: 293-300
8. Prior-Art: Secure Composition
Given a complex task, first partition the task
into several simpler sub-tasks. Then, design
protocols for securely realizing the sub-tasks.
The Universal Composition (UC) framework*
ensures that a protocol composed from
(secure) sub-protocols securely realizes the
given task.
UC continues to guarantee security in novel
execution environments, or where other
protocols are running concurrently – essential
to run protocols in complex, unpredictable
and adversarial environments.
* Ran Canetti. 2020. Universally Composable Security. J. ACM 67, 5, Article 28 (October 2020).
9. ML Prior-Art: Ensemble Learning
Ensemble Learning attempts to make the
best use of the predictions from multiple
models catering to the same problem.
Commonly used Ensemble Learning
techniques include: Bagging, Boosting
and Stacking.
[Figure: The original training data D is split into D1, D2, D3, D4 (split data set); a model is built on each part (build multiple models); the models are then combined (combine models)]
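As an illustration of the split, build and combine flow, here is a minimal Bagging sketch using only the standard library. It is a toy under stated assumptions: the data is a made-up one-dimensional dataset, the base model is a simple threshold ("stump"), and bootstrap samples are redrawn until both classes are present so that every stump can be trained.

```python
import random
from collections import Counter


def train_stump(sample):
    """Toy base model: threshold halfway between the two class means."""
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: int(x > threshold)


def bootstrap(data, rng):
    """Sample with replacement; redraw until both classes appear."""
    while True:
        sample = [rng.choice(data) for _ in data]
        if len({y for _, y in sample}) == 2:
            return sample


def bagging(data, n_models=5, seed=0):
    """Build one model per bootstrap sample of the original training data."""
    rng = random.Random(seed)
    return [train_stump(bootstrap(data, rng)) for _ in range(n_models)]


def predict(models, x):
    """Combine the models by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


data = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
        (4.0, 1), (4.5, 1), (5.0, 1), (5.5, 1)]
models = bagging(data)
```

Boosting and Stacking combine models differently (sequential reweighting and a learned meta-model, respectively); only the voting variant is sketched here.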
10. ML Prior-Art: Federated Learning
Federated learning, also known as Collaborative Learning, enables multiple (non-trusting) entities to
collaborate in training an ML model on their combined dataset.
FL-Neural Network training: All nodes agree upon
the same neural network architecture and
task to train a global model.
During each epoch, the nodes download the global model
parameters from the coordinator and update them
locally using some variant of gradient descent on
their local datasets, then share the updated values back
with the coordinator.
The coordinator node averages the gathered
parameter values from all child nodes.
* B. McMahan, et al. Communication-Efficient Learning of Deep Networks from
Decentralized Data. AISTATS 2017: 1273-1282
[Figure: Training data belonging to different organizations (Org1, Org2, Org3) is used to train neural networks locally; each node downloads the global parameters from, and shares its local updates with, the Coordinator: Parameter Server (global model, average parameters)]
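The training loop described above can be sketched with a one-parameter model so that the averaging step is easy to follow. This is a toy under stated assumptions: the model is `y = w * x`, the three node datasets are made up and equally sized (so a plain mean is used), and the learning rate and step counts are arbitrary choices.

```python
def local_update(w, data, lr=0.05, steps=5):
    """A few gradient-descent steps on one node's private data for y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(global_w, node_datasets):
    """Each node starts from the global parameter; the coordinator averages results."""
    local_ws = [local_update(global_w, data) for data in node_datasets]
    return sum(local_ws) / len(local_ws)


# Hypothetical private datasets of three organizations, all generated by y = 3x.
nodes = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(1.5, 4.5), (3.0, 9.0)],
    [(0.5, 1.5), (2.5, 7.5)],
]

w = 0.0
for _ in range(20):
    w = federated_round(w, nodes)
```

Only parameter values cross organizational boundaries in this loop; the raw `(x, y)` pairs never leave the node that owns them.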
11. ML Prior-Art: Stacking Neural Networks
In the context of OCR, the CNN is used only
as the feature extractor, with the
features provided as input to the LSTM.
The LSTM is able to take into account
both the preceding and following set of
output characters - to output the most
probable character at each time step.
[Figure: Sequential composition (“fusion”): Input image → CNN → Image features → LSTM]
12. AI Service Basics
AI Service: Data + Model + API
[Figure: (Labeled) Data → (Train) ML Model → API Endpoint, covered by DataOps, MLOps, and APIOps / API Mesh / API Management]
13. DataOps – Data Fusion
Integration/fusion tools for AI Services are lacking - a key part of Compositional AI
“DataOps is an automated,
process-oriented methodology,
used by analytic and data teams, to
improve the quality and reduce the
cycle time of data analytics.”
- Wikipedia
Data Ingestion
Kafka: Millions of events per second
HDFS: Hadoop File System
Data Processing
NiFi: Data movements and transformations
Spark: Complex data transformations
Data Integration
PrestoDB: Federate queries over multiple data sources
Hive + LLAP (Data Warehouse): Central repository of integrated data from one or more data sources
Neo4j: Use graph structures to understand relationships and perform semantic queries
Data Access
Tableau, PowerBI: Dashboard, Reports
WSO2: Expose data and ML services as APIs
[Figure labels: Federated Queries, Data Marts, Knowledge Graphs]
14. MLOps
MLOps, also known as ModelOps,
combines DevOps with ML to
manage ML models in production.
End-to-end ML lifecycle: Data and
(Serving) API aspects are also
considered.
MLOps manages model versions and
parameters; however, the model
fusion aspect is missing.
* D. Sculley, et al. Hidden Technical Debt in Machine Learning Systems. NIPS 2015: 2503-2511
15. APIOps - API Management – API Mesh
Cloud ML APIs provide core AI/ML
capabilities, e.g., Computer Vision, NLP,
Chatbots, Speech, Video, etc.
(Black-box APIs) Good for prototyping, but
difficult to use for strategic use-cases
without any knowledge of the
underlying models and data.
16. Data Governance
Data Governance includes: Data
catalog, Data dictionary, Data
provenance and lineage tracking,
Data modeling, etc.
We have considered the operational
part: DataOps, MLOps, APIOps.
Does the answer lie in establishing a
governance framework?
[Figure labels: Data Governance; Ethical AI Governance: Privacy, Explainability, Bias/Fairness, Accountability]
17. Data Governance: FAIR Principles
FAIR principles provide guidance in terms of
specifying the data lineage and provenance,
maximizing reuse and enabling the users to
decide which data is fit for their purpose.
Source: https://www.openaire.eu/
The software / ML code part, i.e., how the
data is transformed, is not considered. This
leads to potentially conflicting Open Data
vs. Open-Source Software frameworks.
*“there are also several significant
differences between data and software as
digital research objects”
*Lamprecht et al., Towards FAIR principles for research software, June 2020
18. Ethical AI Governance
*“Ethical AI, also known as
Responsible AI, is the practice of using
AI with good intention to empower
employees and businesses, and fairly
impact customers and society.”
**Key components of an Ethical AI
Governance Framework include:
Privacy, Explainability, Bias/Fairness
& Accountability.
*R. Porter. Beyond the promise: implementing Ethical AI, 2020.
**D. Biswas. Ethical AI: its implications for Enterprise AI Use-cases and
Governance. Linux Foundation Open Compliance Summit, 2020
19. Ethical AI Governance: Privacy
Black box attacks are still possible when
the attacker only has access to the APIs:
invoke the model and observe the
relationships between inputs and outputs.
Two broad categories of inference attacks:
membership inference (whether a specific user
data item was present in the training
dataset) and property inference
(reconstructing properties of a participant’s
dataset).
M. Rigaki and S. Garcia. A Survey of Privacy Attacks in Machine Learning.
2020.
A. Ilyas, L. Engstrom, A. Athalye, and J. Lin. Black-box Adversarial Attacks
with Limited Queries and Information. ICML 2018, pages 2137–2146.
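To make the black-box setting concrete, the sketch below shows a very simple confidence-threshold membership inference against a toy API. Everything here is an assumption for illustration: the "model" is a stub that is more confident on memorised inputs, the threshold is hand-picked, and real attacks from the cited literature are considerably more involved.

```python
def make_overfit_model(training_set):
    """Toy black-box API: very confident on memorised inputs, less so otherwise."""
    memorised = set(training_set)

    def predict_confidence(x):
        return 0.99 if x in memorised else 0.60

    return predict_confidence


def membership_inference(query_api, candidates, threshold=0.9):
    """Guess 'member' whenever the observed confidence exceeds the threshold."""
    return {x: query_api(x) > threshold for x in candidates}


# The attacker only calls the API; the training set itself stays hidden.
api = make_overfit_model(["alice", "bob", "carol"])
guesses = membership_inference(api, ["alice", "dave"])
```

The attacker never sees the training data, only the relationship between inputs and outputs, which is exactly the access a published API endpoint provides.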
20. ML Privacy
A trained model may leak insights related
to its training dataset.*
This is because (during backpropagation) gradients of a given
layer of a neural network are computed using the layer’s
feature values and the error from the next layer.
For example, in the case of sequential fully connected layers,
$h_{l+1} = W_l \cdot h_l$, the gradient of the error $E$ with respect to $W_l$ is:
$$\frac{\partial E}{\partial W_l} = \frac{\partial E}{\partial h_{l+1}} \cdot h_l^{\top}$$
That is, the gradients of $W_l$ are products of the error
from the next layer and the features $h_l$; hence the
correlation between the gradients and the features. This is especially
true if certain weights in the weight matrix are sensitive to
specific features or values in the participants’ dataset.
*M. Nasr, et al. Comprehensive Privacy Analysis of Deep Learning: Passive
and Active White-box Inference Attacks against Centralized and Federated
Learning. IEEE Symposium on Security and Privacy (SP), 2019, 739–753.
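The relationship between gradients and features stated above can be checked numerically on a single fully connected layer. The sketch assumes a squared-error loss and small made-up values for the weights, features and target; it compares the analytic gradient (error times features) with a finite-difference estimate, then shows that one row of the gradient reveals the features up to a scale factor.

```python
def forward(W, h):
    """Fully connected layer without bias: h_next = W . h"""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def loss(W, h, target):
    return 0.5 * sum((o - t) ** 2 for o, t in zip(forward(W, h), target))


def analytic_grad(W, h, target):
    """dE/dW[i][j] = (error of output i) * (feature j)."""
    error = [o - t for o, t in zip(forward(W, h), target)]  # dE/dh_next
    return [[e * x for x in h] for e in error]


def numeric_grad(W, h, target, eps=1e-6):
    """Central finite differences, used only to check the analytic formula."""
    grad = []
    for i, row in enumerate(W):
        grad_row = []
        for j in range(len(row)):
            W_plus = [r[:] for r in W]
            W_minus = [r[:] for r in W]
            W_plus[i][j] += eps
            W_minus[i][j] -= eps
            diff = loss(W_plus, h, target) - loss(W_minus, h, target)
            grad_row.append(diff / (2 * eps))
        grad.append(grad_row)
    return grad


h = [1.0, -2.0, 0.5]                        # private features (made-up values)
W = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.5]]
target = [1.0, 0.0]

g = analytic_grad(W, h, target)
# Anyone observing g can recover the features up to scale from a single row.
recovered = [value / g[0][0] for value in g[0]]  # equals h / h[0]
```

Here `recovered` matches `h` because `h[0]` is 1.0; in general the features are exposed up to a constant factor, which is the correlation the slide refers to.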
21. Privacy implications in Compositional AI
A recent FTC ruling* stated that it is no longer
sufficient to just delete data when a user opts
out; the organization will also need to delete
the models/algorithms trained on that data.
Enforcing this in a compositional setting
requires capturing the (higher) level
composite services that have directly or
indirectly accessed the underlying (affected)
training data.
*FTC. California Company Settles FTC Allegations It Deceived
Consumers about use of Facial Recognition in Photo Storage App, 2021.
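Capturing which composite services have directly or indirectly used a dataset amounts to a reachability query over a lineage graph. The sketch below assumes such a graph is already recorded; the artefact names are hypothetical and loosely follow the Repair Service scenario.

```python
from collections import deque

# Hypothetical lineage: each artefact maps to the artefacts derived directly from it.
derived_from = {
    "repair_images": ["repair_cv_model", "defect_detection_model"],
    "chatbot_logs": ["ordering_chatbot", "defect_detection_model"],
    "repair_cv_model": ["repair_service"],
    "ordering_chatbot": ["repair_service"],
    "repair_service": ["recommendation_service"],
}


def affected_by(dataset, lineage):
    """Breadth-first search for everything that directly or indirectly used dataset."""
    seen = set()
    queue = deque([dataset])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


to_review = affected_by("chatbot_logs", derived_from)
```

Every entry in `to_review` is a candidate for retraining or deletion when the underlying data has to be removed; the hard part in practice is recording the lineage edges in the first place.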
Privacy policies, e.g., the FTC FIPPs**, recommend that
data is only used for specific purposes (for which
the user has provided explicit opt-in), and not
combined with other datasets to reveal additional
insights that can be used to profile the user.
Such data aggregations can be very difficult to
detect in a compositional setting, as (higher) level
composite services can (via intermediate services)
aggregate data belonging to different services -
without their explicit approval.
**R. Gellman. Fair Information Practices: A Basic History -
Version 2.20. 2021. doi: 10.2139/ssrn.2415020.
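A purpose check over the composition can surface such indirect aggregations. The sketch below is a simplified illustration: the consent registry, the composition map and the purpose labels are all hypothetical, and the composition is assumed to be acyclic.

```python
# Hypothetical consent registry: the purpose users opted in to, per dataset.
consented_purpose = {
    "repair_images": "repair",
    "customer_demographics": "repair",
    "purchase_history": "recommendation",
}

# Hypothetical composition: the inputs of each service (datasets or other services).
inputs = {
    "repair_service": ["repair_images", "customer_demographics"],
    "recommendation_service": ["repair_service", "purchase_history"],
}

service_purpose = {
    "repair_service": "repair",
    "recommendation_service": "recommendation",
}


def underlying_datasets(service, composition):
    """Resolve a (possibly composite) service to the raw datasets it ultimately reads."""
    datasets = set()
    for item in composition.get(service, []):
        if item in composition:            # item is itself a service: recurse
            datasets |= underlying_datasets(item, composition)
        else:                              # item is a raw dataset
            datasets.add(item)
    return datasets


def purpose_violations(service):
    """Datasets reached by the service whose consented purpose differs from its own."""
    return sorted(
        d for d in underlying_datasets(service, inputs)
        if consented_purpose[d] != service_purpose[service]
    )


violations = purpose_violations("recommendation_service")
```

The recommendation service never reads the repair data directly, yet the check flags it because the data arrives through the intermediate Repair Service.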
23. Conclusion
Compositional AI envisions seamless composition of existing
AI/ML services, to provide a new (composite) AI/ML service,
capable of addressing complex multi-domain use-cases.
Data Fusion --> Compositional AI
Comprehensive framework integrating DataOps + MLOps + AIOps
Critical to enable enterprise reuse: Reduce re-work (80%) in Data Engineering
Non-functional aspects will also need to be addressed, e.g., lineage, privacy.