ML Governance is often discussed either in abstract terms without practical details or using detailed AI ethics examples. This talk will focus on the day-to-day realities of ML Governance. How much documentation is appropriate? Should you have manual sign-offs? If so, when and who should perform them? When is an escalation needed? What should a governance board do? What if you are in a regulated industry? How can MLOps help? And most importantly, what is the point of all this governance and how much is too much? This talk will show how each organisation can best answer these questions for their own context by referring to examples and public resources.
Ryan:
Lots of discussions of ML Governance never really get into details and can be confusing. We’re going to break through the confusion and tell you how you can really do something about ML Governance at the level of a Data Science team. This is going to involve talking about Documentation. But it’s good documentation - the kind of documentation that can really help you out if you use it wisely. So let’s get started.
First up, why is this topic so confusing? Why do so many people feel like they don’t even know what ML Governance is?
The fact is that many teams right now have very little governance. This is understandable: technologists have a delivery focus, which biases teams towards building solutions now and worrying about risks later. To get to a better place on governance, the burden can't all be on tech. It has to be a collaborative process. This leaves techies a bit nervous, because nobody really knows what is needed and there's a fear of some bureaucrat coming in and telling the team what they can and can't do. The nervousness is amplified by confusion about what is really needed.
It’s understandable that we get confused about ML Governance. There are lots of different aspects and it’s easy to get lost in the conversations. There are hot topics like Responsible AI that get a lot of attention. And this is an MLOps meetup, so we obviously love MLOps. But each of these is just a part of ML Governance.
We could roughly cluster the different aspects of ML Governance under Ethics and Principles; Tech Practices and MLOps; and Management and Frameworks. This helps us get a better picture of ML Governance, but it only takes us so far.
We will get into the details very shortly. First just to re-emphasise that this is not a Responsible AI talk. Ethics and Responsible AI are important. But they’re only part of the conversation. We want to talk about the boring side of ML Governance.
Meissane:
We also need to think about the relative weight or perceived importance of the topics under ML Governance. When people think about ML Governance they tend to think about something like this, with Ethics a big part of it and the documentation and sign-off elements falling under a bureaucracy that they’d rather not get involved with.
We want to shift this thinking and instead treat documentation and peer review as parts of best practice that should be reinforced by good governance. And sign-offs shouldn’t be about having to beg some bureaucrat to tick a box next to your model. They should be about positioning risk trade-off decisions with the right people.
It’s also worth understanding how ML Governance relates to other types of Governance, especially Data Governance, because the two overlap in significant ways.
The main area of overlap is documenting datasets: where a dataset comes from, what it means, how it gets updated and its known limitations.
This is needed for both Data and ML Governance and ideally it would fall under Data Governance so that Data Scientists can leverage it.
It is not only needed for producing ML models but also for doing:
Data Analysis
Analytics dashboards
generally asking questions of data.
Data Labelling likewise has a lot of value for Data Analysis and non-ML applications.
Data Lineage is about tracking changes to the data over time:
This can be important for ML training pipelines and reproducibility.
Once again there are also data analytics use cases where data lineage can be important.
Sometimes this can be a requirement of auditors.
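One lightweight way to picture data lineage is recording, for each transformation, which input version produced which output version. Here's a minimal sketch, assuming content-hash versioning (the `LineageRecord` structure and `content_version` helper are hypothetical illustrations, not any particular tool's API):

```python
import hashlib
from dataclasses import dataclass


@dataclass
class LineageRecord:
    """One step in a dataset's history: which transformation ran,
    on which input version, producing which output version."""
    input_version: str
    transformation: str
    output_version: str


def content_version(data: bytes) -> str:
    """Version a dataset snapshot by hashing its contents, so a
    training run (or an auditor) can see exactly which data was used."""
    return hashlib.sha256(data).hexdigest()[:12]


raw = b"id,age\n1,34\n2,51\n"
cleaned = b"id,age\n1,34\n"

record = LineageRecord(
    input_version=content_version(raw),
    transformation="drop rows with age > 50",
    output_version=content_version(cleaned),
)

# Different content yields different versions, making changes traceable.
assert record.input_version != record.output_version
```

In practice a lineage tool would store these records centrally; the point is that each transformation leaves an auditable trail from raw data to training set.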
Meissane:
So let’s talk about how to come up with a ML governance process.
Here are the kind of questions we want to talk about to set up ML Governance.
READ QUESTIONS
Processes can easily be over-engineered. Manual checks slow things down and can be harder to follow and keep track of, which can affect team morale.
Simply referring to it as ‘Best Practice’ isn’t enough to make us trust it. It might just be bureaucracy rebranded.
For the team to feel comfortable with the process, it has to be relevant and appropriate.
This is a bit of a side note, but the term ‘bureaucracy’ literally means ‘rule by desks’.
When you are constrained by bureaucracy it does feel like you are at the mercy of something unthinking.
However, it’s important to note that this feeling arises when rules don’t work well for your case: you can’t get done what you want to get done because somebody has made a rule without thinking about what you want to do. Processes and rules are not the problem.
The problem is when the rules and processes don’t fit with what needs to be done.
Ryan: So now we know what ML Governance is about. How do we make it happen?
Ryan:
This slide is going to look super simple. It is not the whole answer to ML Governance. But it is a starting point that we’ll use in this presentation.
You can think of this slide as a flexible template for a process that can be adapted for different organisations and teams.
The flow hinges on certain key documents, named here as the Model Card, the Model Validation Report and the Model Owner Approval. But the flow is not just about producing good documentation. It’s also about facilitating informed decision-making and positioning decisions with the most appropriate people.
Ryan:
The model developer produces a model card which documents the purpose of a model, its design, what data it uses, what risks they can see around it and advice on how the model should and should not be used. This is checked by the model validator.
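The fields just listed can be pictured as a simple structure. Here's a minimal sketch of a model card as a Python dataclass; the field names and the example values are hypothetical illustrations of the kind of content a developer would fill in, not a standard schema:

```python
from dataclasses import dataclass, field


@dataclass
class ModelCard:
    """Minimal model card: what the model developer fills in
    before handing over to the model validator."""
    name: str
    purpose: str            # what business problem the model addresses
    design: str             # model type and key design trade-offs
    training_data: str      # datasets used and their provenance
    known_risks: list = field(default_factory=list)
    intended_use: str = ""  # how the model should be used
    out_of_scope_use: str = ""  # how it should NOT be used


card = ModelCard(
    name="churn-predictor-v2",
    purpose="Rank customers by churn risk for the retention team",
    design="Gradient-boosted trees, chosen over a neural net for explainability",
    training_data="CRM extract 2020-2022; excludes trial accounts",
    known_risks=[
        "Trial accounts were unseen in training",
        "Region field is ~30% null",
    ],
    intended_use="Weekly batch scoring to prioritise outreach",
    out_of_scope_use="Individual pricing or credit decisions",
)

# An empty risk section should prompt questions from the validator.
assert card.known_risks
```

In a real team this would more likely live as a markdown or wiki template, but the structure is the same: purpose, design, data, risks, and usage guidance in one place.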
Ryan:
The model validator also checks that the model is reproducible and that the code and documentation are clear. There might be some back and forth here - think of it like a pull request review process. The validator’s findings might go in a separate model validation report, or in a section added to the model card, or even a link to a pull request with structured comments.
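The reproducibility part of validation can be as simple as rerunning training with identical pinned inputs and checking the reported metric matches. A toy sketch, where `train_model` is a hypothetical stand-in for the team's real training pipeline:

```python
import random


def train_model(seed: int) -> float:
    """Stand-in for the real training job; returns a validation metric.
    In a real check this would invoke the team's training pipeline with
    all seeds, data versions and dependency versions pinned."""
    rng = random.Random(seed)
    return round(0.90 + rng.random() * 0.01, 6)


def reproducibility_check(seed: int = 42, tolerance: float = 1e-6) -> bool:
    """Run training twice with identical inputs; the metric should
    match within tolerance, otherwise the validator flags it."""
    first = train_model(seed)
    second = train_model(seed)
    return abs(first - second) <= tolerance


assert reproducibility_check()
```

The tolerance matters in practice: some frameworks are only bitwise-reproducible under specific settings, so the team has to agree on what "reproducible" means for them.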
Ryan:
The next step is the model owner. The model owner is also looking for clarity about how the model works and how it should be used and its limitations. But the model owner probably won’t be technical so this needs to be explained at a different level. This might result in some more back and forth on the documentation. Most importantly the model owner needs to know about any risks and trade-offs associated with the model as they will be responsible at a business level for the model within the business product or process in which it is to be used.
Within this process there might also be an escalation route to an oversight board. Not every org will have an oversight board but if you do then they would become involved in cases where a model is identified as high risk, triggering a deeper review with more parties. Factors that could trigger an oversight review:
Use of sensitive data or attributes (PII, protected attributes such as gender etc.)
Models making decisions with a potential negative impact on an individual or entity
Issues arising from ISRM security review
Serious concerns about quality of the model and monitoring (e.g. live data not well known and unable to perform desired testing and monitoring)
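The escalation triggers above amount to a simple rule: if any trigger applies, the oversight board gets involved. A minimal sketch of that triage logic (the function and parameter names are our own illustration, not a standard):

```python
def needs_oversight_review(uses_sensitive_data: bool,
                           affects_individuals: bool,
                           security_issues: bool,
                           monitoring_concerns: bool) -> bool:
    """Escalate to the oversight board if any high-risk trigger applies:
    sensitive data/attributes, potential negative impact on individuals,
    security review issues, or serious quality/monitoring concerns."""
    return any([uses_sensitive_data, affects_individuals,
                security_issues, monitoring_concerns])


# A model using protected attributes triggers review even if all else is fine.
assert needs_oversight_review(True, False, False, False)
# A low-risk model proceeds through the standard flow.
assert not needs_oversight_review(False, False, False, False)
```

Real triage usually has more nuance (severity levels, not just booleans), but encoding the triggers explicitly, even in a checklist, keeps escalation consistent rather than ad hoc.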
An oversight board might also lead a periodic review process - perhaps a review every year or at some other frequency. This might be for an external auditor, though it’s better to think through this process for non-regulated industries first; we can treat regulated industries separately. You might also do an internal audit to check that documentation is all up to a similar standard, or use the information to look for patterns and opportunities within the org. There’s a lot to understand here, so let’s try to make it more concrete. We’ll get into the details of model cards and understand what a model validator or model owner would be looking for. But first let’s understand the roles in more detail.
Meissane:
In ML governance we want the right kinds of questions and decisions to sit with the appropriate roles.
Too often what we’re seeing is that Data Scientists are assumed to have already assessed risks and dealt with them, so that product management and other business managers don’t have to think about them.
This is not appropriate: Data Scientists are not empowered to decide which risks are worth taking, and they cannot simply make risks go away.
Data Scientists are in a position to develop models, to explain what they do and make the risks and trade-offs of models clear. Data Scientists are also in a position to advise on what monitoring will be appropriate for running models in production.
There may be more than one model validator, each with a different focus.
The assumption is that a model validator will be a fellow data scientist. This is necessary in order to check the robustness of the development process.
But there may also be some validation from an ML Engineer or Support Engineer or similar to ensure that they know all the background to monitor the model in live.
Ideally the Model Developer and an ML Engineer will work together to put together a Deployment and Monitoring plan. That also needs to be part of the extended Model Card as the model owner needs to know about it.
They need to know about any deployment risks and what kind of monitoring is achievable as it is part of the overall risk profile.
There has been a lot of discussion about how best to document ML models. We’ve listed the most notable approaches to ML Governance documentation here.
Model cards are a checklist format that Google is trying to popularise. They’re focused on overviews and design trade-offs of models.
Fairness and limitation trade-offs
https://drive.google.com/file/d/1QvwWNfFoweGVjsXF3DXzcrCnz-mx-Lha/preview
Datasheets are a kind of checklist for datasets, not for models, so the two are complementary.
So model cards and datasheets both started from a position of reducing misuse, mistakes and bias. Reproducibility checklists started from a different angle. The motivation for reproducibility checklists was more about ensuring the robustness of the results being reported for ML models, especially in research papers.
Another angle for checklists is production readiness. ML Test Scores for Production Readiness address deployment and infrastructure and also elements aimed at the ML model such as ensuring that the code is reviewed and in git and that hyperparameters are tuned and that the model chosen is as simple as it can be without loss of performance.
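As we understand the published ML Test Score rubric, each implemented test scores half a point if run manually and a full point if automated, and the overall score is the minimum across the sections, so a weak area drags the whole score down. A toy sketch of that scoring scheme (the section names and test lists are illustrative):

```python
def section_score(tests):
    """tests: list of 'manual' or 'automated' for each implemented test.
    Half a point for a manually run test, a full point when automated."""
    return sum(0.5 if t == "manual" else 1.0 for t in tests)


def ml_test_score(sections):
    """Overall readiness score is the minimum across sections,
    so one neglected area caps the whole score."""
    return min(section_score(tests) for tests in sections.values())


score = ml_test_score({
    "data":           ["automated", "automated"],  # 2.0
    "model":          ["manual", "automated"],     # 1.5
    "infrastructure": ["automated"],               # 1.0
    "monitoring":     ["manual"],                  # 0.5  <- weakest
})
assert score == 0.5
```

The min-based scoring is the interesting design choice: it rewards balanced coverage rather than piling tests into the area the team already finds easy.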
With so many different angles to ML documentation, it’s clear that we need to cover a mixture of different concerns in documenting models. We might choose to do this in one checklist with a range of different sections, or we could use a variety of checklists. The ML Cards for D/MLOps Governance link at the bottom of the slide here suggests using separate cards or checklists for different concerns and offers lots of suggestions for questions to include in the checklists.
We should now get into more detail on at least one of these checklist approaches. This will help us picture the idea more clearly. Google’s model cards are probably the easiest to explain, as Google has done a lot of work to popularise the idea.
Model cards were proposed by Google in a research paper. Google has since added a toolkit and published face and object detection examples from its vision services. READ SLIDE
Ryan:
So that’s model cards. That’s the central piece in the process that we talked about before. Actually you could simply extend the model card concept and treat the three documents from this slide as one big model card. Maybe the model validator just provides feedback that updates the model card. And model owner approval could be recorded on the model card.
Ryan:
This can sound easy when you talk about it in a presentation. The difficult thing is making it work for a particular team. There are lots of difficult questions you hit when you try to introduce a process like this in a real team.
READ QUESTIONS
Answering these questions tends to depend a lot on the context of the team and organisation. You have to talk to people and figure out what everyone will be comfortable with.
Ryan:
Making the process work for a team isn’t just about talking to people either. There’s also documentation that shows people what the process is about and that’s super important.
There should be reference examples for the documentation - example model cards that show models that make sense for the team. Reference examples will have a big impact on what documentation really gets produced because they show developers what kind of detail is expected.
Talking to people and producing reference documentation is also not enough. You should test out a new process and get feedback and adjust it. I would say adjust it until it is proven but really you can keep adjusting it forever as it can be a living process.
This is just a small piece of general advice about shaping a governance process. You somehow have to decide how much documentation detail is too much and how many sign-offs are too many. There is no general right answer. Firstly you have to look at your risks and get a sense of what realistically might go wrong and what the implications could be. Then you should work with your team and shape the process together. This ensures everyone feels included and buys into the process.
We’re coming to the end of the presentation now so we want to leave with you with a key thought. ML Governance is about lots of things like best practice and communication and so on but for many organisations the really big thing they need to tackle is risk management.
Here’s a useful picture to keep in mind for risk management. We have to be wary of doing our risk assessments in a superficial way. It’s tempting to focus on specific risks or specific types of risk and then not really look for others. The format of the documentation should help practitioners go through risks in a methodical and balanced way. Otherwise you get bitten.
Let’s make this concrete by looking at a famous case of getting bitten by risks in using ML. There are lots of these, but one that illustrates the point well is when Apple Card launched in 2019 and its credit assessments were accused of gender bias. Lots of high-profile people were critical, including Steve Wozniak and David Heinemeier Hansson. The credit assessment service was operated by Goldman Sachs, who were quick to say that they were not using gender as an attribute. So then there was speculation that maybe gender was entering indirectly through other attributes, which could happen since some occupations have a big gender skew. In fact an investigation by the New York State Department of Financial Services found no gender bias. The problem was actually that people didn’t understand the logic. There were complaints that female spouses were getting lower limits, and this was questioned on the basis of shared assets and income. But credit histories are not shared, and that was actually part of the algorithm. Where the New York State Department of Financial Services did criticise Goldman Sachs was on communication and customer response: Goldman had no way to respond to all these complaints and wasn’t able at the time to explain why the credit scores were coming out the way they were. You can imagine this might have been overlooked or just not prioritised in the rush to get the Apple Card service live.