As AI technology evolves rapidly, technology enthusiasts across the world are exploring various use cases and ways to advance their digital transformation journeys. Undoubtedly, we must also focus on its application and long-term impact. Even the business community is fascinated by the seemingly infinite possibilities of AI. However, an important question to answer is how to ensure the development of a fair, unbiased model — one that not only provides the expected result but also meets social responsibilities and standards, without harming users or society. How much effort should we invest in designing a human-centric model? Should this factor be a key evaluation metric for approving any AI-enabled solution? Let us discuss.
Need vs AI?
There is a general notion that AI/ML is the only path to successful digital transformation. Often, it becomes mandatory for the Design Architect to consider AI as the solution for the problem at hand. Applying the ethical factor begins right here: do we really need an AI solution? First, we should understand the need of the end user — be one among them and understand the situation clearly.
- Explore similar use cases and solutions already deployed.
- Verify whether a standard rule-based, non-AI approach could be a viable alternative.
- Will bringing AI into the equation really add value?
- Does the end user understand the benefits of the solution the same way the Design Architect does?
Recently, one of our customers insisted on an AI solution as part of their digital initiatives. After a detailed and comprehensive discussion of their use cases, we concluded that they were actually looking for an OCR-based mobile app to reduce manual intervention. To summarize, it is essential to weigh the relevance of AI in your initial project discussion with the customer. Once an AI solution is finalized, the next ethical questions are:
- How does the solution impact the user and society?
- Does this solution pose any direct or indirect harm to the users?
- Can the user's privacy be compromised at any stage of the development pipeline, from data acquisition and storage through processing, training, testing, evaluation, and deployment?
- If your final estimate of harm outweighs the expected benefits, is it better to change the strategy?
Providing users with a prototype of the model upfront and letting them see the outcomes can definitely help in understanding user perceptions and expectations of the solution. We could use rapid application development tools to mimic the model interfaces for users to interact with. A small tip: involve users from diverse backgrounds to get maximum data coverage.
Now that we have validated the need for AI solutions, let’s move to the next stage of our ethical criteria.
AI Being Fair
The common buzzword that we hear while dealing with AI/ML solutions is “Garbage-In, Garbage-Out”. This is one of the most important factors to consider while designing an AI solution. We need to ensure the availability, accessibility, diversity, completeness, relevance, privacy, quality, and quantity of input data for building a robust, dependable, responsible AI model for any use case.
Exploring data from multiple dimensions, coupled with specific domain needs, can eventually ensure a well-performing model. Many other factors should be adjusted during standard exploratory data analysis. Since we are focusing on the social aspects of model development, let's concentrate on the fairness criteria, including bias, privacy, harmfulness, and safety of the model.
How to quantify the fairness of your model is an interesting topic to discuss. What parameters do we use as a metric to compare the fairness of the models developed? The fairness metrics differ based on the type of the problem or use case that we are dealing with. Still, there are a few basic measures we could use to benchmark this exercise:
- First, how does the model perform with respect to its different targeted groups or sections of users? Is statistical (demographic) parity maintained across the diverse groups represented in the data?
- Is the model capable of providing equal opportunity and accuracy within its active groups?
- It should also be verified how the model performs when it is unaware of group membership.
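As a minimal sketch, the first two checks above can be quantified directly from a model's predictions. The group data and thresholds below are purely illustrative:

```python
# Sketch: quantifying demographic parity and equal opportunity from
# binary predictions. The two groups and their data are hypothetical.

def selection_rate(preds):
    """Fraction of positive predictions in a group."""
    return sum(preds) / len(preds)

def true_positive_rate(preds, labels):
    """Recall within a group: P(pred = 1 | label = 1)."""
    positives = [p for p, y in zip(preds, labels) if y == 1]
    return sum(positives) / len(positives)

# Hypothetical predictions and ground-truth labels for two groups.
group_a = {"preds": [1, 1, 0, 1, 0, 1], "labels": [1, 1, 0, 1, 0, 0]}
group_b = {"preds": [1, 0, 0, 0, 0, 1], "labels": [1, 1, 0, 1, 0, 1]}

# Demographic parity: selection rates should be close across groups.
dp_gap = abs(selection_rate(group_a["preds"])
             - selection_rate(group_b["preds"]))

# Equal opportunity: true-positive rates should be close across groups.
eo_gap = abs(true_positive_rate(group_a["preds"], group_a["labels"])
             - true_positive_rate(group_b["preds"], group_b["labels"]))

print(f"Demographic parity gap: {dp_gap:.2f}")
print(f"Equal opportunity gap:  {eo_gap:.2f}")
```

In practice you would agree on acceptable gap thresholds with your team and end users before model development begins, and track these gaps at every pipeline stage.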
These criteria are interrelated, and in practice a model rarely satisfies all of them perfectly at once. A balanced performance against pre-defined criteria is what we can realistically expect, and that balance must come out of discussion with your team, including the end users. Demonstrating a prototype to the end users is highly recommended even before starting your model development pipeline.
Data being the fuel for any model, biased data can severely impact the fairness of your model. You cannot depend on a model whose decisions are biased towards a certain group or section. The results can be devastating for critical use cases. The worst part is that sometimes the designer is not even aware that a bias exists in the data.
Another challenge is that real-world data itself is often biased, due to the flawed state of online data or existing social biases. Bias can also be introduced when you process your data during exploratory data analysis, when you evaluate your model against real unknown data or groups, or when you choose your model deployment parameters and regions. A well-defined process, with explicit steps to identify and address these biases, should be part of your model-building exercise.
Think about a model that generates insights from textual data in English. If we try to use the same model to generate insights for a different language by way of a machine translator, the output can be biased: the translator can induce bias through errors introduced during translation.
A biased model can produce wrong predictions, causing social issues including diversity concerns, denied opportunities, irrelevant promotions or recommendations, wrong approvals, biased credit decisions, and so on, depending on the use case.
Care about Privacy
Another critical factor that decides the fairness of your modelling exercise is how gracefully data privacy is handled throughout the process. Feature vectors that can reveal private information should be stripped out, encrypted, or sanitized before being fed to the model. The storage of this private data, even before the analysis process begins, should ensure its confidentiality and security.
The safety and security of data stored in centralized cloud or on-premises locations should be assured. Data storage, usage, transfer, archival, and purging should be governed by a well-documented process that adheres to the data privacy policies relevant to each region.
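One common pattern for the sanitization step above is to drop direct identifiers and pseudonymize join keys with a one-way hash before records enter the pipeline. A minimal sketch, with illustrative field names (adapt to your own schema and privacy policy):

```python
import hashlib

# Sketch: sanitizing records before they enter the training pipeline.
# The field names below are hypothetical examples.

PII_DROP = {"name", "email", "phone"}   # remove entirely
PII_PSEUDONYMIZE = {"customer_id"}      # replace with a one-way hash

def pseudonymize(value, salt="project-salt"):
    """One-way hash so records stay joinable without exposing raw IDs."""
    return hashlib.sha256((salt + str(value)).encode()).hexdigest()[:16]

def sanitize(record):
    clean = {}
    for field, value in record.items():
        if field in PII_DROP:
            continue  # never let direct identifiers reach the model
        if field in PII_PSEUDONYMIZE:
            clean[field] = pseudonymize(value)
        else:
            clean[field] = value
    return clean

record = {"name": "A. User", "email": "a@example.com",
          "customer_id": "C-1001", "age": 34, "balance": 1200.50}
print(sanitize(record))
```

Note that hashing alone is not full anonymization; quasi-identifiers (age, region, etc.) can still re-identify users in combination, which is why the governing process matters as much as the code.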
Harm and Safety
How does a model harm the user? It can harm through biased or wrong decisions, or by revealing private and personal information, depending on the use case. The impact of those decisions can be destructive. Assess the harm factors and ensure a safety system for model users. Like a product FMEA, there should be a Harm Impact analysis chart that is reviewed, with a safety plan identified, during the design of the model development pipeline.
There should be a process in place to regularly monitor these charts during the different phases of model development and post-deployment. Since the model deals with dynamic and unknown data once in production, there should be built-in alert mechanisms to trigger corrective actions. The system should be challenged consistently on critical cases, based on the severity of the use cases the model handles. Domain expertise plays a major role in ensuring the completeness of the Harm matrix and its mitigation plans.
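A Harm Impact chart in the FMEA style can be as simple as a scored table, ranked by a risk priority number (RPN = severity x likelihood x detectability). The harms and scores below are purely illustrative:

```python
# Sketch of an FMEA-style harm impact chart for a model. The harms
# and 1-10 scores are illustrative; real values come from a review
# with domain experts, and are re-scored after each mitigation.

harms = [
    {"harm": "Biased decision against a group",
     "severity": 9, "likelihood": 4, "detectability": 6},
    {"harm": "Private information leaked via model output",
     "severity": 10, "likelihood": 2, "detectability": 7},
    {"harm": "Irrelevant recommendation shown",
     "severity": 3, "likelihood": 6, "detectability": 2},
]

# Risk priority number: higher means review and mitigate first.
for h in harms:
    h["rpn"] = h["severity"] * h["likelihood"] * h["detectability"]

for h in sorted(harms, key=lambda h: h["rpn"], reverse=True):
    print(f"RPN {h['rpn']:4d}  {h['harm']}")
```

Keeping the chart in a machine-readable form like this makes it easy to re-run the ranking during each monitoring phase and to wire the top items into production alerts.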
Score card for Accountability
Quantifying and communicating model ethics, fairness, and safety parameters is among the most challenging tasks in the development process. As discussed, it is not something we accomplish only at the end of the pipeline; it needs to start at project initiation and continue through design, development, evaluation, and deployment. Once your model is deployed, it must still be monitored for its performance against the pre-defined score card.
A signed-off model score card should be included as part of the acceptance criteria, demonstrating the model's capability and its adherence to the various fairness factors to its users. The score card raises the accountability and responsibility of the model by bringing transparency and clarity to the rules by which the model operates.
What parameters should be part of the Score card?
- General Summary
- Intended Usage (Scope)
- Groups (sections, regions, diversities)
- Metrics (confusion matrix, precision, recall, F1-score, etc.)
- Training Data
- Evaluation Data
- Ethical Considerations
- Limitations & Recommendations
Moreover, the score card should be written in a simple and concise manner so that users can decipher and understand the information clearly.
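Keeping the score card machine-readable makes it easy to version, sign off, and check automatically at deployment time. A minimal sketch, with placeholder values for a hypothetical document-triage model:

```python
# Sketch of a machine-readable model score card covering the
# parameters listed above. All values are illustrative placeholders.

score_card = {
    "general_summary": "Classifier for routing scanned documents",
    "intended_usage": "Internal document triage; not for credit decisions",
    "groups": ["region_a", "region_b"],
    "metrics": {"precision": 0.91, "recall": 0.88, "f1_score": 0.89},
    "training_data": "2023 scanned forms, de-identified",
    "evaluation_data": "Held-out sample balanced across regions",
    "ethical_considerations": "Demographic parity gap below agreed threshold",
    "limitations_and_recommendations": "Not validated for handwritten input",
}

# A simple acceptance gate: refuse sign-off if any section is missing.
REQUIRED = {"general_summary", "intended_usage", "groups", "metrics",
            "training_data", "evaluation_data", "ethical_considerations",
            "limitations_and_recommendations"}
missing = REQUIRED - score_card.keys()
print("Sign-off ready" if not missing else f"Missing sections: {missing}")
```

The same structure can be rendered into the simple, user-facing summary described above, so the signed-off artifact and the published card never drift apart.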
Being responsible towards our society is a quality we expect from everyone. It is governed by our ethical thoughts and actions. Ethics is always difficult to define, but easy to detect. Being socially responsible in all your efforts while deciding, designing & deploying AI solutions should be a mandatory criterion for approving your model.
We hope this sheds some light on the area for new AI/ML designers to ponder. However wonderful the solution we build, if it lacks fairness towards the social system, it is bound to fail. So invest the time and effort to understand and define your fairness criteria, and design them into the basic framework of your model. All the best!
Source: Inspired by my AI/ML customer experiences and model use cases from Google, Kaggle & Microsoft AI solutions