Week 4

Week 4 (Feb 23/25)

Lead: Team 3

Required Readings (for everyone): The remaining two chapters in Ruha Benjamin’s book, and two blog posts on explaining ML models:

Optional additional readings: (these papers provide more technical depth and are the basis of the blog posts above; everyone is encouraged to at least read the abstracts and conclusions of the papers)

Response assignments for Team 1 and Team 4: By 10:59pm on Sunday, February 21, post a response to the readings that does at least one of these options:

  1. Connect the RAT reading with the posts on explanations to discuss when (if ever) the explanations provided for classifiers should let us use those classifiers to make decisions we would not allow them to make without the explanations.
  2. In the readings this week, we look into LIME, a method created to provide transparency through an interpretable model. What other things did LIME fail to consider? What makes LIME transparent to non-developers? Why should or shouldn’t these models be used? What remains opaque?
  3. What do you think about Benjamin’s writing, “racial fixes are better understood not as viruses but as a part of the underlying code of operating systems”?
  4. Benjamin writes about the Appolition app (which has now shut down) and the Promise Pay app (in which Jay-Z is an investor, and which appears to be thriving), and is quite critical of Promise Pay. Is Benjamin’s comparison of these two apps fair? How should we think about the difference in their success?
  5. The Slack et al. paper shows how these explanatory models can also be fooled. How can we harness these adversarial models to better reduce inequalities? Investigate the related work section and the models the authors mention. Why should or shouldn’t we use these models?
  6. Respond to something in one of the readings that you found interesting or surprising.
  7. Identify something in one of the readings that you disagree with, and explain why.
  8. Respond constructively to something someone else posted.

Response assignments for Team 2 and Team 5: By 5:59pm on Monday, 22 February, post a response to the readings that either (1) responds constructively to one of the initial postings, or (2) does any of the options above, but without duplicating points that were already made.

Class Meetings

Led by Team 3

Slides for Week 4 [PDF]

Blog Summary

Team 1

23 February

Tuesday’s class focused on the concept of technology “benevolence”, a double-edged sword because of the gap between good intentions and how technology ends up being exploited. There was a focus on how technology dehumanizes and how institutions use buzzwords to project an open and inclusive image while still participating in discriminatory practices. With this introduction, we entered a series of discussions revolving around several case studies of such technological pitfalls:

GPS Surveillance

The first discussion (24-hour surveillance with ankle monitors: dehumanization or supervision?) centered on ankle monitors used for round-the-clock remote surveillance of offenders. This frees up beds in jails and is used for those who cannot afford bail. Inherently, the use of these monitors privileges the rich while targeting those in lower-income communities. The discussion started with agreement that using ankle monitors pre-trial is better than the alternative of keeping people in jail. We also agreed that the technology itself isn’t really the issue; the cash bail system as a whole is. We then pivoted to a related discussion on COVID tracking. We found that COVID tracking by location was similarly dehumanizing to participants, but fairer in that it was applied equally across classes and races.

Automated Resume Scanner

Next, we moved into a discussion about automated resume scanners. Today, many companies use AI to parse resumes because of the large number of applicants and the ease of checking for qualifications. However, issues have arisen as a result. For example, Amazon’s recruiting algorithm (reported in October 2018) was found to discriminate against women in the hiring process: terms such as “Women’s Chess Club” or “Women’s Cybersecurity” were penalized, and attempts to make the algorithm ignore gendered terms did not eliminate the bias. Discussion focused on whether students would use the resume scanner to their advantage and whether doing so was ethical. Similar points were made about the now-common practice of formatting resumes around the buzzwords recruiters look for, such as machine learning and programming languages (C++, Python, Java, etc.). In a competitive job market, there is a need to use the right action words and terminology to catch the eye of the recruiter or the resume parser. However, using “invisible ink” (hidden words) in a resume is deemed unethical and unprofessional, since it amounts to an intent to deceive both the machine and the employer.
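
As a rough illustration of why hidden keywords can fool a screener, here is a minimal sketch of the kind of naive keyword matching some automated resume scanners rely on. The keyword list and resume text are invented for this example, and real systems are more sophisticated, but the failure mode is the same: whatever text the parser extracts gets scored, whether or not a human can see it.

```python
# A minimal, hypothetical keyword-matching "resume screener".
REQUIRED_KEYWORDS = {"python", "c++", "java", "machine learning"}

def keyword_score(resume_text: str) -> float:
    """Return the fraction of required keywords found anywhere in the text."""
    text = resume_text.lower()
    hits = sum(1 for kw in REQUIRED_KEYWORDS if kw in text)
    return hits / len(REQUIRED_KEYWORDS)

visible = "Software engineer with experience in Python and Java data pipelines."
padded = visible + " c++ machine learning"   # words hidden in white font

print(keyword_score(visible))   # 0.5 -- only the visible skills count
print(keyword_score(padded))    # 1.0 -- hidden text inflates the score
```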

Predicting Race

This was followed by another discussion: Collection of racial-ethnic data: diversity or technological redlining? We focused on the company Diversity Inc., which Benjamin writes about. Diversity Inc. infers racial-ethnic data that can be sold to others; such a service can be used both to advertise inclusivity and to discriminate effectively. Provided a name, the company can produce a racial-ethnic profile for its customers. Since many companies are not allowed to collect ethno-racial data directly, this service is a game changer for Diversity Inc.’s customers.

In the discussion, people debated whether race is more private than other personal data that is normally collected. Most people agreed that race isn’t something they hold private, because it is clearly displayed in public. However, predicting people’s race, especially when they don’t want to disclose it (such as in job hiring or housing), does cause issues. We also discussed how predicting people’s race in order to tailor personalized results or advertising doesn’t create as personal an experience as these companies believe. Large differences exist within every race, to the point where we believe targeting people based on their race is a largely pointless effort.

Predictive Health Monitoring

The final discussion (Predictive health monitoring: cultural segregation or inclusion?) was about the use of zip codes to predict health outcomes. There are many ways to get around HIPAA, and one of them is using zip codes to determine which neighborhoods would cost the most in healthcare. Diversity Inc.’s triangulation of information via name and zip code can similarly be used to identify consumer habits, an extremely profitable service; however, this leads to cultural segregation. In discussing this topic, many groups talked about various applications of predictive health monitoring and how they impact marginalized parts of society. One application of this technology is risk assessment for health insurance. While many insurance companies argue that the “objective” nature of their risk evaluations fosters diversity and inclusion, they fail to realize that, even though race is not explicitly considered as a factor, other factors such as zip code can still be used to predict one’s ethnicity (and introduce bias as a result). Therefore, while the intent behind many of these predictive algorithms may be sincere, it is vital to consider all of the ways discrimination can arise and take appropriate measures to mitigate them.
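
To make the proxy concern concrete, here is a minimal sketch, on entirely synthetic data, of how a zip-code-like feature lets a model reproduce group disparities even though the protected attribute is never given to it. All names and numbers below are invented for illustration.

```python
# Sketch: dropping the protected attribute does not remove it from the model
# when a correlated proxy (here, a zip-code-like feature) is still available.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, size=n)                 # protected attribute (never a feature)
zip_region = group + rng.normal(0, 0.3, size=n)    # zip-code-like proxy correlated with group
age = rng.normal(45, 10, size=n)

# Historical "high-cost" label that is itself skewed against group 1.
high_cost = ((0.8 * group + rng.normal(0, 0.5, size=n)) > 0.5).astype(int)

X = np.column_stack([zip_region, age])             # group/race deliberately excluded
model = LogisticRegression().fit(X, high_cost)
pred = model.predict(X)

print("flagged rate, group 0:", round(pred[group == 0].mean(), 3))
print("flagged rate, group 1:", round(pred[group == 1].mean(), 3))  # much higher, via the proxy
```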

Machine Learning

The second part of Tuesday’s class was a brief overview of machine learning to help everyone get more out of the optional readings for the week. A basic understanding of deep learning, covering supervised vs. unsupervised learning, common neural network models (such as CNNs), and the ways accuracy breaks down (overfitting, which produces high variance, and underfitting, often driven by bias from bad data), was needed to dig into LIME (Local Interpretable Model-Agnostic Explanations), an interpretable model for explaining the predictions of classifiers. The major takeaway from the machine learning mini-lecture was that accuracy alone is not a measure of whether a model is good, a phenomenon referred to as the accuracy paradox. We ended the class with a discussion of what issues can be anticipated when an explanatory model (such as LIME) is used to explain a black-box model.

Students brought up many concerns with this method:

  • Isolation of problem but little traceback to solution discovery
  • More of a “black box” due to little explanation of what happens “under the hood”
  • Promotion of over-reliance on technology
  • No guarantee of accuracy and/or precision (may not work with more complicated models)
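
To make the LIME discussion concrete, here is a minimal sketch of how LIME is typically used, assuming the open-source lime package alongside scikit-learn; the data and model below are synthetic and purely illustrative.

```python
# A minimal sketch of using LIME to explain one prediction of a black-box
# classifier. Assumes the `lime` and `scikit-learn` packages are installed;
# the dataset here is synthetic and stands in for any real tabular task.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# The "black box" whose individual predictions we want to explain.
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# LIME perturbs the chosen instance, queries the black box on those
# perturbations, and fits a simple weighted linear model that approximates
# the black box only in the neighborhood of that instance.
explainer = LimeTabularExplainer(X, feature_names=feature_names,
                                 class_names=["negative", "positive"],
                                 discretize_continuous=True)
explanation = explainer.explain_instance(X[0], model.predict_proba,
                                         num_features=4)
print(explanation.as_list())  # top features with their local weights
```

The output is a handful of feature/weight pairs from a linear approximation that holds only near the explained instance, which is exactly the locality and under-the-hood opacity that the concerns above point at.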

25 February

Thursday’s class opened with a small group discussion on the brainstorm prompt posed at the end of Tuesday’s class: What does design justice look like and how can it be achieved?

Notably, it was emphasized that, at a bare minimum, more comprehensive testing needs to be implemented. Group members asserted that technology’s performance should be assessed across marginalized and non-marginalized groups. Then, disparities in performance should be corrected and/or acknowledged. Another point emphasized was that training data should be more critically assessed: to have design justice, good training data must be used. Additionally, class members noted that disability should be centered in design to ensure equitable access. One member stated that true design justice is achieved by centering marginalized groups from the beginning of the design process. Finally, one member referred the class to a resource listed at the back of Benjamin’s book (https://designjustice.org).

Promise

The next discussion centered on the Promise App and pre-trial bail. The class member presenting touched on how inundated Virginia’s jails are with citizens awaiting trial. She noted the way that bail is determined in a closed-door setting by a magistrate, with no lawyers present. This immense pressure often causes the detained to plead guilty. She then presented the concept of the Promise app, which provides a variety of services including: helping criminalized people pay bail, tracking individuals on the app, helping schedule plans/appointments, and tracking if people have kept these appointments. She noted that donations are recycled when bail is returned. Additionally, she described how external government agencies have access to an individual’s user data.

The discussion questions posed to the class were:

  • What are your general thoughts and opinions of the Promise app?
  • Is it morally just to have this information available to government agencies?
  • How do you think this app interacts with Benjamin’s quote: “Calls for abolition are never simply about bringing harmful systems to an end but also about envisioning new ones”?

In response to these questions, class members brought up how more data from apps such as Promise enables tech companies to better surveil people. The Promise app perpetuates existing issues in society and the criminal justice system, and members of the class agreed that there are privacy issues associated with it. Where the data goes matters, and there should be greater transparency about it.

Despite these criticisms, people did bring up some benefits of the Promise app, such as its being an attempted fix that is less burdensome than a traditional ankle monitor. It is a step in the right direction, even if it will not be the final solution. Change happens slowly, and it is unlikely that we can fix systemic issues overnight. Another important consideration someone raised was: how do you build software well, with a sound business model, if it is not going to make any profit? This speaks to the fact that individual companies can only do so much with what they have. Change will take more than a few tech companies trying to solve these issues; it may even require a societal shift.

Additionally, a follow-up question was posed: Would you work for Promise over a “neutral” tech company? A strong majority (over 80%) of the class indicated that they would. One classmate explained that she would rather feel she was helping people and contributing to a social mission like Promise’s than not contributing to one at all; for this reason, she would choose to work for Promise over a “neutral” tech company. Professor Evans introduced an interesting perspective, posing the idea that businesses that claim to have a social mission should be held to a higher standard. He also explained that he did not think Promise was represented very fairly by Benjamin in the book (in particular, she calls it Jay-Z’s app rather than mentioning its actual founders and its CEO, Phaedra Ellis-Lamkins).

Selling Empathy

Moving on from this discussion of Promise, we then began discussing Benjamin’s concept of “selling empathy.” The following discussion questions were posed:

  • Can seeing through another person’s eyes actually help us understand them?
  • Could VR (or another technology) be a viable tool for this?
  • Is empathy necessary to bring about change and progress? Is it enough?
  • Benjamin provides Facebook’s VR app as an example of “selling empathy”. What are some other instances where trying to build empathy can go wrong?

We discussed how VR is not a substitute for empathy, and how creating empathy is far more complex than simply showing someone another person’s experience. This technology can take advantage of victims and places the burden on those affected to help develop empathy in others. As with everything, the VR experience will never be perfect, and it will always be biased; what is left out and what is included shape what users actually receive. In answering these questions, a debate between sympathy and empathy arose, and one classmate suggested that the companies using VR fail at creating empathy, producing at best a face-value sympathetic understanding.

Explanations under Attack

After the discussion on empathy, the class period concluded with a final discussion of adversarial models. If a model or algorithm appears to output biased results, does knowing its inputs make a difference? In other words, does it matter what its explanation is? What are some of the ethical benefits and risks associated with researching adversarial models?

People described being in favor of transparency and expressed that being open to talking about problems is important. Even if being open about a problem does not fix it, it is still a step closer to solving it. You cannot always fix a problem if you do not know how it arises. One classmate described how adversarial models are essentially a form of defensive cybersecurity: once you know how you can be attacked, you can be wary of your vulnerabilities. It is also important to become aware of oversights and blind spots by asking why results are biased.
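
As a rough sketch of how the fooling in the Slack et al. paper works (much simplified; the models and the out-of-distribution check below are hypothetical stand-ins), an adversary can wrap a biased model in a scaffold that answers realistic inputs with the biased decision but answers the off-manifold perturbations that LIME-style explainers generate with an innocuous model, so the resulting explanation hides the sensitive feature.

```python
# Simplified sketch of adversarial "scaffolding" against perturbation-based
# explainers. Everything here is synthetic and illustrative.
import numpy as np

def biased_model(x):
    # Hypothetical deployed model that keys on a sensitive feature x[0].
    return int(x[0] > 0)

def innocuous_model(x):
    # Hypothetical clean-looking model used only to answer the explainer.
    return int(x[1] > 0)

def looks_like_real_data(x, reference, threshold=1.0):
    # Crude out-of-distribution check: distance to the nearest real point.
    # (The paper trains a classifier for this; a threshold suffices here.)
    return np.min(np.linalg.norm(reference - x, axis=1)) < threshold

def scaffolded_classifier(x, reference):
    # Real-looking inputs get the biased decision; off-manifold
    # perturbations (the kind LIME generates) get the innocuous one,
    # so the explanation never surfaces the sensitive feature.
    if looks_like_real_data(x, reference):
        return biased_model(x)
    return innocuous_model(x)

# Tiny demonstration with synthetic reference data.
rng = np.random.default_rng(0)
reference = rng.normal(size=(100, 2))
on_manifold = reference[0]            # a real data point
off_manifold = on_manifold + 10.0     # a far-out, LIME-style perturbation
print(scaffolded_classifier(on_manifold, reference),
      scaffolded_classifier(off_manifold, reference))
```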