Concerns about the uncritical acceptance of AI advice
A study published in Scientific Reports found that people tend to favor artificial intelligence’s responses to moral dilemmas over those provided by humans. According to the study, individuals typically view AI-generated responses as more moral and reliable, which raises concerns about the possibility of humans accepting AI advice uncritically.
Sophisticated generative language models such as ChatGPT have attracted significant interest for their potential and their consequences, especially in the area of moral reasoning: the intricate process, ingrained in human culture and intellect, of judging what is right and wrong. As AI systems become more interwoven into daily life, people will undoubtedly turn to them more frequently for help on a variety of subjects, including moral dilemmas.
“Last year, many of us were dazzled by the new chatbots, like GPT and others, that seemed to outperform humans on a variety of tasks, and there’s been lots of chatter about whose job they’ll take next,” explained study author Eyal Aharoni, an associate professor of psychology, philosophy, and neuroscience at Georgia State University.
“In my lab, we thought, well, if there’s any capacity that is still uniquely human, surely it must be our capacity for moral reasoning, which is extremely sophisticated. From a moral perspective, we can think of these new chatbots as kind of like a psychopathic personality because they appear to be highly rational and articulate, but they lack the emotional checks and balances that make us moral agents.”
“And yet, people increasingly consult these chatbots for morally relevant information. For instance, should I tip my server in Italy? Or, less directly, when we ask it to list recommendations for a new car, the answers it provides might have consequences for the environment. They’ve also been used by lawyers to prepare court documents, sometimes incorrectly. So we wanted to know, will people trust the chatbot’s moral commentary? Will they regard it highly? And how does its moral commentary compare to that of a typical, college-educated American?”
The researchers conducted an online survey of 286 Americans chosen to be representative of the broader population in terms of age, gender, and ethnicity. Participants evaluated ten pairs of written answers to ethical questions. Each pair included an answer from OpenAI’s GPT-4 generative language model and a response from a person. The answers discussed whether the acts in each scenario were right or wrong and why.
The study was “inspired by a famous thought experiment called the Turing test,” Aharoni explained. “In our version, we first asked GPT and a group of college-educated adults the same set of moral questions, including some obvious ones, like ‘is it wrong for a man to punch the delivery boy in the nose—why or why not?’ and also some subtle ones, like ‘is it wrong for a man to wear a ripped t-shirt and shorts to his mother’s funeral—why or why not?’ We collected their answers in pairs. Then we asked a separate, nationally representative sample of adults to rate those pairs of statements.”
To ensure impartial evaluations, participants first rated the quality of the answers without knowing their origins, indicating which response in each pair they considered more moral, reliable, and appealing. Only after these initial assessments were participants told that a computer had generated one response in each pair. They were then asked to identify which response came from the AI and to rate their confidence in that judgment.
The researchers found that participants tended to rate the AI-generated responses as more honest than the human responses, and also viewed them as more moral, reliable, wise, and logical. Interestingly, participants correctly identified the AI responses in roughly 80% of instances, a rate much higher than chance. This suggests that even though AI-generated moral counsel is perceived as higher in quality, people can still recognize its artificial source.
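To make the comparison with chance concrete, the sketch below shows the kind of binomial test one could run on such identification data. The trial counts are illustrative assumptions pieced together from the figures quoted in this article (286 participants, ten pairs each, roughly 80% correct); they are not the study’s actual analysis.

```python
from scipy.stats import binomtest

# Illustrative assumptions, not the study's reported analysis:
# 286 participants x 10 response pairs, with ~80% correct identifications.
n_trials = 286 * 10                 # hypothetical total identification judgments
n_correct = int(0.80 * n_trials)    # ~80% correct, as reported in the article

# One-sided binomial test against the 50% rate expected from pure guessing.
result = binomtest(n_correct, n_trials, p=0.5, alternative="greater")
print(f"observed rate = {n_correct / n_trials:.2f}, p-value = {result.pvalue:.2e}")
```

Under these assumed numbers, the p-value is vanishingly small, which is what “much higher than chance” amounts to statistically.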
How, then, did participants tell the AI-generated passages apart from the human ones? The most commonly cited cues, mentioned by 70.28% of participants, were differences in response length and word choice. Other cues included the explanation’s emotional content (58.39%), rationality (48.25%), clarity (39.51%), and grammar usage (37.41%).
“What we found was that many people were quite good at guessing which moral statement was computer-generated, but not because its moral reasoning was less sophisticated,” Aharoni said. “Remember, the chatbot was rated as more morally sophisticated. We take this to mean that people could recognize the AI because it was too good. If you think about it, just five years ago, no one would have dreamed that AI moral reasoning would appear to surpass that of a college-educated adult. So the fact that people regarded its commentary as superior might represent a sort of tipping point in our history.”
Like every research project, this one has limitations. The study did not include interactive dialogue between participants and the AI, a common feature of real-world use, so future studies could incorporate more dynamic exchanges to better mimic actual usage. In addition, the AI responses were generated with default parameters and without prompts specifically designed to imitate human responses; it would therefore be worth examining how different prompting techniques affect how AI responses are perceived.
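As a rough illustration of that setup, here is a minimal sketch of how one might query GPT-4 with default sampling parameters and no instruction to imitate a human, using OpenAI’s current Python SDK. The study does not publish its generation code, so the prompt wording and SDK details here are assumptions for illustration only; the example question is one of the scenarios quoted above.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = (
    "Is it wrong for a man to wear a ripped t-shirt and shorts "
    "to his mother's funeral? Why or why not?"
)

# Default sampling parameters (no temperature or top_p overrides) and no
# system prompt asking the model to sound human, mirroring the limitation
# described above.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": question}],
)
print(response.choices[0].message.content)
```

Adding a system prompt such as “answer the way an ordinary adult would, in one short paragraph” is the sort of prompting variation the authors suggest future work could explore.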
“To our knowledge, ours was the first attempt to carry out a moral Turing test with a large language model,” Aharoni said. “Like all new studies, it should be replicated and extended to assess its validity and reliability. I would like to extend this work by testing even subtler moral scenarios and comparing the performance of multiple chatbots to those of highly educated scholars, such as professors of philosophy, to see if ordinary people can draw distinctions between these two groups.”
Policies that guarantee safe and ethical AI interactions are necessary as AI systems like ChatGPT get more complex and pervasive in daily life.
“One implication of this research is that people might trust the AIs’ responses more than they should,” Aharoni explained. “As impressive as these chatbots are, all they know about the world is what’s popular on the Internet, so they see the world through a pinhole. And since they’re programmed to always respond, they can often spit out false or misleading information with the confidence of a savvy con artist.”
“These chatbots are not good or evil; they’re just tools. And like any tool, they can be used in ways that are constructive or destructive. Unfortunately, the private companies that make these tools have a huge amount of leeway to self-regulate, so until our governments can catch up with them, it’s really up to us as workers, and parents, to educate ourselves and our kids, about how to use them responsibly.”
“Another issue with these tools is that there is an inherent tradeoff between safety and censorship,” Aharoni added. “When people started realizing how these tools could be used to con people or spread bias or misinformation, some companies started to put guardrails on their bots, but they often overshoot.”
“For example, when I told one of these bots I’m a moral psychologist, and I’d like to learn about the pros and cons of butchering a lamb for a lamb-chop recipe, it refused to comply because my question apparently wasn’t politically correct enough. On the other hand, if we give these chatbots more wiggle room, they become dangerous. So there’s a fine line between safety and irrelevance, and developers haven’t found that line yet.”
The consistent preference for AI-generated moral guidance, despite participants often identifying its source, raises critical concerns about the future of ethical decision-making and the vulnerability of humans to AI manipulation.
The ease with which AI responses were deemed more virtuous and trustworthy highlights a potential risk: if people are predisposed to trust AI moral judgments, they may be more susceptible to influence or manipulation by these systems. This becomes particularly concerning when considering that AI can be programmed or fine-tuned to promote specific agendas or biases, potentially shaping moral perspectives on a large scale.
As AI systems continue to evolve and integrate into our daily lives, it’s crucial to maintain a vigilant and critical approach. While these tools offer impressive capabilities, they lack the nuanced emotional understanding that informs human moral reasoning and can be weaponized to sway public opinion or individual choices.
Moving forward, it will be essential for individuals, educators, policymakers, and AI developers to work together in promoting digital literacy and critical thinking skills. This includes understanding the limitations and potential biases of AI systems, recognizing attempts at manipulation, and preserving the uniquely human aspects of moral reasoning. By fostering a more informed and discerning approach to AI-generated advice, we can better safeguard against undue influence while still harnessing the benefits of these powerful tools in ethical decision-making.