Mastering the Mind of Machines - Part 1
The Race to Align Artificial Intelligence with Human Values
The AGI Alignment debate has entered the mainstream.
More and more people realize that developing powerful AI is a devil's bargain adequately described by Roko Mijic:
“AI is the following devil’s bargain:
It will solve every hard sciences problem ever for us – colonizing the universe, immortality, an end to scarcity
BUT
Unless we solve EVERY soft sciences questions (philosophy, politics, game theory, political economics), we get sent to hell.”
Welcome to the era where contemporary technology will force all humans to become applied philosophers.
This is part one of a two-part article, as there is too much to cover in one go.
tl;dr:
We are in a race to build AGI, advancing at incredible speed.
It’s incredibly likely we will have misaligned AI agents.
None of the CEOs of companies that create AI models take the alignment problem seriously.
We have no idea how to keep control over AGI.
Some of the proposed solutions to alignment are probably already in place (But they might not be enough).
More alignment research is urgently needed.
Here is a quick timeline of what happened recently:
Europol published a report highlighting the criminal use of ChatGPT. They claimed that phishing & online fraud can be created faster, much more authentically, & at a significantly increased scale. LLMs can be abused to mislead potential victims into placing their trust in the hands of criminal actors and may facilitate terrorist activities.
Later, Elon musk, Max Tegmark, S. Russle, Andrew Yang and over 1000 others signed an open letter calling for a six month pause on developing training systems exceeding GPT-4 due to “risking loss of control of our civilization”.
Half of society celebrated it, and the other half ridiculed it.
The ridicule comes a bit as a surprise to me, as even OpenAI’s CEO Sam Altman acknowledges the dangers of AI to humanity.
Then, the pope of AGI doomerism, Eliezer Yudkowsky published an article in Time Magazine going a step further, arguing to shut it all down. Also, being the rationalist he is, he points out that a serious ban on AI development entails possible air strikes on GPU data centers in foreign countries. It’s worth quoting a key passage of the article:
“I believe we are past the point of playing political chess about a six-month moratorium. If there was a plan for Earth to survive, if only we passed a six-month moratorium, I would back that plan. There isn’t any such plan.
“Here’s what would actually need to be done:
“The moratorium on new large training runs needs to be indefinite and worldwide. There can be no exceptions, including for governments or militaries. If the policy starts with the US, then China needs to see that the US is not seeking an advantage but rather trying to prevent a horrifically dangerous technology which can have no true owner and which will kill everyone in the US and in China and on Earth. If I had infinite freedom to write laws, I might carve out a single exception for AIs being trained solely to solve problems in biology and biotechnology, not trained on text from the internet, and not to the level where they start talking or planning; but if that was remotely complicating the issue I would immediately jettison that proposal and say to just shut it all down.
"Shut down all the large GPU clusters (the large computer farms where the most powerful AIs are refined). Shut down all the large training runs. Put a ceiling on how much computing power anyone is allowed to use in training an AI system, and move it downward over the coming years to compensate for more efficient training algorithms. No exceptions for governments and militaries. Make immediate multinational agreements to prevent the prohibited activities from moving elsewhere. Track all GPUs sold. If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.
“Frame nothing as a conflict between national interests, have it clear that anyone talking of arms races is a fool. That we all live or die as one, in this, is not a policy but a fact of nature. Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange, and that allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs.
“That’s the kind of policy change that would cause my partner and I to hold each other, and say to each other that a miracle happened, and now there’s a chance that maybe Nina will live. The sane people hearing about this for the first time and sensibly saying ‘maybe we should not’ deserve to hear, honestly, what it would take to have that happen. And when your policy ask is that large, the only way it goes through is if policymakers realize that if they conduct business as usual, and do what’s politically easy, that means their own kids are going to die too.
“Shut it all down.
“We are not ready. We are not on track to be significantly readier in the foreseeable future. If we go ahead on this everyone will die, including children who did not choose this and did not do anything wrong.
“Shut it down.”
Many criticized Yudkowsky for arguing for violence, assuming a higher moral position, while failing to see that in the real world, there are no simple solutions, only tradeoffs. If you believe that the current path of developing AGI leads to the certain death of all humans, bombing data centers is a logical conclusion.
Since then, the US and other governments began studying possible rules to regulate AI like ChatGPT.
The discourse stalled out a bit between AI Doomers who think, more or less, like Yudkowski and AI Optimists who think the risks are overblown.
Outline
AGI alignment is a broad topic with many nuances. This article attempts to be a primer by deconstructing the issue and shining light on some aspects that are currently underrepresented in the discourse.
I’ll cover the following topics:
Part 1
What is AGI?
How will AGI become so powerful that it will be a threat?
What does it mean for an AI to be aligned with humanity?
Why is the topic of AGI alignment important?
Surprise! We already have AGI.
Part 2
How will we build AGI? Will it be soon?
Why should AI become misaligned?
How might we loose control of AGI?
How might we solve AGI misalignment?
Conclusion: What should we do?
First, let’s be clear what we are talking about.
What is AGI?
AGI stands for Artificial General Intelligence, which refers to a type of artificial intelligence that possesses the ability to understand, learn, and apply knowledge across a wide range of tasks and domains, much like a human.
I like Ian Hogarth’s concise definition of AGI in his article “We Must Slow Down the Race to God-Like AI”
“A super-intelligent computer that learns and develops autonomously, that understands its environment without the need for supervision and that can transform the world around it.”
Developing AGI is the most crucial technological breakthrough, as it should be able to improve itself, or the next generation of AGIs which could lead to an exponentially accelerating intelligence explosion.
How will AGI become so powerful that it will be threat?
Since AGI possesses a human-like understanding and learning capacity, it could potentially analyze its own architecture and algorithms to make improvements to itself. This process of self-improvement would lead to a positive feedback loop, where the AGI system becomes more intelligent with each iteration, ultimately surpassing human-level intelligence.
Accelerated technological growth
If AGIs themselves will do the vast majority of research in AI and related fields and improve their own models and training recursively, will the rate of intelligence improvement speed up or will it be a slow rate of progress?
Yudkowsky argues it will lead to exponential growth:
“The history of hominid evolution to date shows that it has not required exponentially greater amounts of evolutionary optimization to produce substantial real-world gains in cognitive performance - it did not require ten times the evolutionary interval to go from Homo erectus to Homo sapiens as from Australopithecus to Homo erectus. All compound interest returned on discoveries such as the invention of agriculture, or the invention of science, or the invention of computers, has occurred without any ability of humans to reinvest technological dividends to increase their brain sizes, speed up their neurons, or improve the low-level algorithms used by their neural circuitry. Since an AI can reinvest the fruits of its intelligence in larger brains, faster processing speeds, and improved low-level algorithms, we should expect an AI’s growth curves to be sharply above human growth curves.”
This exponential acceleration of technological advancements and innovations through recursive improvements of AGI systems is often referred to as The Singularity. It is called that because, at one point, the rate of technological progress becomes so rapid that it becomes impossible for humans to predict or comprehend the future trajectory of AI or its impact on society.
Can we just not give the AI the objective to improve itself? Yes, but humans have an economic incentive to build AI that is more and more powerful, so it will happen.
What does it mean to be aligned with humanity?
The challenge of aligning artificial intelligence with human values looms large, as we strive to create systems that understand and prioritize our complex and nuanced beliefs. In a world where AI has become increasingly powerful, the consequences of this misalignment could be dire.
“No one knows how to describe human values. When we write laws we do our best, but in the end the only reason they work at all is because their meaning is interpreted by other humans with 99.9% identical genes implementing the same basic emotions and cognitive architecture.” – Arram Sabeti
The core issue lies in our inability to communicate our values effectively to AI systems, which can be thought of as vastly more intelligent alien minds. We currently lack both the engineering tools and the theoretical framework to guarantee that these systems inherently understand and prioritize human values.
Only by achieving the delicate balance between AI power and alignment to human values can we truly harness the potential of artificial intelligence, while safeguarding the core values that make us human. But we don’t know how to do this, not even in theory.
Why is AGI alignment important?
Super-intelligent AI systems pose catastrophic risks. AIs could become autonomous agents with large-scale goals that are misaligned with our own, leading to them gaining control over humanity’s future.
Misaligned AGI poses an immense threat not only to us but to all life in the universe as they might self-replicate and spread around the galaxy. Self-replicating AI has the potential to transform all matter in the universe to maximize for whatever its optimization function is.
Spreading around the galaxy is simpler than it sounds if you consider that time is not a huge issue for artificial lifeforms.
The realization that the future of life in the universe is being shaped by a relatively small number of human AI developers, with little oversight or public input, is quite fascinating.
Current alignment efforts consist of fine-tuning, reinforcement learning with human feedback and pre-prompting the chatbot to show a desired behavior.
These constraints are easy to overcome through clever prompt engineering, called jailbreaks. Such jailbreaks can result in the production of harmful content, including misinformation, offensive language, or the promotion of negative stereotypes.
Without radically different alignment methods, jailbreaks will probably always exist because of the so-called Waluigi effect.
“The Waluigi effect posits that after you train an LLM to satisfy a desirable property P, then it’s easier to elicit the chatbot into satisfying the exact opposite of property P.”
This is because rules normally exist in the contexts in which they are broken. It doesn’t matter how much effort you take to optimize a chatbot to behave. It just needs a simple prompt attack to create the exact antipode of the desired behavior.
We must recognize that our conventional problem-solving approach may not be sufficient. Usually we solve problems by muddling through an issue, trying a few things, making mistakes and correcting course based on what happens. This will not work here as we only have one opportunity to get it right. Failing to address this issue correctly on the first try could result in dire consequences for humanity.
It is crucial to address AI-related concerns before our politics, economy, and daily lives become reliant on AI technologies. If we wait for chaos to unfold, it will be too late to rectify the situation.
Surprise! Misaligned AGI is already here
If we define AGI as a super-intelligence that learns and develops autonomously, that understands its environment without supervision and that transforms the world, we could argue that it is already here.
We can view the global capitalist system as a type of autopoietic superintelligence, with the objective function of converting the world’s resources, human labor, and creativity into capital. This system perpetuates a narrow set of values focused on financialisation, while often disregarding broader metrics essential for long-term well-being, such as biodiversity and ecological boundaries. The consequences of this misaligned superintelligence are widespread, resulting in issues like climate change, species extinction, and growing socio-economic disparities.
The concept of Moloch, as described by Scott Alexander, represents the emergent properties and dynamics arising from this system’s incentives and coordination. The concept is named after a demon that demands child sacrifices, which then creates a vicious cycle of more sacrifices.
“Moloch is the personification of the forces that coerce competing individuals to take actions which, although locally optimal, ultimately lead to situations where everyone is worse off. Moreover, no individual is able to unilaterally break out of the dynamic. The situation is a bad Nash equilibrium. A trap.
One example of a Molochian dynamic is a Red Queen race between scientists who must continually spend more time writing grant applications just to keep up with their peers doing the same. Through unavoidable competition, they have all lost time while not ending up with any more grant money. And any scientist who unilaterally tried to not engage in the competition would soon be replaced by one who still does. If they all promised to cap their grant writing time, everyone would face an incentive to defect.”
Moloch drives many global issues, including environmental degradation, arms races, and political polarization.
Within this overarching Molochian system, public corporations exemplify the complexity and lack of accountability. No single individual or group is truly in charge; rather, a web of interdependencies exists between executives, boards, shareholders, and other stakeholders. This network results in a relentless pursuit of profit maximization, with corporations effectively becoming “obligate sociopaths.
The misalignment of our global superintelligence raises concerns about the alignment of emerging AI technologies which will probably reflect the values of the system that gives birth to it. Many worry that AI could be misaligned with human values, but it is crucial to recognize that our current socio-economic system is itself a misaligned superintelligence. The challenge, then, is not only to ensure the alignment of AI systems but also to reorient the broader superintelligent system that governs our world.
Next time: Getting in the weeds of AGI misalignment
This article provided the groundwork for understanding the problem of AGI misalignment. In part two, I will answer the following questions:
How will we build AGI? Will it be soon?
Why should AI become misaligned?
How might we loose control of AGI?
How might we solve AGI misalignment?
Conclusion: What should we do?
Mastering the Mind of Machines - Part 2
In my last article I answered following questions: What is AGI? How will AGI become so powerful that it will be a threat? What does it mean for an AI to be aligned with humanity? Why is the topic of AGI alignment important? Surprise! We already have AGI.