Sunday, January 21, 2024

Questioning AI Alignment

I strongly suspect that AI alignment can’t mean what a lot of us think it means, because the popular framing of the problem includes category errors. For example, here is the first sentence of OpenAI’s blog on their approach:

Our alignment research aims to make artificial general intelligence (AGI) aligned with human values and follow human intent.

This evokes, to me, the controversial idea of evolutionary group selection: that our evolutionary impulses direct us to improve outcomes for the species rather than for ourselves. Generally speaking, forces that act on individuals are much easier to characterize, and have much greater predictive power, than forces that act on groups. When you encounter a framing like this, where a whole set of individuals is treated as a single category, think explicitly about the elements of that set and see what happens. Let’s look again at that opening sentence, this time highlighting the places where sets are treated as indivisible:

Our [1] alignment research aims to make artificial general intelligence (AGI) [2] aligned with human [3] values and follow human [3] intent.

1: “Our” alignment research. Here, “our” presumably means the portion of OpenAI’s research division performing this class of research. In this case, using a set instead of individuals is very likely justified: a post like this is an official publication of the company, and, occasional exceptions aside, the individuals working for the company are expected to align with its stated goals.

2: “AGI”. This is implicit, but of course OpenAI can only be talking about AGI systems that OpenAI can influence. I’ll touch on this briefly below.

3: “Human” values and “human” intents. This is the big one. Is there such a thing as universal human “values” and “intents”? Maybe, but if you think of the values you hold closest to your heart, closest to your sense of personal integrity, you can probably also think of humans who don’t share them. The question absolutely needs to be asked: which humans’ values and intents are being considered? What coherent subset of humanity can we plug in here that would allow readers to be comfortable with the result?

There’s an idea I first encountered in George Orwell’s 1945 essay, You and the Atom Bomb:

It is a commonplace that the history of civilisation is largely the history of weapons. … [T]hough I have no doubt exceptions can be brought forward, I think the following rule would be found generally true: that ages in which the dominant weapon is expensive or difficult to make will tend to be ages of despotism, whereas when the dominant weapon is cheap and simple, the common people have a chance.

So, perhaps the development of the atom bomb didn’t augur a new age of despotism; I don’t want to argue the point. It’s certainly the case that high capital requirements and difficult construction constitute serious barriers to entry, and thus tend to benefit incumbents and discourage disruptors. AI research has seen tremendous disruption, largely because its barrier to entry has, until recently, been low. But now, at the threshold of AGI, training a state-of-the-art model costs hundreds of millions of dollars, a sum out of reach of all but a handful of entities.

Are these, then, tools to concentrate power? Is there a research program to ensure that Mark Zuckerberg will act in alignment with human interests and values? That anyone would? On the other hand, these costs decline, probably according to Wright’s Law; and perhaps more importantly, the cost to run these models once they have been trained is orders of magnitude less than the cost to train them in the first place. Do the common people have a chance?
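For concreteness, here is a minimal sketch of the Wright’s Law relationship I’m gesturing at. The function name, the 20% learning rate, and the dollar figures are purely illustrative assumptions on my part, not estimates of any real model’s training cost.

import math

def wrights_law_cost(first_unit_cost, cumulative_units, learning_rate):
    # Wright's Law: unit cost falls by a fixed fraction (the learning rate)
    # each time cumulative production doubles: cost(n) = cost(1) * n ** (-b).
    b = -math.log2(1.0 - learning_rate)  # exponent implied by the per-doubling decline
    return first_unit_cost * cumulative_units ** (-b)

# Hypothetical numbers: if the first frontier-scale training run costs $100M
# and costs fall 20% per doubling of cumulative runs, then the 16th run
# (four doublings) costs roughly $100M * 0.8**4, or about $41M.
print(wrights_law_cost(100e6, 16, 0.20))

Whether training costs actually follow this curve is an open question; the point is only that the same learning-curve logic that made training so expensive could, over time, make it commonplace.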

I’ll pull another quote from Orwell’s essay:

Some months ago, when the bomb was still only a rumour, there was a widespread belief that splitting the atom was merely a problem for the physicists, and that when they had solved it a new and devastating weapon would be within reach of almost everybody. (At any moment, so the rumour went, some lonely lunatic in a laboratory might blow civilisation to smithereens, as easily as touching off a firework.)

If AGI can be as powerful as it appears to be, would we want to live in a world where it’s controlled by a few, who set limits on how everyone else is allowed to access the fruits of these inventions? Or is it better to live in a world where that much intellectual power is available to essentially anyone? If AGI remains extremely expensive to build and operate, then we’ll probably end up with the former. If, over time, we make it cheap to develop and operate AGIs, then the consequences could be both wondrous and catastrophic. (There’s more to say here, but I’m trying to keep this post focused.)

Coming back to a point from before: OpenAI’s research program can only affect the AI models that OpenAI itself can influence. Their famous call for government regulation of their own business can be seen as an attempt to extend that influence to other AI developers, which could have the consequence of inhibiting others from doing AI research, further entrenching OpenAI’s position in the market.

I am reasonably confident that the AI genie is out of the bottle, and that AGI is somewhere between “already here” and “inevitable in the near term”. Those who are explicitly researching AI alignment presumably have, or are working towards, their preferred operational definitions. The rest of us should be critical and skeptical of definitions coming from actors with a stake in the outcome.