Open question: how hard is the alignment problem?


The path to the future that seems worst is Misaligned AI, in which AI systems end up with non-human-compatible objectives of their own and seek to fill the galaxy according to those objectives. How seriously should we take this risk – how hard will it be to avoid this outcome? How hard will it be to solve the “alignment problem,” which essentially means having the technical ability to build systems that won’t do this?

  • Some people believe that the alignment problem will be formidable; that our only hope of solving it comes in a world where we have enormous amounts of time and aren’t in a race to deploy advanced AI; and that avoiding the “Misaligned AI” outcome should be by far the dominant consideration for the most important century. These people tend to heavily favor the “caution” interventions described above: they believe that rushing toward AI development raises our already-substantial risk of the worst possible outcome.
  • Some people believe it will be easy, and/or that the whole idea of “misaligned AI” is misguided, silly, or even incoherent – that it amounts to planning for an overly specific future event. These people often are more interested in the “competition” interventions described above: they believe that advanced AI will probably be used effectively by whatever country (or in some cases smaller coalition or company) develops it first, and so the question is who will develop it first.
  • And many people are somewhere in between.

The spread here is extreme. For example, see these results from an informal “two-question survey [sent] to ~117 people working on long-term AI risk, asking about the level of existential risk from ‘humanity not doing enough technical AI safety research’ and from ‘AI systems not doing/optimizing what the people deploying them wanted/intended.’” (As the scatterplot accompanying the survey shows, people gave similar answers to the two questions.)

We have respondents who think there’s a <5% chance that alignment issues will drastically reduce the goodness of the future; respondents who think there’s a >95% chance; and just about everything in between. My sense is that this is a fair representation of the situation: even among the few people who have spent the most time thinking about these matters, there is practically no consensus or convergence on how hard the alignment problem will be.
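To make the kind of spread described above more concrete, here is a minimal sketch (in Python, using made-up numbers rather than the actual survey data) of how paired answers to the two questions might look on a scatterplot: answers spread from near 0% to near 100%, clustered around the diagonal because each respondent gave similar answers to both questions.

```python
# Toy scatterplot of paired survey responses, in the spirit of the
# two-question survey described above. All numbers are made up for
# illustration only; they are NOT the actual survey data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 40  # hypothetical number of respondents

# Hypothetical answers to question 1 ("risk from not enough technical
# AI safety research"), spread widely between roughly 1% and 99%.
q1 = rng.uniform(0.01, 0.99, n)

# Hypothetical answers to question 2 ("risk from AI systems not doing
# what the people deploying them intended"), close to each person's
# answer to question 1 plus a little noise.
q2 = np.clip(q1 + rng.normal(0, 0.08, n), 0, 1)

plt.scatter(q1 * 100, q2 * 100)
plt.plot([0, 100], [0, 100], linestyle="--", color="gray")  # y = x reference line
plt.xlabel("Estimated risk from insufficient safety research (%)")
plt.ylabel("Estimated risk from misaligned deployed AI (%)")
plt.title("Hypothetical spread of paired survey answers")
plt.show()
```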

I hope that over time, the field of people doing research on AI alignment will grow, and as both AI and AI alignment research advance, we will gain clarity on the difficulty of the AI alignment problem. This, in turn, could give more clarity on prioritizing “caution” vs. “competition.”

Other open questions

Even if we had clarity on the difficulty of the alignment problem, a lot of thorny questions would remain.

Should we be expecting transformative AI within the next 10-20 years, or much later? Will the leading AI systems go from very limited to very capable quickly (“hard takeoff”) or gradually (“slow takeoff”)? Should we hope that government projects play a major role in AI development, or that transformative AI primarily emerges from the private sector? Are some governments more likely than others to work toward transformative AI being used carefully, inclusively and humanely? What should we hope a government (or company) literally does if it gains the ability to dramatically accelerate scientific and technological advancement via AI?

With these questions and others in mind, it’s often very hard to look at some action – like starting a new AI lab, advocating for more caution and safeguards in today’s AI development, etc. – and say whether it raises the likelihood of good long-run outcomes.

Robustly helpful actions

Despite this state of uncertainty, here are a few things that do seem clearly valuable to do today:

Technical research on the alignment problem. Some researchers work on building AI systems that can get “better results” (winning more board games, classifying more images correctly, etc.). But a smaller set of researchers works on the alignment problem itself: techniques for building capable AI systems that reliably do what the people deploying them intend, rather than pursuing objectives of their own.

This sort of work could reduce the risk of the Misaligned AI outcome, and/or lead to more clarity on just how big a threat it is. Some of this work takes place in academia, some at AI labs, and some at specialized organizations.

Pursuit of strategic clarity: doing research that could address other crucial questions (such as those listed above), to help clarify what sorts of immediate actions seem most useful.

Helping governments and societies become, well, nicer. Helping Country X get ahead of others on AI development could make things better or worse, for reasons given above. But it seems robustly good to work toward a Country X with better, more inclusive values, and a government whose key decision-makers are more likely to make thoughtful, good-values-driven decisions.

Spreading ideas and building communities. Today, it seems to me that the world is extremely short on people who share certain basic expectations and concerns, such as:

  • Believing that AI research could lead to rapid, radical changes of the extreme kind laid out here (well beyond things like increasing unemployment).
  • Believing that the alignment problem (discussed above) is at least plausibly a real concern, and taking the “caution” frame seriously.
  • Looking at the whole situation through a lens of “Let’s get the best outcome possible for the whole world over the long future,” as opposed to more common lenses such as “Let’s try to make money” or “Let’s try to ensure that my home country leads the world in AI research.”

I think it’s very valuable for there to be more people with this basic lens, particularly working for AI labs and governments. If and when we have more strategic clarity about what actions could maximize the odds of the “most important century” going well, I expect such people to be relatively well-positioned to be helpful.

A number of organizations and people have worked to expose people to the lens above, and help them meet others who share it. I think a good amount of progress (in terms of growing communities) has come from this.

Donating? One can donate today to places like this. But I need to admit that very broadly speaking, there’s no easy translation right now between “money” and “improving the odds that the most important century goes well.” It’s not the case that if one simply sent, say, $1 trillion to the right place, we could all breathe easy about challenges like the alignment problem and risks of digital dystopias.

It seems to me that we – as a species – are terribly short on people who are paying attention to the most important challenges ahead of us, and that we haven’t yet done the work needed to gain strategic clarity about what tangible actions to take. We can’t solve this problem by throwing money at it. First, we need to take it more seriously and understand it better.
