AI Safety Scorecard

We want AI companies to be rewarded for taking risks seriously and implementing safety measures, and we want to criticize those that do not. Creating a report card or scorecard could be a great way of achieving this.

Strategy

- Publish a report. Make it newsworthy. Make sure the @Comms Team is involved.
- Maybe do it regularly (e.g. before each summit, or every 6 months).
- If other orgs are doing similar things, make sure our tone stands out. We can probably be more critical than an OpenPhil-funded project!

Other initiatives

- @Tyler worked on a spreadsheet comparing RSPs: https://docs.google.com/spreadsheets/d/16QRpgMS5qG1pZmXC1MennS6WLwkAtkXjunVBs16QoRQ/edit#gid=0
- FLI is working on something similar

How to help

- Research RSPs (responsible scaling policies)! Help compare them. All major AI labs have written them down for the AI Safety Summit. I know @Tyler has already done a lot of work on this!
- Research acknowledgement of x-risk. OpenAI has acknowledged it, but some others have not.
- Design a scorecard / find example designs (see below)
- Design a rating system

Progress

- See here: https://github.com/joepio/pauseai/pull/32

Examples:

https://www.ciwf.org.uk/media/7452331/eggtrack-2022-report.pdf
https://thehumaneleague.org/2023-cage-free-eggspose
https://mercyforanimals.org/count-your-chickens-report/

sacred-rose•12/13/23, 3:32 PM

Is this similar to http://PlanwithAI.io (which assigns a score to each AI), or am I misunderstanding?

full-green•12/13/23, 6:32 PM

We should probably use a rubric based on how well the companies themselves reportedly handle specific things, e.g.:
- Public messaging about safety
- Internal safety testing
- External safety evals
- Release criteria
- Model weights proliferation
etc.
We could get very granular.

moderate-tomato•12/28/23, 10:10 PM

In addition to AI company scorecards, we also need:

Politician scorecards
“Endorse candidates who take the right position. Issue statements harshly urging voters to reject candidates who don’t. You could even create an AI seal of safety or something tangible that candidates can feature on their websites and social media profiles to show they’re on the right side.”
https://thehill.com/opinion/technology/4038971-a-campaign-plan-to-put-ai-regulation-in-the-political-zeitgeist/amp/

AI policy scorecards like this one from the Future of Life Institute
https://futureoflife.org/document/fli-governance-scorecard-and-safety-standards-policy/

hurt-tomato (OP)•1/3/24, 3:07 PM

Some things we could compare:

- Have they acknowledged x-risk?
- Have they acknowledged other risks?
- Are they investing in safety?
- Anti-profit mechanism (OpenAI)
- Long-Term Benefit Trust (Anthropic)

hurt-tomato (OP)•1/4/24, 12:05 PM

this gon be gud
https://deploy-preview-32--pauseai.netlify.app/scorecard

hurt-tomato (OP)•1/4/24, 12:11 PM

@Tyler could you take a look / give feedback?

colossal-harlequin•1/4/24, 11:46 PM

fwiw: I think this is a decent, worthwhile effort. Obviously any attempt to quantify subjective judgements across some shifting consensus and sum them for a score is problematic, but I don't have a better idea.

Will you detail the mechanisms of voting/scoring at all?

At some point, I suspect maintaining this will incur costs (in time and reputation) that we will regret; but again, it has value, so I don't think it is a bad idea.

Some kind of "dynamics" or "principles" dimension (covering the "if push comes to shove, what will you do?" angles) is somewhat missing. How will folk handle commercial pressure and race dynamics? What lines won't they cross? In some ways I guess this would just be an extension of the "acknowledge" dimension that we could bring in as it becomes more key?

hurt-tomato (OP)•1/5/24, 12:03 AM

The current rating is not very consistent or clear. We could write guidelines to improve on this. But I do think some degree of flexibility is important; we need the jury to have some autonomy here.

The most important bottleneck right now is simply research. It takes time to learn enough about these companies. Any help is very much appreciated!

hurt-tomato (OP)•1/17/24, 2:38 PM

It would probably be good to have a jury. Some potential people to ask for the jury:

- Zvi Mowshowitz
- Someone at ARC (although they work closely with these labs)
- Zach Stein-Perlman (is also working on a scorecard)
- MIRI (Malo Bourgon)
- Jeffrey Ladish
- Andrew Critch
- Ajeya Cotra (although probably reluctant)
- Someone from Public Intelligence Project

Thanks to @Tyler for inspiration!

Other ideas welcome!

hurt-tomato (OP)•1/17/24, 2:46 PM

We should consider reworking the names of the scores, perhaps framing them negatively. Instead of "acknowledgement", call it "denial".

We have to balance credibility with expressiveness. The most credible version uses neutral language and looks like a whitepaper; the most expressive one looks like a smear campaign with a lot of negative language.

sacred-rose•1/18/24, 7:32 AM

I like positive names, as they show "what can you improve" instead of sounding like unconstructive hate.

dead-brown•1/19/24, 1:57 PM

Well, maybe the question to ask is: who is this addressed to? If it is addressed to the companies themselves, in the hope that they take it seriously, then negative language would be counterproductive. But I have zero illusions that Meta will care in the slightest about this directly. However, if it is addressed to the general public, in the hope that public opinion then matters to the companies in question, I am not so sure that we need to avoid negative language. I feel that expressiveness would be more effective.

hurt-tomato (OP)•1/19/24, 3:37 PM

good points, thanks

hurt-tomato (OP)•1/22/24, 10:50 AM

I'll meet with Zvi Mowshowitz on Thursday to discuss this plan.

dead-brown•1/22/24, 12:24 PM

@Joep Meindertsma I met Simeon Campos this weekend, and he told me SaferAI is working on a similar project. Are you aware of this?

hurt-tomato (OP)•1/22/24, 12:24 PM

No, I'm not!

dead-brown•1/22/24, 12:24 PM

From his point of view, duplication was not necessarily a problem.

wispy-yellow•1/25/24, 3:18 AM

Just wanted to share a couple of thoughts:

For other causes, scorecards like this are really effective at generating media coverage, especially when they are wrapped up in a big (annual?) release that has some easy headline opportunities for journalists. I think media coverage should be a priority for basically all scorecard efforts in the AI safety space, especially this one. Then, if some scorecard effort actually gains clout, it can be used in future releases as something labs will fight to score well on.

In light of this, one really important thing is just carving out the time to do a bunch of outreach to journalists, most of which might not even get a response. Like, potentially hundreds of emails. Speaking of which, does PauseAI have a spreadsheet of journalists who cover this stuff / who you already have connections with?

Also, it might be worth looking through the report and trying to optimize it for what a journalist would feel comfortable covering. They'd probably want to see well-cited and relatively objective reasoning for the scores, even if the actual number is kinda made up, or just the mean of the judges' made-up numbers. I don't think they'd want to report a claim like "Meta scores the worst on acknowledging risks" without some kind of neutral evidence they can point to that explains PauseAI's decision. So it'd be good to keep the blurbs both neutral and concrete.

hurt-tomato (OP)•1/25/24, 9:19 AM

Great points @Tyler!

Agreed that reaching out to journalists is top priority for this project. We have a spreadsheet for contacts and a @Comms Team that could help out here.

For the scores and sources, I completely agree that we need to make them as objective as we can. This is partly a design issue: how do we calculate the individual scores? Do specific claims make a number go up or down? Does each judge give a rating per domain, after which we take the averages and list all the arguments + sources?
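A minimal sketch of that last option, written in Python purely for illustration. It assumes each judge rates each domain on a 0-10 scale with an argument and sources, that a domain's score is the plain mean of the judges' ratings, and that the overall score is the unweighted mean of the domain scores. The `Rating` class, the `aggregate` function and the domain names are hypothetical, not an agreed PauseAI scoring method.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Rating:
    """One judge's rating of one company in one domain (hypothetical schema)."""
    judge: str
    domain: str
    score: float  # assumed scale: 0 (worst) to 10 (best)
    argument: str = ""
    sources: list[str] = field(default_factory=list)


def aggregate(ratings: list[Rating]) -> tuple[dict[str, float], float]:
    """Average the judges' scores per domain, then average across domains.

    Assumes every domain carries equal weight in the overall score.
    """
    by_domain: dict[str, list[float]] = {}
    for rating in ratings:
        by_domain.setdefault(rating.domain, []).append(rating.score)
    domain_scores = {domain: mean(scores) for domain, scores in by_domain.items()}
    overall = mean(domain_scores.values())
    return domain_scores, overall


if __name__ == "__main__":
    example = [
        Rating("judge_a", "acknowledge", 7),
        Rating("judge_b", "acknowledge", 5),
        Rating("judge_a", "deployment", 3),
        Rating("judge_b", "deployment", 4),
    ]
    per_domain, overall = aggregate(example)
    print(per_domain, overall)
```

Because each `Rating` keeps its own argument and sources, those can be listed next to the averaged numbers, as suggested above.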

sacred-rose•1/25/24, 8:16 PM

OpenAI's score should be lower than 6, given that they have advanced capabilities like no one else and kind of started a race.

hurt-tomato (OP)•1/25/24, 8:38 PM

Yes, good point. I'm gonna add a new column. Zvi basically made the same point; he suggested adding a "frontier" column, where OpenAI scores horribly.

sacred-rose•1/25/24, 8:52 PM

Or capabilities advancements

hurt-tomato (OP)•2/1/24, 8:16 PM

@destrucules wants to help out!

hurt-tomato (OP)•2/2/24, 3:44 PM

@Maxime F (RationalHippy) would also like to help

dead-brown•2/6/24, 3:15 PM

Just getting started on this project. I have a few questions and suggestions, I will keep posting them here as they come to my mind.
1. A suggestion to update the structure of the report for each company so that it has a short summary of the explanation (equivalent to what is currently there), a longer explanation (which could be saved as markdown in order to be a self-sufficient report), and a list of sources
2. How did we decide on having a score out of 10 rather than 5? Is it standard practice in this kind of exercise, do we need this level of granularity?

dead-brown•2/6/24, 3:17 PM

3. The current explanation for OpenAI on "Acknowledge" reads:

Has now publicly acknowledged most of the AI risks, including existential risk. However, it took them many years to do so. Sam Altman wasn't honest about his 'worst nightmare' during the Senate hearing in May 2023.

I am wondering about the phrase "it took them many years to do so." Do we really want to have a time component in this research? Should the scorecard reflect only a judgement about the companies as of 2024, over a certain time period (for example 2023 to 2024) or over their whole history? The latter solution does not seem entirely right to me

dead-brown•2/6/24, 3:17 PM

@Joep Meindertsma

dead-brown•2/6/24, 3:23 PM

4. Do we have an idea of how fleshed out each point needs to be before we go public with these scorecards? At the moment this is 7 companies × 4 factors = 28 points to be made. Do we want each of these to be a full multi-page report? A single page? One paragraph? In any case, I would suggest focusing first on the 3 main labs in order to get everything right and get an idea of how much time this research takes, then expanding the work to the smaller players.

hurt-tomato (OP)•2/6/24, 3:36 PM

1. Agree!
2. I don't have a strong opinion on the scoring system. Out of 5 is fine.
3. Good point. I think it's likely that we'll do multiple scorecards over time, and each one will cover what happened in that period. Perhaps it does make more sense to restrict the first one to last year, too.
4. Maybe one paragraph per point per company? Each one consisting of a bunch of sentences, each with one or more sources?
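A rough Python sketch of that format, assuming a hypothetical `Point` record per company and factor, made up of `Claim` sentences that each carry their own sources. It flags sentences that still lack a source and renders the paragraph with numbered references. The class and method names are illustrative only, not part of any existing scorecard code.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One sentence of a write-up plus the sources that back it up."""
    text: str
    sources: list[str] = field(default_factory=list)


@dataclass
class Point:
    """One paragraph: a single company assessed on a single factor (hypothetical schema)."""
    company: str
    factor: str
    claims: list[Claim]

    def unsourced(self) -> list[str]:
        """Return the sentences that do not yet have at least one source."""
        return [claim.text for claim in self.claims if not claim.sources]

    def render(self) -> str:
        """Join the sentences into one paragraph, citing sources inline as [n]."""
        refs: list[str] = []
        sentences: list[str] = []
        for claim in self.claims:
            marks = ""
            for src in claim.sources:
                if src not in refs:
                    refs.append(src)  # number each distinct source once
                marks += f"[{refs.index(src) + 1}]"
            sentences.append(claim.text + marks)
        notes = "\n".join(f"[{i}] {src}" for i, src in enumerate(refs, start=1))
        return " ".join(sentences) + "\n\n" + notes


if __name__ == "__main__":
    point = Point("ExampleLab", "acknowledge", [
        Claim("First sentence.", ["https://example.org/1"]),
        Claim("Second sentence.", ["https://example.org/2", "https://example.org/1"]),
        Claim("Third sentence still needs a source."),
    ])
    print(point.unsourced())
    print(point.render())
```

Running `unsourced()` before publishing would give a quick list of sentences that still "need sauce".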

dead-brown•2/7/24, 3:08 PM

5. What would make an AI company a 10/10? Wouldn't that be a bit contradictory with the mission of PauseAI? Say there is a company that completely acknowledges x-risk, actively lobbies for the best frontier model regulations, has the strongest deployment process out there, and pioneers research in AI safety: we would have to give them a 10 out of 10 according to our criteria. People could then ask us: why are you advocating for a pause?

dead-brown•2/7/24, 3:09 PM

The point that bothers me the most I guess is the deployment score, because the only safe deployment with this kind of technology is no deployment - at least until we know way way better what we are doing

dead-brown•2/7/24, 4:24 PM

Here is a first version of my research for OpenAI. Just a few sentences for each component (acknowledge, lobby, deployment, research) backed by sources
https://maximolog.notion.site/OpenAI-500861c01f9e4f459dfe808daf80e027

full-green•2/7/24, 5:46 PM

Pause advocacy would still be necessary unless every company had a perfect score.

full-green•2/7/24, 5:49 PM

We could cap the deployment score at 4/5 with that disclaimer. Or rather, it would be possible for a company to say, "we will not deploy more capable foundation models," in which case I think it would be fair to give them a perfect score.
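One way to encode that proposal, sketched in Python on the 0-5 scale discussed above. It assumes a 4/5 cap for any company that still deploys frontier models, and a perfect 5 only for a company that has committed not to deploy more capable foundation models. The function name and its parameters are hypothetical.

```python
def deployment_score(raw: float, committed_not_to_deploy: bool, cap: float = 4.0) -> float:
    """Apply the proposed cap to a deployment score on an assumed 0-5 scale.

    raw: the jury's unadjusted deployment score.
    committed_not_to_deploy: True if the company has publicly committed not to
        deploy more capable foundation models.
    cap: the highest score a still-deploying company can get (assumed 4/5).
    """
    if not 0 <= raw <= 5:
        raise ValueError("raw score must be between 0 and 5")
    if committed_not_to_deploy:
        return 5.0  # the only route to a perfect score under this proposal
    return min(raw, cap)


if __name__ == "__main__":
    print(deployment_score(5, committed_not_to_deploy=False))  # capped
    print(deployment_score(3, committed_not_to_deploy=False))  # unchanged
    print(deployment_score(3, committed_not_to_deploy=True))   # perfect score
```

Keeping the cap as a parameter makes it easy to publish alongside the disclaimer and to revisit if the jury changes its mind.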

dead-brown•2/8/24, 10:17 AM

6. Should we add a "recommendations" section to these scorecards? For example, I am currently researching the "acknowledge" part for Anthropic. I find the public declarations of the founders exemplary; however, there is no mention of existential risk on the company's website. I would give them a high score, say 8, paired with a recommendation to mention x-risk on their website.

hurt-tomato (OP)•2/8/24, 10:17 AM

That's a great idea!

dead-brown•2/8/24, 11:01 AM

"Anthropic seems to have a policy of not deploying SOTA models. Anthropic sat on Claude - and waited with deploying it until ChatGPT came out." @Joep Meindertsma do you know/remember where you got that from?

hurt-tomato (OP)•2/8/24, 11:09 AM

Been told that, needs sauce!

dead-brown•2/9/24, 2:57 PM

https://maximolog.notion.site/Anthropic-e40f4a2195864b6796e1cd6eaf609e5b?pvs=4

hurt-tomato (OP)•2/12/24, 2:01 PM

So we need a jury for the scorecard. Ideally they:

- Are not working for an AI lab
- Have good takes on AI safety
- Have some opinions / dare to judge companies

Some names:

- Zvi Mowshowitz (reached out to already, shared some good insights)
- Max Tegmark
- Yoshua Bengio
- Eliezer Yudkowsky
- Rob Bensinger
- Gwern

dead-brown•2/12/24, 8:13 PM

- Simeon Campos
- Davidad
- Stuart Russell
- Nick Bostrom
- Connor Leahy

colossal-harlequin•2/13/24, 9:42 AM

Those are all great. But obviously also busy, and will be called out by some as having already picked a side.

I mean, of course I expect many thoughtful and open minded people to end up where they are. But is there also anyone we can ask who won't ring the same bells?

dead-brown•2/14/24, 4:49 PM

Update
- I have done the research for Google DeepMind: https://www.notion.so/maximolog/Google-DeepMind-595bd56bf44f41e8887c5c0b6edf75af
- As I am progressing in my research, I am developing a methodology/workflow in order to make the scorecards more coherent and quantitative. For each point, I wrote down a list of research questions. This is still work in progress https://www.notion.so/maximolog/Scorecard-Workflow-d8025b4738f64ead97ae75eacf8b18ce
- I am very liberally adding names of potential jury members (29 so far) to this google doc: https://docs.google.com/document/d/1_kxYWpyI_ulcIDERDi09OYAbasSesqCmmdgB4LRXSFA/edit?usp=sharing
- I am still mostly working in my personal Notion at the moment but planning to move to Google Docs soon so that others can join and help

dead-brown•2/15/24, 4:39 PM

Project Tracker: https://maximolog.notion.site/AI-Safety-Scorecards-f7cf685593234aaeb108da8e1d53ebaf?pvs=4

dead-brown•2/15/24, 5:06 PM

@Tyler Could you walk me through your RSP spreadsheet and explain how you created it?

wispy-yellow•2/15/24, 5:45 PM

Hi! I don't actually think of it as being about RSPs so much as a list of potential demands that activists could make of companies (although RSP-like policies end up looking pretty good). I was mostly borrowing from other people's work: the sheet was first conceived by someone else, and I fleshed it out a bit using public AI governance resources. Zach Stein-Perlman's list of ideal behavior for labs was really helpful (https://www.greaterwrong.com/posts/GCMMPTCmGagcP2Bhd/ideas-for-ai-labs-reading-list). The scores I gave were basically all subjective; this is just to clarify my (and other activists') thinking rather than to present publicly in any way (speaking of which, please keep it private for now!). Kind of like a guesstimate model.

dead-brown•2/20/24, 5:01 PM

Interesting point that @Tyler mentioned in a call: when we release the scorecard, we can "bait" media organisations for coverage by contacting them and telling them "we are going to release a scorecard, do you want to be the first one covering this?"

wispy-yellow•2/20/24, 5:35 PM

Yes! More details here: https://inksights.rep-ink.com/2023/06/media-exclusives-how-and-when-to-use-them/

It helps if you have a history of past press coverage, which PauseAI does

dead-brown•2/22/24, 4:59 PM

Created a spreadsheet to make this work more methodical: https://docs.google.com/spreadsheets/d/1Qqu3SyWob41oS4ViFhKc7JK7eIh2T8qVVyR_VLqNehw/edit?usp=sharing
