The Great Rivalry?


Every argument that the US is in danger of losing out to China, that the US needs more weapons to deter China, that the US can’t afford to help arm Ukraine, and many others, should be required to begin with these two graphs.

Data for the first graph is from the International Monetary Fund, for the second from the International Institute for Strategic Studies. The graphs appear in this article.

Cross-posted to Lawyers, Guns & Money

Read the whole story
4 days ago
Pittsburgh, PA
Share this story

Software Bugs That Cause Real-World Harm


Years ago, when I was an undergraduate student at McGill, I took a software engineering class, and as part of that class, I heard the infamous story of the Therac-25 computer-controlled radiotherapy machine. Long story short: a software bug caused the machine to occasionally deliver radiation doses hundreds of times greater than normal, which could result in grave injury or death. This story gets told in class to make an important point: don’t be a cowboy. If you’re a software engineer working on safety-critical systems, you absolutely must do due diligence and implement proper validation and testing, because otherwise you could be putting human lives at risk.

Unfortunately, I think the real point kind of gets lost on many people. You might hear that story and think the lesson is that you should never work on safety-critical systems where such due diligence is required, and that you’re really lucky to be pocketing hundreds of thousands of dollars a year working on web apps, where the outcome of your work, and all the bugs that may still lie dormant somewhere in your code, will never harm anyone. Some people work on safety-critical code, and those people bear the weight of tremendous responsibility, but not you: you’re using blockchain technology to build AirBnB for dogs, which couldn’t possibly harm anyone even if it tried. I’d like to share three stories with you. I’ve saved the best for last.
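The Therac-25's documented failure involved a race between operator input and the machine's setup task. As a loose, hypothetical illustration of that general class of bug (a "time of check to time of use" race; every name and number below is invented for illustration, this is not the machine's real logic):

```python
# Hypothetical sketch of a "time of check to time of use" race, the general
# class of concurrency bug implicated in the Therac-25 overdoses. All names
# and numbers here are invented for illustration.

MAX_SAFE_DOSE = 100

class Therapy:
    def __init__(self, dose):
        self.dose = dose

    def fire(self, between_check_and_fire=lambda: None):
        # 1. The safety check reads the current dose...
        if self.dose > MAX_SAFE_DOSE:
            raise ValueError("unsafe dose")
        # 2. ...but nothing stops the state from changing before the beam
        #    actually fires (simulated here with a callback; in the real
        #    machine, a concurrent task raced with operator edits).
        between_check_and_fire()
        # 3. The dose delivered may no longer be the dose that was checked.
        return self.dose

t = Therapy(dose=10)
delivered = t.fire(between_check_and_fire=lambda: setattr(t, "dose", 10_000))
print(delivered)  # 10000: the check passed, yet an unsafe dose was delivered
```

The fix is to make the check and the action atomic, by holding a lock across both or re-validating at the moment of delivery, which is exactly the kind of diligence the story is meant to teach.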

Back in 2016, I completed my PhD and took my first “real” job, working at Apple in California. I was joining a team that worked on the GPU compiler for the iPhone and other iDevices. While getting set up in California before starting the job, it occurred to me that showing up to work with an Android phone, as part of a team working on the iPhone, might not look so great, so I stopped by the Apple Store and bought the best iPhone available at the time, an iPhone 6S Plus with 128GB of storage. Overall, I was very pleased with the phone: it was lightweight, snappy and beautiful, with great battery life, and the fingerprint sensor meant I didn’t have to constantly type my PIN like on my previous Android phone. A clear upgrade.

Fast forward a few months and I had to catch an early morning flight for a work-related conference. I set an early alarm on my phone and went to sleep. The next day, I woke up and instantly felt like something was wrong, because I could see that it was really sunny outside. I went to check the time on my iPhone. I flipped the phone over and was instantly filled with an awful sinking sense of dread: it was already past my flight’s takeoff time! The screen on my phone showed that the alarm I had set was in the process of ringing, but for some reason, the phone wasn’t vibrating or making any sound. It was “ringing” completely silently, but the animation associated with a ringing alarm was active.

I did manage to get another flight, but I needed my manager’s approval, and so I had to call him and explain the situation, feeling ashamed the whole time (I swear it’s not my fault, I swear I’m not just lazy, this bug is real, I swear). Thankfully, he was a very understanding man, and I did make it to the conference, but I missed most of the first day and the opening activities. It wasn’t the first or the last time I experienced this bug; it happened sporadically, seemingly at random, over the span of several months. I couldn’t help but feel angry. Someone’s incompetence had caused me to experience anxiety and shame, but it had also caused several people to waste time, and the company to waste money on a missed flight. Why hadn’t this bug been fixed after several months? How many other people were impacted? I had a cushy tech job where, if I showed up to work late, people would ask if I was doing alright, but some people have jobs where being late can get them fired on the spot, and some of those people might have a family to support and be living paycheque to paycheque. A malfunctioning alarm clock probably isn’t going to directly cause a person’s death, but it definitely has the potential to cause real-world harm.

The point of this blog post isn’t to throw Apple under the bus, and so I’ll share another story (or maybe more of a rant) about poor software design in Android OS and how it’s impacted my life. About three years after I started at Apple, when the replacement battery in my iPhone 6S Plus started to wear out, I decided to try Android again, and got myself a Google Pixel 3A XL. This phone also had a nice fingerprint scanner, but its best differentiating feature was of course the headphone jack. Unfortunately, Android suffers from poor user interface design in a few areas, and one of the most annoying flaws is simply that the stock Android OS doesn’t offer flexible enough options for controlling when the phone rings, which is one of the most important aspects of a phone.

Being a millennial, I don’t particularly like phone calls; I would much prefer to make appointments and file support tickets through an online system. However, my deep dislike for phone calls probably stems from a more personal issue, which is that my mother is an unmedicated schizophrenic. She doesn’t respect my boundaries. She has done things such as randomly calling me in the middle of the night because her irrational paranoia made her worry that shadowy evil figures were coming after me. Thankfully, Android now has a “bedtime mode” feature, which lets me ensure that phone calls won’t cause my phone to ring between 10PM and 8:30AM. If my mom happens to die in a hospital in the middle of the night, I’ll just have to find out and be sad the next day. My sleep is sacred, and bedtime mode allows me to enforce some basic boundaries using software.

Bedtime mode is quite useful, but I still have the problem that my mom can decide to randomly call me in the daytime as well, and unfortunately I rarely want to take her calls. However, I also don’t want her to end up homeless or in jail (which has happened before, but that’s a story for another time), so I don’t want to block her and completely lose the ability to receive her calls. As a result, I almost always keep my phone set to “do not disturb”, so that I’m not interrupted at random times by unwanted phone calls. I wish Android had an option to make a specific person never cause the phone to ring; it seems like an easy feature to implement, and one that would have a real positive impact on the quality of life of many people, but I digress.

The real problem is that, although I hate phone calls, our society is still structured in such a way that sometimes, I have to receive “important” phone calls. For instance, my doctor recently placed a referral for me to see a specialist. I’ve been told that the hospital is going to call me some time in the next few weeks. I don’t want to miss that phone call, and so I have to disable “do not disturb”. However, because the stock Android OS has only one slider for “Ring & notification volume”, disabling do not disturb means that my phone will constantly “ding” and produce annoying sounds every time I get a text message or any app produces a notification, which is very disruptive. The fact is, while I occasionally do want my phone to ring so I can receive important phone calls, I basically never want app notifications to produce sound. I’ve been told that I should go and individually disable notifications for every single app on my phone, but you tell me, why in the fuck can’t there simply be two separate fucking sliders for “Ring volume” and “Notification volume”? In my opinion, the fact that there isn’t simply highlights continued gross incompetence and disregard for user experience. Surely, this design flaw has caused millions of people to experience unnecessary anxiety, and should have been fixed years ago.

This is turning out to be a long-ish blog post, but as I said, I’ve kept the best story for last. I’m in the process of buying a new place, and I’ll be moving in two weeks. As part of this, I’ve decided to do some renovations, and so I needed to get some construction materials, including sheets of drywall. This is a bit awkward, because I’m a woman living in the city, and I don’t have a car or a driver’s license. Sheets of drywall are also quite heavy, and too big to fit in the building’s elevator, meaning they have to be carried up the stairs to the third floor. Yikes.

In Montreal, where I live, there are three main companies selling renovation supplies: Home Depot, Rona and Reno-Depot. Home Depot was the only one that had everything I needed to order, so I went to their website and added all the items to my cart. It took me about 45 minutes to select everything and fill out the order form, but when I got to the point where I could place the order, the website gave me a message saying “An unknown error has occurred”. That’s it, no more details than that, no description of the cause of the error, just: sorry lol, nope, you can’t place this order, and you don’t get an explanation. I was really frustrated that I had wasted almost an hour trying to place that order. A friend of mine suggested that maybe she could try placing the order and it would work. I printed the page with the contents of my cart to a PDF document and sent it over. It worked: she was able to place the order, and I sent her an electronic payment to cover the costs.

Because my new place is on the third floor, because we were under some time pressure to get things done, and because heavy items would have to be carried up the stairs, we paid extra specifically to have the items delivered inside the condo unit, within a fixed window between noon and 3PM. The total cost for delivery was 90 Canadian dollars, which seems fairly outrageous, but sometimes you just have no choice. I was expecting my delivery before 3PM, and the Home Depot website had said I would get a text 30 minutes before delivery. At 2:59PM, I received two text messages at the same time. The first said “Your order has just been picked up”. The second said “Your order has just been delivered, click here to rate your delivery experience”. Again, I was filled with a sense of dread. Had they tried to reach me and failed? Had they just dumped the construction materials outside? I rushed downstairs. There was no sign of a delivery truck or any of the materials. I figured there must be another software bug: despite what the second text message said, the delivery clearly hadn’t happened yet.

Sure enough, at 3:27PM, 27 minutes after the end of my delivery window, I received a phone call from a delivery driver. He was downstairs, and he was about to dump the construction materials on the sidewalk. NO! I explained that I had paid extra to have the materials delivered inside the unit; I could show him the email proving that I had paid specifically for this service. He argued back: according to his system, he was supposed to dump the materials at the curb. Furthermore, they had sent only one guy. There was no way he alone could carry 8-foot-long, 56-pound sheets of drywall up to the third floor. I raised my voice, he raised his. After a few minutes, he said he would call his manager. He called back: the delivery company would send a second truck with another guy to help him carry the materials upstairs. I felt angry, but also glad that I had stood my ground in that argument.

The first guy waited, sitting on the curb in the heat, looking angry, doing nothing, for about 30 minutes until the second guy showed up to help. When the second delivery guy arrived, he asked to see the email, and I showed him proof that I had paid to have things delivered upstairs. He also said that their system told them they only had to drop things in front of the building, but that he believed me. The delivery company was a subcontractor, and this was a software bug they had encountered before. It had made multiple other customers extremely upset. So upset, in fact, that one customer, he said, had literally taken him hostage once, and another had assaulted him. Gross, almost criminal incompetence on the part of one or more developers somewhere had again caused many people to waste time and to experience stress, anger, and even violence. The most infuriating part, of course, is that bugs like this are known to exist, yet they often go unfixed for months, sometimes even years. The people responsible have to know that their incompetence and their inaction are causing continued real-world harm.

The point of this blog post is that, although most of us don’t work on software that would directly be considered safety-critical, we live in a world that’s becoming increasingly automated and computerized, and sometimes, bugs in seemingly mundane pieces of code, even web apps, can cause real-world suffering and harm, particularly when they go unfixed for weeks, months or even years. Part of the problem may be that many industry players lack respect for software engineering as a craft. Programmers are seen as replaceable cogs and as “code monkeys”, and not always given enough time to do due diligence. Some industry players also love the idea that you can take a random person, put them through a 3-month bootcamp, and get a useful, replaceable code monkey at the other end of that process. I want to tell you that no matter how you got to where you are today, if you do your job seriously, and you care about user experience, you could be making a real difference in the quality of life of many people. Skilled software engineers don’t wear masks or capes, but they can still have cool aliases, and they truly have the power to make the world better or worse.


Every Author as First Author


Erik D. Demaine and Martin L. Demaine. Every author as first author. 2023. arXiv:2304.01393.

We propose a new standard for writing author names on papers and in bibliographies, which places every author as a first author—superimposed. This approach enables authors to write papers as true equals, without any advantage given to whoever's name happens to come first alphabetically (for example). We develop the technology for implementing this standard in LaTeX, BibTeX, and HTML; show several examples; and discuss further advantages.
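The abstract mentions a LaTeX implementation; the authors' actual macros are in the arXiv source, but as a rough, hypothetical sketch, superimposing two names so that both occupy the "first" position could be done with the kernel's `\ooalign`:

```latex
% Hypothetical sketch, not the paper's actual implementation: overprint two
% author names with \ooalign so both are typeset in the same (first) slot.
\newcommand{\everyfirst}[2]{%
  \ooalign{\hfil #1\hfil\cr \hfil #2\hfil\cr}%
}
% e.g. \author{\everyfirst{Erik D. Demaine}{Martin L. Demaine}}
```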


Blue skies over Mastodon


In the early 80s, my mom worked a couple shifts a month at a little small-town food co-op that smelled like nutritional mummy. She brought home things like carob chips and first-generation Soysage, which remains one of the grossest things I have ever eaten. This was also a real boom time for unsalted Legume Surprise and macrobiotically balanced grain mush that tasted like macrame owl.

This food sold reasonably well to a fringe class of Americans, including many who were rightfully worried about pesticides, animal cruelty, and the health effects of a meat-and-potatoes diet, and also a bunch who were just a real specific kind of nerd. And there was a strong current in the community of scorn for people who were lured into eating junk food when they could be eschewing seasonings until they could properly enjoy the glories of gelled millet or whatever.

I’ve been thinking about this a lot over the past few months on Mastodon and especially this week, as I hung out observing the pupal stage of Bluesky.

To get the background out of the way: Mastodon is a decentralized social network developed in the open and built on the ActivityPub protocol. It was founded by German software developer Eugen Rochko and is presently a German non-profit company. Bluesky is a social networking app built on the new Authenticated Transfer Protocol for decentralized networks, which is being developed in the open by the Bluesky team. Bluesky launched out of Twitter as a project promoted by Twitter founder Jack Dorsey and is presently an independent US-based public benefit company headed by Jay Graber (Dorsey retains a seat on the board).

we are not the same

Things got interesting in the Bluesky closed beta this week when a ton of people got let in while the app was still in an unstable state—no block function, semi-working mute function, problems with enormous threads. Posters ran around threatening noted centrist Matt Yglesias with hammer emojis, etc.

Lots of people joined the Bluesky beta and posted about why it worked better for them than Mastodon did. A big chunk of Mastodon responded with a social immune response intended to warn people away from Bluesky for a very long list of reasons, including its association with Dorsey, its incompleteness and everything that clearly meant about the intentions of the developers, and the fact that it would split the decentralized-network vote. Many, many posts amounted to, “Bluesky obviously won’t ban Nazis, let me repeat an enlightening story about a Nazi bar I’ve heard 400 times.”

Incidentally, when a straightforwardly “I’m a Nazi” Nazi showed up in the beta, people used the report function, and the Bluesky team labeled the account, banned it from the Bluesky app, and restricted promotion of the account of the person who had invited him. This changed exactly none of the tenor of the Nazi conversation on Mastodon, but it happened.

I have a suspicion that a lot of the defensive maneuvering on Mastodon is happening because Mastodon fans know that the network absolutely cannot compete on user friendliness and basic social functionality, so they’re leaning hard into the things it does get right—and then, in some cases, trying to shame people into not even thinking about trying a competing network.

But about that ease of use problem. Let’s rewind for a second.

bouncing off Mastodon

Editing on May 1 2023 to add: Eugen Rochko published a new blog post today that discusses immediate changes to the mobile sign-up flow, which should help with both the initial barrier and, maybe more importantly, the initial safety problem of people ending up on bad instances because they didn’t know any better.

In what I think is a really positive sign, Rochko also wrote:

We’re always listening to the community and we’re excited to bring you some of the most requested features, such as quote posts, improved content and profile search, and groups. We’re also continuously working on improving content and profile discovery, onboarding, and of course our extensive set of moderation tools, as well as removing friction from decentralized features. Keep a lookout for these updates soon.

I think this is all great! I’m adding this new context here because I think it kind of leapfrogs some of what I wrote below (which, again, is great), and I’m leaving the rest of this post intact as a discussion of how things had been going until now. But I’m optimistic about these statements. (end edit)

During the big waves of Twitter-to-Mastodon migrations, tons of people joined little local servers with no defederation policy and were instantly overwhelmed with gore and identity-based hate. A lot of those people, understandably, did not stick around, and plenty of them went back to their other social spaces and warned others that Mastodon wasn’t safe. For people who lucked out and landed on a well-moderated instance, finding fun people to follow was hard and actually following each of them often involved three separate steps, depending on which link you happened to click.

It’s a lot of hassle for a gamble on a network that might not end up being what you need.

Over on Bluesky, by contrast, once you’re in the beta, it’s super easy to sign up, find people, follow them, and participate in conversations. I’m seeing a lot of the people I’ve missed the most since I stopped using Twitter in like 2018, which is a delight, but I’m also not really posting because it’s a chaos machine and it’s still way too early for me to know if I really want to contribute there.

The thing is, networks can recover from even big initial fuckups. Mastodon developers could have made a project of interviewing people who wanted to leave Twitter and then treating their needs as a roadmap. Writers and designers could have made a great, brief visual + textual guide to a few fun, tightly moderated instances to join, with pros and cons and a comparison of moderation and defederation policies, and slapped that on the front page of Join Mastodon. Or the team could have taken any of dozens of other suggestions for streamlining. None of that happened.

You can recover from bad product design choices by changing things, but you do have to change things. The Mastodon core developers didn’t take swift action to—well, do much of anything.

unfriendly design feeds insular culture

I—a nerd—actually really like Mastodon most of the time, but I would like it so much more, and feel like it was doing a lot more good in the world, if it were more welcoming and easier to use. When I raise these points on Mastodon, I get a steady stream of replies telling me that everything I’m whining about is actually great, that valuing a pleasant UI over the abstraction of federation is shallow and disqualifying, and that people who find Mastodon difficult don’t belong anyway, so I should “go join Spoutible” or whatever.

And of course this stuff shows up in much worse ways for at least some Black and brown people on Mastodon.

I hate it that I can’t in good conscience encourage Black friends to get on Mastodon, because I know they’re going to be continuously chided by white people if they mention race or criticize anything at all about Mastodon itself. I hate that “a difficult sign-up process keeps out lazy people with bad culture” is a thing in so many Mastodon conversations. (Fun fact: if you hold this idea up to your ear, you can hear them say “sheeple.”)

I have absolutely zero fortune-telling to offer re: Bluesky. The AT protocol approach is enough of a tweak on existing models that I think it’s pretty much impossible to tell how it’s all going to play out when the technical abstractions meet actual users at scale—most of all, because it remains to be seen whether or how much the team will accept feedback on things that aren’t working (and for whom). In what seems to me like a moderately good sign, late on Saturday, Bluesky CEO Jay Graber posted:

At the very beginning of bluesky I said the tech would be straightforward to build, but moderation, and designing decentralized moderation, would be hard. It is. I talked with a bunch of people about it at our meetup today, but need to get the chance to sit down and write—so, logging off, see you tomorrow, and I hope we can get more of your proposed approaches implemented soon.

Maybe they’ll figure it out, maybe they won’t, but I would love to see even half the kicked-anthill energy being spent hating a closed beta app directed toward making Mastodon better for more people.

the strongest path forward for Mastodon advocates

I haven’t mentioned the simplest and IMO best critique of Bluesky and most other big platforms, which is that they emerged out of venture-capital galaxy brain, which has the moral sense of an AI chatbot. After the past decade or so on Twitter, “I won’t touch anything Jack Dorsey has touched” is a reasonable reaction. “I will only put my social labor into platforms that can never benefit billionaires” is fair.

But the missing step, to me, is when people with principled objections to other platforms are unwilling or unable to make the alternatives of their choosing more welcoming to more people. And there are absolutely people trying to do the work, but they’re dependent on the choke-point of what Mastodon-the-company decides is valuable. (Almost like something…centralized?)

One of the big things I’ve come to believe in my couple of decades working on internet stuff is that great product design is always holistic: Always working in relation to a whole system of interconnected parts, never concerned only with atomic decisions. And this perspective just straight-up cannot emerge from a piecemeal, GitHub-issues approach to fixing problems. This is the main reason it’s vanishingly rare to see good product design in open source.

Great product design is also grounded in user research and a commitment to ongoing evaluation and iteration. For something like a decentralized social network, it also requires letting people from many distinct communities help steer the ship—and building ways to work toward consensus in some areas and accept both conflict and compromise in others. And great design at mass scale requires the core team to value mass adoption and push back—hard and loudly—against the idea that inconvenience is good because it filters out undesirables.

This doesn’t mean that I think Mastodon should necessarily implement full-text search or the whole set of interlocking patterns that constitute Twitter-style quote posts. But particularly given the third-party pressure on both search and quote posts, I think it’s way past time to do full-scale user research and design work on ways to integrate some kinds of search and quotation in some places and in ways that preserve privacy, safety, and autonomy. And to handle the whole nested doll of problems related to sign-up, discovery, and following, for starters.

while I’m opinionating

I feel enormous empathy for tiny teams doing high-pressure work. I think Rochko and his team have pulled off great work over the past six years, and I think the tendency to assume the worst motivations for every action maintainers take is a great example of the way that treating open-source projects like merchants and behaving like enraged customers is gross and destructive. But I also think the best way out of the overloaded-maintainer nightmare is to:

  • communicate transparently—and mostly not in unfindable replies to random people,
  • make alliances with people who have capacities you lack, like user research and distributed deliberation, and
  • devolve power whenever you can.

I recognize that that last piece is incredibly difficult to do when you feel like your singular human judgment is at the core of something huge, because judgment doesn’t necessarily scale. But my big hope for Mastodon is that the core maintainers find a way to do it in the very near future—or that other organizations step up to fund and shepherd forks of the project.


If we want more people to enjoy what we believe are the benefits of something like Mastodon, it’s on us to make it delicious and convenient and multi-textured and fun instead of trying to shame people into eating their soysage and unsalted soup.

I hope all of that is actually possible for Mastodon, because a lot of great people very much want it to become a more welcoming place. But the longer Mastodon stays in Linux-on-the-desktop mode, the more likely those people are to take their energy somewhere where it’s valued.


The First Substack Dedicated Exclusively to War Correspondence


By Tim Mak

Good morning to readers. Kyiv remains in Ukrainian hands.

I’ve written this opening many, many times as a correspondent for NPR. But now I’m leaving the company as part of the layoffs that have dramatically cut down the company’s workforce.

I’ve decided to go back into Ukraine to keep reporting.

This time, alone.

I’ll be launching a new email newsletter, The Counteroffensive, to cover the war going forward.

Here, you’ll find a direct connection to me and my reporting – whether you choose a free or paid subscription.

Subscribe now

The Counteroffensive will cover the expected Ukrainian counteroffensive. But the name is meant to signify a broader campaign that continues no matter what the result of these next few months. It is a campaign against apathy, cynicism and ignorance about world events in general and the emergence of a new Cold War in particular.

Many of you know me as an on-air correspondent and former U.S. Army combat medic. And many of you have read along as I traveled to nearly every major Ukrainian city, illustrating life during the war through daily vignettes and #DogsofWar.

You’ve shared your #DogsofPeace, and asked smart questions, and showed me your support when my gas tank was low.

I want to keep that conversation going.

Ukrainian soldier rests in a shelter at front line positions near Bakhmut on April 22, 2023. (Photo by ANATOLII STEPANOV/AFP via Getty Images)

After all, war correspondence is almost as old as war itself. As long as there have been humans fighting, those humans have wanted to find ways to bring their stories home.

In this second year of the devastating full-scale war between Russia and Ukraine, I’m looking to use modern means to satisfy these old instincts: a Substack newsletter from a war zone.

If you subscribe to this email newsletter, what you’ll get is an open reporter’s notebook from the frontlines of an emerging Cold War: war correspondence by subscription. You’ll get regular and intimate dispatches about what it’s like to cover the largest war in Europe since WWII.

The Counteroffensive with Tim Mak is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Leaving a steady job at an established institution like NPR for an unproven venture is a big bet.

It’s a bet that readers still care about the human toll that this war is taking.

It’s a bet against cynicism, and ignorance, and apathy.

It’s a bet that there are enough of you out there to support a project this ambitious.

As I was driving recently I listened to an interview with my mentor and colleague Michel Martin, explaining why she made a major career move.

“Be brave,” she said. “Do something hard.”

It’s in that spirit that I’m launching this dangerous and expensive venture.

Readers can't do anything about the danger of the war zone, but they can help us have the gear we need to mitigate risk. And they can help us not go into crushing debt in order to bring the news to the public.

Right now, I’m paying for all our operating expenses myself. For $8 per month – less than a bottle of Sriracha! Or a bowl of pho! – you can be a supporter of regular reporting on the war in Ukraine: a combination of investigative reporting and the moving, personal stories of this conflict.

Your support buys us what we need to report: body armor, medical kits, car rentals, recording equipment, and emergency supplies. And it’s not just gear – hiring my Ukrainian interpreter costs thousands of dollars a month.

Subscribe now

But we are not asking for contributions for contribution's sake.

You get something powerful in return: reporting that serves a public interest mission, that engages and educates.

Readers will get tenacious reporting about the battlefield situation, alternating between the forty-thousand-foot view of brigades and divisions, and the fascinating minutiae of how troops are faring on a human level. 

But you will also get a feeling of what it’s actually like on the ground in Ukraine. The Counteroffensive will also be an exploration of the culture, the language, the cuisine.

Many of you already know me from my reporting on the war in Ukraine thus far, and the thoughtful and empathetic reporting that I try to do. Together we've watched as the war unfolded, with disastrous effects on the civilian population.

Now I need your help for that to continue. Twitter is becoming a more difficult place for us to connect.

If you subscribe to my newsletter – either free or paid – we can engage more directly without the capricious moves of social media owners.

And I'll make every effort to respond to your email, DM, and chat messages.

If you subscribe, I write and work for you now.

Today's dog of war is Rex, our new mascot. “Soldiers call each other ‘Rex’ when they are really impressed with how good you are — or sometimes they say it sarcastically when you’ve messed up,” said Ross, my interpreter and reporting partner.

Let’s hope for more Rexes of the former kind, and fewer of the latter.



The Prospect of an AI Winter



Summary #

  • William Eden forecasts an AI winter. He argues that AI systems (1) are too unreliable and too inscrutable, (2) won’t get that much better (mostly due to hardware limitations) and/or (3) won’t be that profitable. He says, “I’m seeing some things that make me think we are in a classic bubble scenario, and lots of trends that can’t clearly continue.”
  • I put 5% on an AI winter happening by 2030, with all the robustness that having written a blog post inspires, and where AI winter is operationalised as a drawdown in annual global AI investment of ≥50%.[1] (I reckon a winter must feature not only decreased interest or excitement, but also decreased funding, to be considered a winter proper.)
  • There have been two previous winters, one 1974-1980 and one 1987-1993. The main factor causing these seems to have been failures to produce formidable results, and as a consequence wildly unmet expectations. Today’s state-of-the-art AI systems show impressive results and are more widely adopted (though I’m not confident that the lofty expectations people have for AI today will be met).
  • I think Moore’s Law could keep going for decades.[2] But even if it doesn’t, there are many other areas where improvements are being made allowing AI labs to train ever larger models: there’s improved yields and other hardware cost reductions, improved interconnect speed and better utilisation, algorithmic progress and, perhaps most importantly, an increased willingness to spend. If 1e35 FLOP is enough to train a transformative AI (henceforth, TAI) system, which seems plausible, I think we could get TAI by 2040 (>50% confidence), even under fairly conservative assumptions. (And a prolonged absence of TAI wouldn’t necessarily bring about an AI winter; investors probably aren’t betting on TAI, but on more mundane products.)
  • Reliability is definitely a problem for AI systems, but not as large a problem as it seems, because we pay far more attention to frontier capabilities of AI systems (which tend to be unreliable) than long-familiar capabilities (which are pretty reliable). If you fix your gaze on a specific task, you usually see a substantial and rapid improvement in reliability over the years.
  • I reckon inference with GPT-3.5-like models will be about as cheap as search queries are today in about 3-6 years. I think ChatGPT and many other generative models will be profitable within 1-2 years if they aren’t already. There’s substantial demand for them (ChatGPT reached 100M monthly active users after two months, quite impressive next to Twitter’s ~450M) and people are only beginning to explore their uses.
  • If an AI winter does happen, I’d guess some of the more likely reasons would be (1) scaling hitting a wall, (2) deep-learning-based models being chronically unable to generalise out-of-distribution and/or (3) AI companies running out of good-enough data. I don’t think this is very likely, but I would be relieved if it were the case, given that we as a species currently seem completely unprepared for TAI.

The Prospect of a New AI Winter #

What does a speculative bubble look like from the inside? Trick question – you don’t see it.

Or, I suppose some people do see it. One or two may even be right, and some of the others are still worth listening to. One example (though I don’t know of which kind) is William Eden, who tweeted out a long thread explaining why he’s not worried about risks from advanced AI. He argues in support of his thesis that another AI winter is looming, making the following points:

  1. AI systems aren’t that good. In particular (argues Eden), they are too unreliable and too inscrutable. It’s far harder to achieve three or four nines reliability than merely one or two nines; as an example, autonomous vehicles have been arriving for over a decade. The kinds of things you can do with low reliability don’t capture most of the value.
  2. AI systems won’t get that much better. Some people think we can scale up current architectures to AGI. But, Eden says, we may not have enough compute to get there. Moore’s law is “looking weaker and weaker”, and price-performance is no longer falling exponentially. We’ll most likely not get “more than another 2 orders of magnitude” of compute available globally, and 2 orders of magnitude probably won’t get us to TAI.[3] “Without some major changes (new architecture/paradigm?) this looks played out.” Besides, the semiconductor supply chain is centralised and fragile and could get disrupted, for example by a US-China war over Taiwan.
  3. AI products won’t be that profitable. AI systems (says Eden) seem good for “automating low cost/risk/importance work”, but that’s not enough to meet expectations. (See point (1) on reliability and inscrutability.) Some applications, like web search, have such low margins that the inference costs of large ML models are prohibitive.

I’ve left out some detail and recommend reading the entire thread before proceeding. Also before proceeding, a disclosure: my day job is doing research on the governance of AI, and so if we’re about to see another AI winter, I’d pretty much be out of a job, as there wouldn’t be much to govern anymore. That said, I think an AI winter, while not the best that can happen, is vastly better than some of the alternatives, axiologically speaking.[4] I also think I’d be of the same opinion even if I had still worked as a programmer today (assuming I had known as much or little about AI as I actually do).

Past Winters #

There is something of a precedent.

The first AI winter – traditionally, from 1974 to 1980 – was precipitated by the unsympathetic Lighthill report. More fundamentally it was caused by AI researchers’ failure to achieve their grandiose objectives. In 1965, Herbert Simon famously predicted that AI systems would be capable of any work a human can do in 20 years, and Marvin Minsky wrote in 1967 that “within a generation […] the problem of creating ‘artificial intelligence’ will be substantially solved”. Of Frank Rosenblatt’s Perceptron Project (whose extravagant claims aroused ire among other AI researchers), the New York Times reported: “[It] revealed an embryo of an electronic computer that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence. Later perceptrons will be able to recognize people and call out their names and instantly translate speech in one language to speech and writing in another language, it was predicted” (Olazaran 1996). Far from human intelligence, not even adequate machine translation materialised (it took until the mid-2010s, when DeepL and Google Translate’s deep learning upgrade were released, for that to happen).

The second AI winter – traditionally, from 1987 to 1993 – again followed unrealised expectations. This was the era of expert systems and connectionism (in AI, the application of artificial neural networks). But expert systems failed to scale, and neural networks learned slowly, had low accuracy and didn’t generalise. It was not the era of 1e9 FLOP/s per dollar; I reckon the LISP machines of the day were ~6-7 orders of magnitude less price-performant than that.[5]

Wikipedia lists a number of factors behind these winters, but to me it is the failure to actually produce formidable results that seems most important. Even in an economic downturn, and even with academic funding dried up, you still would’ve seen substantial investments in AI had it shown good results. Expert systems did have some success, but nowhere near what we see AI systems do today, and with none of the momentum but all of the brittleness. This seems like an important crux to me: will AI systems fulfil the expectations investors have for them?

Moore’s Law and the Future of Compute #

Improving these days means scaling up. One reason why scaling might fail is if the hardware that is used to train AI models stops improving.

Moore’s Law is the dictum that the number of transistors on a chip will double every ~2 years, and as a consequence hardware performance is able to double every ~2 years (Hobbhahn and Besiroglu 2022). (Coincidentally, Gordon Moore died last week at the age of 94, survived by his Law.) It’s often claimed that Moore’s Law will slow as the size of transistors (and this fact never ceases to amaze me) approaches the silicon atom limit. In Eden’s words, Moore’s Law looks played out.

I’m no expert at semiconductors or GPUs, but as I understand things it’s (1) not a given that Moore’s Law will fail in the next decade and (2) quite possible that, even if it does, hardware performance will keep running on improvements other than increased transistor density. It wouldn’t be the first time something like this happened: single-thread performance went off-trend as Dennard scaling failed around 2005, but transistor counts kept rising thanks to increasing numbers of cores:

[Figure: chip trend data showing transistor counts continuing to climb after single-thread performance went off-trend around 2005.]
Some of the technologies that could keep GPU performance going as the atom limit approaches include vertical scaling, advanced packaging, new transistor designs and 2D materials as well as improved architectures and connectivity. (To be clear, I don’t have a detailed picture of what these things are; I’m mostly just deferring to the linked source.) TSMC, Samsung and Intel all have plans for <2 nm process nodes (the current SOTA is 3 nm). Some companies are exploring more out-there solutions, like analog computing for speeding up low-precision matrix multiplication. Technologies on exponential trajectories always seem to be running out of far-frontier ideas, until they aren’t (at least so long as there is immense pressure to innovate, as for semiconductors there is). Peter Lee said in 2016, “The number of people predicting the death of Moore’s law doubles every two years.” By the end of 2019, the Metaculus community gave “Moore’s Law will end by 2025” 58%, whereas now one oughtn’t give it more than a few measly per cent.[6]

Is Transformative AI on the Horizon? #

But the main thing we care about here is not FLOP/s, and not even FLOP/s per dollar, but how much compute AI labs can afford to pour into a model. That’s affected by a number of things beyond theoretical peak performance, including hardware costs, energy efficiency, line/die yields, utilisation and the amount of money that a lab is willing to spend. So will we get enough compute to train a TAI in the next few decades?

There are many sophisticated attempts to answer that question – here’s one that isn’t, but that is hopefully easier to understand.

Daniel Kokotajlo imagines what you could do with 1e35 FLOP of compute on current GPU architectures. That’s a lot of compute – about 11 orders of magnitude more than what today’s largest models were trained with (Sevilla et al. 2022). The post gives a dizzying picture of just how much you can do with such an abundance of computing power. Now it’s true that we don’t know for sure whether scaling will keep working, and it’s also true that there can be other important bottlenecks besides compute, like data. But anyway something like 1e34 to 1e36 of 2022-compute seems like it could be enough to create TAI.

Entertain that notion and make the following assumptions:

  • The price-performance of AI chips seems to double every 1.5 to 3.1 years (Hobbhahn and Besiroglu 2022); assume that that’ll keep going until 2030, after which the doubling time will double as Moore’s Law fails.
  • Algorithmic progress on ImageNet seems to effectively halve compute requirements every 4 to 25 months (Erdil and Besiroglu 2022); assume that the doubling time is 50% longer for transformers.[7]
  • Spending on training runs for ML systems seems to roughly double every 6 to 10 months; assume that that’ll continue until we reach a maximum of $10B.[8]
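The three bullet-point assumptions can be wired into a small Monte Carlo sketch. The baseline figures here (a ~1e24-FLOP largest run in 2023, ~$50M of starting spend) and the 1e34-1e36 requirement range are my own fill-ins from the surrounding text, not the author's exact model, so this illustrates the structure of the calculation rather than reproducing its precise output.

```python
import random

def sample_tai_year(rng, horizon=2100):
    """One Monte Carlo draw of the year a TAI-scale training run first happens."""
    # Effective 2022-FLOP needed for TAI: log-uniform over 1e34..1e36 (assumed).
    required = 10 ** rng.uniform(34, 36)
    # Hardware price-performance doubling time (years): 1.5..3.1 until 2030,
    # twice as slow afterwards.
    hw_dt = rng.uniform(1.5, 3.1)
    # Algorithmic halving time: 4..25 months on ImageNet, assumed 50% longer for
    # transformers; a halving of requirements counts as a doubling of effective compute.
    algo_dt = rng.uniform(4, 25) * 1.5 / 12.0  # years
    # Spending doubling time: 6..10 months, until a $10B cap.
    spend_dt = rng.uniform(6, 10) / 12.0  # years
    spend, spend_cap = 50e6, 10e9   # assumed 2023 spend and the post's cap
    compute = 1e24                  # assumed effective FLOP of 2023's largest run

    year = 2023
    while year < horizon:
        if compute >= required:
            return year
        hw_rate = (1.0 if year < 2030 else 0.5) / hw_dt
        growth = hw_rate + 1.0 / algo_dt        # doublings of effective compute/yr
        if spend < spend_cap:
            growth += 1.0 / spend_dt
            spend = min(spend * 2 ** (1.0 / spend_dt), spend_cap)
        compute *= 2 ** growth
        year += 1
    return horizon  # never reached TAI within the horizon

rng = random.Random(0)
draws = [sample_tai_year(rng) for _ in range(20_000)]
p2040 = sum(y <= 2040 for y in draws) / len(draws)
p2045 = sum(y <= 2045 for y in draws) / len(draws)
print(f"P(TAI by 2040) ~ {p2040:.2f}, P(TAI by 2045) ~ {p2045:.2f}")
```

How sensitive the answer is to the baseline compute, starting spend and requirement range is itself instructive: shifting any of them by an order of magnitude moves the median year by several years.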

What all that gives you is 50% probability of TAI by 2040, and 80% by 2045:

[Figure: the model’s cumulative probability of TAI arriving by a given year.]
That is a simple model of course. There’s a far more sophisticated and rigorous version, namely Cotra (2020) which gives a median of ~2050 (though she’s since changed her best guess to a median of ~2040). There are many reasons why my simple model might be wrong:

  • Scaling laws may fail and/or, as models get larger, scaling may get increasingly harder at a rate that exceeds ML researchers’ efforts to make scaling less hard.
  • Scaling laws may continue to hold but a model trained with 1e35 2022-FLOP does not prove transformative. Either more compute is needed, or new architectures are needed.
  • 1e35 FLOP may be orders of magnitude more than what is needed to create TAI. For example, this Metaculus question has a community prediction of 1e28 to 1e33 FLOP for the largest training run prior to the first year in which GWP growth exceeds 30%; plugging that range into the model as a 90% CI gives a terrifying median estimate of 2029.
  • Hardware price-performance progress slows more and/or earlier than assumed, or slows less and/or later than assumed.
  • The pace of algorithmic advancements may slow down or increase, or the doubling time of algorithmic progress for prospective-transformative models may be lesser or greater than estimated.
  • ML researchers may run out of data, or may run out of high-quality (like books, Wikipedia) or even low-quality (like Reddit) data; see e.g. Villalobos et al. (2022) which forecasts high-quality text data being exhausted in 2023 or thereabouts, or Chinchilla’s wild implications and the discussion there.
  • An extreme geopolitical tail event, such as a great power conflict between the US and China, may occur.
  • Increasingly powerful AI systems may help automate or otherwise speed up AI progress.
  • Social resistance and/or stringent regulations may diminish investment and/or hinder progress.
  • Unknown unknowns arise.

Still, I really do think a 1e35 2022-FLOP training run could be enough (>50% likely, say) for TAI, and I really do think, on roughly this model, we could get such a training run by 2040 (also >50% likely). One of the main reasons why I think so is that as AI systems get increasingly more powerful and useful (and dangerous), incentives will keep pointing in the direction of AI capabilities increases, and funding will keep flowing into efforts to keep scaling laws going. And if TAI is on the horizon, that suggests capabilities (and as a consequence, business opportunities) will keep improving.

You Won’t Find Reliability on the Frontier #

One way that AI systems can disappoint is if it turns out they are, and for the foreseeable future remain, chronically unreliable. Eden writes, “[Which] areas of the economy can deal with 99% correct solutions? My answer is: ones that don’t create/capture most of the value.” And people often point out that modern AI systems, and large language models (henceforth, LLMs) in particular, are unreliable. (I take reliable to mean something like “consistently does what you expect, i.e. doesn’t fail”.) This view is both true and false:

  • AI systems are highly unreliable if you only look at frontier capabilities. At any given time, an AI system will tend to succeed only some of the time at the <10% most impressive tasks it is capable of. These tasks are the ones that will get the most attention, and so the system will seem unreliable.
  • AI systems are pretty reliable if you only look at long-familiar capabilities. For any given task, successive generations of AI systems will generally (not always) get better and better at it. These tasks are old news: we take it for granted that AIs will do them correctly.

John McCarthy lamented: “As soon as it works, no one calls it AI anymore.” Larry Tesler declared: “AI is whatever hasn’t been done yet.”

Take for example the sorting of randomly generated single-digit integer lists. Two years ago janus tested this on GPT-3 and found that, even with a 32-shot (!) prompt, GPT-3 managed to sort lists of 5 integers only 10/50 times, and lists of 10 integers 0/50 times. (A 0-shot, Python-esque prompt did better at 38/50 and 2/50 respectively). I tested the same thing with ChatGPT using GPT-3 and it got it right 5/5 times for 10-integer lists.[9] I then asked it to sort five 10-integer lists in one go, and it got 4/5 right! (NB: I’m pretty confident that this improvement didn’t come with ChatGPT exactly, but rather with the newer versions of GPT-3 that ChatGPT is built on top of.)
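A reliability check like janus’s takes only a few lines to script. In the sketch below, `query_model` is a hypothetical stub standing in for a real API call (here it just answers via Python’s `sorted`, so the harness trivially scores 100%); swapping in an actual model client turns it into the same experiment.

```python
import ast
import random

def query_model(xs):
    """Hypothetical stub for an LLM call; a real client would go here.
    For demonstration it answers perfectly using Python's sorted()."""
    return str(sorted(xs))

def sort_accuracy(n_trials=50, list_len=10, seed=0):
    """Fraction of random single-digit integer lists the 'model' sorts correctly."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        xs = [rng.randint(0, 9) for _ in range(list_len)]
        answer = query_model(xs)
        try:
            if ast.literal_eval(answer) == sorted(xs):
                correct += 1
        except (ValueError, SyntaxError):
            pass  # unparseable answers count as failures
    return correct / n_trials

print(sort_accuracy())  # 1.0 with the perfect stub
```

Re-running the same harness against successive model generations is exactly the kind of fixed-task comparison that makes the reliability improvement visible.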

(Eden also brings up the problem of accountability. I agree that this is an issue. Modern AI systems are basically inscrutable. That is one reason why it is so hard to make them safe. But I don’t expect this flaw to stop AI systems from being put to use in any except the most safety-critical domains, so long as companies expect those systems to win them market dominance and/or make a profit.)

Autonomous Driving #

But then why are autonomous vehicles (henceforth, AVs) still not reliable enough to be widely used? I suspect because driving a car is not a single task, but a task complex, a bundle of many different subtasks with varying inputs. The overall reliability of driving is highly dependent on the performance of those subtasks, and failure in any one of them could lead to overall failure. Cars are relatively safety-critical: to be widely adopted, autonomous cars need to be able to reliably perform ~all subtasks you need to master to drive a car. As the distribution of the difficulties of these subtasks likely follows a power law (or something like it), the last 10% will always be harder to get right than the first 90%, and progress will look like it’s “almost there” for years before the overall system is truly ready, as has also transparently been the case for AVs. I think this is what Eden is getting at when he writes that it’s “hard to overstate the difference between solving toy problems like keeping a car between some cones on an open desert, and having a car deal with unspecified situations involving many other agents and uncertain info navigating a busy city street”.
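The compounding effect of many subtasks can be made concrete. Assuming (unrealistically) that ~100 subtask failures are independent, per-subtask reliability multiplies out, which is why one or two nines per subtask is nowhere near enough:

```python
# Overall reliability of a "task complex" of n independent subtasks is the
# product of the per-subtask reliabilities (independence is a simplification).
n_subtasks = 100
for per_task in (0.99, 0.999):
    overall = per_task ** n_subtasks
    print(f"{per_task:.1%} per subtask -> {overall:.1%} overall")
# 99.0% per subtask -> 36.6% overall
# 99.9% per subtask -> 90.5% overall
```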

This seems like a serious obstacle for more complex AI applications like driving. And what we want AI for is complicated tasks – simple tasks are easy to automate with traditional software. I think this is some reason to think an AI winter is more likely, but only a minor one.

One, I don’t think what has happened to AVs amounts to an AV winter. Despite expectations clearly having been unmet, and public interest clearly having declined, my impression (though I couldn’t find great data on this) is that investment in AVs hasn’t declined much, and maybe not at all (apparently 2021 saw >$12B of funding for AV companies, above the yearly average of the past decade[10]), and also that AV patents are steadily rising (both in absolute numbers and as a share of driving technology patents). Autonomous driving exists on a spectrum anyway; we do have “conditionally autonomous” L3 features like cruise control and auto lane change in cars on the road today, with adoption apparently increasing every year. The way I see it, AVs have undergone the typical hype cycle, and are now by steady, incremental change climbing the so-called slope of enlightenment. Meaning: plausibly, even if expectations for LLMs and other AI systems are mostly unmet, there still won’t be an AI winter comparable to previous winters as investment plateaus rather than declines.

Two, modern AI systems, and LLMs specifically, are quite unlike AVs. Again, cars are safety-critical machines. There’s regulation, of course. But people also just don’t want to get in a car that isn’t highly reliable (where highly reliable means something like “far more reliable than an off-brand charger”). For LLMs, there’s no regulation, and people are incredibly motivated to use them even in the absence of safeguards (in fact, especially in the absence of safeguards). I think there are lots of complex tasks that (1) aren’t safety-critical (i.e., where accidents aren’t that costly) but (2) can be automated and/or supported by AI systems.

Costs and Profitability #

Part of why I’m discussing TAI is that it’s probably correlated with other AI advancements, and part is that, despite years of AI researchers’ trying to avoid such expectations, people are now starting to suspect that AI labs will create TAI in this century. Investors mostly aren’t betting on TAI – as I understand it, they generally want a return on their investment in <10 years, and had they expected AGI in the next 10-20 years they would have been pouring far more than some measly hundreds of millions (per investment) into AI companies today. Instead, they expect – I’m guessing – tools that will broadly speed up labour, automate common tasks and make possible new types of services and products.

Ignoring TAI, will systems similar to ChatGPT, Bing/Sydney and/or modern image generators become profitable within the next 5 or so years? I think they will within 1-2 years if they aren’t already. Surely the demand is there. I have been using ChatGPT, Bing/Sydney and DALL-E 2 extensively since they were released, would be willing to pay non-trivial sums for all these services and think it’s perfectly reasonable and natural to do so (and I’m not alone in this, ChatGPT reportedly having reached 100M monthly active users two months after launch, though this was before the introduction of a paid tier; by way of comparison, Twitter reportedly has ~450M).[11]

Eden writes: “The All-In podcast folks estimated a ChatGPT query as being about 10x more expensive than a Google search. I’ve talked to analysts who carefully estimated more like 3-5x. In a business like search, something like a 10% improvement is a killer app. 3-5x is not in the running!”

An estimate by SemiAnalysis suggests that ChatGPT (prior to the release of GPT-4) costs $700K/day in hardware operating costs, meaning (if we assume 13M active users) ~$0.054/user/day or ~$1.6/user/month (the subscription fee for ChatGPT Plus is $20/user/month). That’s $700K × 365 = $255M/year in hardware operating costs alone, quite a sum, though to be fair these costs likely exceed operational costs, employee salaries, marketing and so on by an order of magnitude or so. OpenAI apparently expects $200M revenue in 2023 and a staggering $1B by 2024.
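The per-user arithmetic behind those figures works out as follows (the $700K/day and the assumed 13M active users are the numbers from the paragraph above):

```python
daily_hw_cost = 700_000        # dollars/day, SemiAnalysis estimate
active_users = 13_000_000      # assumed active users

cost_per_user_day = daily_hw_cost / active_users
cost_per_user_month = cost_per_user_day * 30
annual_hw_cost = daily_hw_cost * 365

print(f"${cost_per_user_day:.3f}/user/day")      # ~$0.054
print(f"${cost_per_user_month:.2f}/user/month")  # ~$1.62
print(f"${annual_hw_cost / 1e6:.1f}M/year")      # ~$255.5M
```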

At the same time, as mentioned in a previous section, the hardware costs of inference are decreasing rapidly: the price-performance of AI accelerators doubles every ~2.1 years (Hobbhahn and Besiroglu 2022).[12] So even if Eden is right that GPT-like models are 3-5x too expensive to beat old-school search engines right now, based on hardware price-performance trends alone that difference will be ~gone in 3-6 years (though I’m assuming there’s no algorithmic progress for inference, and that traditional search queries won’t get much cheaper). True, there will be better models available in future that are more expensive to run, but it seems that this year’s models are already capable of capturing substantial market share from traditional search engines, and old-school search engines seem to be declining in quality rather than improving.
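The “~gone in 3-6 years” claim follows directly from compounding the ~2.1-year doubling time against a 3-5x gap, assuming inference costs fall purely with hardware price-performance:

```python
import math

doubling_time_years = 2.1  # AI accelerator price-performance doubling time

def years_to_close(cost_gap):
    """Years until a cost_gap-times price disadvantage is erased,
    if price-performance doubles every doubling_time_years."""
    return doubling_time_years * math.log2(cost_gap)

for gap in (3, 5):
    print(f"{gap}x gap closes in ~{years_to_close(gap):.1f} years")
# 3x gap closes in ~3.3 years
# 5x gap closes in ~4.9 years
```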

It does seem fairly likely (>30%?) to me that AI companies building products on top of foundation models like GPT-3 or GPT-4 are overhyped. For example, Character.AI recently raised >$200M at a $1B valuation for a service that doesn’t really seem to add much value on top of the standard ChatGPT API, especially now that OpenAI has added the system prompt feature. But as I think these companies may disappoint precisely because they are obsoleted by other, more general AI systems, I don’t think their failure would lead to an AI winter.

Reasons Why There Could Be a Winter After All #

Everything I’ve written so far is premised on something like “any AI winter would be caused by AI systems’ ceasing to get more practically useful and therefore profitable”. AIs being unreliable, hardware price-performance progress slowing, compute for inference being too expensive – these all matter only insofar as they affect the practical usefulness/profitability of AI. I think this is by far the most likely way that an AI winter happens, but it’s not the only plausible way; other possibilities include restrictive legislation/regulation, spectacular failures and/or accidents, great power conflicts and extreme economic downturns.

But if we do see an AI winter within a decade, I think the most likely reason will turn out to be one of:

  • Scaling hits a wall; the blessings of scale cease past a certain amount of compute/data/parameters. For example, OpenAI trains GPT-5 with substantially more compute, data and parameters than GPT-4, but it just turns out not to be that impressive.
    • There’s no sign of this happening so far, as far as I can see.
  • True out-of-distribution generalisation is far off, even though AIs keep getting better and more reliable at performing in-distribution tasks.[13] This would partly vindicate some of the LLM reductionists.
    • I find it pretty hard to say whether this is the case currently, maybe because the line between in-distribution and out-of-distribution inputs is often blurry.
    • I also think that plausibly there’d be no AI winter in the next decade even if AIs won’t fully generalise out-of-distribution, because in-distribution data covers a lot of economically useful ground.
  • We run out of high-quality data (cf. Villalobos et al. (2022)).
    • I’m more unsure about this one, but I reckon ML engineers will find ways around it. OpenAI is already paying workers in LMICs to label data; they could pay them to generate data, too.[14] Or you could generate text data from video and audio data. But more likely is perhaps the use of synthetic data. For example, you could generate training data with AIs (cf. Alpaca, which was fine-tuned on GPT-3-generated texts). ML researchers have surely already thought of these things; there just hasn’t been much of a need to try them yet, because cheap text data has been abundant.

I still think an AI winter looks really unlikely. At this point I would put only 5% on an AI winter happening by 2030, where AI winter is operationalised as a drawdown in annual global AI investment of ≥50%. This is unfortunate if you think, as I do, that we as a species are completely unprepared for TAI.

References #

Cotra, Ajeya. 2020. “Forecasting TAI with Biological Anchors.”
Erdil, Ege, and Tamay Besiroglu. 2022. “Revisiting Algorithmic Progress.”
Hobbhahn, Marius, and Tamay Besiroglu. 2022. “Trends in GPU Price-Performance.”
Odlyzko, Andrew. 2010. “Collective Hallucinations and Inefficient Markets: The British Railway Mania of the 1840s.”
Olazaran, Mikel. 1996. “A Sociological Study of the Official History of the Perceptrons Controversy.” Social Studies of Science 26 (3): 611-659.
Sevilla, Jaime, Lennart Heim, Anson Ho, Tamay Besiroglu, Marius Hobbhahn, and Pablo Villalobos. 2022. “Compute Trends across Three Eras of Machine Learning.”
Villalobos, Pablo, Jaime Sevilla, Lennart Heim, Tamay Besiroglu, Marius Hobbhahn, and Anson Ho. 2022. “Will We Run Out of ML Data? Evidence from Projecting Dataset Size Trends.”

Footnotes #

  1. By comparison, there seems to have been a drawdown in corporate investment in AI from 2014 to 2015 of 49%, in solar energy from 2011 to 2013 of 24% and in venture/private investment in crypto companies from 2018 to 2019 of 48%. The share prices of railways in Britain declined by about 60% from 1845 to 1850 as the railway mania bubble burst (Odlyzko 2010), though the railway system of course left Britain forever changed nonetheless. ↩︎

  2. Well, this depends a bit on how you view Moore’s Law. Gordon Moore wrote: “The complexity for minimum component costs has increased at a rate of roughly a factor of two per year.” Dennard scaling – which says that as transistors shrink, their performance improves while power consumption per unit area remains constant – failed around 2005. I think some traditionalists would say that Moore’s Law ended then, but clearly the number of transistors on a chip keeps doubling (only by other means). ↩︎

  3. William Eden actually only talks about artificial general intelligence (AGI), but I think the TAI frame is better when talking about winters, investment and profitability. ↩︎

  4. It’s interesting to note that the term AI winter was inspired by the notion of a nuclear winter. AI researchers in the 1980s used it to describe a calamity that would befall themselves, namely a lack of funding, and, true, both concepts involve stagnation and decline. But a nuclear winter happens after nuclear weapons are used. ↩︎

  5. Apparently the collapse of the LISP machine market was also a contributing factor. LISP machines were expensive workstations tailored to the use of LISP, at the time the preferred programming language of AI researchers. As AI programs were ~always written in LISP, and required a lot of compute and memory for the time, the loss of LISP machines was a serious blow to AI research. It’s a bit unclear to me how exactly the decline of LISP machines slowed AI progress beyond that, but perhaps it forced a shift to less compute- and/or memory-hungry approaches. ↩︎

  6. The question is actually operationalised as: “Will the transistors used in the CPU of Apple’s most modern available iPhone model on January 1st, 2030 be of the same generation as those used in the CPU of the most modern available iPhone on January 1st, 2025?” ↩︎

  7. That said, MosaicBERT (2023) achieves similar performance to BERT-Base (2018) with lower costs but seemingly more compute. I estimate that BERT-Base needed ~1.2e18 FLOP in pre-training, and MosaicBERT needed ~1.6e18. I’m not sure if this is an outlier, but it could suggest that the algorithmic doubling time is even longer for text models. When I asked about this, one of the people who worked on MosaicBERT told me: “[W]e ablated each of the other changes and all of them helped. We also had the fastest training on iso hardware a few months ago (as measured by MLPerf), and MosaicBERT has gotten faster since then.” ↩︎

  8. $10B may seem like a lot now, but I’m thinking world-times where this is a possibility are world-times where companies have already spent $1B on GPT-6 or whatever and seen that it does amazing things, and is plausibly not that far from being transformative. And spending $10B to get TAI seems like an obviously profitable decision. Companies spend 10x-100x that amount on some mergers and acquisitions, yet they’re trivial next to TAI or even almost-TAI. If governments get involved, $10B is half of a Manhattan-project-equivalent, a no-brainer. ↩︎

  9. Example prompt: “Can you sort this list in ascending order? [0, 8, 6, 5, 1, 1, 1, 8, 3, 7]”. ↩︎

  10. FT (2022): “It has been an outrageously expensive endeavour, of course. McKinsey put the total invested at over $100bn since 2010. Last year alone, funding into autonomous vehicle companies exceeded $12bn, according to CB Insights.” – If those numbers are right, that at least suggests the amount of funding in 2021 was substantially higher than the average over the last decade, a picture which seems inconsistent with an AV winter. ↩︎

  11. Well, there is the ethical concern. ↩︎

  12. I’m not exactly sure whether this analysis is done on training performance alone, but I expect trends in training performance to be highly correlated with trends in inference performance. Theoretical peak performance isn’t the only thing that matters – e.g. interconnect speed matters too – but it seems like the most important component.

    I’m also guessing that demand for inference compute is rising rapidly relative to training compute, and that we may see R&D on GPUs specialised for inference in the future. I think that hasn’t been the focus so far, as training compute has been the main bottleneck. ↩︎

  13. By true out-of-distribution generalisation, I mean to point at something like “AI systems are able to find ideas obviously drawn from outside familiar distributions”. To make that more concrete, I mean the difference between (a) AIs generating entirely new Romantic-style compositions and (b) AIs ushering in novel kinds of music the way von Weber, Beethoven, Schubert and Berlioz developed Romanticism. ↩︎

  14. I’m not confident that this would scale, though. A quick back-of-the-envelope calculation suggests OpenAI would get the equivalent of about 0.016% of the data used to train Chinchilla if it spent the equivalent of 10 well-paid engineers’ salaries (in total ~$200K per month) for one year. That’s not really a lot.

    That also assumes:

    1. A well-paid engineer is paid $200K to $300K annually.
    2. A writer is paid $10 to $15 per hour (this article suggests OpenAI paid that amount for Kenyan labourers – themselves earning only $1.32 to $2 an hour – to provide feedback on data for ChatGPT’s reinforcement learning step).
    3. A writer generates 500 to 1,500 words per hour (that seems reasonable if they stick to writing about themselves or other things they already know well).
    4. A writer works 9 hours per day (the same Kenyan labourers apparently worked 9-hour shifts), about 21 days per month (assumes a 5-day work week).
    5. Chinchilla was trained on ~1.4T tokens which is the equivalent of ~1.05T words (compare with ~374B words for GPT-3 davinci and ~585B words for PaLM) (Sevilla et al. 2022). I use Chinchilla as a point of comparison since that paper, which came after GPT-3 and PaLM were trained, implied LLMs were being trained on too little data.

    Those assumptions imply OpenAI could afford ~88 labourers (90% CI: 66 to 118), who’d generate ~173M words per year (90% CI: 94M to 321M) – as mentioned, the equivalent of ~0.016% of the Chinchilla training data set (90% CI: 0.009% to 0.031%). That in turn implies you’d need ~6,000 years (90% CI: 3,300 to 11,100) to double the size of the Chinchilla data set. ↩︎
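The back-of-the-envelope above can be sketched in a few lines. This uses midpoints of the stated ranges rather than sampling over them, so the outputs land close to, but not exactly on, the Monte Carlo medians quoted in the footnote:

```python
# Back-of-the-envelope: how much training text could be bought for the
# equivalent of 10 well-paid engineers' salaries (~$200K/month total)?
# All inputs are midpoints of the ranges given in footnote 14.

budget_per_year = 200_000 * 12    # $2.4M/year total budget
wage_per_hour = 12.5              # writer wage, midpoint of $10-$15/hour
words_per_hour = 1_000            # midpoint of 500-1,500 words/hour
hours_per_year = 9 * 21 * 12      # 9h/day, ~21 days/month
chinchilla_words = 1.05e12        # ~1.4T tokens ≈ ~1.05T words

cost_per_writer = wage_per_hour * hours_per_year
n_writers = budget_per_year / cost_per_writer
words_per_year = n_writers * hours_per_year * words_per_hour
fraction = words_per_year / chinchilla_words
years_to_double = chinchilla_words / words_per_year

print(f"~{n_writers:.0f} writers, ~{words_per_year / 1e6:.0f}M words/year, "
      f"~{fraction:.4%} of Chinchilla, ~{years_to_double:,.0f} years to double")
```

With these midpoints the sketch gives roughly 85 writers and ~190M words per year, consistent with the ~88 writers and ~173M words medians above.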

Today in "fractally wrong", sigh.

If I were a betting man, I'd bet just the opposite here: >90% odds of a crash in investment in LLMs within a decade. They are right up against the limits of what they can reasonably be expected to do already; to get the machine to do what people actually want out of it will take an entirely different approach.

(Also, the Moore's Law that everyone is actually thinking of when they say Moore's Law -- *single thread* IPS -- ended no later than 2005. It's a testament only to marketing and memetic inertia that we're still talking like it's still a thing twenty years later.)