Simon Willison’s Weblog

On openai 163 markdown 18 blogging 51 writing 23 ethics 106 ...

 

Recent

July 13, 2024

Give people something to link to so they can talk about your features and ideas

If you have a project, an idea, a product feature, or anything else that you want other people to understand and have conversations about... give them something to link to!

[... 672 words]

We respect wildlife in the wilderness because we’re in their house. We don’t fully understand the complexity of most ecosystems, so we seek to minimize our impact on those ecosystems since we can’t always predict what outcomes our interactions with nature might have.

In software, many disastrous mistakes stem from not understanding why a system was built the way it was, but changing it anyway. It’s super common for a new leader to come in, see something they see as “useless”, and get rid of it – without understanding the implications. Good leaders make sure they understand before they mess around.

Jacob Kaplan-Moss

# 2:12 pm / jacob-kaplan-moss, software-engineering, management

Add tests in a commit before the fix. They should pass, showing the behavior before your change. Then, the commit with your change will update the tests. The diff between these commits represents the change in behavior. This helps the author test their tests (I've written tests thinking they covered the relevant case but didn't), the reviewer to more precisely see the change in behavior and comment on it, and the wider community to understand what the PR description is about.

Ed Page

# 5:28 am / git

Third, X fails to provide access to its public data to researchers in line with the conditions set out in the DSA. In particular, X prohibits eligible researchers from independently accessing its public data, such as by scraping, as stated in its terms of service. In addition, X's process to grant eligible researchers access to its application programming interface (API) appears to dissuade researchers from carrying out their research projects or leave them with no other choice than to pay disproportionally high fees.

European Commission

# 3:52 am / apis, twitter, europe

July 12, 2024

Free-threaded CPython is ready to experiment with! The Python 3.13 beta releases that include a "free-threaded" version that removes the GIL are now available to test! A team from Quansight Labs, home of the PyData core team, just launched py-free-threading.github.io to help document the new builds and track compatibility with Python's larger ecosystem.

Free-threading mode will not be enabled in Python installations by default. You can install special builds that have the option enabled today - I used the macOS installer and, after enabling the new build in the "Customize" panel in the installer, ended up with a /usr/local/bin/python3.13t binary which shows "Python 3.13.0b3 experimental free-threading build" when I run it.

Here's my TIL describing my experiments so far installing and running the 3.13 beta on macOS, which also includes a correction to an embarrassing bug that Claude introduced but I failed to catch!

# 11:42 pm / concurrency, gil, python, threading

datasette-python. I just released a small new plugin for Datasette to assist with debugging. It adds a python subcommand which runs a Python process in the same virtual environment as Datasette itself.

I built it initially to help debug some issues in Datasette installed via Homebrew. The Homebrew installation has its own virtual environment, and sometimes it can be useful to run commands like pip list in the same environment as Datasette itself.

Now you can do this:

brew install datasette
datasette install datasette-python
datasette python -m pip list

I built a similar plugin for LLM last year, called llm-python - it's proved useful enough that I duplicated the design for Datasette.

# 11:17 pm / plugins, projects, python, datasette

Searching an aerial photo with text queries. Robin Wilson built a demo that lets you search a large aerial photograph of Southampton for things like "roundabout" or "tennis court". He explains how it works in detail: he used the SkyCLIP model, which is trained on "5.2 million remote sensing image-text pairs in total, covering more than 29K distinct semantic tags" to generate embeddings for 200x200 image segments (with 100px of overlap), then stored them in Pinecone.

# 6:07 pm / gis, embeddings, clip

The Death of the Junior Developer (via) Steve Yegge's speculative take on the impact LLM-assisted coding could have on software careers.

Steve works on Cody, an AI programming assistant, so he's hardly an unbiased source of information. Nevertheless, his collection of anecdotes here matches what I've been seeing myself.

Steve coins the term here CHOP, for Chat Oriented Programming, where the majority of code is typed by an LLM that is directed by a programmer. Steve describes it as "coding via iterative prompt refinement", and argues that the models only recently got good enough to support this style with GPT-4o, Gemini Pro and Claude 3 Opus.

I've been experimenting with this approach myself on a few small projects (see this Claude example) and it really is a surprisingly effective way to work.

Also included: a story about how GPT-4o produced a bewitchingly tempting proposal with long-term damaging effects that only a senior engineer with deep understanding of the problem space could catch!

I'm in strong agreement with this thought on the skills that are becoming most important:

Everyone will need to get a lot more serious about testing and reviewing code.

# 3:34 pm / steve-yegge, ai, generative-ai, llms, ai-assisted-programming

Fighting bots is fighting humans [...] remind you that "only allow humans to access" is just not an achievable goal. Any attempt at limiting bot access will inevitably allow some bots through and prevent some humans from accessing the site, and it's about deciding where you want to set the cutoff. I fear that media outlets and other websites, in attempting to "protect" their material from AI scrapers, will go too far in the anti-human direction.

Molly White

# 2:45 pm / ai, molly-white

Why The Atlantic signed a deal with OpenAI. Interesting conversation between Nilay Patel and The Atlantic CEO (and former journalist/editor) Nicholas Thompson about the relationship between media organizations and LLM companies like OpenAI.

On the impact of these deals on the ongoing New York Times lawsuit:

One of the ways that we [The Atlantic] can help the industry is by making deals and setting a market. I believe that us doing a deal with OpenAI makes it easier for us to make deals with the other large language model companies if those come about, I think it makes it easier for other journalistic companies to make deals with OpenAI and others, and I think it makes it more likely that The Times wins their lawsuit.

How could it help? Because deals like this establish a market value for training content, important for the fair use component of the legal argument.

# 2:35 pm / new-york-times, ai, openai, llms, nilay-patel

July 11, 2024

[On Paddington 3] If this movie is anywhere near as good as the second one, we are going to need to have an extremely serious conversation about this being one of the greatest film trilogies ever made.

Brian Grubb

# 9:11 pm / film, brian-grubb

The economics of a Postgres free tier (via) Xata offer a hosted PostgreSQL service with a generous free tier (15GB of volume). I'm very suspicious of free tiers that don't include a detailed breakdown of the unit economics... and in this post they've described exactly that, in great detail.

The trick is that they run their free tier on shared clusters - with each $630/month cluster supporting 2,000 free instances for $0.315 per instance per month. Then inactive databases get downgraded to even cheaper auto-scaling clusters that can host 20,000 databases for $180/month (less than 1c each).

They also cover the volume cost of $0.10/GB/month - so up to $1.50/month per free instance, but most instances only use a small portion of that space.

It's reassuring to see this spelled out in so much detail.

# 7:26 pm / postgresql, startups, saas

My main concern is that the substantial cost to develop and run Al technology means that Al applications must solve extremely complex and important problems for enterprises to earn an appropriate return on investment.

We estimate that the Al infrastructure buildout will cost over $1tn in the next several years alone, which includes spending on data centers, utilities, and applications. So, the crucial question is: What $1tn problem will Al solve? Replacing low-wage jobs with tremendously costly technology is basically the polar opposite of the prior technology transitions I've witnessed in my thirty years of closely following the tech industry.

Jim Covello, Goldman Sachs

# 2:35 am / ai, generative-ai

July 10, 2024

Early Apple tech bloggers are shocked to find their name and work have been AI-zombified (via)

TUAW (“The Unofficial Apple Weblog”) was shut down by AOL in 2015, but this past year, a new owner scooped up the domain and began posting articles under the bylines of former writers who haven’t worked there for over a decade.

They're using AI-generated images against real names of original contributors, then publishing LLM-rewritten articles because they didn't buy the rights to the original content!

# 10:48 pm / ethics, ai, slop

Yeah, unfortunately vision prompting has been a tough nut to crack. We've found it's very challenging to improve Claude's actual "vision" through just text prompts, but we can of course improve its reasoning and thought process once it extracts info from an image.

In general, I think vision is still in its early days, although 3.5 Sonnet is noticeably better than older models.

Alex Albert (Anthropic)

# 6:56 pm / ai, prompt-engineering, generative-ai, llms, anthropic, claude, vision-llms, alex-albert

Anthropic cookbook: multimodal. I'm currently on the lookout for high quality sources of information about vision LLMs, including prompting tricks for getting the most out of them.

This set of Jupyter notebooks from Anthropic (published four months ago to accompany the original Claude 3 models) is the best I've found so far. Best practices for using vision with Claude includes advice on multi-shot prompting with example, plus this interesting think step-by-step style prompt for improving Claude's ability to count the dogs in an image:

You have perfect vision and pay great attention to detail which makes you an expert at counting objects in images. How many dogs are in this picture? Before providing the answer in <answer> tags, think step by step in <thinking> tags and analyze every part of the image.

# 6:38 pm / ai, jupyter, generative-ai, llms, anthropic, claude, vision-llms

Vision language models are blind (via) A new paper exploring vision LLMs, comparing GPT-4o, Gemini 1.5 Pro, Claude 3 Sonnet and Claude 3.5 Sonnet (I'm surprised they didn't include Claude 3 Opus and Haiku, which are more interesting than Claude 3 Sonnet in my opinion).

I don't like the title and framing of this paper. They describe seven tasks that vision models have trouble with - mainly geometric analysis like identifying intersecting shapes or counting things - and use those to support the following statement:

The shockingly poor performance of four state-of-the-art VLMs suggests their vision is, at best, like of a person with myopia seeing fine details as blurry, and at worst, like an intelligent person that is blind making educated guesses.

While the failures they describe are certainly interesting, I don't think they justify that conclusion.

I've felt starved for information about the strengths and weaknesses of these vision LLMs since the good ones started becoming available last November (GPT-4 Vision at OpenAI DevDay) so identifying tasks like this that they fail at is useful. But just like pointing out an LLM can't count letters doesn't mean that LLMs are useless, these limitations of vision models shouldn't be used to declare them "blind" as a sweeping statement.

# 6:17 pm / ai, generative-ai, llms, vision-llms

Content slop has three important characteristics. The first being that, to the user, the viewer, the customer, it feels worthless. This might be because it was clearly generated in bulk by a machine or because of how much of that particular content is being created. The next important feature of slop is that feels forced upon us, whether by a corporation or an algorithm. It’s in the name. We’re the little piggies and it’s the gruel in the trough. But the last feature is the most crucial. It not only feels worthless and ubiquitous, it also feels optimized to be so. The Charli XCX “Brat summer” meme does not feel like slop, nor does Kendrick Lamar’s extremely long “Not Like Us” roll out. But Taylor Swift’s cascade of alternate versions of her songs does. The jury’s still out on Sabrina Carpenter. Similarly, last summer’s Barbenheimer phenomenon did not, to me, feel like slop. Dune: Part Two didn’t either. But Deadpool & Wolverine, at least in the marketing, definitely does.

Ryan Broderick

# 5:43 pm / marketing, ai, slop

July 9, 2024

Claude: You can now publish, share, and remix artifacts. Artifacts is the feature Anthropic released a few weeks ago to accompany Claude 3.5 Sonnet, allowing Claude to create interactive HTML+JavaScript tools in response to prompts.

This morning they added the ability to make those artifacts public and share links to them, which makes them even more useful!

Here's my box shadow playground from the other day, and an example page I requested demonstrating the Milligram CSS framework - Artifacts can load most code that is available via cdnjs so they're great for quickly trying out new libraries.

# 10:25 pm / ai, generative-ai, llms, anthropic, claude

hangout_services/thunk.js (via) It turns out Google Chrome (via Chromium) includes a default extension which makes extra services available to code running on the *.google.com domains - tweeted about today by Luca Casonato, but the code has been there in the public repo since October 2013 as far as I can tell.

It looks like it's a way to let Google Hangouts (or presumably its modern predecessors) get additional information from the browser, including the current load on the user's CPU. Update: On Hacker News a Googler confirms that the Google Meet "troubleshooting" feature uses this to review CPU utilization.

I got GPT-4o to help me figure out how to trigger it (I tried Claude 3.5 Sonnet first but it refused, saying "Doing so could potentially violate terms of service or raise security and privacy concerns"). Paste the following into your Chrome DevTools console on any Google site to see the result:

chrome.runtime.sendMessage(
  "nkeimhogjdpnpccoofpliimaahmaaome",
  { method: "cpu.getInfo" },
  (response) => {
    console.log(JSON.stringify(response, null, 2));
  },
);

I get back a response that starts like this:

{
  "value": {
    "archName": "arm64",
    "features": [],
    "modelName": "Apple M2 Max",
    "numOfProcessors": 12,
    "processors": [
      {
        "usage": {
          "idle": 26890137,
          "kernel": 5271531,
          "total": 42525857,
          "user": 10364189
        }
      }, ...

The code doesn't do anything on non-Google domains.

Luca says this - I'm inclined to agree:

This is interesting because it is a clear violation of the idea that browser vendors should not give preference to their websites over anyone elses.

# 5:50 pm / browsers, chrome, google, ai, chatgpt, llms, claude

Deactivating an API, one step at a time (via) Bruno Pedro describes a sensible approach for web API deprecation, using API keys to first block new users from using the old API, then track which existing users are depending on the old version and reaching out to them with a sunset period.

The only suggestion I'd add is to implement API brownouts - short periods of time where the deprecated API returns errors, several months before the final deprecation. This can help give users who don't read emails from you notice that they need to pay attention before their integration breaks entirely.

I've seen GitHub use this brownout technique successfully several times over the last few years - here's one example.

# 5:23 pm / apis, github

Chrome's biggest innovation was the short release cycle with a silent unceremonious autoupdate.

When updates were big, rare, and manual, buggy and outdated browsers were lingering for soo long, that we were giving bugs names. We documented the bugs in magazines and books, as if they were a timeless foundation of WebDev.

Nowadays browser vendors can fix bugs in 6 weeks (even Safari can…). New-ish stuff is still buggy, but rarely for long enough for the bugs to make it to schools' curriculums.

Kornel Lesiński

# 10:43 am / browsers, chrome, web-standards

Inside the labs we have these capable models, and they're not that far ahead from what the public has access to for free. And that's a completely different trajectory for bringing technology into the world that what we've seen historically. It's a great opportunity because it brings people along. It gives them intuitive sense for the capabilities and risks and allows people to prepare for the advent of bringing advanced AI into the world.

Mira Murati

# 3:07 am / ai, openai, llms

July 8, 2024

Jevons paradox (via) I've been thinking recently about how the demand for professional software engineers might be affected by the fact that LLMs are getting so good at producing working code, when prompted in the right way.

One possibility is that the price for writing code will fall, in a way that massively increases the demand for custom solutions - resulting in a greater demand for software engineers since the increased value they can provide makes it much easier to justify the expense of hiring them in the first place.

TIL about the related idea of the Jevons paradox, currently explained by Wikipedia like so:

[...] when technological progress increases the efficiency with which a resource is used (reducing the amount necessary for any one use), but the falling cost of use induces increases in demand enough that resource use is increased, rather than reduced.

# 11:23 pm / ai, generative-ai, llms

Someone elsewhere left a comment like "I CAN’T BELIEVE IT TOOK HER 15 YEARS TO LEARN BASIC READLINE COMMANDS". those comments are very silly and I'm going to keep writing “it took me 15 years to learn this basic thing" forever because I think it's important for people to know that it's normal to take a long time to learn “basic" things

Julia Evans

# 11:15 pm / julia-evans

Type click type by Brian Grubb. I just found out my favourite TV writer, Brian Grubb, is no longer with Uproxx and is now writing for his own newsletter - free on Sunday, paid-subscribers only on Friday. I hit subscribe so fast.

In addition to TV, Brian's coverage of heists - most recently Lego and an attempted heist of Graceland ("It really does look like a bunch of idiots tried to steal and auction off Graceland using Hotmail accounts and they almost got away with it") - is legendary.

I'd love to see more fun little Friday night shows too.

# 9:58 pm / blogging, tv, brian-grubb

Box shadow CSS generator (via) Another example of a tiny personal tool I built using Claude 3.5 Sonnet and artifacts. In this case my prompt was:

CSS for a slight box shadow, build me a tool that helps me twiddle settings and preview them and copy and paste out the CSS

I changed my mind half way through typing the prompt and asked it for a custom tool, and it built me this!

Box shadow CSS generator. Shows a preview, then provides sliders to set Horizontal Offset, Vertical Offset, Blur Radius,  Spread Radius,  Color and Opacity - plus the generated CSS and a Copy to Clipboard button

Here's the full transcript - in a follow-up prompt I asked for help deploying it and it rewrote the tool to use <script type="text/babel"> and the babel-standalone library to add React JSX support directly in the browser - a bit of a hefty dependency (387KB compressed / 2.79MB total) but I think acceptable for this kind of one-off tool.

Being able to knock out tiny custom tools like this on a whim is a really interesting new capability. It's also a lot of fun!

# 7:30 pm / css, projects, ai, generative-ai, llms, anthropic, claude

Geomys, a blueprint for a sustainable open source maintenance firm (via) Filippo Valsorda has been working as a full-time professional open source maintainer for nearly two years now, accepting payments on retainer from companies that depend on his cryptography Go packages.

This has worked well enough that he's now expanding: Geomys (a genus of gophers) is a new company which adds two new "associate maintainers" and an administrative director, covering more projects and providing clients with access to more expertise.

Filipino describes the model like this:

If you’re betting your business on a critical open source technology, you

  1. want it to be sustainably and predictably maintained; and
  2. need occasional access to expertise that would be blisteringly expensive to acquire and retain.

Getting maintainers on retainer solves both problems for a fraction of the cost of a fully-loaded full-time engineer. From the maintainers’ point of view, it’s steady income to keep doing what they do best, and to join one more Slack Connect channel to answer high-leverage questions. It’s a great deal for both sides.

For more on this model, watch Filippo's FOSDEM talk from earlier this year.

# 3:40 pm / go, opensource, filippo-valsorda

Voters in the Clapham and Brixton Hill constituency can rest easy - despite appearances, their Reform candidate Mark Matlock really does exist. [...] Matlock - based in the South Cotswolds, some 100 miles from the constituency in which he is standing - confirmed: "I am a real person." Although his campaign image is Al-generated, he said this was for lack of a real photo of him wearing a tie in Reform's trademark turquoise.

Private Eye

# 3:20 pm / politics, ai, generative-ai

July 7, 2024

Reasons to use your shell’s job control. Julia Evans summarizes an informal survey of useful things you can do with shell job control features - fg, bg, Ctrl+Z and the like. Running tcdump in the background so you can see its output merged in with calls to curl is a neat trick.

# 4:30 pm / unix, julia-evans

Highlights