There’s an image that has stuck with me. Last spring, we took our 18-month-old to a hotel in the south of England. It had a spa, a pool, some gardens. And, crucially, a playroom for children. I’d taken a book, hoping I’d get a chance to catch up on reading.
I didn’t get a chance to catch up on reading.
Instead, my son woke early every day, and I’d make an espresso in our room before taking him downstairs to the playroom, to give my wife a lie-in. Here I was pre-6am, sitting there with my unread book and my coffee. Watching him learn to draw, clambering onto his tiny chair with his crayons and crumpled paper.
And it was wonderful. I was tired, and if you’d asked me beforehand what I’d like to be doing at 5.45am on holiday, I’d probably have said anything but this. But I was wrong.
How do we know what we want? One of the landmark papers that influenced the development of ChatGPT is called ‘Deep Reinforcement Learning from Human Preferences’. Rather than get humans to articulate what they wanted machines to do, researchers at OpenAI and DeepMind took a simpler approach. They showed humans two short videos of AI behaviour, then asked them to choose which was better. Then they did it again and again. And the AI got better and better, until it mastered a range of tasks.
In other words, humans can’t always explain what they want, but they know it when they see it.
Which makes me think of that image, of my son playing with crayons while I sit tired on the carpet next to him. Of all the images I’ve looked through while putting together a collection for his upcoming second birthday, that’s the one that’s stuck with me. The one pre-dawn, exhausted, looking at crayons instead of a book. Something I’d never have said I’d wanted if you’d asked me beforehand. But something that is everything I want once I’ve seen it.
Thank you for sharing such an intimate moment of deep observation and learning.
That's a lovely moment. You can never be prepared for the joys that parenthood can bring.
What's interesting about the AI story is that it isn't really AI, given the need for constant human intervention. This is definitely a good thing, unless we try to pretend that it's all done by computers.