Learning and Working in the Age of AI: The Best Tools, the Worst Temptations
AI tools for learning have never been better, but the temptation to skip the learning has never been stronger. On gyms, construction sites, and what the evidence says about working your mind.

The tools for learning have never been better. They're astonishing. The temptation to skip the learning part has also never been stronger. Work and learning are different activities, but they blur together. The same tool can serve one while quietly sabotaging the other.
Gym or construction site?
Helen Toner, most famous for being on the OpenAI board that fired Sam Altman (the board that was subsequently fired by the same Sam Altman following his rapid return), has a very useful framing for the core issue. On a construction site, machines are essential for enhancing how much work people can do. We go to a gym to deliberately work our bodies.
On the construction site, a (skilled) worker will outsource everything that can be outsourced to machines. Cranes will lift, power saws will cut, electric screwdrivers will drive screws. You've got limited hours and you want the outcome, the finished product, as efficiently as possible. Most work is like this.
At the gym, the whole point is to use your muscles deliberately. Even when you use machines there, it is to work particular muscles. The product is a worked body. It is supposed to be difficult, even painful (though still enjoyable - most of the time). Learning is usually like this.
Of course, we must learn at work too, often subjects that would not interest us if we were not being paid to think about them - it is here that the temptation to produce without learning is greatest.
New AI tools are overwhelmingly useful for knowledge work - but skilled workers are still required to wield them. That skill comes from deep understanding of their workspace, their problems, their processes. As we gain the ability to produce quickly with AI, there's a risk that we lose these skills if we forget to work out at the gym.
NotebookLM: a masterpiece with a catch
Google's NotebookLM is the best specialised learning tool I've ever used. What started as an AI-generated podcast app has become a multimodal marvel - animations, decks, flashcards, the famous podcast-style overviews, all grounded in sources you provide or that it pulls via search, a Google forte of course.

Here's a video essay on British rationing for kids - I think this is fantastic. I can catch some classic LLM-style turns of phrase, but this is an excellent source for my son. We generated a 15-slide dossier-style deck pitched at the right level for him in one shot. It's historically accurate and very well done. I caught a single arrow not pointing to the right part of the map, and the food on a plate was the wrong size - both image issues that I could have easily fixed in the editor.

The set of artefact options is great, and their quality is surprisingly high. Which is the catch.
They're so good, and so easy to generate, that you can picture a child (or anyone) "doing a project" on the water cycle by uploading three PDFs, clicking a button, and handing in a decent deck. A passable artefact has been produced. Work has been done. How much learning happened is another question. My kid's school asked for new emblem ideas last week and had to specify "no AI images please".
Even vanilla Claude (or ChatGPT) can produce outstanding learning paths and visualisations - just extraordinary for learning...or cheating :)
Used well - engaging with the summaries, testing yourself on the flashcards, treating the podcasts as a prompt to go read the source - it's a genuine force-multiplier for learning. Used the easy way, it's polished output with no understanding underneath. Same tools, same outputs, very different outcomes for the human. Are you working the tool or working your mind? Gym or construction site?

What the evidence says
A 2025 RCT from Bastani et al. tested this directly. They split 120 people into a traditional study group and a ChatGPT group, then ran a surprise retention test 45 days later. The traditional group scored ~69%. The ChatGPT group scored ~58%. The AI group showed a steeper forgetting curve, 'consistent with weaker initial encoding'. The work was done at the construction site, but their minds were not worked as hard as the traditional group's. They understood less later on. Listed here with many other study links.
Out of the lab and into the real world, Anthropic's AI Fluency Index (Feb 2026) analysed around 9,800 multi-turn conversations and measured 11 specific fluency behaviours.
They were able to show higher AI fluency in users who iterated and refined their work. Pushing the model - questioning its reasoning or flagging missing context - drove better results. Accepting the first version was often suboptimal.
Yet interestingly, when the first version of a particular artifact looked great, even fluent users accepted it - they abandoned their good habits.
"In conversations where artifacts are created, users are less likely to identify missing context (-5.2pp), check facts (-3.7pp), or question the model's reasoning by asking it to explain its rationale (-3.1pp)." (Anthropic)
Just like me accepting the impressive first versions from NotebookLM. When the output looks finished, we stop interrogating it. Exactly the wrong reflex: as the report notes, the ability to critically evaluate those outputs will only become more valuable.
The report's three tips for developing AI fluency
- Stay in the conversation - refine, iterate, push back and question.
- Question polished outputs - don't be wowed by the style, check the substance. Ask the model: is it accurate? What is missing?
- Set the terms of the collaboration - explicitly tell Claude to push back on weak assumptions, to look for gaps in thinking, to explain its own reasoning. I found asking for confidence intervals works well (what % certain is it of any output?).
On understanding
How much do you learn from slick videos and polished content, even if you generate it yourself? We need to know when to reach for the tools and when to do without, not to trade depth for ease, or struggle for speed. Kids need to learn to use their minds at the gym, especially when the gyms are this flashy.
We can outsource output, even some thinking, but not understanding. The gym is supposed to be hard.
