The pipeline learns to watch television
In Report 1 we built a machine that takes a scene and returns its O*NET vector: the specific federal occupation codes it depicts, and the tasks, work activities, and skills inside them. That version read words. It worked on transcripts.
But most of what a job looks like never makes it into dialogue. A doctor does not say "I am now bag-valve-masking the patient." She just does it, on camera, while saying something else entirely. So we taught the pipeline to watch: it now samples still frames from a clip, hands them to Claude alongside the audio transcript, and grounds each extracted task in either a line of dialogue or a thing visible in the frame.
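The report does not say how frames are chosen from a clip. Here is a minimal sketch that assumes uniform sampling, with one frame taken at the midpoint of each equal-length segment. That rule is our assumption, not necessarily what the pipeline does, and `sample_timestamps` is a hypothetical helper.

```python
def sample_timestamps(duration_s: float, n_frames: int) -> list[float]:
    """Return n_frames timestamps (in seconds), one at the midpoint of each
    equal-length segment of a clip. Hypothetical helper, assumed sampling rule."""
    if duration_s <= 0 or n_frames <= 0:
        raise ValueError("duration_s and n_frames must be positive")
    step = duration_s / n_frames
    # Midpoint of segment i is step * (i + 0.5); rounded to the millisecond.
    return [round(step * (i + 0.5), 3) for i in range(n_frames)]


# A 90-second clip sampled at six frames: one frame every 15 seconds.
print(sample_timestamps(90.0, 6))  # [7.5, 22.5, 37.5, 52.5, 67.5, 82.5]
```

Taking segment midpoints rather than segment starts keeps the sample off the clip's very first and last instants, which are more likely to land on a cut or a title card. A real pipeline might sample on shot boundaries instead.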
Then we pointed it at the most obvious target available, the television doctor, and asked a question you can only answer with thirty years of footage: has the job changed? We took six clips from ER (which premiered in 1994) and six from The Pitt (2025), both minute-to-minute emergency-medicine dramas, and ran them through the same O*NET machine.
One frame in, four tasks out
Here is the method, made concrete. The model sees a frame, identifies the occupations present, and returns the specific O*NET profile items it can justify. Items tagged "visual" are ones it drew from the image rather than the soundtrack: the part a transcript-only reading would have missed.
ER · "Carter Running a Trauma" · scene 12
Occupations detected: Emergency Medicine Physicians, Paramedics. Items returned:
Still: Warner Bros. Television / ER (NBC). Low-resolution frame reproduced for non-commercial research commentary. Items tagged "visual" draw their evidence from the image, not the dialogue.
The same kind of frame, thirty years later, from a show built around a single emergency-department shift:
The Pitt · "Broken Pacemaker" · scene 18
Occupations detected: Emergency Medicine Physicians, Registered Nurses, Paramedics. Items returned:
Still: Warner Bros. Television / The Pitt (HBO Max). Low-resolution frame reproduced for non-commercial research commentary.
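Each returned item has to carry its grounding with it. Below is a minimal sketch of such a record, plus a check that rejects ungrounded items. It assumes a dialogue item cites a verbatim transcript excerpt and a visual item cites the index of a sampled frame. `ProfileItem`, `is_grounded`, and every field name are hypothetical, not the pipeline's actual schema, and the example transcript and items are made up.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass(frozen=True)
class ProfileItem:
    """One O*NET profile item plus the evidence that justifies it (hypothetical schema)."""
    occupation: str                        # e.g. "Paramedics"
    kind: Literal["task", "work_activity", "skill"]
    text: str                              # the item as returned by the model
    source: Literal["dialogue", "visual"]  # where the evidence came from
    quote: str = ""                        # verbatim transcript excerpt (dialogue items)
    frame: Optional[int] = None            # index into the sampled frames (visual items)


def is_grounded(item: ProfileItem, transcript: str, n_frames: int) -> bool:
    """Keep an item only if its evidence points at something that exists."""
    if item.source == "dialogue":
        # The cited excerpt must appear verbatim in the transcript.
        return bool(item.quote) and item.quote in transcript
    # A visual item must cite one of the frames the model was actually shown.
    return item.frame is not None and 0 <= item.frame < n_frames


# Made-up transcript and items, for illustration only.
transcript = "Charging to two hundred. Everybody clear!"
items = [
    ProfileItem("Emergency Medicine Physicians", "task", "Perform defibrillation",
                "dialogue", quote="Charging to two hundred"),
    ProfileItem("Registered Nurses", "task", "Start an IV line", "visual", frame=3),
    ProfileItem("Paramedics", "task", "Transport patient", "visual", frame=9),
]
kept = [it for it in items if is_grounded(it, transcript, n_frames=6)]
print([it.occupation for it in kept])  # ['Emergency Medicine Physicians', 'Registered Nurses']
```

The third item is dropped because it cites frame 9 when only six frames (indices 0 to 5) were sampled.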
Across all twelve clips, 16% of the evidence the model returned was visual: grounded in the frame, invisible to the transcript. Crash carts, defibrillator paddles, IV lines, the choreography of a trauma bay. That is the part of the job television shows you instead of telling you, and it is exactly the part the old pipeline was blind to.
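The visual share is a simple tally over evidence tags. This sketch assumes each evidence item carries a "dialogue" or "visual" tag and that items are pooled across clips. The report does not say whether it pools items or averages per-clip shares, and the two figures can differ when clips return different numbers of items. `source_shares` is a hypothetical name, and the tags below are toy data.

```python
from collections import Counter


def source_shares(tags: list[str]) -> dict[str, float]:
    """Share of pooled evidence items per source tag, e.g. "dialogue" or "visual"."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


# Toy tags, not the report's data: 3 visual items out of 10.
tags = ["dialogue"] * 7 + ["visual"] * 3
print(source_shares(tags))  # {'dialogue': 0.7, 'visual': 0.3}
```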
The job barely moved, and moved completely
Measured by raw skill emphasis, the two eras are almost the same job: the cosine similarity between ER's and The Pitt's O*NET skill vectors is 0.91. Registered Nurses and Emergency Medicine Physicians are the top two occupations in both, by a wide margin. It is still, unmistakably, the emergency room.
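For readers who want the arithmetic behind 0.91: cosine similarity is the dot product of two vectors divided by the product of their lengths. It therefore compares the direction of skill emphasis and ignores overall volume. This minimal sketch assumes each era's vector maps O*NET skill names to pooled evidence weights, since the report does not spell out the weighting. The toy numbers are not the report's data.

```python
import math


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """cos(a, b) = (a . b) / (|a| |b|) for sparse vectors keyed by skill name.
    A skill missing from one vector counts as zero there."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors, not the report's data.
er = {"Coordination": 4.0, "Critical Thinking": 3.0}
pitt = {"Coordination": 3.0, "Critical Thinking": 4.0}
print(round(cosine_similarity(er, pitt), 2))  # 0.96
```

Because the lengths cancel, an era that simply returns more evidence does not look more or less similar on that account alone. Only the relative mix of skills matters.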
What changed were the edges of the portrayal, and they changed in two clear directions.
First, the camera moved into the bay and stayed there. Healthcare occupations rose from 82% of all evidence in ER to 91% in The Pitt. ER's clips spend real time on the civic apparatus around the medicine: firefighters wheeling in the wounded, a hospital administrator, attendings teaching residents, even a news crew and a 911 dispatcher. The Pitt strips almost all of it away and holds on the resuscitation.
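"Healthcare occupations" can be grouped straight from the O*NET-SOC code, whose first two digits give the SOC major group. This sketch assumes "healthcare" means major groups 29 (Healthcare Practitioners and Technical) and 31 (Healthcare Support), and that each evidence item is attributed to one occupation code. The report does not state its exact grouping. `group_shares`, `healthcare_share`, and the toy codes are hypothetical.

```python
from collections import Counter

# SOC major groups keyed by the first two digits of an O*NET-SOC code
# (e.g. Registered Nurses are 29-1141.00). Only the groups used here are listed.
MAJOR_GROUPS = {
    "21": "Community and Social Service",
    "29": "Healthcare Practitioners and Technical",
    "31": "Healthcare Support",
    "33": "Protective Service",
}
HEALTHCARE = {"Healthcare Practitioners and Technical", "Healthcare Support"}


def group_shares(soc_codes: list[str]) -> dict[str, float]:
    """Share of evidence items per SOC major group; unlisted groups become "Other"."""
    counts = Counter(MAJOR_GROUPS.get(code[:2], "Other") for code in soc_codes)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()} if total else {}


def healthcare_share(soc_codes: list[str]) -> float:
    """Combined share of the two healthcare major groups (assumed definition)."""
    return sum(s for g, s in group_shares(soc_codes).items() if g in HEALTHCARE)


# Toy evidence: one occupation code per evidence item.
codes = ["29-1141.00"] * 6 + ["29-1214.00"] * 3 + ["33-2011.00"]
print(group_shares(codes))  # {'Healthcare Practitioners and Technical': 0.9, 'Protective Service': 0.1}
print(healthcare_share(codes))  # 0.9
```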
Second, a whole new kind of work appears. Community & Social Service, an entire branch of the occupation taxonomy that is absent from the ER clips, shows up in The Pitt at 8% of all evidence, driven by Mental Health and Substance Abuse Social Workers. The tasks that appear only in 2025 are telling: substance-abuse counseling, mental-health assessment, educating patients about their illness and community resources, care coordination, discharge planning. The tasks that appear only in ER are the opposite flavor: hands-on CPR, fire-and-rescue radio traffic, facility supervision, a press briefing.
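The era-only task lists come down to a set difference. This sketch also takes a hypothetical `min_count` threshold, because with six clips per era a single stray mention can decide whether a task counts as "only in 2025". The report does not say whether it used any threshold, and the task labels below are toy strings, not O*NET task statements.

```python
from collections import Counter


def exclusive_tasks(era_a: list[str], era_b: list[str],
                    min_count: int = 1) -> tuple[list[str], list[str]]:
    """Tasks seen at least min_count times in one era and never in the other.
    min_count is a hypothetical noise filter, not a parameter from the report."""
    count_a, count_b = Counter(era_a), Counter(era_b)
    only_a = sorted(t for t, n in count_a.items() if n >= min_count and t not in count_b)
    only_b = sorted(t for t, n in count_b.items() if n >= min_count and t not in count_a)
    return only_a, only_b


# Toy task labels, one entry per evidence item.
er = ["perform CPR", "perform CPR", "supervise facility staff", "assess patient"]
pitt = ["assess patient", "plan discharge", "plan discharge", "counsel on substance abuse"]
print(exclusive_tasks(er, pitt))
# (['perform CPR', 'supervise facility staff'], ['counsel on substance abuse', 'plan discharge'])
print(exclusive_tasks(er, pitt, min_count=2))
# (['perform CPR'], ['plan discharge'])
```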
So the one-line reading: the 1990s show framed the emergency room as a hub in a wider civic system of fire, police, press, hospital management, and teaching. The 2025 show reframes the same job as medicine plus behavioral health, addiction, and the psychosocial work of holding a patient together. The doctor will see you differently now.
A quick aside on what this might say about work, not just television. It's tempting to read the drift as a change in what we find interesting about medicine itself. In 1994, the technology was the spectacle: the paddles, the monitors, the crash cart. The sheer procedural novelty of a trauma bay was the draw, the thing that signaled this is what a doctor does. Thirty years on, all of that is ambient; we take it for granted that the machines work. What's left to dramatize, what now reads as the essence of the job, is the psychological and the interpersonal: breaking the bad news, steadying the frightened patient, the social worker's caseload, the judgment call. The equipment became furniture and the relationships became the story. And that's the part worth dwelling on: as technology quietly absorbs the technical core of a profession, the thing we come to see as the real job (the part with the status, the drama, the value) migrates to whatever the machines can't touch.
What this is and isn't
This is a pilot, and an honest one. It is six clips per era, drawn from whatever is officially posted online, not a scene-matched sample. ER's longer trauma clips and The Pitt's shorter ones each carry their own quirks, and automatic transcription and visual extraction add their own noise; some of the specific percentages would move with a larger, hand-matched corpus. The direction of the finding is the interesting part, not the third decimal place.
But the method clearly works. The pipeline now reads a moving image and returns the federal taxonomy of work it depicts, frame by frame, task by task. Television is just the first thing we pointed it at. Again.