AI in the Workplace Works, But MIT Has Found What No One Was Measuring

In January 2026, about half of American workers reported using AI. However, when we look at what they are doing with these tools, and especially at what has changed in their daily work, the story becomes less clear than the headlines suggest.

MIT has just published "Humans in the Loop," a report that collects three years of direct observation from over twenty large American companies across healthcare, retail, finance, and manufacturing. This is not a theoretical paper on tasks that AI could automate, nor yet another simulation about at-risk jobs. It is a logbook of what happens when companies actually try to make generative AI work in their processes, with real people, real problems, and budgets that, at some point, someone asks them to justify.

The picture that emerges is more interesting than the prophecies we are used to hearing, and quite a bit more complicated.

Three Problems, None of Which Is "Replacing People"

One of the most useful things the report does is to name the problems companies are trying to solve with generative AI. These are not new problems, and this alone is news.

The first is what researchers call the bottleneck problem: professionals spend a growing share of their time on semi-repetitive tasks that require expertise but do not truly leverage it, which keeps them away from higher-value work. This is the case with a doctor who, in the evening, still has to respond to patient messages and complete clinical notes, a lawyer who finds themselves sifting through entire corpora of documents in search of similar but not identical clauses, or a programmer writing unit tests, a task that requires attention but not real reasoning. In these cases, generative AI produces a draft that the professional reviews and corrects, resulting in real time savings, although these are difficult to quantify precisely.

The second is the delightfully named cafeteria problem: to complete certain projects, it is necessary to gather input from experts scattered throughout the organization, and until recently the only way to do so was to literally go to the cafeteria, find them, and ask for their opinion. For example, Bristol Myers Squibb used AI to accelerate the preparation of drug approval applications to the FDA, a process that requires assembling information from years of experimentation, clinical trials, and internal documentation. The idea is that AI can anticipate what experts would have said based on what they have already written in the past.

The third problem is the learning curve: helping those new to a field to orient themselves more quickly. An American retailer (referred to in the report as Tau) provided clerks with a tablet equipped with a chatbot powered by company data, and discovered that employees were not only using it to respond to customers. Some asked questions about team dynamics, or expressed insecurities they would never have communicated to a manager. A tool designed for productivity unexpectedly became a channel for bottom-up feedback.

From Executors to Supervisors, with All Associated Risks

The common pattern in all these applications is that the worker shifts from executing a task directly to supervising the output of an automated system. MIT researchers call this supervisory control, noting that it is not a new concept: airline pilots, nuclear plant operators, and technicians in automated factories have been doing exactly this work for decades, and there is extensive literature on the conditions that make these roles effective (or disastrous).

The point is that the quality of these supervisory jobs varies enormously. Those who supervise complex systems in aviation or nuclear power typically hold interesting and well-paid roles, whereas supervising automatic machinery in a factory is generally lower-paid and tends to become less motivating over time. With generative AI, the same risks may arise: there will be supervisory roles that require judgment and domain expertise to be done properly, and roles that merely involve clicking "approve" on automatically generated content without the operator having the tools to genuinely assess it.

This brings us to one of the report's most serious concerns: so-called mental offloading. A study cited in the report, on college students using ChatGPT for research assignments, found that these students showed lower brain activity and a much poorer ability to remember their own work than those using a traditional search engine or relying on their own memory. The task was completed nonetheless, but the student learned nothing in the process. If this dynamic extends to the world of work, the risk is that tools designed to help people climb the learning curve end up keeping them permanently on the flatlands.

The CFO's Revenge

There is a passage in the report worth reading on its own. After the initial phase of enthusiastic experimentation, when companies opened sandboxes for employees and measured success by the number of users, came what the researchers call "the revenge of the CFO." At some point, someone asked: fine, but how much does it cost, and how much does it earn us?

And the answers, for many applications, were nonexistent. Companies studied time savings but often neglected to measure the effects on work quality, the need to revise AI output, and indirect costs. One manufacturer eventually stopped trying to measure productivity gains from AI altogether because there were too many variables at play: product demand, team capacity, the supply chain. What it could measure was the new capabilities AI gave workers, which is a different indicator from classic productivity and much harder to convey in an Excel sheet for the board.
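To make the CFO's question concrete, here is a minimal sketch of the arithmetic involved. It is not a formula from the report: the field names and every number are hypothetical, chosen only to show how a headline time saving shrinks once review time and expected rework are counted.

```python
from dataclasses import dataclass


@dataclass
class TaskMeasurement:
    """Per-task figures a team would need to collect (all values hypothetical)."""
    minutes_without_ai: float        # time to do the task entirely by hand
    minutes_drafting_with_ai: float  # time to prompt the tool and get a draft
    minutes_reviewing: float         # time spent checking and correcting the draft
    rework_probability: float        # share of outputs that later need redoing
    minutes_rework: float            # cost of redoing one bad output


def gross_minutes_saved(m: TaskMeasurement) -> float:
    """The headline number: manual time minus drafting time only."""
    return m.minutes_without_ai - m.minutes_drafting_with_ai


def net_minutes_saved(m: TaskMeasurement) -> float:
    """Gross saving minus review time and expected rework."""
    cost_with_ai = (
        m.minutes_drafting_with_ai
        + m.minutes_reviewing
        + m.rework_probability * m.minutes_rework
    )
    return m.minutes_without_ai - cost_with_ai


# Illustrative numbers only, not taken from the report.
example = TaskMeasurement(
    minutes_without_ai=30,
    minutes_drafting_with_ai=5,
    minutes_reviewing=10,
    rework_probability=0.25,
    minutes_rework=20,
)
print(gross_minutes_saved(example))  # 25
print(net_minutes_saved(example))    # 10.0
```

With these made-up inputs, a 25-minute headline saving becomes 10 minutes once review and rework are included, and the sketch still ignores licensing, training, and quality effects, which is precisely the kind of gap the CFO's question exposes.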

A predictable but instructive phenomenon also occurred: many companies transitioned from building their AI applications internally (the hackathon and customized tools phase) to purchasing ready-made solutions from vendors. This makes economic sense but raises a strategic question the report poses without fully answering: if companies stop developing internal technical capabilities and rely on suppliers, what happens to their ability to really understand what the tools they use are doing?

Agents, the Next Chapter of a Familiar Story

The report dedicates a section to AI agents, and does so in a tone I would describe as sober to the point of severity. Agents, the researchers observe, represent an incremental evolution over Robotic Process Automation, which already existed and had known limitations: it only worked on completely routine tasks, and when the process changed, everything needed to be reprogrammed. Agents can manage variations and semi-structured tasks, which is a real advancement. But the notion that they can remove humans from the flow for most high-stakes tasks remains, at the moment, more of a commercial promise than an observable fact.

For tasks where quality matters and variables are many, human supervisors end up playing an even more critical role than in traditional automation systems, precisely because the system makes less predictable decisions. It is a paradox known to those who have worked with automated systems: the more autonomous the system, the more competent the supervisor needs to be to intervene at the right moments.

What Did Not Happen

Perhaps the most significant part of the report is the list of things that did not happen. The companies observed by MIT did not carry out mass layoffs related to AI. The adoption processes have been incremental, with most organizations still in the proof-of-concept stage after three years. The promise that AI would level skills, benefiting especially the less experienced, has only partially materialized: for tasks related to the learning curve, yes, the less experienced do gain more from using AI. But for tasks that require integrating information from different sources or managing document bottlenecks, solid domain expertise is still needed to evaluate whether AI output makes sense or produces well-packaged garbage.

And the idea that companies could use AI to bypass their infrastructural limits (disorganized data, fragmented documentation, lack of structured databases) collided with reality: without an organized data foundation to reference, there is no way to verify whether what AI returns is reliable, and therefore, there is no way to build trust in the tool. Those who haven't done their homework with their data won't find in generative AI a shortcut to avoid it.

Ten Lessons That Also Apply to Italy

The report ends with ten practical recommendations, some of which are more interesting than they initially appear.

The first is the most obvious and the most ignored: gather evidence before scaling. The AI applications that performed best were those addressing problems that had already been identified and studied before generative AI existed. Companies with a clear problem knew exactly what to look for, had an idea of how to measure potential success, and had decided in advance when to pull the plug.

The second concerns diversity in the use of tools. Workers use AI in very different ways even when doing the same job, and this variability is an advantage for organizations that have the patience to observe what works and under which conditions, instead of imposing a single top-down model.

But the recommendations that struck me most relate to preserving teamwork and domain expertise. The report notes that some companies have observed a decrease in collaboration among colleagues, because AI allows individuals to complete tasks that previously required team input. This yields immediate and measurable time savings, but at the cost of everything that collaborative work produces in terms of shared learning, mutual trust, and the group's ability to tackle future problems.

On domain expertise, the researchers reach a conclusion that sounds counterintuitive but is perfectly logical: precisely because generative AI can produce large volumes of output in fields such as medicine, computer science, and life sciences, more domain experts will be needed than before, because someone must interpret what AI produces and judge whether it can be taken seriously. Those who expect AI to render junior professionals obsolete may find, in a few years, that they have created an even bigger bottleneck than the ones they were trying to resolve.

A Story Just Beginning

Three years of observing large American companies is a limited sample, and MIT researchers know this. The organizations studied are mostly large, established firms in traditional sectors, not AI-native startups that might one day reshape entire industries starting from scratch. And the fact that the changes observed so far have been incremental does not mean they will remain so forever.

But this is precisely why the report has value: it captures the moment when companies are still deciding how to use these tools, and documents the choices, mistakes, and surprises of this phase. Those making decisions today about how to integrate AI into work would do well to read it, because it is in this phase that the degree of control over the transition will be established, and this depends on operational and design choices that now seem small but will continue to create effects for years.

The complete report "Humans in the Loop: The Evolution of Work in Early Experiments with Generative AI" is freely available on the MIT Industrial Performance Center website.