AI workers transform generative AI from a toy and a niche application into a genuine productivity driver. Let’s take a closer look at how this works, and how it could completely change the way we do business.
A Letter from the CEOs
In January this year, the CEOs of the world’s largest companies met in Davos. A key message we repeatedly heard from these leaders was that in 2024, we won’t just be discussing AI, but we’ll finally be implementing platforms with real impact. These AI platforms will fundamentally transform how we work, conduct research, provide customer service and generate revenue. Almost every CEO promised to invest massively in 2024.
Now, it’s clear that 2024 is already well underway. At this point, we can assess the progress made in AI:
Hmm. Uh. Aha. So, beyond tech companies, not much has happened yet.
Sure, sure, everybody and their moms use ChatGPT and co. to write letters, articles, posts, and to create Python functions, images and videos. And there is a myriad of new chatbots in customer service. All of this has the potential to boost productivity — by let’s say 3%. Maybe even 5%. Across the whole economy.
Nice. Nice. Very nice.
But far from a game changer.
Why is that? Why isn’t AI having a more significant impact yet?
Because AI is not yet ready?
No! Because we are not ready.
We are using it wrong!
The two “gig” applications mentioned earlier (everyday assistants and first-level support) are useful and important, but they won’t fundamentally transform the way we do business.
The Lion of the AI Zoo
Let’s look at the history of AI-based systems for white-collar tasks and see what they can do.
A short note on the classic, pre-generative AI systems (conversational AI-based utterance-intent matching). Although these systems are still in use, they are no longer being built today. These are the old school assistants that usually don’t understand you because they basically must have every question and answer scripted by humans. They are yelled at by people on the phone and threatened with sexualized violence in the car. The poorest of all bots. Really.
When it comes to genAI systems, many are now only available as augmented genAI, because the pure genAI systems have hallucinated themselves out of the shortlist of relevant applications. What is interesting is that augmented genAI systems can use hallucination control and RAG to answer user questions reliably based on documents, databases and APIs. I have helped develop some of these systems myself: They are great. Period. For the time being, at least.
There is a small but decisive limitation for the augmented genAI systems: They can only solve minor tasks, such as answering user queries, classifying documents, extracting information from them and writing answers. In other words, the gig jobs mentioned above.
Most of us white-collar workers work differently: we work for hours, days, weeks, sometimes years on a single task. We can navigate through relatively complex and multi-layered processes to reach a certain result. We are used to waiting for an answer from a customer or a decision from the boss and can then seamlessly integrate the new information into a task. Our tasks are not to “answer a user’s question”, rather they are:
- Teach grade 4 algebra
- Defend a driver in court after an accident
- Plan and execute a marketing campaign for a new product
- Optimize the logistics processes in our food division
- Do a visual and content refresh of a website
- Process all incoming insurance claims
- Reduce the power consumption of the lighting in our offices
- Write a quote for a complex RFQ
All these processes can only be achieved through a multitude of individual activities consisting of the classification, extraction and generation of information. This is where agentic AI comes into play. In principle, we are talking about agents that can pursue a goal over many steps. They can wait for days and keep you up to date on your project. They can also communicate with various technical and human counterparts to retrieve the necessary data.
The Grasshopper and the Ant: First Generation AI Agents vs. AI Workers
Of course, there is not just one form of agent-based AI but several. Here, I’ll focus on the two most prominent and important ones:
1) Grasshoppers: The first-generation AI agents, so-called ReAct agents, are completely free in their task processing, meaning you can ask them to do virtually anything. They first make a plan for a specific task (like “Stop climate change”, “Destroy the world”, “Plan a business trip to Singapore”), then carry out the individual steps (e.g. flight planning or evening arrangements), deciding for themselves which model queries, API queries and web searches they need to obtain or generate the data. The idea is cool and it’s very close to AGI (artificial general intelligence).
2) Ants: We have coined the term AI worker for these. AI workers have a much smaller problem and solution space than the agents. For example, one worker can easily find a lost parcel, but it can’t post a job ad and then evaluate the applicants. Another worker could, though, as each worker is specialized and follows a pre-defined plan issued by the product owner. This gives them a kind of framework, a kind of exoskeleton within which processing takes place.
At first glance, the grasshoppers are more fascinating than the ants. But only at first glance: The sad thing is, the ReAct agents barely work in practice. Apart from a few amusing test cases, they rarely deliver meaningful results. One reason for this could be that they build up larger and larger errors as they work through the individual steps. At the moment, first-generation free AI agents are the agentic counterparts to GPT-2 models. You can’t really use this technology for business applications. But after GPT-2 came GPT-3. Maybe we’ll get backpropagation to work across a complete agent workflow, which would turn workflow creation into a trainable task. My guess for when that happens: the end of 2025.
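The error build-up mentioned above can be made concrete with a little arithmetic. The sketch below is purely illustrative: it assumes that every step succeeds independently with the same probability, and the 95% figure is an assumption chosen for the example, not a measured value. Under those assumptions, a 20-step chain finishes without a single error only about 36% of the time.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Chance that every step in a chain succeeds.

    Assumes the steps fail independently of each other and all share
    the same success probability (a simplification for illustration).
    """
    return per_step ** steps


if __name__ == "__main__":
    # 0.95 is an assumed per-step success rate, not a benchmark result.
    for n in (1, 5, 10, 20, 50):
        print(f"{n:>2} steps: {chain_success(0.95, n):.1%} chance of a clean run")
```

The longer the freely planned chain, the more a small per-step error rate dominates the outcome, which is one way to read the difference between grasshoppers and ants.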
But How Does an AI Worker Work?
The crazy thing is: The AI worker works like a professional human white-collar worker.
Imagine a customer submits an insurance claim: ‘Hello, there’s been heavy rain at our house, and parts of the first floor have been flooded and damaged. Here is …’ The AI worker doesn’t consider how to solve the issue on its own. Instead, it follows a strict, predefined procedure:
- He reads the customer email; he looks at the attachments such as invoices and damage images.
- He creates a claim.
- He checks whether it is a duplicate.
- He checks whether the customer is insured with his insurer.
- He checks which policy the customer has.
And so on: 15–20 steps, with 3–5 sub-steps each.
The worker proceeds strictly according to his company’s instructions. He uses his intelligence in the individual sub-steps, i.e. when evaluating documents, extracting information and classifying it. The overall process always remains the same for him. He is just a worker.
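The procedure described above can be sketched as a fixed list of steps that the worker runs in order. This is a minimal sketch under stated assumptions, not the actual implementation: `Claim`, `INSURED_CUSTOMERS`, `KNOWN_CLAIMS` and all step functions are hypothetical names invented for illustration, and the places where a real worker would call a model are replaced by trivial checks.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class Claim:
    customer_id: str
    email_text: str
    log: List[str] = field(default_factory=list)  # audit trail of executed steps
    status: str = "open"


# Hypothetical stand-ins for the insurer's customer and claims systems.
INSURED_CUSTOMERS: Set[str] = {"C-100", "C-200"}
KNOWN_CLAIMS: Set[Tuple[str, str]] = set()


def read_email(claim: Claim) -> bool:
    # A real worker would use a model here to extract the damage details.
    return bool(claim.email_text.strip())


def check_duplicate(claim: Claim) -> bool:
    key = (claim.customer_id, claim.email_text)
    if key in KNOWN_CLAIMS:
        return False
    KNOWN_CLAIMS.add(key)
    return True


def check_insured(claim: Claim) -> bool:
    return claim.customer_id in INSURED_CUSTOMERS


# The plan is fixed in advance; the worker never reorders or invents steps.
PLAN: List[Callable[[Claim], bool]] = [read_email, check_duplicate, check_insured]


def run_worker(claim: Claim) -> Claim:
    for step in PLAN:
        ok = step(claim)
        claim.log.append(f"{step.__name__}: {'ok' if ok else 'failed'}")
        if not ok:
            claim.status = f"rejected at {step.__name__}"
            return claim
    claim.status = "accepted for processing"
    return claim
```

Calling `run_worker(Claim("C-100", "Heavy rain flooded the first floor."))` walks through all three steps and records each one in `claim.log`. The intelligence would live inside the individual steps, while the order of the steps stays the same for every claim.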
Now you might think … that’s boring! The AI could be creative and come up with its own approach to solving the problem. Yes, it could. But that’s not what we need, nor what we want in most cases. Not even with human clerks.
We want the human insurance employee and the AI worker to act according to a set of defined and transparent rules, because then …
- We act in accordance with contracts and regulations.
- We avoid or minimize customer complaints.
- We save money and only reimburse justified claims.
- We have a consistent, reproducible, verifiable process that can be supported by software.
The point of the AI worker is not that it solves the problem better or more creatively than a human, but that it solves it more reliably (for example, without typos or transcription errors). And above all, much, much faster and cheaper.
And this is exactly how 70% of our white-collar jobs actually work: we solve problems, but without reinventing the problem-solving process for each and every case. We follow curriculums, procedural instructions, development models, operating instructions, laws and regulations. We bring them to life for a specific case. And THAT is what the AI worker can do for us. 100x faster than any human.
So, what does an AI worker look like?
You can read the full story here.