ChatGPT, Next Level: Meet 10 Autonomous AI Agents - Auto-GPT, BabyAGI, AgentGPT, Microsoft Jarvis, ChaosGPT & Friends

7 June 2023 | Maximilian Vogel

The ultimate curated list of autonomous AI agents: complete with tools, resources and examples

ChatGPT and many of the other current foundation models are great. They can answer innumerable questions, create AI art that rivals human masterpieces, analyze photos, and in some cases, they even show what we would call intelligence.

But there’s one simple challenge they’ve yet to conquer — to efficiently complete a laborious task made up of distinct steps.

Currently, AI models are like eager office interns, tireless and enthusiastic but desperately in need of guidance. They require monitoring, frequent directions, and vigilance against fudging or half-truths (aka “hallucinations”).

This is where AI agents step in: they can do this autonomously. These autonomous helpers take user input, break it down into smaller tasks with the assistance of LLMs, and tackle them one at a time. The agents store the results and, if necessary, use them in subsequent steps of the process. This lets AI agents handle complex tasks and draw on various foundation models that are not limited to language. For example, an agent might independently decide to use code, video or voice models, or to employ search engines or calculation tools to accomplish the task you’ve given it.

Image Credit: Maximilian Vogel
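The loop described above (break the input into steps, tackle them one at a time, store each result for later steps) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code of any particular agent: `fake_planner()` is a hard-coded stand-in for an LLM call, and the tool names and the `search` tool are hypothetical.

```python
"""Minimal plan-and-execute agent loop (illustrative sketch).

fake_planner() and search_tool() are hypothetical stand-ins: a real agent
would ask an LLM for the plan and call real models or APIs as tools.
"""
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]                  # (tool name, tool input)
Tool = Callable[[str, List[str]], str]  # (input, memory) -> result


def fake_planner(objective: str) -> List[Step]:
    """Stand-in for an LLM call that breaks the objective into steps."""
    return [
        ("search", objective),
        ("calculator", "3 * 4"),
        ("summarize", "combine previous results"),
    ]


def search_tool(query: str, memory: List[str]) -> str:
    # Hypothetical search engine that returns a canned string.
    return f"search results for '{query}'"


def calculator_tool(expression: str, memory: List[str]) -> str:
    # Tiny calculator that only understands "<int> <op> <int>".
    left, op, right = expression.split()
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
    return str(ops[op](int(left), int(right)))


def summarize_tool(_: str, memory: List[str]) -> str:
    # Builds on the results stored by earlier steps.
    return " | ".join(memory)


@dataclass
class Agent:
    planner: Callable[[str], List[Step]]
    tools: Dict[str, Tool]
    memory: List[str] = field(default_factory=list)

    def run(self, objective: str) -> str:
        result = ""
        for tool_name, tool_input in self.planner(objective):
            tool = self.tools[tool_name]            # pick the tool for this step
            result = tool(tool_input, self.memory)  # tackle one step at a time
            self.memory.append(result)              # store it for later steps
        return result


agent = Agent(
    planner=fake_planner,
    tools={"search": search_tool, "calculator": calculator_tool, "summarize": summarize_tool},
)
print(agent.run("task management tools"))
# prints: search results for 'task management tools' | 12
```

Replacing `fake_planner` with a real LLM call that returns the same `(tool, input)` pairs is what turns such a loop into an agent; keeping every result in `memory` is what lets later steps build on earlier ones.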

Autonomous agents are not simply smarter than the foundation models they are based on; they open up a completely new dimension: they are capable of “slow thinking” (Kahneman’s “System 2”). They can solve complicated problems in which the goal is reached bit by bit via intermediate results. Until now, LLMs could only do slow thinking through prompting techniques such as chain-of-thought, and even then only to a very limited extent.

While agent AIs do not significantly raise the level of complexity that can be addressed, their ability to solve complicated problems (in other words, anything that takes more than a few steps to solve) lets them cover a large additional area of the problem space (dashed red box).


Intro: What are autonomous AI agents?

Let’s say we want to use an AI model to create a deck of 52 cards, with each card featuring a different musician. We’d also like to replace the usual card suits, such as clubs or hearts, with music genres, such as soul or house.

Is it possible for an AI model to complete such a complex task?

The simple answer is no.

While a language model can compile a list of genres and artists, we need at least one additional model (an AI art model such as Midjourney) to produce the visuals. We may also need additional systems to search the internet and to store content.

We could write a batch processing script that does all this.

Or (and this is where our agent AIs fly in) we could simply provide a prompt describing what we want, and the agent writes the batch script, executes it and monitors the outcome.

Usually, AI agents call various external models, both for the individual steps (e.g., selecting an artist for a single card) and for framework tasks (e.g., generating a task list). They outsource the thinking steps, while they themselves store information, track tasks, manage interfaces and orchestrate the entire process.

Image credit: Maximilian Vogel, note: This is an illustrative example only — the results of most current AI agents are not as overwhelming.
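For comparison, the hand-written batch script for the card deck might look roughly like this. It is a minimal sketch under assumptions: `pick_artists()` and `image_prompt()` are hypothetical placeholders for a language-model call and for the prompt sent to an image model such as Midjourney, and the genres jazz and rock are examples chosen only to complete the four suits.

```python
"""Sketch of the "batch script" route for the 52-card musician deck.

pick_artists() is a hypothetical placeholder for a language-model call;
image_prompt() only builds the text an image model would receive.
"""
from typing import Dict, List

GENRES = ["soul", "house", "jazz", "rock"]  # replace the four suits (jazz, rock assumed)
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]


def pick_artists(genre: str, count: int) -> List[str]:
    """Placeholder for an LLM call like 'list {count} famous {genre} musicians'."""
    return [f"{genre} artist #{i + 1}" for i in range(count)]


def image_prompt(artist: str, genre: str, rank: str) -> str:
    """Build the text prompt an image model would receive for one card."""
    return f"Playing card '{rank}' in the {genre} suit, portrait of {artist}, poster style"


def build_deck() -> List[Dict[str, str]]:
    deck = []
    for genre in GENRES:
        artists = pick_artists(genre, len(RANKS))  # one musician per rank
        for rank, artist in zip(RANKS, artists):
            deck.append({
                "suit": genre,
                "rank": rank,
                "artist": artist,
                "prompt": image_prompt(artist, genre, rank),
            })
    return deck


deck = build_deck()
print(len(deck))  # 52
print(deck[0]["prompt"])
```

The placeholders are where the real work lies: calling the models, making sure all 52 musicians are distinct, and retrying failed image generations. That writing, executing and monitoring is exactly what an agent would take over.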

Autonomous AI agents have only emerged in the last few weeks, but they’re already developing at breakneck speed. Even Microsoft is getting in on the action with Jarvis / HuggingGPT. I’ll give a brief introduction to some of the main AI agents and discuss possible impacts on application development, along with AI safety.

AgentGPT

Assemble, configure, and deploy autonomous AI Agents in your browser.

This is the first agent in the list, not because it is the most important, but because it requires no installation and no OpenAI key.

You can just try it right now.

Features:

Platform: https://agentgpt.reworkd.ai/
Developer: Asim Shrestha

Demo

Let’s take a deep dive into how AgentGPT handled a job I gave it:

My task: “Find the 3 most widely used task management software tools for usage in a small company and compare them in terms of price, scope, ease of installation”

Reasoning:

Image credit: Maximilian Vogel / AgentGPT

Some intermediate output:

Image credit: Maximilian Vogel / AgentGPT

Many, many more lines of output later, we have the final result (the whole process took approximately 3 minutes):

Image credit: Maximilian Vogel / AgentGPT

Auto-GPT

An experimental, open-source agent library based on GPT-4. It chains together LLM “thoughts” to autonomously achieve whatever goal you set. Auto-GPT is one of the first applications to run GPT-4 fully autonomously, pushing the boundaries of what is possible with AI.

Features:

Repository: https://github.com/Significant-Gravitas/Auto-GPT
Developer: https://www.significantgravitas.com/

Setup Guide

Demo task: Look for a seasonal event on the internet and create a recipe for it.

Video credit: Ran Ding
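To make “chaining thoughts” a little more concrete: early Auto-GPT versions asked GPT-4 to reply in a fixed JSON format containing its `thoughts` and a `command` to run next, and the command’s result was fed back into the next prompt. The sketch below is only loosely modeled on that idea and is not Auto-GPT’s actual code; the reply is hard-coded instead of coming from the API, and the command names `web_search` and `task_complete` are hypothetical.

```python
"""Handling one structured "thought + command" reply (illustrative sketch).

Loosely modeled on the JSON replies early Auto-GPT versions requested
from GPT-4; the command names below are hypothetical.
"""
import json
from typing import Any, Callable, Dict

# A reply as the LLM might return it (hard-coded here instead of an API call).
RAW_REPLY = """
{
  "thoughts": {
    "text": "I should find seasonal events first.",
    "reasoning": "A recipe needs an occasion.",
    "plan": "- search events\\n- pick one\\n- write recipe",
    "criticism": "Keep the search focused."
  },
  "command": {"name": "web_search", "args": {"query": "seasonal events June"}}
}
"""

# Hypothetical command registry: name -> function taking keyword arguments.
COMMANDS: Dict[str, Callable[..., str]] = {
    "web_search": lambda query: f"(pretend results for '{query}')",
    "task_complete": lambda reason: f"done: {reason}",
}


def step(raw_reply: str) -> Dict[str, Any]:
    reply = json.loads(raw_reply)                # raises on malformed JSON
    command = reply["command"]
    handler = COMMANDS[command["name"]]          # KeyError for unknown commands
    result = handler(**command.get("args", {}))  # run the chosen command
    # In a full loop, this result would be appended to the next prompt,
    # which is how one "thought" leads to the next.
    return {"thought": reply["thoughts"]["text"], "result": result}


print(step(RAW_REPLY))
```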

Baby AGI

Baby AGI is an AI-powered task management system. The system uses the OpenAI and Pinecone APIs to create, prioritize, and execute tasks. The appeal of Baby AGI lies in its ability to solve tasks autonomously, based on the results of previous tasks, while sticking to a predefined objective. It also prioritizes tasks efficiently.

Mode of work:

  1. Pulls up the first task from the task list.
  2. Sends the task to the execution agent, which uses OpenAI’s API and Llama to complete the task based on the context.
  3. Enriches the result and stores it in Pinecone.
  4. Creates new tasks and reprioritizes the task list based on the objective and the result of the previous task.
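The four steps above form a loop that can be sketched with a simple task queue. This is not BabyAGI’s code: `execute()`, `create_tasks()` and `prioritize()` are hypothetical stand-ins for the LLM calls, a plain list replaces Pinecone, and the objective is borrowed from the Singapore dinner example further down.

```python
"""BabyAGI-style task loop (illustrative sketch, not BabyAGI's code).

execute(), create_tasks() and prioritize() are hypothetical stand-ins for
LLM calls; a plain list replaces the Pinecone vector store.
"""
from collections import deque
from typing import Deque, Dict, List, Tuple

OBJECTIVE = "Plan a romantic dinner this Friday night in central Singapore"

# Hypothetical follow-up tasks the task-creation step might propose.
FOLLOW_UPS: Dict[str, List[str]] = {
    "Research restaurants": ["Order flowers", "Book a table"],
    "Book a table": ["Confirm the reservation"],
}


def execute(objective: str, task: str, context: List[str]) -> str:
    # Stand-in for the execution agent, which would see the stored context.
    return f"{task} (towards: {objective}) done, saw {len(context)} earlier results"


def create_tasks(objective: str, task: str, result: str) -> List[str]:
    # Stand-in for the LLM call that derives new tasks from the last result.
    return FOLLOW_UPS.get(task, [])


def prioritize(objective: str, tasks: Deque[str]) -> Deque[str]:
    # Stand-in for the LLM reprioritization: booking-related tasks first.
    def urgency(task: str) -> Tuple[int, str]:
        return (0 if "table" in task or "reservation" in task else 1, task)
    return deque(sorted(tasks, key=urgency))


def run(objective: str, first_task: str, max_iterations: int = 10) -> List[Tuple[str, str]]:
    tasks: Deque[str] = deque([first_task])
    memory: List[str] = []                # replaces the Pinecone vector store
    log: List[Tuple[str, str]] = []
    for _ in range(max_iterations):       # guard against endless task creation
        if not tasks:
            break
        task = tasks.popleft()                               # 1. pull the first task
        result = execute(objective, task, memory)            # 2. execute it with context
        memory.append(result)                                # 3. store the result
        tasks.extend(create_tasks(objective, task, result))  # 4. create new tasks...
        tasks = prioritize(objective, tasks)                 #    ...and reprioritize
        log.append((task, result))
    return log


log = run(OBJECTIVE, "Research restaurants")
for task, result in log:
    print(task, "->", result)
```

The `max_iterations` cap is a design choice worth keeping in any real version: because each result can spawn new tasks, nothing else guarantees that the queue ever empties.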

Repository: https://github.com/yoheinakajima/babyagi
Developer: Yohei Nakajima (Twitter, Blog)
Setup Guide and Background: http://babyagi.org/
Test Baby AGI (bring your OpenAI key): Hugging Face

Task example: Find popular topics that lack sufficient documentation, as article ideas for my Linux tutorial blog:

Video Credit: ByteXD

Task example: Plan a romantic dinner for my wife this Friday night in central Singapore:

Video Credit: Sam Witteveen

JARVIS / HuggingGPT

Jarvis, or HuggingGPT, is a collaborative system comprising a large language model (LLM) as the central controller and numerous expert models from the Hugging Face Hub as collaborative executors. This means the agent can employ not only LLMs but also other kinds of models. The workflow of the system consists of four stages:

• Task Planning: Uses ChatGPT to analyze user requests, discern their intent and break them down into manageable tasks.

• Model Selection: To solve the planned tasks, ChatGPT selects the best-suited expert models from Hugging Face based on their descriptions.

• Task Execution: Invokes and executes each selected model, subsequently returning the results to ChatGPT.

• Response Generation: Finally, it uses ChatGPT to integrate the predictions of all models and generate a comprehensive response.
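The four stages can be chained as in the following sketch. It assumes a task-list format loosely based on the one described in the HuggingGPT paper (each task has a type, an `id`, its dependencies `dep` and its `args`, where `<resource>-N` stands for the output of task N). `plan()`, the model catalog and `run_model()` are hypothetical stand-ins for ChatGPT and for Hugging Face inference calls, and all model names are made up.

```python
"""Four-stage HuggingGPT-style pipeline (illustrative sketch).

plan(), CATALOG and run_model() are hypothetical stand-ins for ChatGPT and
Hugging Face inference; all model names are made up.
"""
from typing import Any, Dict, List


def plan(request: str) -> List[Dict[str, Any]]:
    # Stage 1 (task planning): ChatGPT would derive this list from the request.
    return [
        {"task": "image-to-text", "id": 0, "dep": [-1], "args": {"image": "photo.jpg"}},
        {"task": "text-to-speech", "id": 1, "dep": [0], "args": {"text": "<resource>-0"}},
    ]


# Hypothetical catalog: task type -> "model-name: description" entries.
CATALOG: Dict[str, List[str]] = {
    "image-to-text": ["captioner-small: short captions", "captioner-large: detailed captions"],
    "text-to-speech": ["tts-basic: English speech"],
}


def select_model(task: Dict[str, Any]) -> str:
    # Stage 2 (model selection): ChatGPT would compare the descriptions;
    # this stub simply takes the first candidate's name.
    return CATALOG[task["task"]][0].split(":")[0]


def resolve(args: Dict[str, str], outputs: Dict[int, str]) -> Dict[str, str]:
    # Replace "<resource>-N" placeholders with the output of task N.
    resolved = {}
    for key, value in args.items():
        if value.startswith("<resource>-"):
            value = outputs[int(value.split("-")[1])]
        resolved[key] = value
    return resolved


def run_model(model: str, args: Dict[str, str]) -> str:
    # Stage 3 (task execution): stand-in for calling a Hugging Face model.
    arg_text = ", ".join(f"{k}={v}" for k, v in args.items())
    return f"{model}({arg_text})"


def respond(request: str, outputs: Dict[int, str]) -> str:
    # Stage 4 (response generation): ChatGPT would write prose; we list results.
    return f"For '{request}': " + "; ".join(outputs[i] for i in sorted(outputs))


def hugging_gpt(request: str) -> str:
    outputs: Dict[int, str] = {}
    for task in plan(request):
        missing = [d for d in task["dep"] if d != -1 and d not in outputs]
        if missing:  # this sketch expects tasks in dependency order
            raise ValueError(f"task {task['id']} depends on unfinished tasks {missing}")
        model = select_model(task)
        outputs[task["id"]] = run_model(model, resolve(task["args"], outputs))
    return respond(request, outputs)


print(hugging_gpt("describe photo.jpg and read the description aloud"))
```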

Repository: https://github.com/microsoft/JARVIS
Detailed setup guide: How to use Jarvis / HuggingGPT
Paper: arXiv
How it works:

Image credit: Yongliang Shen et al., Microsoft

For the complete article, head over to medium.com/@maximilianvogel