The ultimate curated list of autonomous AI agents: complete with tools, resources and examples
ChatGPT and many of the other current foundation models are great. They can answer innumerable questions, create AI art that rivals human masterpieces, analyze photos, and in some cases, they even show what we would call intelligence.
But there’s one simple challenge they’ve yet to conquer — to efficiently complete a laborious task made up of distinct steps.
Currently, AI models are like eager office interns, tireless and enthusiastic but desperately in need of guidance. They require monitoring, frequent directions, and vigilance against fudging or half-truths (aka “hallucinations”).
This is where AI agents step in: they can complete such multi-step tasks autonomously. These autonomous helpers take user input, break it down into smaller tasks with the assistance of LLMs, and tackle them one at a time. The agents store the results and use them, if necessary, for subsequent steps in the process. As a result, AI agents can handle complex tasks and access various foundation models that are not limited to language alone. For example, an agent might independently decide to use code, video, or voice models, or employ search engines or calculation tools to accomplish the task you’ve given it.
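The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular agent: `fake_llm` is a hypothetical stand-in for a real model call, and the planning rule (splitting the goal on "and") is a deliberately naive assumption so that the example runs offline.

```python
from typing import Callable, List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call, so the sketch runs offline."""
    return f"result for: {prompt}"


def plan(goal: str) -> List[str]:
    """Naive planner (an assumption for illustration): split the goal on ' and '.

    A real agent would ask an LLM to produce the task list.
    """
    return [part.strip() for part in goal.split(" and ") if part.strip()]


def run_agent(goal: str, llm: Callable[[str], str] = fake_llm) -> List[str]:
    """Break the goal into tasks, solve them one at a time, keep the results."""
    results: List[str] = []
    for task in plan(goal):
        # Earlier results are passed along as context for the current step.
        context = " | ".join(results)
        prompt = f"{task} (context: {context})" if context else task
        results.append(llm(prompt))
    return results
```

Calling `run_agent("find tools and compare prices")` yields two results, and the second prompt carries the first result as context. Swapping `fake_llm` for a real model call would leave the loop itself unchanged.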
The autonomous agents are not simply smarter than the foundation models on which they are based; they open up a completely new dimension: they are capable of “slow thinking” (Kahneman’s “System 2”). They solve complicated questions, in which one works toward the goal step by step via intermediate results. Until now, slow thinking was only possible for LLMs via prompting techniques such as chain-of-thought, and even then only to a very limited extent.
While the addressable level of complexity does not increase significantly with AI agents, their ability to solve complicated problems lets them cover a large additional area of the problem space (dashed red box): in other words, everything that requires more than a few steps to solve.
Content:
- Intro: What are Autonomous AI Agents?
- Must-know AI Platforms: A deep-dive into AgentGPT, Auto-GPT, BabyAGI, Jarvis & more, resources included
- The Completely Incomplete List of AI Agent Platforms
- Outlook: From sterile AI to powerful and dangerous agents
Intro: What are Autonomous AI Agents?
Let’s say we want to use an AI model to create a deck of 52 cards, with each card featuring a different musician. We’d also like to replace the usual card suits, such as clubs or hearts, with music genres, such as soul or house.
Is it possible for an AI model to complete such a complex task?
The simple answer is no.
While a language model can compile a list of genres and artists, we need at least one additional model (an AI art model such as Midjourney) to produce the visuals. We may also need additional systems to search the internet and to store content.
We could write a batch processing script doing all this.
Or, and this is where our AI agents come in, we could just provide a prompt describing what we want, and the agent writes the batch script, executes it, and monitors the outcome.
Usually, AI agents use various external models both for the single steps (e.g., selecting an artist for a single card) and for framework tasks (e.g., generating a task list). They outsource the thinking steps while storing information, tracking tasks, managing the interface, and orchestrating the entire process.
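To make the orchestration idea concrete, here is a rough sketch for the card deck example. Everything in it is an assumption for illustration: `text_model` and `image_model` are placeholders rather than real model APIs, and the genres "jazz" and "rock" are added only to fill out four suits.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    kind: str    # which type of model should handle this step: "text" or "image"
    prompt: str


def text_model(prompt: str) -> str:
    """Hypothetical placeholder for a language model call."""
    return f"[text] {prompt}"


def image_model(prompt: str) -> str:
    """Hypothetical placeholder for an AI art model call."""
    return f"[image] {prompt}"


# The agent's registry of external models, keyed by the kind of step they handle.
MODELS: Dict[str, Callable[[str], str]] = {"text": text_model, "image": image_model}


def plan_deck(genres: List[str], cards_per_genre: int) -> List[Step]:
    """Framework task: build the step list (one text and one image step per card)."""
    steps: List[Step] = []
    for genre in genres:
        for rank in range(1, cards_per_genre + 1):
            steps.append(Step("text", f"pick {genre} musician #{rank}"))
            steps.append(Step("image", f"draw card {rank} of {genre}"))
    return steps


def orchestrate(steps: List[Step]) -> List[str]:
    """Dispatch every step to the matching model and collect the outputs."""
    return [MODELS[step.kind](step.prompt) for step in steps]
```

With `plan_deck(["soul", "house", "jazz", "rock"], 13)` the plan contains 104 steps, two per card. The agent itself only tracks and routes the steps, while the "thinking" happens in the models it calls.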
Autonomous AI agents have only emerged in the last few weeks, but they’re already developing at breakneck speed. Even Microsoft is getting in on the action with Jarvis / HuggingGPT. I’ll give a brief introduction to some of the main AI agents and discuss possible impacts on application development, along with AI safety.
AgentGPT
Assemble, configure, and deploy autonomous AI Agents in your browser.
AgentGPT comes first in this list, not because it is the most important, but because it needs neither installation nor OpenAI keys.
You can just try it right now.
Features:
- browser-based
- simple to use
- based on OpenAI models
- No OpenAI keys needed for test usage
Platform: https://agentgpt.reworkd.ai/
Developer: Asim Shrestha
Demo
Let’s deep-dive into how AgentGPT managed a job I gave it:
My task: “Find the 3 most widely used task management software tools for usage in a small company and compare them in terms of price, scope, ease of installation”
Reasoning:
Some intermediate output:
Many, many more lines of output later, we have the final result (the whole process took approximately 3 minutes):
Auto-GPT
An experimental, open-source agent library based on GPT-4. It chains together LLM “thoughts” to autonomously achieve whatever task you set. Auto-GPT is one of the first platforms to run GPT-4 fully autonomously, pushing the boundaries of what is possible with AI.
Features:
- Accesses the internet for queries and gathering information
- Long and short-term memory management
- GPT-4 instances for text generation
- Accesses popular websites and platforms
- File storage and summarization with GPT-3.5
Repository: https://github.com/Significant-Gravitas/Auto-GPT
Developer: https://www.significantgravitas.com/
Setup: Guide
Demo-Task: Look for a seasonal event on the internet and create a recipe for it.
Baby AGI
Baby AGI is an AI-powered task management system. The system uses OpenAI and Pinecone APIs to create, prioritize, and execute tasks. The appeal of Baby AGI is its ability to autonomously solve tasks based on the results of previous tasks while keeping a predefined objective in view. It also prioritizes tasks efficiently.
Mode of work:
- Pulls up the first task from the task list.
- Sends the task to the execution agent, which uses OpenAI’s API and Llama to complete the task based on the context.
- Enriches the result and stores it in Pinecone.
- Creates new tasks and reprioritizes the task list based on the objective and the result of the previous task.
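The four steps above can be sketched as a small loop. This is not the Baby AGI source code: `execute`, `create_tasks`, and `prioritize` are hypothetical placeholders for the LLM-backed agents, a plain dictionary stands in for Pinecone, and the `max_steps` cap is added here so the sketch always terminates.

```python
from collections import deque
from typing import Deque, Dict, List


def execute(objective: str, task: str, context: List[str]) -> str:
    """Placeholder execution agent; the real system calls an LLM here."""
    return f"done: {task}"


def create_tasks(objective: str, task: str, result: str) -> List[str]:
    """Placeholder task creation: one follow-up, unless the task is itself a review."""
    return [f"review {task}"] if not task.startswith("review") else []


def prioritize(objective: str, tasks: Deque[str]) -> Deque[str]:
    """Placeholder prioritization: alphabetical order instead of an LLM ranking."""
    return deque(sorted(tasks))


def run(objective: str, first_task: str, max_steps: int = 10) -> Dict[str, str]:
    tasks: Deque[str] = deque([first_task])
    store: Dict[str, str] = {}  # in-memory dict standing in for the vector store
    for _ in range(max_steps):
        if not tasks:
            break
        task = tasks.popleft()                                   # 1. pull the first task
        result = execute(objective, task, list(store.values()))  # 2. execute it
        store[task] = result                                     # 3. store the result
        tasks.extend(create_tasks(objective, task, result))      # 4a. create new tasks
        tasks = prioritize(objective, tasks)                     # 4b. reprioritize
    return store
```

For example, `run("write a Linux tutorial", "list topics")` processes the initial task, then the follow-up task it generated, and stops once the task list is empty.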
Repository: https://github.com/yoheinakajima/babyagi
Developer: Twitter, Blog
Setup Guide and Background: http://babyagi.org/
Test Baby AGI (bring your OpenAI key): Hugging Face
Task example: Find popular topics that don’t have enough documentation, for articles for my Linux tutorial blog:
Task example: Plan a romantic dinner for my wife this Friday night in central Singapore:
JARVIS / HuggingGPT
Jarvis, or HuggingGPT, is a collaborative system comprising a Large Language Model (LLM) as the central controller and numerous expert models as collaborative executors, sourced from the Hugging Face Hub. This agent can employ LLMs as well as other models. The workflow of the system consists of four stages:
- Task Planning: Uses ChatGPT to analyze the user request, discern its intent, and break it down into manageable tasks.
- Model Selection: ChatGPT selects the best-suited expert models from Hugging Face for each task, based on their descriptions.
- Task Execution: Invokes and executes each selected model and returns the results to ChatGPT.
- Response Generation: Uses ChatGPT to integrate the predictions of all models and generate a comprehensive response.
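The four stages can be illustrated with a toy pipeline. This is only a sketch under strong assumptions: the `EXPERTS` registry and its two "models" are invented for the example and are not real Hugging Face Hub models, and simple keyword matching stands in for the planning and selection that ChatGPT performs in the actual system.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical expert models: name -> (description, callable). Not real Hub models.
EXPERTS: Dict[str, Tuple[str, Callable[[str], str]]] = {
    "captioner": ("image captioning: describe a picture", lambda x: f"caption({x})"),
    "translator": ("translation: translate text to German", lambda x: f"german({x})"),
}


def plan_tasks(request: str) -> List[str]:
    """Stage 1, task planning: keyword matching stands in for the LLM."""
    tasks = []
    if "describe" in request:
        tasks.append("image captioning")
    if "translate" in request:
        tasks.append("translation")
    return tasks


def select_model(task: str) -> str:
    """Stage 2, model selection: pick the expert whose description mentions the task."""
    for name, (description, _) in EXPERTS.items():
        if task in description:
            return name
    raise LookupError(f"no expert model for task: {task}")


def execute_task(model_name: str, payload: str) -> str:
    """Stage 3, task execution: run the selected expert model."""
    return EXPERTS[model_name][1](payload)


def respond(request: str, payload: str) -> str:
    """Stage 4, response generation: merge all model outputs into one answer."""
    outputs = []
    for task in plan_tasks(request):
        outputs.append(execute_task(select_model(task), payload))
    return "; ".join(outputs)
```

The point of the design is that the controller never solves a task itself: it only decides which expert to call, based on the experts' textual descriptions, and then combines what they return.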
Repository: https://github.com/microsoft/JARVIS
Detailed setup guide: How to use Jarvis / HuggingGPT
Paper: arXiv
How it works:
For the complete article head over to medium.com/@maximilianvogel