Working with AI Agents
Do you think you’ll be the one prompting AI and telling it what to do? Or might it be the one giving you the work? I hadn’t considered the latter, but a conversation with a peer overseeing experiments with AI agents at a leading tech company made me realize the future is likely a mix of both.
Today I want to talk to you about AI agents and how, as the technology matures, they'll change the way we use AI. I believe usage will dramatically increase when:
- AIs do more than just talk, like update a calendar or a customer database
- AIs come to you, instead of waiting for you to ask them to do something
Imagine an AI helper acting as a project manager. It wouldn't just listen to meetings to take notes and update the project plan. It would also figure out the goals and rules of a project from documents and talks. Then it would break the project into smaller tasks and give them out to the team. It would keep adjusting the tasks as the project goes on.
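That decompose-assign-adjust loop can be sketched in a few lines. This is a purely illustrative stub: every function name here is hypothetical, and `decompose` stands in for what a real system would delegate to an LLM.

```python
# Hypothetical project-manager agent loop. All names are illustrative;
# decompose() is a stub standing in for an LLM call.

def decompose(goal: str) -> list[str]:
    """Break a goal into smaller tasks (stubbed for illustration)."""
    return [f"{goal}: step {i}" for i in range(1, 4)]

def assign(tasks: list[str], team: list[str]) -> dict[str, str]:
    """Round-robin assignment of tasks to team members."""
    return {task: team[i % len(team)] for i, task in enumerate(tasks)}

def replan(assignments: dict[str, str], done: set[str]) -> dict[str, str]:
    """Drop completed tasks; a real agent would also re-prioritize."""
    return {t: owner for t, owner in assignments.items() if t not in done}

tasks = decompose("Launch newsletter")
assignments = assign(tasks, ["Ana", "Ben"])
assignments = replan(assignments, done={tasks[0]})  # adjust as work completes
```

The point is the shape, not the stubs: the agent keeps cycling through decompose, assign, and replan as the project evolves, rather than waiting for a single request.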
This is way different from an AI assistant that just does simple work when you ask. Some of the top AI companies are already testing this out.
Tomorrow is today, folks. 🙂 These capabilities already exist.
<aside> 📝 In this article, when I say "agent," I mean an AI that's more independent than a regular chatbot. It can do more complex tasks, use outside tools, and may even coordinate conversations with a group of people or bots. When I say "assistant," I mean a chatbot that uses Generative AI (LLMs) but completes one request at a time, with little autonomy.
</aside>
Overview
Over 100 companies are working on AI agents. This graphic illustrates their focus areas, revealing an even split between open-source and proprietary tools. Notably, most of these companies are building agents that excel at specific functions, and about half focus on software engineering. 🤯
- Functional Agents: These agents are optimized for specific organizational functions, such as finance, HR, marketing, or research.
- General Purpose Agents: Agents like Ema can be tailored for various use cases and may come with numerous integrations and core skills, such as data analysis and planning. Many offer user interfaces to create agents tailored to specific tasks, similar to how GPTs can be created within ChatGPT.
- Frameworks: The backend that allows you to configure capabilities (like searching the web), tasks (like creating a marketing strategy), and roles (made up of tasks). Many even have capabilities for enabling agents to work together, whether to review work, provide input, or offer feedback.
- Infrastructure: The software components and hardware on which frameworks run.
Research
Fascinating research is being conducted in this field. In their survey titled "Understanding the planning of LLM agents: A survey," Huang et al. summarize research on planning and categorize it into five areas: decomposition, selection, leveraging external planners and LLMs together, reflection, and memory.
The key takeaway is that the skill of planning with LLMs requires improvement before it can be widely adopted. Future research will need to address the following main issues:
- Reducing hallucinations that result in plans that do not adhere to constraints or are irrational
- Enhancing the ability to satisfy multiple constraints when generating plans, so the resulting plans are feasible
- Improving plan efficiency and optimal plan selection
- Enabling planning in a multi-modal environment, incorporating images, audio, and video into the planning process
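Reflection, one of the five areas above, is easy to picture as a loop: generate a plan, check it against constraints, and feed any violation back into the next attempt. Here's a minimal sketch; every name is hypothetical, and `generate_plan` is a stub standing in for an LLM call.

```python
# Illustrative plan-and-reflect loop. generate_plan() is a stub for an
# LLM; a real system would pass the feedback back into the prompt.

def generate_plan(goal, feedback=None):
    steps = ["draft", "review", "publish"]     # stubbed "LLM output"
    if feedback:                               # pretend the model fixes it
        steps = [s for s in steps if s != feedback]
    return steps

def violates(plan, banned):
    """Return the first step that breaks a constraint, or None."""
    for step in plan:
        if step in banned:
            return step
    return None

def plan_with_reflection(goal, banned, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        plan = generate_plan(goal, feedback)
        bad = violates(plan, banned)
        if bad is None:
            return plan
        feedback = bad  # reflect: feed the violated constraint back
    return plan
```

Today's models often need several such rounds, and may still miss constraints, which is exactly the reliability gap the research above is trying to close.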
My Experience
This week, I explored setting up my own agents on AutoGen, an open-source framework from Microsoft (github). It boasts an active Discord community and has been referenced by Andrew Ng in his recent 4-part series on Agent design patterns.
A peer shared the Marketing Team they built with AutoGen, aptly called "agentcy" (github), providing a good demonstration of how easily AutoGen can be set up. Although environment issues meant I ran out of time, I was able to walk through the setup and was surprised by how intuitive it is.
- Agents are assigned a role (system_message) and skills (a map of functions, like research)
- Each skill maps to one or more functions, such as querying Google or scraping a website, each implemented separately
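The role-plus-skills structure in the bullets above can be sketched framework-neutrally. The field names (`system_message`, `skills`) mirror the wording above, but the `Agent` class and both skill functions here are hypothetical stubs, not AutoGen's actual API.

```python
# Framework-neutral sketch of "role + skills". The Agent class and the
# two skill functions are illustrative stubs, not AutoGen's API.

def search_google(query: str) -> str:
    return f"results for {query!r}"           # stub: would call a search API

def scrape_site(url: str) -> str:
    return f"contents of {url}"               # stub: would fetch and parse

class Agent:
    def __init__(self, system_message: str, skills: dict):
        self.system_message = system_message  # the agent's role
        self.skills = skills                  # skill name -> list of functions

    def use(self, skill: str, step: int, *args):
        """Run one function belonging to a named skill."""
        return self.skills[skill][step](*args)

researcher = Agent(
    system_message="You are a market researcher.",
    skills={"research": [search_google, scrape_site]},
)
```

Keeping each skill as a list of small functions is what makes the setup feel intuitive: you can add or swap a capability without touching the agent's role.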
To obtain the input parameters an agent needs to accomplish a task with its skills (e.g., a website URL to scrape), I learned that there are three options:
- Human-in-the-loop: The agent will ask a human to provide the input
- Configuration: If the input is always the same, it can be saved within the program
- Context: The agent can ask other agents it collaborates with for the input
I’ll update you as I continue to learn about this topic.
The deployment of agents within organizations is in its early days. I'm excited about the future and the potential for humans to leverage agents for mundane, tedious, or error-prone tasks.