Introduction to Operator & Agents

Operator is a major step in OpenAI’s roadmap for AI agents. It combines GPT-4o’s vision capabilities with advanced reasoning and a universal “screen + mouse + keyboard” interface, which lets it carry out everyday tasks on the web. The product is still under active development, with a strong emphasis on user confirmations, safety checks, and reliable performance. Over time, OpenAI aims to refine Operator’s capabilities, incorporate public feedback, and make it available both as a consumer product and through an API for broader, real-world automation.

1. On the power of “Operator” and CUA

“Operator is based on the new model we’ve trained at OpenAI, which we’re calling the Computer-Using Agent, or CUA for short. … By teaching a model how to use the same basic interface that we use on a daily basis, it just unlocks a whole new range of software that was previously inaccessible.”

2. On using raw screenshots (no APIs needed)

“Before, if you wanted to build something like Operator without CUA, you’d need to use some specialized APIs. … But if your site … did not have an API, then you’re out of luck. So this is just using screenshots, no API, nothing, just working.”

3. How Operator navigates websites

“CUA understands this. It’s just seeing the raw pixels. And after CUA sees this image, it decides what to do next. … It figures out what the next action it should take is.”
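The cycle described here, where the model looks at a screenshot, picks one action, and repeats, can be sketched as a simple observe-decide-act loop. This is a minimal illustration under stated assumptions, not OpenAI’s implementation: `Action`, `run_agent`, and the `capture`, `policy`, and `execute` callables are hypothetical names, and the real CUA action space and stopping rule are not described in the source.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical action type; the real CUA action space is not given in the source.
@dataclass
class Action:
    kind: str      # e.g. "click", "type", or "done"
    x: int = 0     # screen coordinates for pointer actions
    y: int = 0
    text: str = "" # text for keyboard actions

Screenshot = bytes  # raw pixels, e.g. an encoded PNG of the current screen


def run_agent(
    task: str,
    capture: Callable[[], Screenshot],
    policy: Callable[[str, Screenshot, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Observe-decide-act loop: one screenshot in, one GUI action out, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        pixels = capture()                      # observe: raw pixels only, no site API
        action = policy(task, pixels, history)  # decide: the model picks the next action
        history.append(action)
        if action.kind == "done":               # the model signals the task is finished
            break
        execute(action)                         # act: drive the mouse/keyboard
    return history
```

The `max_steps` cap is an assumed safeguard so a confused policy cannot loop forever; the source does not say how Operator bounds its runs.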

4. “Human in the loop” design

“At any point in time, a user should be able to take control and give Operator instructions, or tell a little bit more, guide a little bit more, etc. … We almost think of Operator as also keyed to how we think about user and user controls.”
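One way to picture the “take control at any point” behavior is a session object that the agent consults before every step. The sketch below assumes a hypothetical `Session` with a takeover flag and a queue of user instructions; it is not how Operator is actually built, only an illustration of pausing while the user drives and folding mid-task guidance into later steps.

```python
import queue
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Session:
    """Hypothetical shared state between the user and the agent."""
    task: str
    guidance: queue.Queue = field(default_factory=queue.Queue)  # instructions typed mid-task
    user_in_control: bool = False
    notes: List[str] = field(default_factory=list)  # guidance the agent has absorbed so far

    def take_control(self) -> None:
        self.user_in_control = True

    def hand_back(self, instruction: str = "") -> None:
        if instruction:
            self.guidance.put(instruction)
        self.user_in_control = False


def agent_step(session: Session, act: Callable[[str, List[str]], None]) -> str:
    # Absorb any new user instructions before deciding anything.
    while not session.guidance.empty():
        session.notes.append(session.guidance.get_nowait())
    if session.user_in_control:
        return "paused"  # the user is driving; the agent takes no action
    act(session.task, session.notes)  # act with the task plus all guidance so far
    return "acted"
```

Checking the flag at the start of every step means a takeover takes effect before the agent’s next action rather than after the whole task finishes.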

5. On confirmations and alignment

“Operator comes back and asks for confirmation when it’s about to do anything kind of impactful. … We have moderation models. We have post hoc detection. We have blocked websites. … That’s really how we think about it—this stack of mitigations.”
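The “stack of mitigations” idea can be illustrated as a chain of independent checks that every proposed action must pass, with user confirmation as one layer among several. Everything here is hypothetical: `ProposedAction`, its `impactful` flag, the placeholder `BLOCKED_DOMAINS` set, and `gate` are illustrative names, and the moderation models and post hoc detection mentioned in the quote are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional
from urllib.parse import urlparse


@dataclass
class ProposedAction:
    description: str
    url: str
    impactful: bool  # hypothetical flag, e.g. placing an order or sending an email


# A check returns a refusal reason, or None to let the action through.
Check = Callable[[ProposedAction], Optional[str]]

BLOCKED_DOMAINS = {"blocked.example"}  # placeholder list, not a real blocklist


def blocked_site_check(a: ProposedAction) -> Optional[str]:
    host = urlparse(a.url).hostname or ""
    return "blocked website" if host in BLOCKED_DOMAINS else None


def make_confirmation_check(ask_user: Callable[[str], bool]) -> Check:
    def check(a: ProposedAction) -> Optional[str]:
        # Only impactful actions interrupt the user for a yes/no answer.
        if a.impactful and not ask_user(f"Operator wants to: {a.description}. Proceed?"):
            return "user declined"
        return None
    return check


def gate(action: ProposedAction, checks: List[Check]) -> Optional[str]:
    """Run every mitigation layer in order; the first refusal wins."""
    for check in checks:
        reason = check(action)
        if reason is not None:
            return reason
    return None
```

Putting the blocked-site check before the confirmation check means the user is never asked to approve an action that would be refused anyway.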

6. Real-world side effects

“It is one of the first agents that we’re putting out in the world and which has real-world side effects. … For example, what if the user is misaligned? … If the agent is misaligned? … If the website is misaligned? … We hope to learn a lot from this deployment and iterate on our mitigations as we go.”