Day 19 of 28 · ChatGPT Course

ChatGPT Agent

⏱ 7 min 📊 Medium

Until now, everything we have covered has required you to read ChatGPT's output and act on it yourself: it gives you text, and you do something with it. But what if ChatGPT could take action for you -- browsing websites, filling out forms, clicking buttons, and completing real-world tasks on the internet?

That is what ChatGPT Agent does. It is an AI agent that can control a web browser, navigate websites, and perform multi-step tasks on your behalf. Think of it as a capable assistant who can actually do things on the web, not just tell you how.

Today you will learn what the ChatGPT Agent can do, how it works under the hood, and how to use it effectively.

Figure: ChatGPT Agent workflow (describe the task → the agent plans → the agent executes → you get results). The Agent handles multi-step tasks autonomously while you do other things.

From Operator to ChatGPT Agent

In early 2025, OpenAI launched Operator -- a research preview of an AI that could use a web browser. It was impressive but limited. It ran in a separate interface, had a small set of supported websites, and felt like an experiment.

In July 2025, Operator evolved into ChatGPT Agent -- fully integrated into the main ChatGPT interface. Instead of switching to a separate tool, you simply describe what you want done, and the agent handles it right from your ChatGPT conversation.

The key improvements:

- Integrated experience -- it works inside the ChatGPT interface you already know

- Powered by o3 reasoning -- uses advanced reasoning to plan and execute complex multi-step tasks

- Broader website support -- works across a much wider range of websites

- Better error recovery -- when something goes wrong, it can diagnose the problem and try alternative approaches

Knowledge Check

What is the main evolution from Operator to ChatGPT Agent?

A. It is now available for free
B. It uses a completely different AI model
C. It is fully integrated into the ChatGPT interface and uses o3 reasoning for better multi-step task execution
D. It can now control desktop applications

Answer: C. The key evolution is integration and capability. Instead of being a separate tool, ChatGPT Agent lives inside the main ChatGPT experience. It also uses o3 reasoning, which allows it to plan, reason through obstacles, and execute complex multi-step tasks more reliably.

What the agent can do

ChatGPT Agent can perform a wide range of web-based tasks:

Browse and research. Navigate websites, read content, extract specific information, and compile findings. More hands-on than Deep Research -- it actually interacts with web pages.

Fill out forms. Complete online applications, registration forms, surveys, and checkout processes.

Book and reserve. Search for flights, hotels, and restaurants. Compare options and make bookings on your behalf.

Order and purchase. Add items to shopping carts, apply coupon codes, and complete purchases (with your confirmation before payment).

Manage accounts. Update settings, change preferences, and navigate account dashboards on supported websites.

Compare and analyze. Visit multiple competitor websites, extract pricing information, and compile comparison tables.

Here is the critical distinction: the agent works on the web only. It controls a web browser. It cannot control desktop applications, local files, or mobile apps. If a task can be completed entirely on a website, the agent can probably handle it.

Example conversation:

You: I need to book a round-trip flight from San Francisco to New York for March 15-18. Find me the best options under $400.

ChatGPT: I'll search for flights for you. Let me check multiple airline and travel sites.

**Searching...**
- Checking Google Flights for SFO to JFK/LGA/EWR
- Comparing prices on Kayak
- Checking direct airline sites (United, JetBlue, Delta)

**Results: Top 3 Options Under $400**

1. **JetBlue** -- SFO > JFK
   - Depart: Mar 15, 8:20 AM (nonstop, 5h 15m)
   - Return: Mar 18, 6:00 PM (nonstop, 5h 45m)
   - **$327 round trip**
2. **United** -- SFO > EWR
   - Depart: Mar 15, 11:45 AM (nonstop, 5h 30m)
   - Return: Mar 18, 9:15 PM (nonstop, 6h)
   - **$349 round trip**
3. **Delta** -- SFO > JFK
   - Depart: Mar 15, 6:00 AM (nonstop, 5h 10m)
   - Return: Mar 18, 4:30 PM (nonstop, 5h 50m)
   - **$389 round trip**

Would you like me to proceed with booking any of these? I'll need your passenger details and payment method to complete the reservation.


Knowledge Check

What is the key limitation of ChatGPT Agent?

A. It can only control a web browser -- it cannot control desktop applications, local files, or mobile apps
B. It can only perform one task at a time
C. It can only work on Google websites
D. It requires a separate subscription

Answer: A. ChatGPT Agent works by controlling a web browser. It can navigate websites, fill forms, and click buttons on web pages. But it cannot interact with desktop software like Excel, local files on your computer, or native mobile apps. If the task happens on the web, it can handle it.

How it works behind the scenes

When you give the agent a task, here is what happens:

1. Planning. The agent uses o3 reasoning to break your request into steps. For booking a flight, it might plan: search multiple sites, compare prices, filter by your criteria, present options, and then book upon confirmation.

2. Execution. It opens a browser session and starts navigating. It reads page content, identifies buttons and form fields, types text, clicks links, and scrolls -- just like a human would.

3. Reasoning through obstacles. If a page layout is unexpected, a pop-up appears, or a form requires information it does not have, the agent reasons through the problem. It might close a pop-up, try a different navigation path, or ask you for clarification.

4. Confirmation before critical actions. The agent is designed to pause and ask for your confirmation before taking irreversible actions like submitting a purchase, signing up for a service, or making a booking. You stay in control.

5. Reporting back. Once the task is complete, the agent summarizes what it did, what it found, and any actions it took.
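To make the plan, execute, confirm, and report steps above concrete, here is a minimal Python sketch of that control flow. It is a toy model under stated assumptions, not how OpenAI implements ChatGPT Agent: the `Step` class, the `plan` and `run_agent` functions, and the hard-coded flight-booking plan are all hypothetical, and "execution" just writes a log line where a real agent would drive a browser. What it shows is the key rule from step 4: only steps marked irreversible trigger a confirmation prompt, and an unconfirmed step is skipped and reported rather than performed.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical model of one planned action (illustration only, not OpenAI's design).
@dataclass
class Step:
    description: str
    irreversible: bool = False  # e.g. submitting a purchase or making a booking


def plan(task: str) -> list[Step]:
    """Stand-in for the planning stage (step 1).

    A real agent would reason about `task`; this sketch ignores it and
    returns a fixed plan for a flight-booking request.
    """
    return [
        Step("search multiple sites"),
        Step("compare prices"),
        Step("filter by the user's criteria"),
        Step("present options"),
        Step("book the chosen flight", irreversible=True),
    ]


def run_agent(task: str, confirm: Callable[[Step], bool]) -> list[str]:
    """Execute each planned step, asking `confirm` only before irreversible ones.

    Returns a log of what happened, which serves as the final report (step 5).
    """
    report: list[str] = []
    for step in plan(task):
        # Step 4: gate irreversible actions behind explicit user confirmation.
        if step.irreversible and not confirm(step):
            report.append(f"skipped (not confirmed): {step.description}")
            continue
        # Step 2 stand-in: a real agent would navigate, click, and type here.
        report.append(f"done: {step.description}")
    return report


if __name__ == "__main__":
    # Simulate a user who declines the booking when asked.
    for line in run_agent("round trip SFO to NYC, Mar 15-18, under $400",
                          confirm=lambda step: False):
        print(line)
```

Run as a script, it prints four "done" lines for the research steps and then reports the booking as skipped, because the simulated user declined. Passing a `confirm` function that returns `True` would let the final step run instead.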

Example conversation:

You: Go to our competitor's website at example-competitor.com and find their current pricing for all their plans. Also check if they offer a free trial.

ChatGPT: I've visited example-competitor.com and navigated to their pricing page. Here's what I found:

**Competitor Pricing (as of today):**

| Plan | Monthly | Annual (per month) |
|------|---------|--------------------|
| Starter | $19/mo | $15/mo |
| Professional | $49/mo | $39/mo |
| Enterprise | Custom | Custom |

**Free Trial:** Yes -- 14-day free trial on Starter and Professional plans. No credit card required.

**Notable extras:**
- Starter is limited to 3 users
- Professional includes API access
- Enterprise requires contacting sales

Want me to check any other competitors or compare this with your current pricing?


Availability and best practices

Availability. ChatGPT Agent rolled out to Pro subscribers first, with access gradually extending to Plus users. OpenAI is expanding availability as the feature stabilizes and improves.

Best practices for using the agent effectively:

Be specific about what you want. Instead of "find me a hotel," say "find me a hotel in downtown Chicago for April 5-7 with a gym and free cancellation, under $200 per night." The more detail you provide, the better the agent performs.

Let it ask clarifying questions. If the agent is unsure about something, it will ask. Answer those questions rather than assuming it will figure it out.

Review before confirming. Always review the agent's findings and proposed actions before giving the go-ahead on bookings, purchases, or sign-ups. The confirmation step exists for a reason.

Start with lower-stakes tasks. Get comfortable with the agent on research and comparison tasks before using it for purchases or bookings. Build trust in how it works.

Be patient with complex tasks. Multi-step tasks take time. The agent might need to navigate several pages, wait for content to load, and try different approaches. This is normal.

Knowledge Check

When does ChatGPT Agent ask for your confirmation?

A. It never asks -- it completes everything automatically
B. Before every single click on a webpage
C. Only when it encounters an error
D. Before taking irreversible actions like submitting a purchase, signing up, or making a booking

Answer: D. The agent is designed to pause before critical, irreversible actions. It will autonomously navigate, search, and gather information, but it will ask for your explicit confirmation before spending money, creating accounts, or making commitments. You always stay in control of consequential decisions.

Final Check

Which AI model powers ChatGPT Agent's reasoning capabilities?

A. GPT-4o
B. DALL-E
C. GPT-4
D. o3 reasoning model

Answer: D. ChatGPT Agent is powered by the o3 reasoning model, which excels at planning and executing complex multi-step tasks. The reasoning capability is what allows the agent to break down complex requests, navigate obstacles, and recover from unexpected situations during task execution.


Day 19 Complete

"ChatGPT Agent bridges the gap between getting answers and getting things done. Describe the task, let the agent execute, and confirm before anything irreversible happens."

Tomorrow: Day 20

Group Chats & Collaboration

Learn how to invite others into a shared chat with ChatGPT for team brainstorming, planning, and collaboration.
