Test Driving Jules: Google's Coding Agent
I’d seen Jules mentioned a few times by Google engineers on X, but I didn’t pay much attention until I noticed its logo on the agents.md website. That overlap made it worth testing, less out of curiosity than to see whether it actually changes how work gets done compared to just chatting with Gemini.
If you have Google One AI Premium, you already have access to Gemini 3. Jules sits on top of the same models, but the interaction model is very different: it’s closer to a task runner that enforces a specific way of delegating software work.
Setup: Real Repos, Real Environments
The biggest difference shows up immediately: you don’t paste snippets into a text box. The first time you use it, you’re prompted to link GitHub.
Jules clones your repository into a dedicated Google Cloud VM and works on the code inside that VM.
The “Lazy Developer” Model (In Practice)
Jules is built around asynchronous work. You assign a task, it disappears for a while, and eventually you get a result. I tried it on things I’d been consciously avoiding because they were boring, not hard:
- Responsive image logic: I worked with Jules to plan a change to the image-resizing logic behind the responsive images on my website, then asked it to execute the plan. It worked in the background while I did something else (a sketch of the kind of logic involved appears after this list).
- Repo cleanup: Unused files, dead CSS, leftover experiments. Jules identified them, deleted them in its working copy, and proposed the deletions.
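For context, "responsive image logic" here means things like generating `srcset` attributes for pre-resized image variants. A minimal TypeScript sketch, assuming a width-suffixed file-naming scheme; the `WIDTHS` list and the `srcsetFor` helper are my illustrations, not Jules's output:

```ts
// Hypothetical helper: build a srcset string for an image that has been
// pre-resized into width-suffixed variants (e.g. photo-480w.jpg).
const WIDTHS = [480, 800, 1200] as const; // assumed breakpoints

function srcsetFor(basePath: string): string {
  return WIDTHS.map((w) => `${basePath}-${w}w.jpg ${w}w`).join(", ");
}

// "/img/photo-480w.jpg 480w, /img/photo-800w.jpg 800w, /img/photo-1200w.jpg 1200w"
console.log(srcsetFor("/img/photo"));
```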
In both cases, the output is a GitHub pull request. You can auto-merge if tests pass, but that feels reckless for anything non-trivial; manual review makes more sense. Conceptually, this is closer to assigning a ticket to a junior engineer than talking to an assistant.
Suggestions, Audits, and Light Autonomy
There’s a beta “Suggestions” feature that scans the repo and flags things like TODO comments. It doesn’t change anything automatically; it asks whether you want it to proceed.
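Conceptually, that scan amounts to something like the sketch below. This is my own illustration of the idea, not Jules's implementation; the ignore list and file-extension filter are assumptions:

```ts
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

// Hypothetical sketch: walk a repo and collect TODO comments with their
// locations, the raw material a Suggestions-style feature would flag.
function findTodos(dir: string): string[] {
  const hits: string[] = [];
  for (const name of readdirSync(dir)) {
    if (name === ".git" || name === "node_modules") continue; // assumed ignore list
    const path = join(dir, name);
    if (statSync(path).isDirectory()) {
      hits.push(...findTodos(path));
    } else if (/\.(ts|js|css|html)$/.test(name)) { // assumed extensions
      readFileSync(path, "utf8")
        .split("\n")
        .forEach((line, i) => {
          if (line.includes("TODO")) hits.push(`${path}:${i + 1}: ${line.trim()}`);
        });
    }
  }
  return hits;
}

console.log(findTodos(".").join("\n"));
```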
You can also schedule recurring tasks, like performance, security, or UI audits. I tried a UI audit, and Jules (via the “Palette” tool) suggested an accessibility improvement to pagination for screen-reader users. The change was narrow, justified, and easy to verify, so I merged it.
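The merged change was in the spirit of this sketch: a labelled nav landmark plus `aria-current` on the active page, so assistive tech can announce where you are. Everything here (the React component, its props, the `hrefFor` helper) is hypothetical and stands in for the real markup:

```tsx
import React from "react";

type PaginationProps = {
  current: number;
  total: number;
  hrefFor: (page: number) => string; // hypothetical URL builder
};

// Screen-reader-friendly pagination: the nav landmark gets an accessible
// name, and aria-current="page" marks the active link for assistive tech.
export function Pagination({ current, total, hrefFor }: PaginationProps) {
  const pages = Array.from({ length: total }, (_, i) => i + 1);
  return (
    <nav aria-label="Pagination">
      <ul>
        {pages.map((page) => (
          <li key={page}>
            <a
              href={hrefFor(page)}
              aria-current={page === current ? "page" : undefined}
            >
              Page {page}
            </a>
          </li>
        ))}
      </ul>
    </nav>
  );
}
```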
This is one of the areas where the system works best: constrained, standards-driven improvements with clear definitions of “correct.”
Environment Setup
Jules can run without much setup, but that only really works for small or straightforward projects.
For anything larger, spending time in the Environment tab matters. This is where you define scripts, environment variables, and other assumptions so the VM actually matches your local development setup. Without this, Jules will still execute tasks, but you’re more likely to see mismatches or partial fixes that don’t reflect how the project really works.
Bigger codebases benefit the most here. The closer the VM is to your real environment, the less cleanup you’ll need after reviewing the PR.
There’s also a per-repo “Memories” feature where Jules tracks preferences like architectural patterns and stylistic choices. That cuts down on repeated instructions over time, but it also means the agent is accumulating assumptions in the background. It’s useful, but something you should check occasionally rather than ignore.
Final Assessment
Jules isn’t designed for conversation.
It changes how you interact with Gemini by moving the interface away from chat and toward delegation. Instead of iterating through messages, you describe a bounded task, let it run in the background, and come back to a pull request.
You still need to define the scope, review the changes, and understand what’s being modified. But for well-defined tasks—cleanup, audits, or routine refactors you’re comfortable handing off—it’s a different and occasionally useful way to get work done without staying in the loop the whole time.