The Gemini CLI has, in eleven months, gone from a quiet Google Labs project to one of the three default terminal-native coding agents engineers actually use in production. The other two are Anthropic's Claude Code and OpenAI's Codex CLI. Each has a distinct personality, a distinct cost profile, and a distinct edge case where it is the obvious right answer.
This is the definitive May 2026 Gemini CLI guide: install, commands, agentic mode, the 2M-token context superpower, the VS Code integration, and a head-to-head with the other two agents in the same category. If you are evaluating which terminal coding agent to default to in 2026, this post is built for that decision.
What the Gemini CLI Is
Google announced the Gemini CLI at I/O 2025 and shipped 1.0 in June 2025. It is an open-source (Apache 2.0) command-line agent that runs Gemini 3.x models against your local filesystem, your shell, and a configurable set of Model Context Protocol (MCP) servers.
The architectural shape is the same as Claude Code and Codex CLI: terminal-native, file-aware, git-aware, persistent conversation history per repo, and a built-in tool surface (read file, write file, run shell, edit, search). Where Gemini CLI differentiates is along three axes:
- Context window: 2 million tokens on Gemini 3.1 Pro — by far the largest of any production coding agent.
- Free tier: 1,000 requests / day with a personal Google account — no other agent in this class has a free tier remotely this generous.
- Open-source under Apache 2.0 — the agent itself, not just the model API, is forkable. Claude Code and Codex CLI are both source-available with more restrictive terms.
Those three properties combine to make Gemini CLI the obvious default for (a) hobbyist use, (b) very long context workloads, and (c) any team that needs to fork or self-host the agent layer. We will walk through each below.
Install
npm install -g @google/gemini-cli
That is the canonical install path. Node.js 20+ is required. The CLI ships with a self-contained MCP runtime, so there are no separate dependencies to install. First-run authentication is interactive:
gemini
On first run you will be prompted to authenticate. The two paths:
- Personal Google account — free tier, 1,000 requests/day, Gemini 3.1 Pro access.
- API key — paid, no daily limit, configurable model selection. Get a key from
https://aistudio.google.com.
To switch auth methods later:
gemini --auth-method=api-key
gemini --auth-method=oauth
For team deployments, you can also configure auth via environment variable:
export GEMINI_API_KEY="your-key-here"
A ~/.gemini/settings.json file is created on first run; this is where you configure default model, MCP servers, and tool permissions. We will return to this.
The Core Command Surface
Once authenticated, the conversational loop is the standard terminal-agent pattern:
cd /path/to/your/repo
gemini
> Refactor the user-auth module to use the new session API.
The agent reads and writes files, runs shell commands, and invokes tools — pausing for confirmation on any destructive action by default. Confirmation behavior is configurable.
Built-in slash commands inside the REPL:
| Command | What it does |
|---|---|
| /help | List all commands |
| /model | Switch the active model (Gemini 3.1 Pro, Flash, etc.) |
| /context | Show current token usage of the conversation |
| /compress | Summarize the conversation to free context |
| /clear | Reset the conversation |
| /save <name> | Save the conversation to a checkpoint |
| /restore <name> | Resume a saved conversation |
| /yolo | Skip all confirmation prompts (use with caution) |
| /mcp list | List configured MCP servers |
| /tools | List all available tools |
| /quit | Exit |
The two slash commands worth highlighting: /compress and /save. /compress lets you keep working past the 2M ceiling by summarizing earlier turns, and /save lets you preserve a long-running session across days — useful for refactors that span a week.
Agentic Mode
The default conversational mode is supervised agentic — the model plans, executes tools, and reports back, pausing for your input between major steps. Agentic mode (sometimes called "headless mode" in the docs) runs the same loop unattended:
gemini --agentic --task "Migrate all class components in src/components to function components and ensure all tests still pass."
Agentic mode is what makes Gemini CLI competitive with Claude Code's claude --dangerously-skip-permissions mode and Codex CLI's --auto. It runs the full plan-execute-verify loop without prompting you. By default it stops on:
- Test-suite failure (after configurable retries)
- Any tool error not matched by the recovery prompt
- Reaching the configured --max-turns limit (default 50)
- A user Ctrl-C
For CI use, the recommended invocation is:
gemini --agentic --task-file .gemini/refactor-task.md \
--output-format json \
--max-turns 100 \
--no-confirm
The JSON output mode emits a structured summary suitable for CI status reporting.
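Mapping that structured summary to a CI exit code is a few lines of glue. A minimal sketch in Python — note the field names here (status, tests.failed, and so on) are assumptions about the output schema for illustration, not documented fields; adjust to what your CLI version actually emits.

```python
import json

# Hypothetical agentic-run summary; the field names are an assumption,
# not a documented schema -- inspect your CLI version's real output.
sample = json.dumps({
    "status": "completed",
    "turns_used": 42,
    "tests": {"passed": 118, "failed": 0},
})

def ci_exit_code(raw: str) -> int:
    """Map an agentic-run JSON summary to a CI exit code."""
    summary = json.loads(raw)
    clean = (summary.get("status") == "completed"
             and summary.get("tests", {}).get("failed", 1) == 0)
    return 0 if clean else 1

print(ci_exit_code(sample))  # 0 on a clean run
```

Wire the return value straight into the job status and the agentic run becomes a normal, gateable CI step.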
The 2M Context Window: What It Actually Buys You
The headline spec is 2,000,000 input tokens on Gemini 3.1 Pro. Anthropic's Claude Opus 4.7 and OpenAI's GPT-5.5 both ship with 1M context. The 2x is real, but the more interesting question is what fits in 2M tokens.
Order-of-magnitude reference points:
| Asset | Approximate tokens |
|---|---|
| War and Peace, full text | ~580k |
| Linux kernel fs/ subdirectory | ~720k |
| Stripe's public API docs (v202404) | ~340k |
| A 50,000-line TypeScript monorepo | ~900k |
| The full SEC 10-K corpus for the S&P 500 top 50 | ~2.4M |
| The complete React 19 source tree | ~1.1M |
For coding, the practical implication is that Gemini CLI on Gemini 3.1 Pro can hold an entire mid-sized monorepo in context simultaneously. No retrieval, no chunking, no embedding store. You point it at the repo and ask the question.
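A quick way to sanity-check whether your repo plausibly fits: a rough sketch using the common ~4-characters-per-token heuristic. This is an estimate for budgeting, not Gemini's actual tokenizer, and the pruned directory names are just sensible defaults.

```python
import os

def estimate_tokens(root: str, chars_per_token: float = 4.0) -> int:
    """Rough fit check: total source bytes divided by ~4 chars per token.
    The ratio is a common heuristic, not Gemini's actual tokenizer."""
    total = 0
    skip = {".git", "node_modules", "dist"}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in skip]  # prune vendored dirs
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # broken symlinks etc.
    return int(total / chars_per_token)

# A tree estimating comfortably under 2_000_000 plausibly fits in one context.
```

If the estimate comes in well under 2M, you can skip retrieval entirely and load the tree; if not, scope the load to the subdirectories the task actually touches.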
But context-window numbers are not free. Two real-world frictions:
- In-context recall degrades past 1M tokens. Our internal evaluation on a 1.4M-token financial-filings comprehension task showed Gemini 3.1 Pro at 92% factual recall — strong, but not the 99% it hits at 200k. The "lost in the middle" effect is still present, just shifted further out. We covered this in detail in our LLM context window explainer.
- You pay for every token. At Gemini 3.1 Pro's $3.50 input / $10.50 output pricing, a 2M-token prompt costs $7.00 just to load. That adds up quickly across an agentic loop with dozens of tool turns.
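The arithmetic is easy to sanity-check. A small calculator at the per-million list prices quoted above (verify current pricing before budgeting; the function name is ours):

```python
def call_cost_usd(input_tokens: int, output_tokens: int = 0,
                  in_per_m: float = 3.50, out_per_m: float = 10.50) -> float:
    """Per-call cost at per-million-token list prices (the Gemini 3.1 Pro
    rates quoted in this post; check current pricing before relying on it)."""
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m

print(call_cost_usd(2_000_000))        # 7.0 -- a full 2M-token prompt, input only
print(call_cost_usd(500_000, 20_000))  # a more typical working-window turn
```

Multiply the first number by a 50-turn agentic loop and the case for /compress makes itself.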
The right mental model: 2M is the ceiling, not the operating point. The actual workflow is "load enough to answer this question, compress aggressively, keep the working window under 500k unless you need more."
VS Code Integration
The official VS Code integration shipped in November 2025 as the Gemini Code Assist + CLI Bridge extension. Install it from the VS Code marketplace:
code --install-extension google.gemini-code-assist
The extension adds three things:
- Inline completions — the original Code Assist surface, unchanged from 2024.
- CLI-bridge mode — a sidebar panel that mirrors the Gemini CLI REPL, sharing the same auth, model selection, and conversation history.
- MCP integration — VS Code can register itself as an MCP client, exposing the open file, current selection, and active terminal to the Gemini CLI.
The CLI-bridge mode is the part that materially changes the workflow. You can start a conversation in the terminal, walk to the kitchen, and resume it in VS Code with the conversation state intact. For teams where some engineers prefer terminals and others prefer IDEs, this removes a real friction.
Gemini CLI vs Claude Code vs Codex CLI
The honest comparison, May 2026:
| Dimension | Gemini CLI | Claude Code | Codex CLI |
|---|---|---|---|
| Default model | Gemini 3.1 Pro | Claude Opus 4.7 / Sonnet 4.7 | GPT-5.5 / GPT-5.5 Codex |
| Context window | 2,000k | 1,000k | 1,000k |
| SWE-Bench Verified | 68.9% | 82.1% | 71.2% |
| SWE-Bench Pro | 51.4% | 64.3% | 56.1% |
| Free tier | 1,000 req/day (personal Google) | None | $5 credit on signup |
| Pricing (input/output) | $3.50 / $10.50 per 1M | $15 / $75 per 1M | $5 / $15 per 1M |
| License (agent) | Apache 2.0 | Source-available | Source-available |
| MCP support | Native, server + client | Native, server + client | Native, client only |
| Best at | Long context, free tier, breadth | Hard agentic + coding work | Tight OpenAI ecosystem fit |
The plain-English version:
- Gemini CLI is the obvious default if cost or context size is the constraint. Free tier for hobbyists, 2M context for codebase-wide reasoning, Apache 2.0 if you want to fork.
- Claude Code is the obvious default for hard coding work. SWE-Bench Pro 64.3% is in a different league; the agentic-loop quality justifies the price for code-heavy production teams.
- Codex CLI is the obvious default if you are deep in the OpenAI ecosystem — using GPT-5.5 elsewhere, on Azure OpenAI, or building on top of structured outputs v3.
Most teams we work with end up running two of the three: a "default" agent and a "hard problem" agent. Common pairings: Gemini CLI for daily work plus Claude Code for the gnarly diffs; or Codex CLI for inside the OpenAI walled garden plus Gemini CLI for everything that needs to read a whole repo at once.
For a deeper coding-tool comparison that includes the IDE-native side of the market (Cursor, Lovable, Base44), see our Claude Code vs Cursor vs Lovable vs Base44 piece.
For a model-level view of where Gemini 3.1 Pro fits across all 2026 frontier releases, our AI Model Leaderboard tracks quality, speed, and value side-by-side.
Configuring Gemini CLI for a Real Project
Below is the ~/.gemini/settings.json we recommend as a starting point for a production engineering team:
{
"model": "gemini-3.1-pro",
"fallbackModel": "gemini-3.1-flash",
"agentic": {
"maxTurns": 60,
"stopOnTestFailure": true,
"confirmDestructive": true
},
"tools": {
"shell": { "enabled": true, "allowList": ["git", "npm", "pnpm", "pytest", "rg"] },
"fileWrite": { "enabled": true, "denyPaths": [".env", ".git/", "node_modules/"] },
"fileRead": { "enabled": true }
},
"mcpServers": {
"github": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"] },
"postgres": { "command": "uvx", "args": ["mcp-server-postgres", "--read-only"] }
},
"telemetry": { "enabled": false }
}
The two settings most teams overlook:
- stopOnTestFailure: true — turns the agent from a destructive yolo machine into a test-driven contributor. Without this, agentic mode happily marches past failing tests.
- denyPaths — the deny list. .env should always be on it. So should .git/ (the agent should use the git tool, not write to git internals directly) and any infra-as-code directory you do not want vibe-coded.
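That minimum deny list is easy to enforce with a small pre-commit or CI check. A sketch, assuming the settings.json layout shown above; missing_deny_paths and REQUIRED_DENY are names invented for illustration:

```python
import json

REQUIRED_DENY = {".env", ".git/"}  # the minimum deny list argued for above

def missing_deny_paths(settings_text: str) -> set:
    """Return required deny entries absent from a settings.json blob.
    Assumes the tools.fileWrite.denyPaths layout shown in this post."""
    cfg = json.loads(settings_text)
    deny = set(cfg.get("tools", {}).get("fileWrite", {}).get("denyPaths", []))
    return REQUIRED_DENY - deny

sample = '{"tools": {"fileWrite": {"denyPaths": [".env"]}}}'
print(missing_deny_paths(sample))  # {'.git/'}
```

Fail the check when the returned set is non-empty and no one on the team can quietly ship a config that lets the agent write to .env.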
For the secrets-management side specifically, see our companion piece on vibe coding and what goes wrong.
A Realistic First Hour
If you have not used Gemini CLI before, here is the loop we recommend for your first hour with it:
# 1. Install and authenticate.
npm install -g @google/gemini-cli
gemini # walk through the OAuth flow, accept the free tier
# 2. Try a tiny single-turn task in a throwaway directory.
mkdir /tmp/gemini-demo && cd /tmp/gemini-demo
gemini
> Initialize a TypeScript project with Vitest and a hello-world function with one passing test.
# 3. Try a longer task in your real repo.
cd ~/code/your-project
gemini
> Read the codebase. Summarize the architecture in 5 bullets. Then propose the 3 highest-leverage refactors.
# 4. Try agentic mode with stopping conditions.
gemini --agentic --max-turns 20 --task \
"Add a new endpoint POST /api/v2/health that returns { status, version, uptime } as JSON. Add tests. Run the test suite."
By the end of those four steps you will have a calibrated sense of the latency, the cost, and the quality. From there it is a matter of layering on MCP servers, tightening the deny lists, and integrating into CI.
FAQ: Gemini CLI
1. Is Gemini CLI free? The CLI is open-source under Apache 2.0. Model usage on a personal Google account is free up to 1,000 requests / day on Gemini 3.1 Pro plus higher limits on Flash. For team or production use you need an API key, which is metered at standard Gemini pricing ($3.50 input / $10.50 output per 1M tokens for 3.1 Pro as of May 2026).
2. Can I run Gemini CLI offline? No. The agent runs locally, but every model call goes to Google's API. There is no offline mode. If you need offline, you would need to run a self-hosted model and adapt the agent's model adapter — possible because the agent is Apache 2.0, but not supported out of the box.
3. How does Gemini CLI handle secrets?
It does not, by default. Secrets management is a deny-list configuration concern. Add .env, *.pem, *.key, and your specific secrets-bearing files to the tools.fileRead.denyPaths and tools.fileWrite.denyPaths lists in settings.json. For production deployments, also restrict shell access via tools.shell.allowList.
4. What is the difference between Gemini CLI and Gemini Code Assist? Gemini Code Assist is the original IDE inline-completion product (the Copilot competitor), launched in 2023. Gemini CLI is the newer terminal-native agent, launched June 2025. They share authentication and billing but are otherwise separate products. The VS Code extension we mentioned bridges them.
5. Does Gemini CLI support MCP servers? Yes, natively, in both directions. The CLI is an MCP client (consuming external servers like the GitHub or Postgres MCP servers) and can also expose itself as an MCP server (so other agents can call it). This is one of the cleanest MCP integrations among the three major terminal agents.
6. Should my team default to Gemini CLI or Claude Code? Use both. The right pattern: Gemini CLI as the daily driver for the cost reasons (free tier or $3.50 input vs $15 input), Claude Code on the same workstation for hard diffs that benefit from Opus 4.7's SWE-Bench Pro 64.3%. The two share the filesystem, so handoffs are zero-friction. Most teams we work with end up with this setup within a quarter.
7. Can I use Gemini CLI with non-Google models? Officially, no — the agent is hard-coded to the Gemini API surface. Unofficially, the Apache 2.0 license means anyone can fork and add adapters, and at least two community forks (visible on GitHub) have done so. For a supported multi-model agent, Cursor or Aider are better-maintained options, or use Swfte Connect underneath the official CLI to route Gemini calls through your own gateway.
Where to Go Next
Gemini CLI is the right answer for an increasingly large slice of the terminal-coding-agent market. The 2M context window, the free tier, and the Apache 2.0 license combine to make it the lowest-friction entry point into agentic coding in 2026. It is not the strongest agent on the hardest coding problems — Claude Code keeps that crown — but it is the agent most engineers should install first, learn the patterns on, and then pair with a heavier-duty agent for the long-tail diffs.
If you are running multiple agents already, the next step is a routing layer that picks the right one for the task. That is exactly what Swfte Connect does: write your prompts once, route across Gemini, Claude, and OpenAI by cost and quality criteria you set.
Route across Gemini, Claude, and OpenAI with Swfte Connect, build agentic workflows with Swfte Studio, upskill your team on the right CLI defaults, and ship with enterprise-grade security. See pricing or browse case studies.