
How to Build a Code-Editing Agent in Python with Gemini

(or: It's Easier Than You Think!)

Adapted from Thorsten Ball's "How to Build an Agent". Current date: Tuesday, April 15, 2025.

It seems magical, doesn't it? Watching a coding agent browse files, write code, fix errors, and try different approaches feels like witnessing some deeply complex AI sorcery. You might think there's a hidden, intricate mechanism powering it all.

Spoiler alert: There isn't.

At its core, a functional code-editing agent often boils down to three things:

  1. A powerful Large Language Model (LLM).
  2. A loop that keeps the conversation going.
  3. The ability for the LLM to use "tools" to interact with the outside world (like your file system).

Everything else – the slick UIs, the fancy error recovery, the context-awareness that makes tools like Google's Project IDX or GitHub Copilot Workspace so impressive – is often built upon this foundation with clever prompting, robust engineering, and yes, a good amount of "elbow grease."

But you don't need all that complexity to build something genuinely impressive. You can create a capable agent, right here, right now, in surprisingly few lines of Python code.

I strongly encourage you to code along. Reading is one thing, but feeling how little code it takes to achieve this is another. Seeing it run in your own terminal, modifying your own files? That's where the "aha!" moment happens.

Here’s what you’ll need:

  1. Python 3.9+ installed (required by the google-genai SDK).
  2. A Google AI Studio API Key for the Gemini API. You can get one for free from Google AI Studio.
  3. Set your API key as an environment variable named GOOGLE_API_KEY.

Ready? Let's dive in.

Step 1: Project Setup and Basic Chat Loop

First, let's get our project environment ready. Open your terminal:

# Create and navigate to the project directory
mkdir python-gemini-agent
cd python-gemini-agent

# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`

# Create main file and requirements file
touch main.py requirements.txt

# Create a placeholder for your API key (or set it directly)
echo "export GOOGLE_API_KEY='YOUR_API_KEY_HERE'" > .envrc # Optional: if you use direnv
# Remember to replace 'YOUR_API_KEY_HERE' and potentially source the file
# Or set it directly in your shell: export GOOGLE_API_KEY='YOUR_API_KEY_HERE'

Now, let's install the necessary library: the Google Generative AI SDK. Add this line to requirements.txt:

# requirements.txt
google-genai

And install it:

pip install google-genai

Note: The previous SDK (google-generativeai) is deprecated. Use google-genai for all new projects. See Gemini API Libraries for details.
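Optional sanity check: before building the agent, you can verify the SDK and your key with a tiny script. This is a minimal sketch; it assumes GOOGLE_API_KEY is already exported in your shell:

```python
# sanity_check.py -- verify the SDK import and the API key work
from google import genai

client = genai.Client()  # picks up GOOGLE_API_KEY from the environment
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Reply with the single word: pong",
)
print(response.text)  # expect something like "pong"
```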

Next, open main.py and put in the basic structure. We'll create a simple Agent class to hold our logic.

# main.py
from google import genai
from google.genai import types
import os
import sys
from pathlib import Path

# --- Configuration ---
# Load the API key from environment variable
try:
    GOOGLE_API_KEY = os.environ['GOOGLE_API_KEY']
    client = genai.Client(api_key=GOOGLE_API_KEY)  # Uses GOOGLE_API_KEY from environment
except KeyError:
    print("πŸ›‘ Error: GOOGLE_API_KEY environment variable not set.")
    sys.exit(1)

# Choose your Gemini model
MODEL_NAME = "gemini-2.0-flash" # Or "gemini-2.5-pro-preview-03-25"

# --- Errors ---
class SecurityError(Exception):
    """Raised when a tool is asked to touch a path outside the project."""
    pass

# --- Agent Class ---
class Agent:
    def __init__(self, model_name: str):
        print(f"✨ Initializing Agent with model: {model_name}")
        self.model_name = model_name
        self.client = client  # the module-level client configured above
        self.tool_functions = [self.read_file, self.list_files, self.edit_file]
        print(f"🤖 Agent initialized with model: {self.model_name}")

    def read_file(self, path: str) -> str:
        """Reads the content of a file at the given relative path."""
        print(f"πŸ› οΈ Executing read_file: {path}")
        if not self._is_safe_path(path):
            raise SecurityError(f"Access denied: Path '{path}' is outside the allowed directory.")
        try:
            file_path = Path(path).resolve()
            if not file_path.is_file():
                return f"Error: Path '{path}' is not a file or does not exist."
            return file_path.read_text(encoding='utf-8')
        except Exception as e:
            return f"Error reading file '{path}': {e}"

    def list_files(self, directory: str = '.') -> str:
        """Lists files and directories within a given relative path."""
        print(f"πŸ› οΈ Executing list_files: {directory}")
        if not self._is_safe_path(directory):
            raise SecurityError(f"Access denied: Path '{directory}' is outside the allowed directory.")
        try:
            dir_path = Path(directory).resolve()
            if not dir_path.is_dir():
                return f"Error: Path '{directory}' is not a directory."
            items = [f.name + ('/' if f.is_dir() else '') for f in dir_path.iterdir()]
            return f"Contents of '{directory}':\n" + "\n".join(items)
        except Exception as e:
            return f"Error listing directory '{directory}': {e}"

    def edit_file(self, path: str, content: str) -> str:
        """Writes content to a file at the given relative path, overwriting it."""
        print(f"πŸ› οΈ Executing edit_file: {path}")
        if not self._is_safe_path(path):
            raise SecurityError(f"Access denied: Path '{path}' is outside the allowed directory.")
        try:
            file_path = Path(path).resolve()
            file_path.write_text(content, encoding='utf-8')
            return f"Successfully wrote to '{path}'."
        except Exception as e:
            return f"Error writing to file '{path}': {e}"

    def _is_safe_path(self, path_str: str) -> bool:
        """Check if the path is within the project directory."""
        try:
            project_root = Path.cwd().resolve()
            target_path = (project_root / path_str).resolve()
            return target_path.is_relative_to(project_root)
        except ValueError: # Handles invalid path characters
            return False

    def start_interaction(self):
        """Starts the main interaction loop."""
        print("\nπŸ’¬ Chat with Gemini (type 'quit' or 'exit' to end)")
        print("-" * 30)

        while True:
            try:
                user_input = input("πŸ”΅ \x1b[94mYou:\x1b[0m ").strip()
                if user_input.lower() in ["quit", "exit"]:
                    print("πŸ‘‹ Goodbye!")
                    break
                if not user_input:
                    continue

                print("🧠 Thinking...")
                tool_config = types.GenerateContentConfig(
                    tools=self.tool_functions
                )
                response = self.client.models.generate_content(
                    model=self.model_name,
                    contents=user_input,
                    config=tool_config,
                )

                if response.text:
                    print(f"🟠 \x1b[93mGemini:\x1b[0m {response.text}")
                else:
                    print("🟠 \x1b[93mGemini:\x1b[0m (No text response received)")
                    # Log the full response if needed for debugging
                    # print(f"DEBUG: Full response: {response}")

            except KeyboardInterrupt:
                print("\n👋 Exiting...")
                break
            except Exception as e:
                print(f"❌ An error occurred: {e}")
                # Consider more robust error handling or logging

# --- Main Execution ---
if __name__ == "__main__":
    print("🚀 Starting Agent...")
    agent = Agent(MODEL_NAME)
    agent.start_interaction()

What's happening here?

  1. Import necessary libraries: google-genai for the API, os for environment variables, sys for exiting cleanly, and pathlib for file paths.
  2. Configure API Key: Reads the GOOGLE_API_KEY from your environment. Crucial!
  3. Agent Class: A simple container for our logic.
    • __init__: Sets up the model name and defines tool functions.
    • start_interaction: The main loop. It gets user input, sends it to self.client.models.generate_content(), and prints the response.text.

🧪 Quick Test #1: Basic Conversation

Make sure your GOOGLE_API_KEY is set in your environment. Run the script:

python main.py

You should see:

✨ Initializing Agent with model: gemini-2.0-flash
🤖 Agent initialized with model: gemini-2.0-flash

💬 Chat with Gemini (type 'quit' or 'exit' to end)
------------------------------
🔵 You: Hi Gemini! How are you today?
🧠 Thinking...
🟠 Gemini: I'm doing great! Ready to help with anything you need. How can I assist you today?
🔵 You: quit
👋 Goodbye!

If this works, your basic connection to the Gemini API is successful! This is the foundation of every AI chat application. Note that each self.client.models.generate_content() call is stateless: the model only sees the single message you send it. We'll come back to conversation history at the end.

Step 2: Understanding and Implementing Tools

An "agent" becomes truly powerful when the LLM can interact with the world outside its text window. This is done via tools (often called "function calling" in API terms).

The Concept:

  1. Define Tools: You tell the LLM about available tools: their names, what they do, and what input parameters they expect.
  2. LLM Request: When the LLM thinks a tool can help answer the user's query, instead of just generating text, it generates a special message asking to call that tool with specific arguments.
  3. Execute Tool: Your code detects this request, extracts the tool name and arguments, runs the actual tool code (e.g., reading a file), and gets the result.
  4. Send Result: You send the tool's result back to the LLM.
  5. LLM Response: The LLM uses the tool's result to formulate its final text response to the user.
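The SDK we use below automates steps 2–4 for you (so-called automatic function calling), but it's worth seeing the round trip spelled out once. Here is a hand-rolled sketch using a toy get_time tool; this is illustrative only, and the tutorial's agent relies on the automated path instead:

```python
# Manual tool-call round trip; the SDK normally automates steps 2-4.
import datetime
from google import genai
from google.genai import types

def get_time() -> str:
    """Returns the current local time as an ISO-8601 string."""
    return datetime.datetime.now().isoformat()

client = genai.Client()  # reads GOOGLE_API_KEY from the environment
config = types.GenerateContentConfig(
    tools=[get_time],
    # Disable automation so the raw function_call part is visible:
    automatic_function_calling=types.AutomaticFunctionCallingConfig(disable=True),
)

user_msg = types.Content(role="user", parts=[types.Part(text="What time is it?")])
response = client.models.generate_content(
    model="gemini-2.0-flash", contents=[user_msg], config=config)

part = response.candidates[0].content.parts[0]
if part.function_call:  # Step 2: the model asks to call the tool
    result = get_time(**(part.function_call.args or {}))  # Step 3: we run it
    tool_reply = types.Content(  # Step 4: send the result back
        role="user",
        parts=[types.Part.from_function_response(
            name=part.function_call.name, response={"result": result})])
    final = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=[user_msg, response.candidates[0].content, tool_reply],
        config=config)
    print(final.text)  # Step 5: the model's final text answer
```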

Gemini API Tool Structure (New SDK):

With the new SDK, you simply define Python functions as your tools and pass them in the request config. The SDK builds the function declarations for you from each function's signature, type hints, and docstring; no hand-written FunctionDeclaration or Tool objects required.

# Example tool function
def read_file(path: str) -> str:
    """Reads the content of a file at the given relative path."""
    print(f"πŸ› οΈ Executing read_file tool with path: {path}")
    try:
        file_path = Path(path).resolve()
        if not file_path.is_relative_to(Path.cwd()):
            return f"Error: Path '{path}' is outside the project directory."
        if not file_path.is_file():
            return f"Error: Path '{path}' is not a file or does not exist."
        return file_path.read_text(encoding='utf-8')
    except Exception as e:
        return f"Error reading file '{path}': {e}"

# Register your tool(s) with the chat config
tool_config = types.GenerateContentConfig(
    tools=[read_file],  # Add more tool functions here as needed
)

Now, when Gemini wants to use a tool, it will call your Python function directly!
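For example, once read_file is registered, a single call like this should be enough for the SDK to let Gemini request the tool, execute your function, feed the result back, and return the final answer (a sketch; the printed reply is paraphrased):

```python
# Automatic function calling: the SDK runs read_file behind the scenes
# and sends its result back to the model before returning.
response = client.models.generate_content(
    model=MODEL_NAME,
    contents="What's in requirements.txt?",
    config=tool_config,
)
print(response.text)  # e.g. "requirements.txt contains one entry: google-genai"
```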

Step 3: Implementing the read_file Tool (New SDK)

Let's give our agent the ability to read files. With the new SDK, just define a Python function and register it as a tool.

1. Define the Tool Function:

Add this Python function before the Agent class definition in main.py:

# main.py
# ... (imports and tool function definition) ...

def read_file(path: str) -> str:
    """
    Reads the content of a file at the given relative path.

    Args:
        path: The relative path to the file.

    Returns:
        The content of the file as a string, or an error message.
    """
    print(f"πŸ› οΈ Executing read_file tool with path: {path}")
    if not self._is_safe_path(path):
        raise SecurityError(f"Access denied: Path '{path}' is outside the allowed directory.")
    try:
        file_path = Path(path).resolve() # Resolve to absolute path for security/clarity
        # Basic security check: ensure file is within the project directory
        if not file_path.is_relative_to(Path.cwd()):
             raise SecurityError(f"Access denied: Path '{path}' is outside the project directory.")

        if not file_path.is_file():
            return f"Error: Path '{path}' is not a file or does not exist."
        content = file_path.read_text(encoding='utf-8')
        print(f"βœ… read_file successful for: {path}")
        return content
    except SecurityError as e:
        print(f"🚨 Security Error in read_file: {e}")
        return f"Error: {e}"
    except Exception as e:
        print(f"πŸ›‘ Error in read_file for '{path}': {e}")
        return f"Error reading file '{path}': {e}"

# ... (Agent class and Main Execution) ...
  • We define a standard Python function read_file that takes a path.
  • It uses pathlib for robust path handling.
  • Important: It includes a basic security check (is_relative_to(Path.cwd())) to prevent the agent from reading files outside the project directory. You might want more sophisticated checks in a real application.
  • It returns the file content or an error message.

2. Integrate Tools into the Agent:

Now, register your tool with the chat session:

# main.py
# ... (imports and tool function definition) ...

from google import genai
from google.genai import types

client = genai.Client(api_key=GOOGLE_API_KEY)
MODEL_NAME = "gemini-2.0-flash"  # Or another Gemini model

tool_config = types.GenerateContentConfig(
    tools=[read_file],  # Add more tool functions here as needed
)

Step 4: Adding More Tools: list_files

Let's give the agent the ability to see what files are in a directory, similar to the ls command.

1. Define the Tool Function:

Add this before the Agent class:

# main.py
# ... (imports, read_file function) ...

def list_files(directory: str = '.') -> str:
    """
    Lists files and directories within a given relative path.

    Args:
        directory: The relative path of the directory to list. Defaults to the current directory.

    Returns:
        A string representing a list of files and directories,
        with directories indicated by a trailing slash, or an error message.
    """
    print(f"πŸ› οΈ Executing list_files tool with path: {directory}")
    if not self._is_safe_path(directory):
        raise SecurityError(f"Access denied: Path '{directory}' is outside the allowed directory.")
    try:
         # Security check
        if not Path(directory).is_relative_to(Path.cwd()):
            raise SecurityError(f"Access denied: Path '{directory}' is outside the project directory.")

        if not Path(directory).is_dir():
            return f"Error: Path '{directory}' is not a directory or does not exist."

        items = []
        for item in Path(directory).iterdir():
            # Construct relative path from CWD for consistent view
            relative_item_path = item.relative_to(Path.cwd())
            if item.is_dir():
                items.append(f"{relative_item_path}/")
            else:
                items.append(str(relative_item_path))

        print(f"βœ… list_files successful for: {directory}")
        return "\n".join(items) # Return as a string

    except SecurityError as e:
        print(f"🚨 Security Error in list_files: {e}")
        return f"Error: {e}"
    except Exception as e:
        print(f"🛑 Error in list_files for '{directory}': {e}")
        return f"Error listing files in '{directory}': {e}"

# ... (Agent class and Main Execution) ...
  • Uses pathlib again.
  • Includes the same security check.
  • Defaults to the current directory (.) if no path is given.
  • Returns a string listing file/directory names, one per line. Directories get a trailing /, which gives the model an unambiguous signal of which entries it can descend into.

2. Add to the Agent:

Modify the Agent.__init__ method:

# main.py
# ... (imports, tool functions, tool declarations) ...

class Agent:
    def __init__(self, model_name: str):
        print(f"✨ Initializing Agent with model: {model_name}")
        self.model_name = model_name
        self.client = client
        # Register both tool functions
        self.tool_functions = [self.read_file, self.list_files]
        print(f"🤖 Agent initialized with model: {self.model_name}")
        # The rest of the class remains the same...

    # ... (start_interaction method) ...

# ... (Main Execution) ...

🧪 Quick Test #2: Using list_files

  1. Create a dummy file in your project directory:

    echo "This is the content of my test file." > my_test_file.txt
  2. Make sure your GOOGLE_API_KEY is set and run the agent:

    python main.py
  3. Interact with the agent:

    ✨ Initializing Agent with model: gemini-2.0-flash
    🤖 Agent initialized with model: gemini-2.0-flash
    
    💬 Chat with Gemini (type 'quit' or 'exit' to end)
    ------------------------------
    🔵 You: What files are in the current directory?
    🧠 Thinking...
    🛠️ Executing list_files tool with path: .   # <-- Tool called!
    ✅ list_files successful for: .
    🟠 Gemini: The current directory contains the following items:
    my_test_file.txt
    🔵 You: Thanks! Now tell me what's in main.py
    🧠 Thinking...
    🛠️ Executing read_file tool with path: main.py # <-- Step 1: Read the relevant file
    ✅ read_file successful for: main.py
    🟠 Gemini: Okay, I've read the `main.py` file. It sets up a Gemini-based chat agent with the capability to read local files (`read_file` tool) and list directory contents (`list_files` tool). It uses the `google-genai` library and handles user interaction in a loop.
    
    🔵 You: quit
    👋 Goodbye!
    

Fantastic! The agent used list_files to inspect the directory, then used read_file to examine main.py, just like a human would. It's combining tools to fulfill a more complex request.

Step 5: The Grand Finale: The edit_file Tool

This is where it gets really interesting. Let's allow the agent to modify files.

⚠️ Warning: Giving an AI write access to your file system is inherently risky. Overwriting files is less dangerous than arbitrary code execution, but still, use with caution, especially outside this controlled example. The security checks are minimal.

1. Define the Tool Function:

This implementation uses the simplest possible strategy: the model sends the complete new file content, and we overwrite the file with it. It's not as robust as diff/patch or AST manipulation, but it works well for many common edits LLMs suggest.

Add this before the Agent class:

# main.py
# ... (imports, other tool functions/declarations) ...

def edit_file(path: str, content: str) -> str:
    """
    Writes content to a file at the given relative path, overwriting it.
    Creates the file (and directories) if it doesn't exist.

    Args:
        path: The relative path of the file to edit (e.g., 'src/code.js', 'config.txt').
        content: The content to write to the file.

    Returns:
        "OK" on success, or an error message.
    """
    print(f"πŸ› οΈ Executing edit_file: {path}")
    if not self._is_safe_path(path):
        raise SecurityError(f"Access denied: Path '{path}' is outside the allowed directory.")
    try:
        file_path = Path(path).resolve()
        # Optional: Add check if it's a directory?
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.write_text(content, encoding='utf-8')
        print(f"βœ… edit_file successful for: {path}")
        return "OK"
    except SecurityError as e:
        print(f"🚨 Security Error in edit_file: {e}")
        return f"Error: {e}"
    except Exception as e:
        print(f"πŸ›‘ Error in edit_file for '{path}': {e}")
        return f"Error editing file '{path}': {e}"

# ... (Agent class and Main Execution) ...
  • Takes path and content.
  • Crucially: If the file doesn't exist, it creates the file (and directories) and writes the content.
  • If the file exists, it overwrites the content.
  • Creates parent directories if needed (file_path.parent.mkdir).

2. Add to the Agent:

Modify Agent.__init__:

# main.py
# ... (imports, tool functions, tool declarations) ...

class Agent:
    def __init__(self, model_name: str):
        print(f"✨ Initializing Agent with model: {model_name}")
        # ... (model_name, client setup) ...
        self.tool_functions = [self.read_file, self.list_files, self.edit_file]
        print(f"🤖 Agent initialized with model: {self.model_name}")
        # ... (rest of __init__) ...

    # ... (start_interaction method) ...

# ... (Main Execution) ...

🧪 Quick Test #3: Creating and Editing Files

Let's put it through its paces. Run python main.py.

Scenario 1: Create a FizzBuzz script

🔵 You: create a python script named fizzbuzz.py that prints fizzbuzz from 1 to 20 and then runs itself. make sure it exits when done.
🧠 Thinking...
🛠️ Executing edit_file: fizzbuzz.py   # <-- Tool called with new file!
✅ edit_file successful for: fizzbuzz.py
🟠 Gemini: OK. I have created the `fizzbuzz.py` script. It defines a `fizzbuzz` function and prints the sequence from 1 to 20 when executed.

You can run it from your terminal using: `python fizzbuzz.py`

Check: A fizzbuzz.py file should now exist in your directory! Run it:

python fizzbuzz.py

You should see the FizzBuzz output up to 20. It worked!

Scenario 2: Edit the FizzBuzz script

🔵 You: please edit fizzbuzz.py so it only prints up to 10 instead of 20
🧠 Thinking...
🛠️ Executing read_file tool with path: fizzbuzz.py # <-- Reads first to understand
✅ read_file successful for: fizzbuzz.py
🛠️ Executing edit_file: fizzbuzz.py # <-- Edits the number
✅ edit_file successful for: fizzbuzz.py
🛠️ Executing edit_file: fizzbuzz.py # <-- Edits the print statement
✅ edit_file successful for: fizzbuzz.py
🛠️ Executing edit_file: fizzbuzz.py # <-- Even edits the docstring!
✅ edit_file successful for: fizzbuzz.py
🟠 Gemini: OK, I've edited the `fizzbuzz.py` script. It will now print the FizzBuzz sequence only up to 10 instead of 20. I also updated the relevant comment and print statement to reflect this change.

Check: Run python fizzbuzz.py again. It should now only print up to 10. The agent read the file, identified the necessary changes (including in comments!), and used the edit_file tool multiple times.

Scenario 3: A slightly more complex request (ROT13 Decode)

🔵 You: Create a python script congrats.py that rot13 decodes the string 'Pbatenghyngvbaf ba ohvyqvat n clguba ntrag!' and prints the result.
🧠 Thinking...
🛠️ Executing edit_file: congrats.py   # <-- Tool called with new file!
✅ edit_file successful for: congrats.py
🟠 Gemini: OK, I have created the `congrats.py` script. It uses the `codecs` module to perform ROT13 decoding on the provided string and print the result.

You can run it using: `python congrats.py`

Check: Run python congrats.py. It should print:

Congratulations on building a python agent!

It does!

Conclusion: The Emperor Has No Clothes (Just an LLM, a Loop, and Tools)

If you followed along, you now have a functioning, file-reading, file-listing, code-editing agent built with Python and the Gemini API. And how much code was it really? Excluding comments and blank lines, likely under 200 lines.

This is the fundamental loop. The "magic" is largely the incredible capability of modern LLMs like Gemini to understand requests, plan steps, and correctly utilize the tools you provide. They are trained for this!

Of course, building a production-ready agent involves more:

  • Robust Error Handling: What if a tool fails? What if the LLM gets stuck in a loop?
  • Better Editing: Using diff/patch or Abstract Syntax Trees (ASTs) for more reliable code changes.
  • Context Management: Providing more relevant context (e.g., open files in an IDE, project structure).
  • Planning & Reasoning: More sophisticated prompts or multi-step agent frameworks (like ReAct or LangChain/LlamaIndex agents) for complex tasks.
  • Sandboxing: Safely executing code generated by the LLM.
  • User Interface: Integrating into editors or chat interfaces.

But the core mechanism? You just built it. It's an LLM, a loop, and tools. The rest is engineering.

These models are powerful. Go experiment! See how far you can push this simple agent. Ask it to refactor code, add features, write tests. You might be surprised by how capable it already is. The landscape of software development is changing, and you now have a foundational understanding of how these agents work under the hood.

Congratulations on building a Python agent!

Bonus Section: Adding Bash Command Execution

Want to give your agent the ability to run shell commands? You can add another tool, but be extremely careful about security. Only allow specific, safe commands.

Here's how we added a tool to execute a restricted set of bash commands:

1. The Tool Function (execute_bash_command)

We need the subprocess module. The function checks a command against a hardcoded whitelist before executing it using subprocess.run in the project's root directory.

# main.py
import subprocess

# Root directory for all tool operations (used by cwd= below)
project_root = Path.cwd()

def execute_bash_command(command: str) -> str:
    """Executes a whitelisted bash command in the project's root directory.

    Allowed commands (including arguments):
    - ls ...
    - cat ...
    - git add ...
    - git status ...
    - git commit ...
    - git push ...

    Args:
        command: The full bash command string to execute.

    Returns:
        The standard output and standard error of the command, or an error message.
    """
    print(f"\n\u2692\ufe0f Tool: Executing bash command: {command}")

    # Define the whitelist of allowed command *prefixes*
    whitelist = ["ls", "cat", "git add", "git status", "git commit", "git push"]

    is_whitelisted = False
    for prefix in whitelist:
        if command.strip().startswith(prefix):
            is_whitelisted = True
            break

    if not is_whitelisted:
        return f"Error: Command '{command}' is not allowed. Only specific commands (ls, cat, git add/status/commit/push) are permitted."

    try:
        # Execute in project root, capture output, don't check exit code directly
        result = subprocess.run(
            command, 
            shell=True, 
            capture_output=True, 
            text=True, 
            cwd=project_root, # defined near the top of main.py
            check=False 
        )
        # --- Print stdout directly to console for visibility ---
        if result.stdout:
            print(f"\n\u25b6\ufe0f Command Output (stdout):\n{result.stdout.strip()}")
        # --- Format output for LLM --- 
        output = f"--- stdout ---\n{result.stdout}\n--- stderr ---\n{result.stderr}"
        if result.returncode != 0:
            output += f"\n--- Command exited with code: {result.returncode} ---"
        return output.strip()

    except Exception as e:
        return f"Error executing command '{command}': {e}"

2. Register the Tool

Add the new function to the self.tool_functions list in Agent.__init__.

# In Agent.__init__
self.tool_functions = [self.read_file, self.list_files, self.edit_file, execute_bash_command]

3. Example Interaction

Now you can ask the agent to run allowed commands:

🔵 You: show the git status using bash

⏳ Sending message and processing...

⚒️ Tool: Executing bash command: git status

▶️ Command Output (stdout):
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   implementation_plan.md

no changes added to commit (use "git add" and/or "git commit -a")


🟢 Agent: --- stdout ---
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   implementation_plan.md

no changes added to commit (use "git add" and/or "git commit -a")
--- stderr ---

This adds significant power, but also risk. Note that naive prefix matching is easy to bypass: `ls; rm -rf .` starts with the whitelisted `ls`, and `cat` can read any file the process can. Expand the whitelist cautiously!
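One way to harden the check (a sketch, not battle-tested): reject shell metacharacters outright and compare whole tokens instead of raw string prefixes:

```python
import shlex

def is_command_allowed(command: str) -> bool:
    """Stricter whitelist: no shell metacharacters, token-level matching."""
    if any(ch in command for ch in ";&|`$<>"):
        return False  # block chaining, pipes, substitution, redirection
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes etc.
    if not tokens:
        return False
    if tokens[0] in ("ls", "cat"):
        return True
    return (tokens[0] == "git"
            and len(tokens) > 1
            and tokens[1] in ("add", "status", "commit", "push"))
```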

Advanced: Sandboxed Execution with Docker

For running potentially complex or less trusted commands (like test suites or code generation checks), simple whitelisting isn't enough. Sandboxing provides better isolation.

Prerequisites:

  • Docker Desktop (or daemon): Must be installed and running on your system.
  • docker Python Library: Add docker = "^7.0.0" (or latest) to your pyproject.toml and run poetry install.

1. Checking Docker Status

First, a helper function to see if Docker is ready:

# main.py
import docker
from docker.errors import DockerException

def _check_docker_running() -> tuple[bool, docker.DockerClient | None, str]:
    """Checks if the Docker daemon is running and returns client or error."""
    try:
        client = docker.from_env()
        client.ping() # Verify connection
        return True, client, "Docker daemon is running."
    except DockerException as e:
        error_msg = (
            f"Docker connection failed: {e}\n"
            "Please ensure Docker Desktop (or docker daemon) is running."
        )
        return False, None, error_msg
    except Exception as e:
        # Catch other potential issues like docker library not installed
        return False, None, f"Error checking Docker status: {e}"

2. The Tool Function (run_in_sandbox)

This function takes a command and an optional image name. It checks Docker status, then attempts to run the command inside a container with specific security settings (no network, resource limits).

# main.py
def run_in_sandbox(command: str, image: str = "python:3.11-slim") -> str:
    """Executes a command inside a sandboxed Docker container.

    Uses a specified image (defaulting to python:3.11-slim).
    The project directory is mounted at /app.
    Network access is disabled for security.
    Resource limits (CPU, memory) are applied.

    Args:
        command: The command string to execute inside the container's /app directory.
        image: The Docker image to use (e.g., 'python:3.11', 'node:18').

    Returns:
        The combined stdout/stderr from the container, or an error message.
    """
    print(f"\n\u2692\ufe0f Tool: Running in sandbox (Image: {image}): {command}")

    is_running, client, message = _check_docker_running()
    if not is_running:
        return f"Error: Cannot run sandbox. {message}"

    try:
        print(f"\n\u23f3 Starting Docker container (image: {image})...")
        # project_root is defined near the top of main.py
        container_output = client.containers.run(
            image=image,
            command=f"sh -c '{command}'", 
            working_dir="/app",
            volumes={str(project_root): {'bind': '/app', 'mode': 'rw'}},
            remove=True,        
            network_mode='none',
            mem_limit='512m',   
            nano_cpus=1_000_000_000,  # limit to ~1 CPU
            detach=False,       
            stdout=True,        
            stderr=True         
        )
        output_str = container_output.decode('utf-8').strip()
        print(f"\n\u25b6\ufe0f Sandbox Output:\n{output_str}")
        return f"--- Container Output ---\n{output_str}"

    except DockerException as e:
        error_msg = f"Docker error during sandbox execution: {e}"
        print(f"\n\u274c {error_msg}")
        if "not found" in str(e).lower() or "no such image" in str(e).lower():
             error_msg += f"\nPlease ensure the image '{image}' exists locally or can be pulled."
        return f"Error: {error_msg}"
    except Exception as e:
        error_msg = f"Unexpected error during sandbox execution: {e}"
        print(f"\n\u274c {error_msg}")
        return f"Error: {error_msg}"

3. Register the Tool

Add run_in_sandbox to the self.tool_functions list in Agent.__init__.

# In Agent.__init__
self.tool_functions = [
    self.read_file, self.list_files, self.edit_file,
    execute_bash_command, run_in_sandbox
]

4. Example Interaction

Assuming you have pytest installed within your project environment (and thus available in the mounted /app directory within the container):

🔵 You: run pytest using the sandbox

⏳ Sending message and processing...

⚒️ Tool: Running in sandbox (Image: python:3.11-slim): pytest

⏳ Starting Docker container (image: python:3.11-slim)...

▶️ Sandbox Output:
============================= test session starts ==============================
platform linux -- Python 3.11.7, pytest-7.4.3, pluggy-1.3.0
rootdir: /app
plugins: anyio-4.2.0
collected 1 item

tests/test_example.py .                                                [100%]

============================== 1 passed in 0.01s ===============================


🟢 Agent: --- Container Output ---
============================= test session starts ==============================
platform linux -- Python 3.11.7, pytest-7.4.3, pluggy-1.3.0
rootdir: /app
plugins: anyio-4.2.0
collected 1 item

tests/test_example.py .                                                [100%]

============================== 1 passed in 0.01s ===============================

Security Considerations:

  • This provides significantly better isolation than the simple bash tool.
  • Disabling the network (network_mode='none') is a major security enhancement.
  • Resource limits (mem_limit, cpus) prevent denial-of-service.
  • Risk: The project directory is mounted read-write ('mode': 'rw'). A malicious command could still modify or delete files within your project directory inside the container. For higher security, explore mounting read-only ('mode': 'ro') where possible, or mounting only specific subdirectories needed by the command.
  • This assumes the Docker daemon itself is secure.

## ✨ Next Steps: Persistent History & Token Counting

Right now, our agent is stateless. Each time you send a message, it only sees *that message*. It has no memory of the previous turns. While the `google-genai` SDK offers a chat object (`client.chats.create()`) that handles history automatically for conversation context, the basic `client.models.generate_content` method we used here doesn't.

Let's add a feature we implemented in our main project: displaying the token count of the conversation history in the prompt. This requires us to *manually* track the history *just for counting purposes*.

1.  **Add History Tracking to `__init__`:**

    Modify the `Agent.__init__` method to include a list to store the conversation and a variable for the current count:

    ```python
    # Inside Agent class
    def __init__(self, model_name: str):
        # ... (existing code) ...
        self.tool_functions = [self.read_file, self.list_files, self.edit_file]
        self.conversation_history = [] # Manual history for token counting ONLY
        self.current_token_count = 0 # Store token count for the next prompt
        print(f"πŸ€– Agent initialized with model: {self.model_name}")
    ```

2.  **Modify `start_interaction`:**

    Update the loop to manage the history, count tokens, and display the count:

    ```python
    # Inside Agent class
    def start_interaction(self):
        """Starts the main interaction loop."""
        print("\nπŸ’¬ Chat with Gemini (type 'quit' or 'exit' to end)")
        print("-" * 30)

        while True:
            try:
                # Display token count from *previous* turn
                prompt_text = f"\nπŸ”΅ You ({self.current_token_count}): "
                user_input = input(prompt_text).strip()

                if user_input.lower() in ["quit", "exit"]:
                    print("πŸ‘‹ Goodbye!")
                    break
                if not user_input:
                    continue

                # Add user message to manual history BEFORE sending
                new_user_content = types.Content(parts=[types.Part(text=user_input)], role="user")
                self.conversation_history.append(new_user_content)

                print("🧠 Thinking...")
                tool_config = types.GenerateContentConfig(
                    tools=self.tool_functions
                )

                # IMPORTANT: Still sending only the LATEST input to generate_content
                response = self.client.models.generate_content(
                    model=self.model_name,
                    contents=user_input, # This remains unchanged!
                    config=tool_config,
                )

                # Add agent response to manual history AFTER getting it
                agent_response_content = None
                if response.candidates and response.candidates[0].content:
                    agent_response_content = response.candidates[0].content
                    self.conversation_history.append(agent_response_content)

                # Print response
                if response.text:
                    print(f"🟠 \x1b[93mGemini:\x1b[0m {response.text}")
                else:
                    print("🟠 \x1b[93mGemini:\x1b[0m (No text response received)")

                # Calculate and store token count for the *next* prompt
                try:
                    token_count_response = self.client.models.count_tokens(
                        model=self.model_name,
                        contents=self.conversation_history # Use the updated manual history
                    )
                    self.current_token_count = token_count_response.total_tokens
                except Exception as count_error:
                    # Don't block interaction if counting fails
                    print(f"\n⚠️ Could not update token count: {count_error}")

            except KeyboardInterrupt:
                print("\nπŸ‘‹ Exiting...")
                break
            except Exception as e:
                print(f"❌ An error occurred: {e}")
    ```

With these changes, your tutorial agent will now show the running token count based on the manually tracked history, making it behave more like the final version we built!

Remember, this manual history (`self.conversation_history`) is *only* used for the `count_tokens` call. The actual `generate_content` call still only sends the single, latest `user_input`. For true conversational context within the LLM itself using this SDK, you'd typically use `client.chats.create()` and `chat.send_message()`.
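For reference, a stateful version of the interaction loop using the chat API could look like this minimal sketch (error handling and the token counter omitted):

```python
# Stateful alternative: the SDK keeps the conversation history for us.
chat = self.client.chats.create(
    model=self.model_name,
    config=types.GenerateContentConfig(tools=self.tool_functions),
)
while True:
    user_input = input("🔵 You: ").strip()
    if user_input.lower() in ("quit", "exit"):
        break
    response = chat.send_message(user_input)  # prior turns included automatically
    print(f"🟠 Gemini: {response.text}")
```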