This recipe demonstrates how to build a secure code execution assistant that combines:
- Llama-3.1-8B-Instruct for code generation
- E2B Code Interpreter for secure code execution in sandboxed environments
- OpenAI's function calling format for structured outputs
- Rich for beautiful terminal interfaces
- MAX Serve for efficient model serving
The assistant provides:
- Secure code execution in isolated sandboxes
- Interactive Python REPL with natural language interface
- Beautiful output formatting with syntax highlighting
- Clear explanations of code and results
Please make sure your system meets our system requirements.
To proceed, ensure you have the `magic` CLI installed, version 0.7.2 or newer (check with `magic --version`):

```bash
curl -ssL https://magic.modular.com/ | bash
```

or update it via:

```bash
magic self-update
```

Then install `max-pipelines`:

```bash
magic global install -u max-pipelines
```
This recipe requires a GPU with CUDA 12.5 support. Recommended GPUs:
- NVIDIA H100 / H200, A100, A40, L40
- E2B API Key (required for sandbox access):
  - Sign up at e2b.dev
  - Get your API key from the dashboard
  - Add it to your `.env` file: `E2B_API_KEY=your_key_here`
- Hugging Face Token (optional, for faster model downloads):
  - Get a token from Hugging Face
  - Add it to your `.env` file: `HF_TOKEN=your_token_here`
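With both keys in place, your `.env` file should contain lines like:

```bash
E2B_API_KEY=your_key_here
HF_TOKEN=your_token_here
```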
- Download the code using the `magic` CLI:

  ```bash
  magic init code-execution-sandbox-agent-with-e2b --from modular/max-recipes/code-execution-sandbox-agent-with-e2b
  cd code-execution-sandbox-agent-with-e2b
  ```

- Copy the environment template:

  ```bash
  cp .env.example .env
  ```

- Add your API keys to `.env`.

- Test the sandbox:

  ```bash
  magic run hello
  ```

  This command runs a simple test to verify your E2B sandbox setup. You'll see a "hello world" output and a list of available files in the sandbox environment, confirming that code execution is working properly.
- Start the LLM server:

  Make sure port `8010` is available; you can adjust the port settings in `pyproject.toml`.

  ```bash
  magic run server
  ```

  This launches the Llama model with MAX Serve, enabling structured output parsing for reliable code generation. The server runs locally on port `8010` and uses the `--enable-structured-output` flag for OpenAI-compatible function calling.
- Run the interactive agent:

  ```bash
  magic run agent
  ```

  This starts the interactive Python assistant. You can now type natural language queries like:

  - "calculate factorial of 5"
  - "count how many r's are in strawberry"
  - "generate fibonacci sequence up to 10 numbers"
The demo below shows the agent in action, demonstrating:
- Natural language code generation
- Secure execution in the E2B sandbox
- Beautiful output formatting with syntax highlighting
- Clear explanations of the code and results
The system follows a streamlined flow for code generation and execution:
```mermaid
graph TB
    subgraph User Interface
        CLI[Rich CLI Interface]
    end
    subgraph Backend
        LLM[Llama Model]
        Parser[Structured Output Parser]
        Sandbox[E2B Sandbox]
        Executor[Code Executor]
    end
    CLI --> LLM
    LLM --> Parser
    Parser --> Executor
    Executor --> Sandbox
    Sandbox --> CLI
```
Here's how the components work together:
- Rich CLI Interface:
  - Provides a beautiful terminal interface
  - Handles user input in natural language
  - Displays code, results, and explanations in formatted panels
- Llama Model:
  - Processes natural language queries
  - Generates Python code using structured output format
  - Runs locally via MAX Serve with function calling enabled
- Structured Output Parser:
  - Validates LLM responses using Pydantic models
  - Ensures code blocks are properly formatted
  - Handles error cases gracefully
- Code Executor:
  - Prepares code for execution
  - Manages the execution flow
  - Captures output and error states
- E2B Sandbox:
  - Provides a secure, isolated execution environment
  - Handles file system operations
  - Manages resource limits and timeouts
The flow ensures secure and reliable code execution while providing a seamless user experience with clear feedback at each step.
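Put together, the core loop is short. A minimal sketch, where `generate_code` and `explain` are hypothetical names wrapping the LLM calls shown later in this recipe (only `execute_python` appears under that name in the actual code):

```python
# Hypothetical glue code: generate_code and explain are illustrative names,
# not functions from agent.py; execute_python is shown later in this recipe.
def handle_query(query: str) -> None:
    code_blocks = generate_code(query)          # LLM call with structured output
    result = execute_python(code_blocks)        # secure run in the E2B sandbox
    explanation = explain(code_blocks, result)  # free-form follow-up completion
    console.print(Panel(explanation, border_style="cyan"))
```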
Hello world example (`hello.py`)

The `hello.py` script demonstrates basic E2B sandbox functionality:
```python
from e2b_code_interpreter import Sandbox
from dotenv import load_dotenv

load_dotenv()

sbx = Sandbox()  # Creates a sandbox environment
execution = sbx.run_code("print('hello world')")  # Executes Python code

# Access execution results
for line in execution.logs.stdout:
    print(line.strip())

# List sandbox files
files = sbx.files.list("/")
print(files)
```
Key features:
- Sandbox initialization with automatic cleanup
- Code execution in isolated environment
- Access to execution logs and outputs
- File system interaction capabilities
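That last capability goes beyond listing files. A small sketch of writing a file into the sandbox and reading it back, assuming the `files.write` and `files.read` methods of the E2B Python SDK (check the E2B docs if the API has changed):

```python
from e2b_code_interpreter import Sandbox

with Sandbox() as sbx:
    # Write a file into the sandbox, then read it back
    sbx.files.write("/home/user/data.txt", "hello from the host")
    content = sbx.files.read("/home/user/data.txt")
    print(content)  # -> hello from the host
```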
Interactive agent (`agent.py`)
The agent implements a complete code execution assistant with these additional key features:
- Environment Configuration:

  ```python
  import os  # plus load_dotenv(), as in hello.py

  LLM_SERVER_URL = os.getenv("LLM_SERVER_URL", "http://localhost:8010/v1")
  LLM_API_KEY = os.getenv("LLM_API_KEY", "local")
  MODEL = os.getenv("MODEL", "modularai/Llama-3.1-8B-Instruct-GGUF")
  ```
- Tool Definition for Function Calling:

  ```python
  tools = [{
      "type": "function",
      "function": {
          "name": "execute_python",
          "description": "Execute python code blocks in sequence",
          "parameters": CodeExecution.model_json_schema()
      }
  }]
  ```
- Enhanced Code Execution with Rich Output:

  ```python
  from typing import List

  from e2b_code_interpreter import Sandbox
  from rich.console import Console
  from rich.panel import Panel
  from rich.syntax import Syntax

  console = Console()

  def execute_python(blocks: List[CodeBlock]) -> str:
      with Sandbox() as sandbox:
          full_code = "\n\n".join(block.code for block in blocks)

          # Step 1: Show the code to be executed
          console.print(Panel(
              Syntax(full_code, "python", theme="monokai"),
              title="[bold blue]Step 1: Code[/bold blue]",
              border_style="blue"
          ))

          execution = sandbox.run_code(full_code)
          output = execution.logs.stdout if execution.logs and execution.logs.stdout else execution.text
          output = ''.join(output) if isinstance(output, list) else output

          # Step 2: Show the execution result
          console.print(Panel(
              output or "No output",
              title="[bold green]Step 2: Result[/bold green]",
              border_style="green"
          ))

          return output
  ```
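  The function above only surfaces stdout. Failed runs can be reported as well; a sketch assuming the `execution.error` field the E2B SDK exposes (with `name`, `value`, and `traceback` attributes):

  ```python
  execution = sandbox.run_code(full_code)
  if execution.error:
      # Surface the sandbox-side exception instead of returning empty output
      console.print(Panel(
          f"{execution.error.name}: {execution.error.value}",
          title="[bold red]Execution Error[/bold red]",
          border_style="red"
      ))
  ```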
- Three-Step Output Process:
  - Code Display: Shows the code to be executed with syntax highlighting
  - Result Display: Shows the execution output in a green panel
  - Explanation: Provides a natural language explanation of the code and its result (a sketch of this panel follows)
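  The first two panels appear in `execute_python` above; the explanation step presumably mirrors them. A sketch:

  ```python
  # Step 3: Show the natural language explanation (mirrors the panels above)
  console.print(Panel(
      explanation,
      title="[bold cyan]Step 3: Explanation[/bold cyan]",
      border_style="cyan"
  ))
  ```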
- Interactive Session Management:

  ```python
  def main():
      console.print(Panel("Interactive Python Assistant (type 'exit' to quit)",
                          border_style="cyan"))
      while True:
          query = console.input("[bold yellow]Your query:[/bold yellow] ")
          if query.lower() in ['exit', 'quit']:
              console.print("[cyan]Goodbye![/cyan]")
              break
          # ... process query ...
  ```
- Explanation Generation:

  ```python
  explanation_messages = [
      {
          "role": "system",
          "content": "You are a helpful assistant. Explain what the code did and its result clearly and concisely."
      },
      {
          "role": "user",
          "content": f"Explain this code and its result:\n\nCode:\n{code}\n\nResult:\n{result}"
      }
  ]
  ```
The agent uses OpenAI's structured output format to ensure reliable code generation and execution. Here's how it works:
- Structured Data Models:

  ```python
  from pydantic import BaseModel
  from typing import List

  # Define the expected response structure
  class CodeBlock(BaseModel):
      type: str
      code: str

  class CodeExecution(BaseModel):
      code_blocks: List[CodeBlock]
  ```
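  The `model_json_schema()` call used in the tool definition below produces the JSON schema the LLM is constrained to; you can inspect it directly:

  ```python
  import json

  # Print the JSON schema that constrains the LLM's output
  print(json.dumps(CodeExecution.model_json_schema(), indent=2))
  ```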
- Tool Definition:

  ```python
  # Define the function calling schema
  tools = [{
      "type": "function",
      "function": {
          "name": "execute_python",
          "description": "Execute python code blocks in sequence",
          "parameters": CodeExecution.model_json_schema()
      }
  }]
  ```
- LLM Client Setup:

  ```python
  from openai import OpenAI

  # Configure the client with the local LLM server
  client = OpenAI(
      base_url=LLM_SERVER_URL,  # "http://localhost:8010/v1"
      api_key=LLM_API_KEY       # "local"
  )
  ```
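  A quick way to confirm the MAX Serve endpoint is reachable before sending queries:

  ```python
  # Sanity check: list the models the local endpoint serves
  for model in client.models.list():
      print(model.id)
  ```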
- Message Construction:

  ```python
  messages = [
      {
          "role": "system",
          "content": """You are a Python code execution assistant. Generate complete, executable code based on user queries.

  Important rules:
  1. Always include necessary imports at the top
  2. Always include print statements to show results
  3. Make sure the code is complete and can run independently
  4. Test all variables are defined before use
  """
      },
      {
          "role": "user",
          "content": query
      }
  ]
  ```
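  The recipe itself uses the structured `parse` helper shown next, but the `tools` schema defined earlier could equally be passed through the standard chat completions interface. A sketch of that alternative path:

  ```python
  import json

  # Alternative: classic function calling with the tools schema from above
  response = client.chat.completions.create(
      model=MODEL,
      messages=messages,
      tools=tools,
      tool_choice="auto"
  )
  tool_calls = response.choices[0].message.tool_calls
  if tool_calls:
      args = json.loads(tool_calls[0].function.arguments)
      code_blocks = CodeExecution.model_validate(args).code_blocks
  ```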
- Structured Response Parsing:

  ```python
  try:
      # Parse the response into structured format
      response = client.beta.chat.completions.parse(
          model=MODEL,
          messages=messages,
          response_format=CodeExecution
      )

      # Extract code blocks from the response
      code_blocks = response.choices[0].message.parsed.code_blocks

      # Execute the code
      result = execute_python(code_blocks)
  except Exception as e:
      console.print(Panel(f"Error: {str(e)}", border_style="red"))
  ```
- Example Response Structure:

  ```json
  {
      "code_blocks": [
          {
              "type": "python",
              "code": "def factorial(n):\n    if n == 0:\n        return 1\n    return n * factorial(n-1)\n\nresult = factorial(5)\nprint(f'Factorial of 5 is: {result}')"
          }
      ]
  }
  ```
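  A raw JSON payload like this round-trips cleanly through the Pydantic model:

  ```python
  # Validate the example payload against the CodeExecution schema
  raw = '{"code_blocks": [{"type": "python", "code": "print(1)"}]}'
  execution = CodeExecution.model_validate_json(raw)
  assert execution.code_blocks[0].type == "python"
  ```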
- Explanation Generation:

  ```python
  # Generate the explanation using a vanilla completion
  explanation_messages = [
      {
          "role": "system",
          "content": "You are a helpful assistant. Explain what the code did and its result clearly and concisely."
      },
      {
          "role": "user",
          "content": f"Explain this code and its result:\n\nCode:\n{code_blocks[0].code}\n\nResult:\n{result}"
      }
  ]

  final_response = client.chat.completions.create(
      model=MODEL,
      messages=explanation_messages
  )
  explanation = final_response.choices[0].message.content
  ```
Key benefits of this structured approach:
- Type Safety: Pydantic models ensure response validation
- Reliable Parsing: Structured format prevents parsing errors
- Consistent Output: Guaranteed code block structure
- Error Handling: Clear error messages for parsing failures
- Separation of Concerns:
  - Code generation with structured output
  - Code execution in sandbox
  - Explanation generation with free-form text
This structured approach ensures that:
- The LLM always generates valid, executable code
- The response can be reliably parsed and executed
- Error handling is consistent and informative
- The execution flow is predictable and maintainable
You can interact with the agent using natural language queries like:
- Test with the query "Hi" and watch the agent respond by generating `print("Hello")` and executing it
- Find fibonacci 100
- Sum of all twin prime numbers below 1000
- How many r's are in the word strawberry?
- System Prompt:
  - Ensures complete, executable code
  - Requires necessary imports
  - Mandates print statements for output
  - Enforces variable definition
- Code Execution Flow:
  - Code generation by LLM
  - Parsing into structured blocks
  - Secure execution in sandbox
  - Result capture and formatting
  - Explanation generation
- Error Handling:
  - Sandbox execution errors
  - JSON parsing errors
  - LLM response validation
- Model Selection:

  ```python
  MODEL = os.getenv("MODEL", "modularai/Llama-3.1-8B-Instruct-GGUF")
  ```

- Sandbox Configuration:

  ```python
  Sandbox(timeout=300)  # Configure the sandbox timeout (in seconds)
  ```

- Output Formatting:

  ```python
  # Customize Rich panels and syntax themes
  console.print(Panel(..., border_style="magenta"))
  ```
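  For deeper customization, Rich also supports named styles via a console-level theme. A sketch (the style name here is illustrative, not from the recipe):

  ```python
  from rich.console import Console
  from rich.panel import Panel
  from rich.theme import Theme

  # Reusable style names resolved through a console-wide theme
  console = Console(theme=Theme({
      "result.border": "bold green",
  }))
  console.print(Panel("done", border_style="result.border"))
  ```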
- Sandbox Issues
  - Error: "Failed to create sandbox"
  - Solution: Check your E2B API key and verify your network connection
- LLM Issues
  - Error: "Failed to parse response"
  - Solution: Check that the server is running and verify the structured output format
- Code Execution Issues
  - Error: "No output"
  - Solution: Check for print statements and verify the code is complete
- Enhance the System
  - Add file upload capabilities
  - Implement persistent sessions
  - Add support for more languages
  - Implement caching for responses
- Deploy to Production
  - Deploy MAX Serve on AWS, GCP, or Azure
  - Set up CI/CD for documentation generation
  - Add monitoring and observability
  - Implement rate limiting and authentication
- Join the Community
  - Explore the MAX documentation
  - Join our Modular Forum
  - Share your projects with #ModularAI on social media
We're excited to see what you'll build with this foundation!