The AI Browser Automation Framework
If you're looking for other languages, you can find them here
Stagehand is a browser automation framework used to control web browsers with natural language and code. By combining the power of AI with the precision of code, Stagehand makes web automation flexible, maintainable, and actually reliable.
Most existing browser automation tools either require you to write low-level code in a framework like Selenium, Playwright, or Puppeteer, or rely on high-level agents that can be unpredictable in production. Stagehand lets developers choose what to write in code and what to express in natural language, and bridges the gap between the two, which makes it well suited to browser automations in production.
- Choose when to write code vs. natural language: use AI when you want to navigate unfamiliar pages, and use code when you know exactly what you want to do.
- Go from AI-driven to repeatable workflows: Stagehand lets you preview AI actions before running them, and helps you cache repeatable actions to save time and tokens.
- Write once, run forever: Stagehand's auto-caching combined with self-healing remembers previous actions, runs without LLM inference, and involves AI again only when the website changes and your automation breaks.
A Rust client library for Stagehand, the AI-powered browser automation framework. This SDK provides an async-first, type-safe interface for controlling Browserbase browsers and performing AI-driven web interactions.
Caution
This is an ALPHA release and is not production-ready. Please provide feedback and let us know if you have feature requests / bug reports!
- Browserbase Cloud Support: Drive Browserbase cloud browser sessions (local coming soon)
- AI-Driven Actions: Use natural language instructions to interact with web pages
- Structured Data Extraction: Extract typed data from pages using Serde schemas
- Element Observation: Identify and analyze interactive elements on pages
- Agent Execution: Run multi-step AI agents with the execute method
- Streaming Responses: Real-time progress updates via Server-Sent Events (SSE)
- CDP Access: Get the CDP WebSocket URL to connect external tools like chromiumoxide
Add to your Cargo.toml:
[dependencies]
stagehand_sdk = "0.3"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
futures = "0.3"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
dotenvy = "0.15"
The SDK supports both tokio and async-std runtimes. Tokio is enabled by default.
Using tokio (default):
[dependencies]
stagehand_sdk = "0.3"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
Using async-std:
[dependencies]
stagehand_sdk = { version = "0.3", default-features = false, features = ["async-std-runtime"] }
async-std = { version = "1", features = ["attributes"] }
use stagehand_sdk::{Stagehand, V3Options, Env, Model, TransportChoice};
use stagehand_sdk::{ActResponseEvent, ExtractResponseEvent};
use futures::StreamExt;
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
#[derive(Serialize, Deserialize, Debug)]
struct Quote {
    text: String,
    author: String,
}
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    // Load environment variables from .env file
    dotenvy::dotenv().ok();
    // Environment variables required:
    // - BROWSERBASE_API_KEY
    // - BROWSERBASE_PROJECT_ID
    // - MODEL_API_KEY (LLM provider API key)
    // 1. Connect to Stagehand cloud API (uses STAGEHAND_BASE_URL env var or default)
    let mut stagehand = Stagehand::connect(TransportChoice::default_rest()).await?;
    // 2. Start session
    let opts = V3Options {
        env: Some(Env::Browserbase),
        model: Some(Model::String("openai/gpt-5-nano".into())),
        verbose: Some(2),
        ..Default::default()
    };
    stagehand.start(opts).await?;
    println!("Session ID: {:?}", stagehand.session_id());
    // 3. Navigate to a page
    let mut act_stream = stagehand.act(
        "Go to https://quotes.toscrape.com/",
        None,
        HashMap::new(),
        Some(60_000),
        None,
    ).await?;
    while let Some(res) = act_stream.next().await {
        if let Ok(response) = res {
            if let Some(ActResponseEvent::Success(success)) = response.event {
                println!("Navigation success: {}", success);
            }
        }
    }
    // 4. Extract structured data
    let schema = serde_json::json!({
        "type": "object",
        "properties": {
            "text": { "type": "string" },
            "author": { "type": "string" }
        }
    });
    let mut extract_stream = stagehand.extract(
        "Extract the first quote on the page",
        &schema, // extract() takes the schema by reference
        None,
        Some(60_000),
        None,
        None,
    ).await?;
    while let Some(res) = extract_stream.next().await {
        if let Ok(response) = res {
            if let Some(ExtractResponseEvent::DataJson(json)) = response.event {
                println!("Quote: {}", json);
            }
        }
    }
    // 5. End session
    stagehand.end().await?;
    Ok(())
}
Create a .env file in your project root:
# Browserbase API credentials (required)
BROWSERBASE_API_KEY=your_browserbase_api_key_here
BROWSERBASE_PROJECT_ID=your_browserbase_project_id_here
# Model API key
MODEL_API_KEY=your_api_key # OpenAI, Anthropic, Gemini, etc. key
# Optional: Custom API URLs
STAGEHAND_BASE_URL=https://api.stagehand.browserbase.com/v1 # Stagehand API (default)
BROWSERBASE_API_URL=https://api.browserbase.com/v1 # Browserbase API (default)
The SDK checks for model API keys in the order listed above and uses the first one found.
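The "env var or default" behavior described for STAGEHAND_BASE_URL can be sketched with the standard library alone. This is a minimal illustration, not the SDK's code: the helper names resolve_base_url and base_url_from_env are hypothetical, and treating an empty value as unset is an assumption.

```rust
use std::env;

/// Default Stagehand API URL, as listed in the .env example above.
const DEFAULT_STAGEHAND_BASE_URL: &str = "https://api.stagehand.browserbase.com/v1";

/// Hypothetical helper (not part of the SDK): use the configured URL if it is
/// present and non-empty, otherwise fall back to the default.
/// Assumption: a blank value is treated the same as an unset one.
fn resolve_base_url(configured: Option<&str>) -> String {
    match configured.map(str::trim) {
        Some(url) if !url.is_empty() => url.to_string(),
        _ => DEFAULT_STAGEHAND_BASE_URL.to_string(),
    }
}

/// Reads STAGEHAND_BASE_URL from the process environment and applies the fallback.
fn base_url_from_env() -> String {
    resolve_base_url(env::var("STAGEHAND_BASE_URL").ok().as_deref())
}
```

Keeping the fallback logic in a function that takes an Option makes it testable without touching the process environment; base_url_from_env is then a thin wrapper.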
The main configuration struct for initializing Stagehand:
pub struct V3Options {
    // Environment: Local or Browserbase
    pub env: Option<Env>,
    // Browserbase credentials (auto-loaded from env vars)
    pub api_key: Option<String>,
    pub project_id: Option<String>,
    pub browserbase_session_id: Option<String>,
    pub browserbase_session_create_params: Option<serde_json::Value>,
    // Local browser options (coming soon)
    // pub local_browser_launch_options: Option<LocalBrowserLaunchOptions>,
    // AI model configuration
    pub model: Option<Model>,
    pub system_prompt: Option<String>,
    // Behavior settings
    pub self_heal: Option<bool>,
    pub wait_for_captcha_solves: Option<bool>,
    pub experimental: Option<bool>,
    pub dom_settle_timeout_ms: Option<u32>,
    pub act_timeout_ms: Option<u32>,
    // Logging verbosity (0, 1, or 2)
    pub verbose: Option<i32>,
}
Specify AI models in two ways:
// Simple string format (recommended)
let model = Model::String("openai/gpt-5-nano".into());
// Detailed configuration with custom API key/base URL
let model = Model::Config {
    model_name: "gpt-5-nano".to_string(),
    api_key: Some("sk-...".to_string()),
    base_url: Some("https://api.openai.com/v1".to_string()),
};
Establishes a connection to the Stagehand service.
pub async fn connect(
    transport_choice: TransportChoice,
) -> Result<Self, StagehandError>
Parameters:
- transport_choice - TransportChoice::Rest(base_url) for the REST API with an explicit URL, or TransportChoice::default_rest() to use the STAGEHAND_BASE_URL env var (falls back to the default)
Example:
// Using default (recommended) - checks STAGEHAND_BASE_URL env var, falls back to default
let stagehand = Stagehand::connect(TransportChoice::default_rest()).await?;
// Or with explicit URL
let stagehand = Stagehand::connect(
    TransportChoice::Rest("https://api.stagehand.browserbase.com/v1".to_string()),
).await?;
Starts a browser session.
pub async fn start(&mut self, opts: V3Options) -> Result<(), StagehandError>
Example:
let opts = V3Options {
    env: Some(Env::Browserbase),
    model: Some(Model::String("openai/gpt-5-nano".into())),
    verbose: Some(1),
    ..Default::default()
};
stagehand.start(opts).await?;
println!("Session: {}", stagehand.session_id().unwrap());
Performs browser actions based on natural language instructions.
pub async fn act(
    &mut self,
    instruction: impl Into<String>,
    model: Option<Model>,
    variables: HashMap<String, String>,
    timeout: Option<u32>,
    frame_id: Option<String>,
) -> Result<Pin<Box<dyn Stream<Item = Result<ActResponse, StagehandError>> + Send>>, StagehandError>
Parameters:
- instruction - Natural language instruction (e.g., "Click the login button")
- model - Override the default AI model
- variables - Variable substitution map for the instruction
- timeout - Operation timeout in milliseconds
- frame_id - Target a specific iframe
Response Events:
- ActResponseEvent::Log(LogLine) - Progress logs
- ActResponseEvent::Success(bool) - Action completion status
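The events above arrive as Result items, so a consumer has to decide what to do with an Err. The examples in this README skip errors with if let Ok(...). The sketch below shows one alternative that stops at the first error and reports the final success flag. It uses a plain iterator and local stand-in types (ActEvent, String errors) so that it runs without the SDK. With the real stream, the same match would sit inside a while let Some(res) = stream.next().await loop.

```rust
/// Local stand-in for illustration only; the SDK's own types are
/// ActResponseEvent and StagehandError.
#[derive(Debug)]
enum ActEvent {
    Log(String),
    Success(bool),
}

/// Walks the events in order, printing logs, and returns the last reported
/// success flag. Returns the first error instead of skipping it.
fn final_success<I>(events: I) -> Result<Option<bool>, String>
where
    I: IntoIterator<Item = Result<ActEvent, String>>,
{
    let mut outcome = None;
    for item in events {
        match item? {
            ActEvent::Log(line) => println!("log: {line}"),
            ActEvent::Success(ok) => outcome = Some(ok),
        }
    }
    Ok(outcome)
}
```

Returning Option<bool> keeps "no Success event was seen" distinct from "the action reported failure", which a bare bool would hide.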
Example:
let mut stream = stagehand.act(
    "Navigate to https://example.com and click 'More information...'",
    None,
    HashMap::new(),
    Some(60_000),
    None,
).await?;
while let Some(res) = stream.next().await {
    if let Ok(response) = res {
        if let Some(ActResponseEvent::Success(success)) = response.event {
            println!("Action succeeded: {}", success);
        }
    }
}
Extracts structured data from web pages using a schema.
pub async fn extract<S: Serialize>(
    &mut self,
    instruction: impl Into<String>,
    schema: &S,
    model: Option<Model>,
    timeout: Option<u32>,
    selector: Option<String>,
    frame_id: Option<String>,
) -> Result<Pin<Box<dyn Stream<Item = Result<ExtractResponse, StagehandError>> + Send>>, StagehandError>
Parameters:
- instruction - What data to extract
- schema - A Serde-serializable struct defining the expected shape
- model - Override the default AI model
- timeout - Operation timeout
- selector - CSS selector to narrow extraction scope
- frame_id - Target a specific iframe
Response Events:
- ExtractResponseEvent::Log(LogLine) - Progress logs
- ExtractResponseEvent::DataJson(String) - JSON string matching the schema
Example:
#[derive(Serialize, Deserialize, Debug)]
struct ProductInfo {
    name: String,
    price: String,
    description: String,
}
let schema = ProductInfo {
    name: String::new(),
    price: String::new(),
    description: String::new(),
};
let mut stream = stagehand.extract(
    "Extract the product information from this page",
    &schema,
    None,
    Some(30_000),
    None,
    None,
).await?;
while let Some(res) = stream.next().await {
    if let Ok(response) = res {
        if let Some(ExtractResponseEvent::DataJson(json)) = response.event {
            let product: ProductInfo = serde_json::from_str(&json)?;
            println!("Product: {:?}", product);
        }
    }
}
Identifies interactive elements on a page.
pub async fn observe(
    &mut self,
    instruction: Option<String>,
    model: Option<Model>,
    timeout: Option<u32>,
    selector: Option<String>,
    frame_id: Option<String>,
) -> Result<Pin<Box<dyn Stream<Item = Result<ObserveResponse, StagehandError>> + Send>>, StagehandError>
Parameters:
- instruction - Optional AI instruction for analysis
- model - Override the default AI model
- timeout - Operation timeout
- selector - CSS selector to narrow observation scope
- frame_id - Target a specific iframe
Response Events:
- ObserveResponseEvent::Log(LogLine) - Progress logs
- ObserveResponseEvent::ElementsJson(String) - JSON array of observed elements
Example:
let mut stream = stagehand.observe(
    Some("Find all clickable buttons".to_string()),
    None,
    Some(30_000),
    None,
    None,
).await?;
while let Some(res) = stream.next().await {
    if let Ok(response) = res {
        if let Some(ObserveResponseEvent::ElementsJson(json)) = response.event {
            println!("Elements: {}", json);
        }
    }
}
Executes an AI agent with multi-step capabilities.
pub async fn execute(
    &mut self,
    agent_config: AgentConfig,
    execute_options: AgentExecuteOptions,
    frame_id: Option<String>,
) -> Result<Pin<Box<dyn Stream<Item = Result<ExecuteResponse, StagehandError>> + Send>>, StagehandError>
Parameters:
- agent_config - Agent configuration (provider, model, system prompt, CUA mode)
- execute_options - Execution options (instruction, max steps, highlight cursor)
- frame_id - Target a specific iframe
Response Events:
- ExecuteResponseEvent::Log(LogLine) - Execution progress
- ExecuteResponseEvent::ResultJson(String) - Final result
Example:
use stagehand_sdk::{AgentConfig, AgentExecuteOptions, ModelConfiguration};
let agent_config = AgentConfig {
    provider: None,
    model: Some(ModelConfiguration::String("openai/gpt-5-nano".into())),
    system_prompt: None,
    cua: None,
};
let execute_options = AgentExecuteOptions {
    instruction: "What is the URL of the current page?".to_string(),
    max_steps: Some(10),
    highlight_cursor: None,
};
let mut stream = stagehand.execute(
    agent_config,
    execute_options,
    None,
).await?;
while let Some(res) = stream.next().await {
    if let Ok(response) = res {
        if let Some(ExecuteResponseEvent::ResultJson(result)) = response.event {
            println!("Result: {}", result);
        }
    }
}
Ends the browser session.
pub async fn end(&mut self) -> Result<(), StagehandError>
Example:
stagehand.end().await?;
Returns the CDP WebSocket URL for connecting external tools like chromiumoxide.
pub fn browserbase_cdp_url(&self) -> Option<String>
The URL format is: wss://connect.browserbase.com?sessionId={sessionId}&apiKey={apiKey}
Example:
// After start(), get the CDP URL to connect chromiumoxide
let cdp_url = stagehand.browserbase_cdp_url()
    .expect("CDP URL available after start");
// Connect chromiumoxide to the remote browser
let (browser, handler) = Browser::connect(&cdp_url).await?;
See tests/chromiumoxide_integration.rs for a complete example.
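The documented URL format embeds the API key in the query string, so it is worth keeping the raw URL out of logs. The sketch below builds a URL in that format and redacts the key before printing. Both helpers (cdp_url, redact_api_key) are hypothetical and not part of the SDK, and the builder assumes the session ID and key are already URL-safe.

```rust
/// Hypothetical helper: builds a CDP WebSocket URL in the documented format.
/// Assumption: both values are URL-safe, so no percent-encoding is applied.
fn cdp_url(session_id: &str, api_key: &str) -> String {
    format!("wss://connect.browserbase.com?sessionId={session_id}&apiKey={api_key}")
}

/// Replaces the apiKey value so the URL can be logged without leaking the key.
fn redact_api_key(url: &str) -> String {
    match url.split_once("apiKey=") {
        Some((prefix, rest)) => {
            // Keep any query parameters that follow the key.
            let tail = rest.find('&').map_or("", |i| &rest[i..]);
            format!("{prefix}apiKey=***{tail}")
        }
        // No apiKey parameter: nothing to redact.
        None => url.to_string(),
    }
}
```

In practice the URL would come from browserbase_cdp_url(), and only the redacted form would be passed to println! or a logger.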
See tests/browserbase_live.rs for a complete working example that demonstrates act, extract, and execute.
See tests/chromiumoxide_integration.rs for connecting chromiumoxide to a Browserbase session:
use chromiumoxide::browser::Browser;
use futures::StreamExt; // provides .next() on the chromiumoxide handler
use stagehand_sdk::{Stagehand, V3Options, Env, Model, TransportChoice};
async fn example() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    // 1. Create Stagehand session
    let mut stagehand = Stagehand::connect(TransportChoice::default_rest()).await?;
    stagehand.start(V3Options {
        env: Some(Env::Browserbase),
        model: Some(Model::String("openai/gpt-5-nano".into())),
        ..Default::default()
    }).await?;
    // 2. Get CDP URL and connect chromiumoxide
    let cdp_url = stagehand.browserbase_cdp_url().unwrap();
    let (browser, mut handler) = Browser::connect(&cdp_url).await?;
    // Spawn handler
    tokio::spawn(async move {
        while let Some(event) = handler.next().await {
            if event.is_err() { break; }
        }
    });
    // 3. Use chromiumoxide for direct CDP control
    let page = browser.pages().await?.into_iter().next().unwrap();
    let screenshot = page.screenshot(Default::default()).await?;
    // 4. Or use Stagehand's AI methods
    let mut stream = stagehand.act("Click the login button", None, Default::default(), None, None).await?;
    // ...
    stagehand.end().await?;
    Ok(())
}
The SDK uses StagehandError for all error cases:
pub enum StagehandError {
    Transport(String),     // Network/connection errors
    Api(String),           // API response errors
    MissingApiKey(String), // Missing required environment variable
}
All errors implement std::error::Error and Display.
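Because the variants separate network failures from API and configuration failures, callers can branch on them. The sketch below uses a local mirror of the enum so that it compiles on its own. The Display wording and the is_retryable policy are illustrative assumptions, not the SDK's behavior.

```rust
use std::error::Error;
use std::fmt;

/// Local mirror of the enum shown above, so the sketch compiles without the SDK.
#[derive(Debug)]
enum StagehandError {
    Transport(String),
    Api(String),
    MissingApiKey(String),
}

impl fmt::Display for StagehandError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Message wording is illustrative, not the SDK's.
        match self {
            Self::Transport(msg) => write!(f, "transport error: {msg}"),
            Self::Api(msg) => write!(f, "API error: {msg}"),
            Self::MissingApiKey(name) => write!(f, "missing API key: {name}"),
        }
    }
}

impl Error for StagehandError {}

/// Hypothetical policy: only transport failures are worth retrying;
/// API and configuration errors need a code or environment fix.
fn is_retryable(err: &StagehandError) -> bool {
    matches!(err, StagehandError::Transport(_))
}
```

Since the type implements std::error::Error, it also converts into Box<dyn std::error::Error + Send + Sync> through the ? operator, which is how the examples in this README propagate it from main.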
# Set up environment variables
cp .env.example .env
# Edit .env with your credentials
# Run all tests
cargo test
# Run specific integration test with output
cargo test test_browserbase_live_extract -- --nocapture
# Run chromiumoxide integration test
cargo test test_chromiumoxide_browserbase_connection -- --nocapture
License: Apache-2.0