Recipe: Authenticated Browser Scrape

This recipe drives a logged-in browser session and reads data that only appears behind a sign-in wall. A cookie-jar holds the session credentials outside any single browser, and a chromeless browser bonds to that jar, restores the session, and navigates the authenticated pages.

The problem it solves

The data you want is behind a login, and a bare HTTP fetch only ever sees the logged-out page. You also do not want to paste a password into a script, or lose your session every time the browser is rebuilt. This recipe keeps the auth state durable and separate, so a fresh browser can pick up exactly where the last one left off.

Elements

Element     Role
cookie-jar  Durable store for browser cookies, bonded to one browser at a time.
chromeless  Headless browser session that navigates, reads the DOM, and captures content.

The legacy user-browser element modelled a paired human browser, but it is deprecated. New pairings use chromeless (or a browser with backend=user-extension), so reach for chromeless for new work.

Flow

  1. Create a cookie-jar. Seed it with your session cookies using the jar’s set operation, or import a whole cookie set at once. A jar follows Chrome’s cookie model and survives browser rebuilds.
  2. Create a chromeless browser.
  3. Bond the jar to the browser with the cookie-jar’s attach operation (it sets attached_browser_id and enforces the single-browser invariant). A jar will only hand cookies to the browser it has bonded to.
  4. Hydrate the session: from the browser, call load-cookies-from-jar to seed the live session from the jar. (The reverse, save-cookies-to-jar, copies a freshly logged-in session into the jar so future runs skip the login.)
  5. Navigate with the browser’s goto, then read the page with snapshot (accessibility tree), dom, or source. Interact where you need to with click, fill, type, and select.
  6. Pull per-origin client state with the browser’s storage operation, which reads localStorage and sessionStorage for a given origin.
  7. Capture evidence with screenshot or capture as you go.
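The ordering of the steps above can be sketched in Python. This is only an illustration of the call sequence, not the platform's client: RecordingClient, create, call, run_recipe, the parameter names (name, value, domain, browser_id, url, origin), and the example.com values are all hypothetical stand-ins. Only the element and operation names (set, attach, load-cookies-from-jar, goto, snapshot, storage, screenshot) come from this recipe.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class RecordingClient:
    """Hypothetical client: hands out element ids and records every operation call."""

    log: list[tuple[str, str]] = field(default_factory=list)
    _next: int = 0

    def create(self, kind: str) -> str:
        # Assumed creation call; returns a made-up id such as "cookie-jar-1".
        self._next += 1
        return f"{kind}-{self._next}"

    def call(self, element_id: str, operation: str, **params: Any) -> dict[str, Any]:
        # Record (element, operation) and echo the request back as a fake result.
        self.log.append((element_id, operation))
        return {"operation": operation, "params": params}


def run_recipe(client: RecordingClient, url: str, origin: str) -> dict[str, Any]:
    """Issue the recipe's operations in the documented order."""
    jar = client.create("cookie-jar")                        # step 1: create the jar
    client.call(jar, "set", name="session", value="PLACEHOLDER", domain="example.com")
    browser = client.create("chromeless")                    # step 2: create the browser
    client.call(jar, "attach", browser_id=browser)           # step 3: bond jar to browser
    client.call(browser, "load-cookies-from-jar")            # step 4: hydrate the session
    client.call(browser, "goto", url=url)                    # step 5: navigate, then read
    page = client.call(browser, "snapshot")
    state = client.call(browser, "storage", origin=origin)   # step 6: per-origin state
    client.call(browser, "screenshot")                       # step 7: capture evidence
    return {"page": page, "storage": state}


client = RecordingClient()
result = run_recipe(client, "https://example.com/account", "https://example.com")
operations = [op for _, op in client.log]
print(operations)
```

The point of the ordering is that attach comes before load-cookies-from-jar, and hydration comes before goto, so the first navigation already carries the session.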

What this shows

Auth state is a durable element you own, not a fragile in-memory cookie that dies with the process. The attach bond is the authorization edge: a jar’s load and save operations work only for the one browser it is bonded to, so credentials are not exposed to every browser in the circle. And because chromeless exposes the same CDP surface whether you script it directly or hand it to an agent, the same logged-in session can be read by a workflow today and driven by an agent tomorrow.
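To make the authorization edge concrete, here is a minimal in-memory model in Python. It is a sketch of the idea only and does not reflect how the platform implements the jar: the CookieJar class, load_for, detach, and the exception types are hypothetical, and whether a real jar must be detached before re-bonding is an assumption. Only attached_browser_id and the single-browser invariant come from this recipe.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CookieJar:
    """Toy jar: cookies live here, outside any browser, with at most one bond."""

    cookies: dict[str, str] = field(default_factory=dict)
    attached_browser_id: Optional[str] = None

    def attach(self, browser_id: str) -> None:
        # Single-browser invariant: refuse a second bond while one is active.
        if self.attached_browser_id not in (None, browser_id):
            raise RuntimeError("jar is already bonded to another browser")
        self.attached_browser_id = browser_id

    def detach(self) -> None:
        # Hypothetical operation, assumed here so a rebuilt browser can re-bond.
        self.attached_browser_id = None

    def load_for(self, browser_id: str) -> dict[str, str]:
        # The bond is the authorization check: only the bonded browser gets cookies.
        if browser_id != self.attached_browser_id:
            raise PermissionError("browser is not bonded to this jar")
        return dict(self.cookies)


jar = CookieJar(cookies={"session": "PLACEHOLDER"})
jar.attach("browser-a")
trusted_cookies = jar.load_for("browser-a")

try:
    jar.load_for("browser-b")          # another browser in the same circle
    outsider_refused = False
except PermissionError:
    outsider_refused = True

try:
    jar.attach("browser-b")            # second bond while browser-a is bonded
    second_bond_refused = False
except RuntimeError:
    second_bond_refused = True

# Browser rebuild: the old bond is released, and the jar still holds the session.
jar.detach()
jar.attach("browser-b")
rebuilt_cookies = jar.load_for("browser-b")
```

In this model the cookies never move unless the requesting browser id matches the bond, and they outlive any one browser because they are stored on the jar.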