Recipe: Authenticated Browser Scrape
This recipe drives a logged-in browser session and reads data that only appears behind a sign-in wall. A cookie-jar holds the session credentials outside any single browser, and a chromeless browser bonds to that jar, restores the session, and navigates the authenticated pages.
The problem it solves
The data you want is behind a login, and a bare HTTP fetch only ever sees the logged-out page. You also do not want to paste a password into a script, or lose your session every time the browser is rebuilt. This recipe keeps the auth state durable and separate, so a fresh browser can pick up exactly where the last one left off.
Elements
| Element | Role |
|---|---|
| `cookie-jar` | Durable store for browser cookies, bonded to one browser at a time. |
| `chromeless` | Headless browser session that navigates, reads the DOM, and captures content. |
The legacy `user-browser` element modelled a paired human browser, but it is deprecated; new pairings are `chromeless` (or a `browser` with `backend=user-extension`). Reach for `chromeless` for new work.
Flow
- Create a `cookie-jar`. Seed it with your session cookies using the jar’s `set` operation, or `import` a whole cookie set at once. A jar follows Chrome’s cookie model and survives browser rebuilds.
- Create a `chromeless` browser.
- Bond the jar to the browser with the cookie-jar’s `attach` operation (it sets `attached_browser_id` and enforces the single-browser invariant). A jar will only hand cookies to the browser it has bonded to.
- Hydrate the session: from the browser, call `load-cookies-from-jar` to seed the live session from the jar. (The reverse, `save-cookies-to-jar`, copies a freshly logged-in session into the jar so future runs skip the login.)
- Navigate with the browser’s `goto`, then read the page with `snapshot` (accessibility tree), `dom`, or `source`. Interact where you need to with `click`, `fill`, `type`, and `select`.
- Pull per-origin client state with the browser’s `storage` operation, which reads `localStorage` and `sessionStorage` for a given origin.
- Capture evidence with `screenshot` or `capture` as you go.
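The seed, bond, and hydrate steps can be pictured with a small in-memory model. This is only a sketch of the semantics described in the flow, not the platform's API: the `CookieJar` and `Browser` classes, the `import_cookies` and `require_bond` names, the `example.com` cookies, and the choice to raise an error when a second browser tries to attach are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CookieJar:
    """Hypothetical stand-in for a cookie-jar element."""
    cookies: dict = field(default_factory=dict)  # keyed by (domain, path, name)
    attached_browser_id: Optional[str] = None

    def set(self, domain: str, name: str, value: str, path: str = "/") -> None:
        self.cookies[(domain, path, name)] = value

    def import_cookies(self, records: list) -> None:
        # `import` is a Python keyword, so the bulk operation gets a longer name.
        for record in records:
            self.set(record["domain"], record["name"], record["value"],
                     record.get("path", "/"))

    def attach(self, browser_id: str) -> None:
        # Assumption: bonding while another browser holds the bond is an error.
        if self.attached_browser_id not in (None, browser_id):
            raise RuntimeError("jar is already bonded to another browser")
        self.attached_browser_id = browser_id

    def require_bond(self, browser_id: str) -> None:
        # The jar only cooperates with the browser it is bonded to.
        if self.attached_browser_id != browser_id:
            raise PermissionError(f"{browser_id} is not bonded to this jar")


@dataclass
class Browser:
    """Hypothetical stand-in for a chromeless browser session."""
    browser_id: str
    session: dict = field(default_factory=dict)

    def load_cookies_from_jar(self, jar: CookieJar) -> None:
        jar.require_bond(self.browser_id)
        self.session.update(jar.cookies)

    def save_cookies_to_jar(self, jar: CookieJar) -> None:
        jar.require_bond(self.browser_id)
        jar.cookies.update(self.session)


# Seed the jar one cookie at a time, then in bulk.
jar = CookieJar()
jar.set("example.com", "session", "abc123")
jar.import_cookies([{"domain": "example.com", "name": "csrf", "value": "t0k3n"}])

# Bond and hydrate.
browser = Browser("chromeless-1")
jar.attach(browser.browser_id)
browser.load_cookies_from_jar(jar)

# A browser that never bonded gets nothing.
stranger = Browser("chromeless-2")
try:
    stranger.load_cookies_from_jar(jar)
    refused = False
except PermissionError:
    refused = True
```

In this model the bond check lives on the jar side, so a browser cannot read or write cookies merely by holding a reference to the jar.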
What this shows
Auth state is a durable element you own, not a fragile in-memory cookie that dies with the process. The `attach` bond is the authorization edge: a jar’s load/save only works for the one browser it trusts, so credentials are not sprayed across every browser in the circle. And because `chromeless` exposes the same CDP surface whether you script it directly or hand it to an agent, the same logged-in session can be read by a workflow today and driven by an agent tomorrow.
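The durability point can be illustrated with a minimal sketch that writes cookie records to disk and reads them back into a fresh session. The source does not say how a jar is actually stored; the JSON file, the `save_jar` and `load_jar` helpers, and the sample cookie are assumptions chosen only to show auth state outliving the in-memory session that produced it.

```python
import json
import os
import tempfile


def save_jar(path: str, cookies: list) -> None:
    """Write cookie records to disk so they outlive the current session."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(cookies, fh)


def load_jar(path: str) -> list:
    """Read cookie records back from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


jar_path = os.path.join(tempfile.mkdtemp(), "jar.json")

# First run: a logged-in session is captured into the jar file.
first_session = [
    {"domain": "example.com", "path": "/", "name": "session", "value": "abc123"}
]
save_jar(jar_path, first_session)
del first_session  # the in-memory copy is gone, as it would be after a rebuild

# Later run: a fresh session starts empty and is hydrated from the jar file.
restored_session = load_jar(jar_path)
```

Because the records live outside the browser process, rebuilding the browser costs nothing in terms of login state; only the hydrate step has to run again.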