A practical, technically grounded field guide to Service Workers in production, covering common failure modes, caching traps, browser quirks, and update challenges that are easy to underestimate. Every significant claim is labelled as spec fact, version-confirmed browser behaviour, or field experience.
Not all claims in production engineering carry the same weight. Every significant claim in this guide is tagged with one of three levels:
Spec - directly traceable to the WHATWG Service Worker spec, W3C, or IETF. True by definition across compliant engines.
Browser - observed and version-confirmed in specific browsers. May differ in other engines or future releases.
Field - real production failure patterns. Widely reproduced but not a guaranteed rule; context matters.
If a claim has no tag it is either logical inference from the above, or framing/context. Claims phrased as "always/never" without a Spec tag should be read as strong field guidance, not universal law.
When I first learned service workers, it was easy to file them next to localStorage - as a smarter cache. In production, that mental model can create hard-to-debug issues. A more reliable model is this: a service worker is a persistent, origin-scoped reverse proxy written in JavaScript that runs in a separate thread and outlives any page.
Think of it the way you'd think about an nginx config that you deploy once and that independently handles every request to your domain - except it's running in the user's browser, you cannot SSH into it, it updates asynchronously, and it can keep running even after the user closes all your tabs.
The docs show install → activate → fetch as a neat waterfall. Reality is messier.
- A new SW can sit in the waiting state indefinitely if any controlled client (tab) is open. Spec - WHATWG SW spec §7.3. A user who opened your app days ago and never closed the tab is blocking the update for every session on that device.
- The install event fires in a ServiceWorkerGlobalScope - no DOM, no window, no document. Spec. Obvious in hindsight, but this still catches many of us at least once when we try to read window.location.
- Between install completing and activate firing, the old SW remains in control. Spec. Two workers exist simultaneously. Any shared state (IndexedDB, Cache API) is accessible by both.

// What "waiting" looks like from the inside
self.addEventListener('install', (event) => {
// The OLD SW is still handling ALL requests right now.
// This SW is installed but not yet active.
// Per spec (SW §7.3), it waits until every client
// controlled by the old SW has been navigated or closed.
// On mobile, where tabs are rarely closed, this can be days.
event.waitUntil(precache());
});
sw.js
Your SW runs in a ServiceWorkerGlobalScope - a separate event loop from both the page and from other workers. This has non-obvious consequences:
- event.waitUntil() keeps the SW alive until a promise settles - it does not provide a response. Spec. In a fetch handler, event.respondWith() supplies the response; waitUntil is for extending the worker's lifetime for background work (cache writes, analytics) that should outlive the response. Without it, the runtime may terminate the SW once it considers the event handled. The confusion: DevTools keeps the SW alive permanently during development, masking this in local testing.
- Your app has: the main thread, any web workers, AND the service worker. Each can independently modify shared storage (IndexedDB, Cache API). There is no shared mutex. There is no transaction isolation across threads. Two threads writing to the same cache key simultaneously produce a last-write-wins race that looks like random data corruption.
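A minimal sketch of that division of labour, assuming a runtime cache named 'runtime-v1' (the name is illustrative): respondWith supplies the response; waitUntil extends the worker's life just long enough to finish the background cache write.

self.addEventListener('fetch', (event) => {
  event.respondWith((async () => {
    const response = await fetch(event.request);
    const copy = response.clone(); // clone before the page consumes the body
    // Without waitUntil, the browser may terminate the SW before put() finishes.
    event.waitUntil(
      caches.open('runtime-v1').then((cache) => cache.put(event.request, copy))
    );
    return response;
  })());
});
sw.js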
SWs can feel random or flaky at first. It is not random - the SW's behaviour is a function of state you cannot observe: which version is installed, which tabs are open, how long ago the SW last woke, what the browser's eviction policy did to the cache overnight. When invisible variables change, behaviour changes. The fix is making this state observable - covered in section 7.
These are not hypotheticals. Each one is based on real production behavior seen in shipped apps.
Failure: users stuck on an old deploy. index.html is served from cache indefinitely. If the SW file itself is byte-identical between the broken and fixed deploys (e.g., no build hash injected), the browser sees no new SW to install and no update is triggered. Cause: caching index.html with cache-first. Fix: use network-first for HTML navigations. Auto-inject a build hash into the SW file via your build script so every deploy produces a byte-different SW, which triggers the update check. Set Cache-Control: no-store on HTML files at the CDN. Long CDN TTLs are safe for content-hashed assets (filename change = automatic cache bust). As a rule, avoid having two independent caching systems cache the same HTML document at the same time.

Failure: offline data vanishes overnight. The browser evicted the origin's storage under pressure. Fix: call navigator.storage.persist() to request persistent storage (Chrome/Firefox honour this; Safari may partially honour it depending on version and install state). Architect offline features assuming eviction is possible. On every app open, verify cache integrity and have a graceful rehydration path for cold-start after eviction.

Failure: white screen after a deploy. skipWaiting() is added to force immediate SW activation. The new SW activates and claims all clients. A page that was mid-session had been rendered by the old SW (old HTML, old chunk filenames). The new SW no longer has the old chunk filenames in its cache. React's lazy-loaded route chunk returns a 404. White screen. Full app crash. Cause: deleting old caches in activate. The skipWaiting() + clients.claim() combination forces the new SW to serve requests from a page whose HTML still references old assets. Field - widely reproduced; not a guaranteed consequence of every skipWaiting use. The risk scales with how aggressively old caches are deleted in activate and how long sessions run between deploys. Fix: in activate, keep the previous version's assets in cache until all clients have reloaded. Or: rather than force-claiming, prompt the user to refresh and only call skipWaiting() in response to a user action. It is safer not to delete old caches until you are confident no old-version HTML is still active in any client.

Failure: OAuth login broken, mobile only. The SW intercepts a navigation to /oauth/callback#access_token=…. It receives the URL as /oauth/callback - hash fragments are client-side only and never included in HTTP requests. Spec - RFC 3986 §3.5; WHATWG URL standard. It returns a cached stale index.html. Old JS runs. The old JS has a crash bug fixed in the new deploy. Desktop browser works because it has no SW. This failure is invisible in server-side monitoring - the auth provider logs a successful token issuance; the failure is purely frontend. Fix: bypass the SW for auth callback routes using a URL pattern exclusion in your fetch handler, as sketched below.
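A minimal sketch of that exclusion - the /oauth/ path prefix is illustrative; match whatever redirect URI your provider actually uses:

self.addEventListener('fetch', (event) => {
  const url = new URL(event.request.url);
  // Returning without respondWith() lets the browser go straight to the network.
  if (url.pathname.startsWith('/oauth/')) return;
  // ...normal caching strategies follow
});
sw.js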
The Workbox docs present five strategies like menu items. What they omit are the conditions under which each one will cause a production incident.
| Strategy | When It's Right | When It Burns You |
|---|---|---|
| Cache First | Immutable assets with content-hashed filenames - fonts, images, versioned JS bundles | HTML navigations, API responses, anything that changes without a filename change. Users can be permanently stuck on old UI. |
| Network First | API calls, HTML navigations, auth flows, anything that must be current | Slow connections: with no timeout, the user waits for the full network timeout before getting the cached fallback. Always pair with a 2–3s timeout (see the sketch below). |
| Stale While Revalidate | Avatars, blog posts, product listings - freshness matters but instant display matters more | Any resource mutated by the user's own actions. User submits a form; gets back stale data on next load. Catastrophic for e-commerce carts. |
| Cache Only | Truly offline-first apps with explicit, complete sync logic | Almost always wrong for SPAs unless you have a full offline data layer. Silently returns stale data forever. |
| Network Only | Analytics, payments, auth callbacks, real-time data | Not the "safe default" most treat it as - the SW still intercepts and can fail the request even in network-only mode. Ensure error handling is explicit. |
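A sketch of network-first with a timeout race to cache - the cache name 'runtime-v1' and the 3-second budget are illustrative:

async function networkFirstWithTimeout(request, timeoutMs = 3000) {
  const cache = await caches.open('runtime-v1');
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch(request, { signal: controller.signal });
    // Fire-and-forget; in a fetch handler, wrap this put in event.waitUntil.
    if (response.ok) cache.put(request, response.clone());
    return response;
  } catch {
    // Timed out or offline: fall back to the last cached copy if we have one.
    const cached = await cache.match(request);
    if (cached) return cached;
    throw new Error('Network failed and no cached fallback for ' + request.url);
  } finally {
    clearTimeout(timer);
  }
}
sw.js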
This is the most under-documented failure mode in PWA engineering. Here is the exact sequence:
1. index.html (v1) is served and cached. It references app.a1b2c3.js.
2. A new deploy ships: index.html now references app.d4e5f6.js. The old chunk is no longer on the server (some CDNs expire old hashes quickly).
3. The new SW installs and activates; in activate, old caches are deleted. app.a1b2c3.js is gone from both cache and network.
4. The cached v1 index.html is rendered. User navigates. React Router lazy-loads ProductPage.a1b2c3.js - the old name. Cache miss. Network 404. React throws. White screen.

// Defense: global chunk load error handler with reload-loop guard
window.addEventListener('unhandledrejection', (event) => {
const err = event.reason;
// Vite lazy chunk load failures have this signature
if (
err?.message?.includes('Failed to fetch dynamically imported module') ||
err?.message?.includes('Importing a module script failed')
) {
const lastReload = +localStorage.getItem('chunkErrorReload') || 0;
const now = Date.now();
// 30s cooldown prevents infinite reload loops on genuinely broken deploys
if (now - lastReload > 30_000) {
localStorage.setItem('chunkErrorReload', String(now));
window.location.reload();
}
}
});
main.jsx
The hardest part of SW caching is usually not choosing a strategy - it is defining an explicit contract for when a cached response is considered invalid. In my experience, if this is not written down early, it can lead to implicit, inconsistent rules across event handlers.
Before writing a single line of SW caching code, make that contract explicit for every resource type. Three common approaches to invalidation:
Approach 1: Build hash in cache name. Simple. Works. Pitfall: if you precache 20MB of assets, every deploy requires downloading all 20MB again even if only one file changed.
Approach 2: Per-file content hash in filename. Only cache-busts changed files. What Workbox precacheAndRoute with a manifest does by default. Pitfall: the manifest generation must be airtight. Any missed file means stale content served silently; any incorrectly included file is re-downloaded on every deploy.
Approach 3: Runtime caching with TTL. Resources cached on demand with expiry metadata stored alongside them. Most flexible. Pitfall: TTL logic lives inside the SW - bugs there have no easy recovery path. Also, TTL is evaluated on read, not on expiry, so you don't know a cache entry is stale until someone requests it.
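A minimal sketch of Approach 3, stamping each cached copy with a fetch timestamp in a custom header - the header name, cache name, and TTL are illustrative:

const TTL_MS = 60 * 60 * 1000; // 1 hour

async function matchWithTTL(request) {
  const cache = await caches.open('runtime-ttl-v1');
  const hit = await cache.match(request);
  const fetchedAt = hit ? Number(hit.headers.get('x-sw-fetched-at')) : 0;
  if (hit && Date.now() - fetchedAt < TTL_MS) return hit; // still fresh
  const fresh = await fetch(request);
  if (fresh.ok) {
    // Rebuild the response so we can attach the timestamp header.
    const headers = new Headers(fresh.headers);
    headers.set('x-sw-fetched-at', String(Date.now()));
    const body = await fresh.clone().arrayBuffer();
    await cache.put(request, new Response(body, { status: fresh.status, headers }));
  }
  return fresh;
}
sw.js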
Your SW intercepts every request. If cache key logic is incorrect, you can serve a cached JSON API response when the browser expects an HTML document (same URL, different Accept header). The page renders as raw JSON text. I have seen this happen more than once in real projects. Include request.destination in your cache key logic, or use separate named caches per resource type.
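One way to implement that separation - a sketch that picks a named cache per request.destination (the cache names are illustrative):

function cacheNameFor(request) {
  // request.destination: 'document' for navigations, 'script'/'style'/'image'
  // for subresources, '' for fetch()/XHR API calls.
  switch (request.destination) {
    case 'document': return 'html-v1';
    case 'script':
    case 'style':
    case 'image': return 'assets-v1';
    default: return 'api-v1';
  }
}
sw.js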
The SW update mechanism is one of the most important things to understand, and also one of the easiest to misunderstand. In real projects, update confusion can leave broken experiences in production for days.
A common misunderstanding is that browsers check for a new SW "at most once per 24 hours." This collapses two separate mechanisms into one and produces a misleading picture. Here is what actually happens:
- The browser attempts an update check on navigations to in-scope pages and whenever you call registration.update() explicitly. This is not throttled to once per 24 hours. If a user navigates 10 times in a session, the browser attempts to fetch the SW script up to 10 times.
- The 24-hour part is narrower: the browser bypasses its HTTP cache for the SW script (ignoring Cache-Control: max-age) after 24 hours have elapsed since the last successful update. This is about bypassing the HTTP cache, not about how often the browser checks for an update.
- Historically, an HTTP-cached sw.js could delay updates. Modern Chrome now issues update checks with cache: no-cache by default, so it usually revalidates regardless. You should still serve sw.js with Cache-Control: no-cache (or max-age=0) so correctness does not depend on browser-specific heuristics.

A user who has had a tab open for 23 hours and navigates within your app will trigger an update check on that navigation. In modern Chrome, this already uses revalidation semantics for sw.js; other engines may vary more by version. Always serve sw.js with Cache-Control: no-cache (or max-age=0) so update behaviour stays predictable across browsers.
// Force an update check when tab regains focus
// Catches long-lived tabs and sessions resumed after hours
document.addEventListener('visibilitychange', () => {
if (document.visibilityState === 'visible') {
navigator.serviceWorker.ready.then((reg) => {
reg.update(); // triggers an explicit update check for sw.js
});
}
});
main.jsx
The docs make skipWaiting() sound like a convenience method. It is a takeover operation. Here is the precise sequence:
1. Calling self.skipWaiting() from a waiting SW immediately transitions it to activating, bypassing the normal wait for all clients to close.
2. In-flight requests the old SW has already answered via respondWith() continue to completion normally - they are not dropped.
3. Requests caught mid-handover that have not yet reached respondWith() may be dropped - but this is a narrow window and the principal risk is elsewhere.

When to use it:
- The new SW is fully backwards-compatible with any page the old SW could have served.
- You have a reload-after-claim flow (see below).
- You are willing to accept rare mid-session disruption.

When not to use it:
- The new SW deletes old asset versions from cache during activate.
- The user may be mid-transaction (payment, form).
- You have not implemented the reload-after-claim flow.
// sw.js - controlled update flow
self.addEventListener('install', (e) => {
e.waitUntil(precache()); // cache new assets
// Do NOT call skipWaiting() here unconditionally.
// Wait for an explicit user-confirmed signal.
});
self.addEventListener('activate', (e) => {
e.waitUntil(
// Clean up old versioned caches - but NOT if you plan to use skipWaiting,
// because old-SW-served pages may still need the old assets.
// Only delete caches once you know all clients have reloaded.
cleanOldCaches()
);
});
self.addEventListener('message', (e) => {
if (e.data === 'SKIP_WAITING') {
self.skipWaiting(); // only called after user confirms "Reload"
}
});
// main.jsx - full update handshake
// Assumes module scope (top-level await); otherwise wrap in an async init()
const registration = await navigator.serviceWorker.register('/sw.js');
let waitingSW = null;
// Step 3: page reloads AFTER new SW has taken control
navigator.serviceWorker.addEventListener('controllerchange', () => {
window.location.reload();
});
registration.addEventListener('updatefound', () => {
const newSW = registration.installing;
newSW.addEventListener('statechange', () => {
if (newSW.state === 'installed' && navigator.serviceWorker.controller) {
waitingSW = newSW;
showUpdateBanner(); // "New version available - Refresh"
}
});
});
// User clicks "Refresh" in the banner
function applyUpdate() {
waitingSW?.postMessage('SKIP_WAITING');
// controllerchange fires → window.location.reload() above
}
sw.js + main.jsx
The browser does not coordinate SW updates across tabs. Tab 1 triggers an update check. New SW installs, waits. User closes Tab 2 (the one keeping it in waiting). New SW activates. Tabs 1 and 3 are now controlled by the new SW. Tab 4, opened from a cached navigation, may still be on the old SW for that session.
Use the BroadcastChannel API or clients.matchAll() from the SW to coordinate and notify all tabs simultaneously. When this is skipped, it is common to see the same user end up with split state across tabs.
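A sketch of the SW-side half: on activation, tell every open window about the new version so all tabs can converge (SW_VERSION is the build-time constant used elsewhere in this guide):

self.addEventListener('activate', (event) => {
  event.waitUntil((async () => {
    const tabs = await self.clients.matchAll({ type: 'window', includeUncontrolled: true });
    // Each tab listens for 'message' on navigator.serviceWorker and shows a reload prompt.
    tabs.forEach((tab) => tab.postMessage({ type: 'SW_ACTIVATED', version: SW_VERSION }));
  })());
});
sw.js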
Browsers do not publish contractual storage quotas. The numbers below are derived from documentation, storage API responses, and observed production behaviour. Treat them as approximations - especially Safari, which explicitly uses adaptive quota allocation based on total disk space.
| Browser (version tested) | Quota | Eviction Policy | persist() Support |
|---|---|---|---|
| Chrome / Edge 124+ | Origin can use ~60% of total disk size (dynamic, device-dependent; origin-scoped) | LRU by origin when device storage falls below threshold | Honoured after user gesture; grants persistent status |
| Firefox 126+ | Roughly up to ~50% of free disk, with group caps (for example around 10 GiB per eTLD+1); dynamic and implementation-dependent | LRU; persistent origins exempt from eviction | Shows permission prompt; strongly honoured |
| Safari iOS 17.x | Starts permissive; Apple has not published a hard cap. ~50 MB is a common observed floor for new origins. | Entire origin storage wiped on inactivity or critical storage pressure. Exact threshold undocumented by Apple; has changed between iOS 15, 16, 17. Field observation - not Apple-documented | Partially honoured for installed PWAs on iOS 16.4+; no user-visible prompt; no guarantee |
| Safari macOS 17.x | More generous than iOS in practice; closer to desktop browser behaviour (approximate) | Less aggressive than iOS; follows desktop storage pressure norms | Honoured but no UI indicator |
This is a structural constraint of the platform on iOS. The inactivity threshold and the storage-pressure trigger have changed between Safari versions and are not formally documented by Apple. The behaviour is real and reproducible, but the exact trigger conditions should be treated as variable, not fixed. Design your architecture to handle eviction as a normal event, not an error. Every open after eviction must be treated as a cold start with graceful rehydration.
An IndexedDB transaction auto-commits as soon as control returns to the event loop with no pending requests. If you await something unrelated between IDB operations, the transaction may have already auto-committed. Subsequent writes throw an InvalidStateError. Whether this surfaces as an observable error depends entirely on whether the calling code has a catch block - without one, it is effectively silent. This is the most common IDB production bug and its error message rarely points to the root cause.

The "opaque responses take 7 MB in cache quota" claim that circulates online requires careful clarification. The response itself is not padded to 7 MB. Rather, Chromium's Cache API implementation charges a minimum of 7 MB of quota for any opaque (cross-origin, no CORS) cached response, regardless of its actual size. This is a conservative accounting mechanism to prevent attackers from inferring cross-origin response sizes through quota measurements. The actual bytes stored may be far smaller, but your quota is charged as if it were 7 MB. Chromium behaviour - Firefox/Safari differ.

The practical advice is unchanged: never cache opaque responses. But the mechanism is quota accounting, not storage inflation.
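Returning to the IndexedDB trap above, a minimal sketch of the failure - db, itemA, and itemB are illustrative:

const tx = db.transaction('outbox', 'readwrite');
tx.objectStore('outbox').put(itemA); // fine: the transaction is active
// Unrelated await: control returns to the event loop and the transaction commits.
await fetch('/api/telemetry');
// The transaction has already finished - this throws instead of writing.
tx.objectStore('outbox').put(itemB);
sw.js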
A service worker with a bug is not just a broken feature. It is a persistent, origin-level man-in-the-middle proxy that you installed in your user's browser and cannot easily remove. - From a production XSS incident postmortem
If your SW caches responses without validating them, an attacker who achieves any MITM position (compromised CDN edge, malicious Wi-Fi, BGP hijack) can inject a malicious response that gets cached by your SW and served to the user long after the MITM position is gone. The attacker achieves persistent presence without persistent network access.
Defense: only cache responses where response.ok === true and response.type === 'basic' (same-origin). It is usually safer not to cache opaque responses - you cannot inspect their status or content. If you must cache third-party content, validate a content hash on retrieval.
// Refuse to cache opaque or error responses
async function safePut(cache, request, response) {
if (response.type === 'opaque') {
// Opaque: status is always 0. Could be a 200 or a 503 or injected content.
// You have no way to tell. Refuse to cache.
console.warn('Refusing to cache opaque response for', request.url);
return;
}
if (!response.ok) return; // never cache 4xx / 5xx
await cache.put(request, response);
}
sw.js
Classic XSS injects code into a page - it vanishes on reload. XSS that registers a service worker (or modifies an existing one's cache) achieves persistent XSS. The injected code can survive navigation and browser restart. Field. Remediation options include clearing all site data for the origin, unregistering the compromised SW via a "kill switch" deployment, or pushing a corrective SW update that cleans the poisoned cache - clearing site data is the most complete option but not always the only one.
This is why your SW file should be served with a strict Content-Security-Policy header and should not be dynamically generated or templated from any user-controlled input. A SW file that includes user-controlled data is a critical vulnerability, not a minor misconfiguration.
It is common to cache authenticated API responses without fully considering sensitivity. A cached response for GET /api/user/profile containing PII sits in Cache API storage. On a shared device, or after logout without cache purging, another user on the same device can potentially access it - especially if the SW does not gate cached responses on auth state.
Rule: On every logout event, explicitly clear all caches containing user-specific data. Scope user-data caches separately from your app-shell cache so you can delete them surgically without invalidating static assets.
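A sketch of that logout purge, assuming user-scoped caches share a 'user-' name prefix (the prefix is illustrative):

async function purgeUserCaches() {
  const names = await caches.keys();
  await Promise.all(
    names.filter((name) => name.startsWith('user-')).map((name) => caches.delete(name))
  );
}
// Call from your logout flow before redirecting to the login page.
main.jsx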
A SW registered at / controls all requests from that origin. On platforms where users can upload content to paths like yourapp.com/uploads/, improperly validated HTML+JS file uploads at that path could register a SW controlling the entire origin. Modern browsers enforce that a SW can only control pages within its script's directory and below, and the Service-Worker-Allowed response header controls the maximum scope. Audit your upload endpoints and ensure uploaded content cannot be served with a content-type that triggers SW registration.
Your SW runs in a separate thread with its own DevTools session. Breakpoints in the SW do not pause the page. The SW can be killed and restarted by the browser mid-debug. The DevTools "Application" tab shows you the currently registered SW but not the history of what it has served, what cache decisions it made, or why it returned a particular response. You are debugging a stateful proxy with no access log and no audit trail.
- To set breakpoints in sw.js, use the SW's own DevTools instance: click "inspect" next to the service worker in Application → Service Workers. Breakpoints there pause the SW thread independently of the page.
- console.log from a SW goes to the SW's DevTools console - invisible in production. The solution is forwarding logs to page clients and from there to your analytics pipeline:
// sw.js - forward structured logs to all controlled clients
async function swLog(level, ...args) {
const clients = await self.clients.matchAll({ includeUncontrolled: true });
const message = args.map(String).join(' ');
clients.forEach((client) =>
client.postMessage({
type: 'SW_LOG',
level,
message,
swVersion: SW_VERSION, // injected at build time
timestamp: Date.now(),
})
);
}
// main.jsx - receive and forward to analytics
navigator.serviceWorker.addEventListener('message', (e) => {
if (e.data?.type === 'SW_LOG') {
analytics.track('sw_event', {
level: e.data.level,
message: e.data.message,
sw_version: e.data.swVersion,
timestamp: e.data.timestamp,
});
}
});
sw.js + main.jsx
When a bug only reproduces on a specific physical device (as in the OAuth postmortem), the fastest path is a floating overlay rendering console output on-screen. Gate it to dev/staging only via a flag:
// MobileConsole.jsx - gate to dev/staging via env var or URL param
import { useEffect, useState } from 'react';

export function MobileConsole() {
  const [logs, setLogs] = useState([]);
  useEffect(() => {
    if (!isDebugMode()) return;
    const origLog = console.log;
    const origError = console.error;
    // Pass the original through so errors still reach console.error, not console.log
    const capture = (prefix, orig) => (...args) => {
      orig(...args);
      setLogs((prev) => [...prev.slice(-100), `[${prefix}] ` + args.join(' ')]);
    };
    console.log = capture('LOG', origLog);
    console.error = capture('ERR', origError);
    return () => { console.log = origLog; console.error = origError; };
  }, []);
  return (
    <pre style={{ position: 'fixed', bottom: 0, left: 0, right: 0,
                  maxHeight: '40vh', margin: 0, overflow: 'auto',
                  background: 'rgba(0,0,0,.85)', color: '#0f0', fontSize: 10 }}>
      {logs.join('\n')}
    </pre>
  );
}
MobileConsole.jsx
Chrome and Firefox have largely converged on SW behaviour. Safari is its own ecosystem, shaped in part by Apple's constraints on the platform. If you are shipping to iOS users, it helps to learn these differences early. The table below reflects behaviour as tested in the versions indicated - browser support changes; treat this as a snapshot.
| Feature / Behaviour | Chrome 124 | Firefox 126 | Safari iOS 17.4 |
|---|---|---|---|
| Background Sync API | ✓ Full support | No support in stable; historically experimental behind prefs in Nightly | ✗ Not supported |
| Push Notifications | ✓ Full | ✓ Full | iOS 16.4+ only; requires user gesture; no silent push; installed PWA only |
| Storage persistence | Honoured after user gesture | Honoured; shows user prompt | Eviction risk regardless; persist() unreliable; threshold varies by iOS version |
| SW update check | On navigation; modern Chrome revalidates sw.js with no-cache by default | On navigation; follows spec update algorithm (engine/version behaviour may vary) | Follows spec; iOS-specific throttling in background |
| Navigation preload | ✓ Supported | ✓ Supported | ✓ Supported (Safari 15.4+) - MDN lists NavigationPreloadManager as supported across all major engines. Earlier versions of this article incorrectly marked it unsupported. |
| Periodic Background Sync | ✓ Chrome / Edge only | ✗ | ✗ |
| Install prompt | beforeinstallprompt event | Firefox for Android only | No API - manual "Add to Home Screen" only |
| Standalone PWA mode | Full, persistent, native-like | Limited | Works; limited splash customisation; reduced system integration |
- iOS relaunches the installed PWA from start_url on resume. If a user backgrounds the PWA and returns after a while, iOS relaunches from the start_url in the manifest, not from wherever they were. Unsaved state is lost. Deep linking and "continue where you left off" require explicit state persistence.
- theme_color in the manifest is ignored on iOS. Status bar styling requires <meta name="apple-mobile-web-app-status-bar-style"> instead. The manifest property works on Chrome Android.

By default, when a user navigates to a page, the browser must wake the SW before any network request is made for the document. On a cold SW start, this adds 50–500 ms of latency to every navigation, making your PWA feel slower than a regular website. Navigation preload solves this by starting the document fetch in parallel with SW startup:
// Enable navigation preload in activate
// Supported: Chrome, Firefox, Safari 15.4+. Safe to enable universally.
// The optional chaining handles browsers where navigationPreload is undefined.
self.addEventListener('activate', (event) => {
event.waitUntil(
self.registration?.navigationPreload?.enable()
);
});
self.addEventListener('fetch', (event) => {
if (event.request.mode !== 'navigate') return;
event.respondWith((async () => {
// preloadResponse is already in-flight before the SW woke up
const preload = await event.preloadResponse;
if (preload) return preload;
// Fallback: normal network fetch
return fetch(event.request);
})());
});
sw.js
Background Sync lets you queue mutations while offline and replay them when connectivity returns - even if the tab is closed. The pattern: user submits a form → SW intercepts → network fails → SW registers a sync tag → browser wakes the SW on reconnect → SW replays the queued request.
What examples omit: you need idempotency keys on your API endpoints to handle re-delivery. You need a max-retry policy and exponential backoff. You need a way to surface replay results to a UI that may not have an open tab (use Push Notifications or Badge API where available). And the API is only reliably available on Chrome/Edge - design a synchronous fallback for Firefox and Safari.
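A sketch of the queue-and-replay shape, assuming hypothetical IndexedDB-backed helpers queueAdd() and queueDrain() that persist the request URL, body, and an idempotency key:

self.addEventListener('fetch', (event) => {
  if (event.request.method !== 'POST') return;
  event.respondWith(
    fetch(event.request.clone()).catch(async () => {
      await queueAdd(event.request); // persist URL, body, idempotency key
      if ('sync' in self.registration) {
        await self.registration.sync.register('replay-mutations');
      }
      // Tell the caller the write was accepted but deferred.
      return new Response(JSON.stringify({ queued: true }), {
        status: 202,
        headers: { 'Content-Type': 'application/json' },
      });
    })
  );
});

self.addEventListener('sync', (event) => {
  if (event.tag === 'replay-mutations') event.waitUntil(queueDrain());
});
sw.js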
The SW can construct a streaming Response using ReadableStream, enabling the "app shell + content stream" architecture: serve the first part of a response from cache (the shell HTML) and stream the rest from network. This eliminates the "blank flash" before content appears and produces measurable 100–300 ms improvements in perceived navigation latency on typical connections. It is complex to implement correctly and requires careful handling of stream cancellation.
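A minimal sketch of the shape, assuming a precached shell fragment at '/shell-top.html' and a content endpoint under '/content/' (both illustrative):

self.addEventListener('fetch', (event) => {
  if (event.request.mode !== 'navigate') return;
  event.respondWith((async () => {
    const shell = await caches.match('/shell-top.html');
    if (!shell) return fetch(event.request); // no shell cached: plain fetch
    const contentUrl = '/content' + new URL(event.request.url).pathname;
    const stream = new ReadableStream({
      async start(controller) {
        const pump = async (res) => {
          const reader = res.body.getReader();
          for (;;) {
            const { done, value } = await reader.read();
            if (done) return;
            controller.enqueue(value);
          }
        };
        await pump(shell);                   // shell bytes flush immediately
        await pump(await fetch(contentUrl)); // content streams in behind it
        controller.close();
      },
    });
    return new Response(stream, { headers: { 'Content-Type': 'text/html' } });
  })());
});
sw.js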
| Tool | What It Does Well | Where It Falls Short |
|---|---|---|
| Workbox | Precaching with manifest generation, strategy abstractions, background sync primitives, Vite/webpack plugin integration, runtime caching with expiry | Abstracts away the SW lifecycle until something breaks - then you need to understand the raw mechanics anyway. Runtime bundle is non-trivial (~30 KB). Opinionated defaults can conflict with custom requirements (e.g., its cache deletion strategy during activate can cause the version mismatch crash described in Failure 005). Generated SW files can be difficult to reason about. |
Honest take: Workbox is excellent when you want offline capability without deep SW expertise on day one. It can become a liability when requirements diverge from its defaults, because you end up fighting the abstraction. For critical flows - auth callbacks, payment pages, any route where version mismatch is costly - write those handlers manually regardless of whether you use Workbox elsewhere. Understand the lifecycle before you rely on the abstraction.
Twenty things that often take real time to learn in production. Claims are marked where they distinguish spec from field experience.
If sw.js is HTTP-cached by the browser, your SW updates depend on the cache expiry - not your deploy. Serve it with no-cache always. This is the real lesson behind the "24-hour rule" confusion.
/callback#token=abc is intercepted as /callback. If you serve a cached document here, the token is processed by potentially stale JS. Bypass the SW for all auth callback routes with a URL pattern exclusion.
DevTools' "Update on reload" completely bypasses the waiting state. Production behaviour will be different. Always test the actual update flow in an incognito window with this flag disabled.
DevTools' "Clear site data" (Application → Storage) removes cache, SW registration, cookies, IDB - everything for the origin. If your bug survives a site data clear, the issue is in your deployed bundle, not in client-side cache.
A fixed cache name like caches.open('my-cache') accumulates stale entries across every deploy. Use 'my-cache-v' + BUILD_HASH. In your activate handler, delete caches that do not match the current version string, as sketched below.
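The sketch mentioned above - with the version-mismatch caveat from earlier: only safe once no old-version HTML is live in any client (BUILD_HASH is injected at build time):

const CACHE_NAME = 'my-cache-v' + BUILD_HASH;

self.addEventListener('activate', (event) => {
  event.waitUntil((async () => {
    const names = await caches.keys();
    await Promise.all(
      names.filter((name) => name !== CACHE_NAME).map((name) => caches.delete(name))
    );
  })());
});
sw.js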
If a user installed your PWA and you take the site down, the SW serves the cached version indefinitely. This is a feature if you planned for it; a crisis if you did not.
On iOS, Safari and the installed (Home Screen) PWA are separate storage contexts entirely. Auth sessions, cookies, and IDB data set in Safari are invisible to the installed PWA. OAuth must complete within the PWA context. Test there specifically.
Chromium only - Chromium charges ~7 MB of quota per opaque (cross-origin, no CORS) cached response as a side-channel protection, regardless of actual response size. Firefox and Safari handle this differently. Universal advice: never cache opaque responses.
skipWaiting() promotes the waiting SW to active, bypassing the "wait for all clients to close" gate. clients.claim() makes the newly active SW take control of existing open pages. Neither implies the other - skipWaiting without claim activates the SW but leaves old pages still controlled by the old one. Use them together, then trigger a page reload via the controllerchange event.
Any async chain in a SW event handler not wrapped in event.waitUntil() may be cut off when the SW goes idle. You will see partial writes, dropped analytics, and incomplete cache updates with no error logged.
Spec. The SW may handle many events without being killed. But when it is terminated (which can happen at any idle point), the next wake starts clean. Any JS variable you set may or may not survive to the next event handler. Use Cache API or IDB for anything that must persist reliably.
The browser will eventually evict under pressure, but you have no control over when. For runtime caches, set a max-entries limit and implement LRU eviction yourself or use Workbox's expiration plugin.
On a degraded connection, network-first waits for the full network timeout (which can be minutes) before falling back to cache. Always pair network-first with a 2–3 second race-to-cache fallback.
registration.unregister() marks the SW for removal, but it stays active for the current page session. To fully clear a broken SW deployment, users must close all tabs for the origin and reopen.
If you register both /sw.js (scope: /) and /admin/sw.js (scope: /admin/), they share IndexedDB and Cache API storage. Requests to /admin/ are handled by the longest matching scope. Cache key collisions between SWs are real and produce bizarre behaviour.
If you SWR-cache GET /cart and the user just added an item, they will see the stale cart for the next load cycle. Never use SWR for any resource that can be changed by the user's own in-session actions.
If 100k users have a broken SW, you cannot reach them directly. They receive the fix when their browser next fetches a byte-different sw.js. A "kill switch" SW that self-unregisters is the only accelerator - and only works if the broken SW can still fetch and process updates.
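A sketch of such a kill switch - deployed as the new sw.js, it clears caches, unregisters itself, and reloads open tabs so they detach from SW control:

self.addEventListener('install', () => self.skipWaiting());

self.addEventListener('activate', (event) => {
  event.waitUntil((async () => {
    const names = await caches.keys();
    await Promise.all(names.map((name) => caches.delete(name)));
    await self.registration.unregister();
    const tabs = await self.clients.matchAll({ type: 'window' });
    tabs.forEach((tab) => tab.navigate(tab.url)); // reload to drop SW control
  })());
});
sw.js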
Any file missing from the precache manifest is silently fetched from network (or unavailable offline). Any incorrectly included file is re-downloaded on every deploy. Make manifest auditing part of your CI step.
Between install and all-clients-closed, old and new SWs coexist and share IndexedDB and Cache API. Any IndexedDB schema migration in a new SW version must be backwards-compatible with what the old SW expects. Version your IDB schemas explicitly.
In many projects, shipping a SW without planning for versioning, updates, debugging, and cross-browser differences ends up costing a multiple of the initial implementation effort in production incident time. The technology is powerful. Treat it as production infrastructure from day one.
Treat your service worker as production infrastructure, not just a frontend feature. It benefits from versioning, deployment automation, monitoring, a documented rollback strategy, and at least one person who understands its lifecycle. In my experience, reliable PWAs usually come from clearer operational processes around the SW lifecycle.