I had a web scraper project.

It checked around 80 websites.
The goal was straightforward: fetch the page HTML and extract specific data.

Getting the page source is easy:
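Roughly, the fetch step looked like this. The URL and the CSS selector are placeholders (each site needed its own), and BeautifulSoup is my assumption for the parsing side:

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL and selector -- each of the ~80 sites had its own.
url = "https://example.com/listing"
html = requests.get(url, timeout=10).text

# Parse the raw HTML the server sent back and pull out the rows.
soup = BeautifulSoup(html, "html.parser")
for row in soup.select(".product-row"):
    print(row.get_text(strip=True))
```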

Most sites worked fine.

But a few returned no data.

I checked DevTools > Inspect on those pages.
The HTML was there.
The data was there.

Still, my scraper found nothing.

That is when it clicked.

Page Source and Inspect are not the same thing.

Many modern websites use client-side rendering with JavaScript frameworks like React, Vue, or Angular.
The server often sends an almost empty HTML shell, and JavaScript fills the page later by calling APIs.

Those few websites were loading their data with JavaScript.
The server response contained almost nothing useful.
The browser filled it in afterward.
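There is a quick way to confirm this. Take a value you can see in Inspect and check whether it appears in the raw response body. The URL and marker string here are placeholders:

```python
import requests

# Hypothetical URL and marker text -- substitute your own target.
url = "https://example.com/products"
resp = requests.get(url, timeout=10)

# If a value visible in DevTools > Inspect is missing from the raw
# response, the page is almost certainly rendered client-side.
print("Product name" in resp.text)  # False on a CSR page
print(len(resp.text))               # often just a tiny HTML shell
```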

Once I understood this, the fix was quick.
I switched to Selenium.

The page rendered like a real browser.
The data appeared.
The scraper worked.
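A minimal sketch of that fix with headless Chrome. The URL and selector are placeholders again, and the explicit wait is my addition so the script does not read the page before the JavaScript finishes:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--headless=new")  # run without a visible window
driver = webdriver.Chrome(options=options)

try:
    driver.get("https://example.com/products")  # placeholder URL
    # Wait until the JavaScript-rendered element actually exists,
    # instead of reading the DOM the instant the page loads.
    WebDriverWait(driver, 15).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, ".product-row"))
    )
    html = driver.page_source  # now contains the rendered DOM
finally:
    driver.quit()
```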

Lesson learned.

If the data exists only in Inspect,
requests.get() will never see it.
