
How to Detect Google AdSense on a Website with Python

5 min read
Serhii Hrekov
software engineer, creator, artist, programmer, projects founder

Detecting whether a website runs Google AdSense is a common task for digital marketers, SEO researchers, and competitive analysts. Technically, an AdSense-enabled page loads a specific JavaScript library (adsbygoogle.js), which then injects the ads. The embed code also carries the site owner's unique "Publisher ID", formatted as pub-xxxxxxxxxxxxxxxx and usually written as ca-pub-xxxxxxxxxxxxxxxx inside the code.

In Python, we can identify these markers by "scraping" the HTML and searching for the signature AdSense scripts.


πŸ•΅οΈ How to Identify AdSense Markers​

There are three primary "fingerprints" an AdSense-enabled site leaves behind:

  1. The Script Tag: Looking for adsbygoogle.js or pagead2.googlesyndication.com.
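  2. The Publisher ID: Searching for the regex pattern pub-[0-9]+. In the embed code it usually appears as ca-pub-xxxxxxxxxxxxxxxx, either in the script URL or in the data-ad-client attribute.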
  3. The ads.txt File: A public file located at domain.com/ads.txt that lists authorized digital sellers.
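The three fingerprints can be checked in one offline function that works on text you have already downloaded, which keeps the matching logic testable without hitting the network. This is a sketch: the helper name find_fingerprints and the sample HTML and ads.txt strings are illustrative assumptions (the pub ID is a placeholder), not data from a real site.

```python
import re

# Marker strings from the list above (hypothetical constant names)
SCRIPT_SIGNATURES = ("adsbygoogle.js", "pagead2.googlesyndication.com")
PUB_ID_RE = re.compile(r"pub-\d+")
# A Google ads.txt record starts with "google.com, pub-<digits>"; commented lines are ignored
GOOGLE_ADS_TXT_RE = re.compile(r"^\s*google\.com\s*,\s*(pub-\d+)", re.IGNORECASE | re.MULTILINE)


def find_fingerprints(html: str, ads_txt: str = "") -> dict:
    """Report which of the three AdSense fingerprints appear in already-fetched text."""
    lowered = html.lower()
    pub_match = PUB_ID_RE.search(lowered)
    return {
        "script_tag": any(sig in lowered for sig in SCRIPT_SIGNATURES),
        "publisher_id": pub_match.group(0) if pub_match else None,
        "ads_txt_google": GOOGLE_ADS_TXT_RE.findall(ads_txt),
    }


# Illustrative inputs only (placeholder publisher ID)
sample_html = (
    '<script async src="https://pagead2.googlesyndication.com/pagead/js/'
    'adsbygoogle.js?client=ca-pub-1234567890123456" crossorigin="anonymous"></script>'
)
sample_ads_txt = "# ads.txt\ngoogle.com, pub-1234567890123456, DIRECT, f08c47fec0942fa0\n"

print(find_fingerprints(sample_html, sample_ads_txt))
```

Keeping fetching separate from matching means the same function can be reused on HTML obtained from requests, a cached file, or a headless browser.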

πŸ’» The Implementation​

This script uses requests to fetch the page and BeautifulSoup to parse the HTML. I have also added a check for the ads.txt file, which is the most "official" way to verify that a site has authorized Google to sell its ad inventory.

# πŸ” Python Script: Google AdSense Detector

### 1. Requirements
```bash
pip install requests beautifulsoup4
```

### 2. The Code

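```python
import re
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

# Strings found in the standard AdSense embed code
SIGNATURES = ['googlesyndication.com', 'adsbygoogle.js', 'pagead2']


def check_adsense(url):
    # Ensure the URL has a scheme (a bare domain like 'httpbin.org'
    # starts with 'http' but still has no scheme)
    if not url.startswith(('http://', 'https://')):
        url = 'https://' + url

    headers = {
        'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                       'AppleWebKit/537.36 (KHTML, like Gecko) '
                       'Chrome/120.0 Safari/537.36')
    }

    try:
        # 1. Fetch the homepage; raise on 403/404/5xx so a blocked request
        #    is not misreported as "No AdSense detected"
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
        html_content = response.text.lower()
        soup = BeautifulSoup(response.text, 'html.parser')

        # Search only <script> tags (src URL + inline code), so a page that
        # merely mentions "adsbygoogle.js" in its text is not a false positive
        script_text = ' '.join(
            (tag.get('src') or '') + ' ' + (tag.string or '')
            for tag in soup.find_all('script')
        ).lower()
        has_script = any(sig in script_text for sig in SIGNATURES)

        # 2. Look for the Publisher ID (ca-pub-... in the script URL or in
        #    the data-ad-client attribute of the ad unit)
        pub_id_match = re.search(r'pub-\d+', html_content)
        pub_id = pub_id_match.group(0) if pub_id_match else "Not Found"

        # 3. Check ads.txt on the same host; a failure here is not fatal
        domain = urlparse(url).netloc
        ads_txt_url = f"https://{domain}/ads.txt"
        has_ads_txt = False
        try:
            ads_txt_response = requests.get(ads_txt_url, headers=headers, timeout=5)
            if ads_txt_response.status_code == 200:
                # Allow optional whitespace around the comma ("google.com,pub-...")
                has_ads_txt = re.search(r'google\.com\s*,\s*pub-\d+',
                                        ads_txt_response.text.lower()) is not None
        except requests.RequestException:
            pass

        # --- Report Results ---
        print(f"📊 Results for: {url}")
        print(f"✅ AdSense Script Found: {has_script}")
        print(f"🆔 Publisher ID: {pub_id}")
        print(f"📄 Valid ads.txt Entry: {has_ads_txt}")

        if has_script or has_ads_txt:
            print("\n🚀 Verdict: This website is likely running AdSense.")
        else:
            print("\n📝 Verdict: No AdSense detected.")

    except requests.RequestException as e:
        print(f"❌ Error checking {url}: {e}")


# --- Test It ---
if __name__ == '__main__':
    check_adsense("https://www.example.com")
```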


βš–οΈ Detection Methods Comparison​

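| Method | Reliability | Why it works |
|---|---|---|
| HTML Script Scan | Medium | The quickest check, but it can miss ad scripts that are injected later by JavaScript (for example, lazy loading or tag managers). |
| Regex Pub-ID Search | High | Almost every AdSense implementation includes the pub- string in the page code. |
| ads.txt Verification | Highest | ads.txt is an IAB Tech Lab standard. A site that authorizes Google to sell its ad inventory should list a google.com record there (this confirms authorization, not that ads are currently shown). |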
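A google.com line in ads.txt shows that Google is authorized to sell the site's inventory, but on its own it does not say which account, and the same google.com domain may also cover other Google ad products. A stricter check is to parse the records and confirm that the Publisher ID found on the page is actually listed. The sketch below assumes the common record layout from the ads.txt spec (ad system domain, account ID, DIRECT or RESELLER, optional certification authority ID; # starts a comment; name=value lines are variables). The function names and the sample file are hypothetical.

```python
from typing import List, NamedTuple, Optional


class AdsTxtRecord(NamedTuple):
    domain: str
    account_id: str
    relationship: str
    cert_id: Optional[str]


def parse_ads_txt(text: str) -> List[AdsTxtRecord]:
    """Parse ads.txt data records, skipping comments, blank lines and variables."""
    records = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" in line:  # skip blanks and variables such as contact=...
            continue
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 3:
            continue  # malformed record
        cert = fields[3] if len(fields) > 3 and fields[3] else None
        records.append(AdsTxtRecord(fields[0].lower(), fields[1], fields[2].upper(), cert))
    return records


def pub_id_authorized(pub_id: str, ads_txt: str) -> bool:
    """True if ads.txt contains a google.com record for this exact publisher ID."""
    target = pub_id.lower()
    if target.startswith("ca-"):  # page code uses ca-pub-..., ads.txt uses pub-...
        target = target[3:]
    return any(
        r.domain == "google.com" and r.account_id.lower() == target
        for r in parse_ads_txt(ads_txt)
    )


# Illustrative ads.txt content (placeholder IDs)
sample = """# example ads.txt
contact=ads@example.com
google.com, pub-1234567890123456, DIRECT, f08c47fec0942fa0
google.com, pub-9999999999999999, RESELLER
"""

print(pub_id_authorized("ca-pub-1234567890123456", sample))  # True
print(pub_id_authorized("pub-0000000000000000", sample))     # False
```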

πŸ“š Sources & Technical Refs​


πŸ“‹ Pro Tip: Handling Anti-Bot Protection​

Many large sites sit behind Cloudflare or another web application firewall (WAF) that may block requests sent by Python's requests library. If your script gets a 403 Forbidden error, you may need Selenium or Playwright to drive a real Chrome browser instead.
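One way to act on this tip, and to catch ad scripts injected after page load, is to render the page in headless Chromium and run the same string checks on the resulting DOM. This is a minimal sketch, assuming Playwright for Python is installed (pip install playwright, then playwright install chromium). The names fetch_rendered_html and adsense_markers and the settle_ms delay are illustrative choices, and a headless browser can still be challenged or blocked by a WAF.

```python
import re

# Same marker strings as the requests-based checker
SIGNATURES = ("googlesyndication.com", "adsbygoogle.js", "pagead2")


def fetch_rendered_html(url: str, timeout_ms: int = 20000, settle_ms: int = 3000) -> str:
    """Load the page in headless Chromium and return the DOM after scripts have run."""
    # Imported lazily so the matching code below works without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, timeout=timeout_ms, wait_until="load")
            # Give lazily injected ad scripts a few seconds to appear in the DOM
            page.wait_for_timeout(settle_ms)
            return page.content()
        finally:
            browser.close()


def adsense_markers(html: str) -> dict:
    """Run the script-signature and Publisher ID checks on any HTML string."""
    lowered = html.lower()
    match = re.search(r"pub-\d+", lowered)
    return {
        "script": any(sig in lowered for sig in SIGNATURES),
        "publisher_id": match.group(0) if match else None,
    }
```

Usage: adsense_markers(fetch_rendered_html("https://www.example.com")). The fixed wait after the load event is a deliberate trade-off: ad-heavy pages often keep making network requests, so waiting for the network to go fully idle can time out.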
