How to Detect Google AdSense on a Website with Python
Detecting whether a website is running Google AdSense is a common task for digital marketers, SEO researchers, and competitive analysts. From a technical perspective, an AdSense-enabled page loads a specific Google JavaScript library, usually accompanied by a unique "Publisher ID" (formatted as pub-xxxxxxxxxxxxxxxx, where the x's are 16 digits; in the ad code it typically appears as ca-pub-xxxxxxxxxxxxxxxx).
In Python, we can identify these markers by "scraping" the HTML and searching for the signature AdSense scripts.
## How to Identify AdSense Markers
There are three primary "fingerprints" an AdSense-enabled site leaves behind:
- The Script Tag: Looking for `adsbygoogle.js` or `pagead2.googlesyndication.com`.
- The Publisher ID: Searching for the regex pattern `pub-[0-9]+`.
- The `ads.txt` File: A public file located at `domain.com/ads.txt` that lists authorized digital sellers.
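
As a rough illustration of the first two fingerprints, the sketch below runs both patterns over a made-up HTML snippet. The function name `find_fingerprints` and the sample publisher ID are hypothetical, and the `ads.txt` check is left out because it needs a network request.

```python
import re

# Patterns for the script-tag and publisher-ID fingerprints.
SCRIPT_PATTERN = re.compile(
    r"adsbygoogle\.js|pagead2\.googlesyndication\.com", re.IGNORECASE
)
PUB_ID_PATTERN = re.compile(r"pub-[0-9]+")


def find_fingerprints(html):
    """Report which AdSense markers appear in a chunk of HTML (hypothetical helper)."""
    return {
        "has_script": bool(SCRIPT_PATTERN.search(html)),
        # De-duplicate and sort so the output is stable.
        "publisher_ids": sorted(set(PUB_ID_PATTERN.findall(html))),
    }


# Made-up sample: an ad loader tag with a placeholder publisher ID.
sample_html = """
<script async
  src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-1234567890123456"
  crossorigin="anonymous"></script>
"""

result = find_fingerprints(sample_html)
print(result)
```

A bare `pub-[0-9]+` match can also hit unrelated strings, so it is best treated as a hint that supports the script check rather than as proof on its own.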
## The Implementation
This script uses `requests` to fetch the page and `BeautifulSoup` to parse the HTML. I have also added a check for the `ads.txt` file, which is the most "official" way to verify AdSense because it is the publisher's own declaration of who may sell its ad inventory.
# Python Script: Google AdSense Detector
### 1. Requirements
```bash
pip install requests beautifulsoup4
```
### 2. The Code
```python
import re
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup


def check_adsense(url):
    # Ensure the URL has a scheme
    if not url.startswith(('http://', 'https://')):
        url = 'https://' + url

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'}

    try:
        # 1. Check the homepage HTML
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()  # report 403/404 as errors instead of parsing an error page
        soup = BeautifulSoup(response.text, 'html.parser')
        html_content = response.text.lower()

        # Look for the AdSense script signatures in <script> tags (src URL or inline body)
        signatures = [
            'googlesyndication.com',
            'adsbygoogle.js',
            'pagead2',
        ]
        script_text = ' '.join(
            f"{tag.get('src') or ''} {tag.string or ''}" for tag in soup.find_all('script')
        ).lower()
        has_script = any(sig in script_text for sig in signatures)

        # 2. Look for the Publisher ID (pub-xxxxxxxx)
        pub_id_match = re.search(r'pub-\d+', html_content)
        pub_id = pub_id_match.group(0) if pub_id_match else "Not Found"

        # 3. Check for ads.txt (the most reliable indicator)
        domain = urlparse(url).netloc
        ads_txt_url = f"https://{domain}/ads.txt"
        ads_txt_response = requests.get(ads_txt_url, headers=headers, timeout=5)
        # Match "google.com, pub-..." at the start of a line, with or without spaces
        has_ads_txt = ads_txt_response.status_code == 200 and bool(
            re.search(r'^\s*google\.com\s*,\s*pub-\d+', ads_txt_response.text,
                      re.IGNORECASE | re.MULTILINE)
        )

        # --- Report Results ---
        print(f"Results for: {url}")
        print(f"AdSense Script Found: {has_script}")
        print(f"Publisher ID: {pub_id}")
        print(f"Valid ads.txt Entry: {has_ads_txt}")

        if has_script or has_ads_txt:
            print("\nVerdict: This website is likely running AdSense.")
        else:
            print("\nVerdict: No AdSense detected.")
    except requests.RequestException as e:
        print(f"Error checking {url}: {e}")


# --- Test It ---
check_adsense("https://www.example.com")
```
## Detection Methods Comparison
| Method | Reliability | Why it works |
|---|---|---|
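| HTML Script Scan | Medium | Quickest check, but it misses ads that JavaScript injects after the page loads (for example, "lazy loading" setups), because `requests` does not execute scripts. |
| Regex Pub-ID Search | High | The standard AdSense ad code embeds the publisher's `pub-` ID, so the string is present in almost every implementation. |
| ads.txt Verification | Highest | `ads.txt` is an IAB Tech Lab standard for declaring authorized sellers. A `google.com, pub-...` line means the publisher has authorized a Google account to sell its ads. The file is optional, so a missing entry does not rule out AdSense, and the same entry format can also cover other Google ad products. |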
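
To make the `ads.txt` row more concrete, here is a minimal sketch of parsing the file into records instead of searching for a substring. It assumes the usual comma-separated record layout from the IAB specification (ad system domain, seller account ID, relationship, optional certification authority ID). The helper names `parse_ads_txt` and `google_seller_ids` and the sample file content are made up for illustration.

```python
def parse_ads_txt(text):
    """Parse ads.txt content into record dicts, skipping comments and variable lines."""
    records = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            continue  # variable lines such as contact=..., or malformed records
        records.append({
            "domain": fields[0].lower(),
            "account_id": fields[1],
            "relationship": fields[2].upper(),
            "cert_authority_id": fields[3] if len(fields) > 3 else None,
        })
    return records


def google_seller_ids(text):
    """Return the pub- account IDs declared for google.com."""
    return [
        record["account_id"]
        for record in parse_ads_txt(text)
        if record["domain"] == "google.com"
        and record["account_id"].lower().startswith("pub-")
    ]


# Made-up ads.txt content with placeholder IDs.
sample = """\
# ads.txt for example.com
google.com, pub-1234567890123456, DIRECT, f08c47fec0942fa0
google.com,pub-9999999999999999,reseller
contact=ads@example.com
othernetwork.example, 42, DIRECT
"""

print(google_seller_ids(sample))
```

Parsing the fields tolerates differences in spacing and letter case between files, and it exposes the DIRECT or RESELLER relationship if you want to report it.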
## Pro Tip: Handling Anti-Bot Protection
Many large sites sit behind Cloudflare or another web application firewall (WAF) that blocks traffic that looks automated, including plain `requests` calls. If your script keeps getting a 403 Forbidden error, you may need to use Selenium or Playwright to drive a real Chrome browser.
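
A minimal sketch of that fallback, assuming Playwright is installed (`pip install playwright`, then `playwright install chromium`). The helper names `fetch_rendered_html` and `fetch_html` are hypothetical, and treating only a 403 status as "blocked" is a simplifying assumption. A headless browser can still be detected and blocked, so this is a starting point rather than a guaranteed workaround.

```python
def fetch_rendered_html(url, timeout_ms=15000):
    """Load the page in headless Chromium and return the rendered HTML."""
    # Imported lazily so the plain-HTTP path works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Wait for network activity to settle so late-injected ad scripts are present.
            page.goto(url, timeout=timeout_ms, wait_until="networkidle")
            return page.content()
        finally:
            browser.close()


def fetch_html(url, http_get=None, render=fetch_rendered_html):
    """Try a plain HTTP request first; fall back to a real browser on 403."""
    if http_get is None:
        import requests
        http_get = requests.get
    response = http_get(url, timeout=10)
    if response.status_code == 403:
        return render(url)
    return response.text
```

The `http_get` and `render` parameters exist only so the function can be exercised with stand-ins. In normal use you would call `fetch_html("https://www.example.com")` and pass the returned HTML to the detection logic.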
