Host-Based Ad Blocking

To block ads on the web, we need to catch requests to a particular ad server and send them to our local server. We do this by adding an entry to the hosts file. The hosts file tells the system where to find a computer on the network by mapping hostnames to IP addresses, and it is normally consulted before DNS. For our purposes, we will take advantage of this by rerouting the requests that webpages make to ad servers to our own system.
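
To make the mapping concrete, here is a minimal Python sketch of the kind of lookup a hosts file provides. It is only an illustration, not how any operating system implements name resolution; `parse_hosts` is a name made up for this example, and `ad.example.com` is a placeholder ad server.

```python
def parse_hosts(text):
    """Return a dict mapping each hostname to the IP address on its line."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()  # one address, then one or more hostnames
        for name in names:
            # keep the first entry seen for a name (an assumption of this sketch)
            mapping.setdefault(name.lower(), ip)
    return mapping


sample = """
127.0.0.1   localhost
127.0.0.1   ad.example.com   # hypothetical ad server
"""
hosts = parse_hosts(sample)
print(hosts.get("ad.example.com"))   # address the ad server name now maps to
print(hosts.get("www.example.org"))  # not listed, so the system would fall back to DNS
```

Any name that appears in the file resolves to the listed address; everything else is left to the normal resolver.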

These steps will require that you have Apache HTTP server (or IIS with ISAPI_rewrite, something that can rewrite URLs) running on your local machine (or on a system with low latency, possibly on your network).

First, we will need to edit the hosts file and add an entry in the form “127.0.0.1 ad.example.com”. If you are running Windows, examples should already be present in the hosts file. On Windows systems, this file can be found at %SystemRoot%\system32\drivers\etc\hosts, on Linux systems at /etc/hosts, and on Mac OS X 10.2 and later at /private/etc/hosts. Editing it requires administrator (or root) privileges.
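
If you would rather script the edit than open the file by hand, something along these lines might do. This is a sketch under assumptions: `add_block_entry` is a hypothetical helper, it works on the file's text rather than touching the real hosts file (which needs elevated rights to write), and `ad.example.com` is again a placeholder.

```python
def add_block_entry(hosts_text, hostname, ip="127.0.0.1"):
    """Return hosts_text with an "ip hostname" line appended, unless the
    hostname is already mapped somewhere in the text."""
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore comments
        if wanted in (name.lower() for name in fields[1:]):
            return hosts_text  # already present, leave the text unchanged
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"  # make sure the new entry starts on its own line
    return hosts_text + f"{ip}\t{hostname}\n"


original = "127.0.0.1 localhost\n"
updated = add_block_entry(original, "ad.example.com")
print(updated, end="")
```

To apply it for real, you would read the hosts file from the path for your system, pass its contents through the function, and write the result back from an elevated prompt.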

So how do we get the hostname of an ad server? We can do this several ways, but you have to work for it just a little bit (I know, AdBlock makes it so easy). Since I am a web developer, I usually have the Firebug extension running, so I just click “Inspect” and the element is highlighted in the source code, usually with the URL of the server right there for me.

If you don’t have or don’t need Firebug, you can go the “view source” route. I have found the easiest way using this method is to search for some adjacent text you see in the rendered page, and look around for included JavaScript files or hotlinked images. Most of the time the ad is easy to see. For included JavaScript files that build the ad image, just block the whole server that hosts the script; without the script the ad can’t be rendered, making the image server irrelevant. Often, ad servers will use several host names, requiring multiple entries in the hosts file.
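
The “view source” hunt can be roughly automated. The sketch below uses Python's standard `html.parser` to list every hostname that a page pulls scripts, images, or iframes from, other than the page's own host. The class and function names, the sample markup, and the hostnames are all invented for illustration, and the result is only a list of candidates: you still have to decide which of them are ad servers.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalHostFinder(HTMLParser):
    """Collect hostnames referenced by script, img, and iframe src attributes."""

    def __init__(self, page_host):
        super().__init__()
        self.page_host = page_host.lower()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "img", "iframe"):
            return
        src = dict(attrs).get("src")
        if not src:
            return
        host = urlparse(src).hostname  # None for relative URLs
        if host and host != self.page_host:
            self.hosts.add(host)


def external_hosts(html, page_host):
    """Return a sorted list of third-party hostnames found in the markup."""
    finder = ExternalHostFinder(page_host)
    finder.feed(html)
    return sorted(finder.hosts)


page = """
<p>Some article text</p>
<script src="http://ad.example.com/show.js"></script>
<img src="http://images.ad.example.net/banner.gif">
<img src="/img/logo.png">
"""
print(external_hosts(page, "www.example.org"))
```

Scripts that inject further tags at runtime will not show up in the static source, which is one reason an ad network can need several hosts file entries.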

If you’re using Firefox or Internet Explorer, you can usually right-click an image and select Properties to get information about an image’s location. Chrome doesn’t give up that information as easily: selecting “Inspect Element” gives me mixed results, because the WebKit inspector is surprisingly shoddy and works about three quarters of the time for me.

Once we’ve created the entry in the hosts file, we can test by closing all browser windows (so the browser drops any lookups it has cached and picks up the new entry), then opening the ad server URL. If it goes to our local server instead of the ad server, we’ve successfully blocked the ad server across the entire operating system. Since we’re handling a bunch of HTTP requests anyway, why not do something with them?
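
The resolution half of that test can also be checked outside the browser. A minimal sketch, assuming the block address is a loopback address such as 127.0.0.1: `is_blocked` is a hypothetical helper, and the `resolve` parameter exists only so the check can be exercised without a real lookup.

```python
import socket


def is_blocked(hostname, resolve=socket.gethostbyname):
    """Return True if hostname resolves to a loopback address (127.x.x.x)."""
    try:
        return resolve(hostname).startswith("127.")
    except OSError:
        return False  # the name did not resolve at all


# Simulated resolvers, so this example does not depend on your hosts file.
print(is_blocked("ad.example.com", resolve=lambda name: "127.0.0.1"))
print(is_blocked("ad.example.com", resolve=lambda name: "203.0.113.7"))
```

Called with just a hostname, it asks the system resolver, which should honor the hosts file in a default setup.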

I created a page that shows the word “Blocked” and the hostname instead of the Apache 404 error page, but I don’t want it displayed as my 404 URL for everything. I tried this, and it was problematic. I forget why.

To get our page handling all these misdirected requests, we’ll use some basic URL rewriting. You may need to adjust the example to suit your development environment, as this will take over all requests to your local server that arrive under any hostname other than localhost. This is the .htaccess file I have in my development machine’s DocumentRoot:

RewriteEngine on

RewriteCond %{HTTP_HOST} !^localhost [NC]
RewriteRule (.+) index.php [L]
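
As a rough check of which requests the host condition above lets through to the rule, the test can be mimicked in a few lines of Python. This is an approximation: Apache uses its own regex engine, the pattern here is a plain `^localhost` prefix match, and `is_rewritten` is a name made up for this sketch.

```python
import re

# Stand-in for the condition's host test; IGNORECASE mirrors the [NC] flag.
LOCAL_HOST = re.compile(r"^localhost", re.IGNORECASE)


def is_rewritten(host_header):
    """Return True when a request for this Host would be sent to index.php."""
    # The leading "!" in the RewriteCond negates the match.
    return LOCAL_HOST.search(host_header) is None


for host in ("localhost", "LocalHost:8080", "ad.example.com"):
    print(host, is_rewritten(host))
```

Requests addressed to localhost, with or without a port, keep their normal behavior, while anything arriving under a blocked hostname is handed to the block page.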

The RewriteCond takes any request that is not for “localhost” and sends it to the RewriteRule, which directs all requests to index.php:

<?php
// get information from the requested url
$path = $_SERVER['REQUEST_URI'];
$host = $_SERVER['SERVER_NAME'];

// in case short_open_tag=On in php.ini
echo '<' . '?xml version="1.0" encoding="UTF-8"?' . '>';
?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Blocked</title>
<style type="text/css">
body {
font-family: sans-serif;
background-color: #e1eaf3;
color: #9fb1c3;
margin: 10px;
}
h1 {
font-size: 12px;
margin: 0;
}
p {
font-size: 10px;
}
</style>
</head>
<body>
<h1>Blocked</h1>
<p><?php echo htmlspecialchars($host); ?></p>
</body>
</html>

I styled my block page in a nice light blue, so it shows up unobtrusively in ad-blocked pages. At the same time, you can easily see which items have been blocked. Since it replaces content that would otherwise be served from elsewhere, it doesn’t break page layout.

The advantage to blocking ads this way is that since I have my hosts file in a list of shortcuts, I can open it up, add an entry for an ad server, and close it. Since it is then blocked at the operating system level, any browser run from that system will respect the block.

That’s it! Happy ad-free surfing!


5 Responses to “Host-Based Ad Blocking”

  1. [...] Host-Based Ad Blocking Jan 23 [...]

    Pixelbath » Blog Archive » Block Windows Live Messenger Ads
  2. That’s a lot of work… It works, but it really takes a long, long time to catch all ads on the web manually, one by one depending on the ad server. And when the ad is located on the same server as the web site, it is impossible to block without blocking the entire web site. I have tried a few methods to block ads, and I have settled on AdSweep; it detects ads based on their particularities and does a pretty good job of blocking ads on most web sites.

    Andrei Prokorov
  3. Andrei: True enough that it’s a lot of work, but there are maybe 10 ad servers that serve the majority of sites I visit. Obviously this method isn’t for everyone, and software like AdSweep is good if you don’t want to fiddle with your ad blocking, but this setup grew from the original host-based blocking I had been using for years. I just decided to make it serve content where there were formerly ads.

    Not to mention there are other clever (or stupid) uses for the method I’m describing.

    pixelbath
  4. Why not just use AdBlockPlus ?

    Rohit Arondekar
  5. Rohit: Can you do this with AdBlockPlus? The advantage in this setup is that the ads are not just blocked. I can apply any and all manner of stupid RewriteRule tricks on any specific ad, domain, or image request coming from my PC. If the issue is how easily I can get to my hosts file, it’s one of the “favorite” files in UltraEdit, so it’s literally three clicks away.

    Besides, asking that is like asking, “Why climb Everest? Why not just watch Everest: Beyond the Limit?” The answer to that, of course, is I prefer to do things my way, ’cause I’m a rebel who doesn’t play by anybody else’s rules, not even my own.

    pixelbath
