Posts Tagged ‘apache’

Host-Based Ad Blocking

To block ads on the web, we need to catch requests to a particular ad server and send them to our local server. We do this by adding an entry to the hosts file. The hosts file on a computer system gives the system information about where to find a computer on the network by mapping hostnames to IP addresses. For our purposes, we will be taking advantage of this by routing requests to ad servers from webpages to our personal system.

These steps will require that you have Apache HTTP server (or IIS with ISAPI_rewrite, something that can rewrite URLs) running on your local machine (or on a system with low latency, possibly on your network).

First, we will need to edit the hosts file and add an entry in the form “127.0.0.1 ad.example.com”. If you are running Windows, examples should already be present in the hosts file. On Windows systems, this file can be found at %SystemRoot%\system32\drivers\etc\hosts, on Linux systems at /etc/hosts, and on OS X 10.2 and later at /private/etc/hosts.
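
As a rough sketch, a hosts file with a couple of ad servers blocked might look like the following. The host names here are placeholders rather than real ad servers, so substitute the ones you actually find.

```text
# Existing loopback entry
127.0.0.1    localhost

# Blocked ad servers (placeholder names)
127.0.0.1    ad.example.com
127.0.0.1    ads.example.net
```

The hosts file has no wildcard syntax, so every host name needs its own line. If your block server lives on another machine on your network, use that machine's IP address in place of 127.0.0.1.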

So how do we get the hostname of an ad server? We can do this several ways, but you have to work for it just a little bit (I know, AdBlock makes it so easy). Since I am a web developer, I usually have the Firebug extension running, so I just click “Inspect” and the element is highlighted in the source code, usually with the URL of the server right there for me.

If you don’t have or don’t need Firebug, you can go the “view source” route. I have found the easiest way using this method is to search for some adjacent text you see in the rendered page, and look around for included JavaScript files or hotlinked images. Most of the time the ad is easy to see. For included JavaScript files that build the ad image, just block the whole server, and they won’t be able to render the ad, making the image server irrelevant. Often, ad servers will use several host names, requiring multiple entries in the hosts file.
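
If digging through the source by hand gets tedious, the same search can be roughly automated. The following is only a sketch and is not part of my setup: the function name external_hosts, the sample markup, and the example.com host names are all made up for illustration, and it assumes PHP's DOM extension is enabled. It collects the src attribute of every script, image, and iframe and keeps the hosts that differ from the page's own.

```php
<?php
// Hypothetical helper: list the outside hosts that a page loads
// scripts, images, and frames from.
function external_hosts(string $html, string $pageHost): array
{
    if ($html === '') {
        return [];   // DOMDocument::loadHTML() rejects an empty string
    }

    // Real-world markup is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hosts = [];
    foreach (['script', 'img', 'iframe'] as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            // parse_url() returns null for relative URLs, which have no host.
            $host = parse_url($node->getAttribute('src'), PHP_URL_HOST);
            if ($host && strcasecmp($host, $pageHost) !== 0) {
                $hosts[strtolower($host)] = true;   // array keys de-duplicate
            }
        }
    }
    return array_keys($hosts);
}

// Placeholder markup standing in for a saved copy of the page source.
$html = '<img src="http://ad.example.com/banner.gif"><script src="/js/site.js"></script>';
foreach (external_hosts($html, 'www.example.com') as $host) {
    echo "127.0.0.1 $host\n";   // prints: 127.0.0.1 ad.example.com
}
```

Not every outside host is an ad server, so treat the output as a list of candidates to look over, not something to paste into the hosts file blindly.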

If you’re using Firefox or Internet Explorer, you can usually right-click an image and select Properties to get information about an image’s location. Chrome doesn’t give up that information as easily, and selecting “Inspect Element” gives me mixed results: the WebKit inspector is surprisingly shoddy and works about three quarters of the time for me.

Once we’ve created the entry in the hosts file, we can test by closing all browser windows to force a reload of the file, then opening the ad server URL. If it goes to our local server instead of the ad server, we’ve successfully blocked the ad server across the entire operating system. Since we’re handling a bunch of HTTP requests anyway, why not do something with them?

I created a page that shows the word “Blocked” and the hostname instead of the Apache 404 error page, but I don’t want it displayed as my 404 URL for everything. I tried this, and it was problematic. I forget why.

To get our page handling all these misdirected requests, we’ll use some basic URL rewriting. You may need to adjust the example to suit your development environment, as this will take over all requests to your local server. This is the .htaccess file I have on my development machine’s DocumentRoot:

RewriteEngine on

RewriteCond %{HTTP_HOST} !^localhost [NC]
RewriteRule (.+) index.php [L]

The RewriteCond matches any request whose host name does not begin with “localhost”, and the RewriteRule then sends every such request to index.php:

<?php
// get information from the requested url
$path = $_SERVER['REQUEST_URI'];
$host = $_SERVER['SERVER_NAME'];

// in case short_open_tag=On in php.ini
echo '<' . '?xml version="1.0" encoding="UTF-8"?' . '>';
?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Blocked</title>
<style type="text/css">
body {
font-family: sans-serif;
background-color: #e1eaf3;
color: #9fb1c3;
margin: 10px;
}
h1 {
font-size: 12px;
margin: 0;
}
p {
font-size: 10px;
}
</style>
</head>
<body>
<h1>Blocked</h1>
<p><?php echo htmlspecialchars($host); ?></p>
</body>
</html>

I styled my block page in a nice light blue, so it shows up unobtrusively in ad-blocked pages. At the same time, you can easily see which items have been blocked. Since it replaces content that would otherwise be served from elsewhere, it doesn’t break page layout.

The advantage to blocking ads this way is that since I have my hosts file in a list of shortcuts, I can open it up, add an entry for an ad server, and close it. Since it is then blocked at the operating system level, any browser run from that system will respect the block.

That’s it! Happy ad-free surfing!

.htaccess Snippets

Here are some .htaccess snippets I’ve had to use, and if you run your own site, blog, or some other third thing, you might find them useful.

Moved from one URL to another: My old blog URL used to be verbose.pixelbath.com, and before that was pixelbath.com/verbose. Setting aside the notion that this blog moves around too much, the following snippet…

RewriteEngine on
RewriteCond %{HTTP_HOST} ^verbose [NC]
RewriteRule ^(.*)$ http://www.pixelbath.com/blog/ [R=301,L]

…redirects any request whose host name begins with ‘verbose’ to the main blog URL. Useful because many sites had me linked to the old blog, and I didn’t want to break their links too badly. Nothing fancy, so please note that this does not carry over the requested path: every request on the old host lands on the blog’s front page (mod_rewrite still appends any query string by default).
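
If you would rather keep the requested path, a variant along these lines should do it. This is a sketch rather than what I actually run, and it assumes the old paths map one-to-one onto paths under /blog/, which may not be true for your site.

```apache
RewriteEngine on
RewriteCond %{HTTP_HOST} ^verbose [NC]
# $1 is whatever path (.*) captured on the old host
RewriteRule ^(.*)$ http://www.pixelbath.com/blog/$1 [R=301,L]
```

With this in place, a request for http://verbose.pixelbath.com/archive/foo would be redirected to http://www.pixelbath.com/blog/archive/foo instead of the front page.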

Using a single file to handle all URL requests: If you’ve used almost any PHP CMS or MVC framework such as CodeIgniter or CakePHP, you’ve probably used something like this for “search friendly URLs”:

RewriteEngine On
RewriteBase /blog/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.+) /blog/index.php [L]

What this does is start from the /blog/ folder, and handle any requests under that. The first RewriteCond sets our rule to not apply to any physical files matching the request, and the second does the same for physical directories.

Once it passes the two conditions (a request in /blog/ that is not a physical file or directory), it goes to the RewriteRule, which simply takes all matching requests and rewrites them internally to /blog/index.php. This is not a browser redirect, so the user will still see the URL the way they found it, something like http://example.com/blog/archive/foo. I use this technique on the comics pages by parsing the URL segments into comic and page requests.
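
To give an idea of what the receiving index.php might do with the original URL, here is a minimal sketch of splitting it into segments. The function name url_segments and the /blog/comics/NAME/PAGE layout are assumptions made for illustration, not the code behind my comics pages.

```php
<?php
// Hypothetical sketch: split a request path under /blog/ into segments.
function url_segments(string $requestUri, string $base = '/blog/'): array
{
    // Drop the query string, keeping only the path component.
    $path = (string) parse_url($requestUri, PHP_URL_PATH);

    // Strip the base folder from the front of the path, if present.
    if (strncmp($path, $base, strlen($base)) === 0) {
        $path = substr($path, strlen($base));
    }

    // Split on slashes, decode each piece, and discard empty pieces.
    $segments = array_map('rawurldecode', explode('/', $path));
    return array_values(array_filter($segments, 'strlen'));
}

// Example: treat the segments as a section, a comic name, and a page number.
[$section, $comic, $page] = array_pad(url_segments('/blog/comics/foo/3?ref=rss'), 3, null);
echo "$section / $comic / page " . (int) $page . "\n";   // comics / foo / page 3
```

In a real front controller you would pass $_SERVER['REQUEST_URI'] instead of a literal string, and validate the segments before using them to look anything up.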

Stop image and/or content hotlinking: Some netizens are either not savvy with the way the Internet works, or don’t give a crap because idiocy prefers the low-hanging fruit. Either way, I’ve actually got a few snippets for this purpose.

RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?myspace\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?blogspot\.com [NC]
RewriteRule ^.*$ http://www.yourdomain.com/ [R,L]

The preceding snippet will block specific websites and their subdomains from hotlinking from your site, but will allow any other site not specified in your .htaccess file to do so. If you’d prefer to stick another image in place of the hotlinked one (often with amusing results), you can block everyone except your own domain:

RewriteEngine On
RewriteCond %{HTTP_REFERER} !^http://(.+\.)?yourdomain\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^$
RewriteRule .*\.(jpe?g|gif|bmp|png)$ /images/goatse.jpg [L]

This one will take any image request whose referer does not come from your domain, leaving alone requests with a blank referer (because some users do legitimately blank their referer string), and rewrite it to an image elsewhere on your site. This works “inline”: outside sites display whichever image you specify in place of the one they hotlinked.

If you’d prefer to be plain and simple though, you can just set HTTP code 403 (Forbidden) on any image for any of the rewrites in this section. Simply replace the RewriteRule of each with:

RewriteRule .*\.(jpe?g|gif|bmp|png)$ - [F]

This simply answers any matching image request with 403 (Forbidden). Obviously, it should be used in conjunction with RewriteConds.
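
Put together with the conditions from the image-swapping snippet, a complete version might look like the sketch below. As before, yourdomain.com is a placeholder for your own host name.

```apache
RewriteEngine On
# Skip requests that come from your own pages (yourdomain.com is a placeholder)
RewriteCond %{HTTP_REFERER} !^http://(.+\.)?yourdomain\.com/ [NC]
# Skip requests that send no referer at all
RewriteCond %{HTTP_REFERER} !^$
# Everything else asking for an image gets 403 Forbidden
RewriteRule .*\.(jpe?g|gif|bmp|png)$ - [F]
```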