
Not long after I began this project I started placing music files on my sites. I quickly noticed that an overwhelming number of hits were coming from a site named dizzler.com and, more recently, deezer.com. I expect others will follow as the dishonesty of those entrepreneurs continues. Both of these sites, and others, steal bandwidth by scouring the internet for direct links to your mp3 files, using either a search bot or a standard search engine like Google or Yahoo. With simple search values like "?intitle:index.of? mp3" or "?intitle:index.of? flv ogg mp3 wma" one can eliminate the need for a search bot entirely. Having discovered a direct link to your music files, they then "sell" the link to the internet at large, using advertising as the vehicle. The free music provider incurs little cost, since it can run its service from a very simple site that sells your music and bandwidth via advertising.
I've read numerous articles on this topic, and what works for me may not work for you. But this is what I do. I use an Apache2 .htaccess file rather than attempting to fiddle with virtual host configurations. The biggest reason is that Apache directives that work for one music directory will usually work for all music directories without any modification, which lets me keep a single master htaccess file (under any name) and symlink it into all my music directories.
Site Root .htaccess:
In my site root I globally protect my xspf.settings.php with code like that below. While the <Files xspf.settings.php> block is not a necessity, sometimes I fiddle with my server, and I don't want to rely strictly on PHP to prevent a user from downloading my settings file if my PHP configuration gets messed up and Apache mistakenly serves it as a text file. To add more files, duplicate the entire <Files ...></Files> block for each individual file or use a <FilesMatch regex> ... </FilesMatch> block instead. You should include Options +FollowSymLinks as the first line in your .htaccess to be sure the next part works correctly, since mod_rewrite rules in a .htaccess file depend on it.
Options +FollowSymLinks
<Files "xspf.settings.php">
Deny from all
</Files>
One could also use a FilesMatch like the one below to block access to any files with names ending in config.php or settings.php, or any .inc file: three file types and names commonly used in PHP configurations. Note that this will probably be useless in your music directory, since typically you wouldn't keep music in the same directory as your xspf/PHP programs. Placing this code in your domain root will prevent any access to files with these name endings SITE WIDE, so be aware and test your site after adding this to your domain root. This kind of block can be very helpful if you are adventurous enough to compile PHP or attempt to fiddle with fastcgi or fcgid. It prevents Apache from serving your config files even if PHP is improperly configured.
<FilesMatch "config\.php$|settings\.php$|\.inc$">
Deny from All
</FilesMatch>
Then I list known user agents that scan for files or probe for security liabilities in existing software. The lines below basically state that any user agent that begins with the text following the "^", compared without regard to case, gets dropped summarily by Apache. (A few entries, such as HTTrack and Indy Library, omit the "^" and so match anywhere in the user agent string.) Notice I do not include Wget in the list. I generally accept the use of Wget for grabbing web pages and code snippets, so I don't think it useful to attempt to block it sitewide. Any savvy Linux user can read the manual page and discover the "-U user-agent" command argument anyway. What we're after here are the known bots that clearly identify themselves. If you see a new bot it's easy to add it here. If things get more complicated it may be necessary to reverse the logic and use a "Deny all user agents not in the list" philosophy. But for me it hasn't gotten to that point and I doubt it will. You can copy the text below verbatim to your .htaccess.
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^libwww-perl/5\.803 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Python-urllib/2\.5 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Morfeus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
NOTE: Some legitimate sites use Perl, Python, PHP or other languages to download pages from your site without identifying themselves with a specific user agent string. The validator at xspf.org is one of these services. When you request validation of xspf.php from this site it requests the playlist using "Python-urllib/2.5" as its user agent. If you get odd errors with sites like validator.w3.org, validator.xspf.org etc., you may need to comment out the corresponding lines above (such as the Python-urllib/2.5 entry), and possibly others.
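If the list gets unwieldy, the same kind of test can probably be written more compactly by grouping names into a regular expression alternation. The fragment below is only a sketch under that assumption: it uses a handful of names from the list above as placeholders, not the full list, and it has not been run against this site.

```apache
# Sketch only: a shortened stand-in for the long list above.
# Assumes mod_rewrite is enabled and .htaccess overrides are allowed.
Options +FollowSymLinks
RewriteEngine on

# Agents that START with one of these names (case-insensitive)
RewriteCond %{HTTP_USER_AGENT} ^(BlackWidow|ChinaClaw|Download\ Demon|WebZIP|Zeus) [NC,OR]
# Agents that CONTAIN one of these names anywhere in the string
RewriteCond %{HTTP_USER_AGENT} (HTTrack|Indy\ Library) [NC]

# Refuse the request with 403 Forbidden
RewriteRule ^ - [F]
```

One line per agent, as in the original list, is easier to comment out entry by entry, so the grouped form is a trade of readability for length.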
Next I add a list of known music services and undesirable search bots. The SetEnvIfNoCase directive comes from mod_setenvif, so it works on Apache2 but not on Apache releases older than 1.3. It sets an environment variable if the referer (the address of the page that linked to the requested page or file) matches the text or IP that follows, or if the request URI contains certain words like "passwd" or "w00tw00t.DFind". I name my environment variable nomedia. I could just use:
SetEnvIfNoCase Referer dizzler.com nomedia
but the =yes makes it more clear. You can also copy the text below verbatim to your .htaccess file. Be aware that .htaccess is evaluated top to bottom, so the order may be important. This weeds out the obvious bots, attempts to hack passwd, playlist thieves and, for me, the undesirable w00tw00t DFind bot.
SetEnvIfNoCase Referer dizzler.com nomedia=yes
SetEnvIfNoCase Referer pandora.com nomedia=yes
SetEnvIfNoCase Referer grooveshark.com nomedia=yes
SetEnvIfNoCase Referer deezer.com nomedia=yes
SetEnvIfNoCase Referer playlist.com nomedia=yes
SetEnvIfNoCase Referer 66.232.150.219 nomedia=yes
SetEnvIfNoCase Referer 65.49.37.165 nomedia=yes
SetEnvIfNoCase Referer infobox.ru nomedia=yes
SetEnvIfNoCase Request_URI "passwd$" nomedia=yes
SetEnvIfNoCase Request_URI "passwd%00$" nomedia=yes
SetEnvIfNoCase Request_URI "^/w00tw00t" nomedia=yes
SetEnvIfNoCase Request_URI "DFind\:\)$" nomedia=yes
Order allow,deny
allow from all
deny from env=nomedia
That's it for my Site Root as far as basic protection goes.
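The Order, Allow and Deny directives used above are the Apache 2.2 style. On Apache 2.4 and later they are only available through mod_access_compat, so readers on a newer server may prefer the Require syntax. The following is a sketch of how the same two rules might look there; it assumes mod_authz_core and mod_setenvif are loaded and that the server permits these directives in .htaccess (AllowOverride AuthConfig and FileInfo). Only one referer line is repeated, as a placeholder for the full list.

```apache
# Sketch for Apache 2.4+ only; not part of the original setup.

# Equivalent of "Deny from all" inside the <Files> block
<Files "xspf.settings.php">
    Require all denied
</Files>

# Flag unwanted referers exactly as before (one example line shown)
SetEnvIfNoCase Referer dizzler\.com nomedia=yes

# Equivalent of: Order allow,deny / allow from all / deny from env=nomedia
<RequireAll>
    Require all granted
    Require not env nomedia
</RequireAll>
```

Mixing the old and new styles in the same file is best avoided, since the two sets of directives are evaluated by different modules and the combined result can be hard to predict.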
Music .htaccess
Because I have many music directories I want to protect, and they all need the same protection, I create a master .htaccess file named "music.htaccess". I create this file outside my site root. On a hosted site this would be in your FTP root. On a standard Debian or Ubuntu server install I maintain a directory tree where I keep photos, some music, some of my porn collection and other personal things that I want to easily keep inaccessible from the internet. I call it "wdata"; you are free to name it whatever you want. This part assumes you're using /var/wdata as the location for music.htaccess.
Contents of music.htaccess
The first order of business in music.htaccess is to deny unadulterated Wget access. Note the missing [OR] at the end: this is the only condition, so there is nothing to chain it to. This statement says to deny any request whose user agent begins with "wget" in any letter case (Wget, wGet, wgEt etc.).
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^Wget [NC]
RewriteRule ^.* - [F,L]
Next I add a more aggressive search for the music thieves. The "$" on the end checks whether the referer string ends in the nasty thieves' domain name, regardless of the hostname that precedes it. So, in the example below, a request with a referer of *.deezer.com will set the "theft" environment variable and result in denial of access. Keep in mind that the pattern is tested against the whole Referer header, so a referer that continues past the domain name (for example with a trailing slash or a page path) will not match the "$" form. This is by no means a complete list, so you'll probably add others as you scan your log files for nasties.
You might be wondering why I include these nasties twice, once in my root .htaccess and again here. The root .htaccess prevents them from using a "no hostname" URL to browse my site. A user on www.dizzler.com can successfully load my pages but NOT my music. The statements below prevent any music file access from the following domains. This is somewhat harsh, but I strongly disagree with their "Capitalistic" take-whatever-you-can-get-away-with attitude, and since it's my choice to deny them access to my music, I do. If I don't, they will add my tracks to their playlists and mislead users into believing they own the music they present in their lists. Dizzler and Deezer are a LIE, plain and simple; apparently that is what capitalistic business in America is all about these days. Steal other people's stuff and sell it any way you can. Makes me disgusted to live in the USA.
SetEnvIfNoCase Referer dizzler.com$ theft
SetEnvIfNoCase Referer grooveshark.com$ theft
SetEnvIfNoCase Referer ez-tracks.com$ theft
SetEnvIfNoCase Referer widgetbox.com$ theft
SetEnvIfNoCase Referer pandora.com$ theft
SetEnvIfNoCase Referer deezer.com$ theft
SetEnvIfNoCase Referer playlist.com$ theft
SetEnvIfNoCase Referer infobox.ru$ theft
The last section is very important. "Order allow,deny" tells Apache to evaluate the allow rules first and the deny rules second, with any deny match winning. "allow from all" admits every user and IP address that requests a file, and "deny from env=theft" then refuses any request that triggered one of the referer tests above (that is, any request for which the environment variable "theft" is set).
Order allow,deny
allow from all
deny from env=theft
Testing your .htaccess
Now to test access we'll use Wget. You can download Wget for Windows here: http://gnuwin32.sourceforge.net/packages/wget.htm Wget comes with nearly every Linux install, so there is no need to download it if you are running a Linux desktop. Using Wget we can present a forbidden user agent using the "wget -U user-agent" syntax:
wget -U libwww-perl/5.803 http://www.trbailey.net
--2009-04-10 18:33:18-- http://www.trbailey.net/
Resolving www.trbailey.net... 216.99.209.41
Connecting to www.trbailey.net|216.99.209.41|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2009-04-10 18:33:18 ERROR 403: Forbidden.
We can also present a forbidden referrer:
wget http://www.trbailey.net/ --referer dizzler.com
--2009-04-10 18:34:33-- http://www.trbailey.net/
Resolving www.trbailey.net... 216.99.209.41
Connecting to www.trbailey.net|216.99.209.41|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2009-04-10 18:34:33 ERROR 403: Forbidden.
And requesting a file from a music directory with Wget's default user agent yields:
wget http://www.trbailey.net/xspf/70s/000-Rolling%20Stones%20-%20Gimme%20Shelter.mp3
--2009-04-15 16:42:47-- http://www.trbailey.net/xspf/70s/000-Rolling%20Stones%20-%20Gimme%20Shelter.mp3
Resolving www.trbailey.net... 216.99.209.41
Connecting to www.trbailey.net|216.99.209.41|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
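2009-04-15 16:42:47 ERROR 403: Forbidden.
Finally, presenting a forbidden referer. I'm using www.grooveshark.com but it could be any of the forbidden domains listed. Note that Wget's default user agent is itself refused in the music directories, so to isolate the referer rule you would also pass a permitted user agent with -U: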
wget http://www.trbailey.net/xspf/70s/000-Rolling%20Stones%20-%20Gimme%20Shelter.mp3 --referer www.grooveshark.com
--2009-04-15 16:50:09-- http://www.trbailey.net/xspf/70s/000-Rolling%20Stones%20-%20Gimme%20Shelter.mp3
Resolving www.trbailey.net... 216.99.209.41
Connecting to www.trbailey.net|216.99.209.41|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2009-04-15 16:50:09 ERROR 403: Forbidden.
If you see an error 500 you've copied or added something incorrectly; double-check your .htaccess file.
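One case the tests above do not exercise: a real browser normally sends a full URL as the referer (scheme, host and page path), while the "$"-anchored patterns in music.htaccess only match when the header ends with the domain name. A pattern anchored on the host part instead might catch both forms. The lines below are a sketch of that idea for three of the listed domains; the regular expression is an illustration, not something taken from the original file.

```apache
# Sketch only: host-anchored referer tests for music.htaccess.
# Matches "www.deezer.com", "http://deezer.com/" and
# "https://www.deezer.com/some/page", but not "http://notdeezer.com/"
# or a URL that merely mentions deezer.com in its path or query string.
SetEnvIfNoCase Referer "^(https?://)?([^/]+\.)?dizzler\.com(:[0-9]+)?(/|$)" theft
SetEnvIfNoCase Referer "^(https?://)?([^/]+\.)?deezer\.com(:[0-9]+)?(/|$)" theft
SetEnvIfNoCase Referer "^(https?://)?([^/]+\.)?grooveshark\.com(:[0-9]+)?(/|$)" theft
```

A quick check would be to repeat the last Wget test with a full URL such as "--referer http://www.grooveshark.com/" plus a permitted user agent via -U, and confirm that it is still refused.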
Enjoy the music!