Leverage your .htaccess file efficiently

- Wednesday, March 16, 2011

The .htaccess file is a configuration file used on Apache-based web servers to control many features of the server. It gives webmasters a great deal of control over their websites: the file contains configuration statements (directives) that customize the behaviour of the Apache web server for the site.

Hypertext Access, commonly shortened to .htaccess, is an incredibly useful feature that allows webmasters to control many aspects of how their website works. The file acts as a local configuration file, so an individual website (or directory) can customize the server configuration to match its requirements. An .htaccess file in a subdirectory may change or override the effects of those in parent directories.

Below is a simple, practical guide to the .htaccess file, with examples.

You can use any text editor to create or modify an .htaccess file. Be sure the file is uploaded in ASCII mode, not BINARY, or it won't work. Be very careful while editing: even a small mistake can make the website stop working, so test the site immediately after every change.

Below are the most useful .htaccess examples:

Redirect from Non-WWW to WWW version

Allowing search engines to spider and index both the www and non-www versions of your pages risks URL confusion and splits PageRank and link value between the two. The fix is to 301-redirect all non-www requests to their www equivalents.

Options +FollowSymLinks
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
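
If you prefer the bare domain as the canonical version, the same technique works in reverse (a sketch, again assuming example.com):

# Redirect www to non-www
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]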

301 Redirect File and Folder

You can use the .htaccess file to redirect any request for a specific URL to a new URL, or to redirect an entire folder to a new location.

# 301 Redirect Old File
Redirect 301 <local-path>/OLD-URL.html <local-path>/NEW-URL.html

# 301 Redirect Entire Directory
RedirectMatch 301 /old-folder/(.*) /new-folder/$1
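
For example, with concrete paths filled in (the file and folder names here are hypothetical), the directives might look like this:

# Redirect a renamed page to its new address
Redirect 301 /about-us.html http://www.example.com/about.html

# Move everything under /blog/ to /news/
RedirectMatch 301 /blog/(.*) /news/$1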

Custom Error Pages

A custom error page allows you to show your own page instead of the web hosting provider's default error page, or no page at all. This makes a website look far more professional in the unlikely event of an error. You can also have the error page run a script that notifies you when something goes wrong; for example, I use a PHP script to automatically e-mail me when a page is not found.

These are some of the most common HTTP response status codes:

  • 400 - Bad request
  • 401 - Authorization Required
  • 403 - Forbidden
  • 404 - Not Found
  • 500 - Internal Server Error

# Custom 400 errors
ErrorDocument 400 <local-path>/error-400.html

# Custom 401 errors
ErrorDocument 401 <local-path>/error-401.html

# Custom 403 errors
ErrorDocument 403 <local-path>/error-403.html

# Custom 404 errors
ErrorDocument 404 <local-path>/error-404.html

# Custom 500 errors
ErrorDocument 500 <local-path>/error-500.html
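
To get the e-mail notification mentioned above, point the ErrorDocument at a script rather than a static page. This is a sketch; error-404.php is a hypothetical script you would write yourself:

# Route 404s through a script that can log the error or e-mail you
ErrorDocument 404 /error-404.php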

Preventing directory listing

Stop a directory index from being shown. Many servers are set up to prevent directory listings, but not all of them are. If you have a directory full of downloads or images that you don't want visitors to browse through, add the following lines to the .htaccess file.

# Prevent directory listings
Options All -Indexes

# Prevent viewing of the .htaccess file itself
<Files .htaccess>
order allow,deny
deny from all
</Files>
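
The order/deny directives above are the Apache 2.2 syntax that was current when this was written. On Apache 2.4 and later, the equivalent (a sketch, assuming the standard mod_authz_core module) is:

# Apache 2.4+ equivalent: deny all access to .htaccess
<Files .htaccess>
Require all denied
</Files>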

Alternative Index Files

You may not always want to use index.html or index.php as the index file for a directory; for example, if your site is built with PHP files, you may want index.php to be the index file. You are not limited to 'index' files, though.

Alternative index files are entered as a list. The server works from left to right, checking whether each file exists; if none of them exist, it displays a directory listing, unless that has been turned off.

# Change default directory page
DirectoryIndex home.html default.html main.htm

Caching Schema

Making static content cacheable avoids repeated requests for the same files and improves website performance. Specify an expiration of at least one week for static resources. For reference, the max-age values below are in seconds:

  • 1 Minute = 60
  • 1 Hour = 3600
  • 1 Day = 86400
  • 1 Week = 604800
  • 1 Month ≈ 2628000
  • 1 Year = 31536000

# One month for css, js, txt and xml files
<FilesMatch "\.(css|js|txt|xml)$">
Header set Cache-Control "max-age=2628000, public"
</FilesMatch>

# One year for image and font files
<FilesMatch "\.(jpg|jpeg|png|gif|ico|woff|woff2|webp|svg)$">
Header set Cache-Control "max-age=31536000, public"
</FilesMatch>
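
An alternative is mod_expires, which computes the Cache-Control max-age for you from a human-readable interval. This is a sketch assuming the module is enabled on your server:

# mod_expires equivalent of the rules above
<IfModule mod_expires.c>
ExpiresActive On
ExpiresByType text/css "access plus 1 month"
ExpiresByType application/javascript "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 year"
ExpiresByType image/png "access plus 1 year"
</IfModule>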

Block IP Address

You can block bad visitors by IP address or blocks of IP addresses using a .htaccess file.

# Block users by IP: allow everyone, then deny the bad addresses
order allow,deny
allow from all
deny from 192.168.1.1
deny from 192.168.2.1
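
On Apache 2.4 and later, the same block is written with the Require directive instead (a sketch, assuming mod_authz_core):

# Apache 2.4+ equivalent: allow everyone except the listed IPs
<RequireAll>
Require all granted
Require not ip 192.168.1.1
Require not ip 192.168.2.1
</RequireAll>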

Block Bad Bots

Many of the requests made to your website and its content come from bots and other forms of automation. This rise in automated, often malicious traffic drives up costs and eats into bandwidth limits. Bots are typically tailored to target very specific elements of a website, and the damage can go beyond stolen content, spammed forms, or abused account logins.

# Block bad bots
RewriteEngine On 
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
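
A list this long is hard to maintain as one condition per bot. The same effect can be achieved with a single case-insensitive alternation; this sketch covers only a handful of the agents above, so extend the list as needed:

# Compact version: one condition matching several bad bots
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (BlackWidow|HTTrack|WebZIP|WebStripper|Wget) [NC]
RewriteRule ^.* - [F,L]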

Regular Expressions

  • Patterns ("wildcards") are matched against a string
  • Normal alphanumeric characters are treated as normal
  • Special characters:
    • . (full stop) - match any character
    • * (asterisk) - match zero or more of the previous symbol
    • + (plus) - match one or more of the previous symbol
    • ? (question) - match zero or one of the previous symbol
    • \ (backslash) - escape a special character so it is matched literally (e.g. \. matches a literal full stop)
    • ^ (caret) - match the start of a string
    • $ (dollar) - match the end of a string
    • [set] - match any one of the symbols inside the square brackets
    • (pattern) - grouping, remember what the pattern matched as a special variable
  • Examples:
    • a+ matches "a", "aaaa", "aaaaaaaaaaaa", but not "bbb"
    • [ab]+ matches, "a", "b", or any length combination of the two
    • \.s?html? matches ".htm", ".shtm", ".html" or ".shtml"
    • (.+)/1999/(.+) matches "subject/1999/psy1011/", and also stores "subject" in $1 and "psy1011/" in $2.
  • Regular expressions are a very extensive topic. For more detail:
    • Documentation: man regex on any Unix-like system
    • Friedl (1997). Mastering Regular Expressions. O'Reilly.
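
To tie this back to .htaccess, here is a sketch of a RewriteRule that uses grouping and back-references; the URL structure and the archive.php script are hypothetical:

# Rewrite /subject/1999/page URLs to a year-based archive script
RewriteEngine On
RewriteRule ^(.+)/1999/(.+)$ /archive.php?subject=$1&page=$2 [L]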

The .htaccess file is one of the most useful tools available to a webmaster. It has a wide variety of uses that can save time and increase the security of your website.

Note for Windows users
Notice that the file name starts with a period (.). This indicates to the Linux/Unix operating system that it is a system file used by a server application, not by a user of the system. In Windows, however, the period denotes a separator between a file's name and its extension, so if you try to create an .htaccess file in Windows it won't have a name. To get around this, create a file called htaccess.txt, FTP it to the server, and then rename it to .htaccess once it's there.
