User agents and referrers - who are you anyway?

We rely on knowing who is coming to our site and how they got there. Our conversion goals are set by this information, and sometimes our authentication systems rely on it. But what are these concepts, and how can we use (and abuse) them?

User Agents

A user agent is the client application used with a particular network protocol; the phrase is most commonly used in reference to those which access the World Wide Web. Web user agents range from web browsers to search engine crawlers (“spiders”), as well as mobile phones, screen readers and braille browsers used by people with disabilities. When Internet users visit a web site, a text string is generally sent to identify the user agent to the server. This forms part of the HTTP request, prefixed with User-agent: or User-Agent: and typically includes information such as the application name, version, host operating system, and language. Bots, such as web crawlers, often also include a URL and/or e-mail address so that the webmaster can contact the operator of the bot.


The user agent is an identifying string sent with an HTTP request to tell the server what the client application is, so every time we use a browser, it sends out this string to identify itself. Browsers traditionally send a user agent string that looks like:
Mozilla/# (compatible; )
This is a historical quirk: the Mozilla here refers not to the modern browser but to its distant ancestor, and this style of user agent string is used by many browsers, including Internet Explorer!

A few common strings:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
Internet Explorer 6 on Windows XP

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
Firefox 2.0 on Windows XP

Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ (KHTML, like Gecko) Version/3.0 Mobile/1A543a Safari/419.3
Safari on iPhone
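
To show how a server-side script might make sense of strings like these, here is a minimal sketch in PHP. The function name guess_browser() is made up for this example (it is not a PHP built-in), and the matching is deliberately crude: it only knows the three browsers listed above.

```php
<?php
// Hypothetical helper for illustration; guess_browser() is not part of PHP.
function guess_browser(string $userAgent): string
{
    // Nearly every string starts with "Mozilla/", so that prefix tells us
    // nothing. We look for the more specific tokens instead.
    if (stripos($userAgent, 'MSIE') !== false) {
        return 'Internet Explorer';
    }
    if (stripos($userAgent, 'Firefox') !== false) {
        return 'Firefox';
    }
    if (stripos($userAgent, 'iPhone') !== false) {
        return 'Safari on iPhone';
    }
    return 'Unknown';
}

// During a web request PHP exposes the header as $_SERVER['HTTP_USER_AGENT'].
// The header may be missing entirely, so fall back to an empty string.
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
echo guess_browser($userAgent), "\n";
```

Because the client chooses what to send, the result is only ever a guess about what the visitor claims to be.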

Changing your agent
To change your user agent, have a look at the following resource; for Firefox users, the agent switching plugin is a great tool.

Bots and User agents
It is not just browsers that have user agents: anything that connects using HTTP will typically send out a user agent, and that includes search engine spiders. Most people have come across Googlebot, Google’s famous search engine crawler. Googlebot has had several incarnations and several user agents, but it is currently travelling the web with an agent string like this:

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
It also goes under:
Googlebot/2.1 (+http://www.googlebot.com/bot.html)
So what’s the difference? In terms of how it crawls your site, probably little, but the top one declares itself a browser. If you were hiding content from browsers, or showing them different content, your site may well have served the browser version to Googlebot. Many people, myself included, have theorised that Googlebot has several alter egos and sometimes pretends to be a real browser to check for such duplicity.
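
The difference between the two Googlebot strings is easy to see in code. The sketch below uses two made-up helper functions (neither is a PHP built-in) and assumes a site that decides what counts as a browser by looking only at the traditional "Mozilla/" prefix.

```php
<?php
// Hypothetical helpers for illustration only.
function claims_to_be_googlebot(string $userAgent): bool
{
    return stripos($userAgent, 'Googlebot') !== false;
}

function looks_like_browser(string $userAgent): bool
{
    // A naive check that only inspects the traditional "Mozilla/" prefix.
    return strncmp($userAgent, 'Mozilla/', 8) === 0;
}

$agents = [
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'Googlebot/2.1 (+http://www.googlebot.com/bot.html)',
];

foreach ($agents as $userAgent) {
    printf(
        "%s\n  claims Googlebot: %s, looks like a browser: %s\n",
        $userAgent,
        claims_to_be_googlebot($userAgent) ? 'yes' : 'no',
        looks_like_browser($userAgent) ? 'yes' : 'no'
    );
}
```

The first string passes both checks, while the second passes only the Googlebot check. Both checks simply trust whatever the client sends, so they say what a visitor claims to be, not what it is.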

Bots should, if properly built (and if their designer wanted them to), follow the robots.txt directives aimed at them, but this should be considered “voluntary”.


Quick tip: have you ever wanted to get on to sites like Experts Exchange that let Google in but not you? Well, you could always change your user agent to Googlebot’s :). Alternatively, you could just use the cached page on the Google search page.


Setting the User Agent in cURL
If you’re using PHP and cURL, the line below will let you change your script’s user agent. For more information on using curl_setopt, check out the PHP manual.

curl_setopt($handle, CURLOPT_USERAGENT, "USER AGENT HERE");


For more information about cURL, check out the PHP tutorial section at webdigity.
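
For context, here is a minimal sketch of where that curl_setopt line might sit in a script. It assumes the PHP cURL extension is installed. The helper name make_handle(), the example.com URLs and the "MyScript/1.0" agent string are all placeholders invented for this example.

```php
<?php
// Hypothetical helper: builds a cURL handle that identifies itself
// with the given user agent string. Requires the PHP cURL extension.
function make_handle(string $url, string $userAgent)
{
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_USERAGENT, $userAgent);
    // Make curl_exec() return the response body instead of printing it.
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    return $handle;
}

// Usage (the URL and agent string are placeholders):
// $handle = make_handle('http://www.example.com/', 'MyScript/1.0 (+http://www.example.com/bot.html)');
// $body = curl_exec($handle);   // false on failure
// curl_close($handle);
```

Following the bot convention mentioned earlier, the placeholder agent string includes a URL where a webmaster could find out who runs the script.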

Referrer information

The referer, or HTTP referer, identifies, from the point of view of an internet webpage or resource, the address of the webpage (commonly the URL, the more generic URI or the i18n updated IRI) of the resource which links to it. By checking the referer, the new page can see where the request came from.

The referrer indicates where the user has come from. This is normally a URL or IP address, and it is useful for seeing where links to your site are, which ones are bringing in traffic, and so on. Unfortunately referrer information, like anything else, can be faked, and a common tactic of those with fewer scruples is referral spam: they fake traffic to a site, often by creating a fake referrer through a proxy site. When someone manipulates their referrer information, it is often called spoofing.
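
As a rough sketch of how a script might use the referrer to see where traffic comes from, the example below pulls the host name out of it. The helper referring_host() and the example.com URL are made up for illustration. Note that PHP exposes the header as HTTP_REFERER, following the one-"r" spelling used by HTTP itself.

```php
<?php
// Hypothetical helper: returns the host that referred the visitor,
// or null if the referrer is missing or cannot be parsed.
function referring_host(array $server): ?string
{
    $referrer = $server['HTTP_REFERER'] ?? '';
    if ($referrer === '') {
        return null;
    }
    // parse_url() gives a string for a valid host, otherwise null or false.
    $host = parse_url($referrer, PHP_URL_HOST);
    return is_string($host) ? $host : null;
}

// In a real request you would pass $_SERVER; here we pass a sample array.
$sample = ['HTTP_REFERER' => 'http://www.example.com/links.html'];
echo referring_host($sample) ?? '(none)', "\n";
```

Taking the server array as a parameter rather than reading $_SERVER directly keeps the function easy to try out with made-up values.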

Spoofing
Spoofing is when someone attempts to gain data from a URL by claiming the user has come from a different URL. Let’s imagine a badly designed website has a simple login area which leads to a members area: after a successful login, the user is redirected from the login page to the members page. By faking that the user is coming from the approved login page, an attacker can reach the members area without needing to authenticate.

Spoofing has a few legitimate uses, and it’s worth trying to spoof your way into your own secure areas to make sure they are not compromised. Firefox users might be interested in refspoof, a toolbar plugin.
Some basic checks include:

  • Target/Referrer: membersarea
  • Target: membersarea, Referrer: login

But most spoofs occur across domains, where people authenticate on site A but access content on site B.
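
To make the badly designed members area concrete, here is a deliberately weak sketch of such a gate. The function naive_referrer_gate() and the login URL are invented for this example, and the point is what not to do.

```php
<?php
// Hypothetical, deliberately weak gate: it trusts the Referer header alone.
// The login URL is a placeholder.
const LOGIN_URL = 'http://www.example.com/login';

function naive_referrer_gate(array $server): bool
{
    // PHP exposes the header as HTTP_REFERER (HTTP spells it with one "r").
    return ($server['HTTP_REFERER'] ?? '') === LOGIN_URL;
}

// A visitor who goes straight to the members page sends no referrer: blocked.
var_dump(naive_referrer_gate([]));                            // bool(false)

// A client that merely claims to come from the login page is let in,
// even though no login ever happened.
var_dump(naive_referrer_gate(['HTTP_REFERER' => LOGIN_URL])); // bool(true)
```

A sturdier design records the successful login on the server (for example in a session) and checks that, rather than trusting a header the client controls.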

Faking referrer information in cURL
If you are pulling data in using cURL and PHP, it might be important to change the referrer (perhaps to a page explaining what the script is doing). In such cases you can use the following:

curl_setopt($handle, CURLOPT_REFERER, "URL HERE");


Again, visit the webdigity tutorial section for more information and examples on cURL.
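
As with the user agent, here is a minimal sketch of the referrer option in context, assuming the PHP cURL extension is installed. The helper name make_handle_with_referrer() and the example.com URLs are placeholders made up for this example.

```php
<?php
// Hypothetical helper: builds a cURL handle that sends the given
// referrer with its request. Requires the PHP cURL extension.
function make_handle_with_referrer(string $url, string $referrer)
{
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_REFERER, $referrer);
    // Make curl_exec() return the response body instead of printing it.
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    return $handle;
}

// Usage (both URLs are placeholders; the referrer points at a page
// explaining what the script is doing):
// $handle = make_handle_with_referrer('http://www.example.com/data', 'http://www.example.com/about-this-script.html');
// $body = curl_exec($handle);   // false on failure
// curl_close($handle);
```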

