When a web browser makes a request, it uses headers to tell the server what it is looking for. One of these headers is the Accept header. The Accept header tells the server what file formats, or more correctly MIME-types, the browser is looking for. Let's take a look at Firefox's Accept header:
GET /page/routing-in-recess-screencast HTTP/1.1
Host: RecessFramework.org
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Let's translate Firefox's request to English:
I want the resource "/page/routing-in-recess-screencast" and I want it in an HTML or XHTML format. If you cannot serve me this way, I'll take "/page/routing-in-recess-screencast" in XML instead. If you can't even give it to me in XML, well, I'll take anything you've got!
The Accept header gives the browser a chance to tell the server which format it wants for a resource. Because the browser sends a list of options, this content negotiation happens in a single request. One of the key design goals of the HTTP spec is to minimize back-and-forth communication. The browser could ask for each of these formats one at a time, but that would be wasteful.
How does the browser specify the preference "Give me HTML/XHTML before XML before *"? Preference is indicated by the "relative quality parameter" (q) and its value (qvalue), seen in application/xml;q=0.9,*/*;q=0.8. Here's how the HTTP spec defines it:
Each media-range MAY be followed by one or more accept-params, beginning with the "q" parameter for indicating a relative quality factor. The first "q" parameter (if any) separates the media-range parameter(s) from the accept-params. Quality factors allow the user or user agent to indicate the relative degree of preference for that media-range, using the qvalue scale from 0 to 1 (section 3.9). The default value is q=1.
As brilliant as the spec is, it is a terrible read. What's going on is simple:
- Every item's default preference value is 1.
1: html, xhtml, xml, *
- If an item specifies q=X, its preference value is X.
1: html, xhtml
0.9: xml
0.8: *
- Order by preference value in descending order.
1: html, xhtml
0.9: xml
0.8: *
The only other major detail is that when there are ambiguities, the more specific item wins. For example, if both application/xml and */* had a preference of 0.9, application/xml would still come first. Firefox chooses to make it explicit that */* is less preferred by giving it a preference of 0.8. Firefox's Accept header is sensible and well thought out. Opera's is too. Other browsers: not so much.
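To make those rules concrete, here is a minimal Python sketch of the parse-and-sort step. It is an illustration under assumptions, not the code going into Recess: the names parse_accept and specificity are made up for this example, and every parameter other than q is ignored.

```python
def specificity(media_range):
    """Rank */* below type/* below type/subtype."""
    type_, _, subtype = media_range.partition("/")
    if type_ == "*":
        return 0
    if subtype == "*":
        return 1
    return 2


def parse_accept(header):
    """Return (media_range, q) pairs, most preferred first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0]
        if not media_range:
            continue
        q = 1.0  # default preference when no q parameter is present
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        entries.append((media_range, q, position))
    # Highest q first; on a tie the more specific range wins,
    # and after that the original order is kept.
    entries.sort(key=lambda e: (-e[1], -specificity(e[0]), e[2]))
    return [(media_range, q) for media_range, q, _ in entries]


FIREFOX = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
print(parse_accept(FIREFOX))
```

For Firefox's header this prints html and xhtml at 1.0, then xml at 0.9, then */* at 0.8. Feeding it "*/*;q=0.9,application/xml;q=0.9" puts application/xml first, which is the tie-breaking rule described above.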
What in the Header Were You Thinking, WebKit?
GET /page/restful-php-framework HTTP/1.1
Host: RecessFramework.org
Accept: application/xml,application/xhtml+xml,text/html;q=0.9,
    text/plain;q=0.8,image/png,*/*;q=0.5
Note: Accept split to two lines for width.
At a quick glance it doesn't look too different from Firefox's. Let's try it again in English just to be sure.
I want the resource "/page/restful-php-framework" and I want it in XML, XHTML, or PNG format. If you cannot serve me this way, I'll take "/page/restful-php-framework" in HTML or plain text instead. If you can't do that for me I'll take whatever!
Really, WebKit? The browsing engine most responsible for killing XHTML prefers XHTML over HTML! It would also prefer PNG over HTML. That's a little embarrassing, but what is worse: Safari and Chrome prefer XML over HTML (and, ambiguously, over XHTML, too). WebKit's Accept header forces web developers to work against the HTTP spec.
Suppose you are Twitter and want to be a good RESTful internet citizen following the HTTP spec. You've got a resource called a tweet that can be represented as XML or JSON or HTML. You wouldn't want Safari users to get an XML copy of a tweet by browsing around, so you have to actively ignore WebKit's Accept header preferring XML above all else. (Aside: It turns out Twitter's REST API ignores many REST/HTTP best practices like the Accept header, anyway, but that's another story for another post.)
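Here is a rough Python sketch of what a spec-following server would do with that tweet. Nothing here is Twitter's or Recess's actual code: the helper names (preferences, quality, negotiate) and the list of available representations are assumptions for illustration, and Accept parameters other than q are ignored.

```python
def preferences(header):
    """Map each media range in an Accept header to its q value (default 1)."""
    prefs = {}
    for part in header.split(","):
        media_range, *params = [field.strip() for field in part.split(";")]
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        prefs[media_range] = q
    return prefs


def quality(prefs, mime):
    """q value for a concrete type, using the most specific matching range."""
    type_ = mime.split("/")[0]
    for candidate in (mime, type_ + "/*", "*/*"):
        if candidate in prefs:
            return prefs[candidate]
    return 0.0


def negotiate(header, available):
    """Pick the representation the client rates highest.

    `available` is in the server's own order of preference, which
    only matters when the client rates two types equally.
    """
    if not available:
        return None
    prefs = preferences(header)
    best = max(available, key=lambda mime: quality(prefs, mime))
    return best if quality(prefs, best) > 0 else None


# Hypothetical representations of a tweet, HTML first.
TWEET_FORMATS = ["text/html", "application/xml", "application/json"]

FIREFOX = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
WEBKIT = ("application/xml,application/xhtml+xml,text/html;q=0.9,"
          "text/plain;q=0.8,image/png,*/*;q=0.5")

print(negotiate(FIREFOX, TWEET_FORMATS))  # text/html
print(negotiate(WEBKIT, TWEET_FORMATS))   # application/xml
```

Firefox gets HTML, while the WebKit header rates application/xml at 1.0 and text/html at 0.9, so a server that honors it hands a Safari user XML.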
Most WebKit-based browsers (and Safari in particular) would probably do a better job rendering HTML than XHTML or generic XML, if only because the code paths are much better tested. So the Accept header is somewhat in error. On the other hand, this isn't a hugely important bug, and we design our Accept header mainly to give the best compatibility on Web sites, since content negotiation is not really used much in the wild. Our current header was copied from an old version of Firefox.
Internet Explorer Accepts Polluting the Internet
GET /book/html/index.html HTTP/1.1
Host: RecessFramework.org
Accept: image/jpeg, application/x-ms-application, image/gif, application/xaml+xml, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, application/msword, */*
This is the Accept header for IE8 on a Windows 7 machine. One peculiarity is the "application/msword" MIME-type. Office isn't installed, but the Word Document Viewer is. This made me wonder: what does IE's Accept header look like on a machine with Office installed? Brace yourselves:
GET /book/html/index.html HTTP/1.1
Host: RecessFramework.org
Accept: image/gif, image/jpeg, image/pjpeg, application/x-ms-application, application/vnd.ms-xpsdocument, application/xaml+xml, application/x-ms-xbap, application/x-shockwave-flash, application/x-silverlight-2-b2, application/x-silverlight, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
Ok, now let's translate to English:
I want the resource "/book/html/index.html". Now, bear with me, I'm Internet Explorer and Office is installed so I can accept this resource in a lot of formats, in this order of preference: GIF, JPG, Progressive JPG, Click Once App, Microsoft XPS Document, XAML, XAML Browser App, Flash, Silverlight 2, Silverlight 1, Excel Document, PowerPoint Document, or a Word Document. If you can't give me "/book/html/index.html" in any of those formats then give me anything you've got!
There are two things wrong with this picture. The lesser evil: IE has a hook for other applications to insert new MIME-types into its Accept header. This means that if a resource could be represented on the server as a Word Document or as an HTML document, Word as an application can inject behavior into IE so that the Word Document always has higher precedence than HTML. All an application has to do is modify the registry (HKLM/Software/Microsoft/Windows/CurrentVersion/Internet Settings/Accepted Documents). (Hear that, Cisco? You could increase internet consumption if you stuck a couple of 255-character WebEx MIME-types in IE's Accept header.)
The greater evil is that IE sends this ~200-300 byte Accept header for every single browsing request. 250 bytes isn't much, but at internet scale, on every request from the most popular browser, it adds up. Internet Explorer's Accept header emissions pollute the information superhighway. Let's do some back-of-napkin calculation. Google gets 294 million searches a day now. If IE has roughly 55% market share, that's 162 million IE requests on Google a day for 38GB worth of garbage internet traffic. On Google searches alone, IE pollutes the internet with over a terabyte of traffic every month in its Accept header. Anyone want to estimate what this number looks like across the rest of the internet?
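Here is the napkin math spelled out in Python so you can plug in your own numbers. The inputs are the figures quoted above. The 250-byte header size (the middle of the 200-300 byte range), the 30-day month, and reading "GB" as 2^30 bytes are my assumptions; they are the readings that land on roughly 38GB a day.

```python
searches_per_day = 294_000_000  # Google searches per day, as quoted above
ie_share = 0.55                 # rough IE market share, as quoted above
header_bytes = 250              # assumed: middle of the ~200-300 byte range
days_per_month = 30             # assumed month length

ie_requests_per_day = searches_per_day * ie_share
bytes_per_day = ie_requests_per_day * header_bytes

gb_per_day = bytes_per_day / 2**30
tb_per_month = gb_per_day * days_per_month / 1024

print(f"IE requests per day: {ie_requests_per_day / 1e6:.0f} million")
print(f"Accept header traffic per day: {gb_per_day:.0f} GB")
print(f"Accept header traffic per month: {tb_per_month:.1f} TB")
```

This treats every request as carrying the full header, so read the result as a ceiling rather than a measurement.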
Update 1: IE team Program Manager Eric Lawrence: "I strongly recommend that developers not list MIME types here." Yet Silverlight and Office do. Whoops.
Update 2: IE doesn't send the extended header on *every* request; it sends */* for refreshes and some subsequent visits. [IEBlog]
Wasted bandwidth is not the only problem; there is wasted server processing, too. If a server or framework wants to follow through on the HTTP protocol, it must be sure it can't respond with any of the requested formats before it can respond with HTML. Bottom line: IE's Accept header is extremely ugly.
If WebKit is Foolish and IE is Prodigal, how valuable is the Accept header?
This was the question I asked myself about half-way through writing the Accept parsing and content-negotiation code going into the next release of Recess, a RESTful PHP framework.
Content-negotiation with the Accept header is an interesting idea in principle that is hard to use properly in practice because browsers misuse it. As stated, Twitter's REST API doesn't use the Accept header for content-negotiation; it uses the URL extensions '.json' and '.xml' instead. Rails disables the Accept header by default. Frameworks can enhance performance by ignoring the Accept header and relying on '.xml'-like extensions. As such, the next release of the Recess Framework, too, will disable Accept header based content-negotiation by default.
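For comparison, extension-based negotiation needs almost no code. The Python sketch below is a guess at the general shape, not how Twitter, Rails, or Recess implement it: the FORMATS table, the format_from_path name, and the "/tweets/123.json" path are all hypothetical.

```python
import posixpath

# Hypothetical table of the representations an application serves.
FORMATS = {
    ".html": "text/html",
    ".xml": "application/xml",
    ".json": "application/json",
}


def format_from_path(path, default="text/html"):
    """Split a known extension off a URL path and map it to a MIME-type.

    Returns (resource_path, mime_type). Paths with no extension, or
    with one that is not in FORMATS, come back unchanged with the default.
    """
    resource, ext = posixpath.splitext(path)
    mime = FORMATS.get(ext.lower())
    if mime is None:
        return path, default
    return resource, mime


print(format_from_path("/tweets/123.json"))
print(format_from_path("/page/routing-in-recess-screencast"))
```

The first call yields ("/tweets/123", "application/json"); the second has no extension and falls back to text/html. There is no header to parse and no preference list to sort, which is where the performance gain comes from.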
Bottom line: If you're building APIs for other developers to consume, consider using Accept-based content-negotiation. If you're building consumer-facing web apps, ignore the Accept header until WebKit and IE get their acts together.