
This is a report on how our bet on the skilled, rogue freelancer and the growing small agency has worked out as we approach our software-as-a-service's first year of operation.
We took a risk when we decided to build HiFi: create a professional-grade software-as-a-service product in the website content management space, which is dominated by consumer do-it-yourself, cookie-cutter, template-driven products. While most content management system services are a play to displace professionals with software, HiFi's goal is to amplify professional web developers with better software and infrastructure. For too long the freelance and small agency status quo has been expensive self-installed, self-maintained software on overcrowded, underpowered servers. Our wild bet was that by killing it on the fundamentals (a great JSON API, a powerful infrastructure, and best-in-class, knowledgeable support) we could alleviate the aspects of modern website development that, frankly, suck: maintaining server software and configuration, losing sleep over uptime and reliability, updating software to fix bugs and security issues, writing custom queries for data, effectively compiling and compressing CSS, LessCSS, JavaScript, and CoffeeScript, and so on. That would let professional developers focus on what they love most: designing and implementing great web sites.
Only a year in, it's still early to make a call, but so far the freelancers and agencies that have invested in the switch to HiFi have not only loved the experience but have also increased their productivity substantially. We're really encouraged not only by the feedback, but by the hard numbers. Publicly, for the first time, here they are...
Growth
Let’s start with the number of websites powered by HiFi. For our purposes we define this metric as the number of live websites that received traffic. This means free trial sites and sites that are currently in production but not yet live are not counted. From May through November of 2010 HiFi was in beta. In less than a year since coming out of beta, the total number of web sites powered by HiFi has grown more than 350 percent.

Monthly page views is another interesting measurement we look at. September was a record-setting month for HiFi with over 2.5 million page views served. By the end of October HiFi should surpass 20 million page views served since inception.

Site Speed
We spent some time earlier this month improving our caching system so not only are we serving more page views than ever before, they’re getting served faster than ever before. Cached pages are now served ~100ms faster than before which represents a 5-10x improvement in page load speed depending on client internet connection. You can really feel the difference in clicking around sites on HiFi!

HiFi web sites are much more than just HTML pages. Across the board, each page contains beautiful graphics, meticulous CSS and LESS files, and powerful JavaScript and CoffeeScript programs. We serve all of these assets up, too. In fact, HiFi has served over 150,000,000 requests total across website front- and back-ends. During peak hours we average up to 33 requests per second with bursts of over 100 requests per second.
Reliability
We haven't found competing services that publish their exact monthly uptime records, but we think it's important. This is something we lose sleep over and are proud of, and in recent months HiFi has been more reliable than ever. Since June 1st, 2011 we have had a 99.998% uptime record. In the trailing year we're just a hundredth of a percent shy of that at 99.989%. Here's a look at HiFi's uptime, as measured on minute intervals by Pingdom, in the trailing year:
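To put those percentages in perspective, here is a small sketch that converts an uptime figure into minutes of downtime. The helper name and the 365-day year are assumptions for illustration; this is not how Pingdom reports its numbers.

```javascript
// Convert an uptime percentage over a period into minutes of downtime.
// `downtimeMinutes` is a hypothetical helper, not a HiFi or Pingdom API.
function downtimeMinutes(uptimePercent, days) {
  const totalMinutes = days * 24 * 60;
  return totalMinutes * (1 - uptimePercent / 100);
}

// Trailing year at 99.989% uptime, assuming a 365-day year.
const trailingYear = downtimeMinutes(99.989, 365);
console.log(trailingYear.toFixed(1) + ' minutes of downtime');
```

Under those assumptions, 99.989% over a year works out to a little under an hour of total downtime.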

Bigger, Faster, Stronger
Our team is really pleased with how quickly the number of live public sites on HiFi is growing. Not only are we seeing sites grow their traffic by leveraging the baked-in features of HiFi, but we are seeing bigger sites moving to HiFi as well. This is evident in that from September of 2010 to September of 2011 our number of live sites has grown by 5x and overall page views by 8x. With recent updates we are serving these websites faster and more reliably than ever before and with no additional effort needed on the web site developer’s part.
Do you have 5 minutes to try a CMS that will open up new creative opportunities for your firm's work?
We put together these 5-minute Challenges for you to quickly get a sense of HiFi's power. Each challenge covers an interesting feature that would take significant effort, an assortment of 3rd party plugins, or simply wouldn't be possible at all, using any of our competitors' software. These challenges are just a hint at how HiFi will help you deliver higher quality work to your clients, faster.
Challenge #1: Can you dramatically speed up your clients' web sites? (Easy)
Challenge #2: Can you bend a CMS template to your custom design's will? (Moderate)
Challenge #3: Can you access your site's content using JavaScript? (Advanced)
Sign up for a free trial account and take HiFi for a spin.
For website developers time is a valuable resource. Website development requires time spent designing, time spent turning a design into HTML and CSS, and time spent making a site interactive with JavaScript programming. One of our fundamental goals with HiFi is to make build-outs move faster for web development professionals by providing the best power tools available. Today we’re excited to announce support for automatically compiling and minifying CoffeeScript, "a little language that compiles into JavaScript". CoffeeScript can improve productivity and reduce your code footprint compared to writing pure JavaScript.
What is CoffeeScript?
The project's own description captures it nicely:
CoffeeScript is a little language that compiles into JavaScript. Underneath all of those embarrassing braces and semicolons, JavaScript has always had a gorgeous object model at its heart. CoffeeScript is an attempt to expose the good parts of JavaScript in a simple way.
The golden rule of CoffeeScript is: "It's just JavaScript". The code compiles one-to-one into the equivalent JS, and there is no interpretation at runtime. You can use any existing JavaScript library seamlessly (and vice-versa). The compiled output is readable and pretty-printed, passes through JavaScript Lint without warnings, will work in every JavaScript implementation, and tends to run as fast or faster than the equivalent handwritten JavaScript.
CoffeeScript is a power tool for JavaScript development. It borrows inspiration from other great languages such as Ruby and Python to produce a beautiful, concise syntax. For a feature-by-feature walkthrough of the language check out coffeescript.org; these are but a few of the highlights.
1. Function Syntax
JavaScript functions and callbacks are frequently used to create interactive websites. Consider this jQuery snippet that toggles the behavior of clicking on a target div:
$('#target').toggle(
  function() { alert('First handler'); },
  function() { alert('Second handler'); }
);
Compare this with the equivalent CoffeeScript:
$('#target').toggle(
  (-> alert 'First handler'),
  (-> alert 'Second handler')
)
Functions are denoted with the arrow syntax ->. Whitespace is significant, so statements do not end in semi-colons and function bodies are not wrapped in braces. You may also notice that parentheses are optional when calling a function. CoffeeScript’s language design choices tidy the syntax up quite a bit.
2. Implicit Returns
This is a move borrowed from Ruby’s playbook and it is wonderful. Let’s take a look at a basic example, squaring a number. Here’s the JavaScript version:
function(x) { return x * x; }
CoffeeScript:
(x) -> x * x
The last expression of a function automatically returns. Notice, too, we have demonstrated how functions define parameters.
3. Everything is an Expression
A natural complement to implicit returns is the idea that everything is an expression, something Rubyists love. Consider the following function:
grade = (score) ->
  if score >= 90
    if score > 93
      "A"
    else
      "A-"
  else if 90 > score >= 80
    if score > 83
      "B"
    else
      "B-"
  else
    "C"
The letter grades are returned from within the nested ifs. This little snippet also shows off a cute little technique borrowed from Python which allows chained comparison operations. Variables can also be assigned the result of an if statement.
targetDiv = if foo
  $('#bar')
else
  $('#baz')
Or, more concisely, using the then keyword:
targetDiv = if foo then $('#bar') else $('#baz')
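For readers more at home in plain JavaScript, here is a hand-written sketch with the same behavior as the CoffeeScript above. It is an approximation for comparison, not the compiler's literal output, and the `foo` and `target` variables are placeholders.

```javascript
// Hand-written JavaScript with the same behavior as the CoffeeScript
// `grade` function; not the compiler's literal output.
function grade(score) {
  if (score >= 90) {
    return score > 93 ? 'A' : 'A-';
  } else if (90 > score && score >= 80) {
    // CoffeeScript's chained comparison `90 > score >= 80` expands to two tests.
    return score > 83 ? 'B' : 'B-';
  }
  return 'C';
}

// An `if` used as an expression corresponds to a conditional (ternary) here.
const foo = true;
const target = foo ? '#bar' : '#baz';
console.log(grade(95), grade(85), grade(70), target);
```

Every branch needs an explicit return in JavaScript, which is the noise that implicit returns and expression-valued ifs remove.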
4. Object Literals
Object literals can use YAML-like syntax and indentation to define nested objects. Commas are optional. Consider the following JavaScript/jQuery snippet:
$.ajax({
  url: 'search',
  success: function(data) {
    console.log(data);
  },
  data: {
    query: "Search Terms",
    page: 1,
    results: 10
  }
});
Here is the equivalent CoffeeScript:
$.ajax
  url: 'search'
  success: (data) -> console.log data
  data:
    query: "Search Terms"
    page: 1
    results: 10
5. Simple Object-Oriented Classes, Inheritance, and super
JavaScript’s prototypal system of inheritance is powerful but awkward to work with. CoffeeScript’s classes are easy to define, extend, and work with. The @ symbol is an alias of ‘this’ or ‘this.’ when followed by a symbol.
class Vehicle
  constructor: ->
    @wheels = 1
    console.log 'Vehicle constructed'

class Car extends Vehicle
  constructor: ->
    super
    @wheels = 4
    console.log 'Car constructed'
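For comparison, here is a hand-written sketch of what the same two classes take in prototype-based JavaScript. It mirrors the behavior of the CoffeeScript above but is not the compiler's exact output.

```javascript
// Prototype-based JavaScript with the same behavior as the CoffeeScript classes.
// A hand-written approximation, not the compiler's exact output.
function Vehicle() {
  this.wheels = 1;
  console.log('Vehicle constructed');
}

function Car() {
  Vehicle.call(this); // what `super` does in the CoffeeScript constructor
  this.wheels = 4;
  console.log('Car constructed');
}

// Wire up the prototype chain that `extends` sets up for you.
Car.prototype = Object.create(Vehicle.prototype);
Car.prototype.constructor = Car;

const car = new Car();
console.log(car.wheels, car instanceof Vehicle);
```

The prototype wiring is easy to forget or get subtly wrong by hand, which is the awkwardness the class syntax hides.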
Using CoffeeScript with HiFi
Using CoffeeScript with HiFi is actually no different than using plain JavaScript! From the design tab, add a new script and give it a name with a '.coffee' extension. This will create a new CoffeeScript file for you to work with. When you are ready to actually include it in your template, just use the js tag:
{% js 'somescript.js,setup.coffee' %}
You can include as many scripts as you like in this tag, CoffeeScript or plain JS, and HiFi will automatically compile your CoffeeScript into JavaScript. It also knows when your CoffeeScript files change and recompiles them on the fly.
Once you are done developing, just add the min flag to your JS tag:
{% js 'somescript.js,setup.coffee' min %}
Now, in addition to compiling your CoffeeScript files, all files listed will be compiled together and then minified using the Google Closure compiler. HiFi will still recognize when your files change and automatically recompile and minify these for you. It's just that easy.
API design has been a point of discussion on Hacker News recently. The discussion has been focused on the process of adding on an API to your app. With our SaaS product, the HiFi CMS, we approached the problem from the opposite direction: flesh out the API first and then build your application on the API. There was a lot of interest in our approach to this, so we thought we would write out this post.
Our application, HiFi, is a modern web CMS that is the antithesis of most cookie-cutter CMS SaaS products on the market. A core goal with HiFi has been to create a platform where professional designers, web developers, and information architects can create websites that live up to their original vision. Early in the planning process it became clear a flexible, first-class API would be needed to achieve this goal. Getting the API right would be the only way a hosted CMS could achieve the flexibility that developers love about non-hosted platforms.
Our requirements for the API were driven by common sense best practices and by the unique needs of website content models. The key requirements were:
- Consistency - all objects in the system should be represented, manipulated, searched, versioned, and access controlled in the exact same way. This holds for content, types, users, relations, permissions, etc.
- Simplicity - there are only two methods in our API: get and put. An object's type and its "undeleted state" are just properties of each object. JSON is the only format for V1. Furthermore, the requests made are in a very similar format to the data you get back. It just feels natural.
- Versioning / Immutability - storage is cheap in 2010: why not keep everything? Objects aren't updated or deleted, new copies are created.
- Structure - all objects are organized in a tree. Relationships between objects can introduce additional structure.
- Queryable - all objects can be queried by criteria on their own properties and properties of structurally related objects
- Access Controlled - read / write permissions can be applied to any object and inherit down the tree
We built the API to satisfy these requirements first, then we built our app on top of the API. This turned out to be a great idea. We got to dogfood our API for the entire development process and it made testing a lot simpler.
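To make the get/put and immutability requirements concrete, here is a minimal in-memory sketch. Every name in it (`VersionedStore`, the `deleted` and `version` properties) is hypothetical and chosen for illustration; it is not HiFi's actual API or storage layer.

```javascript
// Hypothetical in-memory store illustrating a get/put-only, immutable API.
// None of these names come from HiFi's actual API.
class VersionedStore {
  constructor() {
    this.versions = new Map(); // id -> array of frozen snapshots, oldest first
  }

  // put never overwrites: it appends a new frozen copy.
  put(id, properties) {
    const history = this.versions.get(id) || [];
    const snapshot = Object.freeze({ ...properties, id, version: history.length + 1 });
    this.versions.set(id, [...history, snapshot]);
    return snapshot;
  }

  // get returns the latest version by default, or a specific older one.
  get(id, version) {
    const history = this.versions.get(id) || [];
    if (version === undefined) return history[history.length - 1];
    return history[version - 1];
  }
}

const store = new VersionedStore();
store.put('about', { type: 'page', url: '/About', deleted: false });
store.put('about', { type: 'page', url: '/about', deleted: false });
// "Deleting" is just another put that flips a property.
store.put('about', { type: 'page', url: '/about', deleted: true });

console.log(store.get('about').deleted); // latest version
console.log(store.get('about', 1).url);  // the original version is still there
```

Because nothing is ever overwritten, rollbacks and history queries need no extra machinery in a design like this.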
Websites running on HiFi use a templating language, Twig, which is very similar to Django templating. All templates have access to the API server-side. This makes it really easy to add complex features to websites without much stress. Here are a few quick examples of queries:
{"type":"page", "orderBy":"-publishedAt"}
This query pulls in the most recently published pages in HiFi.
{"type":"page","orderBy":"-publishedAt","parent":{"type":"feed","url":"/blog"}}
Fetch the most recently published pages whose parent is a blog feed with url '/blog'.
{"type":"page","orderBy":"-publishedAt","child":{"type":"comment"}}
Fetch the most recently published pages which contain a comment.
Nested queries can be combined freely and can be as recursive as desired. There are similar commands for working with relationships. The net result is an API that is simple, yet powerful enough to build our admin panel on.
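As a rough illustration of how queries shaped like these could be evaluated, here is a small matcher over an in-memory list. The object fields (`id`, `parentId`, `publishedAt`), the sample data, and the reading of a leading "-" in orderBy as "descending" are all assumptions made for this sketch, not a description of HiFi's query engine.

```javascript
// Hypothetical matcher for queries shaped like the examples above.
// The object fields (id, parentId, publishedAt) are assumptions for illustration.
const objects = [
  { id: 1, type: 'feed', url: '/blog', parentId: null },
  { id: 2, type: 'page', url: '/blog/first', parentId: 1, publishedAt: 100 },
  { id: 3, type: 'page', url: '/blog/second', parentId: 1, publishedAt: 200 },
  { id: 4, type: 'page', url: '/about', parentId: null, publishedAt: 300 },
  { id: 5, type: 'comment', parentId: 2 },
];

function matches(obj, query) {
  return Object.entries(query).every(([key, value]) => {
    if (key === 'orderBy') return true; // handled after filtering
    if (key === 'parent') {
      const parent = objects.find((o) => o.id === obj.parentId);
      return parent !== undefined && matches(parent, value); // recurse up the tree
    }
    if (key === 'child') {
      return objects.some((o) => o.parentId === obj.id && matches(o, value)); // recurse down
    }
    return obj[key] === value; // plain property criterion
  });
}

function find(query) {
  const results = objects.filter((obj) => matches(obj, query));
  const order = query.orderBy;
  if (!order) return results;
  const descending = order.startsWith('-'); // assumed: leading "-" means newest first
  const field = descending ? order.slice(1) : order;
  return results.sort((a, b) => (descending ? b[field] - a[field] : a[field] - b[field]));
}

const blogPages = find({ type: 'page', orderBy: '-publishedAt', parent: { type: 'feed', url: '/blog' } });
const commented = find({ type: 'page', orderBy: '-publishedAt', child: { type: 'comment' } });
console.log(blogPages.map((p) => p.url), commented.map((p) => p.url));
```

Because the `parent` and `child` criteria are themselves queries, the matcher simply calls itself, which is what lets nesting go as deep as needed.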

HiFi's admin user interface is 95% client-side: it uses jQuery and Resig's JavaScript templates, and pulls data via the API. We created a HiFi JS library that wraps the API in a jQuery-like chainable DSL. The next example should look familiar to jQuery users. In it we'll change the URLs of all pages on a site to be lower case:
hifi({type:'page'}).each(function(page) {
  hifi(page).update({url:page.url.toLowerCase()});
});
A neat side-effect of writing the API first is finding use cases you didn't plan for. One example for HiFi was in keeping old versions of all objects. The intention for this API feature was to enable easy roll backs for any content type. When we recognized that old URLs were being stored and were queryable through the API we updated our website routing code's logic to see if an unreachable URL ever existed and if so we automatically 301 redirect to the object's most recent version's URL. We got this for free. Had we built the app first we would have written a lot of special case code to achieve the same result.
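The routing fallback described above can be sketched roughly as follows. The `urlHistory` data and the `route` function are hypothetical stand-ins for the versioned store and HiFi's routing code.

```javascript
// Hypothetical routing fallback built on stored versions.
// `urlHistory` maps an object id to every URL it has had, oldest first.
const urlHistory = {
  about: ['/About-Us', '/about-us', '/about'],
  blog: ['/news', '/blog'],
};

function route(requestedUrl) {
  // First pass: is this the current URL of some object?
  for (const urls of Object.values(urlHistory)) {
    const current = urls[urls.length - 1];
    if (requestedUrl === current) return { status: 200, url: current };
  }
  // Not a live URL: check whether any object ever lived here.
  for (const urls of Object.values(urlHistory)) {
    if (urls.includes(requestedUrl)) {
      return { status: 301, location: urls[urls.length - 1] };
    }
  }
  return { status: 404 };
}

console.log(route('/about'), route('/news'), route('/missing'));
```

The redirect table is never maintained by hand in this sketch; it falls out of keeping every old version around.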
Now that HiFi is open for business and over 50 websites have launched on the service we're beginning to see uses of the API in end-user work that are really exciting. An example site that has personally blown me away is PBS' CiaoItalia.com. Ciao Italia is America's longest running cooking show. Over the course of 21 seasons chef Mary Ann Esposito has compiled a lot of great recipes and videos. If we were to flatten the API's tree it would look like: 21 Seasons -> ~20 Episodes each -> ~4 Recipes each -> YouTube Video. This advanced search page was implemented entirely using the API in user-space templates.
HiFi's API was built before we built the app on top of it and it's a better product because of it. If you're not in a greenfield situation and your app already exists consider what it would take to design an API that could eventually sit beneath your app not beside or on top of it. Investing in a great API not only cleans up your code base, but also opens up the door for awesome, unanticipated use cases coming to life.
One of the cool features we're working on for HiFi is custom types. HiFi has a unique JSON-based data store that makes adding custom types really easy. We wanted to put up an example of how this would work, so we built a custom "slide" type. As we've mentioned before, this site is running on a pre-alpha of HiFi. So the slideshow below is being driven by HiFi.
In effect, this is a "meta-slideshow". Flip through it to see how easy it is to add custom types to HiFi and to see some of the functionality you can get right out of the box.
If this seems a little technical, don't worry. This is an example of some of the extended features that HiFi will make available to developers. It should be a good example of both the power and user-friendliness that HiFi will provide.
Note that comments can be left on any of the slides individually. This is explained in the slideshow.
Regular expressions are a handy tool for developers to use. They're a concise, domain-specific language for pattern matching text. Regular expressions have many applications: input validation, data extraction, and advanced search and replace are a few good examples. In this introductory regular expression tutorial we'll take a high-level tour of primary concepts. We'll avoid details for now and revisit them in follow-up tutorials.
Use our free regex tester to follow along with this tutorial. It is web-based and written in JavaScript so no download is necessary. We'll be using JavaScript style syntax, which is closely related to its Perl regex inspiration. Most modern language regex implementations use a syntax like Perl/JavaScript's but the details for executing match and replace are language specific.
Simple Text Finding
If you've ever used 'Find' in a program like your web browser or text editor you've used a tool akin to the most basic regular expressions. For example, the regular expression /hifi/g will match the string 'hifi'. It will not match 'HiFi' or 'high-fidelity'.

In order to match these three variations we can use regex choices, just like an OR.
This or That, Regex Choices
Choices are the 'OR' of regular expressions. The vertical bar character | denotes the "OR" between options. For example, /hifi|HiFi|high-fidelity/g, translates to 'hifi', or 'HiFi', or 'high-fidelity'.
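If you would rather try this in a JavaScript console than in the tester, a minimal sketch using the standard String match method looks like this. The sample sentence is made up for the example.

```javascript
const text = 'hifi, HiFi, and high-fidelity are three spellings.';

// A plain pattern is case-sensitive and matches only the exact characters.
console.log(text.match(/hifi/g));

// Choices separated by | match any of the alternatives.
console.log(text.match(/hifi|HiFi|high-fidelity/g));
```

The first call finds only the lowercase spelling; the second finds all three.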

What if we wanted to choose between two options in the middle of a regex match string? For example, what if we wanted to match the words ending in an 'ood' or 'ould' sound in the classic tongue-twister 'How much wood would a woodchuck chuck'? We could repeat a lot of information or'ing each of the words like /wood|would|could/g or we could use matching groups and use a choice in the middle of the string.
Regex Matching Groups
Create matching groups by pairing parentheses around a part of the regular expression. They allow you to 'pick out' important parts of a large match with sub-expressions. That sounds more complicated than it is. Let's look at a simple example: /(w|c)ould/g.

Notice how there's a special match on the first letter. We have essentially picked out the small part of the string we care about: the first letter of an 'ould' sounding word.
We can use a second group to match 'wood', too, with an or in the middle of the regex: /(w|c)(oul|oo)d/g
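In JavaScript, one way to see what each group picked out is the standard matchAll method. This sketch assumes an environment that supports it (modern browsers and Node), and the fuller wording of the tongue-twister is our own.

```javascript
const twister = 'How much wood would a woodchuck chuck if a woodchuck could chuck wood';

// matchAll yields one result per match; index 0 is the whole match and
// indexes 1 and 2 are the two groups.
const results = [...twister.matchAll(/(w|c)(oul|oo)d/g)].map((m) => ({
  match: m[0],
  firstLetter: m[1],
  middle: m[2],
}));

console.log(results);
```

Each entry pairs the full match with the first letter and the middle that the two groups captured.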

What else would this regex match that is not in this string? I'll leave this as an exercise to the reader.
How Many Times? Regex Quantifiers
If we care how many times a regex factor occurs, meaning characters, groups, and classes (up next), we can use a quantifier. Quantifiers come in two formats: ranges between curly braces and the special characters *, +, and ?.
Let's talk about ranges first. Google owns a number of variations on the domain 'google.com' with varying numbers of 'o's. The following domains will all take you to Google: gogle.com, google.com, gooogle.com. We can capture any of these variations using a range quantifier: /go{1,3}gle/g.
It turns out, they also own the domain with 5 o's, but a squatter owns the domain with 4. So, we can put together the three big concepts we've learned so far: choices, groups, and quantifiers, to "match g followed by 1 to 3 o's or 5 o's followed by 'gle'" with this regex: /g(o{1,3}|o{5})gle/g.

Ranges in the form of {N,} that do not have a ceiling translate to "at least N times". We can use quantifiers on groups, too, so we can find all instances of the google domain with an even number of o's like this: /g(oo){1,}gle.com/g

Now we're ready to start talking about the special quantifier characters. They're simple shortcuts for commonly used ranges:
* is {0,} - 0 or more matches
? is {0,1} - 0 or 1 match
+ is {1,} - 1 or more matches
Thus, the previous example could have also been written as /g(oo)+gle.com/g
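Here is a short sketch that runs the quantifier examples against a list of candidate names. The list itself is made up for the test, and the patterns are written without the g flag because test() on a global regex remembers its position between calls.

```javascript
const domains = ['ggle', 'gogle', 'google', 'gooogle', 'goooogle', 'gooooogle'];

// No g flag here: test() on a global regex keeps state between calls.
const owned = domains.filter((d) => /g(o{1,3}|o{5})gle/.test(d));
const evenOs = domains.filter((d) => /g(oo)+gle/.test(d));

console.log(owned);
console.log(evenOs);
```

The first filter keeps the names with 1 to 3 or exactly 5 o's; the second keeps only the names with an even number of o's.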
Regex Character Classes
We saw how we could match a 'w' or a 'c' using a group and a choice with (w|c). What if we wanted to match all vowels? /(a|e|i|o|u)/g. That's starting to get verbose!
Regular expressions have a special construct called character classes, which are placed within square brackets []. For example, /[aeiou]/g and /(a|e|i|o|u)/g mean roughly the same thing. Ranges can be used, too, so [A-Z] is all uppercase English letters, [a-z] is all lowercase letters and [A-Za-z] is all English letters. Finally, classes can be complemented with the ^ character to say "not any of these characters", so consonants could be described as the opposite of vowels: /[^aeiou]/g. (Note: Complements include all other characters, so [^aeiou] also matches numbers, punctuation, uppercase letters, etc.)
We can match words using a class of all English characters and a 1 or more quantifier, /[A-Za-z]+/g, such as:
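A quick sketch of character classes in a JavaScript console, using made-up sample strings:

```javascript
const sentence = 'HiFi served 150 million requests!';

console.log(sentence.match(/[A-Za-z]+/g)); // runs of English letters
console.log('regex'.match(/[aeiou]/g));    // lowercase vowels only
console.log('a1!'.match(/[^aeiou]/g));     // anything that is not a lowercase vowel
```

Note how the digits and the exclamation mark are skipped by the first pattern but picked up by the complemented class in the last one.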

Pulling it All Together
As a last example let's put all the concepts we've covered in regular expressions together to pick out tags from an HTML file. HTML tags open like <div id="container"> and close like </h1>
Let's focus on finding opening tags, for now. They start with a less than sign and are followed by a letter, followed by 0 or more other letters or numbers. We'll use a capturing group to pick out the important part: /<([A-Za-z][A-Za-z0-9]*)/g

What if we wanted to pick out the attributes of a tag, too? We could match everything except the greater than character, using a complemented character class: ([^>]*)

As a final touch we can match closing tags, too. The difference between a closing tag and an opening tag is a forward slash after the less than sign, like </ that. The slash can occur 0 or 1 times: 0 for opening, 1 for closing. The forward slash marks the start and end of a JavaScript regex, so inside the pattern it must be escaped with a backslash. We'll cover escaping special characters in a future post; for now just take my word that \/ means simply a forward slash. So, this last little piece looks like (\/?) and our product is /<(\/?)([A-Za-z][A-Za-z0-9]*)/g
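To see the tag pattern at work outside the tester, here is a sketch using matchAll on a small made-up HTML string. Keep in mind that a pattern like this is a teaching tool, not a real HTML parser.

```javascript
const html = '<div id="container"><h1>Title</h1></div>';

// Group 1 is the optional slash, group 2 is the tag name.
const tags = [...html.matchAll(/<(\/?)([A-Za-z][A-Za-z0-9]*)/g)].map((m) => ({
  name: m[2],
  closing: m[1] === '/',
}));

console.log(tags);
```

The result lists div and h1 as opening tags, followed by h1 and div as closing tags.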

That's all for now, folks. We've seen how regular expressions can be used in the most basic sense like your text editor's "find", covered choices which allow for ORs in our matching, used groups to pick out the important parts of a match, looked at quantifiers giving us the ability to specify how many times a match occurs, and finally poked around with character classes for concise lists of characters that match or do not match. Armed with just these tools you've got 80% of the power of regular expressions under your belt.
Unfortunately the other 20% of regular expression syntax is more detail oriented. These topics include special characters & escape codes, anchors, non-capturing groups, and lazy vs. greedy matching. We'll be covering these topics in future posts in this series. If you're eager to learn regular expressions you should subscribe to our development blog and continue playing around with our free regex tester.
It's no secret that we're working on a new service: HiFi, a content management system for designers who demand nothing less than a high-fidelity translation from a PSD design to live website. The space between PSD and HTML is filled by a templating language harmonizing a website's content with its beautiful, semantic markup.
If you've worked with a content management system before you've probably hated or tolerated its templating language. So have we.
Our internal CMS uses Smarty, a popular and generic templating language for PHP. We tolerate it. The last major release of Smarty was nearly six years ago back in 2003. Since deciding to build HiFi our driving question has been "what should a great CMS look like for the 2010s", not the 2000s.
What's wrong with yesterday's templating languages?
A lot. We're in the process of surveying the good, the bad, and the ugly, and doing some prototyping of our own. Some of the most fundamental problems are already clear, though. Let's take a look.
It's Not a Language, It's Advanced Search/Replace
Many of today's templating "languages" simply search for template tags and replace them with instructions. This can take you a long way, and the initial implementation is simple, but it leads to poor design decisions by constraining language possibilities to the tasks regular expressions solve.
Moving to a full-blown language processor (lexer/parser/compiler) removes most of the unnecessary constraints and code base scaling nightmares faced when everything is a regular expression search/replace. We're using techniques and best practices straight out of compiler textbooks to make sure HiFi's templating system is as simple and powerful as possible.
Most Errors Surface at Runtime
In most of today's templating languages the only way to find errors in your template is to run the template. So front-end developers get into a cycle of coding, running, hitting an error, coding, running, hitting an error, etc. Lots of these errors are simple ones, like the misspelling of a variable you're printing.
One of the benefits of moving to a full-blown language processor is the ability to do more analysis of a template's content. We can improve the experience for designers writing templates by issuing meaningful errors earlier. We can look at your template and, without running it, tell you "hey, this variable 'daate' you want to print in this if block that only runs on Mondays will never exist".
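As a toy illustration of the idea, the sketch below reports unknown variables in a template without rendering it. The {{ name }} tag syntax and the function name are assumptions made for this example, not HiFi's design, and a real implementation would walk the parsed template rather than scan for tags with a pattern.

```javascript
// Hypothetical check that finds undefined variables without rendering.
// The {{ name }} syntax and the function name are assumptions, not HiFi's design.
function findUndefinedVariables(template, knownVariables) {
  const known = new Set(knownVariables);
  const problems = [];
  template.split('\n').forEach((line, index) => {
    for (const match of line.matchAll(/\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g)) {
      if (!known.has(match[1])) {
        problems.push({ variable: match[1], line: index + 1 });
      }
    }
  });
  return problems;
}

const template = 'Posted by {{ author }}\non {{ daate }}';
console.log(findUndefinedVariables(template, ['author', 'date']));
```

The misspelled 'daate' is reported along with its line number before the template ever runs, no matter which branch it sits in.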
We believe improving the diagnostics of template coding will make writing template code much more enjoyable.
Weak use of Expressions
Arithmetic in view code should generally be avoided but scenarios exist where it or other expressions are the best way to carry out the task at hand. Assigning a new variable a calculated value, doing date arithmetic, using conditionals, etc, are possible throughout. We're aiming to make expressions natural and consistent.
What are your templating language loves & woes?
Is there a particular templating system you absolutely love or absolutely hate? Features you can't live without? Let us know in the comments as we're hoping to find great influences for HiFi. Also, be sure to join the HiFi mailing list as we roll out announcements and a beta program in upcoming months.

On Friday we published a post describing the HTTP Accept header and WebKit's unfortunate use of it. Using the Accept header the browser can indicate to the server its Content-Type preferences.
Maciej Stachowiak of Apple's WebKit team responded to our post, admitting WebKit's error:
Most WebKit-based browsers (and Safari in particular) would probably do a better job rendering HTML than XHTML or generic XML, if only because the code paths are much better tested. So the Accept header is somewhat in error...
The latest versions of WebKit, and thus Safari and Chrome, prefer XML over HTML in the Accept header. If a server is following the HTTP spec and serving a resource that can be represented as XML or HTML, it will respond with HTML to Firefox and XML to Safari.
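To show how a spec-following server arrives at those two different answers, here is a minimal negotiation sketch. The function names are our own, it ignores parts of the spec (such as media type parameters other than q), and it is not how Recess implements negotiation. The two header strings are the Firefox and WebKit defaults quoted elsewhere in these posts.

```javascript
// Minimal content negotiation sketch. It ignores parts of the spec such as
// media-type parameters other than q, and it is not Recess's implementation.
function parseAccept(header) {
  return header.split(',').map((part) => {
    const [range, ...params] = part.trim().split(';');
    const qParam = params.map((p) => p.trim()).find((p) => p.startsWith('q='));
    return { range: range.trim(), q: qParam ? parseFloat(qParam.slice(2)) : 1 };
  });
}

// Quality for one offered type: the most specific matching range wins.
function qualityFor(type, accepted) {
  const [major] = type.split('/');
  for (const candidate of [type, major + '/*', '*/*']) {
    const hit = accepted.find((a) => a.range === candidate);
    if (hit) return hit.q;
  }
  return 0;
}

// Pick the offered type with the highest quality; earlier offers win ties.
function negotiate(header, offered) {
  const accepted = parseAccept(header);
  let best = null;
  let bestQ = 0;
  for (const type of offered) {
    const q = qualityFor(type, accepted);
    if (q > bestQ) { best = type; bestQ = q; }
  }
  return best;
}

const firefox = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';
const webkit = 'application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5';
const offered = ['text/html', 'application/xml'];

console.log(negotiate(firefox, offered));
console.log(negotiate(webkit, offered));
```

With the same two representations on offer, Firefox's header selects text/html while WebKit's selects application/xml, purely because of the q values each browser sends.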
We set up a quick demo of a tweet that can be represented as HTML, XML, JSON, and for fun, a JPG. Try opening http://recessframework.org/demo/content-negotiation/tweet in Firefox and then Safari/Chrome. You'll get just what each browser's Accept header asked for (pictured below). (Aside: For fun, try it in IE and you'll get a JPG. Then hit refresh and you'll get the HTML. Here's why.) (Demo built using the latest GitHub version of Recess, a RESTful PHP Framework. Source code here.)

Why does WebKit's Accept preference of XML over HTML matter?
Maciej of Apple's WebKit team goes on in his response to downplay the chosen Accept header's importance:
On the other hand, this isn't a hugely important bug, [...] since content negotiation is not really used much in the wild.
Maciej, your transparency and admission of the error is appreciated, but your assessment of the importance of WebKit's Accept header is wrong. Content-type negotiation has not been used much in the wild for two reasons:
- Server-side software working against the grain of the HTTP spec made it historically difficult.
- Web browsers improperly using the Accept header made it historically worthless.
On point 1, HTTP 1.1 is now 10 years old and in the past couple of years Fielding's REST movement has gained momentum. Server-side implementation is no longer a hard problem: most servers and frameworks can handle content-negotiation just fine.
On point 2, of modern browsers WebKit is the last one blocking the primary use case for content-type negotiation that today's web apps have: representing resources as either HTML or XML. Even though Internet Explorer's Accept header is full of garbage, its sins result in wasted resources and a performance penalty, not incorrectness. WebKit's Accept header results in web developers choosing between HTTP incorrectness or a bad user experience. Follow the HTTP spec and your users will get XML dumps as demo'd, or do not follow the HTTP spec and roll your own one-off content-negotiation protocol. The choice is usually simple: user experience is more valuable than developer experience and adherence to the web's most important RFC.
Why does WebKit use this Accept header?
The final pieces to Maciej's response identify the origin of WebKit's Accept header as Firefox's old Accept header:
...we design our Accept header mainly to give the best compatibility on Web sites [...] Our current header was copied from an old version of Firefox.
Firefox 3.0 and 3.5, tested in our previous post, have reasonable Accept headers. Maciej is right, though, Firefox 2.0 looks to have had a particularly bad Accept header:
text/xml,application/xml,application/xhtml+xml,
text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Just like WebKit's, Firefox 2.0's header prefers XML, XHTML, and PNG over HTML. WebKit must have copied Firefox's Accept header on multiple occasions, judging from this year-old commit which removes the antiquated 'text/xml' type from WebKit's list.
Firefox 2 sent "text/xml" but in an acceptable order for the web server. And Firefox 3 doesn't send "text/xml" at all in the Accept header since it is being deprecated in favor of "application/xml". We decided that the best solution is to match Firefox 3 and stop sending "text/xml" in the Accept header.
In all this copying of Firefox's Accept header the WebKit team missed the forest for the trees. WebKit's team knows quite well it "would probably do a better job rendering HTML than XHTML or generic XML, if only because the code paths are much better tested" but failed to express this in its Accept header. Fortunately, it's an easy fix.
The Fix: Change this String at Line 200 of FrameLoader.cpp
On line 200 of WebCore/loader/FrameLoader.cpp resides the string that causes this pain:
static const char defaultAcceptHeader[] =
"application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
WebKit could make amends with web developers wanting to use HTTP content-negotiation to provide XML and HTML representations of their resources without crippling the end-user experience by changing their default Accept header in one of two ways:
- Keep up the status quo and copy Firefox 3.5's reasonable Accept header:
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
- Live a little, give preference to your browser's kick ass HTML abilities, and save some bandwidth:
text/html,*/*
The error has been entered into WebKit's Bugzilla tracker here. I will post updates to this blog (RSS feed) regarding the status of WebKit's fix as they come. We can only hope that, in time, we'll be able to use the Accept header as it is specified and make the internet a slightly happier, more RESTfullier place.
Update: WebKit team responds to this post. Admits error, downplays importance.

When a web browser makes a request, it sends information to the server about what it is looking for in headers. One of these headers is the Accept header. The Accept header tells the server what file formats, or more correctly MIME-types, the browser is looking for. Let's take a look at Firefox's Accept header:
GET /page/routing-in-recess-screencast HTTP/1.1
Host: RecessFramework.org
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Let's translate Firefox's request to English:
Dear RecessFramework.org,
I want the resource "/page/routing-in-recess-screencast" and I want it in an HTML or XHTML format. If you cannot serve me this way, I'll take "/page/routing-in-recess-screencast" in XML instead. If you can't even give it to me in XML, well, I'll take anything you've got!
Love,
Firefox
The Accept header gives the browser a chance to tell the server which format it wants for a resource. By giving a list of options, this content negotiation happens in a single request. One of the key design goals of the HTTP spec is to minimize back-and-forth communication. The browser could ask for each of these formats one at a time, but it would be wasteful.
How does the browser specify the preference "give me HTML/XHTML before XML before *"? Preference is indicated by the "relative quality parameter" (q) and its value (qvalue), seen in application/xml;q=0.9,*/*;q=0.8. Here's how the HTTP spec defines it:
Each media-range MAY be followed by one or more accept-params, beginning with the "q" parameter for indicating a relative quality factor. The first "q" parameter (if any) separates the media-range parameter(s) from the accept-params. Quality factors allow the user or user agent to indicate the relative degree of preference for that media-range, using the qvalue scale from 0 to 1 (section 3.9). The default value is q=1.
For as brilliant as the spec is, it is a terrible read. What's going on is simple:
- Every item's default preference value is 1.
1: html, xhtml, xml, *
- If an item specifies q=X, its preference value is X.
0.9: xml
0.8: */*
1: html, xhtml
- Order by preference value in descending order.
1: html, xhtml
0.9: xml
0.8: *
The only other major detail is that where there are ambiguities, the more specific item wins. For example, if both application/xml and */* had a preference of 0.9, application/xml would still come first. Firefox chooses to make it explicit that */* is less preferred by giving it a preference of 0.8. Firefox's Accept header is sensible and well thought out. Opera's is too. Other browsers: not so much.
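To make those three steps concrete, here is a minimal Python sketch of the ordering. It is not the Recess implementation; the names parse_accept and specificity are made up for illustration, and keeping the header's own order for items that tie on both preference and specificity is an assumption on my part, since the spec doesn't say.

```python
FIREFOX = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"


def specificity(media_range):
    """Rank a media range: 2 for type/subtype, 1 for type/*, 0 for */*."""
    main_type, _, subtype = media_range.partition("/")
    if main_type == "*":
        return 0
    if subtype == "*":
        return 1
    return 2


def parse_accept(header):
    """Return the media ranges of an Accept header, most preferred first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0]
        if not media_range:
            continue
        q = 1.0  # step 1: every item's default preference value is 1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)  # step 2: q=X overrides the default
                except ValueError:
                    q = 0.0  # assumption: treat a malformed q as "not wanted"
        entries.append((media_range, q, position))
    # Step 3: descending preference, then more specific first,
    # then (assumption) the order the client listed them in.
    entries.sort(key=lambda e: (-e[1], -specificity(e[0]), e[2]))
    return [media_range for media_range, _, _ in entries]


print(parse_accept(FIREFOX))
# ['text/html', 'application/xhtml+xml', 'application/xml', '*/*']
print(parse_accept("*/*;q=0.9,application/xml;q=0.9"))
# ['application/xml', '*/*']
```

The second call shows the tie-break: both items have a preference of 0.9, and the more specific application/xml still comes first.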
What in The Header Were You Thinking WebKit?
Don't relax yet, IE: you're up next, and you're even more egregious. So, what's wrong with WebKit, the lauded engine behind Safari and Google's Chrome? Let's take a look:
GET /page/restful-php-framework HTTP/1.1
Host: RecessFramework.org
Accept: application/xml,application/xhtml+xml,text/html;q=0.9,
text/plain;q=0.8,image/png,*/*;q=0.5
Note: Accept split to two lines for width. On quick glance it doesn't look too different from Firefox's. Let's try it again in English just to be sure.
Dear RecessFramework.org,
I want the resource "/page/restful-php-framework" and I want it in XML, XHTML, or PNG format. If you cannot serve me this way, I'll take "/page/restful-php-framework" in HTML or plain text instead. If you can't do that for me I'll take whatever!
Thanks,
WebKit
Really, WebKit? The browsing engine most responsible for killing XHTML prefers XHTML over HTML! It would also prefer PNG over HTML. That's a little embarrassing, but what is worse: Safari and Chrome prefer XML over HTML (and, ambiguously, over XHTML, too). WebKit's Accept header forces web developers to work against the HTTP spec.
Suppose you are Twitter and want to be a good RESTful internet citizen following the HTTP spec. You've got a resource called a tweet that can be represented as XML or JSON or HTML. You wouldn't want Safari users to get an XML copy of a tweet by browsing around, so you have to actively ignore WebKit's Accept header preferring XML above all else. (Aside: It turns out Twitter's REST API ignores many REST/HTTP best practices like the Accept header, anyway, but that's another story for another post.)
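Here is a rough Python sketch of what a spec-following server would do with that tweet. The function names and the list of available formats are hypothetical, the parser skips error handling for malformed q values, and breaking ties by the server's own ordering is an assumption; the two Accept strings are the ones quoted in this post.

```python
WEBKIT = ("application/xml,application/xhtml+xml,text/html;q=0.9,"
          "text/plain;q=0.8,image/png,*/*;q=0.5")
FIREFOX = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"


def parse_ranges(header):
    """Map each media range in an Accept header to its q value (default 1)."""
    ranges = {}
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)  # sketch only: a malformed q raises ValueError
        if fields[0]:
            ranges[fields[0]] = q
    return ranges


def quality(mime, ranges):
    """q of the most specific range that matches mime, or 0 if none does."""
    main_type = mime.split("/")[0]
    for candidate in (mime, main_type + "/*", "*/*"):
        if candidate in ranges:
            return ranges[candidate]
    return 0.0


def choose(header, available):
    """Pick the representation the client claims to prefer.

    Ties go to whichever format the server lists first (an assumption).
    """
    ranges = parse_ranges(header)
    best = max(available, key=lambda mime: quality(mime, ranges))
    return best if quality(best, ranges) > 0 else None


# Hypothetical representations of a tweet, in the server's preferred order.
tweet_formats = ["text/html", "application/xml", "application/json"]

print(choose(FIREFOX, tweet_formats))  # text/html
print(choose(WEBKIT, tweet_formats))   # application/xml
```

Same resource, same server, and the Safari user gets the XML.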
Update from Maciej Stachowiak of Apple's WebKit team:
Most WebKit-based browsers (and Safari in particular) would probably do a better job rendering HTML than XHTML or generic XML, if only because the code paths are much better tested. So the Accept header is somewhat in error. On the other hand, this isn't a hugely important bug, and we design our Accept header mainly to give the best compatibility on Web sites, since content negotiation is not really used much in the wild. Our current header was copied from an old version of Firefox.
Internet Explorer Accepts Polluting the Internet
We've covered the good and the bad. Now let's talk about Internet Explorer. The IE team has made great strides toward being a nicer player on the web. Unfortunately, its Accept header is downright ugly:
GET /book/html/index.html HTTP/1.1
Host: RecessFramework.org
Accept: image/jpeg, application/x-ms-application, image/gif,
application/xaml+xml, image/pjpeg, application/x-ms-xbap,
application/x-shockwave-flash, application/msword, */*
This is the Accept header for IE8 on a Windows 7 machine. One peculiarity is the "application/msword" MIME-type. Office isn't installed but the Word Document Viewer is. This made me wonder, what does IE's Accept header look like on a machine with Office installed? Brace yourselves:
GET /book/html/index.html HTTP/1.1
Host: RecessFramework.org
Accept: image/gif, image/jpeg, image/pjpeg, application/x-ms-application,
application/vnd.ms-xpsdocument, application/xaml+xml,
application/x-ms-xbap, application/x-shockwave-flash,
application/x-silverlight-2-b2, application/x-silverlight,
application/vnd.ms-excel, application/vnd.ms-powerpoint,
application/msword, */*
Ok, now let's translate to English:
Dear RecessFramework.org,
I want the resource "/book/html/index.html". Now, bear with me, I'm Internet Explorer and Office is installed so I can accept this resource in a lot of formats, in this order of preference: GIF, JPG, Progressive JPG, Click Once App, Microsoft XPS Document, XAML, XAML Browser App, Flash, Silverlight 2, Silverlight 1, Excel Document, Powerpoint Document, or a Word Document. If you can't give me "/book/html/index.html" in any of those formats then give me anything you've got!
Thanks,
Internet Explorer
There are two things wrong with this picture. The lesser evil: IE has a hook for other applications to insert new MIME-types into its Accept header. This means if a resource could be represented on the server as a Word Document or as an HTML document, Word as an application can inject behavior into IE so that it always has higher precedence than HTML. All an application has to do is modify the registry (HKLM/Software/Microsoft/Windows/CurrentVersion/Internet Settings/Accepted Documents). (Hear that, Cisco? You could increase internet consumption if you stuck a couple 255-character WebEx MIME-types in IE's Accept header.)
The greater evil is that IE sends this ~200-300 byte Accept header for every single browsing request. 250 bytes isn't much, but at internet scale, on every request from the most popular browser, it adds up. Internet Explorer's Accept header emissions pollute the information superhighway. Let's do some back-of-napkin calculation. Google gets 294 million searches a day now. If IE has roughly 55% market share, that's 162 million IE requests on Google a day for 38GB worth of garbage internet traffic. On Google searches alone, IE pollutes the internet with over a terabyte of traffic every month in its Accept header. Anyone want to estimate what this number looks like across the rest of the internet?
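For anyone who wants to check the napkin, here it is in Python. The 250-byte header size is the midpoint of the range above, the 30-day month is an assumption, and the 38GB figure only lines up if a gigabyte is counted as 2^30 bytes, which is how I've treated it here.

```python
searches_per_day = 294_000_000   # Google searches per day, from the text
ie_share = 0.55                  # IE's rough market share, from the text
header_bytes = 250               # assumed midpoint of the ~200-300 byte range
days_per_month = 30              # assumption

ie_requests_per_day = searches_per_day * ie_share
bytes_per_day = ie_requests_per_day * header_bytes

GIB = 2 ** 30                    # binary gigabyte
gib_per_day = bytes_per_day / GIB
tib_per_month = gib_per_day * days_per_month / 1024

print(f"IE requests per day: {ie_requests_per_day / 1e6:.0f} million")
print(f"Accept header traffic per day: about {gib_per_day:.0f} GB")
print(f"Accept header traffic per month: about {tib_per_month:.1f} TB")
```

Counting in decimal units instead (10^9 bytes per gigabyte) gives roughly 40GB a day, so the conclusion holds either way: north of a terabyte a month.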
Update 1: IE team Program Manager Eric Lawrence: "I strongly recommend that developers not list MIME types here." Yet Silverlight and Office do. Whoops.
Update 2: IE doesn't send the extended header on *every* request, it sends */* for refreshes and some subsequent visits. [IEBlog]
It is not just wasted bandwidth that is the problem, it is wasted server processing, too. If a server or framework wants to follow through on the HTTP protocol the server must be sure it can't respond with any of the requested formats before it can respond with HTML. Bottom line: IE's Accept header is extremely ugly.
If WebKit is Foolish and IE is Prodigal, How Valuable is the Accept Header?
This was the question I asked myself about half-way through writing the Accept parsing and content-negotiation code going into the next release of Recess, a RESTful PHP framework.
Content-negotiation with the Accept header is an interesting idea in principle that is hard to use properly in practice because browsers misuse it. As stated, Twitter's REST API doesn't use the Accept header for content-negotiation, they use extensions on the URL '.json' and '.xml'. Rails disables the Accept header by default. Frameworks can enhance performance by ignoring the Accept header and relying on '.xml'-like extensions. As such the next release of the Recess Framework, too, will disable Accept header based content-negotiation by default.
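The extension approach is about as simple as it sounds. A minimal Python sketch, assuming a hypothetical FORMATS table and made-up URLs (this is not Twitter's, Rails', or Recess's actual code):

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical mapping from URL extension to the representation to serve.
FORMATS = {
    ".html": "text/html",
    ".xml": "application/xml",
    ".json": "application/json",
}


def format_from_url(url, default="text/html"):
    """Split a URL path into (resource, MIME type) using its extension.

    Unknown or missing extensions leave the path untouched and fall back
    to the default representation; the Accept header is never consulted.
    """
    path = urlsplit(url).path
    resource, extension = posixpath.splitext(path)
    mime = FORMATS.get(extension.lower())
    if mime is None:
        return path, default
    return resource, mime


print(format_from_url("/statuses/show/123.json"))
# ('/statuses/show/123', 'application/json')
print(format_from_url("/page/routing-in-recess-screencast"))
# ('/page/routing-in-recess-screencast', 'text/html')
```

No header parsing, no q values, no sorting: one dictionary lookup per request, which is where the performance win comes from.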
So, when would you want to parse Accept headers for content negotiation? When your consumers are respectful of HTTP and REST (RESpecTful!). This could mean RIAs written in JavaScript, Flash, or Silverlight. It could also mean other servers consuming your RESTful API.
Bottom line: If you're building APIs for other developers to consume, consider using Accept-based content-negotiation. If you're building consumer facing web apps: ignore the Accept header until WebKit and IE get their acts together.