Our Optimal ExpressionEngine Install

21 Comment(s) | Posted | by Josh Lockhart | |

First off: Yes. This is a post about another Content Management System on the HiFi Blog.

We developed HiFi in a way that best fits our clients’ needs. However, occasionally, it just isn’t possible to use a hosted solution for a client — particularly educational institutions and some government organizations. When our clients need a self-hosted content management solution, we rely on ExpressionEngine. As people who obviously have strong opinions on content management, we respect the way EE works and the ideas behind it.

We are very particular about our work — our projects must be written to the highest standards using W3C compliant markup and, at the same time, still remain completely faithful to the project’s original design. This is certainly not an easy task, and we require that any CMS we use makes this possible.

This post explains how we set up ExpressionEngine, from hosting, to plugins, to client hand-off.

ExpressionEngine Control Panel

ExpressionEngine is a mature and extensible content management system from EllisLab that is installed on your own web server. It is very customizable and allows us to develop and launch websites that are standards compliant and true to their original designs. We also like ExpressionEngine for its:

  • Mature plugin marketplace
  • Easy, clean templating system
  • Custom field sets
  • Customizable UI for client users

Although ExpressionEngine is perfectly capable on its own, we believe it is important to extend it with several very useful third-party plugins. These plugins extend the functionality of EE but, perhaps more importantly, they greatly improve your clients' user experience.

We’ll go into detail for each plugin below.

It’s important to remember the cost of these plugins in addition to the cost of ExpressionEngine when quoting projects for a client. For a commercial site, this comes to $299 for EE plus $163.95 for the paid plugins listed below, for a grand total of $462.95. When working with clients on a tighter budget, we will sometimes spread the cost out with their hosting payments during the first year. This comes to about $39 a month on top of hosting and backup.

Of course, ExpressionEngine does have its shortcomings when compared to a hosted content management solution: ExpressionEngine requires you to obtain a hosting account, create a database, install the CMS, add and configure plugins, harden security settings, optimize the website templates, and prepare the CMS for client users.

As I’m sure most ExpressionEngine developers will agree, setting up an ExpressionEngine website is not always a simple or quick task. But when done well, the end result is very rewarding. This article explains New Media Campaigns’ process for launching a custom website using an optimized ExpressionEngine installation. We’ve found that it takes an hour or two to get up and running and then another hour to prepare the CMS for client handoff.

Starting a Project

Before we start a project, we make sure the required software and tools are available on the production server. If you don’t check these early in the project development cycle, you are in for a rude awakening just prior to launch!

We rarely develop a project on the production server, so not having these items does not impede development (assuming your development environment has them). But you should check these items before you begin development to avoid complications when your deadline looms.

Web Server

We prefer a UNIX or Linux production box (or virtual machine) running the Apache web server. Other web servers like nginx or IIS will work, too.

SSH and SFTP

You’ll need a way to upload ExpressionEngine and change file permissions. SFTP access is a must. SSH access is even better.

MySQL

You’ll need MySQL 4.1 or newer. Make sure your MySQL database user has SELECT, INSERT, UPDATE, DELETE, CREATE, ALTER, and DROP privileges. These are required by ExpressionEngine.

PHP

You’ll need PHP 5.0 or newer with at least 16M of allocated memory. PHP will also need the GD2 or ImageMagick libraries if your application will use CAPTCHAs, image resizing, or image watermarking.

Installing ExpressionEngine

Installing ExpressionEngine goes smoothly once you’ve got your server environment set up. The EE site documents this well. There are five steps followed by an automated installation wizard.

ExpressionEngine Plugins

After ExpressionEngine is installed, the first thing we do is complement ExpressionEngine’s default features with several helpful plugins.

Structure ($65)

Structure is perhaps our favorite ExpressionEngine plugin. It provides a different (and more intuitive) paradigm for separating content from presentation compared to what ExpressionEngine provides out-of-the-box. The Structure plugin presents website content to our clients as a tree outline, which makes it easy to understand, add, edit, and remove site content. This is a concept we believe in strongly, which is why we use it in HiFi.

We’ve found that clients have a much easier time finding and editing their own content using this type of content representation. It matches the way they think about their site and the way we collaborate with clients to build a sitemap as a part of our creative process.

NSM Better Meta ($49)

Search engine optimization is very important to our clients, as it should be. The way a site performs in search engines and delivers returns to a client’s bottom line is a key way that we are measured as web professionals.

Leevi Graham’s Better Meta plugin provides an easier, more intuitive way for our clients to enter metadata (Title, Description, Keywords, etc.) for EE site content. Important SEO fields are added to the Publish Form where they really belong. Additionally, this plugin also automatically generates a sitemap.xml file for Google and other search engines.

WygWam ($35)

Although our developers prefer Markdown for writing page content, it is silly to assume our clients do, too. Most of our clients are familiar with and prefer using Microsoft Word and we want to make it as frictionless as possible for them to control their site. Therefore, it is very important to provide a familiar in-browser experience, integrated into the CMS, that makes a best effort to produce standards compliant markup.

WygWam, a plugin by Pixel and Tonic, does just that. WygWam integrates the popular CKEditor with ExpressionEngine and allows our clients to edit pages and other content using an in-browser rich-text editor akin to Microsoft Word.

FreeForm and FreeForm Spam (Free)

Many of our clients’ websites use online forms to collect all kinds of information. It is important that the CMS be able to collect and store user-submitted information. ExpressionEngine provides a very simple Contact Form implementation out-of-the-box. But FreeForm, a plugin by SolSpace, does a much better job.

FreeForm allows us to define an unlimited number of Form variables and Form templates that are used to capture user-submitted data and send templated notification emails to both the submitter and the receiver. Strongly consider installing FreeForm with FreeForm Spam to help reduce spam submissions.

Image Sizer (Free)

Perhaps one of the most popular ExpressionEngine plugins, Image Sizer allows you to dynamically resize and cache images on-the-fly as they are requested. This turns out to be a very important tool in improving the experience your clients have with their site.

This plugin ensures images fit your design regardless of the original images uploaded to the CMS by the client. While it might not be a big deal for those of us who do this for a living, dealing with image dimensions, aspect ratios, and resizing is a big hassle for most people. Installing this plugin will make your clients’ lives easier and give them a lot more confidence to go in and edit their site.

CP Analytics (Free)

As its name suggests, the CP Analytics plugin presents the website’s Google Analytics data on the ExpressionEngine CMS dashboard. This allows our clients to quickly see how their site is reaching their customers and constituents.

NSM .htaccess Generator ($14.95)

This helpful ExpressionEngine plugin, also from Leevi Graham, removes the “index.php” from ExpressionEngine CMS page URLs to create more user-friendly page URLs. This plugin is more an aesthetic preference than a necessity.

Securing ExpressionEngine

ExpressionEngine, out-of-the-box, is quite secure. However, there are several additional steps you can take to further harden ExpressionEngine website security.

Secure your server

If you are managing your own web server, you can require key-based login and disable password-based login. This is perhaps the most beneficial security precaution you can take.

Secure your database

If your MySQL server is running on the same machine as your web server, only allow connections from localhost. This prevents potential remote attempts to hack your database.

Rename the system directory

When you install ExpressionEngine, you have the opportunity to rename your ExpressionEngine system directory. Using a different name from the default value will provide a little bit of additional security through obfuscation.

Move your system directory above the web document root

Should your web server configuration become corrupted or malformed, PHP files beneath your document root could potentially be served as plain text. Moving your ExpressionEngine system directory above your document root ensures sensitive information is not inadvertently disclosed.

Process form data in Secure Mode

Processing form data in secure mode deters automated spam submission attacks. It does add one additional database query, though.

Disable “Allow Dictionary Words as Passwords”

This prevents users from using plain-text dictionary words as account passwords. Inconvenient, maybe. Secure, yes.

Consider Using CAPTCHAs

ExpressionEngine can generate and require CAPTCHAs for all form submissions. This will help prevent spam. You can enable and configure CAPTCHAs in “Admin > Security and Privacy > Captcha Preferences”.

Additional security preferences

You can read more about ExpressionEngine’s numerous security preferences in the online User Guide.

Optimize ExpressionEngine for Performance

ExpressionEngine will perform admirably by default. However, should your website be Fireballed, there are a few additional tweaks you can make to improve ExpressionEngine’s performance.

Query caching

When Query Caching is enabled, ExpressionEngine saves the results of database queries to text files. When the same query is run again and a matching cache file exists, the cached result is used instead of querying the database. You can enable Query Caching at “Admin > System Administration > Database Settings”.

Note: It was brought to my attention in the comments that Query Caching is disabled by default and discouraged by EllisLab; native MySQL caching is much faster than using PHP for query caching. But this feature is available if needed.

Fragment caching

ExpressionEngine templates support fragment caching. Simply add the “cache” and “refresh” attributes to any ExpressionEngine template tag. The “refresh” attribute value is an integer number of minutes, and the content of the given tag will be cached for that length of time. Fragment caching is preferable when you need to cache parts of a page that also contains other dynamic data that may change with each request.

{exp:channel:entries channel="news" limit="10" cache="yes" refresh="10"}
    This content will be cached for 10 minutes
{/exp:channel:entries}
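
To make the idea of a time-based refresh concrete, here is a rough conceptual sketch in Javascript. It is not ExpressionEngine's code (EE is written in PHP and its internals are not shown in this post); the function name createFragmentCache and the injectable clock are assumptions made purely for illustration.

```javascript
// Conceptual sketch only: a tiny time-based fragment cache.
// createFragmentCache and its "now" parameter are hypothetical, not part of EE.
function createFragmentCache(now) {
	now = now || Date.now; // injectable clock (assumption, handy for testing)
	var store = {};        // cache key -> { value, expires }

	return function cached(key, refreshMinutes, renderFn) {
		var entry = store[key];
		var currentTime = now();

		// Serve the cached fragment while it is still fresh
		if (entry && currentTime < entry.expires) {
			return entry.value;
		}

		// Otherwise render again and remember the result until it expires
		var value = renderFn();
		store[key] = {
			value: value,
			expires: currentTime + refreshMinutes * 60 * 1000
		};
		return value;
	};
}
```

Calling the returned function twice within the refresh window runs renderFn only once; after the window passes, the fragment is rendered again.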

Template caching

If you prefer a more heavy-handed approach, you can cache entire pages. You can enable Template Caching and specify the refresh time period in the Templates page’s preferences panel. It doesn’t make sense to use both Fragment Caching and Template Caching together for the same page; choose one or the other but not both.

Disable unnecessary query data

When you output channel data using the {exp:channel:entries} tag, ExpressionEngine will query for a lot of information by default; some or most of the queried data may be unnecessary. You can disable certain query data to improve the performance of this tag. The items you can disable are:

  • categories
  • category_fields
  • custom_fields
  • member_data
  • pagination

To disable any of the above items, add a disable attribute to your Channel Entries tag and separate each item name with a “|” pipe.

{exp:channel:entries disable="categories|member_data|pagination" ... }

Perform Front-End Optimizations

While not specific to EE, it is important to optimize your template code so that the site performs at its best. Generally, this means following the guidelines from tools like YSlow or Google PageSpeed and testing either in Firebug or with something like GTmetrix.

A trick for making this process easier is to automate the minification and compilation of all of your CSS and JS. We really enjoy having this functionality in HiFi. A beta EE plugin for this is Minify.ee by Leevi Graham. His other work is great, so we expect this to be as well.

Prepare ExpressionEngine for Clients

By this point in the project, we have installed ExpressionEngine, purchased and installed plugins, hardened security, and optimized the CMS. The project is nearly complete and ready to deliver to the client. Before we do, though, we need to do some house-cleaning.

Create a “Clients” member group

First, we create a new Member Group specifically for the client. This allows us to expose only certain elements of the ExpressionEngine CMS to our clients.

ExpressionEngine Member Groups

Create a new Member Group called “Clients” at “Members > Member Groups”. Then, still on the “Members > Member Groups” screen, click “Edit Group” for the “Clients” member group. The most important settings we use for simplifying client access to the EE CMS are:

  • Control Panel Area Access
    • Can access PUBLISH page (YES)
    • Can access EDIT page (YES)
    • Can access TEMPLATES page (NO)
    • Can access COMMUNICATE page (YES)
    • Can access MODULES page (YES) †
    • Can access ADMIN page (NO) ††

(†) Although the client can access this panel, it will be hidden from view
(††) You may want to enable this depending on your client’s needs

There are many other access settings you can configure based on your client’s needs. If you will have many client users in various roles (e.g., editors or department-specific users), it may also be necessary to edit the “Channel Assignment” settings for each client user so that they only have access to certain ExpressionEngine channels.

Simplify page editors

If left with default settings, ExpressionEngine channel page editors will have a lot of tabs and a lot of settings, some of which may be unnecessary. We recommend you make the UI as simple as possible, with only the controls that clients need to manage their site.

ExpressionEngine Page Editors

On the “Admin > Channel Administration > Channels” screen, for each channel, click “Edit Preferences”. The most important settings we use for simplifying channel editors for clients are:

  • Administrative Preferences
    • Default Status
    • Default category
    • Select “Allow Comments” button in Publish page by default?
  • Channel Posting Preferences
    • Automatically turn URLs and email addresses into links?
  • Comment Posting Preferences
    • Allow comments in this channel?
    • Enable Captcha for Comment Posting?
    • Moderate Comments?
  • Layout Customization in Publish Page †

(†) Uncheck the items that are not needed for the client to edit content in the given channel.

On each Channel’s New Entry form, we also click “Show Sidebar” and hide unnecessary tabs and fields.

Customizing each user’s Control Panel

And last, we customize each user’s Control Panel. To do this, we log into the Control Panel as each user and perform the following changes:

  • Add a new “Pages” tab that links to the Structure Module page
  • Add a new “Forms” tab that links to the FreeForm Module page

You can add additional tabs if you need to provide easy access to other Modules. Because of the changes performed in “Create a clients member group” above, the client user account should not see the “Templates” and “Admin” tabs. Our goal is to provide a simple user interface that only provides the tools required by the client — nothing more.

Moving to the production server

After the client approves the website, it is time to move the website to the production server. After you migrate your database (if necessary), you need to move over the ExpressionEngine CMS files. When you do, ensure file and folder permissions are still correct. You will also need to edit several path settings in the ExpressionEngine control panel.

Log into the ExpressionEngine Control Panel on the new server. The Control Panel styling and images will likely be missing.

General Configuration

ExpressionEngine General Preferences

Navigate to the “Admin > General Configuration” screen and edit the following fields:

  • URL to the root directory of your site
  • URL to your Control Panel index page
  • URL to your “themes” folder

Captcha Configuration

ExpressionEngine CAPTCHA Preferences

If the project uses CAPTCHAs, navigate to the “Admin > Security and Privacy > Captcha Preferences” screen and edit the following fields:

  • Server Path to Captcha Folder
  • Full URL to Captcha Folder

File Upload Preferences

ExpressionEngine File Upload Preferences

If the project accepts file uploads, navigate to the “Admin > Content Administration > File Upload Preferences” screen and edit the following fields for each File Upload Destination.

  • Server Path to Upload Directory
  • URL of Upload Directory

Template Preferences

ExpressionEngine Template Preferences

If the project saves template files to the filesystem, navigate to the “Design > Templates > Global Preferences” screen and edit the following fields:

  • Basepath to Template File Directory

Provide clients with a “How-To” guide

When you hand off an ExpressionEngine website to a client, you may have included one-on-one training in your contract. If not, it will be helpful to provide the client with the resources to learn and use ExpressionEngine to edit their website. A great resource for clients is the HeadSpace Client Guide (PDF) for ExpressionEngine 2. This guide provides an easy-to-read overview of how to use ExpressionEngine to edit website content.

http://headspacedesign.ca/blog/entry/the-expressionengine-2-client-guide-is-here/

Hopefully this guide has been a helpful look into the process that we go through when we set up an ExpressionEngine site. By taking our time and ensuring that we configure and install everything necessary, we’re able to deliver a high performing site that our clients are able to manage.

Explaining Javascript Templates and How We're Using Them in HiFi

6 Comment(s) | Posted | by Josh Lockhart |

What are Javascript templates?

Our HiFi web publishing platform is built for maximum speed and efficiency. An important contributing factor to HiFi’s UI performance is Javascript templates. Popularized by John Resig, Javascript templating is a fast and efficient technique to render client-side templates with Javascript using a JSON data source. The template is HTML markup, peppered with tags that will either insert variables or run programming logic.

Resig's Javascript templates can be stored in a variable or encapsulated within a <script id="templateId" type="text/html"></script> element. HTML markup in a <script> element is effectively ignored by the web browser, yet still accessible by Javascript or jQuery using the <script> element’s id attribute.

Resig’s initial proof of concept has since been converted to a convenient jQuery plugin that couldn't be easier to use:

<script type="text/html" id="template">
<li id="{%= id %}">{%= content %}</li>
</script>
<ul id="destination"></ul>

The example Javascript template above may be rendered and appended to an unordered list element with the following Javascript.

var listItem = $('#template').render({
	id: '1',
	content: 'List item content'
});
$('#destination').append(listItem);
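
To show roughly what happens when a template is rendered, here is a minimal sketch in plain Javascript. It is not the jQuery plugin's implementation; renderTemplate is a hypothetical helper, and it only handles simple {%= name %} variable tags (no programming logic and no HTML escaping).

```javascript
// Minimal sketch: replace {%= name %} tags with values from a data object.
// renderTemplate is a hypothetical helper, not the jQuery plugin's API.
function renderTemplate(template, data) {
	return template.replace(/\{%=\s*([\w$]+)\s*%\}/g, function (match, key) {
		// Unknown keys render as an empty string instead of "undefined"
		return Object.prototype.hasOwnProperty.call(data, key) ? String(data[key]) : '';
	});
}

var html = renderTemplate('<li id="{%= id %}">{%= content %}</li>', {
	id: '1',
	content: 'List item content'
});
```

Here html holds the finished list item markup as a string, ready to be appended to the page.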

Why do we use Javascript templates?

Javascript templates are rendered and cached client-side without sending an HTTP request to the server — in other words, they’re lightning fast. Speed benefits aside, Javascript templates also afford us the opportunity to abstract the HiFi administrative UI into a simple Javascript API.

The Javascript API separates concerns: HTML, CSS and Javascript are logically separated in their own files. This keeps markup and CSS out of our Javascript logic, so adding functionality to the HiFi UI does not require adjusting the HTML or CSS, and vice-versa. This separation of concerns also enforces human user interface guidelines by encapsulating the UI’s markup and style from the UI’s functionality; we can quickly add new HiFi features and functionality that integrate into the existing UI style seamlessly and automatically.

If or when we do decide to modify the HiFi UI’s markup or style, each UI element has its own Javascript template with an associated stylesheet, making such changes a breeze.

How do we use Javascript templates?

As mentioned above, each UI element has its own Javascript template and associated stylesheet. When the UI is rendered in the browser, all UI elements’ templates are made available in a Javascript templates[] array and may be rendered on-demand.
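
As a rough illustration of rendering templates on demand, here is a small sketch. It is not HiFi's actual code: the registry contents, the compile and render helpers, and the use of a keyed object rather than an array are all assumptions made for the example.

```javascript
// Hypothetical registry: template source keyed by UI element name.
var templates = {
	comment: '<li class="comment"><strong>{%= author %}</strong>: {%= content %}</li>'
};
var compiled = {}; // render functions, created lazily and reused

// Turn a template string into a reusable render function (variable tags only)
function compile(source) {
	return function (data) {
		return source.replace(/\{%=\s*(\w+)\s*%\}/g, function (match, key) {
			return data[key] == null ? '' : String(data[key]);
		});
	};
}

// Render a named template, compiling it only the first time it is requested
function render(name, data) {
	if (!Object.prototype.hasOwnProperty.call(templates, name)) {
		throw new Error('Unknown template: ' + name);
	}
	if (!compiled[name]) {
		compiled[name] = compile(templates[name]);
	}
	return compiled[name](data);
}
```

Because the compiled function is kept after first use, repeated renders of the same UI element skip the setup work.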

Additionally, the HiFi platform exposes a hifi() Javascript closure, allowing simple communication to and from the HiFi API (a topic for another day; just assume it works). As an example, let’s say you are using the HiFi platform to moderate your website’s user-submitted comments. You need to reply to a comment. You enter your reply in the provided form and click the Reply button. Behind the scenes, your reply is sent to the HiFi server with this code (simplified for demonstration purposes).

var newComment = {
	type: 'comment',
	author: 'Josh Lockhart',
	content: $('#replyTextArea').val()
};
// pageId is assumed to hold the ID of the page being moderated
hifi({ type: 'page', id: pageId }).append(
	newComment,
	function() {
		// Render the comment template from the same data we just submitted
		var commentLineItem = $('#commentTemplate').render(newComment);
		$('#comments').append(commentLineItem);
	}
);

This example code submits your new comment’s data to the HiFi API, and in the callback function, renders a comment template using your reply as the data source. The rendered template is appended to the UI.

We really enjoy client-side Javascript templating

Javascript templates isolate UI interaction client-side with minimal HTTP requests to the server. This practice drastically improves the speed of the UI and the overall performance of the HiFi platform. Additionally, it has made the development process much simpler and more pleasant. We have all of our markup in one place and we never have to deal with AJAX code that is tangled up with HTML creation.

How we made a jQuery Events Calendar for The Salvation Army

7 Comment(s) | Posted | by Josh Lockhart | |

New Media Campaigns recently developed and launched http://www.keepthebellringing.org for the Salvation Army of Wake County. Springboard Eydo, an agency partner, provided the excellent layout design that we incorporated into our content management system. Although the Salvation Army's website features are many, I want to focus on one feature in particular — the calendar.

Overview

Calendar

The Salvation Army of Wake County website includes a monthly calendar that may be paginated into the past or future. Each calendar cell (representing one day of the current month) contains a list of events, similar to Apple iCal or Google Calendar. When you click an event link, a modal window appears featuring details about the selected event. This calendar is the result of carefully woven design, HTML markup, CSS styles, and jQuery scripts.

The Markup

We developed The Salvation Army website using the XHTML 1.0 Transitional DTD. In most cases, the markup will validate as XHTML 1.0 Strict. However, we opted for the Transitional DTD since a majority of the website's content is provided and controlled by the client. Here is a sample of the calendar's markup.

<ul id="full-calendar-legend">
	<li class="all" id="all-events">All Events</li>
	<li class="orange" id="community-center-events">Community Center</li>
	<li class="blue" id="church-events">Church</li>
</ul>
<table id="calendar">
	<thead>
		<tr class="calendar-heading-days">
			<th title="Sunday">Sun</th>
			<th title="Monday">Mon</th>
			<th title="Tuesday">Tue</th>
			<th title="Wednesday">Wed</th>
			<th title="Thursday">Thu</th>
			<th title="Friday">Fri</th>
			<th title="Saturday">Sat</th>
		</tr>
	</thead>
	<tbody>
		<tr>
			<td class="calendar-cell calendar-cell-weekend first"></td>
			<td class="calendar-cell"><span class="calendar-cell-date">1</span></td>
			<td class="calendar-cell"><span class="calendar-cell-date">2</span></td>
			<td class="calendar-cell"><span class="calendar-cell-date">3</span>
				<ul class="calendar-cell-events">
					<li class="all orange">
						<a href="/page/summer-basketball-registration" title="See details for this event" class="thickbox calendar-event-orange">Summer Basketball Registration April 6-June 5</a>
					</li>
				</ul>
			</td>
			<td class="calendar-cell"><span class="calendar-cell-date">4</span></td>
			<td class="calendar-cell"><span class="calendar-cell-date">5</span></td>
			<td class="calendar-cell"><span class="calendar-cell-date">6</span></td>
			<td class="calendar-cell calendar-cell-weekend last"><span class="calendar-cell-date">7</span></td>
		</tr>
	</tbody>
</table>

Yes, we used a table for the calendar markup. A calendar is, in fact, tabular data. Week days (Sunday, Monday, Tuesday, Wednesday, Thursday, Friday and Saturday) are table columns, and days of the month are table cells. We added meta annotations to the calendar markup in the form of class attributes. For example, we annotated table cells with classes like calendar-cell and calendar-cell-weekend.

The CSS

Our CSS is clean and easy to understand. This simplicity is the result of well-annotated HTML markup. By annotating our HTML markup, we can quickly and easily isolate and style the appropriate calendar elements.

#calendar{ width: 800px; font-family: helvetica, arial, verdana, sans-serif; font-size: 0.917em; margin: 40px 30px 0 30px; }
#calendar thead tr.calendar-heading-days th{ text-align: center; font-weight: normal; }
#calendar tbody tr td{ width: 100px; height: 80px; border-top: 1px solid #b8b096; border-right: 1px solid #b8b096; }
#calendar tbody tr td.last{ border-right: none; }
#calendar tbody tr td.today{ background: #b8b096; }
#calendar tbody tr td ul{ margin: 0 0.5em; }
#calendar tbody tr td a{ color: #fff; text-decoration: none; font-size: 10px; color: white; line-height: 11px; display: block; width: 100%; margin: 2px 0; padding: 1px 2px; }
#calendar tbody tr td a.calendar-event-blue{ color: blue; background: blue; }
#calendar tbody tr td a.calendar-event-red{ color: red; background: #bb2f2d; }
#calendar tbody tr td a.calendar-event-orange{ color: #d53b27; background: #FF6600; }
#calendar tbody tr td span.calendar-cell-date{ display: block; text-align: right; padding: 0.25em 0.5em; }

The Javascript

Popup

The calendar behavior was created using jQuery and Thickbox. Here is the relevant snippet from the HTML markup above.

<a class="thickbox calendar-event-orange" title="See details for this event" href="/page/summer-basketball-registration">Summer Basketball Registration April 6-June 5</a>

We annotated the hyperlink with the thickbox class. When the link is clicked, a Thickbox modal window appears and loads the referenced resource with AJAX. We also provided a mini calendar that is used throughout the rest of the Salvation Army website. You can see this mini calendar on the home page in the left column.

Sidebar mini calendar

Like its larger version, the mini calendar provides a small color-coded link in the appropriate table cell for each event. Each event link's color represents a specific calendar, as described in the calendar legend. When a color-coded link in the calendar legend is clicked, all relevant events in the mini calendar are shown; events with other colors are hidden.

We relate events visually with colors and programmatically with class annotations. For example, each link in the calendar legend has a class attribute similar to <li class="blue">...</li>, and each event link in the mini calendar has a class attribute similar to <li class="all blue">...</li>. Class annotations create a parent-child relationship between calendar and event. We referenced the color class in our CSS style sheet to customize the calendar's appearance, and in the following jQuery script to establish the calendar's view states.

function showMainEvents(eventClass){
	$('.calendar-cell-events .all').hide();
	$('.calendar-cell-events .' + eventClass).show();
}
$('#full-calendar-legend li').click(function(){
	showMainEvents($(this).attr('class'));
});
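
The show/hide rule itself can be expressed without the DOM. The sketch below is only an illustration of that rule; isEventVisible is a hypothetical helper that is not part of the site's code, and it models each event by the class string on its list item.

```javascript
// Hypothetical helper: decide whether an event stays visible after a legend click.
// eventClasses is the class attribute of the event's <li>, e.g. "all orange".
function isEventVisible(eventClasses, selectedClass) {
	// Every event carries "all", so selecting "all" shows everything;
	// selecting a color shows only the events annotated with that color.
	return eventClasses.split(/\s+/).indexOf(selectedClass) !== -1;
}
```

This mirrors the jQuery script: first everything with the all class is hidden, then only items carrying the selected class are shown again.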

Merging design and programming

The Salvation Army website and calendar demonstrate how New Media Campaigns merges design and programming to create stellar user experiences. Head over to our portfolio to view more of our work.

How to produce XHTML 1.0 Strict markup with TinyMCE

13 Comment(s) | Posted | by Josh Lockhart | |

TinyMCE is a WYSIWYG rich-text editor implemented with Javascript. Many web developers embed TinyMCE within content management systems to provide clients an easy way to create and edit content. In fact, we use TinyMCE in our own content management system. However, it is dangerous to provide users a powerful content editing tool like TinyMCE without adequate training. Invalid markup added to TinyMCE can destroy an otherwise perfect website.

I lose countless development hours correcting invalid markup created within TinyMCE. A majority of the invalid markup is created when users copy and paste text from Microsoft Word or Microsoft Office; users unknowingly paste an inordinate amount of hidden Microsoft meta-data into TinyMCE that causes rendering errors in various web browsers (the current stable version of TinyMCE does not properly remove this meta-data even if using TinyMCE's Paste from Word feature). This is not the user's fault. It is unreasonable to expect a user to know how to write valid XHTML. I needed a solution using TinyMCE to convert user input into valid XHTML 1.0 Strict markup. After hours of research, here is what I found.

Project Setup

For this tutorial, I assume you are using the following directory structure:

[PROJECT ROOT]
-->index.html
-->styles.css
-->js/

Install TinyMCE

  1. Download TinyMCE to your computer
  2. Extract the TinyMCE ZIP archive into this project's js/ directory.
  3. The final directory structure should look like this:
[PROJECT ROOT]
-->index.html
-->styles.css
-->js/
---->tinymce_3_2_4_1/
------>jscripts/
-------->tiny_mce/

When you unzip the TinyMCE file, the TinyMCE directory may be named differently than shown above.

Prepare the HTML File

We will start with simple HTML and CSS files. index.html uses the XHTML 1.0 Strict doctype. It has a simple form with a textarea. styles.css contains some simple styles for TinyMCE. We will reference this CSS file later. To save time, go ahead and grab the HTML source code and CSS styles. Paste the HTML source code into index.html. Paste the CSS styles into styles.css.

Initialize TinyMCE

First, add this line into the HEAD of index.html to import the TinyMCE library.

<script src="js/tinymce_3_2_4_1/jscripts/tiny_mce/tiny_mce.js" type="text/javascript"></script>

Be sure the path to tiny_mce.js is accurate based on the TinyMCE file you downloaded and extracted. Next, we need to initialize TinyMCE. Add this code immediately below the aforementioned code.

<script type="text/javascript">	
tinyMCE.init({
	mode:"textareas",
	theme:"advanced"
});
</script>

This will convert the textarea within index.html into a TinyMCE editor. You can view your progress by opening index.html in a web browser. If you are using the Safari web browser on Mac OS X, use this code instead. Be sure the Safari plugin is loaded for all subsequent examples, too.

<script type="text/javascript">	
tinyMCE.init({
	mode:"textareas",
	theme:"advanced",
	plugins:"safari"
});
</script>

Our Objectives

  1. Ensure client input is converted into XHTML 1.0 Strict markup
  2. Remove unused classes from markup
  3. Remove empty HTML elements**
  4. Remove Microsoft meta-data
  5. Encode HTML entities (<,>,&)

**We do not want to remove all empty elements. Blank div elements are sometimes used to place dynamic content. In this tutorial we will only remove empty p, em, and strong elements.

Objective 1: XHTML 1.0 Strict Markup

First, we tell the TinyMCE editor to use the XHTML 1.0 Strict doctype. To do so, we use the doctype parameter when initializing TinyMCE. The TinyMCE initialization code looks like this:

<script type="text/javascript">	
tinyMCE.init({
	mode:"textareas",
	theme:"advanced",
	doctype:"<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' "
	+ "'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'>"
});
</script>

The doctype parameter accepts a string value that is the entire doctype you wish to use. We also need to define a white list of valid XHTML elements and attributes. Any elements and attributes not defined in our white list should be removed from the TinyMCE editor. We tell the TinyMCE editor what elements and attributes are valid by setting the valid_elements parameter to a string. This string should adhere to an expected syntax. The syntax is beyond the point of this article, but you can read more about this syntax on the TinyMCE Wiki. I found a preset XHTML white list on TinyMCE's Wiki. I tweaked this preset white list to ensure it conformed to the Strict doctype. My modified white list should be added to the TinyMCE initialization code like this:

<script type="text/javascript">	
tinyMCE.init({
	mode:"textareas",
	theme:"advanced",
	doctype:"<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' "
	+ "'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'>",
	valid_elements : ""
	+"a[accesskey|charset|class|coords|dir<ltr?rtl|href|hreflang|id|lang|name"
	  +"|onblur|onclick|ondblclick|onfocus|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|rel|rev"
	  +"|shape<circle?default?poly?rect|style|tabindex|title|type],"
	+"abbr[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"acronym[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"address[class|align|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown"
	  +"|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover"
	  +"|onmouseup|style|title],"
	+"area[accesskey|alt|class|coords|dir<ltr?rtl|href|id|lang|nohref<nohref"
	  +"|onblur|onclick|ondblclick|onfocus|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup"
	  +"|shape<circle?default?poly?rect|style|tabindex|title|target],"
	+"base[href|target],"
	+"basefont[color|face|id|size],"
	+"bdo[class|dir<ltr?rtl|id|lang|style|title],"
	+"big[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"blockquote[cite|class|dir<ltr?rtl|id|lang|onclick|ondblclick"
	  +"|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout"
	  +"|onmouseover|onmouseup|style|title],"
	+"body[class|dir<ltr?rtl|id|lang|link|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onload|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|onunload|style|title],"
	+"br[class|id|style|title],"
	+"button[accesskey|class|dir<ltr?rtl|disabled<disabled|id|lang|name|onblur"
	  +"|onclick|ondblclick|onfocus|onkeydown|onkeypress|onkeyup|onmousedown"
	  +"|onmousemove|onmouseout|onmouseover|onmouseup|style|tabindex|title|type"
	  +"|value],"
	+"caption[class|dir<ltr?rtl|id|lang|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"cite[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"code[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"col[align<center?char?justify?left?right|char|charoff|class|dir<ltr?rtl|id"
	  +"|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown"
	  +"|onmousemove|onmouseout|onmouseover|onmouseup|span|style|title"
	  +"|valign<baseline?bottom?middle?top|width],"
	+"colgroup[align<center?char?justify?left?right|char|charoff|class|dir<ltr?rtl"
	  +"|id|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown"
	  +"|onmousemove|onmouseout|onmouseover|onmouseup|span|style|title"
	  +"|valign<baseline?bottom?middle?top|width],"
	+"dd[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style|title],"
	+"del[cite|class|datetime|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown"
	  +"|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover"
	  +"|onmouseup|style|title],"
	+"dfn[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"dir[class|compact<compact|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown"
	  +"|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover"
	  +"|onmouseup|style|title],"
	+"div[class|dir<ltr?rtl|id|lang|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"dl[class|compact<compact|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown"
	  +"|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover"
	  +"|onmouseup|style|title],"
	+"dt[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style|title],"
	+"em/i[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"embed[height|src|type|width|class|contenteditable|contextmenu|dir|draggable|id|irrelevant|lang"
	+"|ref|registrationmark|tabindex|template|title|onabort|onbeforeunload|onblur|onchange|onclick|oncontextmenu"
	+"|ondblclick|ondrag|ondragend|ondragenter|ondragleave|ondragover|ondragstart|ondrop|onerror|onfocus|onkeydown"
	+"|onkeypress|onkeyup|onload|onmessage|onmousedown|onmousemove|onmouseover|onmouseout|onmouseup|onmousewheel|onresize"
	+"|onscroll|onselect|onsubmit|onunload],"
	+"fieldset[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"form[accept|accept-charset|action|class|dir<ltr?rtl|enctype|id|lang"
	  +"|method<get?post|name|onclick|ondblclick|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|onreset|onsubmit"
	  +"|style|title],"
	+"h1[class|dir<ltr?rtl|id|lang|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"h2[class|dir<ltr?rtl|id|lang|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"h3[class|dir<ltr?rtl|id|lang|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"h4[class|dir<ltr?rtl|id|lang|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"h5[class|dir<ltr?rtl|id|lang|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"h6[class|dir<ltr?rtl|id|lang|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"head[dir<ltr?rtl|lang|profile],"
	+"html[dir<ltr?rtl|lang|version],"
	+"img[alt=''|class|dir<ltr?rtl|height"
	  +"|id|ismap<ismap|lang|longdesc|name|onclick|ondblclick|onkeydown"
	  +"|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover"
	  +"|onmouseup|src|style|title|usemap|width],"
	+"input[accept|accesskey|alt"
	  +"|checked<checked|class|dir<ltr?rtl|disabled<disabled|id|ismap<ismap|lang"
	  +"|maxlength|name|onblur|onclick|ondblclick|onfocus|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|onselect"
	  +"|readonly<readonly|size|src|style|tabindex|title"
	  +"|type<button?checkbox?file?hidden?image?password?radio?reset?submit?text"
	  +"|usemap|value],"
	+"ins[cite|class|datetime|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown"
	  +"|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover"
	  +"|onmouseup|style|title],"
	+"isindex[class|dir<ltr?rtl|id|lang|prompt|style|title],"
	+"kbd[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"label[accesskey|class|dir<ltr?rtl|for|id|lang|onblur|onclick|ondblclick"
	  +"|onfocus|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout"
	  +"|onmouseover|onmouseup|style|title],"
	+"legend[accesskey|class|dir<ltr?rtl|id|lang"
	  +"|onclick|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"li[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style|title|type"
	  +"|value],"
	+"link[charset|class|dir<ltr?rtl|href|hreflang|id|lang|media|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|rel|rev|style|title|type],"
	+"map[class|dir<ltr?rtl|id|lang|name|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"meta[content|dir<ltr?rtl|http-equiv|lang|name|scheme],"
	+"noscript[class|dir<ltr?rtl|id|lang|style|title],"
	+"object[archive|class|classid"
	  +"|codebase|codetype|data|declare|dir<ltr?rtl|height|id|lang|name"
	  +"|onclick|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|standby|style|tabindex|title|type|usemap"
	  +"|width],"
	+"ol[class|compact<compact|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown"
	  +"|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover"
	  +"|onmouseup|start|style|title|type],"
	+"optgroup[class|dir<ltr?rtl|disabled<disabled|id|label|lang|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"option[class|dir<ltr?rtl|disabled<disabled|id|label|lang|onclick|ondblclick"
	  +"|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout"
	  +"|onmouseover|onmouseup|selected<selected|style|title|value],"
	+"-p[class|dir<ltr?rtl|id|lang|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|style|title],"
	+"param[id|name|type|value|valuetype<DATA?OBJECT?REF],"
	+"pre/listing/plaintext/xmp[align|class|dir<ltr?rtl|id|lang|onclick|ondblclick"
	  +"|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout"
	  +"|onmouseover|onmouseup|style|title|width],"
	+"q[cite|class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"s[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style|title],"
	+"samp[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"script[charset|defer|language|src|type],"
	+"select[class|dir<ltr?rtl|disabled<disabled|id|lang|multiple<multiple|name"
	  +"|onblur|onchange|onclick|ondblclick|onfocus|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|size|style"
	  +"|tabindex|title],"
	+"small[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"span[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown"
	  +"|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover"
	  +"|onmouseup|style|title],"
	+"strike[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown"
	  +"|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover"
	  +"|onmouseup|style|title],"
	+"strong/b[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"style[dir<ltr?rtl|lang|media|title|type],"
	+"sub[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"sup[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title],"
	+"table[bgcolor|border|cellpadding|cellspacing|class"
	  +"|dir<ltr?rtl|frame|height|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|rules"
	  +"|style|summary|title|width],"
	+"tbody[char|class|charoff|dir<ltr?rtl|id"
	  +"|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown"
	  +"|onmousemove|onmouseout|onmouseover|onmouseup|style|title"
	  +"|valign<baseline?bottom?middle?top],"
	+"td[abbr|axis|bgcolor|char|charoff|class"
	  +"|colspan|dir<ltr?rtl|headers|height|id|lang|nowrap<nowrap|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|rowspan|scope<col?colgroup?row?rowgroup"
	  +"|style|title|valign<baseline?bottom?middle?top|width],"
	+"textarea[accesskey|class|cols|dir<ltr?rtl|disabled<disabled|id|lang|name"
	  +"|onblur|onclick|ondblclick|onfocus|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|onselect"
	  +"|readonly<readonly|rows|style|tabindex|title],"
	+"tfoot[char|charoff|class|dir<ltr?rtl|id"
	  +"|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown"
	  +"|onmousemove|onmouseout|onmouseover|onmouseup|style|title"
	  +"|valign<baseline?bottom?middle?top],"
	+"th[abbr|axis|bgcolor|char|charoff|class"
	  +"|colspan|dir<ltr?rtl|headers|height|id|lang|nowrap<nowrap|onclick"
	  +"|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown|onmousemove"
	  +"|onmouseout|onmouseover|onmouseup|rowspan|scope<col?colgroup?row?rowgroup"
	  +"|style|title|valign<baseline?bottom?middle?top|width],"
	+"thead[char|charoff|class|dir<ltr?rtl|id"
	  +"|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup|onmousedown"
	  +"|onmousemove|onmouseout|onmouseover|onmouseup|style|title"
	  +"|valign<baseline?bottom?middle?top],"
	+"title[dir<ltr?rtl|lang],"
	+"tr[abbr|bgcolor|char|charoff|class"
	  +"|rowspan|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title|valign<baseline?bottom?middle?top],"
	+"tt[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style|title],"
	+"u[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress|onkeyup"
	  +"|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style|title],"
	+"ul[class|compact<compact|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown"
	  +"|onkeypress|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover"
	  +"|onmouseup|style|title|type],"
	+"var[class|dir<ltr?rtl|id|lang|onclick|ondblclick|onkeydown|onkeypress"
	  +"|onkeyup|onmousedown|onmousemove|onmouseout|onmouseover|onmouseup|style"
	  +"|title]"
});
</script>
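As the white list above shows, the valid_elements string is a comma-separated list of rules, each made of an element name followed by its permitted attributes in square brackets, separated by pipes. If you find the long concatenated string hard to maintain, here is a minimal sketch that builds it from a plain object instead. The helper name buildValidElements and the sample rules are hypothetical and not part of TinyMCE.

```javascript
// Hypothetical helper (not part of TinyMCE): joins a rules object into
// the "element[attr|attr],element[attr]" string format shown above.
function buildValidElements(rules) {
	var parts = [];
	for (var element in rules) {
		if (Object.prototype.hasOwnProperty.call(rules, element)) {
			parts.push(element + "[" + rules[element].join("|") + "]");
		}
	}
	return parts.join(",");
}

// Sample rules for illustration only; a real white list is much longer.
var sampleRules = {
	"a": ["href", "title"],
	"strong/b": ["class", "title"],
	"-p": ["class", "style"]
};

// The resulting string can be passed as the valid_elements parameter.
var validElements = buildValidElements(sampleRules);
```

Keeping one element per object key makes it easier to spot duplicated or misspelled attributes than scanning a single long string.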

For you XHTML 1.0 Strict evangelists, I did include embed in the white list. Technically, embed is not included in the XHTML 1.0 Strict DTD. However, embed is necessary to provide support for older browsers. Also, embed is part of the HTML5 specification. We now have a fully functional TinyMCE editor that enforces XHTML 1.0 Strict markup (with the one exception). However, we still have four remaining objectives.

Objective 2: Remove unused classes from markup

It is recommended that you provide a CSS file to style the contents of the TinyMCE editor. By styling the contents of the TinyMCE editor, the user will see what the final styled code will look like while using TinyMCE. We tell TinyMCE to use our CSS file by specifying the content_css parameter during TinyMCE initialization. This code looks like this:

<script type="text/javascript">	
tinyMCE.init({
	mode:"textareas",
	theme:"advanced",
	doctype:"<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' "
	+ "'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'>",
	valid_elements:"omitted for brevity",
	content_css:"styles.css"
});
</script>

NOTE: the valid_elements value is omitted for the sake of brevity. Be sure you still specify the valid elements in your own code! The value of content_css is a path to your CSS file relative to the current HTML file. Next, we tell TinyMCE to remove all unused classes from the client's markup that are not present in our CSS file. This code looks like this:

<script type="text/javascript">	
tinyMCE.init({
	mode:"textareas",
	theme:"advanced",
	doctype:"<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' "
	+ "'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'>",
	valid_elements:"omitted for brevity",
	content_css:"styles.css",
	verify_css_classes:true
});
</script>

For example, if a client provided this markup:

<div class="myClass">...</div>

And our CSS file did NOT include the CSS class selector .myClass{}, then TinyMCE would remove the unused class and output this code:

<div>...</div>
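To make the behavior concrete, here is a rough sketch of the same idea in plain JavaScript. This is not TinyMCE's implementation; the function name stripUnknownClasses and the list of allowed class names are assumptions made for illustration, and the regular expression only handles double-quoted class attributes.

```javascript
// Hypothetical illustration (not TinyMCE code): keep only the class names
// that appear in an allowed list, and drop the attribute if none remain.
function stripUnknownClasses(html, allowed) {
	return html.replace(/\sclass="([^"]*)"/g, function (match, names) {
		var kept = names.split(/\s+/).filter(function (name) {
			return name !== "" && allowed.indexOf(name) !== -1;
		});
		return kept.length ? ' class="' + kept.join(" ") + '"' : "";
	});
}

// Assume styles.css only defines the selector .intro{}
var allowedClasses = ["intro"];

var cleanedDiv = stripUnknownClasses('<div class="myClass">...</div>', allowedClasses);
var cleanedParagraph = stripUnknownClasses('<p class="intro myClass">x</p>', allowedClasses);
```

With these inputs, cleanedDiv loses its class attribute entirely, while cleanedParagraph keeps only the intro class.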

Objective 3: Remove empty HTML elements

We will now remove empty p, em, and strong elements from the TinyMCE editor with the cleanup_callback parameter. This parameter's value is the name of a custom JavaScript function. Let's call this function myCustomCleanup and define it now.

function myCustomCleanup(type,value){}

This function accepts two parameters: the type of callback (ignored in this tutorial, but you can read more about this parameter on the TinyMCE Wiki), and the final HTML markup of the TinyMCE editor. Let's further define the implementation of the myCustomCleanup function.

function myCustomCleanup(type,value){
	var value = value + ""; //Ensure value is a string
	return value.replace(/<(p|em|strong)(>|[^>]*>)(\s)*<\/\1>/ig,"");
}

This function uses a Regular Expression to remove all empty p, em, and strong elements from the TinyMCE editor's markup. We tell TinyMCE to call our custom cleanup method during TinyMCE initialization. Our new TinyMCE initialization code looks like this:

<script type="text/javascript">	
tinyMCE.init({
	mode:"textareas",
	theme:"advanced",
	doctype:"<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' "
	+ "'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'>",
	valid_elements:"omitted for brevity",
	content_css:"styles.css",
	verify_css_classes:true,
	cleanup_callback : "myCustomCleanup"
});
</script>
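One limitation worth knowing: a single pass leaves behind a wrapper that only becomes empty after its child is removed, such as a p that contains nothing but an empty em. Here is a minimal sketch of one way to handle that by repeating the replacement until nothing changes. The function name stripEmptyElements is hypothetical, and the pattern is a slightly stricter variant of the one above, not the exact expression used in this tutorial.

```javascript
// Hypothetical variant: remove empty p, em, and strong elements, repeating
// until the markup stops changing so nested empty elements are caught too.
function stripEmptyElements(html) {
	var pattern = /<(p|em|strong)(\s[^>]*)?>\s*<\/\1>/gi;
	var previous;
	do {
		previous = html;
		html = html.replace(pattern, "");
	} while (html !== previous);
	return html;
}

var flatSample = stripEmptyElements(
	'<p>Hello</p><p> </p><em></em><strong class="x"></strong><div></div>'
);
var nestedSample = stripEmptyElements("<p><em> </em></p>");
```

Here flatSample keeps the paragraph with text and the empty div, and nestedSample ends up as an empty string because the p is removed on the second pass.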

Objective 4: Remove Microsoft Meta-data

From my own experience, most Microsoft Word and Microsoft Office meta-data is contained within HTML comments. We will define a new function to remove HTML comments from the TinyMCE editor markup. This function looks like this:

//Citation: http://www.faqts.com/knowledge_base/view.phtml/aid/21761/fid/53
function removeHtmlComments(source){
	var html = source + ""; //Ensure source is a string
	var regX = /<(?:!(?:--[\s\S]*?--\s*)?(>)\s*|(?:script|style|SCRIPT|STYLE)[\s\S]*?<\/(?:script|style|SCRIPT|STYLE)>)/g;
	return html.replace(regX, function(m,$1){ return $1?'':m; });
}

This method accepts the final HTML markup from the TinyMCE editor and removes all single and multi-line HTML comments except those within script and style elements. We now add this function to our myCustomCleanup method so it is called by TinyMCE. Our new myCustomCleanup method looks like this:

function myCustomCleanup(type,value){
	var value = value + ""; //Ensure value is a string		
	value = value.replace(/<(p|em|strong)(>|[^>]*>)(\s)*<\/\1>/ig,"");
	return removeHtmlComments(value);
}

Note: You can force TinyMCE to run the cleanup_callback function at any time by clicking the Broom icon in the TinyMCE editor toolbar.
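If you want to check the comment removal outside the editor, a small script like the following can be run in any JavaScript console. The pasted markup is a made-up sample for illustration; real Word output is considerably longer.

```javascript
// Same approach as removeHtmlComments above: strip HTML comments,
// but leave script and style elements (and comments inside them) untouched.
function removeHtmlComments(source) {
	var html = source + ""; // Ensure source is a string
	var regX = /<(?:!(?:--[\s\S]*?--\s*)?(>)\s*|(?:script|style|SCRIPT|STYLE)[\s\S]*?<\/(?:script|style|SCRIPT|STYLE)>)/g;
	return html.replace(regX, function (match, commentEnd) {
		// commentEnd is only captured when a comment matched
		return commentEnd ? "" : match;
	});
}

// Made-up sample of Word-style conditional comment between two paragraphs
var pasted = "<p>Hi</p><!--[if gte mso 9]><xml>...</xml><![endif]--><p>There</p>";
var cleanedPaste = removeHtmlComments(pasted);

// A style element whose contents are wrapped in a comment should survive
var styled = "<style><!-- p{color:red} --></style>";
var cleanedStyle = removeHtmlComments(styled);
```

In this check, cleanedPaste contains only the two paragraphs, and cleanedStyle is identical to its input.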

Objective 5: Encode HTML Entities

Last, we ensure characters like <, >, and & are encoded into their HTML entity equivalents. For example, & will become &amp;amp;. To do this, we specify the entity_encoding parameter during TinyMCE initialization. Our TinyMCE initialization code looks like this:

<script type="text/javascript">	
tinyMCE.init({
	mode:"textareas",
	theme:"advanced",
	doctype:"<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' "
	+ "'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'>",
	valid_elements:"omitted for brevity",
	content_css:"styles.css",
	verify_css_classes:true,
	cleanup_callback : "myCustomCleanup",
	entity_encoding : "named"
});
</script>
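TinyMCE handles the encoding internally. Purely to illustrate the result for the three characters named in this objective, here is a small sketch; the function name encodeEntities is hypothetical and this is not TinyMCE's code.

```javascript
// Hypothetical illustration: encode &, <, and > as named HTML entities.
// The ampersand must be replaced first, otherwise the ampersands
// introduced by the other two replacements would be encoded again.
function encodeEntities(text) {
	return String(text)
		.replace(/&/g, "&amp;")
		.replace(/</g, "&lt;")
		.replace(/>/g, "&gt;");
}

var encoded = encodeEntities("Fish & Chips <b>");
```

After this runs, encoded holds the text with all three characters replaced by their entity equivalents.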

The Final Result

We now have a TinyMCE installation that produces valid XHTML 1.0 Strict markup regardless of client input. It also removes unused classes from markup, deletes empty HTML elements, strips Microsoft meta-data, and encodes HTML entities! I hope this tutorial gets you moving in the right direction. This tutorial is not meant to be a final implementation. I am still tweaking this implementation to produce even better results. If you see room for improvement, kindly let me know by posting a comment. I'd love to hear your feedback.

Download a ZIP archive containing the final code

HTML Forms: The Right Way(s)

31 Comment(s) | Posted | by Josh Lockhart | |

I recently stumbled upon a Reddit discussion thread on the proper way to write an HTML form. First, there is no one right way to code an HTML form. Well, I should say there is no one right way to code an HTML form assuming you are using semantic and accessible markup. Here are several methods I use (or have seen others use) when writing HTML forms.

Method 1: Semantic Nirvana

<form>
	<fieldset>
		<legend>Legend Name</legend>
		<label for="name">Name</label>
		<input type="text" name="name" id="name" />
	</fieldset>
</form>

This example is self-explanatory and employs the recommended form elements provided by HTML. The form element contains a fieldset with a legend. The fieldset contains a label and an associated input field. This method is the purest state of a semantic HTML form, and I encourage you to implement this structure whenever possible. However, if your design requires more structural flexibility, I encourage you to explore the definition list approach.

Method 2: Definition List

<form>
	<fieldset>
		<dl>
			<dt><label for="name">Name</label></dt>
			<dd><input type="text" name="name" id="name" /></dd>
		</dl>
	</fieldset>
</form>

This example encapsulates the form and its elements within the context of a definition list. One could argue the semantic accuracy of this approach, but I believe there is some semantic truth to this scenario. A definition term (the "Name") is defined by the user input. This method also remains accessible to assistive technologies by providing a label for the associated input field. Labels allow assistive technologies to associate the label text with a form input field. Labels also provide a larger, clickable "focus" area for a form field; when a user clicks on a label, the associated form input field is focused. This method also degrades nicely in text-only web browsers or hand-held devices.
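A label is associated with a field when the label's for attribute matches the field's id attribute, so a label without a matching id loses both benefits. Here is a rough sketch of a check for that. The function name findUnassociatedLabels is hypothetical, and the regular expressions are a shortcut that assumes double-quoted attributes; they are not a real HTML parser.

```javascript
// Hypothetical check: list every label "for" value that has no form
// field with a matching id in the same markup string.
function findUnassociatedLabels(html) {
	var idPattern = /<(?:input|select|textarea)\b[^>]*\bid="([^"]*)"/gi;
	var forPattern = /<label\b[^>]*\bfor="([^"]*)"/gi;
	var ids = {};
	var missing = [];
	var match;

	// Collect the id of every form field
	while ((match = idPattern.exec(html)) !== null) {
		ids[match[1]] = true;
	}
	// Report labels that point at an id we did not find
	while ((match = forPattern.exec(html)) !== null) {
		if (!ids[match[1]]) {
			missing.push(match[1]);
		}
	}
	return missing;
}

var withoutId = findUnassociatedLabels(
	'<label for="name">Name</label><input type="text" name="name" />'
);
var withId = findUnassociatedLabels(
	'<label for="name">Name</label><input type="text" name="name" id="name" />'
);
```

In this example withoutId reports the name label as unassociated, while withId comes back empty.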

I often use this method when I create forms with labels to the left of their associated input fields, similar to the Sign Up form on Campaign Monitor's website (see Figure A). The DT elements are floated left and assigned a common width. The DD fields are floated right and assigned a common width. The DT fields will clear any previous DT or DD elements using CSS.

Campaign Monitor's Sign Up Form

Figure A

One might consider ordered or unordered lists to accomplish this same structure. However, such an approach would require the markup developer to attach custom class meta information to each LI element to indicate which LI element contains the label and which the form input field. I believe the definition list provides more accurate semantics without additional meta information.

Overall, I prefer this method when I require semantics and design flexibility. This method provides plenty of elements to work with out of the box: form, dl, dt, dd, label, and input. CSS developers will have plenty to work with without extraneous markup.

Method 3: Paragraphs galore!

<form>
	<fieldset>
		<p>
			<label for="name">Name</label><br />
			<input type="text" name="name" id="name" />
		</p>
	</fieldset>
</form>

I mention this method only because I see many developers abuse it. This method encapsulates a label and its input field within a paragraph. The result is a pleasant visual separation when rendered in most desktop web browsers like Firefox, Safari, or Internet Explorer. However, the good ends there. This method is semantically flawed; a form and its elements are not a series of paragraphs. Also, one should not mix markup with presentation (as this method does). After all, not everyone uses visual web browsers (Google bots and screen readers come to mind).

Method 4: What about columns?

Many HTML forms are presented in a column layout. This is an example where aesthetics take precedence over semantics. Which is fine. Semantics are only rules, and rules can be broken, right? In this case, I encourage you to at least try to maintain some form of semantic value.

Method 4a: Use DIVs with appropriate class meta data

One method to implement a column layout is to use DIVs.

<form>
	<fieldset>
		<legend>Legend Name</legend>
		<div class="left-column">
			<label for="name">Name</label>
			<input type="text" name="name" id="name" />
		</div>
		<div class="middle-column">
			<label for="email">Email</label>
			<input type="text" name="email" id="email" />
		</div>
		<div class="right-column">
			<label for="phone">Phone</label>
			<input type="text" name="phone" id="phone" />
		</div>
	</fieldset>
</form>

Each DIV may be floated left to simulate columns, and the class attributes convey the intended semantics.

Method 4b: Tables: Those are bad, right?

Tables are great, assuming you use them to display tabular data. Tables were given a bad reputation when print designers first started developing websites in the late 90s and early 2000s. Print designers used tables for lack of better layout tools.

If you have no other option, one might argue that a form could be a table (try not to smirk when you explain this to your resident web standards evangelist). Such a form may look like this:

<form>
	<fieldset>
		<table border="0">
			<thead>
				<tr>
					<th>Column 1</th>
					<th>Column 2</th>
					<th>Column 3</th>
				</tr>
			</thead>
			<tbody>
				<tr>
					<td>
						<label for="one">One</label>
						<input type="text" name="one" id="one" />
					</td>
					<td>
						<label for="two">Two</label>
						<input type="text" name="two" id="two" />
					</td>
					<td>
						<label for="three">Three</label>
						<input type="text" name="three" id="three" />
					</td>
				</tr>
			</tbody>
		</table>
	</fieldset>
</form>

At this point, the form markup is becoming convoluted and less semantic. Consider this method a last resort, and only if you require a form with multiple columns.

So which method do I choose?

There are many more ways to write HTML forms. As I said before, there is no one right way to write HTML forms. If your form is structured with semantic and accessible markup, you can't go wrong. For further reading on this subject, I strongly encourage you to pick up a copy of "Web Standards Solutions: The Markup and Style Handbook" by Dan Cederholm (Apress, 2004; ISBN 1590593812).