Readme.txt or readme.html?

7 08 2007

I’ve suddenly found myself pondering whether to use plain text or simple HTML for the “readme” files, licence texts and other such simple documents in my “ObMimic” software product.

Traditionally, such documents are delivered as plain text files, and this still seems to be how it’s usually done. In the past, this was the safest way to ensure that these files would be readable on all systems, whatever tools the user might or might not have. But I’m not sure that’s still the best approach.

Maybe it’s time to start using simple HTML for these files. HTML and browsers are now so ubiquitous that it’s hard to imagine not being able to view HTML files (at least, provided the HTML is kept clean and simple, without JavaScript or anything fancy).

I guess there could be mobile and embedded environments where plain text might still be necessary or preferable, but I’d kind of expect to know when this is going to be relevant, and even then I’m not sure HTML for these particular files would necessarily be a problem.

So, what are the main pros and cons?

  • First of all, even plain text isn’t without its problems, the most obvious of which is line endings: for anything cross-platform, which line endings do you use? I’m forever encountering “readme” files under MS Windows that have UNIX-style line endings and need to be opened in something that sorts this out. OK, it’s not a big problem, but with HTML this just goes away – no decision to make, no build-time adjustments to make, no worries about how it will really look on the other platforms.
  • Similarly, for file extensions “.html” seems a more reliable cross-platform bet than “.txt” or the absence of an extension.
  • I’d also expect HTML to be preferable for accessibility, character-set issues, and general user control over how the text is displayed (e.g. font size).
  • More generally, even a simple HTML page usually looks better than a text file.
  • In addition, using HTML means you can have real links to other documentation, web sites etc. (even if these didn’t work for some reason, you’d be no worse off than if you just had the plain text).
  • The risk with HTML is that if an organization starts using HTML files for this, they will let them get progressively more and more complex until they break and you can’t read them (which could be as silly as black text on a black background – it does happen, I used to regularly get a marketing e-mail of this kind from my former ISP!). If everybody starts doing this, you can guarantee that someday you’ll encounter a “readme” file that you can’t read without hacking.
  • It’s conventional for these files to be plain text, so to at least some extent that’s what everyone expects.
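For comparison, the “build-time adjustments” that plain text needs are small but real. Here is a minimal sketch in Python of what such a step might look like; the function names and the choice of CRLF as the default target are assumptions for illustration, not part of any actual ObMimic build.

```python
def normalize_line_endings(text: str, newline: str = "\r\n") -> str:
    """Return text with every line ending replaced by `newline`."""
    # Collapse CRLF and lone CR to LF first, so mixed input is handled uniformly.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return unified.replace("\n", newline)


def convert_file(source_path: str, target_path: str, newline: str = "\r\n") -> None:
    """Write a copy of a text file with the requested line endings."""
    # newline="" stops Python from translating line endings on read or write,
    # so the only conversion applied is the explicit one above.
    with open(source_path, "r", encoding="utf-8", newline="") as src:
        text = src.read()
    with open(target_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(normalize_line_endings(text, newline))
```

A build would call something like convert_file once per target platform, which is exactly the kind of per-platform decision that an HTML readme avoids.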

There doesn’t seem to be any killer reason to decide this either way. On the whole I’m inclined to switch to using HTML for these files so as to avoid the “line endings” problem and provide slightly better-looking and more “accessible” content, despite plain text files being more conventional.

Now, I’m not advocating that we all fire up Dreamweaver and the like and start churning out lots of fancy “readme” files with all sorts of JavaScript and animations and AJAX stuff and clever CSS hacks. If using HTML for these files, it needs to be kept as simple and safe as possible – basically it should just have the minimal tags necessary to turn it into reasonable, well-formed HTML, with nothing that could possibly render it unreadable. Any organisation without the discipline or quality control to do this should stick to plain text files, but I don’t see that as a reason for me not to do it. At the risk, of course, of looking like an idiot if I ever get it badly wrong – but that’s true of pretty much everything!

Personally I’d want to stick to hand-written, basic HTML with just headings, text, simple lists and some simple links – the same kind of thing as used within Javadoc comments. And I’d want to keep it XHTML-compliant, with validation as part of the build process. (Ideally this maybe needs an explicitly defined, guaranteed “safe” subset of XHTML – something else to ponder…) The HTML would probably be accompanied by an optional stylesheet, also kept as simple and safe as possible.
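As a rough illustration of what a build-time check for a “safe” subset could look like, here is a minimal sketch in Python. The list of allowed elements and attributes is purely an assumption made up for this example (no such standard subset exists as far as I know), and the function names are hypothetical. It only checks well-formedness and a whitelist; it does no DTD validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical "safe" subset: an assumption for illustration, not a standard.
ALLOWED_TAGS = {
    "html", "head", "title", "body",
    "h1", "h2", "h3", "p", "ul", "ol", "li", "a", "em", "strong", "code", "pre",
}
ALLOWED_ATTRIBUTES = {"a": {"href"}}


def local_name(name: str) -> str:
    # ElementTree reports namespaced names as "{namespace}name".
    return name.rsplit("}", 1)[-1]


def check_readme(xhtml: str) -> list[str]:
    """Return a list of problems; an empty list means the document passed."""
    try:
        root = ET.fromstring(xhtml)
    except ET.ParseError as error:
        return [f"not well-formed: {error}"]

    problems = []
    for element in root.iter():  # visits elements in document order
        name = local_name(element.tag)
        if name not in ALLOWED_TAGS:
            problems.append(f"disallowed element <{name}>")
            continue
        allowed = ALLOWED_ATTRIBUTES.get(name, set())
        for attribute in element.attrib:
            if local_name(attribute) not in allowed:
                problems.append(f"disallowed attribute {attribute!r} on <{name}>")
    return problems
```

The build would fail whenever check_readme returns a non-empty list, so a stray script element or inline event handler never reaches a release.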

Or are there any good arguments for sticking to plain text files? Can anyone see any other pros and cons?



3 responses

7 08 2007

I would prefer README.html to readme.html.

You may want to look at some other projects using HTML for their READMEs:
Mozilla SpiderMonkey
W3C libwww
W3C CSS validator

8 08 2007

Why not include both of them? Sometimes you are in a command line interface and need to read how to install the application, for example.

You can write your .html file, test it in a browser, then copy the text and paste it into a .txt file.

8 08 2007

Thanks for the comments – some good examples and ideas.

I hadn’t spotted the “command line” scenario, and does have both html and txt for both “readme” and “licence”. Thought that might look odd or confusing in a directory list, but it looks ok when there aren’t many other files.

So I might go with “both”, provided I can easily automate production of the text from the html during build (maybe a quick XSLT, given that the html will be very simple/limited/clean).
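To give an idea of how small that build step could be, here is a minimal sketch. It uses Python’s standard html.parser rather than the XSLT mentioned above, and the class name, function name and the set of block-level tags are all assumptions chosen for this example; it only copes with the very simple HTML described in the post.

```python
from html.parser import HTMLParser

# Tags that start and end a line of output (an assumed, minimal set).
BLOCK_TAGS = {"title", "h1", "h2", "h3", "p", "ul", "ol", "li"}


class ReadmeTextExtractor(HTMLParser):
    """Collects one line of plain text per block-level element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []
        self.current = []
        self.href = None

    def flush(self):
        # Collapse runs of whitespace, as a browser would when rendering.
        text = " ".join("".join(self.current).split())
        if text:
            self.lines.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
        if tag == "li":
            self.current.append("* ")
        if tag == "a":
            self.href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a" and self.href:
            # Keep the link target visible in the text version.
            self.current.append(f" <{self.href}>")
            self.href = None
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        self.current.append(data)


def html_to_text(html: str, newline: str = "\n") -> str:
    """Convert very simple HTML into plain text, one block per line."""
    extractor = ReadmeTextExtractor()
    extractor.feed(html)
    extractor.close()
    extractor.flush()
    return newline.join(extractor.lines) + newline
```

Passing newline="\r\n" would produce a Windows-friendly text file from the same HTML source.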
