Converting bitmap images to plain HTML

July 14, 2012

(Note: this is a long story of why I created a tool to convert bitmaps to HTML and how it works. If you just want the tool, go to http://img2html.odednoam.com)

I recently started writing a web application that, in one of its use cases, sends an e-mail notification to its users. It’s been a while since I last wrote such an app, and I was glad to find that technology in 2012 is advanced enough that setting up the system that actually sends the e-mail takes 5 minutes (10 years ago it would have taken months, requiring you to re-learn the cryptic configuration files of “sendmail” and other delights).

Now, if there’s something I love about modern software development, it is that handling the plumbing is so easy that you actually have time left to do what you enjoy: making a product that works beautifully. And in this case, beauty is in being able to send a nicely formatted HTML e-mail to the users of my app.

Of course, to have a really nice e-mail I wanted to have the app’s logo embedded in the text. That’s a bit of a problem: images are commonly blocked in HTML e-mails. Before mail apps blocked them, they were a common source of annoyances, allowing spammers to track active e-mail addresses (by referencing an image hosted on the spammer’s server) and viruses to exploit bugs in the image display code to infect PCs and spread themselves. But for a small logo, representing its bitmap as a table with colored cells should be quite easy.

A quick web search brought up several tools to do that. A site called patternmedia.com has a nice tool that does just that; it creates a table in which each cell represents a pixel of the original. Neil Fraser took it one step further and implemented RLE (run-length encoding) compression to make the resulting HTML a bit smaller. Fraser had the right idea: the size of the HTML is a big issue (pun intended). By converting to plain HTML, you not only give up the compression techniques used in image file formats, but you also represent the image in a very inefficient format made for text. A 1 KB image file could easily become a 100-200 KB HTML table. And e-mail messages are expected to be very small. For example, mobile mail clients commonly do not download a message if it is over 30 KB or so. Even Gmail, by default, does not display an e-mail message larger than 128 KB.

Without compression, each table cell represents one pixel. The code example draws only 8 pixels.
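
To make the uncompressed layout concrete, here is a minimal Python sketch (it is not the code from the original figure, nor the tool's actual code). It assumes the image has already been decoded into a list of rows of “#rrggbb” strings; the function name pixels_to_table and the exact table attributes are illustrative choices.

```python
def pixels_to_table(pixels):
    """Build an HTML table with one 1x1 cell per pixel.

    pixels: list of rows, each row a list of '#rrggbb' color strings
    (an assumed input format for this sketch).
    """
    rows = []
    for row in pixels:
        cells = "".join(
            f'<td width="1" height="1" bgcolor="{color}"></td>' for color in row
        )
        rows.append(f"<tr>{cells}</tr>")
    return (
        '<table cellspacing="0" cellpadding="0" border="0">'
        + "".join(rows)
        + "</table>"
    )


if __name__ == "__main__":
    # A 2x2 image: every pixel costs one full <td> element.
    print(pixels_to_table([["#ff0000", "#00ff00"], ["#0000ff", "#ffffff"]]))
```

Every pixel pays for a complete <td> element, which is where the size blow-up comes from.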

RLE compression, as Fraser noticed, works very well for tables and is really easy to implement. Simply put, if we have a series of pixels all the same color, we can write the color just once and tell the browser that we have a table cell spanning multiple columns (using the colspan=”…” attribute). This works extremely well for images that have little variation in color, and in which pixels of the same color tend to cluster together. Black text on a white background is the perfect example (and in fact, first-generation fax machines used a variant of this compression).

With RLE compression, each table cell represents a series of consecutive pixels in the same color. The code example draws a complete line of the original bitmap.
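
The horizontal run-length idea can be sketched in a few lines of Python. This is an illustration under assumptions, not Fraser's code or the tool's: the row is assumed to be a list of “#rrggbb” strings, row_to_rle_cells is a hypothetical name, and rendering details such as fixing cell widths and heights are left out.

```python
from itertools import groupby


def row_to_rle_cells(row):
    """Turn one row of pixel colors into <td> cells, one cell per run.

    Consecutive pixels of the same color collapse into a single cell
    whose colspan equals the length of the run.
    """
    cells = []
    for color, run in groupby(row):
        length = sum(1 for _ in run)
        # Omit colspan for single-pixel runs to save a few bytes.
        span = f' colspan="{length}"' if length > 1 else ""
        cells.append(f'<td{span} bgcolor="{color}"></td>')
    return "".join(cells)


if __name__ == "__main__":
    # Three white pixels followed by one black pixel become two cells.
    print(row_to_rle_cells(["#ffffff"] * 3 + ["#000000"]))
```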

In some images you could get better results by using RLE to merge a pixel with the same-colored pixels below it (rather than beside it). The HTML would be a bit less readable, as HTML tables are drawn row by row, but in code it’s just as easy to create.
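
A possible sketch of the vertical variant in Python, assuming the same list-of-rows input of “#rrggbb” strings (vertical_rle_table is a hypothetical name, and cell sizing is again left out). Each vertical run becomes one cell with a rowspan attribute, emitted in the table row where the run starts; the browser then skips the slots that are already occupied by cells spanning down from earlier rows.

```python
from itertools import groupby


def vertical_rle_table(pixels):
    """Build an HTML table where each cell covers a vertical run of pixels."""
    height, width = len(pixels), len(pixels[0])
    rows = [[] for _ in range(height)]
    # Walk columns left to right so the cells within each row stay in x order.
    for x in range(width):
        column = [pixels[y][x] for y in range(height)]
        y = 0
        for color, run in groupby(column):
            length = len(list(run))
            span = f' rowspan="{length}"' if length > 1 else ""
            # The cell belongs to the row where its run begins.
            rows[y].append(f'<td{span} bgcolor="{color}"></td>')
            y += length
    body = "".join(f"<tr>{''.join(cells)}</tr>" for cells in rows)
    return f'<table cellspacing="0" cellpadding="0" border="0">{body}</table>'


if __name__ == "__main__":
    # The left column is one two-pixel run; the right column has two colors.
    print(vertical_rle_table([["#ffffff", "#000000"], ["#ffffff", "#ff0000"]]))
```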

With this compression, images are much smaller, but often still too large to fit in an e-mail message, especially if they are in color or have larger dimensions (the logo I wanted to use is 79×139, which is more than 10,000 pixels). So I wrote a better compression in the form of “2D RLE”: find large rectangles of pixels of the same color, and use both “colspan” and “rowspan” to color each one with a single cell. Ideally, I’d like to find the minimum number of rectangles required to cover the entire image. I couldn’t find an algorithm that does that with reasonable complexity, but a simple heuristic search for large rectangles works pretty well.
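
One simple heuristic of this kind can be sketched as follows. It is an assumption for illustration, not necessarily the heuristic the tool uses: scan the image row by row, and from each uncovered pixel grow a rectangle first to the right and then downward for as long as the color matches. The names greedy_rectangles and rectangles_to_table are hypothetical, and the input is again assumed to be a list of rows of “#rrggbb” strings.

```python
def greedy_rectangles(pixels):
    """Cover the image with same-colored rectangles, greedily.

    Returns a list of (x, y, width, height, color) tuples in scan order.
    The result is a valid cover, but not necessarily a minimal one.
    """
    height, width = len(pixels), len(pixels[0])
    covered = [[False] * width for _ in range(height)]
    rects = []
    for y in range(height):
        for x in range(width):
            if covered[y][x]:
                continue
            color = pixels[y][x]
            # Grow to the right while the color matches and the pixel is free.
            w = 1
            while (x + w < width and not covered[y][x + w]
                   and pixels[y][x + w] == color):
                w += 1
            # Grow downward while the entire next row segment matches.
            h = 1
            while y + h < height and all(
                not covered[y + h][x + i] and pixels[y + h][x + i] == color
                for i in range(w)
            ):
                h += 1
            for dy in range(h):
                for dx in range(w):
                    covered[y + dy][x + dx] = True
            rects.append((x, y, w, h, color))
    return rects


def rectangles_to_table(rects, height):
    """Emit one <td> per rectangle, in the row where the rectangle starts."""
    rows = [[] for _ in range(height)]
    for x, y, w, h, color in rects:  # scan order keeps each row sorted by x
        attrs = (f' colspan="{w}"' if w > 1 else "") + \
                (f' rowspan="{h}"' if h > 1 else "")
        rows[y].append(f'<td{attrs} bgcolor="{color}"></td>')
    body = "".join(f"<tr>{''.join(cells)}</tr>" for cells in rows)
    return f'<table cellspacing="0" cellpadding="0" border="0">{body}</table>'


if __name__ == "__main__":
    image = [
        ["#ffffff", "#ffffff", "#000000"],
        ["#ffffff", "#ffffff", "#000000"],
        ["#ff0000", "#ff0000", "#000000"],
    ]
    print(rectangles_to_table(greedy_rectangles(image), len(image)))
```

Because the rectangles tile the image exactly, emitting each cell in the row where it starts is enough: the browser fills the remaining slots of later rows from the cells that span down into them. A row that is entirely covered from above still needs its own (empty) <tr>.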

The tool that does this conversion is available at http://img2html.odednoam.com. It supports:

  • Reading an image file and converting it to HTML
  • A choice of compression: Horizontal RLE, Vertical RLE and 2D RLE
  • Image transparency for GIF and PNG images
  • Some further compression techniques, such as using the most common color as the table background to avoid repeating it
  • A choice of whether to use CSS styles (better compression, but not supported by some mail clients, including Gmail)
  • A choice of whether to use strict HTML, or loose HTML which takes even less space
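
The background-color trick from the list above could look roughly like this in Python. It is a sketch under assumptions rather than the tool's code: the image is a list of rows of “#rrggbb” strings, the function names are hypothetical, and it is combined here with horizontal RLE only.

```python
from collections import Counter
from itertools import groupby


def most_common_color(pixels):
    """Return the color that appears in the largest number of pixels."""
    counts = Counter(color for row in pixels for color in row)
    return counts.most_common(1)[0][0]


def table_with_background(pixels):
    """Horizontal-RLE table that sets the dominant color once, on <table>.

    Cells in the dominant color carry no bgcolor attribute at all, so the
    table background shows through and the color is never repeated.
    """
    background = most_common_color(pixels)
    rows = []
    for row in pixels:
        cells = []
        for color, run in groupby(row):
            length = len(list(run))
            span = f' colspan="{length}"' if length > 1 else ""
            fill = "" if color == background else f' bgcolor="{color}"'
            cells.append(f"<td{span}{fill}></td>")
        rows.append(f"<tr>{''.join(cells)}</tr>")
    return (
        f'<table bgcolor="{background}" cellspacing="0" cellpadding="0">'
        + "".join(rows)
        + "</table>"
    )


if __name__ == "__main__":
    image = [
        ["#ffffff", "#ffffff", "#000000"],
        ["#ffffff", "#ffffff", "#ffffff"],
    ]
    print(table_with_background(image))
```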

Known issues (if any of the readers can suggest fixes, please let me know):

  • The algorithm used to find rectangles in 2D RLE is not optimal, so some images could possibly be compressed better.
  • Since compression doesn’t work well for images with too many different pixel colors, I added support for color reduction (quantization) taken from https://gist.github.com/1104622. Many desktop tools have better color reduction, so you might get better quality if you reduce colors on your PC using an image editing program rather than the internal tool.
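
To illustrate why color reduction helps, here is a deliberately crude stand-in for real quantization. It is not the algorithm from the gist mentioned above: it simply snaps each channel to a small number of evenly spaced levels, so that nearly identical colors become exactly identical and form longer runs. The name posterize, the (r, g, b) tuple input and the levels parameter are assumptions made for this sketch.

```python
def posterize(color, levels=4):
    """Snap each channel of an (r, g, b) color to one of `levels` values.

    levels must be at least 2; with levels=4 every channel ends up as
    0, 85, 170 or 255.
    """
    step = 255 / (levels - 1)
    return tuple(round(round(channel / step) * step) for channel in color)


def posterize_image(pixels, levels=4):
    """Apply posterize to every pixel of a list-of-rows image."""
    return [[posterize(color, levels) for color in row] for row in pixels]


if __name__ == "__main__":
    # Two slightly different near-white pixels collapse into the same color,
    # so an RLE pass afterwards can merge them into a single run.
    row = [(250, 251, 249), (254, 250, 252)]
    print(posterize_image([row]))
```

Dedicated quantizers choose a palette adapted to the image instead of a fixed grid, which is why an image editor will usually give better-looking results.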

Complete source code is available under the GPL at https://github.com/odednoam/image2html