File Formats Comparison
Graphic File Formats
by Paul Arnote
Graphic files come in all kinds of formats. For some people, it's difficult to know how each graphic file format performs, much less which format to use to save their work. Hopefully, this brief overview will help sort out which graphic format is most useful for your needs.
First of all, graphic file formats come in two different flavors: raster graphics and vector graphics. Raster-based graphics (also known as "bitmaps") describe an image pixel by pixel. That is, the image is divided into small "square" pixels, each representing a color in the image. When these pixels are viewed together at a normal viewing distance, they form the image. Vector-based graphics, on the other hand, use geometrical shapes (lines, polygons, circles, etc.) described by mathematical equations.
Because they store a fixed grid of pixels, raster-based graphics do not fare so well when they are scaled. Whenever you enlarge or shrink a raster-based image to a size different from the original, there will be some compromise in the quality of the final product. (By the way, they shrink much better than they enlarge.) If you enlarge a raster-based graphic too much, you will be able to see each pixel of the original, since each pixel gets enlarged. This "blocky" effect is called pixelation.
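To see why enlargement causes pixelation, here is a minimal Python sketch, assuming an image stored simply as a list of rows of grayscale pixel values (real formats add headers and compression). It enlarges the image by an integer factor using nearest-neighbor sampling, the simplest scaling method: each original pixel is copied into a square block of identical pixels, which is exactly the blocky effect described above. The function name `enlarge_nearest` is hypothetical.

```python
def enlarge_nearest(image, factor):
    """Enlarge a raster image (a list of rows) by an integer factor.

    Nearest-neighbor sampling: every output pixel copies the source pixel
    it falls inside, so each original pixel becomes a factor x factor block.
    """
    enlarged = []
    for row in image:
        # Repeat each pixel horizontally...
        wide_row = [pixel for pixel in row for _ in range(factor)]
        # ...then repeat the widened row vertically.
        for _ in range(factor):
            enlarged.append(list(wide_row))
    return enlarged


# A tiny 2x2 "image" of grayscale values (0 = black, 255 = white).
tiny = [
    [0, 255],
    [255, 0],
]

for row in enlarge_nearest(tiny, 3):
    print(row)
```

No new detail is created; the enlarged image just has bigger blocks. Smoother resampling methods (such as bilinear or bicubic interpolation) blur those blocks, but they cannot recover detail that was never stored.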
Vector-based graphics, however, are very scalable: they can be enlarged or shrunk without any significant loss of quality. This is most evident when you enlarge a vector-based graphic. Since the shapes are redrawn from their mathematical descriptions at whatever size you choose, your view will be just as clear at 1000% as it was at the original size.
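By contrast, a vector shape is stored as a description rather than as pixels. The Python sketch below is a simplified illustration, not how any particular vector format is implemented: it stores a circle as a center and a radius. Scaling just multiplies those numbers, and the shape is turned into pixels (rasterized) only at the final size by testing each pixel against the circle's equation, so the edge is always as sharp as the output grid allows. The `Circle` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Circle:
    """A vector shape: stored as numbers, not as pixels."""
    cx: float
    cy: float
    r: float

    def scaled(self, factor):
        # Scaling a vector shape is exact: just multiply its parameters.
        return Circle(self.cx * factor, self.cy * factor, self.r * factor)

    def rasterize(self, width, height):
        """Render to a grid of characters by testing each pixel center
        against the equation (x - cx)^2 + (y - cy)^2 <= r^2."""
        rows = []
        for y in range(height):
            row = ""
            for x in range(width):
                px, py = x + 0.5, y + 0.5  # center of this pixel
                inside = (px - self.cx) ** 2 + (py - self.cy) ** 2 <= self.r ** 2
                row += "#" if inside else "."
            rows.append(row)
        return rows


small = Circle(4, 4, 3)
large = small.scaled(4)  # 400%: still an exact circle, redrawn at full size

print("\n".join(small.rasterize(8, 8)))
print()
print("\n".join(large.rasterize(32, 32)))
```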
Before we go any further, this is not a debate (nor is it intended to start one) over which type of graphic, raster or vector, is best; both have their place in your graphics arsenal. In fact, artists frequently combine the two to produce a single graphic. But it is important to understand the differences between them so you can choose the one that best fits your needs.
So let's start off our discussion with raster graphic formats, which are probably the most widely used of the two graphic types.
Raster-based Graphic Formats
There is some terminology we must cover before getting into each specific file format. Many raster-based formats use compression to store the image information efficiently and reduce the file size. Some raster-based formats use no compression at all. Of course, with no compression, the file takes up a large amount of space, but it loses no quality from being saved again and again.
That compression can be either lossy or lossless. Lossy compression algorithms take advantage of the limitations of the human eye and discard "invisible" information. Most lossy algorithms offer variable compression levels, and at higher compression levels the loss of image quality becomes quite noticeable. These visible flaws are commonly called "compression artifacts." Saving a graphic file over and over again with lossy compression results in what is called "generational degradation," because the image is re-compressed, and loses a little more information, each and every time it is saved. Formats that use lossy compression value file size over picture quality.
Lossless compression, on the other hand, utilizes a compression method that does not diminish image quality. Because image quality is not sacrificed, the file size is not as small as you would get if you used a lossy compression method. When you value image quality over file size, lossless compression is the type of compression you should use.
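The difference between the two kinds of compression can be shown in a few lines of Python. This sketch uses the standard `zlib` module, which implements DEFLATE (a lossless method, and the one PNG uses), and, as a toy stand-in for a lossy method, simply rounds pixel values down to coarser steps. Real lossy codecs such as JPEG are far more sophisticated, but the key property is the same: the lossless round trip returns the exact original bytes, while the lossy step throws detail away for good. The pixel data and `lossy_quantize` are hypothetical.

```python
import zlib

# A hypothetical run of 8-bit grayscale pixel values.
pixels = bytes([12, 13, 13, 14, 200, 201, 201, 202] * 16)

# Lossless: compress and decompress with zlib (DEFLATE).
packed = zlib.compress(pixels, 9)
restored = zlib.decompress(packed)
print(len(pixels), "->", len(packed), "bytes; identical:", restored == pixels)


def lossy_quantize(data, step=8):
    """Toy lossy step: round each value down to a multiple of `step`.
    The discarded low-order detail can never be recovered."""
    return bytes((value // step) * step for value in data)


approx = lossy_quantize(pixels)
print("lossy identical:", approx == pixels)
print("first values:", list(pixels[:4]), "->", list(approx[:4]))
```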
Graphic files are also classified according to their "color depth," the number of bits used to store each pixel's color. An 8-bit graphic can reproduce a maximum of 256 colors, while a 24-bit graphic can reproduce about 16.7 million colors, commonly referred to as "true color."
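The color counts follow directly from the bit depth: n bits per pixel can encode 2^n distinct values. The same arithmetic gives the uncompressed size of an image's pixel data, which shows why compression matters so much. The 3000 × 2000 photo size and the function names below are just illustrative assumptions.

```python
def color_count(bits_per_pixel):
    """Number of distinct colors representable with the given bit depth."""
    return 2 ** bits_per_pixel


def raw_size_bytes(width, height, bits_per_pixel):
    """Uncompressed size of the pixel data alone (no headers or padding)."""
    return width * height * bits_per_pixel // 8


print(color_count(8))    # 256
print(color_count(24))   # 16777216, i.e. about 16.7 million ("true color")

# A hypothetical 3000 x 2000 (6-megapixel) photo at 24 bits per pixel:
size = raw_size_bytes(3000, 2000, 24)
print(size)              # 18000000 bytes, about 18 MB before any compression
print(size // 10)        # 1800000 bytes at a 10:1 compression ratio
```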
With a clearer understanding of the terminology involved, let's look at individual file formats. Note that we will only discuss some of the most popular image file formats here. There are many, many more out there, as well as some that are extensions of the ones we will discuss. Trying to cover every image file format out there would not only be exhausting, but would also require far more room than we have available in the PCLinuxOS magazine.
JPEG, JPG (Joint Photographic Experts Group)
The JPEG file format, standardized in 1992, is probably the most common and popular image file format today. JPEG is (mostly) a lossy compression format. You can find JPEG files all over the internet, and most popular digital cameras use the format because it lets more photos fit on a camera's limited storage. Typical JPEG files can compress images 10:1 with a minimal, largely unnoticeable loss of image quality.
The advantage of the JPEG file format is that it saves files in a minimal amount of space. At low levels of compression, the sacrifice in image quality is largely unnoticeable, while saving storage space at the same time. The JPEG file format is commonly referred to as a 24-bit graphic file, since it supports the reproduction of 16 million colors.
The disadvantage of the JPEG file format is that it suffers from generational degradation. That is, if you edit a JPEG file, re-save it, edit it again, re-save it, edit it again … and so on … each generation will be re-compressed, and more and more image quality will be sacrificed. To minimize this generational degradation, you can convert the JPEG image to a file format that uses lossless compression, make your edits, then convert back to the JPEG file format after all your editing is finished.
The JPEG file format also does not support transparency (as in transparent backgrounds), nor does it support animation.
GIF (Graphics Interchange Format)
The GIF (pronounced "jif," like the famous brand of peanut butter) file format was introduced to the computing world in 1987 by CompuServe. GIF has an 8-bit color depth, enabling it to reproduce a maximum of 256 colors.
In 1993, Unisys, which held the patent on the LZW compression algorithm, discovered that the GIF format used LZW without paying royalties for it. CompuServe had adopted LZW as GIF's compression algorithm without knowing that the patent existed. This resulted in an agreement, announced in late 1994, under which commercial online information services using LZW compression in the GIF format were required to license the technology from Unisys, the patent holder.
The result was outrage, and campaigns sprang up urging users to "burn the GIF." In fact, many websites of the time stopped using the GIF file format, their webmasters fearing they would have to spend serious money to license LZW. The controversy also spurred the creation of the PNG file format (see below), an open, patent-free alternative designed to avoid the licensing fees and restrictions placed on the GIF format.
Fortunately, the US patent on LZW expired in 2003, and in the rest of the world in 2004. As a result, the GIF file format may now be used freely.
The original GIF file format introduced in 1987 was called 87a. Two years later, CompuServe introduced 89a, an enhanced version that added support for animation delays, transparent background colors, and storage of application-specific metadata. It is this later version that is used to create the many animations we have all seen gracing web pages.
The GIF file format is best suited to sharp-edged line art with a limited number of colors. It is also an excellent choice for simple animations or low-resolution film clips. GIF's LZW compression is lossless.
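To make "lossless" concrete, here is a minimal sketch of the classic LZW algorithm in Python. It is simplified: it works on a byte string, emits plain integer codes, and leaves out the details GIF adds on top (variable-width codes, clear and end-of-information codes, and a starting dictionary sized to the image's palette). The decoder rebuilds the same dictionary from the codes alone, so the round trip reproduces the input exactly. The function names are hypothetical.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Classic LZW: replace repeated byte sequences with dictionary codes."""
    dictionary = {bytes([i]): i for i in range(256)}  # all single bytes
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                      # keep extending the match
        else:
            codes.append(dictionary[current])        # emit longest known sequence
            dictionary[candidate] = len(dictionary)  # learn the new sequence
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the same dictionary on the fly and invert the encoding."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            # Special case: the code names the entry being built right now.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        output.append(entry)
        dictionary[len(dictionary)] = previous + entry[:1]
        previous = entry
    return b"".join(output)


sample = b"ABABABABABABABAB" * 4   # repetitive, like flat areas in line art
codes = lzw_encode(sample)
print(len(sample), "bytes ->", len(codes), "codes")
print(lzw_decode(codes) == sample)  # True: nothing was lost
```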
The primary disadvantage of the GIF file format is its limit of 256 colors per palette. As a result, it is not considered a good choice for photographs, where the greater color depth of the JPEG and PNG file formats makes photos appear much more realistic and life-like.
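One simple way to fit a full-color image into a 256-entry palette is a fixed "3-3-2" palette: keep the top 3 bits of red, 3 of green, and 2 of blue, for 8 bits (256 combinations) in all. This Python sketch is only an illustration; real GIF encoders usually build an adaptive palette from the image's own colors and often add dithering. Either way, a photograph with thousands of subtly different shades loses information when squeezed into 256 colors, even though GIF's LZW step itself is lossless. The function names are hypothetical.

```python
def to_332_index(r, g, b):
    """Map a 24-bit RGB color to one of 256 palette indexes (3-3-2 bits)."""
    return ((r >> 5) << 5) | ((g >> 5) << 2) | (b >> 6)


def from_332_index(index):
    """Approximate 24-bit color that a 3-3-2 palette entry stands for."""
    r = (index >> 5) & 0b111
    g = (index >> 2) & 0b111
    b = index & 0b11
    # Stretch each field back to the 0-255 range.
    return (r * 255 // 7, g * 255 // 7, b * 255 // 3)


sky_blue = (135, 206, 235)
index = to_332_index(*sky_blue)
print(index, from_332_index(index))  # 155 (145, 218, 255)

# All 256 shades of gray collapse into just 8 palette entries.
print(len({to_332_index(v, v, v) for v in range(256)}))  # 8
```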
PNG (Portable Network Graphics)
The PNG file format came about because of the patent on the LZW compression algorithm used in the GIF file format. PNG began its life as an open, patent-free replacement for GIF, not only to circumvent the licensing issues of the LZW-based GIF format, but also to address some of GIF's other shortcomings, namely its limitation to 256-color palettes.
Discussion of what would become the PNG standard started in internet newsgroups in January 1995, and the first PNG specification was released in October 1996. In 2003, PNG gained international standard status (ISO/IEC 15948). Today, nearly all current web browsers can properly display PNG graphics.
One of the PNG file format's strengths is that it uses lossless compression. That means no matter how many times you edit, save, re-edit, re-save, re-re-edit, and re-re-save a graphic in the PNG file format, the image loses no quality. Thus, PNG does not suffer from the generational degradation of the JPEG file format. This makes PNG an excellent format for storing photographs while you work on them, although the files are much larger than comparable JPEGs. Many users make and save their photographic edits in the PNG file format, and then convert the finished PNG file to the JPEG file format to minimize the size of the distributed file.
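To show what a PNG file contains, and why re-saving it costs nothing, here is a minimal Python sketch that writes an 8-bit RGB image using only the standard library and then reads it back. It follows the basic PNG layout (an 8-byte signature, then chunks, each holding a length, a type, the data, and a CRC-32 checksum, with the pixel rows DEFLATE-compressed inside IDAT), but deliberately skips everything optional: every row uses filter type 0, and there is no interlacing, palette, or metadata. The function names and the tiny test image are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(chunk_type, data):
    """One PNG chunk: length, type, data, then a CRC-32 of type + data."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def encode_png(width, height, rgb_rows):
    """Write 8-bit RGB pixels; rgb_rows holds one bytes object per row."""
    # IHDR: width, height, bit depth 8, color type 2 (RGB),
    # compression 0, filter method 0, no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    # Each scanline starts with a filter-type byte; 0 means "no filter".
    raw = b"".join(b"\x00" + row for row in rgb_rows)
    return (PNG_SIGNATURE
            + png_chunk(b"IHDR", ihdr)
            + png_chunk(b"IDAT", zlib.compress(raw))  # DEFLATE: lossless
            + png_chunk(b"IEND", b""))


def decode_png_pixels(png):
    """Read back the simple files encode_png writes (filter type 0 only)."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos, idat, width, height = 8, b"", 0, 0
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", png[pos + 8 + length:pos + 12 + length])
        if crc != zlib.crc32(chunk_type + data):
            raise ValueError(f"bad CRC in {chunk_type!r} chunk")
        if chunk_type == b"IHDR":
            width, height = struct.unpack(">II", data[:8])
        elif chunk_type == b"IDAT":
            idat += data
        pos += 12 + length
    raw = zlib.decompress(idat)
    stride = 1 + width * 3  # filter byte + 3 bytes per pixel
    return [raw[y * stride + 1:(y + 1) * stride] for y in range(height)]


# A hypothetical 4x2 image: red pixels on top, blue pixels below.
rows = [bytes([255, 0, 0]) * 4, bytes([0, 0, 255]) * 4]
png_bytes = encode_png(4, 2, rows)

# Reload and re-save for several generations: the pixels never change.
for _ in range(5):
    png_bytes = encode_png(4, 2, decode_png_pixels(png_bytes))
print(decode_png_pixels(png_bytes) == rows)  # True
```

Writing `png_bytes` to a file with a `.png` extension should give a tiny image that ordinary viewers can open. Real encoders also pick a per-row filter (Sub, Up, Average, or Paeth) to help DEFLATE compress better; that step is omitted here.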
Even though PNG was created as a replacement for GIF, its designers decided that PNG would hold single images only. As a result, it cannot do animations the way GIF can. And, despite PNG being an extensible file format, there is no formal agreement on an animated PNG format. A couple of animated variants exist, but they are unofficial.
TIFF (Tagged Image File Format)
The TIFF file format has been with us since the mid-1980s. It was created to give the scanner manufacturers of the day a common standard, so that each of them would not come up with its own proprietary format.
The TIFF file format was originally created by Aldus, which held its copyright. Aldus was later purchased by Adobe, which now owns the copyright on the TIFF specification.
TIFF supports several compression schemes, including LZW, which (as we've already discovered) can be used freely now that all the LZW patents expired in 2004. Stored uncompressed or with a lossless scheme such as LZW, a TIFF file is lossless. But TIFF can also be lossy: the format can store JPEG-compressed image data, and a TIFF file that does so suffers JPEG's quality loss. A single TIFF file can also serve as a container for multiple images, such as the pages of a scanned document.
Some digital cameras can save in the TIFF file format, using LZW compression to help save space on the storage medium. And while TIFF is not widely supported by web browsers, it remains a widely accepted standard for photographs in the printing business. TIFF can also handle device-specific color spaces, such as the CMYK color separations used on color printing presses. TIFF is also common in scanning and OCR (optical character recognition) work, where scanned text pages are often stored as monochrome (black-and-white) TIFF images.
BMP (Windows Bitmap, a.k.a. DIB or Device Independent Bitmap)
The BMP file format was popularized by the Windows operating system and has been a staple there since Windows 3.0. It is well documented and free from patents, so almost any operating system can read and write BMP files.
BMP files are uncompressed in most cases, although the format also allows RLE (Run Length Encoding) compression, which is used for special applications. Either way, BMP files are relatively large. Because the pixels are never stored with lossy compression (RLE is itself lossless), the BMP file format is considered lossless, and BMP files often compress very well (sometimes down to about 10% of their original size) with external compression utilities such as ZIP.
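The BMP layout is simple enough to write by hand, which also shows why the files are large: every pixel is stored as-is. This minimal Python sketch writes an uncompressed 24-bit BMP with a 14-byte file header and the common 40-byte BITMAPINFOHEADER. Two quirks of the format appear in it: rows are stored bottom-up in blue-green-red order, and each row is padded to a multiple of 4 bytes. The function name is hypothetical, and RLE, palettes, and other header versions are left out. The last line compresses the result with `zlib`, which implements DEFLATE, the method ZIP archives usually use.

```python
import struct
import zlib


def encode_bmp_24(width, height, rgb_rows):
    """rgb_rows: top-to-bottom list of rows, each a list of (r, g, b) tuples."""
    row_size = (width * 3 + 3) // 4 * 4           # pad each row to 4 bytes
    pixel_bytes = row_size * height
    file_size = 14 + 40 + pixel_bytes

    file_header = struct.pack("<2sIHHI", b"BM", file_size, 0, 0, 14 + 40)
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40,              # size of this header
        width, height,   # positive height = rows stored bottom-up
        1, 24,           # one color plane, 24 bits per pixel
        0,               # BI_RGB: no compression
        pixel_bytes,
        2835, 2835,      # about 72 DPI, in pixels per meter
        0, 0,            # no palette
    )
    body = bytearray()
    for row in reversed(rgb_rows):                # bottom row first
        for r, g, b in row:
            body += bytes((b, g, r))              # BMP stores BGR
        body += b"\x00" * (row_size - width * 3)  # row padding
    return file_header + info_header + bytes(body)


# A hypothetical 640 x 480 all-white image.
white = [[(255, 255, 255)] * 640 for _ in range(480)]
bmp = encode_bmp_24(640, 480, white)
print(len(bmp))                 # 921654: 54 header bytes + 640*3*480 pixel bytes
print(len(zlib.compress(bmp)))  # far smaller for a uniform image like this
```

A blank image is the best possible case for compression; a detailed photograph saved as BMP will shrink much less.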
RAW
RAW is not a single file format. Rather, it is a family of raw image formats used by some digital camera manufacturers. These formats are not standardized, and in many cases they are poorly documented; in fact, they may differ from one camera manufacturer to another. Most raw image formats use lossless or near-lossless compression, which produces smaller files than the TIFF file format would on the very same camera.
Because there is no raw image format standard, a given graphic editing program may support some of these formats or none of them. Indeed, some of the older raw image formats have already been orphaned. Adobe is attempting to standardize the raw image format used by digital cameras through its Digital Negative (DNG) specification.
There are several programs in PCLinuxOS to deal with raster-based graphic files. Probably the most widely known, most versatile, and most powerful is the GIMP. There is also Krita, part of the KOffice suite, as well as a host of smaller programs, each with its own features and niche uses.
Vector-based Graphic Formats
There are fewer vector-based file formats than raster-based ones. Just as with the raster-based graphic file formats, we will only cover the most common ones here.
CGM (Computer Graphics Metafile)
The CGM file format is designed for 2D vector graphics, raster graphics, and text. It is an internationally standardized format (ISO/IEC 8632). As with many vector graphic formats, the graphical elements of a CGM file can be specified in a textual source file that can then be compiled into a binary file.
Designed to be independent of any particular application, system, platform, or device, CGM provides a means of exchanging 2D graphical information between computer systems. The CGM file format has seen some adoption in technical illustration and professional design. However, it is being superseded by our next vector-based graphic format.
SVG (Scalable Vector Graphics)
The SVG file format has raced to the forefront and supplanted much of its competition. Developed by the World Wide Web Consortium (W3C) since 1999, it is an open standard meant to provide a versatile, scriptable, all-purpose vector format for the web and beyond. SVG has no compression scheme of its own, but it compresses quite well with gzip: because the XML markup that makes up an SVG file is repetitive text, the file can often be compressed to about 20% of its original size. A gzip-compressed SVG file is usually given the .svgz extension.
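Because SVG is plain XML text, a program can write one directly, and Python's standard `gzip` module can produce the compressed .svgz variant. This is a minimal sketch with a made-up drawing; the actual compression ratio depends on the file, so the code prints it rather than assuming the 20% figure.

```python
import gzip

# A hypothetical drawing: a row of 50 identical circles. Repetitive
# markup like this is exactly what gzip compresses well.
circles = "\n".join(
    f'  <circle cx="{20 + 40 * i}" cy="50" r="15" fill="steelblue"/>'
    for i in range(50)
)
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="2020" height="100">\n'
    f"{circles}\n"
    "</svg>\n"
)

raw = svg.encode("utf-8")
compressed = gzip.compress(raw)  # these bytes could be saved as drawing.svgz
print(f"{len(raw)} -> {len(compressed)} bytes "
      f"({100 * len(compressed) / len(raw):.0f}% of the original)")

# gzip is lossless: decompressing gives back exactly the same SVG text.
print(gzip.decompress(compressed).decode("utf-8") == svg)  # True
```

A real illustration has less uniform markup than this row of identical circles, so expect a less dramatic ratio.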
All modern web browsers can display SVG files. Older versions of Microsoft's Internet Explorer (before IE9) required a plug-in to display them.
The SVG file format is designed to be extensible, and it can be scripted (with JavaScript) to react to user interaction or to create animations. Its specified features are far too many to list here. Suffice it to say (for now, anyway) that SVG's scripting support is powerful enough to build web applications.
Just as for raster-based graphics, there are several programs in the PCLinuxOS repository to deal with vector-based graphics. Inkscape is probably the most popular of these. Also popular are Xara Xtreme and OpenOffice.org Draw. Even the GIMP can import SVG files as either paths or rasterized bitmap images.
Hopefully, this introductory guide will help you choose which type of graphic is best suited for your needs, and help demystify the often confusing world of computer graphics.
March onward, and unleash the artist within!