by Paul Arnote (parnote)
We've all seen them. We've all encountered them, at some point in our computing history. That dreaded, pesky file that resists deletion, or that is difficult to deal with.
This becomes especially troublesome when you share files with Windows, Mac and Linux users. Each platform has its own separate rules regarding what characters are allowed in filenames. What might be perfectly acceptable on one platform for filenames may be verboten on another.
So, let's take a look at how to deal with these pesky files and filenames.
Grasping the basics
The "basics" vary, depending on the platform you are using. To better understand those basic rules, you have to understand the ASCII tables.
Full ASCII Table
Extended ASCII Table
You will find that Windows has the most numerous filename restrictions. Let's start there, since there is a good chance that you will have to share files with Windows users.
First, any ASCII character appearing between 0 and 31 (decimal value) is forbidden. Then, all of the following characters are also disallowed:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
As if that wasn't enough, certain device names, many of them inherited from DOS, are reserved for exclusive use by Windows, and are thus not allowed to be used for filenames, with or without a file extension:
CON, PRN, AUX, CLOCK$, NUL
COM1 through COM9 (COM1 through COM4 under DOS)
LPT1 through LPT9 (LPT1 through LPT3 under DOS; LPT4 only in some versions of DR-DOS)
LST (only in 86-DOS and DOS 1.xx)
KEYBD$, SCREEN$ (only in multitasking MS-DOS 4.0)
$IDLE$ (only in Concurrent DOS 386, Multiuser DOS and DR DOS 5.0 and higher)
CONFIG$ (only in MS-DOS 7.0-8.0)
And, if those weren't enough, the NTFS file system reserves additional filenames for its own internal use:
$Mft, $MftMirr, $LogFile, $Volume, $AttrDef, $Bitmap, $Boot, $BadClus, $Secure, $Upcase, $Extend, $Quota, $ObjId and $Reparse
It's enough to make you question why Windows remains so popular among the general computing public, huh?
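To make the Windows rules above a bit more concrete, here is a minimal sketch of a check you could run on the Linux side before sharing a file. The function name is_windows_safe is my own invention, not a standard tool. It treats COM1 through COM9 and LPT1 through LPT9 as reserved, as current versions of Windows do, and it deliberately skips the DOS-only names and the NTFS internal names.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not a standard tool): returns 0 when the name passes
# the Windows rules listed above, and 1 otherwise.
is_windows_safe() {
    local name=$1 base

    # An empty name is never valid.
    [ -n "$name" ] || return 1

    case $name in
        # Control characters. Note that [:cntrl:] also matches DEL (127),
        # which is slightly stricter than the 0-31 range given in the text.
        *[[:cntrl:]]*) return 1 ;;
        # The nine forbidden punctuation characters, each quoted literally.
        *'<'*|*'>'*|*':'*|*'"'*|*'/'*|*'\'*|*'|'*|*'?'*|*'*'*) return 1 ;;
    esac

    # Reserved device names count with or without an extension and in any
    # letter case, so compare the upper-cased part before the first dot.
    base=$(printf '%s' "${name%%.*}" | tr '[:lower:]' '[:upper:]')
    case $base in
        CON|PRN|AUX|NUL|'CLOCK$'|COM[1-9]|LPT[1-9]) return 1 ;;
    esac

    return 0
}

# Try a few sample names.
for candidate in 'notes.txt' 'what?.txt' 'con.txt' 'COM7' 'console.log'; do
    if is_windows_safe "$candidate"; then
        printf '%s: ok\n' "$candidate"
    else
        printf '%s: not allowed on Windows\n' "$candidate"
    fi
done
```

Of the sample names, only notes.txt and console.log pass. The latter shows that a name merely starting with the letters CON is fine, as long as the part before the first dot isn't exactly a reserved name.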
On Linux systems, only two characters are disallowed: NUL (ASCII decimal code 0) and the / (forward slash, ASCII decimal code 47). Even though all the other characters are allowed, the use of certain characters may cause you problems in the long run. We'll talk about them later.
On a Mac system, only the : (colon, ASCII decimal code 58) is disallowed from use in filenames.
I can't help but reiterate that all of this becomes vitally important when you are creating files that might be used or viewed by users of the "other" systems. Use a wrong character (or reserved keyword or internal filename), and the user of the other system won't be able to access your file.
Making safe choices
If you want to guarantee cross-platform accessibility to your files, below is a set of "rules" to follow.
- [0-9a-zA-Z_] - Alphanumeric characters and the underscore are always fine to use.
- \/:*?"<>| and the null byte are problematic on at least one system, and should always be avoided.
- Spaces are used as argument separators on many systems, so filenames with spaces should be avoided when possible.
- Semicolons (;) are used to separate commands on many systems.
- []()^ #%&!@:+={}'~ and [`] all have special meanings in many shells, and are annoying to work around, and so should be avoided. They also tend to look horrible in URLs.
- Leading characters to avoid: Many command line programs use the hyphen [-] to indicate special arguments, while *nix based systems use a full-stop [.] as a leading character for hidden files and directories.
- Anything not in the ASCII character set can cause problems on older or more basic systems (e.g. some embedded systems), and should be used with care.
That basically leaves you with [0-9a-zA-Z-.,_] as the characters that are always safe and not annoying to use (as long as you start the filename with an alphanumeric character).
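If you'd rather let the computer enforce these rules, something like the following sketch might do. Both function names (is_portable_name and to_portable_name) are hypothetical, and replacing every unsafe character with an underscore is just one possible policy, not the only sensible one.

```shell
#!/usr/bin/env bash

# Hypothetical helper: returns 0 only for names built entirely from
# [0-9a-zA-Z-.,_] that also start with a letter or a digit.
is_portable_name() {
    local LC_ALL=C    # keep a-z and A-Z limited to plain ASCII letters
    case $1 in
        ''|[!0-9A-Za-z]*) return 1 ;;     # empty, or a bad first character
        *[!0-9A-Za-z.,_-]*) return 1 ;;   # something outside the safe set
    esac
    return 0
}

# Hypothetical helper: prints a cleaned-up copy of the name. Every byte
# outside the safe set becomes an underscore, then any leading characters
# that are not letters or digits are dropped.
to_portable_name() {
    printf '%s' "$1" |
        LC_ALL=C tr -c '0-9A-Za-z.,_-' '_' |
        LC_ALL=C sed 's/^[^0-9A-Za-z]*//'
}

# Example: check a name, and clean it up if it fails the check.
name='My Report (final).txt'
if ! is_portable_name "$name"; then
    # Prints: My_Report__final_.txt
    printf '%s\n' "$(to_portable_name "$name")"
fi
```

Note that the sketch only prints a suggested name. Actually renaming the file (with mv, for example) is left to you, so that nothing gets overwritten by accident.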
Dealing with troublesome filenames on Linux
Even though Linux filesystems allow every character except NUL and /, some characters can still make life uncomfortable when they appear in filenames. Whether you love or hate the command line, and whether you avoid it like the bubonic plague or embrace all of its power, all of us will find ourselves using it at some point. As such, it's important to learn how to deal with these files, especially if you are forced to the command line out of necessity (like when your desktop environment won't boot properly and you have to apply fixes via the command line). Some GUI programs won't deal with these troublesome filenames easily either, since many GUI programs are simply command line utilities with a GUI glued to the front end.
Filenames with spaces. I know we've stated this several times before in the pages of this magazine, but it bears repeating. It seems that old Windows habits die hard. You should avoid the use of spaces in Linux filenames. On the command line, the space character is the delimiter between command line arguments. As I stated previously, even some GUI programs are not immune. Some GUI programs are nothing more than a graphical interface to a command line version of the same program. In these cases, filenames with spaces may still not be handled properly, especially if the GUI designer didn't take them into account. If you feel that you simply MUST have something between "words" in a filename, it's far better to use either an underscore or a dash in place of the space.
Fortunately, filenames with spaces in them are probably some of the easiest to deal with. Simply put double quotes around the filename. (N.B.: this is the same solution employed under Windows if you use the Windows CMD command line utility.) Alternatively, you can "escape" the space by placing a backslash (\) before every space (*nix only ... don't bother trying this latter solution under Windows, since Windows uses the backslash character to denote directories and subdirectories).
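Here is a small sketch of both approaches. The filename my file.txt and the scratch directory are made up for the demonstration, and cat stands in for whatever command you actually want to run. The last part, quoting a variable that holds the filename, isn't covered above, but it follows the same logic.

```shell
#!/usr/bin/env bash

# Work in a scratch directory so the demo touches nothing else.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# Create a file whose name contains a space.
printf 'space demo\n' > "my file.txt"

# Unquoted, cat would receive two separate arguments, "my" and "file.txt":
#   cat my file.txt

quoted=$(cat "my file.txt")     # double quotes keep the name in one piece
escaped=$(cat my\ file.txt)     # a backslash escapes the single space

# The same applies when the name is stored in a variable: quote the expansion.
name="my file.txt"
from_variable=$(cat "$name")

printf '%s\n' "$quoted" "$escaped" "$from_variable"
```

All three forms read the same file. When you are done, the scratch directory in $demo_dir can simply be deleted.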
Filenames with dashes (-). Dashes are not problematic, unless they are the first character in a filename. If they appear anywhere else in the filename, nothing special has to be done. Just use the filename as is.
On the command line, a leading dash denotes a command line option or switch. When a dash is the first character of a filename, the command you are running (not the shell itself) will interpret what follows the dash as one or more options. When they don't match any known option, the command will fail, as in the image below.
To resolve this issue, you have two choices. First, precede the filename with two dashes, like this: nano -- -myfile.txt. Alternatively, you can simply precede the filename with ./ to denote that the file is within the current working directory (presuming you are currently in the directory that contains the file), like this: nano ./-myfile.txt.
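The sketch below walks through both choices in a scratch directory, using cat and rm instead of nano so that it can run unattended. The filename is the same -myfile.txt used above. Keep in mind that the double dash is a convention that most, but not necessarily all, command line programs honor, while the ./ prefix works with any program.

```shell
#!/usr/bin/env bash

# Work in a scratch directory so the demo touches nothing else.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# Create a file whose name starts with a dash.
printf 'dash demo\n' > ./-myfile.txt

# This would fail, because cat reads -myfile.txt as a string of options:
#   cat -myfile.txt

with_double_dash=$(cat -- -myfile.txt)   # "--" marks the end of the options
with_dot_slash=$(cat ./-myfile.txt)      # the name no longer starts with "-"

printf '%s\n' "$with_double_dash" "$with_dot_slash"

# The same trick is what finally lets you delete such a file.
rm -- -myfile.txt
```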
Filenames with hash/pound sign (#). BASH treats a # at the beginning of a word, and everything after it on the line, as a comment. So, if the hash/pound sign is the first character of the filename (#myfile.txt), the command never sees the filename at all, and the file won't be found. If the hash/pound sign occurs elsewhere in the filename (my#file.txt), BASH leaves it alone, and the filename can be used as is.
There are two solutions. First, and just as we did with the dash as the first character, you can precede the filename with ./ to denote that the file is in the current working directory, like this: nano ./#myfile.txt. This works because the # is then no longer the first character of the word. Alternatively, you can enclose the filename in single quotes, like this: nano '#myfile.txt'.
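Both solutions can be tried out with the sketch below. It uses a scratch directory, and cat in place of nano so that nothing waits for keyboard input.

```shell
#!/usr/bin/env bash

# Work in a scratch directory so the demo touches nothing else.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# Create a file whose name starts with a hash/pound sign.
printf 'hash demo\n' > './#myfile.txt'

# Here the shell would drop "#myfile.txt" as a comment, leaving a bare cat:
#   cat #myfile.txt

with_dot_slash=$(cat ./#myfile.txt)    # "#" is no longer at the start of the word
with_quotes=$(cat '#myfile.txt')       # inside single quotes, "#" is just a character

printf '%s\n' "$with_dot_slash" "$with_quotes"
```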
Filenames with a semicolon (;). These filenames can also be problematic. Under BASH (and probably other command line shells, too), the semicolon is used to chain multiple commands together. So, when there's a semicolon in your filename, it sees everything after the semicolon as the start of a new command. It also prevents the file from being found, due to the misinterpretation.
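Fortunately, the way to work around this issue is similar to what we did with the hash/pound sign, with one difference. Putting a ./ in front of the filename does NOT help here, because BASH treats the semicolon as a command separator no matter where it appears in a word. Instead, either enclose the filename in single quotes (nano ';myfile.txt'), or escape the semicolon with a backslash (nano \;myfile.txt).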
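As a quick sketch, here is the quoting approach in action, along with a backslash escape, which protects the semicolon in the same way. As before, the scratch directory is only there for the demonstration, and cat stands in for nano.

```shell
#!/usr/bin/env bash

# Work in a scratch directory so the demo touches nothing else.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# Create a file whose name starts with a semicolon.
printf 'semicolon demo\n' > ';myfile.txt'

# Unprotected, the semicolon ends the cat command and the shell then tries
# to run "myfile.txt" as a second command:
#   cat ;myfile.txt

with_quotes=$(cat ';myfile.txt')       # single quotes make ";" literal
with_backslash=$(cat \;myfile.txt)     # a backslash escapes just the ";"

printf '%s\n' "$with_quotes" "$with_backslash"
```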
Just add single quotes. With certain other "special" characters used in filenames, all you have to do is enclose the filename in single quotes. Those characters are $, !, @, &, (), ", <>, and \.
As a side note, if you need to use/work with a filename with a single quote in it, enclose the filename in double quotes.
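A short sketch of both cases follows. The two filenames are invented for the demonstration, and cat once again stands in for the command you really want to run.

```shell
#!/usr/bin/env bash

# Work in a scratch directory so the demo touches nothing else.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# A name containing $, & and !. Inside single quotes, none of them is special.
printf 'special demo\n' > 'cost$5&tax!.txt'
single_quoted=$(cat 'cost$5&tax!.txt')

# Double quotes would not be enough for that name, because the shell would
# still expand $5 as a variable:
#   cat "cost$5&tax!.txt"

# A name containing a single quote goes inside double quotes instead.
printf 'apostrophe demo\n' > "paul's_notes.txt"
double_quoted=$(cat "paul's_notes.txt")

printf '%s\n' "$single_quoted" "$double_quoted"
```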
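No special action needed. With other "special" characters that might appear in filenames, there is usually no special action necessary under Linux. Those characters are +, %, *, ^, {}, [], _ , =, ?, ., :, ~ and .. You can generally use/work with files containing these characters in the filename just as you would any other file. One caveat: *, ? and [] are shell wildcards, braces can trigger brace expansion, and a leading ~ can trigger tilde expansion. So, BASH may expand an unquoted name containing them into something else (for example, my*file.txt also matches a file named myXfile.txt, if one exists). When in doubt, enclosing the filename in single quotes does no harm here, either.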
Summary
Whether you follow the rules for filenames or not, you will eventually encounter files that are difficult to deal with. In fact, you probably already have. Knowing the rules arms you with the knowledge to properly deal with them. Meanwhile, you can sidestep these pitfalls by avoiding problematic characters in your own filenames, which also makes life simpler for those with whom you might share those files.