by stealth
So what is Sigil? It is a WYSIWYG ePub ebook editor. You can use it to open
existing ebook files and edit them, or you can use it to create your own new
ebook files.
What are ePub files? Are they a mystery to you? Ever wonder how they are
created? The ePub files don't have to be a mystery, and you are about to learn
how to create them. ePub files are, in essence, a self-contained portable
website with some improved features.
An Overview Of Sigil
If you just want to edit an existing ePub file or create one, and you don't care
how it is put together or how it works, then use Sigil. It takes care of all the
technical stuff for you and makes everything work correctly. However, Sigil does
not appear to provide access to all of the features made available in the ePub
standard as found at idpf.org. Sigil does have a slick Table of Contents (TOC)
creation feature.
Sigil In Linux, Mac and Windows
There does seem to be some lost functionality when using Sigil in Linux. Windows
and Mac users claim the ability to do things that just don't seem to work in the
Linux version. Even the documentation says certain things can be done, but they
don't work for Linux users. You can still use the app to create and edit ePub
files, just apparently not with the same ease that Windows and Mac users have.
Your mileage may vary.
Three Editing Modes In Sigil
The image below shows you the Sigil Editor in the center section of the
combination mode where you do the editing in WYSIWYG or the HTML code. The
single tool icon on the toolbar (below the Tools menu circled in red) is how you
get to the combination editor from the toolbar. You can see which file you are
working in by the green highlight in the left Book Browser Window, as well as
the tab above the editing window. The TOC for the book can be viewed in the
Table of Contents window to the right. The two icons circled in red are the
Metadata editor on the left and the TOC generation on the right.
You can see full WYSIWYG editing in this image, below. The toolbar icon for this
mode is circled in red.
You can see Full Code editing in this image, below. The toolbar icon for this
mode is circled in red.
The two files circled in the Book Browser in the left section of the image below
are key files, which are in any ePub file. The toc.ncx is where the TOC, which
you can see in the pane on the right, is stored. The content.opf stores the list
of the entire contents of the ePub file. The content.opf file is also what
controls the order the HTML files will be arranged in for viewing in an ebook
reader.
It is also how you will see them in the Book Browser under the Text folder
(above) in the pane on the left. All your HTML, SVG/PNG/JPG/GIF, CSS, fonts,
audio, video and possible script files, such as JavaScript, are contained in the
same folder in the ePub file. But, they can also be contained in sub folders, as
you can see above on the left. Those folders are actually created, and the files
are separated into them by Sigil when you open an existing ePub file, or when
you create a new one.
The ePub standard from idpf.org suggests that they all be stored in the same
folder and not separated, although separating them is allowed. I’m
not sure why the author of Sigil decided to go the route which is not suggested,
but allowed. Maybe he is following Microsoft's lead of not following standards.
The reason they suggest not doing it this way, as seen above, is because there
is nothing in the standard requiring a reader to follow the linking behavior
which is required for the separated folder setup to work. Doing it that way
might work in one reader, but not in another. The linking behavior is identical
to what you might do with a web site, but it is not required of the reader to
handle the links or the folder structure properly. All the readers I have tried
handle the links properly.
From within Sigil, you cannot see the upper folders or the other two essential
files that are required in any ePub file. Those two folders and two files are
identical in every ePub file, and it won't work if they aren't.
In the image above, you see the folders with little > pointers next to some of
the folders. That means there are files in those folders, and none in the others
lacking the > pointers. You will see, farther down, that those folders don't
actually exist in ePub files generated by publishers. However, if you used Sigil
to open and edit an existing file, then saved it, Sigil will create the folders
and restructure all the files in that ePub file. Sigil will also correct any
linking problems caused by the restructure.
Sigil's TOC Generation
Below is the automatic TOC generation feature I mentioned above. You will also
see the search tool at the bottom of the editing window. The icon with the
magnifying glass on it, just to the left of the Tools drop down menu in the
image, is how you open search and replace from the toolbar.
You can also do the TOC by hand if you want. You can also edit part of it after
automatic generation. If you do edit the file after it is created, you will see
activity in the right pane as you are editing.
Right Click Context Menus
There are right click context menus almost everywhere in Sigil. Right clicking
on the text folder will let you start a new HTML file, or import existing files.
Any existing files have to be well formed by the XHTML 1.1 standard, or Sigil
will not bring them in.
Creating An ePub From Scratch
Here is what the same file that is in the previous images looks like from within
Ark. For some reason, Sigil replaces that callouts folder in this ePub. It is at
the same level as the sub folders you saw in the Sigil images above. It has some
*.gif files in it. You can see a couple of different file types, including HTML
and OTF (OpenType Font) files. You will also notice that the sub-folders that
you saw in Sigil aren't here, because they don't actually exist in this ePub
file. O'Reilly makes all their ePub files with all the content files in the same
folder, with the exception of that callouts folder with the *.gif files in this
ePub.
The image below shows more file types, including the content.opf, CSS and
image files.
You can see the toc.ncx file at the bottom of the window in the image
below.
In the image below, you can see the two essential sub-folders and the two
essential files I mentioned above, which you cannot see when opening the ePub
file in Sigil. The OEBPS (Open eBook Publication Structure) folder is where all
the content files for the ebook are stored. That also includes the content.opf
and toc.ncx. The contents of the OEBPS folder is all you see in Sigil. The OEBPS
folder and the META-INF folder are one level down from the main folder and one
level up from the contents you see in the Sigil images above, or what is
actually the ePub file that you would open and read in an ebook reader.
How Does An ePub File Go Together?
If you are like me and have to find out how the ePub file works, or what the
full process for making a properly working ePub file is, then you will have to
roll up your sleeves and get ready to get all that technical stuff all over you.
Just kidding! I sure am glad it is not like working on a gasoline engine.
You need to know how to use a compression app like ark, xarchiver, Q7Z, p7zip,
file-roller, tar or zip. If you know how to create folders on your computer, and
you know how to use a text editor, then you can create your own ePub file using
your favorite text editor and compression app. My favorite text editor is Vim,
of course. Oh, you will also have to know a little something about making well
formed HTML, XHTML and XML files.
The container.xml & mimetype
So what is the container.xml file and the mimetype file? The container.xml file
is, as the name implies, a container file which references the content.opf file
found in the OEBPS folder inside the ePub file. An example is shown below.
<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
        media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
The mimetype file contains only what you see below. The ePub file is a zip
file.
application/epub+zip
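To make the role of container.xml concrete, here is a minimal Python sketch that reads the sample above and pulls out the path to content.opf, which is roughly the first thing a reader has to do. The names find_rootfile and SAMPLE are made up for illustration; only the namespace and the full-path attribute come from the example.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the container.xml example in this article.
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"


def find_rootfile(container_xml: str) -> str:
    """Return the full-path attribute of the first <rootfile> element."""
    root = ET.fromstring(container_xml.encode("utf-8"))
    rootfile = root.find(
        f"{{{CONTAINER_NS}}}rootfiles/{{{CONTAINER_NS}}}rootfile"
    )
    if rootfile is None:
        raise ValueError("container.xml has no <rootfile> element")
    return rootfile.attrib["full-path"]


# Hypothetical constant holding the same XML as the sample above.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
        media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

if __name__ == "__main__":
    print(find_rootfile(SAMPLE))
```

The path it returns is relative to the top of the ePub file, which is why content.opf is listed as OEBPS/content.opf and not just content.opf.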
Create Your Own ePub File From Scratch
I prefer to do it this way. It lets me be in control, instead of an application.
The current standard for ePub is EPUB3. It supersedes the older EPUB2
specification.
Open a file manager and create a folder named after your ePub file. Then create
your two sub folders, META-INF and OEBPS, inside it.
Here is the Structure of the ePub file as viewed in Ark before
extraction.
Here is the structure of the ePub folder after extraction.
Epub file name
|
|____META-INF (1 Required)(Other Files Optional)
| |
| |____container.xml (1 Required)
|
|____OEBPS (1 Required)
| |
| |____HTML/XHTML (1 Required)
| |
| |____CSS (Optional)
| |
| |____SVG/PNG/JPG/GIF (Optional)
| |
| |____scripts (Optional)
| |
| |____videos (Optional)
| |
| |____audio (Optional)
| |
| |____fonts (Optional)
| |
| |____content.opf (1 Required)
| |
| |____toc.ncx (1 Required)
|
|____mimetype (1 Required)
Creating The Required Files In The ePub
Start your favorite text editor, and enter the following:
application/epub+zip
Save this file in the root folder that has the ePub file name, and name this
file mimetype. Do not put the mimetype file in OEBPS or META-INF. It has to be
at the same level they are.
Start a new file and add exactly what you see here for the container.xml file.
Then save that to the META-INF folder.
<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
        media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
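If you would rather script these first steps than do them by hand, the following is a minimal Python sketch that creates the two sub folders and writes the mimetype and container.xml files. The function name make_skeleton and the folder name "my-book" are placeholders, not anything required by the standard.

```python
from pathlib import Path

# Same container.xml content as shown in this article.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
        media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def make_skeleton(book_dir: str) -> Path:
    """Create META-INF, OEBPS, mimetype and container.xml under book_dir."""
    root = Path(book_dir)
    (root / "META-INF").mkdir(parents=True, exist_ok=True)
    (root / "OEBPS").mkdir(parents=True, exist_ok=True)
    # The mimetype file holds only the media type string, with no newline.
    (root / "mimetype").write_text("application/epub+zip", encoding="ascii")
    (root / "META-INF" / "container.xml").write_text(
        CONTAINER_XML, encoding="utf-8"
    )
    return root


if __name__ == "__main__":
    make_skeleton("my-book")  # placeholder folder name
```

After running it, the content.opf, toc.ncx and your book content still have to be added to the OEBPS folder.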
The toc.ncx File
Below is a sample of a toc.ncx file. Creating this from scratch can be a hassle.
This is an XML version of the HTML list. It is an XML file with opening and
closing NCX tags and the XML Namespace in the opening NCX tag. It can be used
with or without a DOCTYPE declaration in the file. I have found that, sometimes,
NOT using a DOCTYPE allows you to create a TOC that would otherwise not work.
The sample toc.ncx below has three sections between the NCX tags: head, docTitle
and navMap. The standard at idpf.org says you can also have two other sections
called pageList and navList. Each section is separate from the others, and each
has its own opening and closing tags. I cut out most of the middle of this
toc.ncx file because it was too long. Within the navMap section you will
find:
<navPoint>
  <navLabel>
    <text>Your TOC Text</text>
  </navLabel>
  <content src="yourfile.html"/>
</navPoint>
You can nest the navPoint element like so.
<navPoint>
  <navLabel>
    <text>Your TOC Text</text>
  </navLabel>
  <content src="yourfile.html"/>
  <navPoint>
    <navLabel>
      <text>Your TOC Text</text>
    </navLabel>
    <content src="yourfile.html"/>
  </navPoint>
</navPoint>
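Here is a sample DOCTYPE for a toc.ncx file:
<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN"
"http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">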
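This sample toc.ncx does NOT have a DOCTYPE declaration. If it did, it would go
between the XML tag and the NCX tag, in basically the same manner as your XHTML
files. Also, the XML tag below, starting with <?xml, is the first line of the
file, and the opening NCX tag with its XML namespace follows it.
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">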
  <head>
    <meta content="cover" name="cover"/>
    <meta content="isbn:9780596159351" name="dtb:uid"/>
    <meta content="-1" name="dtb:depth"/>
    <meta content="0" name="dtb:totalPageCount"/>
    <meta content="0" name="dtb:maxPageNumber"/>
  </head>
  <docTitle>
    <text>Learning the vi and Vim Editors</text>
  </docTitle>
  <navMap>
    <navPoint id="id2909437" playOrder="1">
      <navLabel>
        <text>Learning the vi and Vim Editors</text>
      </navLabel>
      <content src="Text/index.html"/>
      <navPoint id="id2857362" playOrder="2">
        <navLabel>
          <text>Preface</text>
        </navLabel>
        <content src="Text/pr01.html"/>
        <navPoint id="id2857202" playOrder="3">
          <navLabel>
            <text>Scope of This Book</text>
          </navLabel>
          <content src="Text/pr01.html#vi7-ch-0-sect-1"/>
        </navPoint>
        <navPoint id="id3103816" playOrder="4">
          <navLabel>
            <text>How the Material Is Presented</text>
          </navLabel>
          <content src="Text/pr01s02.html"/>
          <navPoint id="id3168839" playOrder="5">
            <navLabel>
              <text>Discussion of vi Commands</text>
            </navLabel>
            <content src="Text/pr01s02.html#vi7-ch-0-sect-2.1"/>
          </navPoint>
          <navPoint id="id3174260" playOrder="6">
            <navLabel>
              <text>Conventions</text>
            </navLabel>
            <content src="Text/pr01s02.html#vi7-ch-0-sect-2.2"/>
          </navPoint>
          <navPoint id="id2856537" playOrder="7">
            <navLabel>
              <text>Keystrokes</text>
            </navLabel>
            <content src="Text/pr01s02.html#vi7-ch-0-sect-2.3"/>
          </navPoint>
        </navPoint>
      </navPoint>
    </navPoint>
  </navMap>
</ncx>
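Since typing nested navPoint elements by hand is the tedious part, here is a minimal Python sketch that builds a navMap from a nested list and numbers the playOrder values in reading order. The (title, src, children) tuple layout, the function names and the "nav" id prefix are assumptions made for this illustration; the head and docTitle sections and the outer NCX tag still have to be written around the result.

```python
import xml.etree.ElementTree as ET
from itertools import count


def add_nav_points(parent, entries, counter):
    """Append one navPoint per (title, src, children) tuple, recursively."""
    for title, src, children in entries:
        order = next(counter)
        point = ET.SubElement(
            parent, "navPoint", id=f"nav{order}", playOrder=str(order)
        )
        label = ET.SubElement(point, "navLabel")
        ET.SubElement(label, "text").text = title
        ET.SubElement(point, "content", src=src)
        # Nested entries go inside their parent navPoint, after <content>.
        add_nav_points(point, children, counter)


def build_nav_map(entries):
    """Return a <navMap> element holding the whole nested TOC."""
    nav_map = ET.Element("navMap")
    add_nav_points(nav_map, entries, count(1))
    return nav_map


if __name__ == "__main__":
    toc = [
        ("Preface", "pr01.html", [
            ("Scope of This Book", "pr01.html#vi7-ch-0-sect-1", []),
        ]),
        ("How the Material Is Presented", "pr01s02.html", []),
    ]
    print(ET.tostring(build_nav_map(toc), encoding="unicode"))
```

Because the counter is shared across the recursion, a nested entry gets the next number after its parent, which matches the numbering pattern in the sample file above.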
The content.opf File
Shown below is an example of a content.opf file. I cut out most of this one,
too, because of length. The content.opf file has an opening and closing PACKAGE
tag with an XML namespace in the opening tag. It has four sections between the
PACKAGE tags, each separate from the other, and each with its own opening and
closing tags: METADATA, MANIFEST, SPINE and GUIDE.
METADATA: Contains information about you and your book. dc:identifier,
dc:title, dc:language and meta are the only required elements in this
section.
MANIFEST: All of your book content must be listed here.
SPINE: This must have all of the book content, minus images, CSS, audio,
video, fonts and any scripts. The order in which the content is listed is the
order in which the content is presented in a reader.
GUIDE: Can be empty. The section must be in the file, though.
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
    unique-identifier="bookid">
  <metadata>
    <dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/"
        id="bookid">urn:isbn:9780596159351</dc:identifier>
    <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Learning the vi and Vim Editors</dc:title>
    <dc:rights xmlns:dc="http://purl.org/dc/elements/1.1/">Copyright © 2009 Arnold Robbins and Elbert Hannah</dc:rights>
    <dc:publisher xmlns:dc="http://purl.org/dc/elements/1.1/">O'Reilly Media</dc:publisher>
    <dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">COMPUTERS / Operating Systems / UNIX</dc:subject>
    <dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">2009-06-30</dc:date>
    <dc:description xmlns:dc="http://purl.org/dc/elements/1.1/"><p>The
standard guide for <em>vi</em> since 1986, this book has been
expanded to include detailed information on <em>vim</em>, the
leading <em>vi</em> clone that includes extra features for both
beginners and power users. You learn text editing basics and advanced tools for
both editors, such as writing macros and scripts to extend the editor, power
tools for programmers, multi-window editing -- all in the easy-to-follow style
that has made this book a classic.</p></dc:description>
    <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:opf="http://www.idpf.org/2007/opf" opf:file-as="Arnold Robbins">Arnold Robbins</dc:creator>
    <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:opf="http://www.idpf.org/2007/opf" opf:file-as="Elbert Hannah">Elbert Hannah</dc:creator>
    <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:opf="http://www.idpf.org/2007/opf" opf:file-as="Linda Lamb">Linda Lamb</dc:creator>
    <dc:language xmlns:dc="http://purl.org/dc/elements/1.1/">en</dc:language>
    <meta name="cover" content="cover-image"/>
  </metadata>
  <manifest>
    <item id="ncxtoc" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
    <item media-type="text/css" id="css" href="core.css"/>
    <item id="cover" href="cover.html" media-type="application/xhtml+xml"/>
    <item id="epub.embedded.font.1" href="LiberationMono-Bold.otf"
        media-type="font/opentype"/>
    <item id="epub.embedded.font.2" href="LiberationMono-BoldItalic.otf"
        media-type="font/opentype"/>
    <item id="epub.embedded.font.3" href="LiberationMono-Italic.otf"
        media-type="font/opentype"/>
    <item id="epub.embedded.font.4" href="LiberationMono.otf"
        media-type="font/opentype"/>
    <item id="epub.embedded.font.5" href="LiberationSerif.otf"
        media-type="font/opentype"/>
    <item id="id2909437" href="index.html"
        media-type="application/xhtml+xml"/>
    <item id="cover-image"
        href="httpatomoreillycomsourceoreillyimages8936.jpg"
        media-type="image/jpeg"/>
    <item id="id3093658" href="oreilly_large.gif" media-type="image/gif"/>
    <item id="id2857362" href="pr01.html"
        media-type="application/xhtml+xml"/>
    <item id="id3175607" href="pt01.html"
        media-type="application/xhtml+xml"/>
    <item id="id3175744" href="ch01.html"
        media-type="application/xhtml+xml"/>
    <item id="id3176055" href="httpatomoreillycomsourceoreillyimages8938.png"
        media-type="image/png"/>
    <item id="id3346907" href="author_bios.html"
        media-type="application/xhtml+xml"/>
    <item id="id3130574callout1" href="callouts/1.png"
        media-type="image/png"/>
    <item id="id3130574callout15" href="callouts/15.png"
        media-type="image/png"/>
  </manifest>
  <spine toc="ncxtoc">
    <itemref idref="cover" linear="no"/>
    <itemref idref="id2909437"/>
    <itemref idref="id3103816"/>
    <itemref idref="id3346923"/>
  </spine>
  <guide>
    <reference href="cover.html" type="cover" title="Cover"/>
  </guide>
</package>
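Every idref in the SPINE has to match the id of an item in the MANIFEST. The sample above was shortened, so some of its spine entries point at manifest items that were cut. A minimal Python sketch for catching that kind of mismatch in your own content.opf follows; the function name missing_spine_items and the trimmed SAMPLE_OPF are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the opening package tag of the sample above.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def missing_spine_items(opf_xml: str) -> list:
    """Return spine idref values that have no matching manifest item id."""
    package = ET.fromstring(opf_xml.encode("utf-8"))
    manifest_ids = {
        item.get("id")
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
    }
    spine_refs = [
        ref.get("idref")
        for ref in package.findall("opf:spine/opf:itemref", OPF_NS)
    ]
    return [ref for ref in spine_refs if ref not in manifest_ids]


# Hypothetical cut-down content.opf, just enough to exercise the check.
SAMPLE_OPF = """<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
    unique-identifier="bookid">
  <metadata/>
  <manifest>
    <item id="cover" href="cover.html" media-type="application/xhtml+xml"/>
    <item id="id2909437" href="index.html"
        media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncxtoc">
    <itemref idref="cover" linear="no"/>
    <itemref idref="id2909437"/>
    <itemref idref="id3103816"/>
  </spine>
</package>"""

if __name__ == "__main__":
    print(missing_spine_items(SAMPLE_OPF))
```

An empty list means every spine entry has a manifest item behind it. The sketch does not check the reverse direction or whether the files actually exist in the OEBPS folder.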
Your ePub Content
Create or copy all the XHTML, images, CSS, fonts (EPUB3 won't need the font
files, but includes provisions for them) and any other files that are necessary
for the book, and save those in the OEBPS folder. You can save them into
organized sub folders under the OEBPS folder, if you want. Make sure your XHTML
and CSS files validate correctly with the validation tools on the W3C website.
That step is absolutely essential. Your ePub WILL NOT work if you don't have
well formed documents. You can also validate your ePub files at several web
sites.
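For a quick local check before going to the online validators, here is a minimal Python sketch that reports whether a file's text parses as XML at all. It only tests for well formed markup (matched and properly nested tags, quoted attributes); it does not validate against the XHTML 1.1 DTD, so it is no substitute for the W3C tools. The function name well_formed_error is made up for illustration.

```python
import xml.etree.ElementTree as ET


def well_formed_error(xhtml_text: str):
    """Return None if the text parses as XML, else the parser's message."""
    try:
        ET.fromstring(xhtml_text.encode("utf-8"))
    except ET.ParseError as err:
        return str(err)
    return None


if __name__ == "__main__":
    good = "<html><body><p>ok</p></body></html>"
    bad = "<html><body><p>broken</body></html>"  # <p> is never closed
    print(well_formed_error(good))
    print(well_formed_error(bad))
```

The error message includes a line and column number, which helps when hunting for the unclosed tag in a long chapter file.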
It is best to leave out any code that you would use to create your structural
look and layout. Just use content markup, h1-h6, p, em, strong, ul, ol, li, dl,
dt, dd, table, tr, th, td, div, blockquote tags, and some others I am probably
missing, and use CSS for all your styling. Stay away from position and size in
your CSS. Don't use layout controls because it will cause problems in the ePub
file, even if the HTML validates on W3C. Those controls disrupt a large part of
what the ePub reader is designed to do on its own. The HTML/XHTML files you use
in your ePub need to have the DOCTYPE shown below:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
Your HTML/XHTML files can contain internal hyperlinks to enable moving around in
the book, and even external hyperlinks to leave the book. (That is also
dependent on the reader you are using, and the device). You can have an HTML
menu file at the beginning of the book, with standard HTML hyperlinks directing
to the contents of the book. You can also have the toc.ncx, which is not
actually part of the book content, where the HTML menu would be. See idpf.org
for the exact specification of the conformance of your XHTML documents.
EPUB2 supports XHTML 1.1, XML 1.0, SVG 1.1 and CSS2, with limited support for
CSS3. EPUB3 adds HTML5 and supports more of CSS3.
I know my HTML files with the above DOCTYPE, saved as .html, have worked so far.
It looks as though the xhtml extension is required with EPUB3. So, I have
renamed all my files to .xhtml in the ePubs I have created. My ePub files still
work with the new file names inside. I had to edit a lot of hyperlinks. Creating
new files for your ePub in Sigil will create them as .xhtml.
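Editing the hyperlinks after a rename from .html to .xhtml is a job a script can do. The following is a minimal Python sketch, assuming links are written as href="..." or src="..." with double quotes; it leaves links that start with a scheme such as http: alone. The pattern and the function name retarget_links are made up for illustration, and it is worth running on copies of your files first.

```python
import re

# Matches href="file.html" or src="file.html#fragment" for local files only.
# The lookahead skips anything that starts with a URL scheme such as http:.
LOCAL_HTML_LINK = re.compile(
    r'((?:href|src)=")(?![a-zA-Z][a-zA-Z0-9+.-]*:)([^"#]+)\.html((?:#[^"]*)?")'
)


def retarget_links(markup: str) -> str:
    """Rewrite local .html link targets to .xhtml, keeping any #fragment."""
    return LOCAL_HTML_LINK.sub(r"\1\2.xhtml\3", markup)


if __name__ == "__main__":
    print(retarget_links('<a href="pr01.html#vi7-ch-0-sect-1">Scope</a>'))
    print(retarget_links('<a href="http://www.oreilly.com/index.html">O</a>'))
```

The same function can be run over the text of content.opf and toc.ncx, since their href and src attributes have to follow the renamed files too.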
Testing Your New ePub
Once you are finished creating the ePub content, clean up any backup files and
any other unnecessary files in the OEBPS folder. Then open a compression
application and zip the contents of the main folder, so that mimetype, META-INF
and OEBPS sit at the top level of the zip file. The ePub standard expects the
mimetype file to be the first entry in the zip file and to be stored without
compression. Next, rename the extension of the new zip file, changing it from
zip to epub. Finally, try opening the file in your favorite ebook reader. If you
did everything correctly, you will be able to read your new file. If you didn't,
go back to the main folder that you started from. Check that all the required
files and folders are in the correct locations and are named correctly, and that
the contents of those essential files are typed correctly. Once you are sure
everything is right, delete the old ePub file, create a new zip file, change its
extension from zip to epub again, and try opening it again. If you can make a
set of HTML files work correctly on a website, you should also be able to make
your ePub files work.
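The zip and rename steps can also be scripted. Here is a minimal Python sketch using the standard zipfile module. It writes mimetype first and uncompressed, which is what the ePub container specification expects, and stores every other path relative to the main folder so that META-INF and OEBPS land at the top level of the archive. The function name build_epub and the names "my-book" and "my-book.epub" are placeholders.

```python
import zipfile
from pathlib import Path


def build_epub(book_dir: str, epub_path: str) -> None:
    """Zip the contents of book_dir into epub_path, mimetype entry first."""
    root = Path(book_dir)
    target = Path(epub_path).resolve()
    mimetype = root / "mimetype"
    with zipfile.ZipFile(target, "w") as archive:
        # First entry: mimetype, stored without compression.
        archive.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        for path in sorted(root.rglob("*")):
            # Skip folders, the mimetype already written, and the output
            # file itself in case it lives inside book_dir.
            if not path.is_file() or path == mimetype or path.resolve() == target:
                continue
            archive.write(
                path,
                path.relative_to(root).as_posix(),
                compress_type=zipfile.ZIP_DEFLATED,
            )


if __name__ == "__main__":
    build_epub("my-book", "my-book.epub")  # placeholder names
```

Writing straight to a file name that ends in .epub also saves the rename step.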
Epub Authority
The governing authority on ePub is idpf.org. There is a
lot of cool technology that is available for ePub files. But, there is also a
problem with the technology not being used fully by the different ePub readers.
I know on our PCLinuxOS distribution, the best ebook readers I have found so far
are the ebook viewer that comes with Calibre, and fbreader. I wouldn't waste my
time with any of the others. Okular will read ePub files, but it won't display
them, probably because it doesn't have the dynamic flow capability of the actual
ebook readers. There are also a couple of freely available epub reader add-ons
for Firefox. One is called EPUBReader, and the other one is called Lucifox.
You Have Enough To Get Started
I gave you enough information to create well formed and properly working ePub
files that you can roll with just a text editor and a compression app. You can
learn a lot more by going to idpf.org and reading
the documentation, if you want to learn about ePub file creation in greater
detail. That is what I did, particularly after I was asked to write this
article. I still go there to learn more and to use it as a reference. EPUB3 has
a lot of improvements over EPUB2. So, that is reason enough to visit idpf.org
often.
Credit And Thanks
The same ePub file was used in all the examples, both in the images and some of
the copy and paste code examples. I purchased that ePub from O'Reilly Media, and
my use of it falls within O'Reilly Media's automatic permissions, found in
nearly every one of their publications, which allow the use of part of their
work.
Although they don't require it, I am adding the acknowledgment information for
the very small part of the ePub file I used for my examples.
Learning the vi and Vim Editors
Arnold Robbins, Elbert Hannah, Linda Lamb
Copyright © 2009 Arnold Robbins and Elbert Hannah
O'Reilly Media
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)
http://www.oreilly.com
Thank you O'Reilly Media for your contribution and friendly position towards the
Open Source Community. Additionally, thanks for making your publications so
easily accessible for life, and in so many formats. Also, a big thanks for
making them DRM free!
Happy ePub rolling!