
How To Create, Edit ePub Files In Sigil

by stealth

So what is Sigil? It is an ebook editor that doubles as a fancy data compression app, because an ePub file is really a zip archive. You can use it to open existing ebook files and edit them, or you can use it to create your own new ebook files.

What are ePub files? Are they a mystery to you? Ever wonder how they are created? They don't have to be a mystery, and you are about to learn how to create them. An ePub file is, in essence, a self-contained portable website with some improved features.


An Overview Of Sigil

If you just want to edit an existing ePub file or create one, and you don't care how it is put together or how it works, then use Sigil. It takes care of all the technical stuff for you and makes everything work correctly. However, Sigil does not appear to provide access to all of the features made available in the ePub standard as found at idpf.org. Sigil does have a slick Table of Contents (TOC) creation feature.


Sigil In Linux, Mac and Windows

There does seem to be some lost functionality when using Sigil in Linux. Windows and Mac users claim the ability to do things that just don't seem to work in the Linux version. Even the documentation says certain things can be done, but they don't work for Linux users. You can still use the app to create and edit ePub files, just apparently not with the same ease that Windows and Mac users have. Your mileage may vary.


Three Editing Modes In Sigil

The image below shows you the Sigil Editor in the center section of the combination mode where you do the editing in WYSIWYG or the HTML code. The single tool icon on the toolbar (below the Tools menu circled in red) is how you get to the combination editor from the toolbar. You can see which file you are working in by the green highlight in the left Book Browser Window, as well as the tab above the editing window. The TOC for the book can be viewed in the Table of Contents window to the right. The two icons circled in red are the Metadata editor on the left and the TOC generation on the right.

You can see full WYSIWYG editing in this image, below. The toolbar icon for this mode is circled in red.

You can see Full Code editing in this image, below. The toolbar icon for this mode is circled in red.

The two files circled in the Book Browser in the left section of the image below are key files, which are in any ePub file. The toc.ncx is where the TOC, which you can see in the pane on the right, is stored. The content.opf stores the list of the entire contents of the ePub file. The content.opf file is also what controls the order the HTML files will be arranged in for viewing in an ebook reader.

It is also how you will see them in the Book Browser under the Text folder (above) in the pane on the left. All your HTML, SVG/PNG/JPG/GIF, CSS, fonts, audio, video and possible script files, such as JavaScript, are contained in the same folder in the ePub file. But, they can also be contained in sub folders, as you can see above on the left. Those folders are actually created, and the files are separated into them by Sigil when you open an existing ePub file, or when you create a new one.

The ePub standard from idpf.org suggests that they all be stored in the same folder and not separated, although separating them is allowed. I’m not sure why the author of Sigil decided to go the route which is not suggested, but allowed. Maybe he is following Microsoft's lead of not following standards. The reason they suggest not doing it this way, as seen above, is that nothing in the standard requires a reader to follow the linking behavior which the separated folder setup needs in order to work. Doing it that way might work in one reader, but not in another. The linking behavior is identical to what you might do with a web site, but the reader is not required to handle the links or the folder structure properly. All the readers I have tried handle the links properly.

From within Sigil, you cannot see the upper folders or the other two essential files that are required in any ePub file. Those two folders and two files are identical in every ePub file, and it won't work if they aren't.

In the image above, you see the folders with little > pointers next to some of the folders. That means there are files in those folders, and none in the others lacking the > pointers. You will see, farther down, that those folders don't actually exist in ePub files generated by publishers. However, if you used Sigil to open and edit an existing file, then saved it, Sigil will create the folders and restructure all the files in that ePub file. Sigil will also correct any linking problems caused by the restructure.


Sigil's TOC Generation

Below is the automatic TOC generation feature I mentioned above. You will also see the search tool at the bottom of the editing window. The icon with the magnifying glass on it, just to the left of the Tools drop down menu in the image, is how you open search and replace from the toolbar.

You can also do the TOC by hand if you want. You can also edit part of it after automatic generation. If you do edit the file after it is created, you will see activity in the right pane as you are editing.



Right Click Context Menus

There are right click context menus almost everywhere in Sigil. Right clicking on the text folder will let you start a new HTML file, or import existing files. Any existing files have to be well formed by the XHTML 1.1 standard, or Sigil will not bring them in.


Creating An ePub From Scratch

Here is what the same file that is in the previous images looks like from within Ark. For some reason, Sigil replaces that callouts folder in this ePub. It is at the same level as the sub folders you saw in the Sigil images above. It has some *.gif files in it. You can see a couple of different file types, including HTML and OTF (OpenType Font) files. You will also notice that the sub-folders that you saw in Sigil aren't here, because they don't actually exist in this ePub file. O'Reilly makes all their ePub files with all the content files in the same folder, with the exception of that callouts folder with the *.gif files in this ePub.

The image below shows more file types, including the content.opf, CSS and image files.

You can see the toc.ncx file at the bottom of the window in the image below.

In the image below, you can see the two essential sub-folders and the two essential files I mentioned above, which you cannot see when opening the ePub file in Sigil. The OEBPS (Open eBook Publication Structure) folder is where all the content files for the ebook are stored. That also includes the content.opf and toc.ncx. The contents of the OEBPS folder are all you see in Sigil. The OEBPS folder and the META-INF folder sit one level below the main folder, which is the ePub file that you would open and read in an ebook reader, and one level above the contents you see in the Sigil images above.
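If you would rather look inside an ePub from a script than from Ark, a minimal Python sketch like the one below does the same job. It relies only on the standard zipfile module. The demo archive and its file names are made up for illustration; with a real book you would pass the path to your .epub file instead.

```python
import io
import zipfile


def list_epub_entries(epub_file):
    """Return the entry names stored in an ePub (zip) archive, in archive order."""
    with zipfile.ZipFile(epub_file) as archive:
        return archive.namelist()


def build_demo_epub():
    """Build a tiny in-memory archive with placeholder contents (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("META-INF/container.xml", "<container/>")
        archive.writestr("OEBPS/content.opf", "<package/>")
        archive.writestr("OEBPS/toc.ncx", "<ncx/>")
        archive.writestr("OEBPS/index.html", "<html/>")
    buffer.seek(0)
    return buffer


# With a real book: list_epub_entries("/path/to/your-book.epub")
entries = list_epub_entries(build_demo_epub())
for name in entries:
    print(name)
```

The listing shows the same layout described above: mimetype and the two folders at the top level, with the book content under OEBPS.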



How Does An ePub File Go Together?

If you are like me and have to find out how the ePub file works, or what the full process for making a properly working ePub file is, then you will have to roll up your sleeves and get ready to get all that technical stuff all over you. Just kidding! I sure am glad it is not like working on a gasoline engine.

You need to know how to use a compression app like ark, xarchiver, Q7Z, p7zip, file-roller, tar or zip. If you know how to create folders on your computer, and you know how to use a text editor, then you can create your own ePub file using your favorite text editor and compression app. My favorite text editor is Vim, of course. Oh, you will also have to know a little something about making well formed HTML, XHTML and XML files.


The container.xml & mimetype

So what are the container.xml file and the mimetype file? The container.xml file is, as the name implies, a container file which references the content.opf file found in the OEBPS folder inside the ePub file. An example is shown below.

<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
   <rootfiles>
       <rootfile full-path="OEBPS/content.opf"
           media-type="application/oebps-package+xml"/>
   </rootfiles>
</container>
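To see what a reader does with this file, here is a small Python sketch that pulls the full-path attribute out of container.xml. It uses the standard xml.etree.ElementTree module; the function name find_package_path is my own invention for this example, not part of any ePub tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by container.xml; the "ocf" prefix is just a local label.
CONTAINER_NS = {"ocf": "urn:oasis:names:tc:opendocument:xmlns:container"}

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
   <rootfiles>
       <rootfile full-path="OEBPS/content.opf"
           media-type="application/oebps-package+xml"/>
   </rootfiles>
</container>
"""


def find_package_path(container_xml):
    """Return the path of the package (content.opf) file named in container.xml."""
    root = ET.fromstring(container_xml.encode("utf-8"))
    rootfile = root.find("ocf:rootfiles/ocf:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no rootfile element")
    return rootfile.get("full-path")


print(find_package_path(CONTAINER_XML))
```

The path it returns is relative to the top of the ePub, which is why content.opf can live in OEBPS while container.xml lives in META-INF.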

The mimetype file contains only the single line you see below. It identifies the ePub file as a zip archive that holds an ePub.
application/epub+zip


Create Your Own ePub File From Scratch

I prefer to do it this way. It lets me be in control, instead of an application. The current standard for ePub is EPUB3. It supersedes the older EPUB2 specification.

Open a file manager and create a folder named after your ePub file. Inside it, create the two sub folders, META-INF and OEBPS.

Here is the Structure of the ePub file as viewed in Ark before extraction.

Here is the structure of the ePub folder after extraction.

Epub file name
   |
   |____META-INF (1 Required)(Other Files Optional)
   |      |
   |      |____container.xml (1 Required)
   |
   |____OEBPS (1 Required)
   |      |
   |      |____HTML/XHTML (1 Required)
   |      |
   |      |____CSS (Optional)
   |      |
   |      |____SVG/PNG/JPG/GIF (Optional)
   |      |
   |      |____scripts (Optional)
   |      |
   |      |____videos (Optional)
   |      |
   |      |____audio (Optional)
   |      |
   |      |____fonts (Optional)
   |      |
   |      |____content.opf (1 Required)
   |      |
   |      |____toc.ncx (1 Required)
   |
   |____mimetype (1 Required)

Creating The Required Files In The ePub

Start your favorite text editor, and enter the following:
application/epub+zip

Save this file in the root folder that has the ePub file name, and name this file mimetype. Do not put the mimetype file in OEBPS or META-INF. It has to be at the same level they are.

Start a new file and add exactly what you see here for the container.xml file. Then save that to the META-INF folder.

<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
   <rootfiles>
       <rootfile full-path="OEBPS/content.opf"
           media-type="application/oebps-package+xml"/>
   </rootfiles>
</container>
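If you create ePubs often, the folder setup and the two fixed files can be scripted. The sketch below is one possible way to do it in Python with pathlib. The folder name my-book and the function name create_epub_skeleton are placeholders I made up; it writes into a temporary folder so it is safe to try.

```python
import tempfile
from pathlib import Path

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
   <rootfiles>
       <rootfile full-path="OEBPS/content.opf"
           media-type="application/oebps-package+xml"/>
   </rootfiles>
</container>
"""


def create_epub_skeleton(book_dir):
    """Create META-INF, OEBPS, mimetype and container.xml under book_dir."""
    book_dir = Path(book_dir)
    (book_dir / "META-INF").mkdir(parents=True, exist_ok=True)
    (book_dir / "OEBPS").mkdir(parents=True, exist_ok=True)
    # mimetype sits next to the two folders, not inside either of them.
    (book_dir / "mimetype").write_text("application/epub+zip", encoding="ascii")
    (book_dir / "META-INF" / "container.xml").write_text(
        CONTAINER_XML, encoding="utf-8"
    )
    return book_dir


with tempfile.TemporaryDirectory() as tmp:
    book = create_epub_skeleton(Path(tmp) / "my-book")  # hypothetical book name
    created = sorted(p.relative_to(book).as_posix() for p in book.rglob("*"))

for name in created:
    print(name)
```

The content.opf, toc.ncx and your book files still have to be added to OEBPS by hand, as described in the following sections.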

The toc.ncx File

Below is a sample of a toc.ncx file. Creating this from scratch can be a hassle. This is an XML version of the HTML list. It is an XML file with opening and closing NCX tags and the XML Namespace in the opening NCX tag. It can be used with or without a DOCTYPE declaration in the file. I have found that, sometimes, NOT using a DOCTYPE allows you to create a TOC that would otherwise not work. The sample toc.ncx below has three sections between the NCX tags: head, docTitle and navMap. The standard at idpf.org says you can also have two other sections called navPage and navList. Each section is separate from the others, and each has its own opening and closing tags. I cut out most of the middle of this toc.ncx file because it was too long. Within the navMap section you will find:

<navPoint>
   <navLabel>
       <text>Your TOC Text</text>
   </navLabel>
   <content src="yourfile.html"/>
</navPoint>
You can nest the navPoint element like so.
<navPoint>
   <navLabel>
       <text>Your TOC Text</text>
   </navLabel>
   <content src="yourfile.html"/>
   <navPoint>
       <navLabel>
           <text>Your TOC Text</text>
       </navLabel>
       <content src="yourfile.html"/>
   </navPoint>
</navPoint>
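Since typing nested navPoint elements by hand is the tedious part, here is a rough Python sketch that generates them from a nested list. The entry titles, file names, the nav1, nav2 style ids and the helper names are all assumptions for illustration. It only builds the navMap fragment; you would still paste the result between the NCX tags yourself.

```python
import xml.etree.ElementTree as ET


def add_nav_points(parent, entries, counter):
    """Append one navPoint per (label, src, children) entry, numbering as we go."""
    for label, src, children in entries:
        counter[0] += 1
        nav_point = ET.SubElement(
            parent,
            "navPoint",
            {"id": f"nav{counter[0]}", "playOrder": str(counter[0])},
        )
        nav_label = ET.SubElement(nav_point, "navLabel")
        ET.SubElement(nav_label, "text").text = label
        ET.SubElement(nav_point, "content", {"src": src})
        # Children are nested inside their parent navPoint.
        add_nav_points(nav_point, children, counter)


def build_nav_map(entries):
    """Return a navMap element holding the nested navPoint elements."""
    nav_map = ET.Element("navMap")
    add_nav_points(nav_map, entries, [0])
    return nav_map


# Hypothetical table of contents: (label, file, nested entries)
TOC = [
    ("Preface", "pr01.html", [
        ("Scope of This Book", "pr01.html#scope", []),
    ]),
    ("Chapter 1", "ch01.html", []),
]

nav_map = build_nav_map(TOC)
if hasattr(ET, "indent"):  # pretty-printing helper, available in Python 3.9+
    ET.indent(nav_map)
print(ET.tostring(nav_map, encoding="unicode"))
```

The playOrder values follow the order the entries appear in the list, parents before their children.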

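Here is a sample DOCTYPE:

<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN"
  "http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">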

This sample toc.ncx does NOT have a DOCTYPE declaration. If it did, it would go between the XML declaration and the opening NCX tag, in basically the same manner as your XHTML files. Also, the XML declaration below, starting with <?xml, has to be the very first line of the file.

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
 <head>
   <meta content="cover" name="cover"/>
   <meta content="isbn:9780596159351" name="dtb:uid"/>
   <meta content="-1" name="dtb:depth"/>
   <meta content="0" name="dtb:totalPageCount"/>
   <meta content="0" name="dtb:maxPageNumber"/>
 </head>
 <docTitle>
   <text>Learning the vi and Vim Editors</text>
 </docTitle>
 <navMap>
   <navPoint id="id2909437" playOrder="1">
     <navLabel>
       <text>Learning the vi and Vim Editors</text>
     </navLabel>
     <content src="Text/index.html"/>
     <navPoint id="id2857362" playOrder="2">
       <navLabel>
         <text>Preface</text>
       </navLabel>
       <content src="Text/pr01.html"/>
       <navPoint id="id2857202" playOrder="3">
         <navLabel>
           <text>Scope of This Book</text>
         </navLabel>
         <content src="Text/pr01.html#vi7-ch-0-sect-1"/>
       </navPoint>
       <navPoint id="id3103816" playOrder="4">
         <navLabel>
           <text>How the Material Is Presented</text>
         </navLabel>
         <content src="Text/pr01s02.html"/>
         <navPoint id="id3168839" playOrder="5">
           <navLabel>
             <text>Discussion of vi Commands</text>
           </navLabel>
           <content src="Text/pr01s02.html#vi7-ch-0-sect-2.1"/>
         </navPoint>
         <navPoint id="id3174260" playOrder="6">
           <navLabel>
             <text>Conventions</text>
           </navLabel>
           <content src="Text/pr01s02.html#vi7-ch-0-sect-2.2"/>
         </navPoint>
         <navPoint id="id2856537" playOrder="7">
           <navLabel>
             <text>Keystrokes</text>
           </navLabel>
           <content src="Text/pr01s02.html#vi7-ch-0-sect-2.3"/>
         </navPoint>
       </navPoint>
       <!-- remaining entries cut here for length -->
     </navPoint>
   </navPoint>
 </navMap>
</ncx>
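To check that a toc.ncx nests the way you intended, you can print it back out as an indented outline. The Python sketch below walks the navMap recursively with the standard xml.etree.ElementTree module. The short sample NCX and the function names are made up for this example.

```python
import xml.etree.ElementTree as ET

# NCX namespace; the "ncx" prefix is only a local label for the lookups below.
NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}

# A small stand-in toc.ncx with placeholder ids and file names.
SAMPLE_NCX = """<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="n1" playOrder="1">
      <navLabel><text>Preface</text></navLabel>
      <content src="pr01.html"/>
      <navPoint id="n2" playOrder="2">
        <navLabel><text>Scope of This Book</text></navLabel>
        <content src="pr01.html#scope"/>
      </navPoint>
    </navPoint>
  </navMap>
</ncx>
"""


def walk_nav_points(parent, depth=0):
    """Yield (depth, label, src) for each navPoint under parent, in document order."""
    for nav_point in parent.findall("ncx:navPoint", NCX_NS):
        label = nav_point.findtext(
            "ncx:navLabel/ncx:text", default="", namespaces=NCX_NS
        )
        content = nav_point.find("ncx:content", NCX_NS)
        src = content.get("src") if content is not None else None
        yield depth, label.strip(), src
        yield from walk_nav_points(nav_point, depth + 1)


def read_toc(ncx_xml):
    """Return the table of contents as a flat list of (depth, label, src)."""
    root = ET.fromstring(ncx_xml.encode("utf-8"))
    nav_map = root.find("ncx:navMap", NCX_NS)
    if nav_map is None:
        return []
    return list(walk_nav_points(nav_map))


toc = read_toc(SAMPLE_NCX)
for depth, label, src in toc:
    print("  " * depth + f"{label} -> {src}")
```

If the file is not well formed, for example because a closing tag is missing, ET.fromstring raises a ParseError that names the line, which is a quick way to find the mistake.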

The content.opf File

Shown below is an example of a content.opf file. I cut out most of this one, too, because of length. The content.opf file has an opening and closing PACKAGE tag with an XML namespace in the opening tag. It has four sections between the PACKAGE tags, each separate from the other, and each with its own opening and closing tags: METADATA, MANIFEST, SPINE and GUIDE.

METADATA: Contains information about you and your book. dc:identifier, dc:title, dc:language and meta are the only required elements in this section.

MANIFEST: All of your book content must be listed here.

SPINE: This must have all of the book content, minus images, CSS, audio, video, fonts and any scripts. The order in which the content is listed is the order in which the content is presented in a reader.

GUIDE: Can be empty. The section must be in the file, though.

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
    unique-identifier="bookid">
 <metadata>
   <dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/"
       id="bookid">urn:isbn:9780596159351</dc:identifier>
   <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Learning the vi and Vim Editors</dc:title>
   <dc:rights xmlns:dc="http://purl.org/dc/elements/1.1/">Copyright © 2009 Arnold Robbins and Elbert Hannah</dc:rights>
   <dc:publisher xmlns:dc="http://purl.org/dc/elements/1.1/">O'Reilly Media</dc:publisher>
   <dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">COMPUTERS / Operating Systems / UNIX</dc:subject>
   <dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">2009-06-30</dc:date>
   <dc:description xmlns:dc="http://purl.org/dc/elements/1.1/"><p>The
standard guide for <em>vi</em> since 1986, this book has been
expanded to include detailed information on <em>vim</em>, the
leading <em>vi</em> clone that includes extra features for both
beginners and power users. You learn text editing basics and advanced tools for
both editors, such as writing macros and scripts to extend the editor, power
tools for programmers, multi-window editing -- all in the easy-to-follow style
that has made this book a classic.</p></dc:description>
   <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/"
       xmlns:opf="http://www.idpf.org/2007/opf"
       opf:file-as="Arnold Robbins">Arnold Robbins</dc:creator>
   <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/"
       xmlns:opf="http://www.idpf.org/2007/opf"
       opf:file-as="Elbert Hannah">Elbert Hannah</dc:creator>
   <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/"
       xmlns:opf="http://www.idpf.org/2007/opf"
       opf:file-as="Linda Lamb">Linda Lamb</dc:creator>
   <dc:language xmlns:dc="http://purl.org/dc/elements/1.1/">en</dc:language>
   <meta name="cover" content="cover-image"/>
 </metadata>
 <manifest>
   <item id="ncxtoc" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
   <item media-type="text/css" id="css" href="core.css"/>
   <item id="cover" href="cover.html" media-type="application/xhtml+xml"/>
   <item id="epub.embedded.font.1" href="LiberationMono-Bold.otf"
       media-type="font/opentype"/>
   <item id="epub.embedded.font.2" href="LiberationMono-BoldItalic.otf"
       media-type="font/opentype"/>
   <item id="epub.embedded.font.3" href="LiberationMono-Italic.otf"
       media-type="font/opentype"/>
   <item id="epub.embedded.font.4" href="LiberationMono.otf"
       media-type="font/opentype"/>
   <item id="epub.embedded.font.5" href="LiberationSerif.otf"
       media-type="font/opentype"/>
   <item id="id2909437" href="index.html"
       media-type="application/xhtml+xml"/>
   <item id="cover-image"
       href="httpatomoreillycomsourceoreillyimages8936.jpg"
       media-type="image/jpeg"/>
   <item id="id3093658" href="oreilly_large.gif" media-type="image/gif"/>
   <item id="id2857362" href="pr01.html"
       media-type="application/xhtml+xml"/>
   <item id="id3175607" href="pt01.html"
       media-type="application/xhtml+xml"/>
   <item id="id3175744" href="ch01.html"
       media-type="application/xhtml+xml"/>
   <item id="id3176055" href="httpatomoreillycomsourceoreillyimages8938.png"
       media-type="image/png"/>
   <item id="id3346907" href="author_bios.html"
       media-type="application/xhtml+xml"/>
   <item id="id3130574callout1" href="callouts/1.png"
       media-type="image/png"/>
   <item id="id3130574callout15" href="callouts/15.png"
       media-type="image/png"/>
   <!-- most manifest items cut here for length -->
 </manifest>
 <spine toc="ncxtoc">
   <itemref idref="cover" linear="no"/>
   <itemref idref="id2909437"/>
   <itemref idref="id3103816"/>
   <itemref idref="id3346923"/>
   <!-- most spine entries cut here for length -->
 </spine>
 <guide>
   <reference href="cover.html" type="cover" title="Cover"/>
 </guide>
</package>
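The link between MANIFEST and SPINE is easy to get wrong by hand: every idref in the spine has to match the id of an item in the manifest. Here is a small Python sketch that resolves the reading order and reports any spine entries with no matching manifest item. The sample OPF, its ids and the function name reading_order are placeholders for illustration only.

```python
import xml.etree.ElementTree as ET

# OPF namespace; the "opf" prefix is only a local label for the lookups below.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# A cut-down content.opf with made-up ids and file names.
SAMPLE_OPF = """<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
    unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Demo Book</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="ncxtoc" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="css" href="core.css" media-type="text/css"/>
    <item id="cover" href="cover.html" media-type="application/xhtml+xml"/>
    <item id="ch01" href="ch01.html" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncxtoc">
    <itemref idref="cover" linear="no"/>
    <itemref idref="ch01"/>
  </spine>
  <guide>
    <reference href="cover.html" type="cover" title="Cover"/>
  </guide>
</package>
"""


def reading_order(opf_xml):
    """Return (files in spine order, spine idrefs missing from the manifest)."""
    root = ET.fromstring(opf_xml.encode("utf-8"))
    manifest = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    order, missing = [], []
    for itemref in root.findall("opf:spine/opf:itemref", OPF_NS):
        idref = itemref.get("idref")
        if idref in manifest:
            order.append(manifest[idref])
        else:
            missing.append(idref)
    return order, missing


order, missing = reading_order(SAMPLE_OPF)
print("Reading order:", order)
print("Spine entries with no manifest item:", missing)
```

Note that the CSS and toc.ncx files appear in the manifest but not in the reading order, which matches the rule that the spine lists only the readable content.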

Your ePub Content

Create or copy all the XHTML, images, CSS, fonts and any other files that are necessary for the book, and save those in the OEBPS folder. (EPUB3 won't need the font files, but includes provisions for them.) You can save them into organized sub folders under the OEBPS folder, if you want. Make sure your XHTML and CSS files validate correctly with the validation tools on the W3C website. That step is absolutely essential, because your ePub WILL NOT work if you don't have well formed documents. You can also validate your ePub files at several web sites.
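Before you upload anything to a validator, a quick local check can catch the most common problem, which is a file that is not well formed. The Python sketch below only tests well-formedness with the standard XML parser; it is not a substitute for real XHTML validation. The sample pages and the function name check_well_formed are made up for this example. One caveat: the parser does not load the external DTD, so named entities such as &nbsp; may be reported as errors.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xhtml_text):
    """Return (True, None) if the text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(xhtml_text.encode("utf-8"))
    except ET.ParseError as error:
        return False, str(error)
    return True, None


# Placeholder page for illustration.
GOOD_PAGE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><p>Hello, <em>reader</em>.</p></body>
</html>
"""

# Same page with the closing </em> removed, so the tags no longer match up.
BAD_PAGE = GOOD_PAGE.replace("</em>", "")

for name, page in (("good page", GOOD_PAGE), ("bad page", BAD_PAGE)):
    ok, message = check_well_formed(page)
    print(name, "->", "well formed" if ok else f"NOT well formed: {message}")
```

The error message includes the line and column where the parser gave up, which is usually close to the unclosed tag.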

It is best to leave out any code that you would use to create your structural look and layout. Just use content markup, h1-h6, p, em, strong, ul, ol, li, dl, dt, dd, table, tr, th, td, div, blockquote tags, and some others I am probably missing, and use CSS for all your styling. Stay away from position and size in your CSS. Don't use layout controls because it will cause problems in the ePub file, even if the HTML validates on W3C. Those controls disrupt a large part of what the ePub reader is designed to do on its own. Your HTML/XHTML files need to have the DOCTYPE, as shown below, in the HTML/XHTML files you use in your ePub:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

Your HTML/XHTML files can contain internal hyperlinks to enable moving around in the book, and even external hyperlinks to leave the book. (That is also dependent on the reader you are using, and the device.) You can have an HTML menu file at the beginning of the book, with standard HTML hyperlinks directing to the contents of the book. You can also use the toc.ncx, which is not actually part of the book content, in place of the HTML menu. See idpf.org for exact specification of the conformance of your XHTML documents.

EPUB2 supports CSS2 with limited support for CSS3, along with XHTML 1.1, XML 1.0 and SVG 1.1. EPUB3 supports more of CSS3 and adds HTML5.

I know my HTML files with the above DOCTYPE, saved as .html, have worked so far. It looks as though the xhtml extension is required with EPUB3. So, I have renamed all my files to .xhtml in the ePubs I have created. My ePub files still work with the new file names inside. I had to edit a lot of hyperlinks. Creating new files for your ePub in Sigil will create them as .xhtml.


Testing Your New ePub

Once you are finished creating the ePub content, clean up any back up files and any other unnecessary files in the OEBPS folder. Then open a compression application and zip everything inside the main folder, so that mimetype, META-INF and OEBPS sit at the top level of the archive. The ePub standard expects the mimetype file to be the first entry in the archive and to be stored without compression. Next, rename the extension of the new zip file, changing it from zip to epub. Finally, try opening the file in your favorite ebook reader. If you did everything correctly, you will be able to read your new file.

If you didn't do it correctly, go back to the main folder that you started from. Check everything, making sure you have all the required files and folders in the correct locations, and they are named correctly. Check to see if the contents inside those essential files are typed correctly. Once you are sure everything is right, just delete the old ePub file and save a newly compressed zip file. Change it again from zip to epub, and try opening it again. If you can make a set of HTML files work correctly on a website, you should also be able to make your ePub files work.
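The zip step can also be scripted, which gives you control over the order of the entries. The ePub container rules expect mimetype to be the first entry and to be stored uncompressed, something a plain "zip this folder" action may not guarantee. Below is a minimal Python sketch using the standard zipfile module. The folder name my-book, the placeholder file contents and the function name build_epub are assumptions for illustration; it works in a temporary folder so nothing on your system is touched.

```python
import tempfile
import zipfile
from pathlib import Path


def build_epub(book_dir, epub_path):
    """Zip the contents of book_dir into epub_path, with mimetype first and stored."""
    book_dir = Path(book_dir)
    with zipfile.ZipFile(epub_path, "w") as archive:
        # First entry: mimetype, written without compression.
        archive.write(
            book_dir / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED
        )
        for path in sorted(book_dir.rglob("*")):
            relative = path.relative_to(book_dir).as_posix()
            if path.is_file() and relative != "mimetype":
                # Entry names are relative to book_dir, so META-INF and OEBPS
                # end up at the top level of the archive.
                archive.write(path, relative, compress_type=zipfile.ZIP_DEFLATED)
    return Path(epub_path)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical book folder with placeholder files.
    book = Path(tmp) / "my-book"
    (book / "META-INF").mkdir(parents=True)
    (book / "OEBPS").mkdir()
    (book / "mimetype").write_text("application/epub+zip", encoding="ascii")
    (book / "META-INF" / "container.xml").write_text("<container/>", encoding="utf-8")
    (book / "OEBPS" / "index.html").write_text("<html/>", encoding="utf-8")

    epub = build_epub(book, Path(tmp) / "my-book.epub")
    with zipfile.ZipFile(epub) as archive:
        infos = archive.infolist()
        names = archive.namelist()
        first_name = infos[0].filename
        first_is_stored = infos[0].compress_type == zipfile.ZIP_STORED

print(names)
print("first entry:", first_name, "| stored uncompressed:", first_is_stored)
```

Because the file is written directly with the .epub extension, there is no rename step afterwards.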


Epub Authority

The governing authority on ePub is idpf.org. There is a lot of cool technology that is available for ePub files. But, there is also a problem with the technology not being used fully by the different ePub readers. On our PCLinuxOS distribution, the best ebook readers I have found so far are the ebook viewer that comes with Calibre, and fbreader. I wouldn't waste my time with any of the others. Okular will open ePub files, but it won't display them properly, probably because it doesn't have the dynamic flow capability of the actual ebook readers. There are also a couple of freely available ePub reader add-ons for Firefox. One is called EPUBReader, and the other one is called Lucifox.


You Have Enough To Get Started

I gave you enough information to create well formed and properly working ePub files that you can roll with just a text editor and a compression app. You can learn a lot more by going to idpf.org and reading the documentation, if you want to learn about ePub file creation in greater detail. That is what I did, particularly after I was asked to write this article. I still go there to learn more and to use it as a reference. EPUB3 will have a lot of improvement over EPUB2. So, that is reason enough to visit idpf.org often.


Credit And Thanks

The same ePub file was used in all the examples, both in the images and some of the copy and paste code examples. I purchased that ePub from O'Reilly Media, and my use of it falls within O'Reilly Media's automatic permissions, which allow the use of part of their work and are found in nearly every one of their publications.

Although they don't require it, I am adding the acknowledgment information for the very small part of the ePub file I used for my examples.

Learning the vi and Vim Editors
Arnold Robbins, Elbert Hannah, Linda Lamb
Copyright © 2009 Arnold Robbins and Elbert Hannah
O'Reilly Media
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)
http://www.oreilly.com

Thank you O'Reilly Media for your contribution and friendly position towards the Open Source Community. Additionally, thanks for making your publications so easily accessible for life, and in so many formats. Also, a big thanks for making them DRM free!

Happy ePub rolling!



Copyright (c) 2013, The PCLinuxOS Magazine. All Rights Reserved.