Wednesday, August 24, 2011

Converting from DOC to MOBI in Calibre

Well, I just spent three hours converting Finding Fiona from a .doc to a .mobi with the software Calibre. I have a feeling it shouldn't have been as hard as it was, but here's a blog post for anyone who may have the same problems I did. You can read my entire, frustrating journey, or you can skip to the bottom, where I've bolded the steps of how I finally found a way that worked.

Calibre is free e-book management software, but it also has the great capability to convert books. It's taken me a while to really get used to it. I've converted books with it for my own reading, but when it came to actually sending other people professional-looking files, I needed to do a lot to make sure they looked exactly the way I wanted them to. I wanted to try out Mobipocket Creator, but then I found out it was a Windows-only program.

I wrote the book in Pages, a Mac program. I exported the story into .doc because Calibre can take a .doc file and zip it for conversion to .mobi, which is the file format I wanted to have available for book bloggers before I published the story on Smashwords.

At first, I opened the .doc in MS Word, saved it as a Web Page, and converted that in Calibre. The converted .mobi file lost all bold and italics. I soon found out that Calibre doesn't pick up < span>< /span> (without the spaces) tags. It needs < i>< /i> tags instead. This gave me a huge problem because my .html file, exported by MS Word, was littered with < span> tags. Every paragraph had at least two, but the ones that mattered most to me were the italics. Finding Fiona has a lot of journal entries, especially in the beginning, and without italics they just looked like the rest of the text.
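
For anyone facing the same pile of span tags, the swap to < i> and < b> can be scripted instead of done by hand. Below is a minimal Python sketch, offered as an illustration rather than the method used for this book. It assumes the export marks italics and bold with inline styles such as style="font-style:italic" and style="font-weight:bold"; real Word output varies, so check your own file first. The names SpanToInlineTags and spans_to_tags are made up for the example.

```python
from html.parser import HTMLParser


class SpanToInlineTags(HTMLParser):
    """Rewrite italic/bold <span> elements as <i>/<b> and drop other spans."""

    def __init__(self):
        # Keep entities such as &nbsp; untouched instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.span_stack = []  # for each open <span>, the tags emitted in its place

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            self.out.append(self.get_starttag_text())
            return
        # Assumption: formatting lives in an inline style attribute.
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        opened = []
        if "font-style:italic" in style:
            opened.append("i")
        if "font-weight:bold" in style:
            opened.append("b")
        self.out.extend(f"<{t}>" for t in opened)
        self.span_stack.append(opened)

    def handle_endtag(self, tag):
        if tag == "span" and self.span_stack:
            # Close whatever this span opened, innermost first.
            for t in reversed(self.span_stack.pop()):
                self.out.append(f"</{t}>")
        else:
            self.out.append(f"</{tag}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> pass through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def spans_to_tags(html):
    """Return html with styled spans replaced by <i>/<b> tags."""
    parser = SpanToInlineTags()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Spans that carry no italic or bold style are simply unwrapped, so their text stays in place. The sketch does not try to handle every oddity in Word's markup (conditional sections, for example), so compare the result against the original before converting.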

So then I played around with file formats other than .html. I tried .pdf and .rtf, but I kept getting strange paragraph breaks. The journal entries were supposed to have two paragraph breaks to separate them from the rest of the text, but it didn't translate.

I got my .doc file again and coded it into html myself. I added < i>< /i> tags around all the italics, < b>< /b> tags around the bold, and < br> at the beginning of every paragraph. I saved it in Dreamweaver, but you could also use Nvu, which is free. I tried Nvu first, but it kept freezing up on me. Maybe it's because I would copy the entire 43,000 words at a time. :P Any WYSIWYG html editor will work since it's very simple html. Just add the tags in your word processor, then copy the text into your html editor, and save it as an html file. You can add the tags in your html editor if you want, but I used the find/replace function in my word processor to find paragraph breaks and add < br>.

When I did the conversion, though, I still had strange paragraph break problems. If I removed the paragraph breaks and added indents, my journal entries ran with the rest of the text without breaks. But I couldn't figure out how to add indents to my html file. I was thinking of non-breaking spaces, but I wasn't sure if that would work.

I found this helpful link (http://manual.calibre-ebook.com/conversion.html) and skipped down to "Paragraph Spacing." Lo and behold, it answered my question! How do you remove spacing for only some paragraphs and also add indents to them? You need to add this into your Extra CSS box on the "Look & Feel" window in Calibre:

p, div { margin: 0pt; border: 0pt; text-indent: 1.5em }
.spacious { margin-bottom: 1em; text-indent: 0pt; }

That meant I first had to change each < br> to < p>. Then, in my html file, I changed the p class to "spacious" on the paragraphs that needed an extra space below them. (That just means that instead of writing < p> for a paragraph break, I wrote < p class="spacious">) I also added a closing < /p> at the end of the spacious paragraphs. In the Extra CSS, I changed the .spacious text-indent from "0pt" to "1.5em" so those paragraphs were indented like all the others but had an extra break separating them from the rest of the text. You can also make your scene dividers the spacious class. It worked beautifully!
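
If tagging every paragraph by hand sounds tedious, the same markup can be generated. Here is a minimal Python sketch under some loud assumptions: the story is plain text with one paragraph per line, a blank line after a paragraph means "this one needs extra space below it," and there are no inline tags yet (the text is escaped, so any existing < i> tags would be shown literally). The function name paragraphs_to_html and that blank-line convention are invented for the example.

```python
from html import escape


def paragraphs_to_html(text):
    """Wrap each non-blank line in <p>...</p>.

    Assumed convention: a paragraph followed by a blank line gets
    class="spacious" so the Extra CSS rule adds space below it.
    """
    lines = text.split("\n")
    html = []
    for i, line in enumerate(lines):
        if not line.strip():
            continue  # blank lines only act as markers
        next_is_blank = i + 1 < len(lines) and not lines[i + 1].strip()
        cls = ' class="spacious"' if next_is_blank else ""
        html.append(f"<p{cls}>{escape(line.strip(), quote=False)}</p>")
    return "\n".join(html)
```

The class name matches the .spacious rule in the Extra CSS, so the output only looks right once that CSS is in place in Calibre.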

. . .in Calibre. But NOT in my Kindle app or on my Kindle. The text still ran together. It ignored my "margin-bottom."

So, I started over. . .again. And I wished I'd done this from the beginning.

1. Copy/paste story into a new document in OpenOffice (you may be able to just open your .doc, but I figured that might bring over too much formatting)
2. Save/export as html
3. Load into Calibre as "new book"
4. Click "Convert books" and set the output format to MOBI
5. In the Look & Feel tab, check "Remove spacing between paragraphs," then click OK to run the conversion
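
Steps 3 to 5 can probably also be run without the Calibre window, using the ebook-convert command-line tool that ships with Calibre. The Python sketch below is an assumption-heavy illustration: it relies on ebook-convert being on your PATH and on the --remove-paragraph-spacing option, which, going by Calibre's documentation, should correspond to the Look & Feel checkbox (run ebook-convert --help to confirm on your version). The function names are made up for the example.

```python
import shutil
import subprocess


def build_convert_command(html_path, mobi_path):
    """Return the ebook-convert argument list for an HTML to MOBI conversion."""
    return [
        "ebook-convert",
        html_path,
        mobi_path,  # the output format is taken from the file extension
        "--remove-paragraph-spacing",  # assumed CLI counterpart of the checkbox
    ]


def convert(html_path, mobi_path):
    """Run the conversion, failing loudly if Calibre's CLI is missing."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed and on PATH?")
    subprocess.run(build_convert_command(html_path, mobi_path), check=True)
```

Calling convert("story.html", "story.mobi") (hypothetical file names) would then write the .mobi next to the html file. This skips the library import in step 3, so the book will not show up in Calibre itself.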

IT WORKS. I still have italics and I have double paragraph breaks separating journal entries and articles. I don't know how or why OpenOffice's html was better than the thousand other things I did, but THANK GOD it works.

If you can figure it out yourself using a WYSIWYG, more power to you. It would have been fine if I didn't need two line breaks in a row, but this is the story I had to write!

Sidenotes: You will need to add page breaks because OpenOffice doesn't do this. Either you can do this in the html file by adding this code (without spaces) before each chapter heading (or wherever you want to put the page break):

< DIV style="page-break-after:always">< /DIV>

Or, you can enable "Heuristic Processing" in Calibre and uncheck everything but "Detect and markup unformatted chapter and subheadings." I read somewhere that heuristic processing can convert non-breaking spaces to regular spaces, so that's why I unchecked everything. I added the non-breaking spaces for a reason!
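
For the first option, pasting the page-break code before every chapter by hand gets old quickly in a long book. Here is a minimal Python sketch of doing it in one pass. It assumes chapter headings are < h1> elements in the exported html (check your own file, since that is a guess about the export), and add_page_breaks is a name invented for the example.

```python
import re

PAGE_BREAK = '<div style="page-break-after:always"></div>'

# Assumption: every chapter heading is an <h1> element.
HEADING_RE = re.compile(r"<h1\b", re.IGNORECASE)


def add_page_breaks(html):
    """Insert the page-break div on its own line before each <h1>."""
    return HEADING_RE.sub(lambda m: PAGE_BREAK + "\n" + m.group(0), html)
```

Note that this also puts a break before the very first heading, which may leave an empty page at the start; delete that one div if it does.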

9 comments:

  1. This works! I decided to try converting straight from Word to html and then convert with Calibre. After my first attempt, I went back and removed Section Breaks. I still need to do a little fussing with the original, but it is readable as is.

    Thanks!

  2. Very useful information. Thank you so much for sharing it with us all.

  3. Brilliant. Thanks for this very useful information. Works like a charm.

  4. Thanks for posting that. I thought I could import the .doc, but nope!

  5. Thanks for the information. It saved me hours of investigation. As mentioned in an earlier comment, I saved from Word to htm. I know now to keep my documents simple if I plan on making .mobi files from them.

  6. In a .docx, save as Web Page, Filtered (htm, html). This is the best way and it works.
    I changed this from the instructions below, which just said to save as html.
    Short Answer
    1. In your word document: (docx)
    set doc title & chapter titles to "Heading 1" & save as Web Page, Filtered*.html
    2. In Calibre:
    Calibre>Convert books>Look & Feel>Remove spacing between paragraphs
    Calibre>Convert books>Table of Contents>Level 1 TOC (XPath expression): //h:h1
    http://pdxnat.wordpress.com/2011/10/31/how-to-make-an-epub-mobi-file/

  7. I usually just convert doc to docx or rtf, import in Calibre and convert to mobi. Just that, no extra steps needed.
