Every once in a while I sit down to collect my thoughts and views about Bookalope, book design, and publishing into a blog post. Please note that these are all solely my personal opinions, and any resemblance to yours is purely coincidental! If you’re curious for more, please also visit Bookalope’s Medium page and enjoy…

Podcast with We Are Futureproofs

Thank you John Pettigrew of We Are Futureproofs for an interesting and jolly interview. I enjoyed his questions and our conversation. This was my first public interview related to Bookalope! John has made the podcast available on his website, so please visit and peruse his other interviews. (February 2019)

Book Design and Hot Chocolate

This is a story of automation, and the story of how Bookalope came about. It was first published at Digital Book World in 2017 but was removed after DBW was taken over by Score Publishing. The original blog post has been republished here with permission, and a somewhat abridged version was later published by Booknet Canada. (Sometime 2017)

I enjoy designing books, because typography and beautiful books are dear to my heart. When I receive a new manuscript to turn into a print book or ebook, I like to spend my time working with its visual design rather than laboriously cleaning up text and structuring content. I want to enjoy the typographical playground rather than fixing issues inherited from a history of editing the manuscript.

I want to click a button and enjoy my hot chocolate, while I watch my computer do the work. However, building such an automated workflow that is both comprehensive and intelligent enough is an intense journey, and I’m not done with it yet.

Several years ago I created my first ebook by hand.

At first I tried multiple web services and tools, but their results were not satisfying to me: the mess of the original manuscript was simply transferred to the ebook, there were rendering issues on different devices and apps, tools seldom checked for issues in the text, and in the end the ebooks often failed to validate against official standards. So I opened a text editor and went to work myself.

One by one, I arduously transcribed the paragraphs from the original Word manuscript, and with tedious handiwork I cleaned up their formatting issues and typos. Sometimes, the text styles had me guessing the book’s structure, but leafing back and forth through the manuscript helped me to understand what the author may have intended the book’s structure to be. From that manual transcription, it was then easy to build the EPUB scaffolding and, lo and behold, there was my first ebook! It took advantage of the e-reader’s default styles, it was clean and simple, it rendered well on whatever device I tried, and, most importantly, it validated perfectly.

In that same manner I worked through my second manuscript, but already I began to feel bored by the repetitiveness of the process. I felt like I wasted my time doing work that my computer could do so much better, and I felt robbed of the joy of actually being creative and designing the book itself.

These are the moments when being a software engineer comes in tremendously handy, and so I set out to write myself a small program that would construct the EPUB scaffolding from the HTML file that I produced so laboriously. Soon, I switched to using XML because HTML wasn’t expressive enough for my needs, it was too messy (hence the common term “HTML soup”), and industry-strength tools worked much better with XML.
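To give an idea of what such a small program might look like, here is a minimal sketch of the outer EPUB container. It is not the script I wrote back then. The function name build_epub, the OEBPS/ folder, and the content.opf file name are assumptions made for illustration, and the package document is assumed to have been prepared elsewhere. The sketch only shows the packaging step: the mimetype entry goes first and uncompressed, followed by the container file that points to the package document.

```python
import io
import zipfile

# Points the reading system at the package document. The path
# OEBPS/content.opf is a common convention, assumed here for illustration.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(package_opf, files):
    """Return the bytes of an EPUB container (hypothetical helper).

    package_opf: the package document as a string, assumed to be valid.
    files: dict mapping file names inside OEBPS/ to their content.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as epub:
        # The mimetype entry must come first and must not be compressed.
        epub.writestr("mimetype", "application/epub+zip",
                      compress_type=zipfile.ZIP_STORED)
        epub.writestr("META-INF/container.xml", CONTAINER_XML,
                      compress_type=zipfile.ZIP_DEFLATED)
        epub.writestr("OEBPS/content.opf", package_opf,
                      compress_type=zipfile.ZIP_DEFLATED)
        for name, content in files.items():
            epub.writestr("OEBPS/" + name, content,
                          compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()
```

Writing the result to a file ending in .epub is only the first step. Whether the ebook validates still depends on the package document and the content files passed in.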

I soon noticed that little mistakes snuck into my manual transcriptions. So I expanded my XML digestion with error checks to find spelling and punctuation problems and typographical nits, and to either fix them automatically or at least warn me about them.
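Checks of this kind can be sketched with a handful of regular expressions. The rules and names below (CHECKS, check_paragraph, fix_spacing) are made up for illustration and are far simpler than anything a real tool would need. The point is only the split between problems that can be fixed automatically and problems that merely deserve a warning.

```python
import re

# Hypothetical rule set: (rule name, pattern). Illustrative, not exhaustive.
CHECKS = [
    ("double-space", re.compile(r"(?<=\S) {2,}(?=\S)")),
    ("space-before-punctuation", re.compile(r"\s+[,.;:!?]")),
    ("straight-quote", re.compile(r"[\"']")),
    ("repeated-word", re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)),
]


def check_paragraph(text):
    """Return (rule name, offset, matched text) warnings, sorted by offset."""
    warnings = []
    for name, pattern in CHECKS:
        for match in pattern.finditer(text):
            warnings.append((name, match.start(), match.group(0)))
    return sorted(warnings, key=lambda warning: warning[1])


def fix_spacing(text):
    """Apply only the fixes that seem safe to automate."""
    text = re.sub(r"(?<=\S) {2,}(?=\S)", " ", text)  # collapse runs of spaces
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)     # no space before punctuation
    return text
```

Spacing problems are fixed silently, while a repeated word or a straight quote is only reported, because deciding what the author meant there is better left to a human.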

Ebooks weren’t enough, though. Word and its clones don’t produce well designed print books, and often I looked at the original manuscripts wishing that they would look less ill-designed, less amateurish. When I then stumbled upon a tool that allows me to apply CSS styles to my XML files, I excitedly began to design and create print-ready PDF files that looked beautiful and professional.

Because both ebook and print book now came from the same source, I could edit the original manuscript at will and then generate the final books for e and print automatically.

I realized the sensibility and importance of separating a manuscript’s content and semantic structure from its visual design, how content and content presentation are really two very different and mostly independent things. (See my blog below, Of Carts and Horses.)

However, with all the automatic creation of print and ebooks from XML, I still found myself spending long hours insipidly transcribing more or less ill-designed text manuscripts to XML, stripping them of their visual mess, and cleaning them up manually.

Once again, I rolled up my software engineering sleeves and set out to make my life easier. This time around though, the task was not at all easy: I can automatically extract the content from a manuscript, but can I automatically structure that content semantically based on the manuscript’s visual styling? As it turns out, this is a non-trivial problem that has kept me busy for the better part of two years now.

Think about it: an author writes and structures a book using chapters and sections, emphasizes text portions, references other material in the book or externally, elaborates on the text using footnotes and endnotes, and so forth. All these different elements are styled visually to set them apart from one another, and to guide the reader through the book without becoming distracted.

I wanted to automate the reversal of this creative design process: derive the author’s intended semantic structure of the manuscript from how she styled the text elements!

And so I have been busy for about two years now designing and implementing an intelligent structure and content classifier using AI/ML techniques (oh so buzzwordy these days: AI or artificial intelligence, ML or machine learning), combined with other heuristics that flow into the classification. As it turns out, this is not only an engineering challenge, but also an interesting academic research topic.
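To make the idea concrete, here is a deliberately naive sketch of deriving structure from styling. It is a toy heuristic, not the classifier described above and not machine learning at all. The Paragraph class, the thresholds, and the label names are assumptions chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Paragraph:
    """A paragraph together with the little styling this sketch looks at."""
    text: str
    font_size: float
    bold: bool = False


def classify(paragraphs):
    """Label each paragraph 'chapter-title', 'section-title', or 'body'."""
    if not paragraphs:
        return []
    # Assumption: the most common font size is the size of the body text.
    body_size = Counter(p.font_size for p in paragraphs).most_common(1)[0][0]
    labels = []
    for p in paragraphs:
        # Titles tend to be short and rarely end like a sentence.
        short = len(p.text.split()) <= 12
        sentence_like = p.text.rstrip().endswith((".", "!", "?", ":"))
        title_like = short and not sentence_like
        if title_like and p.font_size >= 1.5 * body_size:
            labels.append("chapter-title")
        elif title_like and (p.bold or p.font_size > body_size):
            labels.append("section-title")
        else:
            labels.append("body")
    return labels
```

A rule set like this breaks down quickly on real manuscripts, where authors style the same kind of element in many different ways. That is exactly why the problem calls for something smarter than a few fixed thresholds.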

Since I wrote my first small script several years ago, my tools have gotten much better. I am getting where I wanted to be: click a button and slurp my hot chocolate while I watch my laptop do the tedious work. Still, because no algorithmic solution will ever be perfect, my process is interactive in that I want to confirm or adjust the guesses of my software, at least occasionally. But that’s just another mouse click, and I don’t have to put down my cup.

Finally, when I’m done with my hot chocolate, I can indulge in designing my book, contented in knowing that the rest (content extraction, semantic structuring, and text cleanup of the final ebook and print book) has all been taken care of. And that’s what I really wanted all along…

Of Carts and Horses

The traditional process of making books is tailored to the static medium of print. However, in a world of portable reading devices of widely varying sizes and capabilities, print books rarely adapt well. Converting from static print-designs to a format for customizable electronic presentations is error-prone, cumbersome, and time-consuming, and therefore expensive. That is where Bookalope helps. (July 2015) 

The way we share written stories changed with the advent of computers.

For much of human history, a written story was—sometimes quite literally—set in stone. We read and listen to stories, change and retell them, we translate them from one language to another, imagine new ones. In Europe, medieval scribes began to turn story writing into an art, and since the invention of movable type in the fifteenth century (take a look at Alix Christie’s novel Gutenberg’s Apprentice), we have evolved the art of book design and typography into a perfected craft: typefaces, illustrations, inks, paper, binding. We create books, and we make them everyday items to behold, amuse, entertain, and educate (Amaranth Borsuk’s The Book offers a modernist overview).

Books can be beautiful if they are written, designed, and printed by talented artists. Richard Hendel’s insightful books, On Book Design and Aspects of Contemporary Book Design, illustrate how artists and skilled professionals create beautiful printed books. But printed books are static in their beauty, and their design is often constrained by budget and process. Within these constraints, the designer creates the page layout of typefaces and spaces and illustrations. Over centuries the publishing industry has developed and streamlined this process to accommodate the printed medium.

Computers, however, are a different beast altogether, and the design of books for a digital presentation must be reconsidered from the ground up. The traditional static print book design is being made obsolete by willful and dynamic user customization. The reader herself takes over design aspects of the written story, where she is now free to choose the dimension of her proverbial page (a wide landscape monitor or a small phone screen) as well as the typeface, its size, line width and spacing, margins, and so forth. In contrast to print books, the design of electronic books must adapt to the reader’s choices instead of dictating them.

Such freedom stirs a colorful mess, and I love it.

What astonishes me, though, is how slowly we embrace this new freedom. Instead of designing books that are no longer set in stone, we try to fit the stone (or the printed book) onto whatever medium and device the reader chooses. And that doesn’t go too well. We spend time and effort on turning a story into a print book, and then we try to turn that print book into an electronic presentation which, hopefully, mimics the print while being as readable and legible on our electronic reading devices—no matter how tiny—as the book is in print.

Figure 1 The traditional publishing workflow.

The modern process of manufacturing a book yields a PDF file, the de facto file format for print today. Simply put, PDF is a paged presentation file format (for print) which describes the graphical layout of the book exactly: it contains the page dimensions and instructions on what to draw where on a page. These instructions render the glyphs that eventually make up the story on the page, but not necessarily in textual order. PDF has no inherent notion of the story’s text or the text’s structure; it doesn’t know of headers or footnotes or elements like emphasized text or a poem. This makes it difficult to extract text, and simply copying and pasting from a PDF can go awry. It is difficult to discriminate important text content and to dismiss inessentials like folios and headers (design relics that are not part of the story itself); it is difficult to extract text structure from styled content (e.g. “Is this line of bold large text a chapter title?”); and it is difficult to undo text changes that were applied by the design (e.g. “Is this hyphen at the end of the line intended, or is it just an ordinary line break?”). Using a PDF file (or similar paged presentation formats like InDesign or Word) to generate an ebook is cumbersome and can be prone to errors. It is thus time-consuming and expensive. However, rethinking the process itself leads to a simpler, more efficient, and stable approach!
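The hyphen question illustrates how much guesswork is involved. The sketch below joins lines of extracted text and drops a trailing hyphen only if the rejoined word appears in a word list. The function name join_lines and the vocabulary argument are assumptions for illustration, and a dictionary lookup is just one possible heuristic that will guess wrong for names and rare words.

```python
import re


def join_lines(lines, vocabulary):
    """Join extracted lines into one paragraph, undoing line-break hyphens.

    vocabulary: a set of lowercase words assumed to be known good words.
    """
    out = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tail = re.search(r"(\w+)-$", out)  # text so far ends in "word-"
        head = re.match(r"\w+", line)      # next line starts with a word
        if tail and head:
            joined = tail.group(1) + head.group(0)
            if joined.lower() in vocabulary:
                # Hyphen was introduced by the line break: drop it.
                out = out[:-1] + line
            else:
                # Probably an intended hyphen, as in "well-known": keep it.
                out = out + line
        elif out:
            out = out + " " + line
        else:
            out = line
    return out
```

For example, with a vocabulary containing only "cumbersome", the lines "This is a cumber-", "some and well-", and "known problem." are joined into "This is a cumbersome and well-known problem."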

Bear with me for one more moment…

Today’s publishing industry largely looks at an ebook as an afterthought, as a continuation of the printed book. It’s a bit like putting the cart before the horse: the book’s story originates as a designed manuscript in a computer, and via the static print design (with all of its set-in-stone constraints) returns to it as an ebook. It’s a bit like glueing the pages of a book about graffiti onto a long wall, instead of using brush and spray paint to create a proper graffiti story.

This is where Bookalope is different.

It’s time to embrace the freedom that electronic devices offer instead of confining contemporary book design to a process that has hardened over decades and centuries. It’s time to make the electronic presentation of a story a first-class citizen equal to the printed book, and not its addendum. We have to rethink the process of book design for multiple media; we must design our stories for the various media simultaneously and equal to one another! However, this will work well for some stories and less so for others, because some stories want to be told with wide and sweeping images that cannot be confined to the small screen of a hand-held device. But, again, the reader chooses the medium and its properties for her story, and the book designer ought to comply.

Figure 2 The Bookalope workflow.

Bookalope strictly separates the story’s content and structure from its visual presentation. Similar to the traditional publishing process, a book is designed based on the story and the publisher’s constraints, as well as the target medium. Bookalope provides various design templates, each tailored specifically for print book and ebook. Once a book has been built for a target medium, it is not the stepping-stone for yet another design for a different medium. By working from a single structured text source, Bookalope provides a highly efficient, consistent, and stable process without cumbersome and repetitive work that alternates between the story’s structure, content, and design.

When I read Richard Hendel’s essay “The Conundrum of the E-Book” (in the aforementioned Aspects of Contemporary Book Design), I felt encouraged that Bookalope is doing the right thing, that it provides the right tools for this different way of building books. Bookalope strictly separates the content and structure of the story from its visual presentation; content and structure are edited and maintained throughout the entire process, independently of how the story will eventually be designed for a target medium.

Here is how it works. Bookalope provides intelligent and interactive tools that analyze the styling of a (finished) manuscript to help extract the story’s content and semantic structure from that manuscript. During the extraction, Bookalope cleans up the content and flags stylistic, textual, and typographical issues. Only when content and structure are smoothed and polished does Bookalope build books of various formats; it builds them depending on the target medium using styling specifically designed to accommodate each medium. The PDF print file is now equal to the electronic book: it is yet another output for Bookalope instead of being an intermediate step and input for the ebook addendum.

Print or e, large or small, whatever matters most to the reader—Bookalope puts the cart behind the horse where it belongs. It helps you design your book for any medium by providing an efficient process, without wasting time and resources on shuffling carts and horses around. Bookalope helps you build print books for print, ebooks for any device, and other formats for your custom workflow. It helps you design beautiful books. Because stories want to be told to amuse, entertain, and educate, no matter what the medium.

Welcome to Bookalope!