Jens Nöckel's Homepage


Replacing Microsoft Word

Why you should not read this

I don't really have much use for a conventional word processor, because I create documents mainly using XEmacs, LaTeX and Mathematica (these links have some background information on the creators of TeX and Mathematica), and LyX. But occasionally people send me documents in Microsoft Word format, and by now only one computer left in my group actually has Word (or any Office program) installed. So the problem is: how to make use of Word .doc files when they come your way only occasionally.

If you're interested specifically in how to convert from Word to LaTeX, have a look at the Word → LaTeX page.

The following alternatives to Word on Mac OS X are discussed on this page: Pages, TextEdit and the command-line tool textutil.

Of course, one should also ask: if we want to get rid of MS Word, what do we replace it with? There is no simple answer, but in a science-and-engineering environment one possible answer would be: any of a variety of editors that support LaTeX. If you want to find out more about that, there's a separate page dedicated to converting from MS Word to LaTeX format. On this page, the original motivation is the somewhat more general task of converting from Word to something else, as long as it's not Word. As a by-product, I also raise the question of how to convert from HTML to something else. The discussion is specific to Mac OS X, and therefore programs like Pages and TextEdit play a role.

What is Pages?

Shouldn't this read "what are pages?" No, I'm referring to a word processor and layout program named Pages, which is sold by Apple as part of the iWork bundle. This is not an in-depth review, more an aside to my remarks on Keynote and on LaTeX. If you have to collaborate with MS Word users, Pages may be a good solution because it supports not only the Word file format but also the important collaboration feature of change tracking. On UNIX platforms such as Mac OS X, change tracking for text files can be achieved with the ci, co -l and rcsdiff commands of the Revision Control System (RCS) (see e.g. man rcsintro), but of course that solution can't be used when collaborating with Word users.
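To make the RCS route a little more concrete, here is a minimal Python sketch of one edit cycle built from the commands named above: check in, check out locked for editing, then compare. The helper names rcs_edit_cycle and run_if_available are hypothetical, and the commands are only executed if the RCS tools are found on the PATH. The flags used (-m for the log message, -t- for the file description that ci asks for on the first check-in) are standard ci options, but it is worth checking man ci on your own system.

```python
import shutil
import subprocess


def rcs_edit_cycle(filename: str, message: str) -> list[list[str]]:
    """Command lines for one RCS edit cycle on a plain-text file (a sketch)."""
    return [
        # Check in the current contents as a new revision. -m sets the log
        # message; -t- sets the file description (only used on the first
        # check-in). Without -l or -u, ci removes the working file afterwards.
        ["ci", f"-m{message}", f"-t-{message}", filename],
        # Check the file out again, locked, so that it can be edited.
        ["co", "-l", filename],
        # (Edit the file here.)
        # Show what changed relative to the last checked-in revision.
        # After reviewing the changes, run ci again to record them.
        ["rcsdiff", filename],
    ]


def run_if_available(commands: list[list[str]]) -> None:
    """Run the commands only if every RCS tool they need is on the PATH."""
    if all(shutil.which(cmd[0]) for cmd in commands):
        for cmd in commands:
            # rcsdiff exits with status 1 when it finds differences,
            # so a non-zero exit status is not treated as fatal here.
            subprocess.run(cmd, check=False)
```

For example, run_if_available(rcs_edit_cycle("notes.txt", "first draft")) would put notes.txt under RCS control and leave a locked, editable copy in place.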

Pages as an editor: good because of Services

Before describing ways of importing "foreign" formats using Pages, I'll try to give it some credit based on what it can do on its own. Ease of use is a big plus with Pages, and for short documents that are meant to be printed or emailed quickly, this program is a good choice. You can do similar small tasks with Word, but Pages has the big advantage of being well integrated with Mac OS X.

Unlike Word, Pages supports Services such as the ones provided by LaTeXiT, the math equation editor I also discuss in conjunction with Keynote and Adobe Illustrator. Take the following example of a text with a formula. To insert the equation, I create a Text Object Box (in the toolbar, Objects → Text) and type the LaTeX code into it:

Then I select the menu item Pages → Services → LaTeXiT → Typeset math inline to get a formula with the same character height as the text font in which I typed. The result looks like this:

Now the equation will automatically flow with the text paragraphs if I decide to, e.g., insert or delete material above the formula. If I don't like the size of this formula, I can drag it to scale it as large as I like. This works because the inset equation is actually a PDF vector graphic. The ability of math insets to flow with the text is very important, and Pages combines this with the full power of LaTeX by using the LaTeXiT Service.

The example equation above was displayed on a separate line. What about inline equations? Although this works in principle, the baseline of the formula may not appear correctly aligned with the surrounding text line. If this happens, there are two things you can try:

  1. In the Preferences of the LaTeXiT application, go to the Services tab and check the box "Align the equation in the original text", see below:

    With this, I get the following appearance in Pages:
  2. Manually fine-tune the alignment by highlighting the equation and then opening the Inspector to display the tab shown in the screenshot on the right. At the bottom you'll see the baseline offset box.

With the help of Services such as the one illustrated here, an editor (or any other application) can grow almost without limits in terms of capabilities and ease of use.

Another possible area where Pages could benefit from Services would be bibliography handling. There is currently no support for that in Pages, and I'm not aware of a Service. However, there is an AppleScript that looks promising, created by Jim Harrison. It works together with BibDesk, a BibTeX-based citation manager that no LaTeX user on a Mac should be without (as I already mentioned on a page from several years ago).

Writing a Service is made easier by ThisService, a helper application that can be used with the programming language of your choice.
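As an illustration of what such a Service script might look like, here is a small Python filter that escapes LaTeX special characters in the selected text. It assumes that ThisService hands the selection to the script on standard input and replaces the selection with whatever the script writes to standard output; that convention, and the names escape_latex and main, are assumptions for this sketch. To be used as a Service, the script would end with a call to main().

```python
import sys

# Characters with a special meaning in LaTeX body text, and their escaped forms.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape LaTeX special characters one character at a time.

    Working character by character means the backslashes and braces
    inserted by a replacement are never escaped a second time.
    """
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


def main() -> None:
    # Assumed filter convention: selection in on stdin,
    # replacement text out on stdout.
    sys.stdout.write(escape_latex(sys.stdin.read()))
```

Selecting "50% & $5" in a document and running the Service would then replace it with text that LaTeX typesets literally.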

What if I don't have a TeX/LaTeX installation?

Mac OS X (Tiger) comes with a great little utility called Grapher, which lets you type arbitrary mathematical expressions. There's a symbol palette, but the editor also understands many LaTeX-like shortcuts (e.g., "alpha" etc. for Greek characters, and symbols like "infinity"), so that you can build fairly complex equations in a natural way and with immediate visual feedback.

Here's a screenshot:



Once you have an equation prepared like this, there are three things you can do (thanks to Derrick Johnson for pointing this out as a tool in conjunction with Pages):

  1. Make a graph (obviously, since that's the main purpose of the program)
  2. Right-click on the equation and select Copy as LaTeX expression. This is useful for importing LaTeX snippets into another LaTeX source file.
  3. Right-click on the equation (or highlight it and go to the Edit menu) and select Copy as.... Here you can choose, among others, a PDF version of the formula, which can then be pasted into other applications such as Pages. You can achieve the same thing by dragging the formula.

Note that you can use Grapher to write mathematical equations even if they are in no way ready to be plotted. The editor built into the program is basically an all-purpose equation editor!

Some alternatives to Pages

There are other programs that do a good job of replacing Word, at least as converters for .doc files.

Some additional comments about Pages

I find Pages useful for reading Word documents. However, there are some reasons why I'll have to discuss alternative programs below:

Pages is not a general purpose editor or file format wizard. Its import and export capabilities are limited, not unlike those of MS Word. An important format to discuss is HTML. Pages 3 can't import or export HTML files, but does open and export RTF(d) format. This file format allows you to combine Pages with TextEdit. And TextEdit does do a reasonable job at importing and exporting HTML, as I'll describe below.

I don't have anything to say about exporting to Word format (Pages has that option), because I never do that. A good review with more in-depth information is found at obviousdiversion.com/.

Remember: if you send email to people outside your own immediate work environment, use universally accepted formats like plain text, RTF or PDF (listed in order of increasing complexity; I left out HTML because it runs into problems with spam filters). Pages is a great tool for this, but Microsoft Word can of course do it, too! Now you may be saying, "yes, Word can do it, but it's a hassle to export in those formats." Common sense will then tell you what a hassle it must be for people who have to import your Word document!

How to import HTML into Pages via TextEdit

TextEdit probably deserves more recognition than I have given it up to now. Here is my recipe for getting HTML documents into Pages:
  1. Open the HTML file in TextEdit
  2. Save it from TextEdit in RTFD format
  3. Open the resulting rtfd file in Pages.

The TextEdit program does a decent job of reading in HTML layouts as well as images. Because of its image handling, this route for reading HTML documents is better than what you get by using older versions of Pages directly, and also better than what textutil does (see below).

Given the above state of affairs, one has to wonder why Apple didn't implement my three-step procedure as a menu item in Pages '08. After all, I've had this prescription on this page since August 2006. Wouldn't it be nice if TextEdit became more integrated with Pages? I suspect Apple don't read my web site. Well, maybe they read the warning at the top.

If you're having trouble with Word documents in Pages, definitely try TextEdit, too! It's free and small, but surprisingly versatile. This has recently been discovered by Macworld, too.

How to export HTML from TextEdit

The export options of TextEdit include webarchive and, if the file contains formatted text and nothing else, also plain HTML. The webarchive format is used when several files (especially images) have to be bundled together with the web page. Such an archive is easily expanded back into an HTML file plus the requisite files by typing textutil -convert html filename.webarchive in the Terminal (where filename is the name of your archive). More on textutil below. The command line can also be avoided by using WebArchive Extractor, a little application created precisely for this purpose.

The result has correct formatting but may not show all the images that were present in the original TextEdit file, if those images were not in an HTML-compatible format. For example, PDF or TIFF files can't be displayed as images by a web browser. Additional post-processing is needed to make these images show up on the exported page.
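A first post-processing step is simply finding out which images are affected. The following Python sketch uses the standard library's html.parser to list the src attributes of img tags that point to PDF or TIFF files. It assumes the exported page references such images by file name; the helper names are hypothetical, and converting the listed files (e.g. to PNG) and updating the references is left to you.

```python
from html.parser import HTMLParser

# Image types that web browsers generally won't display inline.
NON_WEB_EXTENSIONS = (".pdf", ".tif", ".tiff")


class ImageSourceCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG SRC=...>
        # is handled too; self-closing <img .../> also ends up here.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def images_needing_conversion(html: str) -> list[str]:
    """Return the image references a browser is unlikely to display."""
    collector = ImageSourceCollector()
    collector.feed(html)
    return [src for src in collector.sources
            if src.lower().endswith(NON_WEB_EXTENSIONS)]
```

Reading the exported HTML file into a string and passing it to images_needing_conversion gives a checklist of the images that still need attention.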

textutil

With Mac OS X 10.4, the trend toward blurring the boundaries between applications and the operating system itself seems to continue. The OS is like a giant repository of functionality, not just the bookkeeper that watches over your files and processes.

For the problem of Word document conversion, Tiger offers a new command called textutil. It is not to be confused with the GNU textutils. textutil is a command-line utility that can convert between a variety of formats, including Word .doc, WordML, HTML and RTF. See my notes on conversion to LaTeX for specific applications of textutil.
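For occasional use the command can simply be typed in the Terminal, e.g. textutil -convert html paper.doc. If you want to call it from a script instead, a minimal Python sketch might look like the one below. It builds the argument list with an explicit -output file name and runs it only when textutil is available. The function names and the restriction to four output formats are assumptions made for this illustration, not part of textutil; see man textutil for the full list of formats and options.

```python
import shutil
import subprocess
from pathlib import Path

# Output formats this sketch accepts (a subset of what textutil supports).
SUPPORTED_FORMATS = ("txt", "rtf", "rtfd", "html")


def textutil_command(src: str, fmt: str) -> list[str]:
    """Build a textutil command converting `src` (e.g. a Word .doc) to `fmt`."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {fmt!r}")
    # Write the output next to the input, with the extension swapped.
    output = str(Path(src).with_suffix("." + fmt))
    return ["textutil", "-convert", fmt, "-output", output, src]


def convert(src: str, fmt: str) -> None:
    """Run the conversion; textutil only exists on Mac OS X 10.4 and later."""
    if shutil.which("textutil") is None:
        raise RuntimeError("textutil not found on PATH")
    subprocess.run(textutil_command(src, fmt), check=True)
```

For example, convert("paper.doc", "html") would write paper.html next to the original Word file.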


noeckel@uoregon.edu
Last modified: Sat Aug 10 18:37:18 PDT 2013