Jens Nöckel's Homepage


Converting from Word to LaTeX on Macs

This page is for LaTeX users who face one of three scenarios:

I'm going to collect information that I believe to be workable, without having done a huge amount of testing. That's simply because I don't have MS Office on this Mac computer.

Probably the biggest challenge is the conversion of formula objects into LaTeX. I'll return to this at the end. Conversion is a game that can be played at different levels of sophistication, and I'm looking for the simplest and cheapest routes here.

Different routes to get a LaTeX file

This is not a complete list of possible routes. More alternatives can be found at TUG. I'm only listing the things that I think are really worth trying.

Assume you have a Word file text.doc. Here I'll list some ways of dealing with this file, ranked in order of quality:
Word → OpenOffice ODT → LaTeX
This produces the best LaTeX, to my knowledge.

The first ingredient is the platform-independent Java program w2l, which is part of Writer2LaTeX. It is now included with NeoOffice/OpenOffice. If you want to help test development versions of this software, download the latest version and navigate to the subfolder doc in the distribution. There you'll find user-manual.html, which explains how to put the executables in your PATH (see the installation instructions for "UNIX and friends").

As a second ingredient, you need to have NeoOffice or OpenOffice installed, because the Word file text.doc has to be loaded into NeoOffice/OpenOffice first. Obviously, this step can be skipped if you've been writing the document in NeoOffice/OpenOffice all along. Now choose Export → latex2e, and you're done.

Here is the (now outdated) command-line approach which I used before w2l was part of OpenOffice:
First save the document from NeoOffice/OpenOffice as text.odt. Once this is done, the rest is simple:
w2l text.odt text.tex
pdflatex text
pdflatex text

That's basically all there is to it. Additional options make it possible to customize the kind of PDF that is produced in the last step. If you have a literature database in OpenOffice, it can also be converted to a .bib file using the command w2l -bibtex, which is also part of Writer2LaTeX.
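The command-line steps above can be wrapped in a small shell function so the whole route runs with one call. This is a minimal sketch, assuming w2l and pdflatex are on your PATH and the document has already been saved as text.odt; the function name odt_to_pdf and the DRY_RUN switch (which only prints the commands) are hypothetical names chosen for illustration, not part of Writer2LaTeX.

```bash
#!/usr/bin/env bash
# Sketch of the ODT -> LaTeX -> PDF route (assumes w2l and pdflatex are installed).
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# odt_to_pdf BASENAME (hypothetical helper): BASENAME.odt -> BASENAME.tex -> BASENAME.pdf
odt_to_pdf() {
  local base=$1
  run w2l "$base.odt" "$base.tex" || return 1
  # The second pdflatex pass picks up cross-references written to the .aux file.
  run pdflatex "$base" || return 1
  run pdflatex "$base"
}
```

After sourcing the file, `odt_to_pdf text` performs the three steps, and `DRY_RUN=1 odt_to_pdf text` shows what would be run without touching any files.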

This method (exporting from NeoOffice) doesn't convert graphics, and doesn't work for displayed equations that are represented as images in OpenOffice. However, it does work for equations that have been entered within OpenOffice as Formula Objects; w2l converts these to true LaTeX. If most of your documents contain math, there's a style file you should use to make the LaTeX output more compact (otherwise each w2l output file contains the corresponding definitions in its preamble):

  1. Copy ooomath.sty from the Writer2LaTeX distribution folder to your local texmf tree. With fink's teTeX, for example, this would be /sw/share/texmf-local/tex/latex/ (or a subdirectory thereof). I'm assuming that you don't have a texmf tree in your home directory and want to install the style file for system-wide use.
  2. Run sudo texhash.
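The two installation steps can be scripted as follows. This is a sketch under stated assumptions: the Writer2LaTeX folder is passed in as an argument (its location on your disk is up to you), the target defaults to a subdirectory of fink's texmf-local tree named ooomath (a hypothetical choice), and install_ooomath is a hypothetical helper name. Afterwards, `kpsewhich ooomath.sty` should print the installed path; on a TeX installation other than fink's, `kpsewhich -var-value TEXMFLOCAL` reports which local tree to use instead.

```bash
#!/usr/bin/env bash
# Sketch: install ooomath.sty system-wide, then rebuild the ls-R file database.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# install_ooomath W2L_DIR [TARGET_DIR]  (hypothetical helper)
# W2L_DIR: folder of the unpacked Writer2LaTeX distribution.
# TARGET_DIR: defaults to an assumed subdirectory of fink's texmf-local tree.
install_ooomath() {
  local w2l_dir=$1
  local target=${2:-/sw/share/texmf-local/tex/latex/ooomath}
  if [ "${DRY_RUN:-0}" != 1 ] && [ ! -f "$w2l_dir/ooomath.sty" ]; then
    echo "ooomath.sty not found in $w2l_dir" >&2
    return 1
  fi
  run sudo mkdir -p "$target" || return 1
  run sudo cp "$w2l_dir/ooomath.sty" "$target/" || return 1
  run sudo texhash
}
```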

With these capabilities, it is feasible to make a lossless round-trip from OpenOffice to LaTeX and back again. The back-translation to OpenOffice can be done using tex4ht. For more information, check out the links on my LaTeX page.
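A round trip might look like the sketch below. It assumes tex4ht's mk4ht driver with its oolatex mode, which writes an OpenOffice document named after the input file; the exact invocation is an assumption to check against the tex4ht documentation. The helper name roundtrip and the -rt suffix are hypothetical. The copy step exists so that tex4ht's output does not overwrite the original text.odt.

```bash
#!/usr/bin/env bash
# Sketch of an ODT -> LaTeX -> ODT round trip (assumes w2l and tex4ht's mk4ht).
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# roundtrip BASENAME (hypothetical helper)
roundtrip() {
  local base=$1
  run w2l "$base.odt" "$base.tex" || return 1
  # tex4ht names its output after the input file, so work on a renamed copy.
  run cp "$base.tex" "$base-rt.tex" || return 1
  run mk4ht oolatex "$base-rt"
}
```

Comparing text.odt with the resulting text-rt.odt is then a quick way to see what survived the trip.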

Word → RTF → LaTeX
Use the rtf2latex2e tool. It can be installed via fink or i-Installer. If you decide to build the tool yourself, make sure you set the environment variable RTF2LATEX2E_DIR properly.
textutil -convert rtf text.doc
rtf2latex2e text.rtf
pdflatex text
This procedure doesn't handle math and figures, but it preserves the formatting of the text quite faithfully. It ranks below w2l because rtf2latex2e can't handle formulas created in OpenOffice. Also, be prepared to edit the preamble of the output file by hand.
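The three RTF commands can be chained in one function that stops at the first failure. A minimal sketch, assuming macOS's textutil, rtf2latex2e (with RTF2LATEX2E_DIR set if you built it yourself) and pdflatex; doc_via_rtf is a hypothetical name. The hand-editing of the preamble still has to happen between the second and third step if the first pdflatex run fails.

```bash
#!/usr/bin/env bash
# Sketch of the Word -> RTF -> LaTeX -> PDF route.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# doc_via_rtf BASENAME (hypothetical helper)
doc_via_rtf() {
  local base=$1
  run textutil -convert rtf "$base.doc" || return 1  # writes BASENAME.rtf
  run rtf2latex2e "$base.rtf" || return 1            # writes BASENAME.tex
  run pdflatex "$base"
}
```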
Word → WordML → LaTeX
WordML is a "dialect" of XML, the Extensible Markup Language. As such, it should in principle be powerful enough to encode all the textual, formatting and math equation content of a typical scientific document, but at this point it isn't yet. There will be even more XML support in future versions of Office. Here are the steps one could take to produce LaTeX via this XML route:
  1. Make sure you have the xsltproc tool installed (e.g. via fink or i-Installer)
  2. Download wordml2latex.xsl, and put that file in a location where you can find it when needed. I'll assume for simplicity that it's in the same directory where your Word file text.doc is.
Now the procedure is as follows:
textutil -convert wordml text.doc
xsltproc -o text.tex wordml2latex.xsl text.xml
and you get a properly formatted LaTeX file, text.tex. It seems that neither graphics nor math are supported yet.
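The WordML route fits in a similar function. This sketch assumes textutil and xsltproc are installed and that wordml2latex.xsl sits in the current directory unless another path is given as the second argument; doc_via_wordml is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch of the Word -> WordML -> LaTeX route.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# doc_via_wordml BASENAME [STYLESHEET]  (hypothetical helper)
doc_via_wordml() {
  local base=$1
  local xsl=${2:-wordml2latex.xsl}
  run textutil -convert wordml "$base.doc" || return 1  # writes BASENAME.xml
  run xsltproc -o "$base.tex" "$xsl" "$base.xml"
}
```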
Word → HTML → LaTeX
To get math and images into the LaTeX document, the simplest method is to treat them all as graphics.
textutil -convert html text.doc
The converted HTML document includes the graphics and the formulas as bitmaps. HTML is in principle a very readable source format, and at this point I would say one gains almost nothing by taking the extra step of converting it to LaTeX. For me, the main point of LaTeX would be the ability to edit math formulas easily, but HTML conversion eliminates that possibility because it turns formulas into bitmaps.

Nevertheless, there are several converters that all share the obvious name html2latex but differ in their capabilities as well as their implementation. An official place where you can find these converters (plus converters from HTML to other formats) is html2things. Most of these are so old that they don't recognize modern HTML tags or, e.g., style sheets. I've tried and ruled out the sed script, and found LaTeX bugs with nc-html2latex, so the best remaining choice ended up being HTML to LaTeX (version 2.7). The fact that this converter has no graphics support is irrelevant for the reason stated above: images can't be edited anyway.
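Since every formula ends up as an image in this route, it helps to see which image files the HTML actually references before deciding on a converter. The sketch below assumes the HTML written by textutil points to its graphics through ordinary img tags with a src attribute; doc_to_html and list_images are hypothetical helper names, and the grep-based extraction is a rough line-by-line filter, not an HTML parser.

```bash
#!/usr/bin/env bash
# Sketch: convert to HTML, then list the image files (graphics and bitmapped formulas).
# Set DRY_RUN=1 to print the conversion command instead of running it.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# doc_to_html BASENAME (hypothetical helper): writes BASENAME.html
doc_to_html() {
  run textutil -convert html "$1.doc"
}

# list_images HTMLFILE (hypothetical helper): print the src of each <img> tag.
# Works line by line, so an <img> tag split across lines is missed.
list_images() {
  grep -o '<img[^>]*src="[^"]*"' "$1" | sed 's/.*src="\([^"]*\)"/\1/'
}
```

Usage: `doc_to_html text && list_images text.html`.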

noeckel@uoregon.edu
Last modified: Sun Jul 8 11:38:48 PDT 2007