I'm going to collect information that I believe to be workable, without having done a huge amount of testing. That's simply because I don't have MS Office on this Mac computer.
Probably the biggest challenge is the conversion of formula objects into LaTeX. I'll return to this at the end. Conversion is a game that can be played at different levels of sophistication, and I'm looking for the simplest and cheapest routes here.
This is not a complete list of possible routes. More alternatives can be found at TUG. I'm only listing the things that I think are really worth trying.
Assume you have a Word file text.doc. Here I'll list some ways of dealing with this file, ranked in order of quality:
The first ingredient is the platform-independent Java program w2l, which is part of Writer2LaTeX. It is now included with NeoOffice/OpenOffice. If you want to help test development versions of this software, download the latest version and navigate to the subfolder doc in the distribution. There you'll find user-manual.html, which explains how to put the executables in your PATH (refer to the installation instructions for "UNIX and friends").
As a second ingredient, you need to have NeoOffice or OpenOffice installed, because the Word file text.doc has to be loaded into NeoOffice/OpenOffice first. Obviously, this step can be skipped if you've been writing the document in Neo/OpenOffice all along. Now choose Export → latex2e, and you're done.
Alternatively, save the document from NeoOffice/OpenOffice as text.odt. Once this is done, the rest is simple:
    w2l text.odt text.tex
    pdflatex text
    pdflatex text
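If you convert documents often, the three commands can be wrapped in a small shell function. This is only a sketch: tex_name and odt2pdf are names I made up, not part of Writer2LaTeX, and it assumes that w2l and pdflatex are both on your PATH.

```shell
#!/bin/sh
# Sketch only: tex_name and odt2pdf are hypothetical helper names.

# Derive the LaTeX file name from the OpenDocument name (text.odt -> text.tex).
tex_name() {
    printf '%s.tex\n' "${1%.odt}"
}

# Convert an .odt file to LaTeX with w2l, then run pdflatex twice.
odt2pdf() {
    odt=$1
    tex=$(tex_name "$odt")
    # Stop early if one of the required tools is not on PATH.
    for tool in w2l pdflatex; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "missing tool: $tool" >&2
            return 1
        }
    done
    w2l "$odt" "$tex" || return 1
    # Two pdflatex passes, as in the commands above.
    pdflatex "$tex" && pdflatex "$tex"
}
```

After sourcing the file, odt2pdf text.odt should leave text.tex next to the input and the PDF in the current directory.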
That's basically all there is to it. Additional options make it possible to customize the kind of PDF that is produced in the last step. If you have a literature database in OpenOffice, it can also be converted to a .bib file using the command w2l -bibtex, which is also part of Writer2LaTeX.
This method (exporting from NeoOffice) doesn't convert graphics, and doesn't work for displayed equations that are represented as images in OpenOffice. However, it does work for equations that have been entered within OpenOffice (as Formula Objects). Such equations can be converted to true LaTeX by w2l.
If most of your documents use math formulas, there's a style file that you should use in order to make the LaTeX output more compact (otherwise each w2l output contains the corresponding definitions in its preamble): copy ooomath.sty from the Writer2LaTeX distribution folder to your local texmf tree. Using fink's tetex, e.g., this would be /sw/share/texmf-local/tex/latex/ (or a subdirectory thereof). I'm assuming that you don't have a texmf tree in your home directory and want to install the style file for system-wide use.
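Installing the style file could look roughly like the following. Treat it as a sketch under assumptions: install_sty is a name I made up, the writer2latex subdirectory in the usage example is just one possible choice, and writing to a system-wide tree will usually require administrator rights.

```shell
#!/bin/sh
# Sketch only: install_sty is a hypothetical helper name.
# Copies a style file into a directory of a texmf tree, creating it if needed.
install_sty() {
    sty=$1
    dest=$2
    [ -f "$sty" ] || {
        echo "not found: $sty" >&2
        return 1
    }
    mkdir -p "$dest" || return 1
    cp "$sty" "$dest/"
}

# Example for the fink location mentioned above (the subdirectory name is an assumption):
#   install_sty ooomath.sty /sw/share/texmf-local/tex/latex/writer2latex
# Afterwards, refresh the TeX file-name database so that LaTeX finds the file:
#   texhash
```

The texhash step is my addition. System-wide texmf trees are normally indexed, so a newly copied file may not be found until the index is rebuilt.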
With these capabilities, it is feasible to make a lossless round-trip from OpenOffice to LaTeX and back again. The back-translation to OpenOffice can be done using tex4ht. For more information, check out the links on my LaTeX page.
rtf2latex2e tool. It can be installed via fink or i-Installer. If you decide to build the tool yourself, make sure you set the environment variable
    textutil -convert rtf text.doc
    rtf2latex2e text.rtf
    pdflatex text
This procedure doesn't handle math and figures, but preserves the formatting of text quite faithfully. However, this method ranks below the first one because rtf2latex can't handle formulas created in OpenOffice. Also, be prepared to edit the preamble of the output file by hand.
You need the xsltproc tool installed (e.g. via fink or i-Installer).
    textutil -convert wordml text.doc
    xsltproc -o text.tex wordml2latex.xsl text.xml
These commands give you a properly formatted LaTeX file, text.tex. Neither graphics nor maths are currently supported, it seems.
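For a whole folder of Word files, the two commands can be put in a loop. Again this is just a sketch: convert_dir is a made-up name, the location of wordml2latex.xsl is passed in as an argument, and I'm assuming that textutil writes its .xml output next to the input file.

```shell
#!/bin/sh
# Sketch only: convert_dir is a hypothetical helper name.
# Usage: convert_dir <directory with .doc files> <path to wordml2latex.xsl>
convert_dir() {
    dir=$1
    xsl=$2
    for doc in "$dir"/*.doc; do
        # If there are no .doc files, the pattern stays unexpanded; skip it.
        [ -e "$doc" ] || continue
        base=${doc%.doc}
        # Step 1: Word -> WordML (assumed to end up in $base.xml).
        textutil -convert wordml "$doc" || return 1
        # Step 2: WordML -> LaTeX via the XSLT stylesheet.
        xsltproc -o "$base.tex" "$xsl" "$base.xml" || return 1
    done
}
```

Calling convert_dir . wordml2latex.xsl in the folder that holds text.doc should produce text.tex there, and the loop stops at the first file that fails to convert.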
    textutil -convert html text.doc
The converted HTML document has graphics and bitmapped formulas included. HTML is in principle a very readable source format, and at this point I would say one actually gains almost nothing in taking the extra step of converting this to LaTeX. The main point of LaTeX for me would be to be able to edit math formulas easily. But HTML conversion eliminates that possibility because it creates bitmaps from formulas. Nevertheless, there are several converters that all share the obvious name html2latex but differ in their capabilities as well as their implementation. An official place where you can find these converters (plus converters from HTML to other formats) is html2things. Most of these are so old that they don't recognize modern HTML tags or, e.g., style sheets. I've tried and ruled out the sed script, and found LaTeX bugs with nc-html2latex, so that the best remaining choice ended up being HTML to LaTeX (version 2.7). The fact that this converter happens to have no graphics support is really irrelevant for the reason stated above (images can't be edited anyway).