Jens Nöckel's Homepage


Manipulating PDF on the Mac clipboard

This page addresses some older issues that may arise when you try to copy PDF clippings between different applications on Mac OS X. My most recent encounter with this type of problem is recorded on a separate page, "Pasting externally generated PDF into Mathematica". Depending on the combination of software you're using, you may find one or both of these pages helpful.

However, my current approach is different: avoid using Preview.app as a PDF viewer, and use Skim instead. Especially if you intend to select a crop rectangle from your PDF and copy only that area, Skim works better when combined with the solution to the Mathematica issue mentioned above.

Table of contents
Starting point
Some background. For historical reasons, the sections are written in chronological order from the top down.
Fixing the Preview clipping dimensions with ghostscript
My oldest solution, included mainly for historical reasons and to help diagnose problems.
Fixing the Preview clipping dimensions without ghostscript
Works with Keynote '08, but doesn't fix the Mathematica issue.
The same as above, but as application instead of a script
If you prefer applications over scripts.
Adjust MediaBox, no ghostscript
Most recent version, fixes the issue for Mathematica 7 and Keynote '08.
Revisions

Copying PDF selections from Preview to Mathematica or iWork '08 under Leopard

The crop you sow and the crop you reap

If you want to copy snippets of a PDF document into Apple's Keynote '08 presentation software (or any other iWork '08 application, i.e., Pages and Numbers), there is a new problem that "cropped" up with the upgrade to Leopard: a rectangle selected and copied in Preview doesn't get pasted into Keynote in its cropped form. The pasted PDF content in Keynote looks as if you had never cropped anything at all.

This issue has been fixed in iWork '09. If you have an older version, read on. I'm leaving this page up for future reference because it shows a method to use Leopard's built-in Python to manipulate PDF data on the Cocoa Pasteboard.

The reason for the PDF cropping issue with Preview is explained by Martin Costabel in this mail thread.

On this page you can download a fix for this cropping issue and for a problem that manifests itself similarly in Mathematica 7. I packaged it in different flavors, so you should look through the following notes to see which solution you like best. My initial solution is an AppleScript that requires an additional piece of software (ghostscript). In response to a suggestion by Ken Drake at KeynoteUser.com, I'm also posting a second version of that AppleScript which does not require any additional software. If you don't know what ghostscript is, just skip to the stand-alone version. Finally, you may prefer the even easier-to-install Application bundle.

Ghostscript required

The script can be downloaded as
clipPDF-gs-1.0.zip
Here are the steps to make this work painlessly:

If the PDF fix is successful, the script remains completely silent so as not to interfere with the workflow. I emphasize this so you don't get suspicious if nothing seems to happen after activating the menu item. If there is a problem (e.g., if there was no PDF on the clipboard), the script should tell you so with an Alert box.

If there is an error converting the PDF, the most likely reason is that gs hasn't been found. Make sure you set up the PATH as mentioned above. Alternatively, you can edit the Python script to hard-wire the path to ghostscript by setting the variable gsBin to the full path of the gs executable. Another potential cause of errors is an outdated version of ghostscript; use version 8.61 or later.
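To tell these two causes apart, a quick check along the following lines may help. This Python 3 sketch is not part of clipPDF; `check_gs` and `parse_gs_version` are hypothetical helper names. It looks up gs on the PATH with `shutil.which` and compares the output of `gs --version` with the 8.61 minimum recommended above.

```python
import shutil
import subprocess

MIN_GS_VERSION = (8, 61)  # oldest ghostscript version recommended on this page


def parse_gs_version(text):
    """Turn the output of `gs --version` (e.g. "9.56.1") into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def check_gs(gs_bin="gs"):
    """Report whether gs_bin is on the PATH and recent enough (hypothetical helper)."""
    path = shutil.which(gs_bin)
    if path is None:
        return "gs not found; fix PATH or set gsBin to the full path of gs"
    version = subprocess.run([path, "--version"],
                             capture_output=True, text=True).stdout
    if parse_gs_version(version) < MIN_GS_VERSION:
        return "gs %s is too old; use 8.61 or later" % version.strip()
    return "ok: " + path
```

Calling `check_gs()` returns a one-line verdict; if it reports that gs was not found, the path it would need is the one to put into gsBin.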

The python script is shown below:
#!/usr/bin/python
#
#   clipPDF version 1.0 (requires ghostscript)
#
import os,sys,tempfile
from AppKit import *
from Foundation import *

rr=64       # exit status if there is no PDF on the clipboard
#
#   ghostscript (gs) is used to rewrite the PDF file. If gs is not on
#   your PATH, you may have to hard-code its full path in gsBin:
#
gsBin = "gs"
board=NSPasteboard.generalPasteboard()
result = board.dataForType_(NSPDFPboardType)
if result:
        tmpdir = tempfile.mkdtemp("","clipPDF","/tmp")
        inname=tmpdir+"/infile.pdf"
        outname=tmpdir+"/outfile.pdf"
        result.writeToFile_atomically_(inname,1)
        # Let gs rewrite infile.pdf as outfile.pdf; rr becomes gs's exit status
        rr=os.spawnlp(os.P_WAIT,gsBin,gsBin,"-sDEVICE=pdfwrite", "-sOutputFile="+outname, "-q", "-dBATCH", "-dNOPAUSE", inname, "-c", "quit")
        if rr==0:
                # Replace the clipboard contents with the rewritten PDF
                content=NSData.dataWithContentsOfFile_(outname)
                board.declareTypes_owner_([NSPDFPboardType], None)
                board.setData_forType_(content, NSPDFPboardType)
        # Clean up; a failed gs run may still leave a partial outfile.pdf behind
        if os.path.exists(outname):
                os.remove(outname)
        os.remove(inname)
        os.rmdir(tmpdir)
sys.exit(rr)
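The `os.spawnlp` call still works, but on current systems the `subprocess` module is the usual way to run an external program from Python. The following Python 3 sketch is not part of clipPDF; `gs_command` and `rewrite_pdf` are hypothetical names. It builds the same ghostscript argument list (with the batch switch spelled `-dBATCH`, as in the Ghostscript documentation) and returns gs's exit status. If the executable can't be started, it returns 127, which is also what the child process of `os.spawnlp` exits with on Unix when the program isn't found.

```python
import subprocess


def gs_command(gs_bin, inname, outname):
    """Ghostscript arguments for rewriting inname as outname, as in clipPDF 1.0."""
    return [gs_bin, "-sDEVICE=pdfwrite", "-sOutputFile=" + outname,
            "-q", "-dBATCH", "-dNOPAUSE", inname, "-c", "quit"]


def rewrite_pdf(gs_bin, inname, outname):
    """Run ghostscript and return its exit status (127 if gs_bin can't be run)."""
    try:
        return subprocess.run(gs_command(gs_bin, inname, outname)).returncode
    except FileNotFoundError:
        return 127
```

Passing a list instead of a shell string means that file names containing spaces need no quoting.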

If you want to make modifications to the script, you can find it by selecting Show Package Contents from the contextual Finder menu of the clipPDF bundle. You can then edit Contents/Resources/Scripts/clipPDF.

No ghostscript required

For users who don't want to install ghostscript, here is another version of the script that requires no additional software at all. It does the job just fine if all you care about is pasting into Keynote. With ghostscript, you get more peace of mind that you have a standards-compliant PDF on the clipboard, but that may not be worth the additional installation effort if you have no other use for gs.

The script below simply searches for "/ArtBox [...]" entries and deletes them from the PDF file.
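The search-and-delete step can be tried out on its own, away from the clipboard. The following Python 3 sketch applies the same regular expression as clipPDF 1.1 to a made-up page dictionary; `strip_artbox` is a hypothetical name. In Python 3 a PDF opened in binary mode is read as bytes, so the pattern is written as a bytes pattern.

```python
import re

# The clipPDF 1.1 pattern as a bytes pattern; "/" needs no backslash escape
ARTBOX = re.compile(rb"(\s*/ArtBox\s*\[.*?\]\s*)", re.DOTALL)


def strip_artbox(pdf):
    """Delete every /ArtBox [...] entry, leaving a single space in its place."""
    return ARTBOX.sub(b" ", pdf)


page = b"<< /Type /Page /MediaBox [0 0 612 792] /ArtBox [100 100 300 200] /Contents 5 0 R >>"
print(strip_artbox(page))
# prints b'<< /Type /Page /MediaBox [0 0 612 792] /Contents 5 0 R >>'
```

The non-greedy `.*?` stops at the first closing bracket, so only the ArtBox array is removed and the following `/Contents` entry survives.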

The script can be downloaded as
clipPDF-noGS-1.1.zip

The installation instructions are simpler than above:

As with the other version, the script remains completely silent if the PDF fix is successful, so don't be suspicious if nothing seems to happen after you activate the menu item. If there is a problem (e.g., if there was no PDF on the clipboard), the script should tell you so with an Alert box.
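Behind the scenes, the Python scripts signal the outcome through their exit status: in versions 1.1 and 1.2, 0 means the clipboard was updated, 64 that there was no PDF on the clipboard, and 74 that a temporary file couldn't be read or written. Presumably the AppleScript wrapper, which isn't listed on this page, turns a nonzero status into the Alert box. The numbers coincide with the BSD sysexits.h codes EX_USAGE and EX_IOERR, which Python exposes on Unix as `os.EX_USAGE` and `os.EX_IOERR`. A small Python 3 sketch of that mapping (`describe_status` is a hypothetical helper, not part of clipPDF):

```python
import os

# Exit statuses of clipPDF 1.1 and 1.2; 64 and 74 are the sysexits.h codes
# EX_USAGE and EX_IOERR, which Python's os module provides on Unix
STATUS_MESSAGES = {
    0: "the PDF on the clipboard was fixed",
    os.EX_USAGE: "there was no PDF on the clipboard",
    os.EX_IOERR: "a temporary PDF file could not be read or written",
}


def describe_status(rr):
    """Translate a clipPDF exit status into a message (hypothetical helper)."""
    return STATUS_MESSAGES.get(rr, "unexpected exit status %d" % rr)
```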

The Python code is shown here:
#!/usr/bin/python
#
#   clipPDF version 1.1
#
#   Removes every /ArtBox entry from the PDF on the clipboard.
#
import os,sys,tempfile,re
from AppKit import *
from Foundation import *

rr=64       # exit status if there is no PDF on the clipboard (EX_USAGE)
ruleT = re.compile(r"(\s*\/ArtBox\s*\[.*?\]\s*)", re.DOTALL)
board=NSPasteboard.generalPasteboard()
result = board.dataForType_(NSPDFPboardType)
if result:
        tmpdir = tempfile.mkdtemp("","clipPDF","/tmp")
        inname=tmpdir+"/infile.pdf"
        outname=tmpdir+"/outfile.pdf"
        result.writeToFile_atomically_(inname,1)
        #
        #   Writing to a file and re-opening seems to be the easiest
        #   way to avoid type conversion problems for the PDF binary file.
        #   If I wanted to avoid this, I'd have to additionally
        #   import PyObjCTools.Conversion, so the net timing is equal.
        #
        try:
                resultFile=open(inname,"rb")    # binary mode: PDF is binary data
                result2 = resultFile.read()
                resultFile.close()
        except IOError:
                rr=74       # EX_IOERR
        else:
                # Replace each /ArtBox [...] entry by a single space
                content = ruleT.sub(" ",result2)
                try:
                        resultFile=open(outname,"wb")
                        resultFile.write(content)
                        resultFile.close()
                except IOError:
                        rr=74
                else:
                        # Put the modified PDF back on the clipboard
                        content=NSData.dataWithContentsOfFile_(outname)
                        board.declareTypes_owner_([NSPDFPboardType], None)
                        board.setData_forType_(content, NSPDFPboardType)
                        rr=0
        # Remove the temporary files, including a partial outfile.pdf after a write error
        for name in (outname, inname):
                if os.path.exists(name):
                        os.remove(name)
        os.rmdir(tmpdir)
sys.exit(rr)
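Deleting bytes from a PDF file, as this script does, has a side effect: in a classic PDF file the cross-reference (xref) table records the byte offset of every object, so objects that come after a deleted /ArtBox entry no longer sit at the recorded offsets. Many PDF readers repair such files on the fly, which is presumably why the script works in practice, but it is one reason why the ghostscript route gives more confidence in a standards-compliant result. A possible alternative, not used by clipPDF and untested with Keynote, is to overwrite each entry with the same number of spaces, which PDF treats as ordinary whitespace between dictionary entries, so that every offset stays valid. The Python 3 sketch below shows the idea on a made-up page dictionary; `blank_artbox` is a hypothetical name.

```python
import re

# Match only the entry itself, so the surrounding whitespace is left alone
ARTBOX = re.compile(rb"/ArtBox\s*\[.*?\]", re.DOTALL)


def blank_artbox(pdf):
    """Overwrite each /ArtBox [...] entry with spaces of the same length."""
    return ARTBOX.sub(lambda m: b" " * len(m.group(0)), pdf)


page = b"<< /MediaBox [0 0 612 792] /ArtBox [100 100 300 200] /Contents 5 0 R >>"
fixed = blank_artbox(page)
print(len(fixed) == len(page), fixed.index(b"/Contents") == page.index(b"/Contents"))
# prints True True
```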

Windowless Application, no ghostscript

For some people it may be inconvenient to access clipPDF via the AppleScript menu. That's basically a matter of taste, but in case you prefer having the same functionality running as an Application that you can keep on the Dock, here is the solution:
clipPDFservant.zip
This does exactly the same as clipPDF, but requires no installation at all. I'd recommend dragging the clipPDFservant Application to a folder such as Applications or Utilities.

When you have a PDF clipping on the clipboard, launch clipPDFservant. It will do its job and stay open. Next time you want to use it, click on its icon in the Dock and it will do its job again. The Application has no window, but you launch and quit it the way you would for any other Application.

One thing this version doesn't do yet is re-execute the clipPDF script when you switch to it with Command-Tab while it's already running. You actually have to click its icon in the Dock to stir it into action.

For reference, here is a thread on the Apple discussion forum where these modifications took shape.

Adjust MediaBox, no ghostscript

This version is adapted especially for use with Mathematica 7. The issue is discussed in this mailing list thread. The script not only removes the ArtBox information from the PDF file, but also sets the MediaBox equal to the CropBox, if a CropBox is present in the file. When copied PDF content is pasted into a Mathematica 7 notebook, the MediaBox determines which parts of the PDF are shown.
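The MediaBox adjustment can be tried out on its own, away from the clipboard. The sketch below is a Python 3 restatement of that step, not the clipPDF code itself; the name `mediabox_from_cropbox` and the sample dictionary are made up for illustration. It uses bytes patterns, as Python 3 requires for binary PDF data, and a function as the replacement so the copied coordinates are inserted literally. Like clipPDF 1.2, it copies the first CropBox it finds into every MediaBox, which only makes sense for a single-page clipping.

```python
import re

MEDIABOX = re.compile(rb"/MediaBox\s*\[.*?\]", re.DOTALL)
CROPBOX = re.compile(rb"/CropBox\s*\[(.*?)\]", re.DOTALL)


def mediabox_from_cropbox(pdf):
    """Set every /MediaBox to the coordinates of the first /CropBox, if there is one."""
    crop = CROPBOX.search(pdf)
    if crop is None:
        return pdf  # no CropBox: nothing to change
    new_box = b"/MediaBox [" + crop.group(1) + b"]"
    # A function replacement inserts new_box literally (no backslash processing)
    return MEDIABOX.sub(lambda m: new_box, pdf)


page = b"<< /MediaBox [0 0 612 792] /CropBox [72 500 300 700] >>"
print(mediabox_from_cropbox(page))
# prints b'<< /MediaBox [72 500 300 700] /CropBox [72 500 300 700] >>'
```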

The script can be downloaded as
clipPDF-noGS-1.2.zip
On Leopard, the script should run out of the box. On Mac OS X 10.4 and earlier, you first have to install PyObjC and then change the first line of the file clipPDF.scptd/Contents/Resources/Scripts/clipPDF to point to the Python interpreter for which PyObjC is installed.
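If you have several Python interpreters and aren't sure which one has PyObjC, you can run a short check with each of them. The sketch below is not part of clipPDF; the function name `pyobjc_available` is hypothetical. It simply tries to import AppKit, which PyObjC provides, and prints the interpreter's path. It avoids both the print statement and the print function so that the same file runs under old Python 2 interpreters as well as Python 3.

```python
import sys


def pyobjc_available():
    """True if this interpreter can import AppKit, which PyObjC provides."""
    try:
        import AppKit  # imported only to test whether PyObjC is installed
    except ImportError:
        return False
    return True


# The path printed here is what the first line of the clipPDF script should name
sys.stdout.write("%s: PyObjC %s\n" % (sys.executable,
                 pyobjc_available() and "found" or "not found"))
```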

In Mathematica 8, this script isn't needed. In fact, it now seems that Mathematica 8 doesn't correctly process pasted PDF after it has been treated with my script (although all other programs I tested accept the output of clipPDF).

Here is the Python source:

#!/usr/bin/python
#
#   clipPDF version 1.2
#
#   This Python/PyObjC script removes /ArtBox from a PDF file, and
#   sets the /MediaBox equal to the /CropBox if the latter is specified in the file.
#   The first CropBox found is copied into every /MediaBox entry, which
#   assumes a single-page clipping.
#
import os,sys,tempfile,re
from AppKit import *
from Foundation import *

rr=64       # exit status if there is no PDF on the clipboard (EX_USAGE)
ruleA = re.compile(r"(\s*\/ArtBox\s*\[.*?\]\s*)", re.DOTALL)
ruleM = re.compile(r"(\s*\/MediaBox\s*\[.*?\]\s*)", re.DOTALL)
ruleC = re.compile(r"(\s*\/CropBox\s*\[(.*?)\]\s*)", re.DOTALL)
board=NSPasteboard.generalPasteboard()
result = board.dataForType_(NSPDFPboardType)
if result:
        tmpdir = tempfile.mkdtemp("","clipPDF","/tmp")
        inname=tmpdir+"/infile.pdf"
        outname=tmpdir+"/outfile.pdf"
        result.writeToFile_atomically_(inname,1)
        #
        #   Writing to a file and re-opening seems to be the easiest
        #   way to avoid type conversion problems for the PDF binary file.
        #   If I wanted to avoid this, I'd have to additionally
        #   import PyObjCTools.Conversion, so the net timing is equal.
        #
        try:
                resultFile=open(inname,"rb")    # binary mode: PDF is binary data
                result2 = resultFile.read()
                resultFile.close()
        except IOError:
                rr=74       # EX_IOERR
        else:
                # Delete every /ArtBox [...] entry
                content = ruleA.sub(" ",result2)
                # If there is a CropBox, copy its coordinates into the MediaBox
                cropbox = ruleC.search(content)
                if cropbox:
                        content = ruleM.sub(" /MediaBox ["+cropbox.group(2)+"] ", content)
                try:
                        resultFile=open(outname,"wb")
                        resultFile.write(content)
                        resultFile.close()
                except IOError:
                        rr=74
                else:
                        # Put the modified PDF back on the clipboard
                        content=NSData.dataWithContentsOfFile_(outname)
                        board.declareTypes_owner_([NSPDFPboardType], None)
                        board.setData_forType_(content, NSPDFPboardType)
                        rr=0
        # Remove the temporary files, including a partial outfile.pdf after a write error
        for name in (outname, inname):
                if os.path.exists(name):
                        os.remove(name)
        os.rmdir(tmpdir)
sys.exit(rr)

Revisions

clipPDF-1.2
Current version (04-26-2009). Differs from clipPDF-1.1 in that it also sets the MediaBox equal to the CropBox.
clipPDFservant-1.0
Initial version (04-13-2008). Same as clipPDF-1.1, but run as an Application.
clipPDF-1.1
Modified version (04-11-2008), for use mainly with iWork '08. This revised version stays silent even when the PDF on the clipboard required no fixing (e.g., if you copied from LaTeXiT).
clipPDF-1.0
Initial version of clipPDF (04-05-2008).

noeckel@uoregon.edu
Last modified: Sun Dec 23 11:05:13 PST 2012