Vector screenshot

Cobra

2014-02-23 16:31

I just had to prepare a poster based on roughly 30 publications, and for several of them I didn't have the original figures but only the manuscript as a PDF file. Using Okular at a magnification of 800%, I took screenshots of these figures as comparatively high-resolution bitmaps. The price I paid was that editing the poster in LibreOffice (which I used as the lowest common denominator) became almost unbearably slow.

I couldn't silence the thought that it should be possible to take a 'vector screenshot' from a PDF file. I had a vague idea that pdftocairo could be useful here, since it can output arbitrary parts of a PDF file as PDF or SVG. It turned out that Peter Williams, a young radio astronomer from Harvard, had had the same idea and came up with a script that does exactly what I wanted.

I've fixed a small error (pageh should also be an integer) and ensured Arch and Fedora compatibility (python2), but otherwise it's Peter's script:

	#! /bin/bash

	# original: <https://gist.github.com/pkgw/3892706>
	# see <http://newton.cx/~peter/2012/10/extracting-pdf-figures-as-pdfs-in-linux/>

	# Arguments: $1 = PDF file, $2 = page number, $3 $4 = x y of the top-left
	# corner, $5 $6 = x y of the bottom-right corner (PDF coordinates).

	# Margin passed to pdfcrop around the cropped figure.
	margin=1

	# XPDF gives its y coordinates in terms of the standard PDF coordinate
	# system, where (0,0) is the bottom left corner and y increases going
	# up. But pdftocairo uses Cairo coordinates, in which (0,0) is the top
	# left corner and y increases going down. We can use pdfinfo to get
	# the page size to translate between these conventions.

	file="$1"
	page="$2"
	# Extract the page height (the number between " x " and "pts").
	pageh=$(pdfinfo -f $page -l $page "$file" | grep '^Page.*size' \
	    | sed -e 's/.* x //' -e 's/pts.*$//')

	# Our variables end up in Cairo convention, so the box height is ybr -
	# ytl.

	xtl=$(python2 -c "import math; print int (math.floor ($3))")
	ytl=$(python2 -c "import math; print int ($pageh) - int (math.ceil ($4))")
	xbr=$(python2 -c "import math; print int (math.ceil ($5))")
	ybr=$(python2 -c "import math; print int ($pageh) - int (math.floor ($6))")
	w=$(python2 -c "print $xbr - $xtl")
	h=$(python2 -c "print $ybr - $ytl")

	# Lamebrained uniqifying of output filename.

	n=1

	while [ -f fig$n.pdf ] ; do
	    n=$((n + 1))
	done

	# OK to go.

	echo pdftocairo -pdf -f $page -l $page -x $xtl -y $ytl -W $w -H $h \
	    -paperw $w -paperh $h "$file" '|' pdfcrop --margin $margin fig$n.pdf
	exec pdftocairo -pdf -f $page -l $page -x $xtl -y $ytl -W $w -H $h \
	    -paperw $w -paperh $h "$file" - | pdfcrop --margin $margin - fig$n.pdf
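The coordinate translation described in the script's comments can be tried on its own, without a PDF at hand. The following is only a minimal sketch: `pdf_box_to_cairo` is a hypothetical helper that is not part of Peter's script, and it uses awk instead of python2 for the rounding. It takes the page height followed by the four box coordinates in PDF convention, and prints the x, y, width and height that would be handed to pdftocairo.

```shell
# Hypothetical helper (not in the original script): convert a box given in
# PDF coordinates (origin bottom left) into Cairo coordinates (origin top left).
# Arguments: page height, x y of top-left corner, x y of bottom-right corner.
# Output: "x y width height", all rounded outwards to integers.
pdf_box_to_cairo() {
    awk -v pageh="$1" -v x1="$2" -v y1="$3" -v x2="$4" -v y2="$5" '
        # awk only has int(), which truncates towards zero.
        function floor(v,   i) { i = int(v); return (i > v) ? i - 1 : i }
        function ceil(v,    i) { i = int(v); return (i < v) ? i + 1 : i }
        BEGIN {
            ph  = int(pageh)        # page height as an integer
            xtl = floor(x1)         # left edge: round down
            ytl = ph - ceil(y1)     # top edge: flip the y axis
            xbr = ceil(x2)          # right edge: round up
            ybr = ph - floor(y2)    # bottom edge: flip the y axis
            print xtl, ytl, xbr - xtl, ybr - ytl
        }'
}
```

For an 842 pt high page and a box from (100.2, 700.6) to (300.9, 500.3), `pdf_box_to_cairo 842 100.2 700.6 300.9 500.3` prints `100 141 201 201`: the box starts 141 pt below the top of the page and is 201 pt wide and high.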

Unlike Peter (and thanks to piet and haui), I can show an actual vector screenshot made by this script:

Vectorshot!

The size of this shot is 2.7 kB. A bitmap of this size showing the same section is so terribly ugly that I've decided not to present it here.