Wednesday, September 4, 2013

Removing the background of an image

While polishing up a paper, I had to draw figures using a well-known proprietary program for drawing chemical formulas, whose name I will not write down out of decency (but you know which one it is anyway). I'm using the Mac version (yes, there is no version for Linux), which has a neat glitch: the "Save As" button does not work at all, so there is no way to export your chemical formula as an SVG/PDF/PNG picture or whatever! So I resorted to the neat "Print as PDF" feature of the Mac. I had to edit the resulting PDF files by hand as some of the colors were wrong (great!), and when I tried to include the PDFs, I realized they had a white background... No way!

So I dug into ImageMagick's convert documentation, and came up with the following code that converts the PDF file into a PNG with a transparent background:

convert -density 600 figure.pdf -channel alpha \
  -fx '((r == 1 && g == 1 && b == 1) ? 0 : 1)' figure.png

There are still a few points that look white but are probably not exactly white (due to antialiased rendering of the PDF file?), so they stay opaque. Many more things can be done with the -fx operator; this page was helpful to me!

Edit: while the alpha channel seems to be on by default for PDF files, it is not necessarily the case for all images. If the above doesn't work, try adding -alpha Set before the -fx bit.
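
For the almost-white points mentioned above, one possible variant is to replace the exact comparison in -fx with ImageMagick's -fuzz and -transparent options, which also match colors close to white. This is only a sketch, not something tried on the figures from this post: the helper name pdf2transparent and the 5% default tolerance are arbitrary choices made for the illustration, and the tolerance will probably need tuning.

```shell
#!/bin/sh
# pdf2transparent is a hypothetical helper name; the 5% default is a guess to tune.
# Usage: pdf2transparent input.pdf output.png [fuzz]
pdf2transparent() {
  in=$1
  out=$2
  fuzz=${3:-5%}
  # -alpha set makes sure an alpha channel exists;
  # -fuzz widens the match so that near-white pixels also become transparent.
  convert -density 600 "$in" -alpha set -fuzz "$fuzz" -transparent white "$out"
}
```

Calling pdf2transparent figure.pdf figure.png 10% would use a wider tolerance. Too large a value starts eating into light colors of the figure itself, so it is a trade-off.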

6 comments:

Anonymous said...

Hi Vincent.
As a scientist myself, I understand perfectly:)
Anyhow, if the number of your PDFs is of reasonably small order, you could try to:
1. Import your PDF into GIMP
2. Use the function named, ehm, "transform color to alpha" (or something like that) in "colors" menu. Choose white.

Results are FAR better, I believe. Also, Elsevier totally digs EPS-es (and others) exported in GIMP, so it is one of my primary 'publication' tools:)

I wish your papers good reviews.
regards
w.

Vincent Fourmond said...

That's a good tip, and the quality is indeed better: with ImageMagick there is still a bit of white lingering around the black lines of the chemical figure, while GIMP does the thing properly.

I have no clue what the difference is, though, since in principle it's the same rendering engine (i.e. Ghostscript). They must be using different options (there are so many).

Downside is, it's not command-line enough for me ;-)!

Thanks for your kind words !

Vincent Fourmond said...

Got it, it's antialiasing. If I convert to PPM beforehand using pdftoppm with antialiasing disabled, or use ImageMagick's +antialias option, it looks better (though still not as good as GIMP, I wonder how they do it!)
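
The two-step route described here could look roughly like the sketch below. It assumes poppler's pdftoppm (with its -aa, -aaVector and -singlefile options) and ImageMagick's convert are installed; the helper name pdf2png_noaa is made up for the illustration, and the exact commands used in the comment above are not known.

```shell
#!/bin/sh
# pdf2png_noaa is a hypothetical helper name.
# Usage: pdf2png_noaa input.pdf output.png
pdf2png_noaa() {
  in=$1
  out=$2
  base=${out%.png}
  # Rasterize the first page with font and vector antialiasing turned off,
  # so that edge pixels are either background white or the drawn color.
  pdftoppm -r 600 -aa no -aaVector no -singlefile "$in" "$base" &&
  # Then make the exactly-white pixels transparent.
  convert "$base.ppm" -alpha set -transparent white "$out"
}
```

The intermediate PPM file is left next to the output. Without antialiasing the edges will look jagged at low resolution, which is why the sketch keeps the 600 dpi setting.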

nurul said...

I visited your site. I liked your site. You wrote about background removal. It is very helpful for using Photoshop. I will visit your site again as soon as possible.

Anonymous said...

Would tikz and its associated packages for chemical plotting not work for this particular use case?

chrysn said...

you could try opening the pdf in inkscape. that will allow you to ungroup the image, identify the background plane, and remove it. that way, your vector graphics will stay vector graphics. (worst case you'd have to run the pdf through pdf2ps/ps2pdf before.)