Wednesday, 10 April 2013

Counting the number of colour pages in a PDF with Ghostscript

Using CentOS and Ghostscript (9.05 and above):

I am developing a print-on-demand solution at work and came up against an issue: I wanted to be able to inspect a PDF file and work out where the colour pages appeared (if at all).

I am using CentOS 5 on the production servers, but this solution should work on any Linux system running Ghostscript 9.05 or above.

Ghostscript 9.05 is required because you need the inkcov device to work out the ink coverage for each of the CMYK plates.
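Because inkcov only exists from 9.05 onwards, a script may want to check the installed version before relying on it. This is a minimal sketch, assuming gs --version prints a plain major.minor string (such as 9.05); the function name gs_version_ok is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 if the Ghostscript version string
# given as $1 (e.g. the output of `gs --version`) is 9.05 or later.
gs_version_ok() {
    echo "$1" | awk -F. '{
        major = $1 + 0
        minor = $2 + 0    # "05" becomes 5
        # inkcov needs 9.05+: exit 0 when new enough, 1 otherwise
        exit ((major > 9 || (major == 9 && minor >= 5)) ? 0 : 1)
    }'
}

# Example use before running the inkcov pipeline:
#   gs_version_ok "$(gs --version)" || { echo "need gs >= 9.05" >&2; exit 1; }
```

The comparison is done numerically on each part, so a later release such as 10.x also passes.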

I had to make the assumption that if C=M=Y, the page had four-colour grey/black on it rather than colour content. This may not be a 100% foolproof method, but it was the only way I could think of to determine whether a page was colour or greyscale.
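To make the C=M=Y rule concrete, here is a small sketch of it as a shell function. It takes the cyan, magenta and yellow coverage values for one page. The optional fourth argument, a tolerance, is my own assumption and not part of the method above: the default of 0 gives the strict C=M=Y test, while a small positive value would let near-equal plates still count as grey. The name is_colour is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper illustrating the C=M=Y rule. Arguments are the cyan,
# magenta and yellow coverage values for one page as reported by inkcov,
# plus an optional tolerance (an assumption, not part of the original post).
# Exit status 0 means "treat as colour"; 1 means "treat as grey/black".
# With the default tolerance of 0 this is the strict C=M=Y test.
is_colour() {
    awk -v c="$1" -v m="$2" -v y="$3" -v tol="${4:-0}" 'BEGIN {
        c += 0; m += 0; y += 0; tol += 0   # force numeric values
        # absolute differences between each pair of colour plates
        dcm = c - m; if (dcm < 0) dcm = -dcm
        dmy = m - y; if (dmy < 0) dmy = -dmy
        dcy = c - y; if (dcy < 0) dcy = -dcy
        # colour if any pair differs by more than the tolerance
        exit ((dcm > tol || dmy > tol || dcy > tol) ? 0 : 1)
    }'
}
```

For example, is_colour 0.01587 0.01587 0.01587 fails (grey), while is_colour 0.01587 0.01587 0.02000 succeeds (colour). Whether any non-zero tolerance is sensible depends on the artwork being printed.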

Anyway, here is the script I came up with:


gs -o - -sDEVICE=inkcov test.pdf | grep -v "^ 0.00000  0.00000  0.00000" | awk '{if($1=="Page"){lastpage=$2} else if(/CMYK/ && !($1==$2 && $2==$3)){print lastpage}}'

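Basically, the command runs the PDF through Ghostscript's inkcov device, which prints a "Page N" line for each page followed by that page's C, M, Y and K coverage, and grep then discards the coverage lines where C, M and Y are all 0.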

AWK then remembers the most recent page number and, for every remaining coverage line where C, M and Y are not all equal, prints that page number. The result is the list of pages that were determined to be colour.
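Since the aim is to count the colour pages, the same idea can be wrapped in a small script that lists the colour pages and then prints how many there are. This is a sketch under my own assumptions: the function name colour_pages, the "Colour pages: N" summary line and the script name colourpages.sh are illustrative. It drops the grep step, because pages whose C, M and Y are all 0 already fail the C=M=Y test.

```shell
#!/bin/sh
# Sketch: list the colour pages of a PDF and print how many there are.
# Usage (script name is just an example): sh colourpages.sh file.pdf

# Read inkcov output on stdin; print each colour page number, then a total.
colour_pages() {
    awk '
        $1 == "Page" { page = $2; next }        # remember current page number
        /CMYK/ && !($1 == $2 && $2 == $3) {     # C, M, Y not all equal: colour
            print page; n++
        }
        END { printf "Colour pages: %d\n", n }
    '
}

# Only run Ghostscript when a PDF argument is given, so the function can
# also be fed saved inkcov output for testing.
if [ -n "$1" ]; then
    gs -o - -sDEVICE=inkcov "$1" | colour_pages
fi
```

Ghostscript's banner lines also go to standard output, but they match neither pattern, so awk ignores them.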
