
Compare PDFs Visually

Sometimes a problem seems hard, but the right insight can make it easy. If you were asked to write a program to compare two PDF files and show the differences, how hard do you think that would be? If you are [serhack], you’ll make it much easier than you might guess.

Of course, sometimes making something simple depends on making simplifying assumptions. If you are expecting a “diff-like” utility that shows insertions and deletions, that’s not what’s going on here. Instead, you’ll see an image of the PDF with the changes highlighted by red boxes. This is easy because the program uses available utilities to render the PDFs as images and then simply compares pixels in the resulting images, drawing red boxes over the parts that don’t match.
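To get a feel for the pixel-comparison step, here is a minimal sketch in plain Python. It is not [serhack]'s code: it assumes both pages have already been rendered to same-sized images and represents each image as a list of rows of RGB tuples. The names `diff_box` and `draw_box` are made up for this illustration, and it draws a single bounding box around all the changes rather than one box per changed region.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]          # rows of RGB pixels (illustrative format)
Box = Tuple[int, int, int, int]    # left, top, right, bottom (inclusive)

RED: Pixel = (255, 0, 0)


def diff_box(a: Image, b: Image) -> Optional[Box]:
    """Return the bounding box of all differing pixels, or None if identical."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    xs, ys = [], []
    for y, (row_a, row_b) in enumerate(zip(a, b)):
        for x, (pa, pb) in enumerate(zip(row_a, row_b)):
            if pa != pb:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def draw_box(img: Image, box: Box, color: Pixel = RED) -> Image:
    """Return a copy of img with a one-pixel outline drawn along box."""
    left, top, right, bottom = box
    out = [list(row) for row in img]
    for x in range(left, right + 1):   # top and bottom edges
        out[top][x] = color
        out[bottom][x] = color
    for y in range(top, bottom + 1):   # left and right edges
        out[y][left] = color
        out[y][right] = color
    return out
```

A real tool would load rendered page images from disk and would probably pad the box so the outline does not cover the changed pixels themselves, but the core idea is just this loop over pixel pairs.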

Obviously, this is best for PDFs that just have a few changes. Inserting a paragraph, for example, makes the output pretty useless. For that, you might consider extracting the text from the PDF using something like pdftotext (which uses the same underlying library this uses to generate images).
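If you go the text route, a rough sketch might look like the following. It assumes the extraction tool is Poppler's `pdftotext` command-line utility and that it is on your path. The helper names `extract_text` and `diff_text` are hypothetical, and the comparison itself is just Python's standard `difflib`.

```python
import difflib
import subprocess
from typing import List


def extract_text(pdf_path: str) -> str:
    """Extract text with Poppler's pdftotext; '-' sends the output to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def diff_text(old: str, new: str) -> List[str]:
    """Return unified-diff lines describing how old differs from new."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )
```

Calling `diff_text(extract_text("old.pdf"), extract_text("new.pdf"))` (the file names are placeholders) gives you ordinary diff output, so an inserted paragraph shows up as a handful of added lines instead of shifting everything after it.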

The program throws a lot of messages about missing files but seems to do the job anyway. Here is the result of comparing two versions of the Hackaday home page captured to PDF a few minutes apart:

You can see, though, that if a new article was posted and everything slid down by one, you’d have nothing but a giant red block.

It is still a clever idea. There are surprisingly few tools out there for this, although we did find a few others. There are, of course, plenty of Linux tools for manipulating PDFs. Many of them, like this one, are mashups of other tools.
