Friday 6 February 2015

How to remove a watermark from a PDF file with software and the command prompt


I thought this would be a simple task, but it turned out the other way.

The watermark is the very same (overlapping, but transparent) image on every single page. I created the PDF file myself (so no copyright worries here) using PDFCreator 0.9.8.

I have already tried my friend's Adobe Acrobat Pro, but it didn't work. It tries to remove it, but it can't. I tried to remove header/footer, etc., but the watermark just won't disappear. 

For image-based watermarks, there are several tools that promise automatic removal. All of them are free to try, but require a license to actually produce the desired output.

A less time-consuming solution would be to "manually" remove the watermark. We need Pdftk and Notepad++.

Steps

  1. Download Pdftk and extract pdftk.exe and libiconv2.dll to %windir%\System32, a directory in the path or any other location of your choice.

  2. Download and install Notepad++.

  3. PDF streams are usually compressed using the DEFLATE algorithm. This saves space, but it makes the page content unreadable in a text editor.

    The command

    pdftk original.pdf output uncompressed.pdf uncompress

    uncompresses all streams, so they can be modified by a text editor.

  4. Open uncompressed.pdf with Notepad++ to reveal the structure of the watermark.
    In this specific case, every page begins with the block
    q 9 0 0 9 2997 4118.67 cm
    BI
    /CS/RGB
    /W 1
    /H 1
    /BPC 8
    ID Ÿ®¼
    EI Q
    
    and nearly 4,000 blocks just like this one. This particular block sets only one (/W 1 /H 1) of the watermark's pixels.
    Scrolling down until the pattern changes reveals that the watermark's stream is 95,906 bytes long (counting newlines). The exact same stream is repeated on every page of the PDF file.
  5. Press Ctrl + H and set the following:

    Find:               q 9 0 0 9 2997 4118\.67 cm.{95881}
    Replace:            (blank)
    Match case:         checked
    Wrap around:        checked
    Regular expression: selected
    . matches newline:  checked 
    

    The regular expression q 9 0 0 9 2997 4118\.67 cm.{95881} matches the first line of the above block (q 9 0 0 9 2997 4118.67 cm, which is 25 characters long) and the 95,881 characters that follow it. Together that is 25 + 95,881 = 95,906 characters, i.e., the whole watermark stream.
    Clicking Replace All removes it from all pages of the PDF file.
  6. The watermark has now been removed, but the PDF file has errors (the streams' lengths are incorrect) and it's uncompressed.
    The command
    pdftk uncompressed.pdf output nowatermark.pdf compress
    
    takes care of both.
  7. uncompressed.pdf is no longer needed. You can delete it.
The result is the same PDF without the watermark (and about half the size).
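
The find-and-replace from step 5 can also be scripted instead of done in Notepad++. The following is only a minimal sketch in Python, assuming the watermark in your file starts with the same first line and has the same length as in the example above; both values will almost certainly differ for other PDFs. The names strip_watermark, strip_file and stripped.pdf are made up for this illustration.

```python
import re

# First line of the watermark block, as quoted in step 4 (specific to this PDF).
HEADER = b"q 9 0 0 9 2997 4118.67 cm"
# Total length of the watermark stream reported in step 4 (specific to this PDF).
STREAM_LEN = 95906
# What remains after the first line: 95906 - 25 = 95881, the count used in step 5.
TAIL_LEN = STREAM_LEN - len(HEADER)


def strip_watermark(data: bytes, header: bytes = HEADER, tail_len: int = TAIL_LEN) -> bytes:
    """Remove every occurrence of `header` plus the `tail_len` bytes after it."""
    # re.escape treats the dot in 4118.67 as a literal dot; re.DOTALL lets "."
    # match newlines, like the ". matches newline" option in Notepad++.
    pattern = re.compile(re.escape(header) + b".{%d}" % tail_len, re.DOTALL)
    return pattern.sub(b"", data)


def strip_file(src: str, dst: str) -> None:
    """Hypothetical helper: read `src`, strip the watermark, write `dst`."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(strip_watermark(data))
```

Calling strip_file("uncompressed.pdf", "stripped.pdf") would stand in for steps 4 and 5; stripped.pdf then still has to go through the pdftk compress command from step 6. Note that this sketch counts bytes, while Notepad++ counts characters, so the two numbers should only agree if the editor treats the file as a single-byte encoding.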

 
