Converting HTML to PDF

When I’m looking at a web page with Safari, all I have to do is hit Command-P and save as PDF, and I have a nice PDF of the page.



Is there a way to automate this server-side on a standard Linux system? I need a web app to generate these PDFs dynamically, on the fly. And they need to look good, with different designs, etc., so it’s more than just getting text to show on a page.



I’ve come across HTMLDOC, and it does the job in theory, but only for crummy HTML: it supports most of HTML 3.2, some of HTML 4.0, and no CSS. That’s not great if I also want to produce reasonable HTML output.



Are there any other options out there that you’re aware of? Is there code in WebKit for this, and would it be possible to get some of that to run on a Debian box? Hm, doesn’t sound likely.

7 comments

Mark Aufflick
 

First render the HTML to PostScript. There are a few options for this. One I just found (but haven't tried) is <a href="http://user.it.uu.se/~jan/html2ps.html">http://user.it.uu.se/~jan/html2ps.html</a>.

The way I always used to do it was to run Netscape in batch print mode and direct the output to a file. I assume you can do the same with Firefox. The beauty of this approach is that you know you will get the same rendering as Firefox; the downside is that you can't install Firefox on a server without the X libraries, and it's a fairly heavy way of doing it.

Once you have the PostScript, ps2pdf will make it into a PDF for you. It's available as a default package in every Linux distro I've ever used.
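
The two-step pipeline described above (HTML to PostScript, then PostScript to PDF) could be scripted roughly like this. It is only a sketch: it assumes html2ps and ps2pdf (which ships with Ghostscript) are installed and on the PATH, and the function names `build_commands` and `html_to_pdf` are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_commands(html_path, pdf_path):
    """Return the two command lines for the pipeline: html2ps, then ps2pdf.

    The intermediate PostScript file is placed next to the PDF,
    with the same name and a .ps suffix.
    """
    html_path = Path(html_path)
    pdf_path = Path(pdf_path)
    ps_path = pdf_path.with_suffix(".ps")
    return [
        # html2ps writes PostScript to the file given with -o
        ["html2ps", "-o", str(ps_path), str(html_path)],
        # ps2pdf takes the input .ps file and the output .pdf file
        ["ps2pdf", str(ps_path), str(pdf_path)],
    ]


def html_to_pdf(html_path, pdf_path):
    """Run both steps in order; raises CalledProcessError if either tool fails."""
    for cmd in build_commands(html_path, pdf_path):
        subprocess.run(cmd, check=True)
    return Path(pdf_path)
```

Calling `html_to_pdf("page.html", "page.pdf")` would leave `page.ps` behind as an intermediate file. Since html2ps does its own rendering, the result will not necessarily match what a browser prints.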
Mark Aufflick
 

Looks like Typo just typo'd the tilde in the URL. It should read: http://user.it.uu.se/~jan/html2ps.html
Andreas Haugstrup
 

Prince, which I haven't tried, has a command line interface. Don't know much about it. <a href="http://www.princexml.com/">http://www.princexml.com/</a>
Lars Pind
 

Mark, thanks for the tip. It's amazing how helpful you always are, and it's deeply appreciated. Thank you!

Andreas, Prince looks like a perfect fit, except, of course, for the $3800 server license, which is a little steep. But at first glance, it looks clean.
Andreas Haugstrup
 

Heh, that's what I get for not looking at the price before commenting. :o)
Malte Sussdorff
 

You could install OpenOffice, run a vncserver so OpenOffice has a display to connect to in the background, and use a macro to convert HTML to PDF with OpenOffice (which would also let you convert anything OpenOffice can read to PDF).
Lars Pind
 

Hi Malte, thanks for the tip. Thanks to your suggestion, I found <a href="http://www.xml.com/pub/a/2006/01/11/from-microsoft-to-openoffice.html">this</a>, which has some example macros and spells out in more detail how this could be done. /Lars
