Friday, 7 January 2011

HTML to Word Document, the easy(ish) way

Ever wanted to write dynamic HTML and output it as a Word document? If not, I'd respectfully suggest you move on to another web page round about now ...

Still here? Great. The scenario is that I'm creating a large report in dynamically-written HTML (usual tool set of VB.Net/ASP.Net 3.5, SQL Server 2008), and I want to output it directly into a Word document.

I was previously converting the HTML into PDF, but the third-party product we use for that grinds to a halt after about 2 MB of HTML data, and some of these reports are much bigger than that. The other thing everyone in our client base has on their machine is Office, so Word suddenly becomes an attractive output target.

It took a great deal of messing about and Googling, but it all boils down to:
  1. Write the various Word XML headers.
  2. Write your HTML.
  3. Write the footers and XML closing tags.
  4. (And this is the crucial bit) Save the whole file out as a *.rtf.
Word will then happily open your document and display it as if it were a native Word document.
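
The four steps above can be sketched roughly as follows. This is in Python rather than VB.Net, purely to keep it short, and it is a minimal sketch rather than the exact code I run. The header shown is the commonly used Office-namespace wrapper with a conditional <xml> block asking Word for Print Layout view; the function names, the 100% zoom and the UTF-8 encoding are all assumptions made for illustration.

```python
import html
from pathlib import Path

# Step 1: the Word XML header. Assumed wrapper: Office namespaces on <html>
# plus a conditional <xml> block that only Word reads.
WORD_HEADER = """<html xmlns:o='urn:schemas-microsoft-com:office:office'
xmlns:w='urn:schemas-microsoft-com:office:word'
xmlns='http://www.w3.org/TR/REC-html40'>
<head>
<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>
<title>{title}</title>
<!--[if gte mso 9]>
<xml>
<w:WordDocument>
<w:View>Print</w:View>
<w:Zoom>100</w:Zoom>
</w:WordDocument>
</xml>
<![endif]-->
</head>
<body>
"""

# Step 3: the closing tags.
WORD_FOOTER = "</body>\n</html>\n"


def build_word_document(body_html, title="Report"):
    """Wrap an HTML fragment (step 2) in the Word header and footer."""
    return WORD_HEADER.format(title=html.escape(title)) + body_html + WORD_FOOTER


def save_word_document(path, body_html, title="Report"):
    """Step 4: write the whole thing to disk; the caller picks the extension."""
    target = Path(path)
    target.write_text(build_word_document(body_html, title), encoding="utf-8")
    return target
```

Calling save_word_document("report.rtf", "<h1>Big report</h1><p>...</p>") would then produce a file to try opening in Word. In an ASP.Net page you would more likely write the same string to the response stream than to disk.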

Have a look at http://www.pbdr.com/ostips/wordfoot.htm for the basic tags (including how you can specify a footer to appear on each printed page), then build on that with http://mvark.blogspot.com/2007/02/how-to-add-header-or-footer-to.html.

If you want to 'manually' insert page breaks, just add a <br style='page-break-before:always'> tag at the appropriate point (page-break-after is also valid BTW).
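
If the report is built up as a list of HTML fragments, the page-break tag can be dropped in between them automatically. A small sketch, again in Python for brevity; the helper name is made up, and it assumes each fragment should start on a fresh page:

```python
# The page-break tag from the paragraph above.
PAGE_BREAK = "<br style='page-break-before:always'>"


def join_with_page_breaks(sections):
    """Join HTML fragments, placing a page-break tag between each pair."""
    return ("\n" + PAGE_BREAK + "\n").join(sections)
```

The result is just a longer HTML string, so it goes between the Word header and footer like any other body content.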

Fun for all the family!
