Convert PDF to HTML in C#

Are you looking for a high-speed C# library for converting PDF files to HTML? If so, this article presents a powerful, high-quality solution for PDF to HTML conversion in C#. You will also learn how to customize the conversion using various options. So let’s go through a step-by-step guide and code samples to perform this conversion.

  • C# PDF to HTML Conversion Library
  • Convert PDF to HTML using C#
  • Convert PDF to HTML with Embedded Resources in C#
  • Save Images to Specific Folder in PDF to HTML Conversion
  • Convert PDF to Multipage HTML in C#

C# Library to Convert PDF to HTML

For PDF to HTML conversion, we will use Aspose.PDF for .NET. It is a powerful PDF library to create, process, and convert PDF files. You can also try PDF to HTML conversion for free using a temporary license.

You can download Aspose.PDF for .NET or add it to your project using NuGet Package Manager.

PM> Install-Package Aspose.PDF

Convert PDF to HTML in C#

Converting a PDF document to HTML takes only a couple of lines of code. Simply follow the steps below.

  • Load the PDF document using Document class.
  • Save the PDF as HTML using the Document.Save() method.

The following code sample shows how to convert a PDF to HTML in C#.

using Aspose.Pdf;

// Open the source PDF document
Document pdfDocument = new Document("PDFToHTML.pdf");

// Save the document in HTML format
pdfDocument.Save("output_out.html", SaveFormat.Html);

Aspose.PDF reads the internal structure of the PDF format and converts it to HTML. You don’t need a PDF reader program installed on your machine.
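
The two steps above can also be wrapped in a small reusable helper. The following is only a minimal sketch: the class name PdfToHtmlHelper, the method name ConvertToHtml, and the decision to write the HTML next to the source file are assumptions made for illustration and are not part of Aspose.PDF. It uses the same Document class and SaveFormat.Html value as the sample above.

```csharp
using System;
using System.IO;

public static class PdfToHtmlHelper
{
    // Hypothetical helper: converts one PDF and returns the path of the HTML file
    public static string ConvertToHtml(string pdfPath)
    {
        if (!File.Exists(pdfPath))
            throw new FileNotFoundException("Source PDF not found.", pdfPath);

        // Assumption: the output goes next to the source, with an .html extension
        string htmlPath = Path.ChangeExtension(pdfPath, ".html");

        // Document is disposable, so release it as soon as the file is saved
        using (Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(pdfPath))
        {
            pdfDocument.Save(htmlPath, Aspose.Pdf.SaveFormat.Html);
        }

        return htmlPath;
    }

    public static void Main()
    {
        string result = ConvertToHtml("PDFToHTML.pdf");
        Console.WriteLine("HTML written to " + result);
    }
}
```

Checking that the file exists before loading it gives a clearer error than letting the library fail on a missing path.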

C# PDF to HTML Conversion with Embedded Resources

You can also convert PDF to HTML with all the resources included in the output HTML. In this case, the elements that would otherwise be saved as separate files (images, CSS, and fonts) are embedded into the output HTML. This is achieved by setting the HtmlSaveOptions.PartsEmbeddingMode property.

The following code sample shows how to convert PDF to HTML with embedded resources using C#.

// Load source PDF file
Document doc = new Document("input.pdf");
// Instantiate HTML Save options object
HtmlSaveOptions newOptions = new HtmlSaveOptions();

// Enable option to embed all resources inside the HTML
newOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;

// This is just optimization for IE and can be omitted 
newOptions.LettersPositioningMethod = HtmlSaveOptions.LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
newOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
newOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;

// Output file path 
string outHtmlFile = "SingleHTML_out.html";
doc.Save(outHtmlFile, newOptions);

C# Save PDF as HTML with Image Folder

A PDF document can contain images in addition to text. An HTML file, in turn, can carry images either Base64-encoded inside the HTML itself or referenced from a folder where the image files are located. Aspose.PDF for .NET can save images to a user-specified folder on disk.
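
To make the difference between the two image strategies concrete, here is a small sketch in plain C# that does not use Aspose.PDF at all. It builds an img tag in both forms: one with the bytes inlined as a Base64 data URI, and one that points to a file in a folder. The class and method names, and the file name img_01.png, are hypothetical and chosen only for illustration.

```csharp
using System;

public static class ImageReferenceDemo
{
    // Inline form: the image bytes travel inside the HTML as a Base64 data URI
    public static string EmbeddedImg(byte[] pngBytes)
    {
        return "<img src=\"data:image/png;base64," + Convert.ToBase64String(pngBytes) + "\">";
    }

    // Linked form: the HTML only points at a file stored in a folder
    public static string LinkedImg(string folder, string fileName)
    {
        return "<img src=\"" + folder + "/" + fileName + "\">";
    }

    public static void Main()
    {
        // First 8 bytes of every PNG file, used here as stand-in image data
        byte[] pngSignature = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

        Console.WriteLine(EmbeddedImg(pngSignature));
        // prints: <img src="data:image/png;base64,iVBORw0KGgo=">

        Console.WriteLine(LinkedImg("MyFolder", "img_01.png"));
        // prints: <img src="MyFolder/img_01.png">
    }
}
```

The inline form keeps everything in one file at the cost of a larger HTML document, while the linked form keeps the HTML small but requires the image folder to be shipped along with it.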

The following code sample shows how to save images to a specific folder during the conversion of PDF to HTML in C#.

// Load source PDF file
Document doc = new Document("input.pdf");

// Instantiate HTML Save options object
HtmlSaveOptions newOptions = new HtmlSaveOptions();

// Specify the separate folder to save images
newOptions.SpecialFolderForAllImages = "MyFolder";

// Output file path 
string outHtmlFile = "HTML.html";
doc.Save(outHtmlFile, newOptions);

Export PDF to Multipage HTML in C#

Aspose.PDF offers many more options to customize PDF to HTML conversion. For example, you can split the HTML into multiple pages during conversion.

The following code sample shows how to export PDF to a multipage HTML in C#.

// Open the source PDF document
Document pdfDocument = new Document("PDFToHTML.pdf");

// Instantiate HTML SaveOptions object
HtmlSaveOptions htmlOptions = new HtmlSaveOptions();

// Specify to split the output into multiple pages
htmlOptions.SplitIntoPages = true;

// Save the document
pdfDocument.Save(@"MultiPageHTML_out.html", htmlOptions);

With the SplitIntoPages flag set to true, the output HTML consists of multiple pages instead of a single page.
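
Since a split conversion produces several files, it can help to write them into a dedicated folder and inspect the result. The sketch below does this under a few assumptions: the folder name MultiPageOutput and the class name MultiPageListing are made up for illustration, and because this article does not state how the individual page files are named, the code simply lists whatever HTML files appear in the folder.

```csharp
using System;
using System.IO;

public static class MultiPageListing
{
    public static void Main()
    {
        // Assumption: a dedicated output folder keeps the generated pages together
        string outputDir = "MultiPageOutput";
        Directory.CreateDirectory(outputDir);

        using (Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document("PDFToHTML.pdf"))
        {
            Aspose.Pdf.HtmlSaveOptions htmlOptions = new Aspose.Pdf.HtmlSaveOptions();
            htmlOptions.SplitIntoPages = true;
            pdfDocument.Save(Path.Combine(outputDir, "MultiPageHTML_out.html"), htmlOptions);
        }

        // List whatever HTML files the conversion produced
        foreach (string file in Directory.GetFiles(outputDir, "*.html"))
        {
            Console.WriteLine(file);
        }
    }
}
```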

Free C# PDF to HTML Converter

You can get a free temporary license and convert PDF to HTML without any limitations.
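
Once you have a license file, it is typically applied once at application startup, before any document is loaded. The following is a minimal sketch that assumes the license file is named Aspose.PDF.NET.lic and sits next to the executable; adjust the path to wherever your file actually lives. The class name LicenseSetup is hypothetical.

```csharp
public static class LicenseSetup
{
    public static void Main()
    {
        // Assumption: the temporary license file sits next to the executable
        Aspose.Pdf.License license = new Aspose.Pdf.License();
        license.SetLicense("Aspose.PDF.NET.lic");

        // Conversions performed after this point run under the license
        using (Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document("PDFToHTML.pdf"))
        {
            pdfDocument.Save("output_out.html", Aspose.Pdf.SaveFormat.Html);
        }
    }
}
```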

Explore C# PDF to HTML Library

You can head over to the PDF to HTML section of the documentation, which lists advanced features for applying more options during conversion.

Download your free copy of Aspose.PDF for .NET and you can get started in no time by following the documentation. If you have any questions, feel free to post them on the Aspose.PDF forum. We’ll be glad to assist you.

Conclusion

This article covered several ways to convert PDF to HTML programmatically using C#. It also demonstrated how to customize the conversion using different options. You can use the provided code samples in your own application, for example as the basis for an online PDF to HTML converter built with .NET.