PDF is one of the most popular document formats today, and many applications use it as their final output format. Its support for a wide range of content types and its portability make it the format of choice for creating and sharing documents. If you are a .NET developer building document management applications, you may want to add features that read PDF documents and convert them to other formats such as HTML.
In this post, we’ll demonstrate the conversion features of the Aspose.PDF for .NET API by reading a PDF file and converting it to HTML with several options.
Convert PDF to HTML using C#
The Aspose.PDF for .NET API lets you read PDF files and convert them to HTML in your .NET applications. It is simple to use. A basic conversion takes just two lines of code: one to open the document and one to save it as HTML.
```csharp
using Aspose.Pdf;

// Path to the documents directory (RunExamples is a helper class from the
// Aspose.PDF examples project; replace it with your own folder path)
string dataDir = RunExamples.GetDataDir_AsposePdf_DocumentConversion();

// Open the source PDF document
Document pdfDocument = new Document(dataDir + "PDFToHTML.pdf");

// Save the document in HTML format
pdfDocument.Save(dataDir + "output_out.html", SaveFormat.Html);
```
That is all it takes to convert PDF to HTML in your C# applications. The API handles all the internal details of the PDF file format and produces the HTML output. Notably, you don’t need a PDF reader installed on your own machine or on any computer where your application will eventually run.
Convert PDF to HTML with Embedded Resources
You can also convert a PDF to HTML with all of its resources included in the output, so that every element of the PDF file (images, CSS, and fonts) is embedded in a single HTML file. To do this, set the HtmlSaveOptions.PartsEmbeddingMode property to a value of the HtmlSaveOptions.PartsEmbeddingModes enumeration, as shown in the following code sample.
```csharp
using Aspose.Pdf;

// Load the source PDF file
Document doc = new Document("input.pdf");

// Instantiate the HTML save options object
HtmlSaveOptions newOptions = new HtmlSaveOptions();

// Embed all resources (images, CSS, fonts) inside the HTML
newOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;

// This is just an optimization for Internet Explorer and can be omitted
newOptions.LettersPositioningMethod = HtmlSaveOptions.LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;

// Save raster images as embedded parts of a PNG page background
newOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;

// Save fonts in all available formats
newOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;

// Output file path
string outHtmlFile = "SingleHTML_out.html";
doc.Save(outHtmlFile, newOptions);
```
Saving Images to a Specific Folder
A PDF document can contain images in addition to text. An HTML page can either embed images as Base64-encoded data inside the HTML itself or reference image files stored in a folder. Aspose.PDF lets you save images to a folder of your choice on disk. The following code sample shows how to save images to a specific folder while converting PDF to HTML.
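```csharp
using Aspose.Pdf;

// dataDir is the documents directory, as defined in the first example
Document pdfDocument = new Document(dataDir + "PDFToHTML.pdf");

// Create the HTML save options
HtmlSaveOptions newOptions = new HtmlSaveOptions();

// Specify the separate folder in which to save all images
newOptions.SpecialFolderForAllImages = dataDir;

// Save as HTML (the output file name here is chosen for illustration)
pdfDocument.Save(dataDir + "ImagesFolder_out.html", newOptions);
```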
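The two options mentioned above (embedding images in the HTML or referencing them from a folder) differ only in what ends up in the `src` attribute of each `<img>` tag. The following sketch uses only the .NET base class library and does not call Aspose.PDF; it simply shows the markup each approach produces. The class and method names (`ImageMarkup`, `EmbeddedImgTag`, `ReferencedImgTag`) are hypothetical, chosen for illustration.

```csharp
using System;

// Illustration only: the two ways an HTML page can carry an image.
// This is NOT part of Aspose.PDF; it just builds the <img> markup by hand.
public static class ImageMarkup
{
    // Embedded: the image bytes are Base64-encoded into a data URI,
    // so the HTML file is self-contained and needs no separate image files.
    public static string EmbeddedImgTag(byte[] imageBytes, string mimeType)
    {
        string base64 = Convert.ToBase64String(imageBytes);
        return $"<img src=\"data:{mimeType};base64,{base64}\" />";
    }

    // Referenced: the HTML stores only a relative path, so the image file
    // must be shipped alongside the page in the given folder.
    // Assumes the folder and file names need no URL escaping.
    public static string ReferencedImgTag(string imagesFolder, string fileName)
    {
        // URLs always use "/" as the separator, regardless of the OS
        string relativeUrl = imagesFolder.TrimEnd('/', '\\') + "/" + fileName;
        return $"<img src=\"{relativeUrl}\" />";
    }
}
```

Embedding keeps the output to a single file but makes it larger, since Base64 encoding adds roughly a third to the size of each image. Saving images to a separate folder keeps the HTML small, at the cost of having to distribute the folder together with the page.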
Convert PDF to Multipage HTML
The API doesn’t stop there; it offers many more options to control the resulting HTML. For example, you can split the output into multiple HTML pages during PDF to HTML conversion using the following sample code.
```csharp
using Aspose.Pdf;

// Path to the documents directory (RunExamples is a helper class from the
// Aspose.PDF examples project; replace it with your own folder path)
string dataDir = RunExamples.GetDataDir_AsposePdf_DocumentConversion();

// Open the source PDF document
Document pdfDocument = new Document(dataDir + "PDFToHTML.pdf");

// Instantiate the HTML save options object
HtmlSaveOptions htmlOptions = new HtmlSaveOptions();

// Split the output into multiple pages
htmlOptions.SplitIntoPages = true;

// Save the document
pdfDocument.Save(@"MultiPageHTML_out.html", htmlOptions);
```
Setting the SplitIntoPages flag to true is all that is needed: the output then consists of multiple HTML pages instead of a single page.
Still want more? Head over to the PDF to HTML section of the API documentation, which covers more advanced options for controlling the conversion. Download a free copy of Aspose.PDF for .NET and follow the API documentation to get started in no time. If you have any questions, feel free to post them on the Aspose.PDF forum. We’ll be glad to help.