Text Extraction Error Reporting and PDF Incremental Updates Features


We are pleased to announce that Aspose.Pdf for Java 17.12 is available for download with new features and improvements. If you are planning to upgrade your existing API to the latest version, we recommend checking the release notes of Aspose.Pdf for Java 17.12 for an overview of public API changes and improvements. The following are some of the major improvements and fixes to the text extraction and document manipulation features.

Implemented Text Extraction Error Reporting Functionality

While investigating a scenario where a PDF document used PDF Type 3 fonts, we observed that the TextAbsorber class was not retrieving the text correctly. The reason was that the fonts used in the PDF had a different encoding, and text cannot be extracted from such documents even with Adobe Reader itself. We realized that the API needed a way to report such errors in a document. We are pleased to inform you that text extraction error reporting has been implemented for the TextAbsorber and TextFragmentAbsorber classes and is available in Aspose.Pdf for Java 17.12. The following code snippet can be used to detect errors while extracting text from a PDF document:

import com.aspose.pdf.Document;
import com.aspose.pdf.TextAbsorber;
import com.aspose.pdf.TextExtractionError;

Document pdf = new Document("test.pdf");
TextAbsorber absorber = new TextAbsorber();
// Ask the absorber to record extraction errors instead of ignoring them.
absorber.getTextSearchOptions().setLogTextExtractionErrors(true);
pdf.getPages().accept(absorber);
if (absorber.hasErrors()) {
    // Information about the errors found and their locations is stored
    // in the Errors collection.
    for (TextExtractionError error : absorber.getErrors()) {
        // A TextExtractionError object describes the text extraction
        // error found while processing a particular text fragment.
        System.out.println(error);
        // String.format uses %s placeholders, not {0}.
        System.out.println(String.format("Extracted text: '%s'",
                error.getExtractedText()));
    }
}

Save PDF Document into Stream Object Using Incremental Updates

It was observed that when a PDF document was loaded from a binary stream, manipulated (e.g. by adding some annotations) and saved to a different binary stream, the content of the PDF document was completely changed. To avoid this, we have added a new method, saveIncrementally(), to the Document class. You can now save a document into a Stream object using incremental updates.
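
For background, an incremental update in the PDF format appends the changed objects to the end of the file instead of rewriting it, so the bytes of the original document stay untouched. The following is only a toy sketch of that append-only idea in plain Java. It does not use the Aspose.Pdf API, and the class name IncrementalUpdateSketch, its methods and the placeholder byte contents are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Toy model of an incremental save: the update is appended after the
// original bytes, which are never modified. Not the Aspose.Pdf API.
public class IncrementalUpdateSketch {

    /** Returns the original bytes followed by the update bytes. */
    static byte[] appendUpdate(byte[] original, byte[] update) {
        byte[] saved = Arrays.copyOf(original, original.length + update.length);
        System.arraycopy(update, 0, saved, original.length, update.length);
        return saved;
    }

    /** Returns true when 'saved' begins with exactly the bytes of 'original'. */
    static boolean startsWith(byte[] saved, byte[] original) {
        if (saved.length < original.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(saved, original.length), original);
    }

    public static void main(String[] args) {
        // Placeholder contents standing in for a real PDF and a real update section.
        byte[] original = "%PDF-1.4\n(original objects)\n%%EOF\n"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] update = "(new annotation object)\n%%EOF\n"
                .getBytes(StandardCharsets.US_ASCII);

        byte[] saved = appendUpdate(original, update);

        // The saved file still starts with the untouched original document.
        System.out.println("Original preserved: " + startsWith(saved, original));
    }
}
```

A check like startsWith is one simple way to confirm that a save was incremental: if the saved bytes no longer begin with the original bytes, the file was rewritten rather than appended to.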

Miscellaneous Fixes

It is always recommended to use the latest release of our APIs, as it includes the latest features, improvements and fixes for issues reported in earlier versions. Therefore, please download the latest release of Aspose.Pdf for Java 17.12.


To keep up with our news, you can follow us on Twitter or follow our Facebook page.