Scanned PDF to Excel OCR Java

Scanned PDF files store their content as images because they are typically produced by scanners, so the text and numbers in them cannot be selected or copied directly. When you need the numerical data from such a file, you can run optical character recognition (OCR) on it and save the result as an Excel file. This article covers how to create a scanned PDF to Excel converter with OCR programmatically in Java.

Create Scanned PDF to Excel Converter with OCR – Java API Installation

You can recognize the text in a scanned PDF file using the Aspose.OCR for Java API. Install the API either by downloading the JAR file from the New Releases section or by adding the following Maven configuration to your project:

Repository:

<repository>
    <id>AsposeJavaAPI</id>
    <name>Aspose Java API</name>
    <url>http://repository.aspose.com/repo/</url>
</repository>

Dependency:

<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-ocr</artifactId>
    <version>21.12</version>
</dependency>

Convert Scanned PDF to Excel Programmatically in Java

You can convert a scanned PDF file to Excel with OCR by following the steps below:

  1. Create an AsposeOcr class object.
  2. Specify the settings with the DocumentRecognitionSettings class.
  3. Recognize the scanned PDF file using the RecognizePdf method.
  4. Save the OCR result as an Excel file.

The following code snippet demonstrates how to convert a scanned PDF to an Excel file programmatically in Java:
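The steps above can be sketched as a small converter class. This is a minimal sketch under stated assumptions, not the library's official sample: `OcrEngine`, `ScannedPdfToExcel`, `outputPathFor`, and `convert` are hypothetical names introduced here for illustration. The Aspose calls appear only in a comment; they assume the Java class is spelled `AsposeOCR` and that a `SaveMultipageDocument` method and a `Format.Xlsx` value exist, none of which is confirmed by this article, so check them against the API reference for your version.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

/**
 * Hypothetical seam for the OCR step (not part of Aspose.OCR).
 * An Aspose-backed implementation is ASSUMED to look roughly like this;
 * the call names below are unverified and should be checked in the API reference:
 *
 *   AsposeOCR api = new AsposeOCR();                                      // step 1
 *   DocumentRecognitionSettings settings = new DocumentRecognitionSettings(); // step 2
 *   ArrayList<RecognitionResult> pages =
 *       api.RecognizePdf(pdf.toString(), settings);                       // step 3
 *   AsposeOCR.SaveMultipageDocument(xlsx.toString(), Format.Xlsx, pages); // step 4
 */
interface OcrEngine {
    void recognizeToExcel(Path pdf, Path xlsx) throws IOException;
}

final class ScannedPdfToExcel {
    private ScannedPdfToExcel() {}

    /** Derives the output path, e.g. "scans/report.pdf" becomes "scans/report.xlsx". */
    static Path outputPathFor(Path pdf) {
        Path fileName = pdf.getFileName();
        if (fileName == null) {
            throw new IllegalArgumentException("Path has no file name: " + pdf);
        }
        String name = fileName.toString();
        // Case-insensitive extension check, independent of the default locale.
        if (!name.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            throw new IllegalArgumentException("Expected a .pdf file: " + pdf);
        }
        String base = name.substring(0, name.length() - ".pdf".length());
        return pdf.resolveSibling(base + ".xlsx");
    }

    /** Validates the input, delegates OCR to the engine, and returns the Excel path. */
    static Path convert(Path pdf, OcrEngine engine) throws IOException {
        Path xlsx = outputPathFor(pdf); // rejects non-PDF inputs before any OCR work
        if (!Files.isRegularFile(pdf)) {
            throw new IOException("Input PDF not found: " + pdf);
        }
        engine.recognizeToExcel(pdf, xlsx);
        return xlsx;
    }
}
```

Keeping the OCR call behind a small interface means the path handling can be exercised without the Aspose JAR on the classpath, and the engine can be swapped if the method names differ in your release. A call such as `ScannedPdfToExcel.convert(pdfPath, engine)` would then write the workbook next to the source PDF.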

Get Free Evaluation License

You can request a free temporary license to evaluate the API without any limitations while building your scanned PDF to Excel converter.

Conclusion

In this article, you have learned how to convert a scanned PDF file to an Excel file with OCR programmatically in Java. You can explore other OCR-related features in the documentation, and feel free to contact us at the forum if you have any questions.
