Convert Scanned PDF to Text in C#

A scanned PDF file contains one or more flat images captured by a scanner or a camera. Because the text exists only as pixels, you cannot select, copy, or search it until it has been recognized with optical character recognition (OCR). This article covers how to convert a scanned PDF to text in C#.

Scanned PDF to Text Conversion – C# API Installation

The Aspose.OCR for .NET API performs OCR operations: it recognizes characters in images and scanned PDF documents. To set up the API, download the DLL file from the New Releases section, or install it with the following NuGet command.

PM> Install-Package Aspose.OCR

Convert Scanned PDF to Text String in C#

You can convert a scanned PDF file to a text string by running OCR on it. Follow the steps below to print the text from a scanned PDF document:

  1. Specify the settings for recognizing the scanned PDF file.
  2. Initialize an AsposeOcr class instance.
  3. Initialize a RecognitionResult class object with the recognition output.
  4. Print the text recognized from the scanned PDF.

The following code snippet shows how to recognize text from a scanned PDF in C#:
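This is a minimal sketch of the steps above, not a definitive implementation. It assumes an Aspose.OCR for .NET release that exposes a RecognizePdf method on AsposeOcr and a RecognitionText property on RecognitionResult; neither name appears in the text above, and other releases may use a different entry point. The input path scanned.pdf is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Aspose.OCR;

class Program
{
    static void Main()
    {
        // Step 1: recognition settings for the scanned PDF (defaults used here).
        DocumentRecognitionSettings settings = new DocumentRecognitionSettings();

        // Step 2: create the OCR engine.
        AsposeOcr api = new AsposeOcr();

        // Step 3: run recognition. RecognizePdf is assumed to return one
        // RecognitionResult per page; "scanned.pdf" is a hypothetical path.
        List<RecognitionResult> pages = api.RecognizePdf("scanned.pdf", settings);

        // Step 4: print the recognized text page by page.
        // RecognitionText is an assumed property name.
        int pageNumber = 1;
        foreach (RecognitionResult page in pages)
        {
            Console.WriteLine($"Page {pageNumber++}: {page.RecognitionText}");
        }
    }
}
```

Printing each page separately makes it easy to see which page a recognition error came from.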

Convert Scanned PDF to TXT File Programmatically in C#

You can convert a scanned PDF file to a TXT file with the following steps:

  1. Instantiate an AsposeOcr class object.
  2. Create a DocumentRecognitionSettings class object.
  3. Save the recognition results and initialize a StringBuilder class instance.
  4. Save the result in a TXT file.

The code snippet below explains how to convert a scanned PDF file to a TXT file programmatically in C#:
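The steps above could be sketched as follows. As with any sketch, it rests on assumptions: RecognizePdf and RecognitionText are assumed member names for the installed Aspose.OCR release, and scanned.pdf and result.txt are hypothetical file names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using Aspose.OCR;

class Program
{
    static void Main()
    {
        // Step 1: create the OCR engine.
        AsposeOcr api = new AsposeOcr();

        // Step 2: recognition settings (defaults used here).
        DocumentRecognitionSettings settings = new DocumentRecognitionSettings();

        // Step 3: recognize the PDF (RecognizePdf is an assumed method name)
        // and collect the text of every page in a StringBuilder.
        List<RecognitionResult> pages = api.RecognizePdf("scanned.pdf", settings);
        StringBuilder text = new StringBuilder();
        foreach (RecognitionResult page in pages)
        {
            // RecognitionText is an assumed property name.
            text.AppendLine(page.RecognitionText);
        }

        // Step 4: write the combined text to a TXT file.
        File.WriteAllText("result.txt", text.ToString());
    }
}
```

AppendLine keeps a line break between pages, so page boundaries remain visible in the output file.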

Get Free Evaluation License

You can request a free evaluation license to test the API at its full capacity.

Conclusion

In this article, you have learned how to convert a scanned PDF to a text string or a text file programmatically using C#. You can explore other features of the API in the documentation. If you have any questions, please feel free to contact us on the forum.
