Convert Scanned PDF to Word Document with OCR using C#

Scanned PDF files contain page images, so their text cannot be selected or edited. To make that text editable, you may need to convert the scanned PDF to a Word document. In this article, you will learn how to convert a scanned PDF to a Word document in DOCX or DOC format programmatically using C#.

Scanned PDF to Word DOCX Converter – C# API Installation

To work with scanned PDF files, you can perform OCR with the Aspose.OCR for .NET API and then create a Word document with the Aspose.Words for .NET API, both programmatically using C#. You can set up the APIs by downloading the DLL files from the New Releases page, or by running the following NuGet installation commands:

PM> Install-Package Aspose.OCR
PM> Install-Package Aspose.Words

Convert Scanned PDF to Word Document Programmatically using C#

You can convert scanned PDF files to Word documents by recognizing their text with OCR (optical character recognition). OCR extracts the text from the scanned pages, and the Word document is then generated from that text in DOC or DOCX format. Follow the steps below to convert a scanned PDF to a Word document:

  1. Initialize an instance of the AsposeOcr class.
  2. Recognize the page images of the PDF using the DocumentRecognitionSettings class.
  3. Initialize a StringBuilder object and append the recognized text to it.
  4. Initialize a Word document with the Document class.
  5. Specify the font and paragraph formatting.
  6. Save the output Word document as a DOCX or DOC file.

The following code snippet shows how to convert a scanned PDF file to a Word document programmatically using C#:
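The sketch below follows the six steps above. It assumes an Aspose.OCR for .NET release that exposes `AsposeOcr.RecognizePdf(string, DocumentRecognitionSettings)` returning results with a `RecognitionText` property (recognition entry points have changed between releases, so check them against the version you install). The file names `scanned.pdf` and `output.docx`, and the font choices, are hypothetical placeholders.

```csharp
using System;
using System.Text;
using Aspose.OCR;
using Aspose.Words;

class Program
{
    static void Main()
    {
        // Hypothetical paths; replace them with your own files.
        const string inputPdf = "scanned.pdf";
        const string outputDocx = "output.docx";

        // Step 1: create the OCR engine.
        AsposeOcr api = new AsposeOcr();

        // Step 2: recognize the scanned pages of the PDF.
        // Assumes RecognizePdf(string, DocumentRecognitionSettings) is available
        // in the installed Aspose.OCR release; default settings are used here.
        var results = api.RecognizePdf(inputPdf, new DocumentRecognitionSettings());

        // Step 3: collect the recognized text of every page in a StringBuilder.
        var text = new StringBuilder();
        foreach (var page in results)
        {
            text.AppendLine(page.RecognitionText);
        }

        // Step 4: create an empty Word document and a builder that writes into it.
        var doc = new Document();
        var builder = new DocumentBuilder(doc);

        // Step 5: font and paragraph formatting apply to everything written afterwards.
        builder.Font.Name = "Times New Roman";
        builder.Font.Size = 12;
        builder.ParagraphFormat.Alignment = ParagraphAlignment.Left;

        // Write each line of recognized text as its own Word paragraph.
        string[] lines = text.ToString().Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
        foreach (string line in lines)
        {
            builder.Writeln(line);
        }

        // Step 6: Save(string) picks the format from the extension (.docx here).
        // For the legacy format, use doc.Save("output.doc", SaveFormat.Doc).
        doc.Save(outputDocx);
    }
}
```

Collecting the text first and then splitting it into lines keeps each recognized line as a separate paragraph, so the formatting set in step 5 applies uniformly to the whole document.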

Get Free Evaluation License

You can test the APIs without evaluation limitations by requesting a free temporary license.
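A license has to be applied before the APIs are used. A minimal sketch, assuming one license file that covers both products (for example an Aspose.Total license) at a hypothetical path; `LicenseHelper` and `TryApplyLicenses` are hypothetical names. With separate license files, call each `SetLicense` with its own path.

```csharp
using System;

static class LicenseHelper
{
    // Hypothetical helper: applies the same license file to Aspose.OCR and Aspose.Words.
    // Returns false if either SetLicense call throws, e.g. when the file cannot be found.
    public static bool TryApplyLicenses(string licensePath)
    {
        try
        {
            // Fully qualified names avoid the clash between the two License classes.
            new Aspose.OCR.License().SetLicense(licensePath);
            new Aspose.Words.License().SetLicense(licensePath);
            return true;
        }
        catch (Exception ex)
        {
            Console.WriteLine($"License not applied: {ex.Message}");
            return false;
        }
    }
}
```

Call it once at application startup, for example `LicenseHelper.TryApplyLicenses("Aspose.Total.NET.lic");`, before creating the `AsposeOcr` instance.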

Conclusion

In this article, you have learned how to convert a scanned PDF file to a Word document in DOCX or DOC format programmatically using C#. You can explore other OCR features in the documentation. If you have any questions, feel free to contact us on the forum.


Tip: If you need to convert a PowerPoint presentation to a Word document, you can use the Aspose Presentation to Word Document converter.