PDF (Portable Document Format) files are among the most trusted and widely used formats for sharing documents across platforms, because they keep a consistent layout and appearance regardless of software or device. However, developers sometimes need to extract text from PDF files programmatically, for example to analyze content, index documents, or convert PDFs into editable text formats.

In this article, we’ll explore how to extract text from PDF files and build a PDF to Text converter with a .NET REST API, enabling seamless, automated text extraction through REST API calls.

PDF Processing API

Leverage the power of the Aspose.PDF Cloud SDK for .NET to extract text from PDF files efficiently. Beyond text extraction, the SDK lets you create PDF documents from scratch or from templates, edit existing PDFs, and convert them to other supported formats. You can also decrypt, merge, and otherwise manipulate PDF files directly through the .NET REST API.

To get started, we need to install the SDK in our .NET project.

NuGet\Install-Package Aspose.Pdf-Cloud -Version 25.9.0

We also need to create a free account on the Aspose Cloud Dashboard and obtain our personal client credentials (client ID and client secret).

Perform PDF to Text Conversion in C#

Follow the steps below to extract text from a PDF file using C# .NET.

PdfApi pdfApi = new PdfApi(clientSecret, clientID);

Create an instance of the PdfApi class, passing the client secret and client ID obtained above as arguments.
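Hard-coding credentials in source code is best avoided. The sketch below is one way to load them instead, assuming you keep them in environment variables named ASPOSE_CLIENT_ID and ASPOSE_CLIENT_SECRET (hypothetical names, not an SDK convention). The CloudCredentials class is likewise hypothetical; it simply fails fast if either value is missing.

```csharp
using System;

// Hypothetical helper: loads Aspose Cloud credentials from environment variables
// instead of hard-coding them. The variable names are assumptions, not SDK conventions.
public static class CloudCredentials
{
    public const string ClientIdVariable = "ASPOSE_CLIENT_ID";
    public const string ClientSecretVariable = "ASPOSE_CLIENT_SECRET";

    // Returns (ClientId, ClientSecret) or throws if either variable is unset or blank.
    public static (string ClientId, string ClientSecret) FromEnvironment()
    {
        string? clientId = Environment.GetEnvironmentVariable(ClientIdVariable);
        string? clientSecret = Environment.GetEnvironmentVariable(ClientSecretVariable);

        if (string.IsNullOrWhiteSpace(clientId))
            throw new InvalidOperationException($"Environment variable {ClientIdVariable} is not set.");
        if (string.IsNullOrWhiteSpace(clientSecret))
            throw new InvalidOperationException($"Environment variable {ClientSecretVariable} is not set.");

        return (clientId, clientSecret);
    }
}
```

With this in place, the client could be created as `var (clientId, clientSecret) = CloudCredentials.FromEnvironment();` followed by `new PdfApi(clientSecret, clientId)`, keeping the same argument order as the line above.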

string inputFile = "sourceFile.pdf";
using var sourceFile = System.IO.File.OpenRead(inputFile); // stream is disposed when it goes out of scope
pdfApi.UploadFile(inputFile, sourceFile); // storage name = local file name

Read the input PDF from the local drive and upload it to cloud storage using the UploadFile(...) method.
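One detail worth guarding: the name used when uploading must be the same name later passed to GetText(...). The hypothetical UploadHelper below is a sketch of an optional pre-upload step. It derives the storage name from the local path and checks that the file begins with the `%PDF-` header that the PDF specification requires on a file's first line, so an obviously wrong file is caught before any network call.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: validates a local file before upload and derives the
// cloud storage name from it, so the same name can be reused for GetText(...).
public static class UploadHelper
{
    // A conforming PDF file starts with "%PDF-" followed by a version number.
    private static readonly byte[] PdfSignature = Encoding.ASCII.GetBytes("%PDF-");

    public static bool LooksLikePdf(string localPath)
    {
        using FileStream stream = File.OpenRead(localPath);
        byte[] header = new byte[PdfSignature.Length];
        int total = 0;
        while (total < header.Length)
        {
            int n = stream.Read(header, total, header.Length - total);
            if (n == 0) return false; // file is shorter than the signature
            total += n;
        }
        for (int i = 0; i < header.Length; i++)
        {
            if (header[i] != PdfSignature[i]) return false;
        }
        return true;
    }

    // Returns the storage name (file name without directories) after basic checks.
    public static string StorageNameFor(string localPath)
    {
        if (!File.Exists(localPath))
            throw new FileNotFoundException("Input PDF not found.", localPath);
        if (!LooksLikePdf(localPath))
            throw new InvalidDataException($"{localPath} does not start with a %PDF- header.");
        return Path.GetFileName(localPath);
    }
}
```

The returned name can then be passed to both UploadFile(...) and GetText(...), which rules out a mismatch between the two calls.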

double LLX = 10, LLY = 10, URX = 800, URY = 800; // region corners, same values as the cURL example
TextRectsResponse response = pdfApi.GetText(inputFile, LLX, LLY, URX, URY, null, null, null, null, null);

Call the GetText(...) method, specifying the rectangular region of the page from which the text should be extracted: (LLX, LLY) is the lower-left corner of the region and (URX, URY) its upper-right corner.
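To choose those coordinates, it helps to recall that PDF pages are measured in points (1/72 inch) from the bottom-left corner of the page. The sketch below assumes the API follows that standard PDF coordinate system. The TextRegion type and its methods are hypothetical conveniences for building common regions, and the A4 (595 x 842 pt) and US Letter (612 x 792 pt) sizes are the standard values. Check the real page size of your document before relying on them.

```csharp
using System;

// Hypothetical value type for the rectangle passed to GetText(...):
// (LLX, LLY) is the lower-left corner and (URX, URY) the upper-right corner,
// assuming the standard PDF coordinate system (points, origin at bottom-left).
public readonly record struct TextRegion(double LLX, double LLY, double URX, double URY)
{
    // Common page sizes in points (1 pt = 1/72 inch).
    public const double A4Width = 595, A4Height = 842;
    public const double LetterWidth = 612, LetterHeight = 792;

    // Whole page, shrunk by a uniform margin on every side.
    public static TextRegion FullPage(double pageWidth, double pageHeight, double margin = 0)
    {
        if (margin < 0 || 2 * margin >= pageWidth || 2 * margin >= pageHeight)
            throw new ArgumentOutOfRangeException(nameof(margin));
        return new TextRegion(margin, margin, pageWidth - margin, pageHeight - margin);
    }

    // Horizontal band across the top of the page, e.g. to capture a header line.
    public static TextRegion TopBand(double pageWidth, double pageHeight, double bandHeight)
    {
        if (bandHeight <= 0 || bandHeight > pageHeight)
            throw new ArgumentOutOfRangeException(nameof(bandHeight));
        return new TextRegion(0, pageHeight - bandHeight, pageWidth, pageHeight);
    }
}
```

For example, `TextRegion.FullPage(TextRegion.A4Width, TextRegion.A4Height, 36)` describes an A4 page minus a half-inch margin, and its four properties map one-to-one onto the LLX, LLY, URX and URY arguments of GetText(...).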

foreach (var occurrence in response.TextOccurrences.List)
{
    // write each extracted text fragment to the console
    Console.WriteLine(occurrence.Text);
}

Once the text has been extracted, we can either print it to the console, as in the loop above, or save it to the local drive.
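To save the results instead of printing them, one option is to collect the Text value of each occurrence and write the values to a local text file, one fragment per line. The TextExport helper below is hypothetical and deliberately accepts plain strings, so it does not depend on the SDK's model types. You would feed it something like `response.TextOccurrences.List.Select(o => o.Text)`.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper: writes extracted text fragments to a local UTF-8 file.
public static class TextExport
{
    // Skips null or empty fragments and writes the rest one per line.
    // Returns the number of lines written.
    public static int SaveFragments(IEnumerable<string?> fragments, string outputPath)
    {
        List<string> lines = fragments
            .Where(text => !string.IsNullOrEmpty(text))
            .Select(text => text!)
            .ToList();

        File.WriteAllLines(outputPath, lines, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false));
        return lines.Count;
    }
}
```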

Extract Text from PDF using cURL

Apart from using .NET or Java code, you can also extract text from PDF files with Aspose.PDF Cloud through cURL commands. The prerequisite for this approach is a JWT access token generated from your client credentials, which you can obtain with the following command.

Step 1: Obtain a JWT access token:

curl -v "https://api.aspose.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=XXXXX-XXXXX-XXXXXX-ff5c3a6aa4a2&client_secret=XXXXXXXXXXXX" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"
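The same token request can be made from C#. The sketch below mirrors the cURL call with HttpClient and a form-encoded body. It assumes the response follows the usual OAuth 2.0 client-credentials shape, with the token in an `access_token` field. The TokenClient class is hypothetical and is not part of the Aspose SDK, which obtains tokens on its own when you use PdfApi.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical sketch of Step 1 in C#: request a token with the client-credentials
// grant and read the "access_token" field from the JSON response.
public static class TokenClient
{
    public const string TokenUrl = "https://api.aspose.cloud/connect/token";

    // Mirrors the cURL call: form-encoded POST with grant_type, client_id, client_secret.
    public static async Task<string> RequestTokenAsync(HttpClient http, string clientId, string clientSecret)
    {
        using var form = new FormUrlEncodedContent(new Dictionary<string, string>
        {
            ["grant_type"] = "client_credentials",
            ["client_id"] = clientId,
            ["client_secret"] = clientSecret,
        });
        using HttpResponseMessage response = await http.PostAsync(TokenUrl, form);
        response.EnsureSuccessStatusCode();
        string json = await response.Content.ReadAsStringAsync();
        return ParseAccessToken(json);
    }

    // Extracts the token string; throws if the field is missing or is not a string.
    public static string ParseAccessToken(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        if (!doc.RootElement.TryGetProperty("access_token", out JsonElement token) ||
            token.ValueKind != JsonValueKind.String)
        {
            throw new InvalidOperationException("Response does not contain an access_token string.");
        }
        return token.GetString()!;
    }
}
```

Keeping the parsing in a separate method means it can be checked against a saved response without making a network call.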

Step 2: Extract text from the PDF file:

curl -v "https://api.aspose.cloud/v3.0/pdf/{inputPDF}/text?splitRects=true&LLX=10&LLY=10&URX=800&URY=800" \
-X GET \
-H  "accept: application/json" \
-H  "authorization: Bearer {Access_Token}" \
-o "extractedContent.txt"
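Once the command executes successfully, the text from the specified rectangular region is saved to extractedContent.txt. Because the request asks for application/json, this file contains the API's JSON response rather than plain text.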
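To reduce the saved response to plain text, here is a hedged C# sketch. It assumes the JSON mirrors the SDK model used in the C# section, that is, a TextOccurrences object whose List array holds entries with a Text field. That shape is inferred from the SDK property names rather than from the REST documentation, so check it against a real response. The TextResponseReader class is hypothetical, and it matches property names case-insensitively in case the raw JSON uses a different casing.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical helper: pulls the Text values out of a saved text-extraction response.
// ASSUMPTION: shape is { "TextOccurrences": { "List": [ { "Text": "..." }, ... ] } }.
public static class TextResponseReader
{
    public static List<string> ReadFragments(string json)
    {
        var fragments = new List<string>();
        using JsonDocument doc = JsonDocument.Parse(json);

        if (TryGetPropertyIgnoreCase(doc.RootElement, "TextOccurrences", out JsonElement occurrences) &&
            TryGetPropertyIgnoreCase(occurrences, "List", out JsonElement list) &&
            list.ValueKind == JsonValueKind.Array)
        {
            foreach (JsonElement item in list.EnumerateArray())
            {
                if (TryGetPropertyIgnoreCase(item, "Text", out JsonElement text) &&
                    text.ValueKind == JsonValueKind.String)
                {
                    fragments.Add(text.GetString()!);
                }
            }
        }
        return fragments;
    }

    // Converts the saved JSON file (e.g. extractedContent.txt) into a plain text file.
    public static void ConvertFile(string jsonPath, string textPath)
    {
        File.WriteAllLines(textPath, ReadFragments(File.ReadAllText(jsonPath)));
    }

    // Case-insensitive property lookup; returns false for non-object elements.
    private static bool TryGetPropertyIgnoreCase(JsonElement element, string name, out JsonElement value)
    {
        if (element.ValueKind == JsonValueKind.Object)
        {
            foreach (JsonProperty property in element.EnumerateObject())
            {
                if (string.Equals(property.Name, name, StringComparison.OrdinalIgnoreCase))
                {
                    value = property.Value;
                    return true;
                }
            }
        }
        value = default;
        return false;
    }
}
```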

Free PDF Parser App

If you want to test the capabilities of the API without writing any code or cURL commands, try our Free PDF Parser application, built on top of the .NET REST APIs.


Concluding Remarks

In this article, we learned how to integrate the Aspose.PDF Cloud SDK for .NET into a .NET project for text extraction, and we explored using cURL commands to extract PDF text from the command line. Whether your goal is data analysis, machine learning, or other automation, the SDK gives you reliable tools for handling PDF content. Put these skills into practice and streamline your PDF handling like a pro!

Reading Material

We also recommend visiting the following links to learn more: