PDF to TXT

Convert PDF to TXT online in Java

The PDF file is one of the most widely used formats for sharing data and information over the internet. However, PDF files sometimes embed custom fonts, which can increase the size of the resulting file. Furthermore, we may need to extract the textual content for further processing, and in that scenario converting PDF to text is a viable solution. In this article, we discuss how to convert PDF to TXT format.

PDF to Text Conversion API

Aspose.Words Cloud SDK for Java is a leading solution for Word document processing. It provides the capabilities to create, edit, and transform a variety of files to DOC, HTML, JPEG, and other supported formats. Other than Word documents, it also supports loading PDF files and saving the output as TXT, HTML, XPS, TIFF, etc. To use the SDK, add the following details to the pom.xml of your Maven project.

<repositories>
    <repository>
        <id>AsposeJavaAPI</id>
        <name>Aspose Java API</name>
        <url>https://repository.aspose.cloud/repo/</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>com.aspose</groupId>
        <artifactId>aspose-words-cloud</artifactId>
        <version>22.5.0</version>
    </dependency>
</dependencies>

Next, create a free account by visiting the Aspose.Cloud dashboard. Sign up using your existing GitHub or Google account, or click the Create a new Account button.

Convert PDF to TXT in Java

Follow the steps below to upload a PDF to cloud storage and convert it to TXT using Java.

  • First, create an object of WordsApi using a Client ID and Client Secret as parameters
  • Secondly, read the PDF file content from the local drive using the File object
  • Thirdly, pass the PDF file content to an UploadFileRequest instance
  • Next, call the uploadFile(…) method to upload the PDF to cloud storage
  • Now, create an object of GetDocumentWithFormatRequest(…), specifying TXT as the output format
  • Finally, call the getDocumentWithFormat(…) method to convert PDF to Text and save the output in cloud storage

To load the PDF from a local drive and convert it to TXT in a single call, follow the steps below. After the conversion, the resultant file is saved in cloud storage.

  • First, create an object of WordsApi using a Client ID and Client Secret as parameters
  • Secondly, read the PDF file content from the local drive using the File object
  • Thirdly, create an object of ConvertDocumentRequest(…), providing the File object, TXT as the resultant format, and the output path of the resultant TXT file as arguments
  • Finally, call the convertDocument(…) method to convert PDF to Text and save the output in cloud storage

Image 1: Preview of PDF to Text conversion in cloud storage

PDF to TXT using cURL Commands

In this section, we perform the PDF to Text conversion using cURL commands. As a prerequisite, we need to generate a JWT access token based on client credentials using the following command.

curl -v "https://api.aspose.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=bbf94a2c-6d7e-4020-b4d2-b9809741374e&client_secret=1c9379bb7d701c26cc87e741a29987bb" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"
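
For readers who prefer to stay in Java, the same token request can be sketched with the JDK's built-in java.net.http API (Java 11+) instead of cURL. This is a minimal sketch, not the Aspose SDK: the class name TokenRequestSketch and its methods are hypothetical, and only the endpoint, form fields, and headers are taken from the cURL command above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of any Aspose SDK): mirrors the cURL token request.
class TokenRequestSketch {

    // Builds the x-www-form-urlencoded body for the client-credentials grant.
    static String formBody(String clientId, String clientSecret) {
        return "grant_type=client_credentials"
                + "&client_id=" + URLEncoder.encode(clientId, StandardCharsets.UTF_8)
                + "&client_secret=" + URLEncoder.encode(clientSecret, StandardCharsets.UTF_8);
    }

    // Builds (but does not send) the POST request to the token endpoint.
    static HttpRequest tokenRequest(String clientId, String clientSecret) {
        return HttpRequest.newBuilder(URI.create("https://api.aspose.cloud/connect/token"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(clientId, clientSecret)))
                .build();
    }
}
```

The request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()); the JSON response carries the token that the later commands place in the Authorization header.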

Now please execute the following command to upload the PDF file to cloud storage.

curl -v -X PUT "https://api.aspose.cloud/v4.0/words/storage/file/input.pdf" \
-H  "accept: application/json" \
-H  "Authorization: Bearer <JWT Token>" \
-H  "Content-Type: multipart/form-data" \
-d '{"fileContent":{}}'

Finally, execute the following command to convert the PDF to Text format. The document name in the URL must match the name of a PDF file available in cloud storage.

curl -v -X GET "https://api.aspose.cloud/v4.0/words/Quick%20Start%20Guide-%20Inventory%20Software%20Suite.pdf?format=TXT&outPath=extracted.txt" \
-H  "accept: application/octet-stream" \
-H  "Authorization: Bearer <JWT Token>"
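
The conversion call can be sketched in Java in the same way. The example below only builds the GET request; it assumes the URL pattern and headers shown in the cURL command above, and the class name ConversionRequestSketch and its methods are hypothetical. It also shows why the file name appears with %20 in the URL: spaces in the document name have to be percent-encoded.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of any Aspose SDK): mirrors the cURL conversion request.
class ConversionRequestSketch {

    // Percent-encodes a value for use in a URL; URLEncoder emits "+" for spaces,
    // so those are rewritten to "%20" to match the form used in the cURL command.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Builds the conversion URL for a PDF already present in cloud storage.
    static URI conversionUri(String pdfName, String outPath) {
        return URI.create("https://api.aspose.cloud/v4.0/words/" + encode(pdfName)
                + "?format=TXT&outPath=" + encode(outPath));
    }

    // Builds (but does not send) the GET request carrying the JWT access token.
    static HttpRequest conversionRequest(String pdfName, String outPath, String jwtToken) {
        return HttpRequest.newBuilder(conversionUri(pdfName, outPath))
                .header("accept", "application/octet-stream")
                .header("Authorization", "Bearer " + jwtToken)
                .GET()
                .build();
    }
}
```

Calling conversionUri("Quick Start Guide- Inventory Software Suite.pdf", "extracted.txt") yields the same URL as the cURL command above.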

Conclusion

This article explained how to convert PDF to Text using a Java code snippet. We also covered the steps for converting PDF to TXT using cURL commands. To learn more, you may explore the API using Swagger UI in a web browser.

Should you have any related query or encounter any issues, please feel free to contact us via the free technical support forum.

We highly recommend going through the following articles to learn more about: