Convert Word to Markdown in Java

Microsoft Word is widely used to create and edit documents (DOC/DOCX) and to export them to various formats. Markdown, by contrast, is a lightweight markup language for adding formatting elements to plain text documents. A Markdown file stays readable as plain text, without tags cluttering the content, while still offering text modifiers such as lists, bold, and italics. If we have a Word document and need an equivalent file in Markdown syntax, recreating it by hand is tedious and error-prone, but the task can be automated programmatically. This article explains how to develop a Word to Markdown converter using the Java Cloud SDK.

Word to Markdown Conversion API

Aspose.Words Cloud, our REST-based API, supports creating, manipulating, and converting MS Word documents to a variety of supported formats. To use the same document conversion and processing capabilities in a Java application, we use Aspose.Words Cloud SDK for Java, which is a wrapper around the REST API. The first step is to add the SDK reference to the Java project by including the following information in pom.xml (Maven build type project).

<repositories> 
    <repository>
        <id>aspose-cloud</id>
        <name>artifact.aspose-cloud-releases</name>
        <url>http://artifact.aspose.cloud/repo</url>
    </repository>   
</repositories>

<dependencies>
    <dependency>
        <groupId>com.aspose</groupId>
        <artifactId>aspose-words-cloud</artifactId>
        <version>22.12.0</version>
    </dependency>
</dependencies>

Once the SDK reference has been added to the project, the next step is to obtain your client credentials from the Cloud Dashboard. If you do not have an account yet, first register a free account using a valid email address.

Word to MD in Java

This section explains the steps for converting Word to MD format using a Java code snippet. We cover two options for loading the input Word document, from the local drive or from Cloud storage, before transforming it to Markdown format.

Load Word Document from local drive

  • First of all, create an instance of WordsApi and pass your personalized credentials as arguments
  • Secondly, read the content of the input Word document using the Files.readAllBytes(…) method and store the returned value in a byte[] array
  • Thirdly, create an object of ConvertDocumentRequest which takes the input Word file, the MD format and the resultant Markdown file’s name as arguments
  • Now, call the method convertDocument(…) for Word to MD conversion. The resultant Markdown is returned as a response stream, which we store in a byte[] instance
  • Finally, to save the resultant Markdown to the local drive, create an object of FileOutputStream and use its write(…) method
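
The flow of these steps can be sketched as follows. This is a minimal outline under stated assumptions, not the SDK's definitive usage: the WordToMarkdown interface and the LocalConversion class are hypothetical names that stand in for the convertDocument(…) call, because the exact ConvertDocumentRequest constructor arguments depend on the SDK version. Only the file reading and writing use real JDK calls.

```java
import java.io.FileOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the SDK call: takes Word bytes, returns Markdown bytes.
// In a real project it would wrap wordsApi.convertDocument(...) with a ConvertDocumentRequest.
interface WordToMarkdown {
    byte[] convert(byte[] wordDocument) throws Exception;
}

final class LocalConversion {
    // Reads the input file, delegates the conversion, and writes the result to disk.
    static void convertFile(Path input, Path output, WordToMarkdown converter) throws Exception {
        // Read the content of the input Word document into a byte[] array
        byte[] wordBytes = Files.readAllBytes(input);

        // Perform the conversion (the SDK request would be built and sent here)
        byte[] markdownBytes = converter.convert(wordBytes);

        // Save the resultant Markdown to the local drive
        try (FileOutputStream out = new FileOutputStream(output.toFile())) {
            out.write(markdownBytes);
        }
    }
}
```

Keeping the conversion behind a small interface means the file handling can be exercised without network access, and the SDK call can be plugged in as a lambda once the credentials are available.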
Image:- Word to Markdown conversion preview

You may consider downloading the input Word document from sample_EmbeddedOLE.docx.

Load Word Document from Cloud Storage

  • Similarly, first create an instance of WordsApi while passing your personalized credentials as arguments
  • Secondly, create an object of GetDocumentWithFormatRequest which takes the input Word file name, the MD format and the resultant Markdown file’s name as arguments
  • Finally, call the method getDocumentWithFormat(…) which triggers the Word to Markdown conversion operation. The resultant MD file is saved in Cloud storage
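
Rather than guessing at the SDK constructor arguments, the sketch below builds the request URI for the REST endpoint used in the cURL section of this article, which we assume is what the SDK call wraps. The class name CloudConversionUri is hypothetical, and outPath is assumed to be the query parameter that names the output file in Cloud storage; please verify it against the API reference before relying on it.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class CloudConversionUri {
    // Base endpoint taken from the cURL example in this article.
    private static final String BASE = "https://api.aspose.cloud/v4.0/words/";

    // Builds the conversion URI for a document that already sits in Cloud storage.
    // outPath (assumed parameter name) is the resultant file's name in storage; it may be null.
    static URI build(String inputName, String format, String outPath) {
        String url = BASE + encode(inputName) + "?format=" + encode(format);
        if (outPath != null && !outPath.isEmpty()) {
            url += "&outPath=" + encode(outPath);
        }
        return URI.create(url);
    }

    private static String encode(String value) {
        // URLEncoder produces form encoding, so spaces become "+"; URLs need "%20" instead
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }
}
```

For example, build("sample_EmbeddedOLE.docx", "md", "resultant.md") yields a URI ending in sample_EmbeddedOLE.docx?format=md&outPath=resultant.md.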

DOC to Markdown using cURL Commands

The REST API can also be accessed from any platform with the help of cURL commands. In this section, we discuss how to load a Word document from Cloud storage, perform DOCX to Markdown conversion and save the resultant MD file on the local drive. First, we need to generate a JWT access token (based on the client credentials) using the following command; the conversion command follows afterwards.

curl -v "https://api.aspose.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=bb959721-5780-4be6-be35-ff5c3a6aa4a2&client_secret=4d84d5f6584160cbd91dba1fe145db14" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"
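
For readers who prefer to stay in Java, the same token request can be sketched with the JDK's built-in HTTP client types (Java 11 or later). The class and method names here are hypothetical, and the sketch assumes the JSON response carries the token in an "access_token" field, as is usual for the client credentials grant.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenRequests {
    // Matches "access_token":"<value>" in the JSON response (assumed field name).
    private static final Pattern ACCESS_TOKEN =
            Pattern.compile("\"access_token\"\\s*:\\s*\"([^\"]+)\"");

    // Mirrors the cURL command: POST the form-encoded client credentials.
    static HttpRequest tokenRequest(String clientId, String clientSecret) {
        String form = "grant_type=client_credentials"
                + "&client_id=" + URLEncoder.encode(clientId, StandardCharsets.UTF_8)
                + "&client_secret=" + URLEncoder.encode(clientSecret, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create("https://api.aspose.cloud/connect/token"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
    }

    // Pulls the token out of the response body without a JSON library.
    static String extractToken(String json) {
        Matcher m = ACCESS_TOKEN.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("No access_token in response");
        }
        return m.group(1);
    }
}
```

The request can be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), and the response body passed to extractToken(…). A JSON library would be a sturdier choice than a regular expression in production code.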

Once the JWT is generated, please execute the following command to load the Word document from Cloud storage and perform the Word to Markdown conversion. The resultant MD file is then stored on the local drive.

curl -v -X GET "https://api.aspose.cloud/v4.0/words/sample_EmbeddedOLE.docx?format=md" \
-H  "accept: application/octet-stream" \
-H  "Authorization: Bearer <JWT Token>" \
-o "newOutput.md"
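
The conversion command can likewise be mirrored in Java with the JDK HTTP client. This is a sketch under assumptions: MarkdownDownload and its methods are hypothetical names, and the document name is assumed to be URL-safe already (no spaces or reserved characters).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

final class MarkdownDownload {
    // Mirrors the cURL command: GET with a Bearer token, expecting a binary response.
    static HttpRequest conversionRequest(String documentName, String jwtToken) {
        URI uri = URI.create("https://api.aspose.cloud/v4.0/words/" + documentName + "?format=md");
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/octet-stream")
                .header("Authorization", "Bearer " + jwtToken)
                .GET()
                .build();
    }

    // Sends the request and writes the response body to the target file (like curl -o).
    // Returns the HTTP status code so the caller can detect a failed conversion.
    static int download(HttpRequest request, Path target) throws Exception {
        HttpResponse<Path> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofFile(target));
        return response.statusCode();
    }
}
```

Note that the body is written to the file even when the server answers with an error, so the returned status code should be checked before the file is treated as valid Markdown.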

Conclusion

We have reached the end of this article, in which we learned how to programmatically convert Word to Markdown using Java. We also explored the steps for converting DOCX to Markdown via cURL commands.

Another option to explore the capabilities of the API is through SwaggerUI within a web browser. We also recommend exploring the Product Documentation, which is a great source of information about other features. If you need to download and modify the source code of the Cloud SDK, it is available on GitHub (published under the MIT license).

Lastly, if you encounter any issues while using the API, you can reach out to us for a quick resolution via the free product support forum.