How to use pandoc to convert files on the Linux command line

In an earlier article, I covered the procedure to batch convert a handful of Markdown files to HTML using pandoc. In that article, multiple HTML files were created, but pandoc can do much more. It has been called “the Swiss army knife” of document conversion – and with good reason. There isn’t a lot that it can’t do.

Pandoc can convert .docx, .odt, .html, .epub, LaTeX, DocBook, etc. to these and other formats, such as JATS, TEI Simple, AsciiDoc, and more.

Yes, this means that pandoc can convert .docx files to .pdf and .html, but you may be thinking: “Word can export files to .pdf and .html too. Why would I need pandoc?”

You would have a good point there, but since pandoc can convert so many formats, it could well become your go-to tool for all of your conversion tasks. For example, many of us know that Markdown editors can export their Markdown files to .html. With pandoc, Markdown files can be converted to numerous other formats as well.

I rarely have Markdown export to HTML; I normally let pandoc do it.

Converting File Formats with Pandoc

Here, I will convert Markdown files into a few different formats. I do almost all of my writing using Markdown syntax, but I often have to convert to another format: .docx files are usually required for school work, .html for web pages that I create – and for .epub work, .pdf for flyers and handouts, and even an occasional TEI Simple file for a university digital humanities project. Pandoc can handle all of these, and more, easily.

First, you need to install pandoc. Also, to create .pdf files, LaTeX will be needed as well. The package I prefer is TeX Live.

Note: If you would like to try out pandoc before installing it, there is an online try-out page at: http://pandoc.org/try/

Installing pandoc and texlive

Users of Ubuntu and other Debian distros can type the following commands in the terminal:

Notice that on the second line you are installing pandoc and texlive in one shot. The apt-get command will have no problem with this, but go get some coffee; this may take a few minutes.
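A minimal sketch of that two-line install, assuming the repository packages are named pandoc and texlive. The function name install_pandoc_texlive is only an illustration, and nothing is installed until you call it:

```shell
# Install pandoc and TeX Live on Debian or Ubuntu.
# Assumption: the repository packages are named "pandoc" and "texlive".
install_pandoc_texlive() {
    # First line: refresh the package index.
    sudo apt-get update || return 1
    # Second line: install pandoc and texlive in one shot.
    sudo apt-get install pandoc texlive
}
```

Run install_pandoc_texlive in a terminal once the function is defined.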

Getting to Conversion

Once pandoc and texlive are installed, you can burn through some work!

The sample document for this project will be an article that was first published in the North American Review in December of 1894, and is titled: “How To Repel Train Robbers”. The Markdown file that I will be using was created some time ago as part of a restoration project.

The file: how_to_repel_train_robbers.md is located in my Documents directory, in a sub-directory named samples. Here is what it looks like in Ghostwriter.

[Screenshot: the Markdown file in Ghostwriter]

I want to create .docx, .pdf, and .html versions of this file.

The First Conversion

I’ll start with making a .pdf copy first, since I went through the trouble of installing a LaTeX package.

From my Documents/samples/ directory, I type the following to create a .pdf file:

The above command will create a file called htrtr.pdf from the how_to_repel_train_robbers.md file. The reason I used htrtr as a name was that it is shorter than how_to_repel_train_robbers – htrtr is the first letter of each word in the long title.
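A minimal sketch of that conversion, assuming pandoc's -o (output file) option and the file names mentioned above. The function name md_to_pdf is hypothetical:

```shell
# Convert a Markdown file to PDF.
# pandoc chooses PDF because the output name ends in .pdf; this step
# needs a LaTeX engine such as the one TeX Live provides.
# $1 = input .md file, $2 = output .pdf file.
md_to_pdf() {
    pandoc "$1" -o "$2"
}
```

For the sample document, the call would be: md_to_pdf how_to_repel_train_robbers.md htrtr.pdf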

Here is a snapshot of the .pdf file once it is made:

[Screenshot: the converted PDF file displayed in Okular]

The Second Conversion

Next, I want to create a .docx file. The command is almost identical to the one I used to create the .pdf and it is:

In no time, a .docx file is created. Here is what it looks like in Libre Writer:

[Screenshot: the converted DOCX file viewed in LibreOffice Writer]

The Third Conversion

I may want to post this on the web, so a web page would be nice. I will create a .html file with this command:

Again, the command to create it is very much like the last two conversions. Here is what the .html file looks like in a browser:

[Screenshot: the converted HTML file viewed in Firefox]

Noticed Anything Yet?

Let’s look at the past commands again. They were:

The only difference among these three commands is the extension after htrtr. This is a hint that pandoc chooses the output format from the extension of the output filename you provide.
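Because only the extension changes, the three conversions can be sketched as one loop. This assumes the same -o option as before; the function name convert_all is made up for illustration:

```shell
# Build .pdf, .docx and .html copies from one Markdown source.
# $1 = input file, $2 = output base name (for example htrtr).
convert_all() {
    for ext in pdf docx html; do
        # The extension alone tells pandoc which format to write.
        pandoc "$1" -o "$2.$ext" || return 1
    done
}
```

Calling convert_all how_to_repel_train_robbers.md htrtr would run pandoc three times, once per extension.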

Conclusion

Pandoc can do far more than the three little conversions done here. If you write with a preferred format, but need to convert the file to another format, chances are great that pandoc will be able to do it for you.

What would you do with this? Would you automate this? What if you had a web site that had articles for your readers to download? You could modify these little commands to work as a script and your readers could decide which format they would like. You could offer .docx, .pdf, .odt, .epub, or more. Your readers choose, the proper conversion script runs, and your readers download their file. It can be done.

Markdown

Markdown is a free and open source command line application that converts Markdown files to HTML files. It was developed by the creators of the Markdown syntax itself. To install it in Ubuntu, use the command below:

You can install Markdown command line tool in other Linux distributions from the package manager. You can also compile it from its source code available here.

To convert a “.md” file to “.html” file, run a command in the following format:

The first argument is the input “.md” file that you want to convert to a “.html” file. The second argument is the name of the “.html” output file. Replace these names as needed.

For more information on the “markdown” command, run the command below:

Pandoc

Pandoc is a free and open source document conversion utility that can convert documents written in markup languages into a number of different file formats. It supports a very wide range of input and output formats. Besides converting to “.html” format, it can convert files to “.odt”, “.docx”, and “.pdf” formats as well. It can even convert Markdown files to the “.epub” format, allowing you to read content on ereaders.

To install Pandoc in Ubuntu, use the command below:

You can install Pandoc in other Linux distributions from the package manager. More packages and installation instructions are available here.

To convert a “.md” file to a “.html” file using Pandoc, run a command in the following format:

Replace “file.md” with the name of the input file. The “-f” switch specifies the format of the input file, and the “-t” switch specifies the format of the output file. The “-s” switch tells Pandoc to produce a standalone document, complete with header and footer, rather than a fragment. The “-o” switch provides a name for the output file.
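A minimal sketch that puts the four switches together. The function name pandoc_md_to_html is hypothetical:

```shell
# Convert Markdown to a standalone HTML page with Pandoc.
# $1 = input .md file, $2 = output .html file.
pandoc_md_to_html() {
    # -f: input format, -t: output format, -s: standalone, -o: output file
    pandoc -f markdown -t html -s -o "$2" "$1"
}
```

For example: pandoc_md_to_html file.md file.html. Changing -t html to -t docx and the output name to file.docx would produce a “.docx” file instead.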

To convert a “.md” file to a “.docx” file, run a command in the following format:

For more information on Pandoc, run the following two commands:

Kramdown

Kramdown is a free and open source Markdown converter written in Ruby programming language. It is mainly designed to convert Markdown files to HTML files. However, you can use it to convert Markdown files to kramdown, LaTeX and PDF file formats as well.

You can install Kramdown in Ubuntu using the command specified below:

You can install Kramdown in other Linux distributions from the package manager. Further installation instructions are available here.

To convert a “.md” file to a “.html” file using Kramdown, run a command in the following format:

Replace “file.md” with the name of your input file. The “-i” switch names the input format, while the “-o” switch specifies the format of the converted output. Replace “file.html” with your desired name for the output file.
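A sketch of such a command, assuming Kramdown writes the converted text to standard output so that a shell redirect names the output file. The function name is hypothetical:

```shell
# Convert Markdown to HTML with Kramdown.
# $1 = input .md file, $2 = output .html file.
kramdown_md_to_html() {
    # -i: input format, -o: output format; the result is redirected to $2.
    kramdown -i markdown -o html "$1" > "$2"
}
```

For example: kramdown_md_to_html file.md file.html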

For more information on Kramdown, run the following two commands:

Cmark

Cmark or CommonMark is a free and open source Markdown parser and converter written in C programming language. It claims to be much faster than other Markdown parsing apps available on the Web. It also provides a modified version of Markdown syntax, aimed to make it easier to write rich text content.

You can install Cmark in Ubuntu using the command specified below:

You can install Cmark in other Linux distributions from the package manager. Further installation instructions are available here.

To convert a “.md” file to a “.html” file using Cmark, run a command in the following format:

Replace “file.md” with the name of your input file. The “-t” switch specifies the output format. Replace “file.html” with your desired name for the output file. You can convert “.md” files to xml, html, commonmark, latex, and man (manpage) formats using Cmark.
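A sketch of the Cmark equivalent, again assuming the output goes to standard output and is redirected into the target file. The function name is hypothetical:

```shell
# Convert Markdown to HTML with Cmark.
# $1 = input .md file, $2 = output .html file.
cmark_md_to_html() {
    # -t selects the output format (xml, html, commonmark, latex or man).
    cmark -t html "$1" > "$2"
}
```

For example: cmark_md_to_html file.md file.html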

For more information on Cmark, run the following two commands:

Grip

Grip is a free and open source Markdown file renderer and previewer written in Python. It is mainly designed to preview GitHub compatible “README.md” files. But you can use it to convert other Markdown files to HTML file format as well.

You can install Grip in Ubuntu using the command specified below:

You can install Grip in other Linux distributions from the package manager. Further installation instructions are available here.

To convert a “.md” file to a “.html” file using Grip, run a command in the following format:

Replace “file.md” with the name of your input file and “file.html” with your desired name for the output file. Make sure that the output file name ends with the “.html” extension so that the file is converted properly.
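A sketch using Grip's --export option. The function name is hypothetical. Note that Grip normally renders through GitHub's API, so it needs network access:

```shell
# Export a Markdown file to a single HTML file with Grip.
# $1 = input .md file, $2 = output .html file.
grip_md_to_html() {
    grip "$1" --export "$2"
}
```

For example: grip_md_to_html file.md file.html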

For more information on Grip, run the following two commands:

Conclusion

These are some of the best command line applications that can be used to convert Markdown files to a variety of useful file formats. These tools are especially useful for those users who write content using Markdown syntax but publish it in a different markup language or in a different file format.

About the author

Nitesh Kumar

I am a freelance software developer and content writer who loves Linux, open source software and the free software community.

If you live your life in plaintext, there invariably comes a time when someone asks for a word processor document. I run into this issue frequently, especially at the Day Job™. Although I’ve introduced one of the development teams I work with to a Docs Like Code workflow for writing and reviewing release notes, there are a small number of people who have no interest in GitHub or working with Markdown. They prefer documents formatted for a certain proprietary application.

The good news is that you’re not stuck copying and pasting unformatted text into a word processor document. Using pandoc, you can quickly give people what they want. Let’s take a look at how to convert a document from Markdown to a word processor format in Linux using pandoc.

Note that pandoc is also available for a wide variety of operating systems, ranging from two flavors of BSD (NetBSD and FreeBSD) to Chrome OS, MacOS, and Windows.

Converting basics

To begin, install pandoc on your computer. Then, crack open a console terminal window and navigate to the directory containing the file that you want to convert.

Type this command to create an ODT file (which you can open with a word processor like LibreOffice Writer or AbiWord):

pandoc -t odt filename.md -o filename.odt

Remember to replace filename with the file’s actual name. And if you need to create a file for that other word processor (you know the one I mean), replace odt on the command line with docx. Here’s what this article looks like when converted to an ODT file:

[Screenshot: this article converted to an ODT file]

These results are serviceable, but a bit bland. Let’s look at how to add a bit more style to the converted documents.

Converting with style

pandoc has a nifty feature enabling you to specify a style template when converting a marked-up plaintext file to a word processor format. In this file, you can edit a small number of styles in the document, including those that control the look of paragraphs, headings, captions, titles and subtitles, a basic table, and hyperlinks.

Let’s look at the possibilities.

Creating a template

In order to style your documents, you can’t just use any template. You need to generate what pandoc calls a reference template, which is the template it uses when converting text files to word processor documents. To create this file, type the following in a terminal window:

pandoc -o custom-reference.odt --print-default-data-file reference.odt

This command creates a file called custom-reference.odt. If you’re using that other word processor, change the references to odt on the command line to docx.

Open the template file in LibreOffice Writer, and then press F11 to open LibreOffice Writer’s Styles pane. Although the pandoc manual advises against making other changes to the file, I change the page size and add headers and footers when necessary.

Using the template

So, how do you use that template you just created? There are two ways to do this.

The easiest way is to save the template as reference.odt in the .pandoc folder of your home directory; you might have to create the folder first if it doesn’t exist. When it’s time to convert a document, pandoc uses this template file. See the next section on how to choose from multiple templates if you need more than one.

The other way to use your template is to type this set of conversion options at the command line:

pandoc -t odt file-name.md --reference-doc=path-to-your-file/reference.odt -o file-name.odt

If you’re wondering what a converted file looks like with a customized template, here’s an example:

[Screenshot: a document converted with a customized template]

Choosing from multiple templates

Many people only need one pandoc template. Some people, however, need more than one.

At my day job, for example, I use several templates—one with a DRAFT watermark, one with a watermark stating FOR INTERNAL USE, and one for a document’s final versions. Each type of document needs a different template.

If you have similar needs, start the same way you do for a single template, by creating the file custom-reference.odt. Rename the resulting file—for example, to custom-reference-draft.odt—then open it in LibreOffice Writer and modify the styles. Repeat this process for each template you need.

Next, copy the files into your /home directory. You can even put them in the .pandoc folder if you want to.

To select a specific template at conversion time, you’ll need to run this command in a terminal:

pandoc -t odt file-name.md --reference-doc=path-to-your-file/custom-template.odt -o file-name.odt

Change custom-template.odt to your template file’s name.

Wrapping up

To avoid having to remember a set of options I don’t regularly use, I cobbled together some simple, very lame one-line scripts that encapsulate the options for each template. For example, I run the script todraft.sh to create a word processor document using the template with a DRAFT watermark. You might want to do the same.

Here’s an example of a script using the template containing a DRAFT watermark:

pandoc -t odt $1.md -o $1.odt --reference-doc=
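A fuller sketch of such a helper, written as a shell function. The template location is an assumption: it supposes the DRAFT template was saved as custom-reference-draft.odt in the .pandoc folder of your home directory, so adjust the path to wherever yours lives:

```shell
# Sketch of a todraft-style helper.
# Assumption (hypothetical path): the DRAFT template is stored at
# ~/.pandoc/custom-reference-draft.odt.
# $1 = file name without the .md extension.
todraft() {
    pandoc -t odt "$1.md" -o "$1.odt" \
        --reference-doc="$HOME/.pandoc/custom-reference-draft.odt"
}
```

Calling todraft release-notes would convert release-notes.md to release-notes.odt using that template.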

Using pandoc is a great way to provide documents in the format that people ask for, without having to give up the command line life. This tool doesn’t just work with Markdown, either. What I’ve discussed in this article also allows you to create and convert documents between a wide variety of markup languages. See the pandoc site linked earlier for more details.

Convert files at the command line with this Pandoc cheat sheet

Download new cheat sheet of common Pandoc options and handy syntax for frequently used conversions.

Has anyone ever sent you a document in a format that just isn’t quite right for you? Maybe you don’t have access to the application used to create the document, or maybe you don’t need the document so much as you need what’s in it, or maybe you just flat out don’t like the format. There’s no wrong reason for disliking a file format. If it’s not your preferred format, whether you find it cumbersome to use or you just don’t like how its metadata is organized, then that’s enough of a reason for you to convert it. However, there’s rarely a good reason to convert a document manually, and Pandoc is here to ensure you never have to.

Install pandoc

If you’re on Linux, you can install pandoc from your software repository.

On Fedora or CentOS or similar:

On Ubuntu, Elementary, Debian, or similar:

If you’re on Windows or macOS, you can use third-party installers. For Windows, there’s Chocolatey, and on macOS, you can use MacPorts or Homebrew.

Once you have it installed, you can verify with a simple version check:

Pandoc basics

At its most basic, the pandoc command is among the easiest commands to use. You type pandoc into a terminal, provide it the file you want to convert, then type --output and a name for the output file you want. Pandoc can usually auto-detect both formats from their filename extensions and convert from one to the other.

Here’s a simple example to convert from a .docx file to .odt:

If you’re not used to using a terminal, keep in mind that in most modern terminal applications, you can drag-and-drop a file from your desktop into the terminal to have it translated into a full path that your computer understands.

You can specify nearly any format you can think of:

That’s right: Pandoc enables you to output many different formats from one single source format.
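As a rough sketch of one source feeding many outputs, the function below (convert_to, a hypothetical name) takes a source file and a list of target extensions, and relies on Pandoc detecting each format from the file name:

```shell
# Convert one source file to several formats.
# $1 = source file; remaining arguments = target extensions.
convert_to() {
    src=$1
    shift
    for ext in "$@"; do
        # ${src%.*} strips the source extension, e.g. report.docx -> report
        pandoc "$src" --output "${src%.*}.$ext" || return 1
    done
}
```

For example, convert_to example.docx odt performs the simple .docx to .odt conversion, and convert_to example.docx odt epub html writes three files from the same source.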

Find your source format

It doesn’t take long to realize that Pandoc is possibly more flexible than you are, or at least, it’s more flexible than you care to be. Because it’s just a piece of software, Pandoc doesn’t care whether you’ve written your latest thesis paper in LaTeX, Docbook, Markdown, or even JSON (warning: don’t write your thesis paper in JSON). It can process whatever you have handy and turn it into whatever format you need. As with so many open source projects, you have the freedom to choose which tool you like best.

If you know rudimentary HTML and want to write everything in that, then grab a good HTML editor and start writing. Pandoc will convert it to whatever your boss or client or professor needs. Or maybe you prefer Docbook, or LaTeX, CommonMark, Org mode, or just a plain old LibreOffice .odt. It doesn’t matter to Pandoc. Find your favorite format, the one that lets you concentrate on getting your work done, and let Pandoc do the hard part.

Pandoc options

It may not seem like it, but now you know all the basics of Pandoc. It’s a straightforward command that converts from one document format to another. If that’s all you need, you’re finished with this article.

However, Pandoc is a big application with lots of options for every format it can process. If you’re already a Pandoc user or you want to delve deeper into what Pandoc can do, you need to look at its command options.

From and to

The first options you need to know are the --from and --to flags. These explicitly tell Pandoc what format to process from and to, and you can use them when Pandoc’s output doesn’t match what you expected, or when you need to differentiate between formats that may share the same extension.

For example, CommonMark, Markdown, markdown_phpextra, markdown_strict, and markdown_github may all use either the .md or .txt extension. Both HTML and HTML5 use the .html extension, and EPUB versions 2 and 3 both use the .epub extension. Specifying exactly what format conversion you want ensures Pandoc provides you with the expected output:
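A minimal sketch, assuming you want a .md file read as CommonMark and written as EPUB version 3 (commonmark and epub3 are Pandoc format names; the function name is hypothetical):

```shell
# Read the input explicitly as CommonMark and write EPUB version 3.
# $1 = input file, $2 = output .epub file.
commonmark_to_epub3() {
    pandoc --from commonmark --to epub3 "$1" --output "$2"
}
```

For example: commonmark_to_epub3 book.md book.epub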

Table of contents

It varies from format to format, but Pandoc doesn’t always provide a table of contents. The --table-of-contents option, or --toc for short, ensures that a document with chapter breaks (or subheading markers such as h2 in HTML, ## in Markdown, and so on) is prepended with a list of chapters.

If you have chapters with subsections and sections in those subsections, then you may use --toc-depth to set how many levels of subheadings are listed under each chapter.
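A sketch combining both options for HTML output. The depth of 2 and the function name are illustrative choices, and --standalone is included because a table of contents only appears in a complete document, not a fragment:

```shell
# Convert to standalone HTML with a table of contents two levels deep.
# $1 = input file, $2 = output .html file.
md_to_html_with_toc() {
    pandoc --standalone --toc --toc-depth=2 "$1" --output "$2"
}
```

For example: md_to_html_with_toc thesis.md thesis.html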

Epub for eBooks

Epub, an open standard, is one of the most popular formats for eBooks. You can generate them from applications like LibreOffice, Calibre, Scribus, and many others, or you can just convert to Epub using Pandoc. If you know a little bit of CSS, you can easily style your Epub by providing a stylesheet when running Pandoc:

Additionally, you can set your own metadata so that Epub readers know how to sort the book. To do this, create a simple XML file in any text editor:

Save the file, and then use it as your metadata source when converting:
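The stylesheet and metadata steps can be sketched together as follows. Everything here is a placeholder to adapt: the Dublin Core values, the style.css name, and the function name build_epub. It also assumes Pandoc 2.0 or later, where --css applies to EPUB output. The function overwrites metadata.xml in the current directory:

```shell
# Write a small Dublin Core metadata file, then build a styled EPUB.
# $1 = input file, $2 = output .epub file.
# Placeholder values: edit the metadata and stylesheet name to suit.
build_epub() {
    cat > metadata.xml <<'EOF'
<dc:title>Example Title</dc:title>
<dc:language>en-US</dc:language>
<dc:rights>Creative Commons</dc:rights>
EOF
    pandoc "$1" --css=style.css --epub-metadata=metadata.xml --output "$2"
}
```

For example: build_epub book.md book.epub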

PDF options

Most POSIX systems have the ability to “print” to PDF. This makes generating PDFs easy, but sometimes it results in some quirks, like incorrect metadata. If you purchase independent and RPG eBooks, then you’ve surely come across an otherwise professional-quality PDF with an embedded title of “Word Document.docx” or a PDF with hyperlinks rendered in bright blue regardless of the document style (and they often aren’t even active).

One way to control how your PDF renders is to use Pandoc. With Pandoc, you can use LaTeX commands in your source document to affect PDF output, and you can add your own metadata keys and values:

Download the Pandoc cheat sheet

Pandoc is a powerhouse for anyone who needs to convert document formats. Even when it fails to give you exactly what you want, it’s almost always able to get you closer to what you need. Use open and standardized formats when writing content, and rest assured that Pandoc can convert to whatever else you need. The more you use Pandoc, the more you’re sure to discover.

To help you along with your exploration, we’ve developed an updated Pandoc cheat sheet as a handy reference. The cheat sheet hardly covers everything Pandoc is capable of, but it provides some common commands in common contexts and provides a sense of the general workflow you can expect.

I am trying to convert a large number of HTML files into Markdown using Pandoc in Windows, and have found an answer on how to do this on a Mac, but receive errors when attempting to run the following in Windows PowerShell.

Can someone help me translate this to work in Windows?

to convert files in folders recursively try this (Windows prompt command line):

For use in a batch file double the % .

Most of the answers here (the for solutions) are for cmd.exe, not PowerShell. One PowerShell answer is on the right track, but it has a bug with respect to targeting each input file; it is also hard to parse visually.

The functionally equivalent PowerShell command is:

Endoro’s answer is great, don’t get confused by the parameters added to %i .

For helping others, I needed to convert from RST (restructured text) to dokuwiki syntax, so I created a convert.bat with:

Works for all rst files in folders and subfolders.

If you want to go recursively through a directory and its subdirectories to compile all the files of type, say, *.md , then you can use the batch file I wrote in answer to another question How can I use pandoc for all files in the folder in Windows? . I call it pancompile.bat and the usage is below. Go to the other answer for the code.

Using the powershell built-in gci:

I created a python script that I’ve been using to convert a tree of markdown files into a single output file. It’s available on github:

Editing a PDF file requires converting it to a text document first. But how do you do this?

Unlike a text file, you can’t edit a PDF directly. There are multiple ways to generate PDF files using text. But what if you want to go the other way round and convert PDFs to text files?

Luckily, Linux allows you to easily modify these files from the terminal. This article will demonstrate how to convert a PDF file to a text document on Linux.

Convert PDF to Text From the Terminal

Poppler is a software library used to render and modify PDF files. It contains a utility, known as pdftotext, that allows users to generate text files from PDFs. If poppler-utils is not already installed on your system, install it with your package manager.

On Ubuntu and Debian:

To install Poppler on Arch Linux:

Installing the poppler-utils package on CentOS, Fedora, and other RHEL-based distributions is easy.

Convert an Entire PDF to Text

The basic syntax of the pdftotext command is:

pdftotext [options] pdffile textfile

Here, pdffile is the absolute or relative path to the PDF file, and textfile is the name of the output file.

For example, to convert lorem-ipsum.pdf to a text file:

If the file you’re converting contains diagonal text, such as a watermark, you can discard it from the output by using the -nodiag flag.

Process Pages Within a Specific Range

Use the -f (first page) and -l (last page) flags if you want to convert pages that fall within a specific range. For example, to convert pages one to five in lorem-ipsum.pdf to text:

To convert only the first page of the PDF file:
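Both cases can be sketched with one small function built on the -f and -l flags. The function name and the output file name are illustrative:

```shell
# Convert a range of pages from a PDF to text.
# $1 = first page, $2 = last page, $3 = input PDF, $4 = output text file.
pdf_pages_to_text() {
    pdftotext -f "$1" -l "$2" "$3" "$4"
}
```

Pages one to five: pdf_pages_to_text 1 5 lorem-ipsum.pdf lorem-ipsum.txt. The first page only: pdf_pages_to_text 1 1 lorem-ipsum.pdf lorem-ipsum.txt.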

Convert Password-Protected PDF Files to Text

Pdftotext can even convert password-protected PDFs to text files. The -upw and -opw flags, which stand for user password and owner password respectively, take care of the authentication process while converting the PDF files.

Make sure to replace password with the password of the PDF file.

You can also combine multiple flags to get the desired output. For example, to convert pages one to three of a password-protected PDF to text:
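A sketch of that combination, assuming the file is protected with a user password (swap -upw for -opw if you have the owner password instead). The function name is hypothetical. Keep in mind that a password typed on the command line can end up in your shell history:

```shell
# Convert pages one to three of a password-protected PDF to text.
# $1 = user password, $2 = input PDF, $3 = output text file.
protected_pdf_to_text() {
    pdftotext -upw "$1" -f 1 -l 3 "$2" "$3"
}
```

For example: protected_pdf_to_text password lorem-ipsum.pdf lorem-ipsum.txt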

Graphically Convert PDF to a Text File

If working with the command line is not your cup of tea, you can convert PDFs to text files using graphical software like Calibre. It is an ebook management application that you can use to view, organize, and modify PDF files on your system.

Calibre is available on the official Linux distro repositories and anyone can download it using a package manager.

To install Calibre on Ubuntu and Debian:

On RHEL-based distributions like CentOS and Fedora, you can download Calibre using either DNF or Yum.

How to Use Calibre to Convert PDF Files

Once installed, launch Calibre on your system using the Applications Menu. Alternatively, you can start Calibre from the terminal by typing:

To generate text files using PDF with Calibre:

    Click on the Add Books option from the menu.

OpenDocument .odt files can contain richly formatted content. However, sometimes a plain text file is handier, and we may convert .odt files to plain text files for such needs. In this post, we discuss 3 ways to convert .odt files to .txt files on the command line in Linux. These ways can easily be organized into a Bash script to batch process a set of files too. Together with the ways of .docx/.doc to .odt File Conversion in Command Line in Linux, the methods here can be used to do .docx/.doc to plain text file conversion.

We use the LibreOffice and pandoc software. Make sure the software packages are installed in the Linux system. As an example, we use a .odt file as follows.


As the following examples show, each method has its own pros and cons. In practice, we may pick the most suitable one, or combine the results of several, depending on the files and the purpose.

.odt to .txt file conversion using LibreOffice


We can use the --convert-to option of LibreOffice to convert the .odt file to a .txt file. The command is as follows.
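A sketch of such a command, wrapped in a small function. The function name and the SOFFICE override variable are assumptions made for this example; --headless, --convert-to and --outdir are LibreOffice command-line options, and txt:Text selects its plain text export filter.

```shell
# Hypothetical helper: convert one .odt file to plain text with LibreOffice.
# Usage: odt_to_txt INPUT.odt [OUTPUT_DIR]
odt_to_txt() {
    # The result is written as INPUT.txt into the output directory
    # (the current directory by default).
    "${SOFFICE:-libreoffice}" --headless --convert-to txt:Text --outdir "${2:-.}" "$1"
}

# odt_to_txt example.odt
```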

The converted .txt file looks like this.

Here, we can see that all the text, including the spaces in the code section, is kept. However, formatting such as bold, italics and titles is not preserved.

Convert .odt to .txt file using pandoc

The pandoc tool can convert many file formats. It can also read .odt files and generate .txt files.

The command to convert the .odt file to a .txt file is as follows.
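One way to write this, as a sketch: the function name and the PANDOC override variable are made up for the example, while odt and plain are pandoc's names for the input and output formats.

```shell
# Hypothetical helper: .odt to plain text with pandoc.
# Usage: odt_to_plain INPUT.odt
odt_to_plain() {
    # -f = input format, -t = output format, -o = output file.
    "${PANDOC:-pandoc}" -f odt -t plain -o "${1%.odt}.txt" "$1"
}

# odt_to_plain example.odt    # writes example.txt
```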

The .txt file generated is as follows.

Here, we can see pandoc keeps some of the format (using BOLD for bold fonts, and _italic_ for italic format). However, it removes some spaces in the code section.

Convert .odt to Markdown .txt file using pandoc

Markdown is a plain text format that uses special markup elements within the text to indicate formatting. The markup elements are themselves plain text and remain readable, which makes Markdown a good alternative plain text format.

Here is the command to convert the .odt file to Markdown format.
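Since these conversions lend themselves to batch processing, here is a sketch that converts every .odt file in a directory to a Markdown-formatted .txt file. The function name and the PANDOC override are assumptions for illustration.

```shell
# Hypothetical batch helper: every .odt file in a directory to Markdown text.
# Usage: batch_odt_to_markdown [DIRECTORY]
batch_odt_to_markdown() {
    dir=${1:-.}
    for f in "$dir"/*.odt; do
        # Skip the unexpanded pattern when the directory has no .odt files.
        [ -e "$f" ] || continue
        # The .txt extension says nothing about Markdown, so -t is explicit.
        "${PANDOC:-pandoc}" -f odt -t markdown -o "${f%.odt}.txt" "$f"
    done
}

# batch_odt_to_markdown ~/Documents
```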

The converted Markdown file is as follows.

Here, we can see that the formatting is marked using Markdown markup ( **bold** , ==== and *italic* ). This is much better, although the handling of the code section is still not ideal.

Summary

This post introduces three ways to convert .odt files to .txt files on the Linux command line. Each has its pros and cons, but together they can do most of the conversion work. For the example document in this post, manually adjusting the Markdown text from pandoc's output with the help of LibreOffice's output (for the code section) gives a good plain text version of the document.

Word documents are a widely used format for sharing and storing data. There are more than 1.2 billion Office users in the world, including more than 60 million Office 365 commercial customers. At the time of writing, Office 2019 is the latest version of MS Office. Although the Mac has its own software, there is a version of Office written especially for the Mac operating system, so Mac users can also benefit from Excel to PDF, PPT to PDF, and Word conversions while working on their MacBook.

Using AbiWord from the command line:

In this article, we are going to learn how to use AbiWord from the command line to convert a folder full of MS word documents into PDF files. For this, you will need to have AbiWord installed on your Linux operating system. Most Linux distributions have AbiWord in their package manager, so it should be easy enough to install AbiWord if you do not already have it.

What is AbiWord?

AbiWord can save MS word files to PDF files, but what makes AbiWord so useful is that it can be done from the command line without invoking the full graphical user interface.

How could it be done?

Open the folder that contains the Word files, right-click inside it, and open a terminal there. Type abiword --help on the command line and press Enter to see the available options. First, convert a single file, alma.doc, to PDF by calling AbiWord with a target format and an output file name: ‘abiword --to=pdf -o alma.doc.pdf alma.doc’. You can see that the GUI is not loaded; AbiWord only loads the file into memory. To process a whole folder of files, combine this command with find: ‘find *.doc -exec abiword --to=pdf -o {}.pdf {} \;’. This finds all the .doc files in the current folder and converts each one, so alma.doc becomes alma.doc.pdf. To save retyping, assign the command to an alias named PDF; note that the shell does not accept spaces around the equals sign in an alias definition. After typing the command, press Enter and it will start converting the .doc files into PDF files.
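As an alternative to an alias, the folder conversion could be sketched as a small shell function. The function name and the ABIWORD override variable are inventions for this example; --to and -o are the AbiWord options used in the text above.

```shell
# Hypothetical helper: convert every .doc file in a folder to PDF with AbiWord.
# Usage: docs_to_pdf [DIRECTORY]
docs_to_pdf() {
    dir=${1:-.}
    for f in "$dir"/*.doc; do
        # Skip the unexpanded pattern when the folder has no .doc files.
        [ -e "$f" ] || continue
        # Unlike the find example, this names the result alma.pdf, not alma.doc.pdf.
        "${ABIWORD:-abiword}" --to=pdf -o "${f%.doc}.pdf" "$f"
    done
}

# docs_to_pdf .
```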

Convert Word files to PDF on Mac

The conversion of Word to PDF file format on Mac is very simple and easy. Get into a folder that contains word files, and open a word document that you want to convert into PDF format. On the very left top corner of your MacBook, there is a similar option to the option on windows, which is ‘File’. Click on File and you see a dropdown. In that drop-down go for print option and click on it, a new window pops up, with the name print. On that pop-up, you see a drop-down in the left bottom, captioned as PDF. Click on that dropdown and save your file as a PDF file format. Now you get a screen which shows save as, where to save and so on. Select the destination for your file so you can find it easily. I would recommend that you save it on the desktop and you can move it afterward. Now click on the Save button to get your job completed. Go and check out your destination. You have a PDF version of your word document on your device.


OK, there aren’t quite that many file formats. That said, you’ve probably never heard of many of the formats that are commonly used enough to warrant listing on Wikipedia. Chances are, you’ll never see and never use most of them. If, however, you want or need to convert between file formats, then there are quite a few applications for the job.

Let’s take a look at three solid file conversion tools for the Linux command line.

Pandoc

Everyone I know who works with markup languages says Pandoc is the go-to utility for converting between those languages. And for good reason: Pandoc not only does some pretty nifty conversions, it’s fast, too.

Have a file formatted with Markdown that you want to convert to a LibreOffice Writer document? How about a LaTeX document that you want to turn into an EPUB? Or maybe you have an HTML file that you want to turn into a slide deck. Pandoc is up to all of those tasks. And more.

Here’s how to use Pandoc for a simple conversion (in this case, from HTML to reStructuredText):
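A sketch of such a conversion, wrapped in a function so that extra pandoc options can be passed through. The function name and the PANDOC override variable are assumptions for the example; html and rst are pandoc's format names.

```shell
# Hypothetical helper: HTML to reStructuredText.
# Usage: html_to_rst INPUT.html [EXTRA PANDOC OPTIONS...]
html_to_rst() {
    input=$1
    shift
    # Any remaining arguments (for example -s --toc) go straight to pandoc.
    "${PANDOC:-pandoc}" -f html -t rst "$@" -o "${input%.html}.rst" "$input"
}

# html_to_rst page.html
# html_to_rst page.html -s --toc    # standalone output with a table of contents
```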

You’re not just limited to straight conversions. You can, for example, add a table of contents, typographic quotes, custom headers, and syntax highlighting to the resulting file. Take a peek at Pandoc’s documentation for details.

Pandoc, however, mainly handles markup and a few document formats such as .docx and .odt. What happens if you have a file it cannot read, such as a legacy .doc word processor document or a slide deck? Help at the command line comes from an unexpected source.

LibreOffice

You’re probably thinking, “Hold on! LibreOffice is a GUI application.” Yes, it is. But what many people don’t know is that you can run LibreOffice from the command line to quickly convert one or more files.

How? To, for example, transform a LibreOffice Impress slide deck to PDF, you’d type the following:

You’d just replace pdf with the extension of whatever file format you want to convert to. The --headless option, in case you’re wondering, stops an empty LibreOffice window from opening on your desktop.

Using LibreOffice at the command line to convert a single file is overkill. However, turning to the command line is a great way to convert several files at once. If, say, you want to convert all of the Microsoft Word documents in a folder to LibreOffice Writer format, you’d type:
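Both LibreOffice conversions described in this section can be sketched with one helper. The function name, the file names and the SOFFICE override variable are assumptions for this example; --headless and --convert-to are LibreOffice options.

```shell
# Hypothetical helper: headless LibreOffice conversion of one or more files.
# Usage: lo_convert TARGET_EXTENSION FILE...
lo_convert() {
    target=$1
    shift
    # LibreOffice accepts several input files in a single call.
    "${SOFFICE:-libreoffice}" --headless --convert-to "$target" "$@"
}

# lo_convert pdf slides.odp    # one Impress slide deck to PDF
# lo_convert odt *.docx        # every Word document in the folder to Writer format
```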

The conversion takes far less time than opening all of those files in LibreOffice Writer and doing the conversion manually.

FFmpeg

Whereas Pandoc is the Swiss Army Knife for converting between markup languages, FFmpeg is Pandoc’s opposite number for audio and video formats.

FFmpeg is a set of libraries and executables that give you the ability to convert seamlessly between nearly any format.

Here’s an example of a simple conversion of a video from AVI to Ogg Theora:
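A sketch of that conversion, assuming an FFmpeg build with the libtheora and libvorbis encoders enabled. The function name and the FFMPEG override variable are made up for the example.

```shell
# Hypothetical helper: AVI to Ogg Theora video with Vorbis audio.
# Usage: avi_to_ogv INPUT.avi
avi_to_ogv() {
    # -c:v selects the video encoder, -c:a the audio encoder.
    "${FFMPEG:-ffmpeg}" -i "$1" -c:v libtheora -c:a libvorbis "${1%.avi}.ogv"
}

# avi_to_ogv input.avi    # writes input.ogv
```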

FFmpeg can do a lot more than that. You can set the frame rate of videos and add subtitles to them, change the aspect ratio, change the quality of audio, and more.

The command line can get quite crowded with those options, should you choose to use more than a couple of them. It’s easy to forget the options, especially if you only use FFmpeg every so often. Take it from an old technical writer: There’s no shame in reading the documentation.


Pandoc is an open source command-line utility that serves as a format converter, changing files between markup languages. It was created in 2006 by John MacFarlane and written in Haskell. This tool is compatible with Windows, CentOS, and most Unix-like systems.

A markup language is an annotation system used to format text in a visually distinctive way. In short, markup languages are vital in making the Internet pretty.

Here are some examples of markup languages:

  • HTML
  • XML
  • Markdown (considered lightweight markup)

A great tool to use when dealing with several files using different formats is Pandoc.

The goal of Pandoc is to only convert the markup of any given document without modifying its source content. This article will provide an overview of how to install Pandoc on CentOS 7 (also valid for Red Hat Enterprise Linux), along with some basic usage examples.

How to Install Pandoc

Requirements

Before starting with the installation process, there are a few requirements necessary for a successful procedure.

  • SSH access.
  • Root privileges.
  • The Extra Packages for Enterprise Linux (EPEL) repository.

Step 1. Getting Ready to Install Pandoc

Verify the local packages are updated.

Install EPEL if it is missing.

Verify that the repository was added correctly.

Step 2. Installing the Pandoc Package

When installing the Pandoc package, there are several dependencies needed for it to work correctly. Accept the installation to complete the process.


Once complete, the package manager prints a summary of the packages it installed.

Step 3. Verify the Version of Pandoc

Use the following command to check the version of Pandoc.

Use Cabal to Install Pandoc

Alternatively, we can use Cabal, a package manager for Haskell libraries and programs, to install Pandoc.

Basic Usage

Pandoc usage is relatively simple and follows a standard format. We can use the --help flag or review the man pages to get more insight into all the supported formats, but we’ll usually rely on the following flags.

  • -s (standalone): Produces output with the necessary metadata according to the output format.
  • -f (from): Specify the input format.
  • -t (to): Specify the output format.
  • -o (output): Specifies a new file containing the results of the conversion.
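The four flags above fit together as in the following sketch. The helper name pandoc_convert and the PANDOC override variable are assumptions introduced for illustration.

```shell
# Hypothetical helper mapping the four flags onto a single call.
# Usage: pandoc_convert FROM TO INPUT OUTPUT
pandoc_convert() {
    # -s standalone, -f input format, -t output format, -o output file
    "${PANDOC:-pandoc}" -s -f "$1" -t "$2" -o "$4" "$3"
}

# pandoc_convert html plain test.html test.txt
```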

By default, Pandoc sends the converted file to stdout. So, if no output file is specified, it will print the conversion results on the terminal.

For example, say we want to convert the file test.html to plain text. In this case, it will convert the document to a .txt file. Here are the file contents shown within the terminal.

To convert the document, use the below command. Here, we specified that the input format was HTML and the output plain text.

The output redirects to a new file called test.txt.

Additionally, a URL can be used in place of a local file. For example, to convert the index page of a website to Markdown and print the output to the terminal (omitting the -o flag), use the following syntax.
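A minimal sketch of a URL conversion. The function name and the PANDOC override variable are hypothetical, and example.com is a placeholder address.

```shell
# Hypothetical helper: fetch a web page and print it as Markdown.
# Usage: url_to_markdown URL
url_to_markdown() {
    # No -o flag, so the result goes to standard output.
    "${PANDOC:-pandoc}" -f html -t markdown "$1"
}

# url_to_markdown https://example.com/
```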

Another exciting feature is that, when the -f and -t flags are missing, Pandoc tries to guess the input and output formats from the file extensions. So, for example, here is what it looks like converting a LaTeX document to AsciiDoc.

Now, convert this document and check the contents of the newly generated file.

As you can see, Pandoc is a powerful tool for converting markup to plain text with no compromises to the document.

Conclusion

The flexibility of Pandoc is vast, and the range of applications can make a difference compared to not using this tool. The scope of this article was to cover the basics and the process of how to install Pandoc. Still, there are more advanced options, such as adding plugins for custom formats, or support for ebooks and PDF documents, opening an entirely new world of use cases.


About the Author: Misael Ramirez

A former support technician, I have a degree in mechatronics; the career suited me because I’m always trying new things. I have a wide range of interests, but mainly I love music, movies (old ones), and physics.


Pandoc is the Swiss Army knife for converting files from one markup format into another.

What does Pandoc do?

Pandoc can convert documents from

  • markdown, reStructuredText, textile, HTML, DocBook, LaTeX, MediaWiki markup, TWiki markup, OPML, Emacs Org-Mode, Txt2Tags, Microsoft Word docx, EPUB, or Haddock markup

to

  • HTML formats: XHTML, HTML5, and HTML slide shows using Slidy, reveal.js, Slideous, S5, or DZSlides.
  • Word processor formats: Microsoft Word docx, OpenOffice/LibreOffice ODT, OpenDocument XML
  • Ebooks: EPUB version 2 or 3, FictionBook2
  • Documentation formats: DocBook, GNU TexInfo, Groff man pages, Haddock markup
  • Page layout formats: InDesign ICML
  • Outline formats: OPML
  • TeX formats: LaTeX, ConTeXt, LaTeX Beamer slides
  • PDF via LaTeX
  • Lightweight markup formats: Markdown, reStructuredText, AsciiDoc, MediaWiki markup, DokuWiki markup, Emacs Org-Mode, Textile

What does Pandoc do for me?

I use pandoc to convert documents from

  • markdown

to

  • HTML
  • Microsoft Word docx (force majeure!), OpenOffice/LibreOffice ODT, OpenDocument XML
  • LaTeX Beamer slides
  • PDF via LaTeX

What does Pandoc do better than the specialized tools?

Accessibility:

Code in markdown is easily readable text. In comparison:

  • Markdown syntax is handier than (La)TeX syntax (Donald Knuth, inventor of TeX , wondered why it took so long to evolve from LaTeX to a more efficient markup language that compiles down to TeX , such as markdown ),
  • in particular Markdown syntax is handier than LaTeX Beamer syntax,
  • math formulas are more easily written in Markdown than in Microsoft Word or LibreOffice ,
  • it is especially suited for creating short HTML articles, such as blog entries.

What does Pandoc do worse than the specialized tools?

  • Functions specific to a markup language
    • either cannot be used,
    • or can be used, but may make compilation into other languages invalid. (The pandoc syntax is as reduced as the common base among all markup languages into which it converts.)
  • The output is sometimes rough and needs to be retouched.
  • The documentation is incomplete.
  • There is a smaller ecosystem of tools, such as editors and IDEs. For example:
    • LaTeX supports forward and inverse document search, which lets you jump from a position in the source TeX file to the corresponding position in the compiled pdf file, and the other way around. There is no such thing for markdown: markdown first compiles to TeX and then to pdf .
    • The Vim plugin for markdown is young and basic in comparison to that for LaTeX , which is stable and powerful.

      Markdown is simple, concise and intuitive: Its cheat-sheet and documentation are one.

      Source:

      Output:

      An emphasized itemization:

      • dog
      • fox

      A bold enumeration:

      1. Mum
      2. Dad

      A table

                mum      dad
      weight    100 kg   200 kg
      height    1,20 m   2,10 m

      The setup uses two files:

      • a Makefile , that sets a couple of compilation options, and
      • a main markdown file, that sets a couple of document options.

      Which parameters are set on the command line and which in the document is a somewhat arbitrary choice, and perhaps a sign of pandoc ’s unfinished state.

      Pandoc parameters

      We can pass many options to pandoc , among those the most important ones (for us) are:

      Makefile

      With a makefile, we do not have to pass the options for

      • compilation,
      • running,
      • checking and cleaning,

      on the command line each time. Instead, we call make (run/check/clean), which uses the options set once and for all in the makefile.

      The command make , corresponding to the entry all: , generates the output file, in our case the pdf document. For example,

      • make docx generates a docx document,
      • make html generates a HTML document,
      • make latex generates a TeX document,

      The option all: latex pdf is the default option, that is,

      • make generates first a TeX and then a pdf document.

      We recommend latexrun as a good LaTeX “debugger”. Still, note that we first have to spot the error in the TeX file, then the corresponding one in the markdown document.

      The command run displays the output file, for example,

      • make run-html shows the HTML document in a browser (such as Firefox ),
      • make run-odt shows the ODT document in LibreOffice,

      The option run is the default option, that is,

      • make run displays the pdf document in a pdf-viewer (such as zathura).

      Finally, make clean removes all output files.

      Main file

      This file sets at the top the title, author and date of the document. Below, additional options,

        one general option, lang that controls for example the labeling of the table of content and references, and

      various TeX options, such as:

      • document type,
      • font size, and
      • depth of the section numbering.

      Let us make compiling and editing pandoc files easier, the former with built-in functionality, the latter with dedicated plugins.

      Automatic compilation and reload

      To make Vim compile our file after every save, add to the (newly created, if necessary) file

      Pandoc is a small program that has the ability to convert document files from one format into another. It is extremely powerful and has a lot of options. In order to enable Zettlr to import and export files, Pandoc needs to be installed on your computer. Zettlr does not have the capability of importing or exporting in itself. The reason for this is that Pandoc does this job extremely well, it is also free and Open Source, and is available on all platforms that Zettlr supports.

      Zettlr requires Pandoc version 2.0 or higher. Some Linux repositories still have older versions available. If this is the case for you, please install Pandoc from the download page.
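To see whether the installed Pandoc meets that requirement, a check along these lines may help. It is only a sketch: it assumes the first line of pandoc --version has the form "pandoc X.Y.Z", and the function name and PANDOC override variable are made up for this example.

```shell
# Hypothetical check: succeeds if the installed pandoc is version 2 or newer.
pandoc_is_v2_or_newer() {
    # Take the second word of the first line, for example "2.9.2.1".
    version=$("${PANDOC:-pandoc}" --version | awk 'NR == 1 { print $2 }')
    # Keep only the part before the first dot.
    major=${version%%.*}
    # A missing or non-numeric version counts as too old.
    [ "${major:-0}" -ge 2 ] 2>/dev/null
}

# if pandoc_is_v2_or_newer; then echo "OK"; else echo "Pandoc 2.0 or higher needed"; fi
```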

      Windows

      On Windows, Pandoc can be installed by visiting the download page and retrieving the Windows installer. It can be installed like any other software and should be recognized immediately by Zettlr. You can test if it works by attempting to export or import something.

      In rare cases it may be that Zettlr cannot detect Pandoc even if it is installed. This is especially the case if Pandoc has not been installed into the default directory. If this is the case, you can drop the full path to the pandoc.exe into the corresponding field in the “Advanced” tab of the preferences.

      Please note that due to the fact that Pandoc is a CLI-program (Command Line Interface), it cannot show you whether there is an update available. Simply visit the download page from time to time to get the newest version.

      macOS

      On macOS, Pandoc can be installed in a variety of ways. You can install it using an installer package, but due to the ease of use we recommend to install it using Homebrew.

      Recommended method: Homebrew

      The preferred method is Homebrew. Homebrew is a package manager that makes it easy to install command line programs such as Pandoc and makes it easy to maintain it. Make sure to install Homebrew, and then simply run the following command in the Terminal:

      To update Pandoc from time to time, use this command:

      This will upgrade all installed formulae (as they are called) to the newest version.

      Installing with Homebrew is recommended, as it is not only faster, but also more convenient.

      Install using the official installer

      To install Pandoc the old way, simply head over to the download page and get the macOS installer. Once it is done, Pandoc should be available on your system. Please remember that by installing Pandoc this way, you need to check for updates manually.

      Linux

      On Linux, installing Pandoc is simple. Simply use your package manager to search for, and install Pandoc. The provided packages aren’t always up-to-date, but they should fit. If you want to install the newest version, you’d have to download the Linux installer and follow the install instructions on the Pandoc site.

      You may need to set up pandoc-citeproc manually by installing it using the preferred method on your operating system.


      I prefer to use Microsoft Word for most of my writing, but I really like Markdown. I prefer Word because its spelling and grammar checker is superior to that of every other word processor or text editor I have tried. In addition, Word has text to speech built in. I use it to have my text read back to me, and I catch a lot of errors this way. I write my blog posts in English, but English is not my first language, and I need these tools to keep spelling and grammar errors to a minimum.

      This blog uses the static site generator Pelican (Update: this blog is now using WordPress), which generates the blog from either reStructuredText or Markdown files. I have written about Pelican in my blog post The Static Site Generator Pelican VS WordPress.

      I have been using Pandoc to convert markdown to Word documents or PDFs for years. A Google search for a way to convert from Word to markdown did not give any usable result. Therefore, up until now I have just copied and pasted the text making sure not to do any markdown syntax until after I had done spell checking in Word.

      Then a couple of weeks ago I was reading the Pandoc docs to solve a different problem and I came across the section where it is described how Pandoc can convert from docx to markdown. I do not know if this is new or why Google did not find this for me but I immediately forgot the problem I was trying to solve and began testing it.

      It turns out to be quite simple to convert a docx to markdown. The following example is from the Pandoc demos site.

      However, the generated markdown from the above command has a few issues.

      The lines are only 80 characters long. I do not know why an 80-character line length is the default, but I do not like it. Fortunately, this is quite easy to fix with the option --no-wrap (newer Pandoc releases spell this --wrap=none).

      Links do not use the reference style. I prefer reference-style links because they make the text less cluttered by moving the link itself to the bottom of the file. This is also easy to fix with the option --reference-links.

      With the two options added the command looks like this.
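With a current Pandoc release, the combined command might be sketched like this. The function name and the PANDOC override variable are assumptions for the example, and --wrap=none is used here in place of the older --no-wrap spelling.

```shell
# Hypothetical helper: Word document to readable Markdown.
# Usage: docx_to_markdown INPUT.docx
docx_to_markdown() {
    # --wrap=none disables hard line wrapping;
    # --reference-links moves link targets to the end of the file.
    "${PANDOC:-pandoc}" -f docx -t markdown --wrap=none --reference-links \
        -o "${1%.docx}.md" "$1"
}

# docx_to_markdown post.docx    # writes post.md
```

Adding --extract-media=DIR to the call should also save the images embedded in the .docx into that directory.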

      Now the generated markdown is very readable and close to what I would write myself. I only use Word to write text with simple formatting like lists, italic, bold, and links. I add the syntax for images and code to the generated markdown file, alongside the metadata that Pelican needs. Although I do not use it at this time, Pandoc can extract images from a .docx.

      The option to extract images from the docx file and more can be found on the Pandoc options page.

      Edit: The options page URL has changed and is now http://pandoc.org/README.html#reader-options

      So there you have it, sometimes what you need is right under your nose :).

      In the last article, we learned how Markdown can quickly help you produce clean HTML code to be used in a website or blog. But what if you also want to produce an ebook using the same content as you have on the web? While the Markdown tool set is targeted at creating web content, there is another tool that allows you to take Markdown and turn it into OpenOffice/LibreOffice documents, PDF’s, or even e-books suitable for a Kindle or other e-reader – Pandoc.

      Installing the pandoc package on an Ubuntu system is dead simple with the following command:

      Once installed, you can immediately use Pandoc in place of Markdown to create HTML with the following command:

      The syntax and flags are as follows:

      • “-r” – read format
      • “-w” – write format
      • “-o” – filename of the output

      The above command reads a Markdown file and writes the result in HTML format to a file with the same base name.

      The above example outputs the file in HTML format, but you can use Pandoc to generate other formats as well.

      Open Document Text Format (ODT)

      If you’ll need to exchange your document with people using a more generic office suite, such as OpenOffice/LibreOffice or Microsoft Office, you can convert it to ODT format using Pandoc. If you think you’ll do this often, it’s useful to set up a template beforehand. Firstly, create a simple document (such as a header and a line or two of text) in Markdown and convert it to ODT with the following command:

      Then, open the “pandoctemplate.odt” file in Open/LibreOffice to change the fonts, spacing, margins, etc… to your liking. Be sure to use Styles to configure this – some details on the use of styles are available here. Once your document is set up to your liking, you can use it as a template for creating ODT files from Markdown in the future by adding it to the above command:
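A sketch of the template step, assuming Pandoc 2.0 or later, where the template is passed with --reference-doc (older releases used --reference-odt). The function name and the PANDOC override variable are made up for this example.

```shell
# Hypothetical helper: Markdown to ODT using a styled reference document.
# Usage: md_to_styled_odt INPUT.md TEMPLATE.odt
md_to_styled_odt() {
    # The styles in TEMPLATE.odt are applied to the generated document.
    "${PANDOC:-pandoc}" -f markdown -t odt --reference-doc="$2" \
        -o "${1%.md}.odt" "$1"
}

# md_to_styled_odt article.md pandoctemplate.odt    # writes article.odt
```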

      Now when you convert a Markdown file to ODT, it will automatically be formatted with the styles you created earlier. Pandoc also supports conversion to the newer (version 2007 and later) Microsoft Word format with the flag “--reference-docx=templatefile.docx” (Pandoc 2.0 and later use a single --reference-doc flag for both ODT and DOCX templates).


      Portable Document Format (PDF)

      When I need to generate PDF files from Markdown, I’ll most often convert it to ODT, and use either LibreOffice’s Export to PDF function, or if it’s a large group of files, the “unoconv” command line utility. If you’re a LaTeX user, and have a number of packages installed (this section of the Pandoc documentation describes what’s required), you can output PDFs with the following command:

      Note the absence of the “-w” flag in this case.

      ePub e-Books

      To publish e-books suitable for most electronic readers (ePub is a format handled by almost all readers), you may want to have some items specific to that format prepared in advance. These include:

      • A stylesheet, written in CSS, that describes how the ePub will look
      • Metadata, such as the creator, description, rights to the work, and language
      • A cover image

      If you don’t have these, however, Pandoc will use some reasonable defaults. The following command will convert your Markdown document to an ePub:
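If you do have a stylesheet, cover image and metadata file prepared, the conversion might look like the sketch below. The function name, file names and PANDOC override variable are hypothetical; --css, --epub-cover-image and --epub-metadata are options in current Pandoc releases.

```shell
# Hypothetical helper: Markdown to EPUB with custom stylesheet, cover and metadata.
# Usage: md_to_epub INPUT.md STYLE.css COVER_IMAGE METADATA.xml
md_to_epub() {
    "${PANDOC:-pandoc}" -f markdown -t epub \
        --css="$2" --epub-cover-image="$3" --epub-metadata="$4" \
        -o "${1%.md}.epub" "$1"
}

# md_to_epub book.md style.css cover.jpg metadata.xml    # writes book.epub
```

Leaving out the three optional flags falls back to Pandoc's defaults.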

      Additional Markdown Tips

      Here are some additional tips and tricks I use in the course of using Markdown for my writing tasks:

      • Since it’s plain text, if you use DropBox to keep files in sync between devices, you can use the built-in text editor to create or update your Markdown documents on the Web. There are also editors available for Linux (I happen to like ReText a lot) and Android (I’ve been switching between Writer and the code editor DroidEdit lately).
      • Also, since it’s plain text, concurrent versioning systems (such as Subversion) do an excellent job of tracking versions and showing the differences between them.
      • Once you’ve converted a couple of documents, and know which flags you need for all the formats you want, you can create a simple shell script that will output them all at once.
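The shell script mentioned in the last tip could be sketched as follows. The function name, the default list of formats and the PANDOC override variable are assumptions for this example; Pandoc infers each output format from the file extension.

```shell
# Hypothetical helper: convert one Markdown file to several formats at once.
# Usage: convert_all INPUT.md [EXTENSION...]
convert_all() {
    input=$1
    shift
    # Default set of output extensions if none are given.
    [ "$#" -gt 0 ] || set -- html odt epub
    for ext in "$@"; do
        "${PANDOC:-pandoc}" -s -o "${input%.md}.$ext" "$input"
    done
}

# convert_all draft.md             # draft.html, draft.odt, draft.epub
# convert_all draft.md docx pdf    # PDF output also needs a LaTeX installation
```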

      I’ve found Markdown to be an excellent way to draft content, in a “distraction-free” environment (most plain text editors are), that supports output to multiple formats, yet doesn’t require any dedicated applications.

      This repo contains a collection of Dockerfiles to build various pandoc container images.

      Contents

      Docker images hosted here come in the variants “minimal”, “core”, and “latex”.

      • minimal: kept as small as possible. See the pandoc/minimal repository.
      • core: suitable for common conversion tasks; includes additional libraries and programs. See the pandoc/core repository.
      • latex: builds on top of the core image, and provides a basic LaTeX installation in addition. This includes all packages that pandoc might use, and any libraries needed by these packages. See the pandoc/latex repository.

      Note: this section describes how to use the docker images. Please refer to the pandoc manual for usage information about pandoc .

      Docker images are pre-provisioned computing environments, similar to virtual machines, but smaller and cleverer. You can use these images to convert documents wherever you can run docker images, without having to worry about pandoc or its dependencies. The images bring along everything they need to get the job done.

      Install Docker if you don’t have it already.

      Start up Docker. Usually you will have an application called “Docker” on your computer with a rudimentary graphical user interface (GUI). You can also run this command in the command-line interface (CLI):

      Open a shell and navigate to wherever the files are that you want to convert.

      You can always run pwd to check whether you’re in the right place.

      Run docker by entering the below commands in your favorite shell.

      Let’s say you have a README.md in your working directory that you’d like to convert to HTML.

      The --volume flag maps some directory on your machine (lefthand side of the colons) to some directory in the container (righthand side), so that you have your source files available for pandoc to convert. pwd is quoted to protect against spaces in filenames.

      Ownership of the output file is determined by the user executing pandoc in the container. This will generally be a user different from the local user. It is hence a good idea to specify for docker the user and group IDs to use via the --user flag.

      pandoc/latex:2.6 declares the image that you’re going to run. It’s always a good idea to hardcode the version, lest future releases break your code.

      It may look weird to you that you can just add README.md at the end of this line, but that’s just because the pandoc/latex:2.6 will simply prepend pandoc in front of anything you write after pandoc/latex:2.6 (this is known as the ENTRYPOINT field of the Dockerfile). So what you’re really running here is pandoc README.md , which is a valid pandoc command.
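Putting the pieces explained above together, the call could be sketched as a function. The function name and the DOCKER override variable are assumptions for this example; the flags, the /data mount point and the image tag are the ones discussed in this section.

```shell
# Hypothetical wrapper: run pandoc from the pandoc/latex:2.6 image.
# Usage: pandoc_docker PANDOC_ARGUMENTS...
pandoc_docker() {
    # Mount the current directory at /data and run as the calling user,
    # so that output files are owned by that user.
    "${DOCKER:-docker}" run --rm \
        --volume "$(pwd):/data" \
        --user "$(id -u):$(id -g)" \
        pandoc/latex:2.6 "$@"
}

# pandoc_docker README.md    # runs "pandoc README.md" inside the container
```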

      If you don’t have the current docker image on your computer yet, the downloading and unpacking is going to take a while. It’ll be (much) faster the next time. You don’t have to worry about where/how Docker keeps these images.

      Pandoc commands have a way of getting pretty long, and so typing them into the command line can get a little unwieldy. To get a better handle of long pandoc commands, you can store them in a script file, a simple text file with an *.sh extension such as

      The first line, known as the shebang, tells the container that the following commands are to be executed as shell commands. In our case, we really don’t use a lot of shell magic, we just call pandoc in the second line (though you can get fancier, if you like). Notice that the #!/bin/sh will not get you a full bash shell, but only the more basic ash shell that comes with Alpine Linux, on which the pandoc containers are based. This won’t matter for most uses, but if you want to write more complicated scripts you may want to refer to the ash manual.

      Once you have stored this script, you must make it executable by running the following command on it (this may apply only to UNIX-type systems):

      You only have to do this once for each script file.

      You can then run the completed script file in a pandoc docker container like so:

      Notice that the above script.sh did specify pandoc, and you can’t just omit it as in the simpler command above. This is because the --entrypoint flag overrides the ENTRYPOINT field in the Dockerfile (pandoc, in our case), so you must include the command.
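      The three steps described above (store the script, make it executable, run it with an overridden entrypoint) can be sketched as follows. The script body is the simplest possible one; the /data mount point and the --rm flag are assumptions, and the docker command is only printed so that nothing is pulled or run by accident.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in the current directory is overwritten.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# 1. Store the pandoc call in a script: line 1 is the shebang, line 2 the command.
cat > script.sh <<'EOF'
#!/bin/sh
pandoc README.md
EOF

# 2. Make the script executable (needed once per script file).
chmod +x script.sh

# 3. Print the command that would run the script inside the container.
#    Assumption: the volume is mounted at /data inside the container.
printf 'docker run --rm --volume "%s:/data" --entrypoint "/data/script.sh" pandoc/latex:2.6\n' "$(pwd)"
```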

      GitHub Actions is an Infrastructure as a Service (IaaS) from GitHub that allows you to automatically run code on GitHub’s servers on every push (or a bunch of other GitHub events).

      Such continuous integration and delivery (CI/CD) may be useful for many pandoc users. Perhaps you’re using pandoc to convert a Markdown source document into HTML and deploy the result to a webserver. If the source document is under version control (such as git), you might want pandoc to convert and deploy on every commit. That is what CI/CD does.

      To use pandoc on GitHub Actions, you can leverage the docker images of this project.

      To learn more about how to use the docker pandoc images in your GitHub Actions workflow, see these examples.

      Building custom images

      The official images are bare-bones, providing everything required to use pandoc and Lua filters, but not much more. Often, one will want to have additional software available. This is best achieved by building custom Docker images.

      For example, one may want to use advanced spellchecking as demonstrated by the spellcheck filter in the Lua filters collection. This requires the aspell package as well as language-specific packages. A good solution would be to define a new Dockerfile and to use pandoc/core as the base image:

      Create a new image by running docker build --tag=pandoc-with-aspell . in the directory containing the Dockerfile. Now you can use pandoc-with-aspell instead of pandoc/core to get access to spellchecking in your image.
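      A Dockerfile of the kind described above might look like the following sketch, written out here from a shell script. The package names are assumptions: aspell and aspell-en are taken to be the Alpine packages for the spellchecker and its English dictionary, and other languages would need their own packages. In real use, pin a version tag instead of latest.

```shell
#!/bin/sh
# Write the Dockerfile into a scratch build directory.
builddir="$(mktemp -d)"

cat > "$builddir/Dockerfile" <<'EOF'
# Base image; replace "latest" with a fixed version tag for reproducible builds.
FROM pandoc/core:latest
# Assumed Alpine package names: the spellchecker plus an English dictionary.
RUN apk --no-cache add aspell aspell-en
EOF

# Print the build command from the text, pointed at the scratch directory.
echo "docker build --tag=pandoc-with-aspell $builddir"
```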

      See Docker documentation for more details, for example part 2 of the Get Started guide.

      Code in this repository is licensed under the GNU General Public License, version 2.0 or later.

      Many times, when I use Markdown, I work on one file and when I’m done with it, I convert it to HTML or some other format. Occasionally, I have to create a few files. When I do work with more than one Markdown file, I usually wait until I have finished them before I convert them.

      I use pandoc to convert files, and it’s possible to convert all the Markdown files in one shot.

      Markdown editors can export their files to .html, but if there’s a chance that I will have to convert to other formats like epub, pandoc is the tool to use. I prefer to use the command line, so I will cover that first, but you can also do this in VSCodium without the command line. I’ll cover that too.

      Converting multiple Markdown files to another format with Pandoc [command line method]

      To get started quickly, users of Ubuntu and other Debian-based distros can type the following commands in the terminal:

      In this example, I have four Markdown files in a directory called md_test.

      There are no HTML files yet. Now I’ll use Pandoc to do its magic on the collection of files. To do this, I run a one-line command that:

      • calls pandoc
      • reads the .md files and exports them as .html

      This is the command:

      If you are not aware already, ; separates commands in the Linux shell, so that several can be run one after another on a single line.
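      A loop of this kind could be sketched as follows. This is a minimal sketch, not necessarily the exact one-liner used here; the helper names html_name and convert_all_md are made up for illustration, and the output name is derived by replacing the .md extension with .html.

```shell
#!/bin/sh
# Derive the output name: file01.md -> file01.html
html_name() {
    printf '%s\n' "${1%.md}.html"
}

# Convert every Markdown file in the current directory to standalone HTML.
convert_all_md() {
    for f in *.md; do
        [ -e "$f" ] || continue    # skip the literal pattern when no .md files exist
        echo "$f"
        pandoc -s "$f" -o "$(html_name "$f")"
    done
}
```

      Running cd md_test; convert_all_md prints each file name as it is converted. Written on one line, the loop reads for f in *.md; do ...; done, which is where the ; separators come in.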

      Here’s what the display looks like once I have executed the command:

      Let me use the ls command once more to see if HTML files were created:

      The conversion was a success, and you have four HTML files ready to go on the Web server.

      Pandoc is quite versatile and you can convert the markdown files to some other supported format by specifying the extension of the output files. You can understand why it is considered among the best open source tools for writers.


      The second way to install is through VSCodium’s plug-in, or extension, manager:

      1. Click on the blocks on the left side of the VSCodium window. A list of extensions will appear. At the top of the list, there will be a search bar.
      2. In the search bar, type: Markdown All in One . The extension will be listed at the top of the list. Click on the Install button to install it. If it is already installed, a gear icon will appear in place of the install button.

      Once the extension is installed, you can open the folder that contains the Markdown files you want to convert.

      Click on the paper icon located on the left side of the VSCodium window. You’ll be given the opportunity to choose your folder. Once a folder is open, you’ll need to open at least one file. You can open as many files as you want, but one is the minimum.

      Once a file is open, bring up the Command Palette by pressing CTRL+SHIFT+P. Then, start typing Markdown in the search bar that appears. As you do this, a list of Markdown-related commands will appear. One of these will be the Markdown All in One: Print documents to HTML command. Click on that one.


      You’ll be asked to choose a folder containing the files. This is so an output directory (called out) can be made, and this is where the HTML files will go. Once the Markdown documents have been exported, the HTML files appear there. From here, you can open, view, and edit the HTML as you wish.


      By waiting to convert your Markdown files, you can concentrate more on writing. Conversion to HTML can come when you’re ready – and you have two ways to get that done.

      Imagine this scenario – You have a folder that contains ten, twenty… or fifty PDF files that need to be converted to Word or Excel. But, the question is – how do you do it if you’re using Linux? It is not an easy task to find a reliable PDF tool that can precisely convert a single PDF file on Linux, let alone convert multiple PDFs at once. But, we’ve found one – Able2Extract Professional 12. It converts PDF to all popular formats, including Excel, Word, CSV and AutoCAD. In a nutshell, it can edit PDF content, text, and paragraphs effortlessly in real time. It works well on GNU Linux, Mac OS X, and Windows.

      In this tutorial, I’ll show you how to perform a PDF to Word or Excel batch conversion on Linux with Able2Extract.

      Features

      Just before we jump right into it, here are a few more things about Able2Extract you might find interesting. After all, it’s not your ordinary PDF tool for Linux. It can do pretty much everything.

      Here’s what Able2Extract is capable of, besides the ability to batch convert PDF files:

      • Convert scanned and native PDFs to OpenOffice, Excel, PPT.
      • Edit PDF files (pages, text, images).
      • Create, edit and fill in interactive forms.
      • Add sticky notes, watermarks and other annotations.
      • Create a PDF from almost any printable format.
      • Password encrypt PDFs.

      Right, now back to the main point – batch conversion.

      Steps to Batch Convert PDF Files on Linux

      Just follow the steps described below and you’ll batch convert your PDF files in a matter of minutes.

      Step 1. Install & Run Able2Extract Professional

      Go to the developer’s website and download Able2Extract for Linux.


      From there, just start the installation and follow the setup wizard to complete it. As of writing this, Able2Extract is available for Ubuntu and Fedora Linux distributions. Once it’s done, run the program.

      Step 2. Locate the PDF files

      In Able2Extract, click on the Batch icon located on the toolbar and the Batch conversion window will show up.


      Batch Convert PDF Files With Able2Extract

      Now, in the pop-up window there are a few options to choose from:

      • Add Files. – Use this option to add files to the conversion queue one by one.
      • Add Directory. – Use this option to add an entire folder to the conversion queue.
      • Remove Selected – Use this option to remove a file you’ve accidentally added.

      Step 3. Start Batch Conversion

      Once you’ve located all the files that need to be converted, you need to determine the output folder. Check the Same as Source option if you want to keep the files in the same directory or designate a new location by using the Browse button.

      From there, choose the output file format, like Word in our case, enter the security code (not case sensitive) and click on the Convert button at the bottom.


      That’s it. Able2Extract will start to convert your PDFs. Time to completion depends on the volume of PDF files you previously inputted.

      As you can see, the process is as simple as it can get. Although Able2Extract is a proprietary software tool that comes at a price tag of $149.95, it’s definitely worth checking out. On the bright side, you’re not only getting a tool that can convert PDF files, it can tackle all your major PDF needs. You already know that there aren’t many viable solutions when it comes to professional PDF tools for Linux, but Able2Extract is one solution that is more than capable of doing what you need to do with your PDFs.

      Have you tried Able2Extract already? Great! Tell us what you think about it in the comment section below.

      I have a typical scientific manuscript in a LaTeX .tex file, and I need to convert it to an MS Word .doc file. The reason for having to convert to MS Word is that I’m submitting the manuscript to an academic journal and they only accept MS Word (I know.)

      The manuscript includes a title page, figures, tables, equations (inline and in their own align environment), footnotes, a bibliography, and an annex. The tables are in their own separate tables.tex file, which I include using the \include command. Most tables take up a whole landscape page, and were generated using the package pdflscape. I am using Windows 7 Professional.

      My plan is to use pandoc to go from .tex to .odt , open the latter in Libre Office, and convert to .doc . I have read a related question but it is too general. Similarly the examples in the Pandoc website are too simple. I have played around but I am unable to accomplish what I want. This is surprising since converting a scientific manuscript is probably the most common use case for Pandoc. Here are some sample failures:

      Example 1

      I open a command line in the project folder, and execute the following:

      I get this error message:

      where figure1 is the name of a figure file (e.g. figure1.png ) in the project folder referenced in a line as \includegraphics[width=5.8in] . I suspect pandoc expects a .png extension, but I am not sure how to provide it.

      Example 2

      Next I try .html, and execute the following:

      The program executes fine. I open the HTML file. Footnotes are there, but figures are missing, tables are displayed as LaTeX, the bibliography is missing, in-line math displays well but math in the align environment does not, section labels are displayed, and there are some other minor issues.

      So given that mine is probably a typical use case scenario, my question is this: What commands should I use to get the .odt file I want? I could not find a fully worked out example on the web.

      Here is a specific list of errors. I’ll update how I corrected them based on community suggestions:

      PDF or Portable Document Format is mostly the first choice when it comes to printing, sharing and emailing documents, especially the larger ones. For Windows and macOS, you might be very much familiar, and also dependent on, the widely used Acrobat products for pdf creation, viewing, and editing. Unfortunately, there is no default, dedicated, pdf-creator available on your Linux systems. You can, however, use the LibreOffice products to create PDF files in Ubuntu. In this article, we will explain how you can use the Ubuntu command line, the Terminal, in order to convert and batch convert .doc and .docx files to their pdf versions.

      Why the Command Line?

      If you are a Terminal-savvy person, you wouldn’t want to leave the comfort of the command line and go somewhere else to do any of your daily technical activities. There is always a way to do almost all of our stuff right inside the Terminal. So, why should pdf conversion be any different? Using the Terminal makes certain tasks more efficient, and even faster. The command-line tools do not use too many resources and thus form great alternatives to the widely used graphical applications, especially if you are stuck with older hardware.

      We have run the commands and procedures mentioned in this article on a Ubuntu 18.04 LTS system.

      Using the LibreOffice CLI ‘Lowriter’ for pdf conversion

      LibreOffice Writer is part of the LibreOffice package and is available by default in most Linux distros. If your system lacks it, you can easily install it from the Ubuntu Software Manager.


      Here, we will be making use of the CLI of the same in order to convert our documents to pdfs.

      Here is how you can use the Lowriter:

      Open your Ubuntu command line, the Terminal, either through the Ubuntu Application Launcher search or by using the Ctrl+Alt+T shortcut.

      Please make sure that lowriter is installed on your system by running the following command:

      Convert a single file to PDF format

      Use the following command syntax in order to convert a single file located in your current directory:

      Here is how I converted a .docx file, located in my Downloads folder, to pdf.

      As you can see above, when I listed the contents of my current folder through the ls command, I could see the newly converted pdf file listed as well.

      Batch Convert files to pdf

      Use the following command syntax to batch convert all .doc or .docx files to pdf, located in your current directory:

      For .docx files, use:


      This is how you can make use of the LibreOffice Writer’s CLI to convert your documents from .doc and .docx to pdfs. No extra installations or lengthy procedures are required and you have exactly what you need; a .doc/.docx to pdf conversion right through the Ubuntu command line.
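      The single-file and batch conversions described above could be sketched as below. This is a minimal sketch assuming lowriter accepts --convert-to pdf followed by a file name and writes the PDF to the current directory; the helper names have_lowriter and docs_to_pdf are made up for illustration.

```shell
#!/bin/sh
# Succeeds when the lowriter command is on the PATH.
have_lowriter() {
    command -v lowriter >/dev/null 2>&1
}

# Convert every .doc and .docx file in the current directory to PDF,
# one file at a time so a failure is easy to attribute.
docs_to_pdf() {
    for f in *.doc *.docx; do
        [ -e "$f" ] || continue    # skip patterns that matched nothing
        lowriter --convert-to pdf "$f"
    done
}
```

      A typical session would be cd ~/Downloads, then have_lowriter && docs_to_pdf, followed by ls to confirm the new .pdf files.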

      Karim Buzdar

      About the Author: Karim Buzdar holds a degree in telecommunication engineering and holds several sysadmin certifications. As an IT engineer and technical author, he writes for various web sites. You can reach Karim on LinkedIn

      PDF or the Portable Document Format is mostly our first choice when it comes to printing, sharing and emailing documents, especially the larger ones. For Windows and macOS, you might be very much familiar, and also dependent on, the widely used Acrobat products for pdf file creation, viewing, and editing. Unfortunately, there is no default pdf creator available on your Linux systems. You can, however, use the LibreOffice shell tools to create PDF files in Debian. In this article, we will explain how you can use the Debian command line, the Terminal, in order to convert and batch convert .doc and .docx files to their pdf versions.

      Why the Command Line?

      If you are a Terminal-savvy person, you wouldn’t want to leave the comfort of the command line and go somewhere else to do any of your daily technical activities. There is always a way to do almost all of our stuff right inside the Terminal. So, why should pdf conversion be any different? Using the Terminal makes certain tasks more efficient, and even faster. The command-line tools do not use too many resources and thus form great alternatives to the widely used graphical applications, especially if you are stuck with older hardware.

      We have run the commands and procedures mentioned in this article on a Debian 10 Buster system.

      Using the LibreOffice CLI Lowriter for pdf conversion

      LibreOffice Writer is part of the LibreOffice package and is available by default in most Linux distros. If your system lacks it, you can easily install it from the Debian Software Manager.


      Here, we will be making use of the CLI of the same in order to convert our documents to pdfs.

      Here is how you can use the LOwriter from the command line:

      Open your Debian command line, the Terminal, through the Debian Application Launcher search as follows:

      The Application Launcher can be accessed using the Super/Windows key.

      Please make sure that lowriter is installed on your system by running the following command:


      Convert a single ODT, DOC or DOCX file to PDF

      Use the following syntax in order to convert a single file located in your current directory:

      Here is how I converted a .docx file to pdf located in my Downloads folder.

      In Case of Error:

      If you get the following error while trying to convert the file:

      Then, try installing the libreoffice-java-common package as follows:

      As you can see below, when I listed the contents of my current folder, after the pdf conversion process, through the ls command, I could see the newly converted pdf file as well.

      Batch Conversion of DOC and DOCX or ODT files to pdf

      Use the following syntax to batch convert all .doc or .docx files to pdf, located in your current directory:

      This is how you can make use of the LibreOffice CLI to convert your documents from .doc and .docx to pdfs. No extra installations or lengthy procedures are required and you have exactly what you need; a .doc/.docx to pdf conversion right through the Debian command line.


      According to the official site, Pandoc is your swiss-army knife for converting files from one markup format into another.

      Pandoc can convert documents in markdown, reStructuredText, textile, HTML, DocBook, LaTeX, MediaWiki markup, TWiki markup, OPML, Emacs Org-Mode, Txt2Tags, Microsoft Word docx, EPUB, or Haddock markup to

      • HTML formats: XHTML, HTML5, and HTML slide shows using Slidy, reveal.js, Slideous, S5, or DZSlides.
      • Word processor formats: Microsoft Word docx, OpenOffice/LibreOffice ODT, OpenDocument XML
      • Ebooks: EPUB version 2 or 3, FictionBook2
      • Documentation formats: DocBook, GNU TexInfo, Groff man pages, Haddock markup
      • Page layout formats: InDesign ICML
      • Outline formats: OPML
      • TeX formats: LaTeX, ConTeXt, LaTeX Beamer slides
      • PDF via LaTeX
      • Lightweight markup formats: Markdown, reStructuredText, AsciiDoc, MediaWiki markup, DokuWiki markup, Emacs Org-Mode, Textile
      • Custom formats: custom writers can be written in lua.

      How to Install Pandoc

      Windows users can download a package installer from pandoc’s download page and install it on their computer. After that, run pandoc -v in the command prompt to verify that it is correctly installed.

      NOTE: The default package doesn’t support PDF output; the additional tool LaTeX is needed. MiKTeX is recommended by the official site. However, it has some issues with exporting Chinese characters. In this case, CTeX Full is a better choice.

      Users of Mac OS X or Linux should refer to the official site for more information about installation.

      How to Convert Document With Pandoc

      Convert a webpage(html) to docx

      Convert a html to markdown

      Convert a html to pdf

      Convert a markdown to mediawiki
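      The four conversions listed above could be run along the following lines. This is a hedged sketch: the function names are made up, every file name passed in is a placeholder, and the formats are spelled out with -f and -t even though pandoc can usually infer them from the file extensions.

```shell
#!/bin/sh
# Each helper takes: $1 = input file, $2 = output file.

# HTML page to Word docx
html_to_docx() { pandoc -f html -t docx -o "$2" "$1"; }

# HTML to Markdown
html_to_markdown() { pandoc -f html -t markdown -o "$2" "$1"; }

# HTML to PDF; $2 should end in .pdf, and a LaTeX installation is required
html_to_pdf() { pandoc -f html -o "$2" "$1"; }

# Markdown to MediaWiki markup
md_to_mediawiki() { pandoc -f markdown -t mediawiki -o "$2" "$1"; }
```

      For example, html_to_markdown page.html page.md writes the Markdown version of page.html to page.md.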

      How to Export Document with Chinese Characters to PDF

      If your task is all about documents with English characters only, you can skip this section. This part talks about problems of exporting documents with Chinese characters to PDF.

      Install CTeX Full instead of MiKTeX

      Export Pandoc standard template using the following command:

      Open the template template.tex and find phrase % if luatex or xelatex , add the code below after this phrase.

      Note:
      In my version of Pandoc (1.13.2), below is the default code after the phrase % if luatex or xelatex .

      Errors occur if the code is added right after line 20. In the end, adding the code at line 27 turned out to work.

      Export documents to PDF using the following command:

      template.tex is just the template modified in stage 2.
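      The export and build steps described above might be sketched like this. Several things here are assumptions: the code added to the template is taken to be CJK font setup such as the xeCJK package with a \setCJKmainfont line, the engine is taken to be XeLaTeX, and the helper names are made up. Pandoc 1.x (such as the 1.13.2 mentioned above) spells the engine option --latex-engine, while pandoc 2.0 and later use --pdf-engine, so the sketch picks the flag from the installed version.

```shell
#!/bin/sh
# Pick the XeLaTeX option for a given pandoc major version.
engine_flag() {
    if [ "$1" -ge 2 ]; then
        echo "--pdf-engine=xelatex"
    else
        echo "--latex-engine=xelatex"
    fi
}

# Major version of the installed pandoc, e.g. 1 for "pandoc 1.13.2".
pandoc_major() {
    pandoc --version | sed -n '1s/^[^0-9]*\([0-9][0-9]*\).*/\1/p'
}

# Export pandoc's default LaTeX template so it can be edited by hand.
export_template() {
    pandoc -D latex > template.tex
}

# Build the PDF with the edited template: $1 = input file, $2 = output .pdf
cjk_pdf() {
    pandoc "$1" -o "$2" "$(engine_flag "$(pandoc_major)")" --template=template.tex
}
```

      The order of use would be export_template, then the manual edit of template.tex, then cjk_pdf input.md output.pdf.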

      Thanks to this blog for solving the problem.

      According to another blog, it’s also possible to download pm-template.latex and use this template to export documents to PDF. For this template, the only thing to note is that LiHei Pro must be replaced with a Chinese font installed on your machine.

      Pandoc’s Markdown

      Pandoc’s author is really proud of its extension of markdown, or he wouldn’t put 2/3 of the document talking about it.

      How to Produce Slide Shows with Pandoc

      It’s fantastic to find that simple and concise slides can be made with Pandoc. One could keep collecting knowledge while occasionally transforming it into slides to share with other people, without spending much time on how to write a PPT.

      Markdown is a popular text formatting syntax among developers these days. Popular sites like GitHub or Bitbucket use Markdown for project documentation and various other types of user-generated content. These sites automatically convert Markdown syntax to HTML, so it can be displayed in a browser.

      However, maybe you want to use Markdown as a document format without using a platform that does the conversion for you. Or you need an output format other than HTML. In this case you need a tool that can convert Markdown to the desired target format. Pandoc is a document conversion tool that can be used for exactly this (and a lot of other things). With Pandoc you can convert Markdown documents to PDF, HTML, Word DOCX or many other formats.

      After installing Pandoc, you can simply run it from command line.

      Note: By default, Pandoc uses LaTeX to generate PDF documents. So, if you want to generate PDF documents, you need to install a LaTeX processor first (list of required LaTeX packages).

      To convert a doc.md Markdown file into a PDF document, the following command can be used:

      Pandoc is able to merge multiple Markdown files into a single PDF document. To generate a single PDF document out of two Markdown files you can use:

      By default the page margins in the resulting PDF document are quite large. You can change this by passing a margin parameter:

      To create HTML or DOCX documents you simply have to change the file extension of the target file:
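      The conversions described in this section could be sketched as two small helpers. This is a minimal sketch: the function names are made up, doc.md, part01.md and part02.md are placeholder inputs, and the margin is passed through the LaTeX geometry variable, which only affects PDF output.

```shell
#!/bin/sh
# Convert one or more Markdown files into a single standalone document.
# usage: md_convert OUTPUT INPUT.md [MORE.md ...]
md_convert() {
    out="$1"; shift
    pandoc -s -o "$out" "$@"
}

# Same, with a custom page margin for PDF output.
# usage: md_convert_margin MARGIN OUTPUT.pdf INPUT.md [MORE.md ...]
md_convert_margin() {
    margin="$1"; out="$2"; shift 2
    pandoc -s -V "geometry:margin=$margin" -o "$out" "$@"
}
```

      With these, md_convert doc.pdf doc.md covers the single-file case, md_convert doc.pdf part01.md part02.md merges two files, md_convert_margin 1in doc.pdf part01.md part02.md narrows the margins, and md_convert doc.html doc.md or md_convert doc.docx doc.md switches the target format through the extension alone.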

      The resulting documents are well formatted, both as DOCX and as PDF, even when created out of two small example Markdown files.

      However, just because it’s ubiquitous and easy to get started with doesn’t mean it’s always the right choice, because it’s not without its shortcomings.

      Here are a few key reasons why you shouldn’t use Markdown for technical documentation:

      • There is no formalised standard (though this is changing)
      • It only supports limited formatting functionality; admonitions, footnotes, advanced tables anyone?

      Write the Docs founder Eric Holscher wrote an excellent article on why it’s not the right choice for technical documentation. I strongly recommend you read it!

      If you and your team have already invested a lot of effort in creating Markdown-based documentation, it’s likely that you’ve already experienced more than a few of these limitations.

      While it’s easy to get started with, and so many people know about it, the more you use it the more you’re going to hit these limitations and wish you had something better.

      So, when this time comes, when you’re ready for a more feature-rich file format such as AsciiDoc or reStructuredText, you’re going to need a tool to migrate your existing content. Happily, there’s an open source tool that does just about all you need; it’s called Pandoc.

      What is Pandoc?

      Self-described as a “general markup converter”, Pandoc converts content from one markup format to another. It can read and write numerous formats, including:

      • CommonMark, Daring Fireball Markdown, GitHub-Flavored Markdown, & Multi-Markdown
      • DocBook
      • EPUB
      • HTML/HTML5
      • LaTeX
      • MediaWiki markup
      • Microsoft Word docx
      • ODT
      • Subsets of Textile
      • reStructuredText

      As you can see from this list of file formats (and from its man page), Pandoc is an incredibly powerful tool. It formed the core of several migrations I undertook for ownCloud over the last few months, as their documentation was migrated from Sphinx-Doc to Antora.

      During the course of those migrations, I learned so much about what it can do, and just how easy it makes converting file formats commonly used for technical documentation.

      Install Pandoc

      If you don’t have it installed already, it’s, generally, quite straightforward to do so. If you’re using a Linux distribution, then it should be available via your distribution’s package manager. Otherwise, use the instructions in the Pandoc documentation.

      How To Convert Markdown to AsciiDoc

      After you’ve installed Pandoc and have a sample Markdown file (or a host of files) ready to convert, use the following command example, changing the name of the input and output file as necessary. To summarise, this will convert the Markdown file file.md to AsciiDoc format, and name it file.adoc .

      The options passed to Pandoc will:

      • Convert the headers to my preferred style, ATX style headers, instead of the default Setext style headers
      • Add an automatic table of contents at the top of the file
      • Not wrap lines at an artificial line length. I prefer this as I’m a fan of one line per sentence, one of the recommended AsciiDoc best practices.
      • Use reference-style links, rather than inline links
      • Produce output with an appropriate header and footer
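      A command matching the options listed above might look like the following sketch. It is not necessarily the exact command used here: the function name is made up, and the ATX-header option is left out because older pandoc releases spelled it --atx-headers and newer ones have deprecated that flag. Note also that --reference-links is documented for Markdown and reStructuredText output, so it may have no effect on AsciiDoc.

```shell
#!/bin/sh
# Convert a Markdown file to AsciiDoc: $1 = input .md, $2 = output .adoc
md_to_adoc() {
    # --wrap=none        keep each paragraph on one line, no artificial wrapping
    # --toc              add an automatic table of contents
    # --reference-links  prefer reference-style links where the writer supports them
    # -s                 standalone output with header and footer
    pandoc --wrap=none --toc --reference-links -s \
        -f markdown -t asciidoc -o "$2" "$1"
}
```

      Running md_to_adoc file.md file.adoc converts file.md and names the result file.adoc.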

      Have a read of Pandoc’s man page and see if there are other options that would be helpful to use.

      If you want to (recursively) convert an entire directory structure of Markdown files, here’s a Bash one-liner to help you do that:

      This uses find to find all the Markdown files located under the current directory, and passes them to Pandoc to convert, just as in the earlier example.

      The new files will be named the same as the Markdown file, but have an extra .adoc extension added to their name. For example, if the original file was myfile.md , then the new file will be named myfile.md.adoc . To get around that, here’s another Bash one-liner to remove all of the .md extensions from the AsciiDoc files:
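      Both steps, the recursive conversion and the rename that follows it, could be sketched as below. These are plain sh functions rather than the Bash one-liners referred to above, the function names are made up, and the pandoc options are reduced to a minimum.

```shell
#!/bin/sh
# Convert every Markdown file below the current directory.
# myfile.md becomes myfile.md.adoc, next to the original.
convert_tree() {
    find . -type f -name '*.md' -exec sh -c '
        for f do
            pandoc --wrap=none -f markdown -t asciidoc -o "$f.adoc" "$f"
        done' sh {} +
}

# Rename myfile.md.adoc to myfile.adoc throughout the tree.
strip_md_suffix() {
    find . -type f -name '*.md.adoc' -exec sh -c '
        for f do
            mv -- "$f" "${f%.md.adoc}.adoc"
        done' sh {} +
}
```

      Run convert_tree first and strip_md_suffix afterwards, both from the top of the documentation directory.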

      Want a More Natural Solution?

      If you’re looking for a more natural solution for migrating from Markdown to AsciiDoc, one developed by the Asciidoctor team, then check out my follow-up article on Kramdoc.

      In Conclusion

      And that’s the basics of how to migrate Markdown documents to AsciiDoc (or another format) using the power tool Pandoc.

      As I alluded to, it’s not perfect, as there are things that it doesn’t quite get correct. For example, it doesn’t always convert code blocks from one format to another correctly. However, it does perform a majority of the migration legwork for you.

      Imagine trying to do it on your own, or to create a custom migration script. If you did, or felt that you had to, you’d spend more time writing and maintaining the script than maintaining your content – which isn’t what you’re there to do. So I strongly recommend that you give it a go, and see how you go.

      If you’ve been using it for some time, I’d love to hear your experiences with it.

      This tutorial shows you how to use pandoc to convert files in various formats.

      Table 1 shows current conversions.

      Table 1. Conversions

      Convert                           From    To
      reStructuredText to HTML          .rst    .html
      Markdown to reStructuredText      .md     .rst

      Scope

      Conversion should be a preliminary step in migrating to Sphinx for your documentation project. The results of conversion will complete about 80% of the work. It’s expected that you’ll edit converted files as post-processing. We recommend studying the Pandoc website to learn more.

      In the Clear Linux* documentation, we established Documentation Contribution Guidelines, which explain conventions and usage of reST syntax. We encourage you to establish similar guidelines for your team. Even minimal initial investment in guidelines greatly reduces future work. More importantly, providing clear guidelines supports your team’s long-term maintenance efforts.

      Note: See also the Clear Linux reStructuredText Guide.

      Prerequisites

      Introduction

      We recommend following reST conventions and best practices for Sphinx. To learn more, visit the Sphinx website.

      On your host machine, clone the files in the latter directory.

      Navigate to the cloned directory with your CLI.

      You do not need to initialize this directory for Git or set up a remote.

      In your text editor, view the files. They are now in a finished state.

      Remove only two documents with the .html and .rst formats.