Document Composition

Using LaTeX, Markdown, pandoc #

Now that you are familiar with the Terminal and have installed the software we are going to be using you are ready to begin addressing the core competencies of working with plain text software for your research and writing. The essential workflow that we are adopting here is writing in Markdown, a simple markup langauge with a few extra bells and whistles, and then using the pandoc program to convert these into HTML, PDF, ad Word files. To establish why this is the appropriate workflow to adopt I will outline what markup langauges are and how they work, as well as the conventions used in LaTeX and then Markdown.

LaTeX #

Though you won’t need to write with LaTeX you will need to know what a LaTeX document looks like and how to work with one. Once you have a feel for the LaTeX document setup, you will be able to create your own document templates for pretty much any use. For example, I have LaTeX templates for memos, letters with my contact and university affiliation, article drafts, class handouts, single columned general prose, double column general prose, and even the documentation you are looking at right now was created using this workflow.

LaTeX at bottom is something called a markup language, or a means of encoding text where different characters are used to designate structure, formatting and style. In a Word document, all that appears on your screen is the text you are composing, and its style and structure are handled by the program. If you want to change the type face, or the margin, or the line spacing, you go to the top bar and click through menus to change the style. There is a lot of behind the scenes action going in a Word file as the means by which document is styled is hidden from you. In a markup language, all that information is present in the document itself as you explicitly declare the style and structure using the conventions of that particular language. To give a specific example, where in Word you would click the italics button to make text italics, in a markup language like LaTeX you wrap the text you want to be italicized with the command \textit{}. There is a lot more that goes into it, and diving into practical examples will help to give you an idea of how markup languages work and LaTeX in particular.

LaTeX is notorious for being unintuitive and hard to pickup and is the bane of any graduate students existence when they take STATS102 for the first time. The thing about LaTeX is that it is complicated and unintuitive but there are only a few complicated and unintuitive conventions to understand before LaTeX seems a lot clearer. Below is an example of the simplest LaTeX document you can make, and examining it helps us to begin to understand how markup langauges are different from something like Word and what LaTeX in particular looks like:

\documentclass{article}

\begin{document}

  Hello World!

\end{document}

In LaTeX, anything that begins with a backlash “\” is a command, and commands usually have brackets at the end to take in some arguments. You can think of working with a LaTeX document like working in narrower and narrower environments. In any file, you first need to declare the kind of document you are creating with \documentclass and then declare the beginning of the document with \begin{document}. The \documentclass command creates the underlying environment with some associated styles in which you will be composing prose, before the \begin{document} command creates the environment in which the prose and content of your document go.

The convention of open and closing environments applies to most features of a LaTeX document. We open a document with \begin{docuemnt} declare the end of the document with \end{document} while figures or images in your document need to be between the commands \begin{figure} and \end{figure} and the same is true of tables, which go between\begin{table} and \end{table}. The body of your document can be 2 words or 200 pages worth of content, and I simply use the “Hello World!” document as an example. You could make that document your self in a matter of seconds.

The \documentclass command specifies how the document will be formatted and loads a set of default settings. There are a variety of options to choose from depending on your need including article, book, report, slides, and letter. A very common document class that you’ll see is memoir which is used for general prose writing. This command can take in arguments to further specify the formatting of your document and usually includes things like the paper size, whether it is one or two sided, font size, line spacing, among many other things. Here is an example where we are setting a document to have 11pt font, to be one sided and use formatting for an article type document, using all the bells and whistles from the memoir document class:

\documentclass[11pt,article,oneside]{memoir}

Another convention in LaTeX is to declare metadata and other information in the preamble of the document. The preamble is the part of the file that is above the \begin{document} command. In the above example the preamble includes only the \documentclass command but you can do a lot more here including telling LaTeX about all the formatting of your document as well as loading any packages that you will be using to create your document. Also, you can declare metadata in the preamble that then will appear in your document such as the author name, title of the document, abstract, date, and pretty much anything else. Here is another simple example:

\documentclass{article}
\title{My First \LaTeX Document}
\date{1/1/1597}
\author{William Shakespeare}

\begin{document}

\maketitle

Hello World!

\end{document}

The \title, \date and \author commands are components of the \maketitle command that you can invoke after you begin the document to create a well formatted title block for your document. The above code renders a document that looks like Figure 1.

A document made with LaTeX.

Figure 1: A document made with LaTeX.

As mentioned above, the body of the document (here it is just “Hello World!”) can be as long as you want and encompass all sorts of different things that are particularly helpful to social scientists who write documents that use data to make an argument. For example, and we will cover all of these shortly, you can include tables, figures, citations, theorems, proofs, equations, diagrams, code blocks, and any number of other information that you want typeset in a beautiful way.

To summarize here are the important parts for understanding a LaTeX document and markup languages in general. Remember to:

  1. Declare the beginnings and ends of environments in which different formatting or content is located.
  2. Include metadata and package calls in the preamble of the document.
  3. Render formatting using commands like \textit{} or \textbf{}

Learning the exact conventions of LaTeX is a task that is worth exploring. If you want to be a truly autonomous plain text writer and researcher, you will have to learn more about LaTeX so you can create your own templates. But, if you want to stick to thinking and writing then you’ll need to know more about the accessible alternative to LaTeX: Markdown.

Markdown #

Markdown is a much easier to learn and intuitive markup language than LaTeX and is becoming more and more ubiquitous. In Markdown, italicized text is created by wrapping the text in *asterisks*, while bold is done with **two asterisks**. There are certain limitations to Markdown; for instance there is no underlining in Markdown so you have to get by with the other ways of styling text. Here is an example of some text in Markdown before being typeset:

# Header 1

You might have some introductory comments

## Sub-Header

Some text with different styling such as *italics* and **bold**.
Remember there is no underlining which is old fashioned anyways, but
you can insert [Hyperlinks](https://en.wikipedia.org/wiki/Hyperlink)
and create tables:

| Movie          | Actor         |
|----------------|---------------|
| Apocalypse Now | Marlon Brando |
| Wizard of Oz   | Judy Garland  |
| Sound of Music | Julie Andrews |

There are also other important aspects of academic writing.^[Like footnotes that are rendered at the footer of the page in which they appear.]

![Here is a figure inserted into a Markdown file.](figures/healy_plaintext_vis.png)

The good thing about Markdown is that it is much easier to use than LaTeX so instead of messing around with a document preamble, document classes, and style files you can get straight to writing your document. I am not going to cover all the different Markdown conventions in this documentation but you can find them here. Markdown is pretty straight forward but you will still want to take advantage of all the fancy features of LaTeX while maintaining the usability and ease of Markdown. Luckily, someone already thought up this idea and created the program pandoc.

pandoc #

pandoc is a command line based program that allows you to convert files between different file formats. It is a pretty powerful program because it will take the markup conventions from one format and translate them into another. So, when you write your next article in Markdown and really emphasize some part (“And to reiterate again, it is the Lumpenproletariat that ultimately must be motivated by a vanguard party to seize the means of production from the Petty bourgeoisie.”) pandoc will convert the text surrounded by *asterisks* (which indicates italics) and converts them to an \textit{} command. pandoc is the key link in the workflow that allows us to convert our simple Markdown files into LaTeX files and then into PDFs and how we are able to accommodate the different resources we need for academic writing, like bibliographies, footnotes, tables, figures, and diagrams.

The basic usage of pandoc looks like the following:

pandoc my_doc.md -o my_doc.pdf

On the left, directly after pandoc is the Markdown file that we want to convert to a PDF file. The -o flag indicates that the my_doc.pdf file is in fact the intended output document from the input. That is pretty simple, but so too will be the output PDF. Compare the line above to this one:

pandoc my_doc.md -o my_doc.pdf --from=Markdown --template=/Users/timothyelder/.pandoc/templates/eisvogel.latex --listings --filter pandoc-xnos

That is the command for rendering the document your are looking at right now. It does the same thing that we saw above, specifying an input file and then an output file, but it includes all sorts of extra flags that indicate how to handle different attributes of your document and the proper formatting for the output. The --from= flag makes it perfectly explicit to the software what the input format is (in this case it is Markdown), while the --template= flag tells the software where to look and which template to use when rendering the output. The final two flags (--listings and --filter pandoc-xnos) indicate the style for code blocks (those shaded areas that display computer code throughout this document) and how to cross reference different parts of the document respectively.

It is important to get clear on what is actually going on when we use the pandoc command to create a PDF. What we are doing can be seen in Figure 2.

Writing in Markdown and typesetting into a pdf with a tex intermediate step.

Figure 2: Writing in Markdown and typesetting into a pdf with a tex intermediate step.

We write our files using the conventions of the Markdown format (indicated by the .md file extension) and then use pandoc to typeset these into a PDF. pandoc is not a typesetting software itself per se but relies upon LaTeX to typeset into PDF. When you invoke pandoc to create a PDF, pandoc performs a quiet intermediate step where it creates a LaTeX copy of the Markdown document (the .tex file in Figure 2) and typesets that into a PDF by invoking the LaTeX program. pandoc is super smart, and understands how to translate the conventions of one markup language into another, and that is how we are able to write in Markdown (with all its simplicity and speed) while still taking advantage of all the power of LaTeX.

We are able to add a few bells and whistles to pandoc to define the specific formatting of the document. We do this partly by manipulating the characters and conventions in our document directly, but also outside our document, by invoking templates and style files that you can install. This can be seen in Figure 3.

There are a few extra little dependencies.

Figure 3: There are a few extra little dependencies.

Document Composition with Markdown #

In this section, I want to explain how you put together the various pieces that we have up to this point only been reading about. We are going to be covering:

  1. Inserting figures
  2. Creating tables.
  3. Including math, both numbered equations and inline math

Writing the Document #

This part is pretty self explanatory. You type characters, which form words, words create phrases and sentences, and you keep doing that until you have something that looks like a defensible argument. Once you have that, you put in all the nifty evidence that you have discovered to support your argument and there are all sorts of things that go into that.

Headers and Metadata #

Where in a LaTeX document you put metadata such as the author name, date, title, abstract, etc. in the preamble of the document, in Markdown you use what is called a YAML (yam-mul) header. Since you don’t want the metadata being represented in the document as it is, you need to contain it in special characters (sort of like how you create an environment in LaTeX with \begin{} and \end{}). Three dashes opens and closes your YAML metadata header like this:

---
title: My First Markdown Paper
author: Timothy Elder
date: February 17th, 2023
---

The information that the YAML header contains is arbitrary though there are a few standard entries like “author”, “title”, “date”, and “abstract”. Because pandoc translates your Markdown document into LaTeX before typesetting, you have the full arsenal of LaTeX options available to you, which means you can change the font, margins, headers and footers among many many others. Too many to cover here but you can review the full set of options in the pandoc documentation here (see the section “Variables for LaTeX”).

Figures #

A figure or image in a Markdown file can be included with the following notation:

![Here is where you put your caption.](/absolute/path/to/image.png){@fig:example-crossref-label}

The image will render on the page around where you insert the line but you will not have exact control over it unless you go edit the intermediate .tex file. In our templates we specify some options that help images render in an attractive way and LaTeX uses an algorithm for image placements when not manually placed. Instead of manually numbering figures and referencing them in the body of your paper, you assign them a label, which follows the call to the image, wrapped in curly brackets with the @fig: prefix. Rather than having to keep track of how your figures are numbered, when you want to reference the figure in the body of your paper, you do so like this:

Here I am explaining the really impressive figure that I made that is going to
definitively demonstrate some strong association between two variables. This
can be seen in Figure @fig:example-crossref-label.

This means of cross-referencing figures is really convenient especially when you decide to reorder your paper, and suddenly figure #5 is where figure #1 was and all the labels would have to be manually changed if you were using something like Word. NOTE: Whenever using pandoc-xnos, include it with --filter pandoc-xnos as a flag for the pandoc command in.

Tables #

Tables are pretty easy in Markdown and can be styled different ways. This is a pretty bare bones table:

Table 1: Some information about Star Wars characters.
Character Height (cm) Mass (kg) Homeworld Species
Han Solo 180 80 Corellia Human
Wedge Antilles 170 77 Corellia Human
Yoda 66 17 NA Yoda's species
Ackbar 180 83 Mon Cala Mon Calamari
Mace Windu 188 84 Haruun Kal Human

This table is created with the following code:

------------------------------------------------------------------------
   Character      Height (cm)   Mass (kg)   Homeworld       Species
---------------- ------------- ----------- ------------ ----------------
    Han Solo          180          80        Corellia        Human

 Wedge Antilles       170          77        Corellia        Human

      Yoda            66           17           NA       Yoda's species

     Ackbar           180          83        Mon Cala     Mon Calamari

   Mace Windu         188          84       Haruun Kal       Human
------------------------------------------------------------------------

Table: Some information about Star Wars characters. {#tbl:id}

You will be able to output from most statistical software into the Markdown format or LaTeX. And tables in Markdown can also be labeled, just like figures, so you can reference them without keeping track of what order they appear in.

Math #

If you are a quantitative scholar and need to describe your modeling strategy you can do that in Markdown. This requires a bit of knowledge of LaTeX: as mentioned above, pandoc converts your Markdown to LaTeX before typesetting, you can use LaTeX conventions when we really need to. This is such an instance. So if you want to explain to someone the standard deviation of a population you would write the formula using LaTeX math commands (\sigma, \sum, \frac, \bar, exponents and subscripts, etc.) on its own line and wrap it in double dollar signs ($$) like this:

$$ \sigma = \sqrt{\frac{ \sum (x_i - \bar{x})^2 } {N - 1}} $$

which renders your math beautifully:

\[\sigma = \sqrt{\frac{ \sum (x_i - \bar{x})^2 } {N - 1}} \]

You can also insert math inline with your text \(\mu = \frac{ \sum_{i=1}^{n} x_i} {n}\) using single dollar signs in line with the rest of your text, like:

When $a \ne 0$, there are two solutions to $(ax^2 + bx + c = 0)$ and
they are $x = {-b \pm \sqrt{b^2-4ac} \over 2a}$.

Citations #

Another key part of academic writing is including citations to other works that inform your research. To do this you will need to use a .bib file which is generated using a bibliographic software like Zotero or Mendeley. I highly recommend using a .bib file because it saves a lot of time formatting and creating the proper list of cited references.

To include citations in your .Rmd file you will need to specify where your .bib file is located by specifying its path in the YAML header at the top of the document. We have an example .bib file included in the directory that contains this source file called plaintext.bib with just a few sources in it. You then reference an item in that .bib file by writing an @ symbol followed by the “citation key” of the item you want to cite, like this @durkheim_division_2008 which renders a citation like this Durkheim ([1893] 2008). All the citations that you use in your document will appear at the end of your paper, and including a # References section at the bottom of your RMarkdown or Markdown document will add a section heading before the bibliography is printed.

There are a variety of ways in which you can customize your bibliography so it matches the proper format of whatever journal you might be submitting to. That will require that you use something called a CSL file, which can also be included in the YAML header at the top of your document. There are a lot of CSL files people have created for the various citation conventions, and there is probably one available for the journal you’ll be submitting to. You can find some here.

Make #

It can be very annoying to continually run the same lines of code over and over again, when you are nearing the completion of working on some document, particularly if you are obsessive and neurotic and messing with some minor part of a paper that you want to get perfect. A very helpful resource that computer scientists thought up was to create a system for handling running code called Make. Make automates the process of, for lack of a better word, making things. This is somewhat advanced when it comes to typesetting documents so if you want to skip this part or skim that is totally fine. I will note that this tool will be particularly helpful if you are a quantitative scholar and have different scripts that handle the creation of data, figures, or tables etc. etc.

Make is particularly powerful because it has a system for checking whether or not a given output file needs to be changed given some change in the input file. For example, say you have a recipe (recipes in Make are the different commands that you want it to run) that creates your document that looks like this:

## Location of Pandoc support files.
PREFIX = /Users/timothyelder/.pandoc

## Location of your working bibliography file
BIB = /Users/timothyelder/Documents/bibs/socbib-pandoc.bib

## CSL stylesheet (located in the csl folder of the PREFIX directory).
CSL = asa

dissertation.pdf: dissertation.md
	pandoc -f --pdf-engine=xelatex --template=$(PREFIX)/templates/ucetd.template --top-level-division=chapter --filter pandoc-xnos --citeproc --bibliography=$(BIB) dissertation.md -o  dissertation.pdf

To typeset the dissertation.pdf file given the recipe in the Makefile, you would simply type into the terminal with the Makefile, make dissertation.pdf and the recipe runs. Make will check to see if any changes have been made in the name of the prerequisite file (the dissertation.md file) which is used to make the PDF. If no change has been made to the Markdown file, then Make will let you know and say make: Nothing to be done for 'dissertation.pdf'.

When you run “pdf”, a couple of python scripts get run in a particular order, then an R script, and then a pandoc line runs and then a shell command. That would be a lot of keystrokes on your own but make does it auto-magically. This is actually a recipe that I have because I have a project where we generate all the data from optical character recognition of JPEGs, and then carving up big plaintext files with regular expressions. Sometimes the underlying data changes because we notice there is some error in the plaintext file and we have to regenerate a whole bunch of CSVs from the plaintext files. What make does is it checks the various pieces of data that you are asking it to use and that you want it to make, and checks whether or not they have changed at all since you last ran Make, and it only runs precesses to update pieces of data as needed. So if you keep running Make without changing anything it will tell you “nothing to update”.

Previous
Next