Skip to content

Format and Syntax Reference

Complete reference for all supported formats, characters, and special syntax in NormaDocs.

Input Formats

Format Extension Description
Markdown .md Primary input format
BibTeX .bib Bibliography/citation files
CSL .csl Citation Style Language files

Output Formats

Format Generator Notes
DOCX Pandoc Always generated
PDF LibreOffice or WeasyPrint Optional via -f pdf or -f all

YAML Frontmatter

YAML frontmatter is the primary method for providing document metadata. It sits between --- delimiters at the top of the Markdown file.

Syntax

---
title: "Document Title"
author: "Author Name"
date: 2024
---

Supported Fields

Field Type Default Description
title string "Untitled" Document title
subtitle string None Optional subtitle
author string None Author's name
affiliation string None Department, institution
program string None Academic program name
ficha string None Course/sheet number (ERS format)
institution string None University or institution name
center string None Regional center name
instructor string None Instructor/advisor name
date string None Date (ISO format YYYY-MM-DD or year only)
short_title string None Short title for APA running head
author_id string None Student code (ICONTEC)
location string None City, Country (ICONTEC)
extra dict {} Any additional fields stored in metadata

Example: APA 7 Document

---
title: "Effects of Machine Learning on Healthcare Outcomes"
subtitle: "A Systematic Review"
author: "Jane M. Smith"
affiliation: "Department of Computer Science, MIT"
program: "PhD in Health Informatics"
institution: "Massachusetts Institute of Technology"
date: 2024-04-15
short_title: "ML IN HEALTHCARE"
keywords:
  - machine learning
  - healthcare
  - systematic review
---

Example: ICONTEC Document

---
title: "Título del Proyecto de Investigación"
subtitle: "Subtítulo si aplica"
author: "Nombre del Autor"
author_id: "Código de estudiante"
program: "Nombre del Programa Académico"
center: "Nombre del Centro Regional"
institution: "Nombre de la Universidad"
location: "Ciudad, País"
date: 2024
---

Example: ERS Document (Legacy Format)

For documents without YAML frontmatter, the preprocessor extracts metadata from the first 14 lines:

**Título del Proyecto ERS**

Nombre del Autor

Nombre del Programa

Número de Ficha

Nombre de la Institución

Nombre del Centro

Fecha

Special Syntax

Page Breaks

Insert a page break using raw OpenXML blocks:

```{=openxml}
<w:p>
  <w:r>
    <w:br w:type="page"/>
  </w:r>
</w:p>
```

Page breaks are automatically inserted before every H1 (#) heading.

Running Head (APA 7)

The running head appears in the left header. Enable it by setting short_title in frontmatter:

short_title: "SHORT TITLE IN UPPERCASE"

The running head consists of "SHORT TITLE: " prefix followed by the short title in uppercase.

Keywords

Keywords in the document body use this format:

**Keywords:** word1, word2, word3

or in Spanish (ICONTEC):

**Palabras clave:** word1, word2, word3

Citations (BibTeX)

Use Pandoc-style citations with a BibTeX file:

The study found that... [@smith2024].

Multiple sources state this [@smith2024; @jones2023].

Table of Contents (TOC) Entries

Lines in the format 1. ... ... 5 (with dots and page number) are automatically escaped to prevent Pandoc from converting them to ordered lists:

Resumen ..................................................... 3
Objetivos ................................................... 4

These become:

1\. Resumen ..................................................... 3
2\. Objetivos ................................................... 4

Multiline Tables

Pandoc multiline tables (dashed-line format) are automatically converted to pipe tables for better compatibility:

Input (Pandoc multiline):

+------------+------------+
| Header 1   | Header 2   |
+============+============+
| Cell 1     | Cell 2     |
+------------+------------+

Output (pipe table):

| Header 1 | Header 2 |
| --- | --- |
| Cell 1 | Cell 2 |

Box-Drawing Characters

ASCII art using box-drawing characters is preserved:

┌─────────────────┐
│                 │
│   Content here  │
│                 │
└─────────────────┘

Supported Characters

NormaDocs supports full UTF-8 character encoding including:

Spanish Characters

Character Example
ÁÉÍÓÚ Ü Ángeles, Pérez, María
áéíóú ü cámara, número, mínimo
Ñ España, Niño
ñ araña, dueño
¡ ! ¡Advertencia!, ¡Hola!
¿ ? ¿Qué tal?

Latin-1 Supplement

All Latin-1 (ISO 8859-1) characters are supported, including:

  • French: àâäçéèêëîïôùûüÿœæ
  • German: äöüß
  • Portuguese: ãõñç
  • Italian: àèéìíòóù
  • Scandinavian: åæøÅÆØ

Special Symbols

  • Degree symbol: °
  • Plus-minus: ±
  • Multiply: ×
  • Divide: ÷
  • Bullet: •
  • Trademark: ™ ® ©
  • Currency: € £ ¥ $

Math Symbols

  • Greek letters: α β γ δ ε λ μ π σ Ω
  • Operators: ≤ ≥ ≠ ≈ ∞ ± − × ÷
  • Sets: ∈ ∉ ⊂ ⊃ ∪ ∩

Code Image Generation (Optional)

NormaDocs can transform code blocks into beautifully styled images for academic documents. This uses the monokai theme by default with syntax highlighting.

Syntax

Add {code} after the language identifier to mark a code block for image conversion:

```python {code}
def hello():
    print("Hello, World!")
### Supported Languages

Pygments supports 300+ languages including: `python`, `javascript`, `typescript`, `java`, `c`, `cpp`, `csharp`, `go`, `rust`, `ruby`, `php`, `swift`, `kotlin`, `scala`, `sql`, `html`, `css`, `json`, `yaml`, `xml`, `bash`, `shell`, `markdown`, `latex`, and many more.

### How It Works

1. **Detection**: The preprocessor identifies `{code}` marked blocks
2. **Highlighting**: Pygments generates styled HTML with syntax highlighting
3. **Rendering**: imgkit converts the HTML to a PNG image
4. **Replacement**: The code block is replaced with a markdown image reference
5. **Embedding**: Pandoc embeds the image in the final DOCX

### Installation

```bash
pip install normadocs[codeimage]

This installs pygments>=2.17.0 and imgkit>=1.0.0.

Note: imgkit requires wkhtmltopdf system library:

sudo apt install wkhtmltopdf      # Debian/Ubuntu
sudo dnf install wkhtmltopdf      # Fedora
brew install wkhtmltopdf          # macOS

Example

Input:

# Results

The implementation follows this pattern:

```python {code}
class DataProcessor:
    def __init__(self, config: dict):
        self.config = config
        self.cache = {}

    def process(self, data: list) -> list:
        return [self.transform(item) for item in data]
```

Output: A styled PNG image with syntax-highlighted Python code embedded in the DOCX.

Theme

Code images use the monokai theme (dark background with colorful syntax highlighting). This provides excellent contrast and readability for printed documents.

File Output

Generated images are saved to ./code_images/ directory next to the output DOCX, with names like: - code_image_001_python_a1b2c3d4.png - code_image_002_javascript_e5f6g7h8.png

Images are cached based on content hash, so repeated conversions of the same code don't regenerate the image.

Preprocessing Features

Hard-Wrapped Line Joining

The preprocessor joins consecutive lines that were hard-wrapped at approximately 72 characters into single paragraphs. This is transparent to the user.

Before:

This is a very long paragraph that was originally
wrapped at 72 characters and needs to be joined
back together for proper formatting.

After:

This is a very long paragraph that was originally wrapped at 72 characters and needs to be joined back together for proper formatting.

HTML Entity Encoding

The # character in titles is encoded as &#35; to prevent Pandoc from interpreting it as a heading marker:

Title with # hashtag

becomes safe content.

Markdown Extensions

OpenXML Blocks

Raw OpenXML can be inserted using the openxml output format:

```{=openxml}
<w:p>
  <w:r>
    <w:br w:type="page"/>
  </w:r>
</w:p>
```

Attributes on Blocks

Pandoc attributes syntax is supported for images and other elements:

![Caption](image.png){width=50%}

Validation

LanguageTool Integration

NormaDocs can check grammar using LanguageTool:

# Basic check
normadocs document.md --language-tool

# Strict mode (fail on any error)
normadocs document.md --language-tool --lt-strict

# With Docker (auto-starts container)
normadocs document.md --language-tool --lt-docker

# Save errors to file
normadocs document.md --language-tool --lt-report errors.json

Standards Comparison

Feature APA 7th ICONTEC IEEE 8th
Body font Times New Roman 12pt Arial 12pt Times New Roman 10pt
Line spacing Double 1.5 Single
Top/bottom margins 1" 3cm 1"
Left margin 1" 3cm 1"
Right margin 1" 2cm 1"
Table caption prefix Table Tabla Table
Figure caption prefix Figure Figura Fig.
Running head Yes No No
Abstract page No Yes No
Keywords section Yes Yes Yes