About

About this project

The reports of the Truth and Reconciliation Commission of Canada (TRC) are vital resources that weave together survivors' testimony and historical records into a full account of how the Canadian government and religious institutions carried out cultural genocide against First Nations, Métis, and Inuit people. They are essential reading for every Canadian, and for anyone interested in North American Indigenous history more generally.

The volumes are available in print and as e-books and PDFs. The report is explicitly in the public domain, which means its creators have waived copyright: anyone can share it and convert it to any format, for free. So I've created web and plaintext versions that are easier to read and share on any device.

You can get a link to a specific section in the text by right-clicking or long-pressing the chain-link icon beside the heading. Then you can copy and share the link however you like.

More projects like this

If you're wondering, "Wow! Where can I read more crucial reports in a lightweight non-PDF form?", check out Plain Text IPCC.

If you've done something like this, drop me a line and let me know.

How it's made

Tools

Any terminal application
poppler's pdfimages and pdftotext (included in poppler-utils in the Debian & Ubuntu repos)
pandoc
Any text editor (or a combination of multiple ones) with the following features:
- regex search-and-replace
- justifying text (that is, merging lines with no blank lines between them)
- easily formatting Markdown
I have used Code (which has a Markdown shortcut extension), nano (primarily for its Justify feature, which Code lacks), and tilde.
A PDF viewer such as Evince
Regex101.com (for interactive testing of regexes)
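If you'd rather script the justify step than do it in an editor, merging the lines of each paragraph takes only a few lines of Python. This is a minimal sketch, not part of the workflow described here. The function name is made up, and the sketch assumes paragraphs are already separated by blank lines.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Merge the lines of each paragraph into a single line.

    Roughly what nano's Justify does with a very wide fill width:
    paragraphs are separated by one or more blank lines, and the lines
    inside each paragraph are joined with single spaces.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    merged = [" ".join(line.strip() for line in p.splitlines()) for p in paragraphs]
    return "\n\n".join(merged) + "\n"
```

It joins every run of non-blank lines, so it would also flatten Markdown lists and blockquotes. Run it before adding that kind of markup.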

This site was generated using Datenstrom Yellow with the Karlskrona theme. The stylesheet is based on Perfect Motherfucking Website.

Workflow

Extract images from the PDF into the "images" folder as PNG files: pdfimages -png Executive_Summary_English_Web.pdf images/img

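Note: Some images are, for some reason, split into multiple parts; reassemble them or take a screenshot if necessary. You also need to take screenshots of the graphs, presumably because they're drawn as vector graphics and pdfimages only extracts embedded raster images.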
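Convert the PDF to text with the following command, where -y, -W and -H crop the area of each page that gets converted (presumably to leave out running headers and page numbers) and -nopgbrk omits the form-feed characters pdftotext otherwise inserts between pages: pdftotext -f [FIRST PAGE] -l [LAST PAGE] -y 60 -W 450 -H 700 -nopgbrk Executive_Summary_English_Web.pdf [file.md]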

Note: The page numbers are the physical page positions in the PDF file, not the page numbers printed on the pages, which are offset by the title pages, prefaces, etc.

Note: Adding the -layout option keeps paragraph breaks but leaves in unnecessary hyphenation, so you have to either enter paragraph breaks manually or fix the hyphenation manually. Either way, frankly, is a bitch. I conclude that the best solution is to yell at the people responsible while such a report is being prepared, and make them release it in a truly open, semantically marked-up format in the first place.
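If you do use -layout, most of the line-end hyphens can be rejoined with a regex instead of by hand. Here is a rough Python sketch; the function name is hypothetical. It assumes that a hyphen at the very end of a line always marks a word split by the line wrap.

```python
import re

# A word character, a hyphen at the end of the line (plus any trailing
# spaces), the newline, any indentation -layout added, then the rest of the word.
_LINE_END_HYPHEN = re.compile(r"(\w)-[ \t]*\n[ \t]*(\w)")


def dehyphenate(text: str) -> str:
    """Rejoin words split across lines, e.g. "resi-" + newline + "dential"."""
    return _LINE_END_HYPHEN.sub(r"\1\2", text)
```

That assumption doesn't always hold. A compound like "self-government" that happens to break at its own hyphen would lose it, so skim the result.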
Format the text as Markdown. (This assumes you didn't use the -layout option.)
- Manually enter paragraph breaks.
- Move image captions so they don't interrupt paragraphs. Add images in Markdown format if you like: ![alt text](path/to/image.png)
- Uppercase abbreviations using this handy abbreviation list as a sed script: sed -f path/to/abbr.txt <file.md >new-file.md, or, if you're feeling adventurous, sed -i -f path/to/abbr.txt file.md to edit the file in place.
- Mark up footnotes. (If necessary, double-check on Regex101.com that the number of matches equals the number of footnotes.)
  - Use special characters to mark numbers that shouldn't be read as footnotes (escape) and numbers that should (un-escape); I use ꙮ (U+A66E) and 👁 (U+1F441) respectively.
  - Footnote regex: s/(?<=[.?!"”’)👁])(\d+)/[^\1]/g
  - Then remove the escape/un-escape characters: s/ꙮ|👁//g
- Justify the rest of the text in nano.
- Manual formatting/markup:
  - Mark up blockquotes with > and headings with #s.
  - Add italics, etc., as in original text.
  - Wrap the Calls to Action in later chapters in <aside> tags.
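The footnote step can also be run outside an editor. Python's re module accepts the same fixed-width lookbehind, so a sketch might look like the one below; the names are hypothetical and it uses the same escape characters as the list above. It also returns the number of matches, so you can compare it against the number of footnotes.

```python
import re

ESCAPE = "\ua66e"        # ꙮ (U+A66E): marks a number that is NOT a footnote
UNESCAPE = "\U0001f441"  # 👁 (U+1F441): marks a number that IS a footnote

# Digits right after sentence-ending punctuation, a closing quote or
# parenthesis, or the un-escape marker become footnote references.
FOOTNOTE = re.compile(r'(?<=[.?!"”’)' + UNESCAPE + r'])(\d+)')


def mark_footnotes(text: str) -> tuple[str, int]:
    """Turn inline footnote numbers into [^n] references.

    Returns the marked-up text and the number of references created,
    which should equal the number of footnotes in the chapter.
    """
    marked, count = FOOTNOTE.subn(r"[^\1]", text)
    # Strip the escape/un-escape markers once they have done their job.
    marked = marked.replace(ESCAPE, "").replace(UNESCAPE, "")
    return marked, count
```

For example, "version 2.ꙮ0" comes out as plain "version 2.0", while "table👁3" becomes "table[^3]".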
Add footnotes.
- Run pdftotext on the appropriate range of the endnotes (which start on page 447), as above, and save the output to a separate file.
- Fix line breaks in multi-paragraph endnotes, numbers that happen to fall at the beginning of a line (which the next regex would mistake for endnote numbers), URLs broken across lines, etc.
- Mark up endnote numbers with regex: s/(^\d{1,})\. /\n[^\1]: /
- Justify in nano and append the result to the appropriate chapter.
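The endnote regex can be run in Python too, as a sketch; the function name is hypothetical. Note the re.MULTILINE flag: editors usually apply ^ to every line, but Python only does so with that flag.

```python
import re

# A line that starts with a number, a period and a space, e.g. "12. Ibid."
ENDNOTE_NUMBER = re.compile(r"^(\d+)\. ", re.MULTILINE)


def mark_endnotes(text: str) -> str:
    """Turn "12. Source..." lines into Markdown footnote definitions."""
    # The leading "\n" leaves a blank line before each definition.
    return ENDNOTE_NUMBER.sub(r"\n[^\1]: ", text)
```

Continuation lines (like "continued." in a multi-line note) are left alone, so justify afterwards as usual.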
Generate the table of contents with pandoc -f markdown --toc -s --wrap=none file.md -o new-file.md (as per this Stack Overflow post).

Note: This should also warn you if there's a mismatch in the number of endnotes.
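If you'd rather not rely on pandoc's warnings, a quick cross-check of references against definitions can be sketched in Python. This is a hypothetical helper, not something pandoc provides. It treats [^n]: at the start of a line as a definition and any other [^n] as a reference.

```python
import re

# "[^label]:" at the start of a line is a footnote definition...
DEFINITION = re.compile(r"^\[\^([^\]]+)\]:", re.MULTILINE)
# ...any other "[^label]" is a reference. (A reference directly followed
# by ":" in running text would be skipped; that's rare enough to ignore.)
REFERENCE = re.compile(r"\[\^([^\]]+)\](?!:)")


def check_notes(markdown: str) -> tuple[set[str], set[str]]:
    """Return (labels referenced but never defined, labels defined but never used)."""
    defined = set(DEFINITION.findall(markdown))
    referenced = set(REFERENCE.findall(markdown))
    return referenced - defined, defined - referenced
```

Both sets should come back empty for a finished chapter.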

To do

Write and add alt text for images and graphs
Document how to generate the plaintext version with pandoc
Document how to generate the HTML version with pandoc (or how to use Markdown files in a static site generator, Codeberg Pages, etc.)