About
About this project
The reports of the Truth and Reconciliation Commission of Canada (TRC) are vital resources that weave together survivors' testimony and historical records into a full account of how the Canadian government and religious institutions carried out cultural genocide against First Nations, Métis, and Inuit people. They are essential reading for any Canadian, and for anyone interested in North American Indigenous history more generally.
The volumes are available in print and as e-books and PDFs. The report is explicitly public domain, which means that the creators have waived copyright. Anyone can share it and put it in any format, for free. Therefore I've created web and plaintext versions that can be more easily read and shared on any device.
You can get a link to a specific section of the text by right-clicking or long-pressing the chain-link icon beside its heading. Then you can copy and share the link however you like.
More projects like this
If you're wondering, "Wow! Where can I read more crucial reports in a lightweight non-PDF form?", check out Plain Text IPCC.
If you've done something like this, drop me a line and let me know.
How it's made
Tools
- Any terminal application
- poppler's pdfimages and pdftotext (included in poppler-utils in the Debian & Ubuntu repos)
- pandoc
- Any text editor (or a combination of multiple ones) with the following features:
- A PDF viewer such as Evince
- Regex101.com (for interactive testing of regexes)
This site was generated using Datenstrom Yellow with the Karlskrona theme. The stylesheet is based on Perfect Motherfucking Website.
Workflow
Extract images from the PDF into the "images" folder as PNG files:
pdfimages -png Executive_Summary_English_Web.pdf images/img
Note: For some reason, some images are split into multiple parts; reassemble them or take screenshots if necessary. The graphs also have to be captured as screenshots.
Convert the PDF to text with
pdftotext -f [FIRST PAGE] -l [LAST PAGE] -y 60 -W 450 -H 700 -nopgbrk Executive_Summary_English_Web.pdf [file.md]
Note: [FIRST PAGE] and [LAST PAGE] are physical page positions in the PDF file, not the page numbers printed on the pages, which are offset by title pages, prefaces, etc.
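If you know which physical page printed page 1 falls on, the offset is simple arithmetic. Here's a minimal sketch. physical_page is a hypothetical helper, not part of poppler, and the value 15 in the usage below is only an example, so check the real position in your PDF viewer.

```shell
#!/bin/sh
# Hypothetical helper (not part of poppler): convert a page number printed
# on the page into the physical page position that pdftotext's -f and -l
# options expect.
#   $1 = printed page number
#   $2 = physical page on which printed page 1 appears (look it up in a PDF viewer)
physical_page() {
    echo $(( $1 + $2 - 1 ))
}
```

For example, if printed page 1 is the 15th page of the file, you would extract printed pages 23 to 80 with -f "$(physical_page 23 15)" -l "$(physical_page 80 15)". That works out to physical pages 37 to 94.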
Note: Adding the -layout option keeps paragraph breaks but adds unnecessary hyphenation, so you have to either enter paragraph breaks manually or fix the hyphenation manually. Either way, frankly, is a bitch. I conclude that the best way to solve this is to yell at the people responsible while such a report is being prepared, and make them release it in a truly open, semantically marked-up format in the first place.
Format for Markdown. (This is if you're not using the -layout option.)
- Manually enter paragraph breaks.
- Move image captions so they don't interrupt paragraphs. Add images in Markdown format if you like:
![alt text](path/to/image.png)
- Uppercase abbreviations with this handy abbreviation list, with
sed -f path/to/abbr.txt <file.md >new-file.md
or, if you're feeling adventurous, in place with
sed -i -f path/to/abbr.txt file.md
- Mark up footnotes. (Double-check with Regex101.com if necessary to make sure the number of matches equals the number of footnotes.)
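- Use special characters to escape numbers that shouldn't be read as footnotes (e.g. the decimal part of 4.2) and to un-escape numbers that should be, even though they don't follow punctuation. Put the character directly before the number. I use ꙮ (U+A66E) and 👁 (U+1F441) respectively.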
- Footnote regex:
s/(?<=[.?!"”’)👁])(\d+)/[^\1]/
- Then remove the escape/un-escape characters:
s/ꙮ|👁//g
- Justify the rest of the text in nano.
- Manual formatting/markup:
- Mark up blockquotes with > and headings with #s.
- Add italics, etc., as in the original text.
- Wrap the Calls to Action in later chapters in <aside> tags.
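The abbreviation list itself isn't reproduced here. A sed script passed to -f is simply one command per line, though. As a minimal sketch, a file like the one below would work. It assumes GNU sed (the default on Debian and Ubuntu), since the \b word boundary is a GNU extension. The two entries are hypothetical examples, not taken from the actual list.

```shell
#!/bin/sh
# Hypothetical abbr.txt: one sed substitution per line (entries are examples only).
# \b (word boundary) is a GNU sed extension, so "trc" inside a longer
# word is left alone.
cat > abbr.txt <<'EOF'
s/\btrc\b/TRC/g
s/\brcmp\b/RCMP/g
EOF

# Try it on a sample line before running it over a whole chapter.
printf 'the trc report mentions the rcmp\n' | sed -f abbr.txt
```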
Add footnotes.
- Run pdftotext on the appropriate range of endnotes (starting on page 447), as above, and save the output to a separate file.
- Fix line breaks in case of multi-paragraph endnotes, numbers at the beginning of lines, breaks in URLs, etc.
- Mark up endnote numbers with regex:
s/(^\d{1,})\. /\n[^\1]: /
- Justify in nano and append to the appropriate chapter.
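The endnote regex can also be run from the terminal. GNU sed doesn't support \d, so this minimal sketch uses [0-9]+ instead. It assumes GNU sed, which accepts \n in the replacement, and mark_endnotes is a hypothetical name.

```shell
#!/bin/sh
# Sketch: the endnote regex in GNU sed ERE syntax (mark_endnotes is a
# hypothetical helper name). "12. Text" at the start of a line becomes a
# blank line followed by "[^12]: Text"; numbers elsewhere are left alone.
mark_endnotes() {
    sed -E 's/^([0-9]+)\. /\n[^\1]: /' "$@"
}
```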
Generate the table of contents with
pandoc -f markdown --toc -s --wrap=none file.md -o new-file.md
(as per this StackOverflow post.)
Note: This should also throw errors if there's a mismatch in the number of endnotes.
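Independently of pandoc, a quick way to spot that kind of mismatch is to compare the labels used in the text with the labels that are defined. Here's a minimal sketch using standard grep, sed, sort and comm. check_footnotes is a hypothetical name, and the sketch assumes numeric labels like [^12], as produced by the footnote and endnote regexes.

```shell
#!/bin/sh
# Sketch (check_footnotes is a hypothetical helper name): compare footnote
# references ([^N] in the text) with footnote definitions ([^N]: at the
# start of a line) in one Markdown file. Prints labels found on only one
# side: unindented = referenced but never defined, tab-indented = defined
# but never referenced. No output means everything matches.
check_footnotes() {
    refs=$(mktemp) defs=$(mktemp)
    # Drop definition labels first so they are not counted as references.
    sed -E 's/^\[\^[0-9]+]://' "$1" | grep -oE '\[\^[0-9]+]' | LC_ALL=C sort -u > "$refs"
    grep -oE '^\[\^[0-9]+]:' "$1" | sed 's/:$//' | LC_ALL=C sort -u > "$defs"
    LC_ALL=C comm -3 "$refs" "$defs"
    rm -f "$refs" "$defs"
}
```

Running check_footnotes chapter.md prints nothing when every reference has a definition and vice versa.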
To do
- Write and add alt text for images and graphs
- Document how to generate the plaintext version with pandoc
- Document how to generate the HTML version with pandoc (or how to use the Markdown files in a static site generator, Codeberg Pages, etc.)