Preserve Nostalgic Texts With OCR

Nowadays, most of the texts we write for school, college, work, or ourselves are digital, if not online. Not too long ago, however, it was common to use a typewriter to put your thoughts, memories, and work to paper, or at least to print a document and delete the file afterward. As a result, many old texts now exist only on paper, just like a couple of short stories I recently found, typed by my grandmother.

Paper, however, is fragile. Ink fades and paper decays, and documents can be lost or destroyed while moving, in a house fire, and so on. Thus, as a “digital native”, I felt the urge to digitize these nostalgic memories from my family’s past.

In case you are in a similar situation, or find yourself in one someday, allow me to show you the easy way I went about it.

How To Digitize Old Texts?

First, we have to take a look at the texts. In my case, they were typed on a typewriter, which makes them perfect for digital preservation. Handwritten texts, unfortunately, can only rarely be interpreted correctly by so-called OCR (Optical Character Recognition) operations.

Improve The Scan Quality

The scans I made were of good quality. To ensure this, you can increase the contrast between text and background, either while scanning or afterward. Some letters were still a bit too faded in my case, but this can be corrected later on.
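
If you prefer to adjust the contrast yourself after scanning, the idea can be sketched in a few lines of Python. This is only a minimal illustration, not what any particular scanner or converter does: it assumes the scan is already available as a list of grayscale values from 0 (black) to 255 (white), and the function name stretch_contrast and its percentile parameters are made up for this example.

```python
def stretch_contrast(pixels, low_pct=2, high_pct=98):
    """Linearly rescale grayscale values (0-255) to use the full range.

    Hypothetical helper for illustration: `pixels` is assumed to be a flat
    list of integers, one per pixel of a grayscale scan.
    """
    if not pixels:
        return []
    ordered = sorted(pixels)
    # Use percentile cut-offs so a few stray specks do not define the range.
    lo = ordered[(len(ordered) - 1) * low_pct // 100]
    hi = ordered[(len(ordered) - 1) * high_pct // 100]
    if hi == lo:
        # Completely flat image: nothing to stretch.
        return list(pixels)
    scale = 255 / (hi - lo)
    # Map lo to 0 and hi to 255, clamping anything outside the cut-offs.
    return [min(255, max(0, round((p - lo) * scale))) for p in pixels]


if __name__ == "__main__":
    # Faded ink (100) on grayish paper (160) becomes black on white.
    print(stretch_contrast([100, 120, 140, 160], low_pct=0, high_pct=100))
```

With the cut-offs set to 0 and 100, the darkest value is mapped to pure black and the lightest to pure white. The default cut-offs ignore the most extreme few percent of pixels, which tends to help when a scan contains dust or dark page edges.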

Some of my scans were also a bit crooked. This is not a problem for most OCR programs. If you want to make sure to get the very best quality, though, you can either scan them again or use a “deskew” operation on your file. Online Convert offers such a correction of misaligned scans when converting to PDF, for example.

Convert Your Scans Or Images To Text

With the preliminaries out of the way, all that is left is to extract the texts. To do so, you have a number of document formats to choose from. Below, I introduce the three that are most relevant in this case.

Convert To TXT
TXT is a simple format that contains nothing but text. No formatting, no images. If you simply want to pull the text from a scan or image, this is your best option since the files are small and can be opened in any writing program.

Convert To Word
The advantage of Word documents is that the OCR operation will try to retain the formatting of the original as well as possible. This includes graphics or images that are part of the scan. Converting to DOCX or DOC is perfect for users of Microsoft Word.

Convert To LibreOffice
If you want to retain formatting and images but are not using Microsoft Word, the ODT format is the right thing for you. Many open-source writing programs such as LibreOffice and OpenOffice support this format a lot better than Word documents.

Of course, you can also choose other formats such as HTML, RTF, or PowerPoint, but those above should prove the most useful.

Don’t Forget OCR

Once you have uploaded your scans or images to one of the converters mentioned above, make sure that the OCR option is enabled. On Online Convert, you can do so by ticking the box next to the operation in the optional settings.

Check The Results

Technology is not infallible. Thus, it’s important to go through your converted documents and correct unrecognized characters, misplaced line breaks, and the like.
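
Part of this proofreading can be prepared with a small script if you converted to TXT. The following Python sketch is an assumption about what typical clean-up might look like, not a feature of any converter: the helper names tidy_ocr_text and suspicious_words are invented for this example, and the "letters mixed with digits" check is just a simple heuristic for spotting likely misreadings.

```python
import re


def tidy_ocr_text(raw):
    """Clean up common line-break artifacts in OCR output (illustrative only)."""
    # Rejoin words split by a hyphen at the end of a line: "grand-\nmother".
    # Caveat: this also joins genuine compounds such as "well-\nknown".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Turn single line breaks into spaces; blank lines stay as paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces or tabs left over from the scan layout.
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()


def suspicious_words(text):
    """Return words that mix letters and digits, e.g. "st0ry" (a heuristic)."""
    return [
        word
        for word in re.findall(r"\w+", text)
        if re.search(r"[A-Za-z]", word) and re.search(r"\d", word)
    ]


if __name__ == "__main__":
    sample = "My grand-\nmother wrote\nthis st0ry.\n\nThe end."
    cleaned = tidy_ocr_text(sample)
    print(cleaned)
    print(suspicious_words(cleaned))
```

A script like this only points you to the places worth a second look. The final read-through still has to be done by a person.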

The good thing is that reading through such nostalgic texts is actually fun and I wanted to read my grandma’s stories anyway.
