PDFs are an indispensable (and unavoidable) part of modern scholars’ lives. Unfortunately, all too often, they are also responsible for slowing scholars down. Far too many of the PDFs used by researchers, professors, students, and journals are improperly made or simply not optimized. The result is countless moments lost to repeatedly rotating pages, transcribing passages out of PDFs, searching PDFs by eye for key words and phrases, and waiting far too long for articles to download and print (or worse, waiting for someone else’s document to print).
To fix your PDF library, you need a PDF editor with OCR (Optical Character Recognition) capabilities. I use Adobe Acrobat for three reasons:
- It’s available on both the Apple and PC platforms.
- Adobe’s relationship with higher ed also means you should be able to find the software on many public computers in college and university media centers.
- Adobe offers aggressive academic discounts on their software (as of writing, Acrobat Pro is $119 for academic use and $499 for professional use).
The actual OCR process is surprisingly easy:
- Load the PDF in Adobe Acrobat Pro – note that this is a different program than the Adobe Acrobat Reader
- Under the Document menu, navigate to OCR Text Recognition
- Under that menu choose Recognize Text using OCR…
- The default settings are usually enough for most people (I’ll discuss optimizing them in a moment); just be sure that All Pages is selected and click “OK.”
That’s it. Step away from Acrobat for a few minutes and get a cup of coffee or answer emails. When you come back, your PDF will have a selectable text layer (and, depending on the state in which it began, pages may have been rotated into the correct orientation as well). Be sure to save the file, as the software does not do so automatically.
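If your library is large, it can help to know which files still lack a text layer before you open them one by one. The Python sketch below is only a rough heuristic and is not part of Acrobat: it assumes that an image-only scan contains no font resources, while a PDF with selectable text (including one that has been through OCR) references at least one font. The function names `looks_text_searchable` and `pdfs_needing_ocr` are made up for this illustration, and encrypted or unusually structured PDFs may be misjudged.

```python
import re
import zlib
from pathlib import Path

# Matches the raw bytes between "stream" and "endstream" in a PDF file.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def looks_text_searchable(pdf_bytes: bytes) -> bool:
    """Heuristic guess: True if the PDF references any font resource.

    Assumption: scanned, image-only pages carry no /Font entries,
    while pages with selectable text always need one.
    """
    if b"/Font" in pdf_bytes:
        return True
    # Newer PDFs may store their dictionaries inside Flate-compressed
    # object streams, so try inflating each stream and look again.
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example, a JPEG page image)
        if b"/Font" in inflated:
            return True
    return False


def pdfs_needing_ocr(folder: str) -> list[Path]:
    """List the PDFs under `folder` that appear to lack a text layer."""
    return sorted(
        path
        for path in Path(folder).rglob("*.pdf")
        if not looks_text_searchable(path.read_bytes())
    )
```

Calling `pdfs_needing_ocr` on a folder of articles (the folder path is whatever you use on your own machine) returns the files worth running through the steps above. Treat the result as a to-do list to spot-check, not a verdict: a scan with a stamped header, for instance, may contain a font and still need OCR.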
If you’re interested in the specific nuts and bolts of the OCR process, or want to learn how to further optimize your PDFs via OCR, I’ve written a short white paper that you can download (it’s a PDF with selectable text!).