I wanted to see if I could cut the amount of paper lying around in my home and bring some sort of order to the chaos. Every week I get a collection of semi-important documents, from council tax statements to water bills. I probably need to keep all of them, but as I don’t want to rent a storage unit, something needs to be done with them.
Scanners and MFPs will let you scan and store to your computer with relative ease. However, even if you are sensible about filing these scans and you name them something intelligible, you are still going to find yourself hunting for the right file at the right time.
I’ve built myself a workflow that takes the scans I make and automatically runs OCR (Optical Character Recognition) against them, re-saving each one with the machine-readable text embedded in the PDF. This means that if I run a Spotlight search, or search for the text in Dropbox, I’m much more likely to get something useful back.
So, this post is a bit of a tutorial on how to automate the tedious workflow of scanning, OCR and then filing documents. I’ve built this for my own purposes, but no doubt it could be adapted for a professional workflow.
You will need four things. First, the HP Envy 7640, which has a built-in automatic document feeder that lets you scan multiple pages in one job; it’s also smart enough to scan double-sided pages in order. Secondly, Prizmo 3, which is an incredibly capable OCR tool. Thirdly, Hazel, which can watch directories for file changes and then take actions. Finally, Dropbox♠ as the place to store these files so they can be accessed later.
Install the printer, buy Prizmo 3 (upgrade it) and Hazel.
Create a directory where new scans will land, ready to be picked up for processing.
Create a directory for scans to be sent to once processed:
/Users/myusername/Dropbox (Personal)/Scans (Processed)
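If you would rather script the two directories than click through Finder, here is a minimal Python sketch. The incoming folder name "Scans" and the helper name make_scan_dirs are my own assumptions for illustration; only the "Scans (Processed)" name comes from the path above.

```python
from pathlib import Path


def make_scan_dirs(dropbox_root):
    """Create the incoming and processed scan folders under dropbox_root.

    "Scans" is an assumed name for the incoming folder;
    "Scans (Processed)" matches the path used in this post.
    """
    root = Path(dropbox_root).expanduser()
    incoming = root / "Scans"
    processed = root / "Scans (Processed)"
    for folder in (incoming, processed):
        # exist_ok makes the call safe to repeat if the folders already exist
        folder.mkdir(parents=True, exist_ok=True)
    return incoming, processed
```

Calling make_scan_dirs("/Users/myusername/Dropbox (Personal)") would give you both folders side by side; swap in your own username and Dropbox location.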
Install the Automator workflow that will OCR the files and output them to a directory of your choosing.
Download this Automator workflow here
You have to add three Hazel rules, in the correct order, so that the files are processed and tidied up correctly. The first rule, Scan OCR Processing, grabs only PDFs which have not been processed yet. It then sends them to Prizmo via the Automator workflow and tags the file as Processed.
We tag the file so that we don’t end up in a loop where it keeps getting sent to Prizmo to be OCR’d. We then ask Hazel to trash any file with the tag Processed. (The tag is applied after the workflow completes, which stops the file being deleted midway through!)
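To make the ordering a little more concrete, here is a rough Python sketch of the logic behind the two rules described above. It is not how Hazel works internally, just a model of the idea: run_rules, the ocr callable (standing in for the Automator and Prizmo step) and the processed_tags set (standing in for the Processed tag) are all hypothetical names I made up for illustration.

```python
from pathlib import Path


def run_rules(scans_dir, out_dir, ocr, processed_tags):
    """Run one pass over the scans folder and return the names trashed.

    ocr            -- callable(src, dst), a stand-in for the OCR workflow
    processed_tags -- set of file names, a stand-in for the Processed tag
    """
    scans_dir, out_dir = Path(scans_dir), Path(out_dir)

    # Rule 1: only PDFs that have not been tagged yet get sent to OCR.
    for pdf in sorted(scans_dir.glob("*.pdf")):
        if pdf.name in processed_tags:
            continue
        ocr(pdf, out_dir / pdf.name)
        # The tag is added only after OCR returns, so a failed or
        # unfinished run leaves the file untagged and therefore untouched.
        processed_tags.add(pdf.name)

    # Rule 2: anything tagged as processed is cleared out of the folder.
    trashed = []
    for pdf in sorted(scans_dir.glob("*.pdf")):
        if pdf.name in processed_tags:
            pdf.unlink()  # stand-in for Hazel moving the file to the Trash
            trashed.append(pdf.name)
    return trashed
```

The point of the sketch is the order of operations: because the tag only appears once the OCR step has finished, the clean-up rule can never remove a scan that is still being worked on.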
The Hazel rules are here, and you need to apply them to your Scans folder.
Relax in the knowledge that all your paper correspondence is safe in your Dropbox.
Now you just need to file them… haha
♠ I work for Dropbox, they are great.