Scott Murray

Tools for Extracting Data From PDFs

Last updated 2015 September 10

It used to be that once data was published in PDF form (on a government website, for example), it was as good as dead. Fortunately, lots of smart people have been developing new tools to help us extract tables of data from PDFs and export them in structured, usable formats (like CSV).

Here are the tools I’ve found to be useful. Results may vary, as each tool has its own strengths and weaknesses; try them all to see what works best for your document. (If you know of others, please let me know.)

If you’re curious why it’s so difficult to pull data out of PDFs, you might enjoy this read from ProPublica.



PDF Converter
Free, but limited to 2 pages and 10 files total, with a 30-minute delay for processing

Nitro Cloud
Convert 5 documents for free

Free download for Windows, Mac, Linux; also see “Introducing Tabula