Bulk PDF to Excel XLSX/CSV Converter - Textricator Tutorial

0:00 Look at PDF files
1:26 Textricator
3:52 Setting up Java (if missing)
6:26 Combining PDF files with PdfSam Basic
7:37 Extracting list of all PDF fields to MS Excel, LibreOffice Calc
8:37 Understanding the spreadsheet
9:49 YML (YAML) Textricator config file
11:48 Writing a "working" configuration
16:21 States in the finite-state-machine
22:53 Extracting Invoice ID from PDF files
25:10 How to extract Date from multiple PDF pages
29:29 Matching positions of text in PDF files
34:21 Price sum and tax PDF matching by position
35:44 PDF files extracted to XLSX or CSV

1. prep

Get Textricator from https://textricator.mfj.io/ or, better, from https://github.com/measuresforjustice/textricator

If Java is not installed (or not on the PATH), the Windows command prompt reports:

'java' is not recognized as an internal or external command,
operable program or batch file.
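
A quick way to check for Java before running Textricator, as a minimal Python sketch. The helper names find_java and java_version_banner are made up for this example; the only assumption is that Java, if installed, is reachable through the PATH.

```python
import shutil
import subprocess


def find_java():
    """Return the full path of the `java` executable on PATH, or None."""
    return shutil.which("java")


def java_version_banner(java_path):
    """Return the text printed by `java -version`."""
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(
        [java_path, "-version"], capture_output=True, text=True
    )
    return result.stderr.strip() or result.stdout.strip()


if __name__ == "__main__":
    path = find_java()
    if path is None:
        print("java not found on PATH; install a JDK or use a portable one")
    else:
        print(path)
        print(java_version_banner(path))
```

If this prints "java not found", the shell will show the "not recognized" message above when Textricator is started.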

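Java 8 is not enough. Textricator is compiled as class file version 55.0, which is Java 11, so it needs Java 11 or newer: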

Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.UnsupportedClassVersionError: io/mfj/textricator/cli/TextricatorCli has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
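
The class file versions in that message map to Java releases by subtracting 44 from the major version. A small Python sketch of that rule follows; the function name is made up for this example.

```python
def java_release_for_class_version(major):
    """Map a class file major version to its Java release number."""
    # The "major - 44" rule holds from Java 5 (major version 49) onward.
    if major < 49:
        raise ValueError("versions before Java 5 use a different naming scheme")
    return major - 44


if __name__ == "__main__":
    # 55.0 is what Textricator was compiled for, 52.0 is what Java 8 supports.
    for major in (55, 52):
        print(major, "-> Java", java_release_for_class_version(major))
```

So "class file version 55.0" means the program was built for Java 11, and "up to 52.0" means the installed runtime is Java 8.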

Get portable Java: https://portapps.io/app/oracle-jdk-portable/#download
Get PDFsam Basic: https://pdfsam.org/download-pdfsam-basic/

Textricator errors:

Exception in thread "main" com.fasterxml.jackson.databind.exc.MismatchedInputException: No content to map due to end-of-input

Exception in thread "main" com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot construct instance of `java.util.LinkedHashMap` (although at least one Creator exists): no String-argument constructor/factory method to deserialize from String value ('invoiceid invoiceid')

Exception in thread "main" java.lang.Exception: Page 1 at "Invoice" - no valid transition from INIT

Exception in thread "main" java.lang.Exception: No type containing value type "INIT"

Exception in thread "main" java.lang.Exception: did not reach EOF
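
The "no valid transition from INIT" error makes more sense with the finite-state-machine idea from the video in mind: each piece of text must match a transition out of the current state. Below is a toy Python sketch of that idea only. It is not Textricator's code or config format, and the state names and conditions are invented for illustration.

```python
# Toy state machine; state names and conditions are hypothetical.
TRANSITIONS = {
    "INIT": [(lambda text: text.startswith("Invoice"), "INVOICE_LABEL")],
    "INVOICE_LABEL": [(lambda text: text.isdigit(), "INVOICE_ID")],
    "INVOICE_ID": [(lambda text: text.startswith("Invoice"), "INVOICE_LABEL")],
}


def run(texts, state="INIT"):
    """Walk the texts in order and return the (state, text) pairs reached."""
    visited = []
    for text in texts:
        # Take the first transition whose condition matches this text.
        for condition, next_state in TRANSITIONS[state]:
            if condition(text):
                state = next_state
                break
        else:
            # No condition matched: the machine is stuck in the current state.
            raise ValueError(f'at "{text}" - no valid transition from {state}')
        visited.append((state, text))
    return visited


if __name__ == "__main__":
    print(run(["Invoice", "1001", "Invoice", "1002"]))
```

In this sketch, feeding a text that the INIT state has no matching condition for raises the same kind of message, which is the cue to add or fix a transition for that state.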

Project files: click the "Code" button at the top right and then "Download ZIP" at https://github.com/qubodup/TextricatorSample/tree/main