I used Node.js scripts to extract research data from Word and PDF files and import it into Google Sheets. This made it easier for the Foundation for Middle East Peace (FMEP) to analyse the data and share it with journalists, researchers, and activists.
For several years, FMEP has tracked efforts in the US to suppress opposition to America's support for Israeli policies. Much of the data on state and federal legislation, however, was distributed in tables that were buried inside PDF files. As a result, the information was difficult for journalists and researchers to analyse.
I wrote Node.js scripts to parse and extract the data from these tables. Much of the data could be extracted with basic text parsing and regular expressions. However, some important information was expressed only as colour-coding within the table. To extract this, I converted the document into a Google Doc and used the googleapis library to walk through the table cells in the document structure. This allowed me to compile the records into a CSV file that I could then import into a new Google Sheet.
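The colour-coding step described above can be sketched roughly as follows. This is a minimal illustration rather than the original script: it assumes the document JSON has already been fetched (for example with the googleapis Docs client's documents.get call), and the colour-to-status mapping, the choice of the first cell as the colour-coded one, and all function names are hypothetical.

```javascript
// Hypothetical mapping from a cell's background colour (hex) to a status label.
const STATUS_BY_COLOUR = { '#00ff00': 'passed', '#ff0000': 'failed' };

// Convert a Docs API rgbColor ({red, green, blue} in 0..1; missing channels are 0) to hex.
function toHex(rgb = {}) {
  const channel = (v = 0) => Math.round(v * 255).toString(16).padStart(2, '0');
  return `#${channel(rgb.red)}${channel(rgb.green)}${channel(rgb.blue)}`;
}

// Join all text runs inside one table cell into a trimmed string.
function cellText(cell) {
  return (cell.content || [])
    .flatMap((block) => (block.paragraph ? block.paragraph.elements || [] : []))
    .map((el) => (el.textRun ? el.textRun.content : ''))
    .join('')
    .trim();
}

// Read the cell's background colour as hex, or null if none is set.
function cellColour(cell) {
  const bg = cell.tableCellStyle && cell.tableCellStyle.backgroundColor;
  const rgb = bg && bg.color && bg.color.rgbColor;
  return rgb ? toHex(rgb) : null;
}

// Walk every table in the document body and build one record per row.
// Assumption: the first cell of each row carries the colour-coding.
function tableRecords(doc) {
  const records = [];
  for (const block of doc.body.content) {
    if (!block.table) continue;
    for (const row of block.table.tableRows) {
      const colour = cellColour(row.tableCells[0]);
      records.push({
        cells: row.tableCells.map(cellText),
        status: STATUS_BY_COLOUR[colour] || 'unknown',
      });
    }
  }
  return records;
}

// Serialise records as CSV, quoting every field and doubling embedded quotes.
function toCsv(records) {
  const quote = (value) => `"${String(value).replace(/"/g, '""')}"`;
  return records
    .map((r) => [...r.cells, r.status].map(quote).join(','))
    .join('\n');
}
```

Keeping the traversal as pure functions over the document JSON means it can be tested against a saved copy of the document without calling the API each time.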
In total, I extracted data from four separate documents: two on anti-boycott legislation (state and federal) and two on attempts to outlaw criticism of Israel as antisemitism (state and federal). Once the information was in Google Sheets, I configured data validation rules to help FMEP audit and maintain the data. I created dropdown controls and colour-coded some of the columns, such as whether or not legislation had passed into law, to make it easier to analyse and update the information.
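Dropdown rules like the ones described above can be set by hand in the Sheets interface, which may well be how it was done here. For anyone who prefers to script it, the sketch below builds the equivalent Sheets API request. The function name, column index, and option values are assumptions made for illustration.

```javascript
// Build a Sheets API batchUpdate request that adds a dropdown to one column.
// sheetId, columnIndex and options are illustrative inputs, not FMEP's real values.
function dropdownRequest({ sheetId, columnIndex, options }) {
  return {
    setDataValidation: {
      range: {
        sheetId,
        startRowIndex: 1, // skip the header row
        startColumnIndex: columnIndex,
        endColumnIndex: columnIndex + 1, // end index is exclusive
      },
      rule: {
        condition: {
          type: 'ONE_OF_LIST',
          values: options.map((option) => ({ userEnteredValue: option })),
        },
        strict: true, // reject values that are not in the list
        showCustomUi: true, // render the rule as a dropdown control
      },
    },
  };
}

// Example: a hypothetical "Passed into law?" column at index 3.
const request = dropdownRequest({
  sheetId: 0,
  columnIndex: 3,
  options: ['Yes', 'No', 'Pending'],
});
```

The resulting object would be sent in the requests array of a spreadsheets.batchUpdate call made with an authorised googleapis Sheets client.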
Finally, I built a small site to act as a data portal for all of their resources on lawfare. The site provides quick access to all of the new spreadsheets, as well as additional resources hosted elsewhere, such as their podcast episodes.
FMEP is a small team, and I didn't want them to have to learn and maintain a new software service. To avoid this, the site is configured and built directly from a private Google Sheet rather than a self-hosted content management system. When they want to add a new resource, they add a new row to the spreadsheet and trigger a rebuild of the site.
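As a rough illustration of the spreadsheet-driven build described above, the sketch below turns rows from a sheet into a list of links. It assumes the rows arrive as an array of arrays with a header row first, which is the shape returned by the Sheets API's spreadsheets.values.get call. The column names (title, url, type) and function names are hypothetical, not the site's actual schema.

```javascript
// Turn a Sheets "values" array (first row = headers) into plain objects.
// Blank rows are skipped; missing trailing cells become empty strings.
function rowsToResources(values) {
  const [headers, ...rows] = values;
  return rows
    .filter((row) => row.some((cell) => String(cell).trim() !== ''))
    .map((row) => Object.fromEntries(headers.map((h, i) => [h, row[i] ?? ''])));
}

// Escape characters that are unsafe in HTML text and attribute values.
const HTML_ENTITIES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' };
const escapeHtml = (text) =>
  String(text).replace(/[&<>"]/g, (c) => HTML_ENTITIES[c]);

// Render the resources as a simple HTML list of links.
function renderResourceList(resources) {
  const items = resources.map(
    (r) =>
      `  <li><a href="${escapeHtml(r.url)}">${escapeHtml(r.title)}</a> (${escapeHtml(r.type)})</li>`
  );
  return `<ul>\n${items.join('\n')}\n</ul>`;
}
```

With this approach the spreadsheet is the only thing the team edits; the build step reads the rows and regenerates the pages.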
This setup makes it easy for FMEP to manage the site. At the same time, the technical infrastructure means the service should rarely suffer from downtime or poor performance. All of the assets are hosted on fast and secure servers, either through their organisation's Google Workspace account or on Netlify's global "serverless" infrastructure. Most importantly, their team won't need to rely on me or an expensive digital agency to keep their site up and running.