Hallucinations and other experiences using AI to scrape data from court documents
Excerpts of my interactions with Claude Pro
For more than two years, I’ve been collecting data about drug trafficking, human smuggling and other border-related cases from federal court documents. The goal is to accumulate a large enough sample size to answer questions such as:
How much money are migrants paying to be smuggled into the U.S.? How has that figure changed over the years?
What’s the average amount of meth that smugglers typically have on them during a drug bust at the San Ysidro Port of Entry?
And other interesting data points that can help bring a little more transparency to what’s happening along the U.S./Mexico border, particularly in California.
My main challenge has been the time-consuming nature of manually reading through each court document (usually a federal complaint that lays out the allegations against the person facing charges) and typing each data point into a Google Form I created.
That form feeds into a spreadsheet with 41 columns of data, so it can take a good 10-15 minutes to make sure I’m extracting everything I need. (Not all 41 columns are applicable to each type of case, so it’s not that bad.)
With artificial intelligence, I saw an opportunity to start automating that process. Instead of spending 10-15 minutes per document, I could have AI scrape the data I need from multiple documents within minutes!
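The post doesn't show the actual workflow, but the general approach it describes can be sketched roughly as follows. This is a minimal illustration, not the author's actual code: the field names are hypothetical stand-ins for a few of the 41 spreadsheet columns, and the (commented-out) API call uses Anthropic's official Python SDK, since the author mentions using Claude.

```python
import json

# Hypothetical field names; the author's real spreadsheet has 41 columns.
FIELDS = [
    "defendant_name",
    "drug_type",
    "drug_quantity_kg",
    "smuggling_fee_usd",
    "port_of_entry",
]

def build_prompt(document_text):
    """Build an extraction prompt asking the model to return only JSON."""
    return (
        "Extract the following fields from this federal complaint. "
        "Return ONLY a JSON object with these keys, using null for any "
        "field not present in the document:\n"
        f"{', '.join(FIELDS)}\n\n"
        f"Document:\n{document_text}"
    )

def parse_extraction(model_reply):
    """Parse the model's JSON reply, keeping only the expected keys."""
    data = json.loads(model_reply)
    return {key: data.get(key) for key in FIELDS}

# Calling the model would look something like this (requires an API key):
# import anthropic
# client = anthropic.Anthropic()
# reply = client.messages.create(
#     model="claude-3-5-sonnet-20241022",
#     max_tokens=1024,
#     messages=[{"role": "user", "content": build_prompt(text)}],
# ).content[0].text
# row = parse_extraction(reply)
```

Each parsed row could then be appended to the same spreadsheet the Google Form feeds, replacing the manual step.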
Then I learned firsthand about “hallucinations.”