Pulling Your Weight in Your Civic and Political Community
Can citizen-journalists pick up the slack from a dying broadcast and print news industry?
I’m given to believe that there’s a widespread, existential panic within the legacy news industry. The death of traditional advertising models, along with a shifting technology landscape that encourages anyone to publish anything, whenever and however they choose - described by Paul Wells as “disintermediation” - has disrupted everything. Well, perhaps not quite everything. Because there’s also a growing sense that, as a class, legacy media creators don’t represent the public they’re supposed to serve. So why tune in if it’s all about having annoying elitists nag us?
But if we all stop trusting or even consuming the old broadcasters and publishers, will we lose our connection to our civic environment? And who will fact-check and hold to account the government officials and departments that seek to oversee us?
Why not you and me? Why can’t citizen journalists leverage the incredible power of our data tools to automate the job? We’ve all got access to large language models like OpenAI’s GPT, and there’s no end of valuable data waiting to be discovered. And - to their credit - governments at all levels are regularly generating truckloads of public data to help us track their policies and actions.
This post is meant to illustrate the kind of digital investigative journalism that I think is possible. In fact, that’s the reason I built this publication in the first place.
Through the City of Toronto’s Open Data Portal, I downloaded two CSV datasets:
Reported Crimes from the Toronto Police Service
The key metric dashboard from the City of Toronto
I uploaded each dataset to ChatGPT-4 and began by asking some basic questions to confirm that GPT actually understood the data. For instance:
What percentage of the total counts (COUNT_) of reported crimes was cleared (COUNT_CLEARED)?
…To which the answer was 41.70%. And:
Which SUBTYPE had the lowest COUNT_CLEARED percentage?
“Auto Theft,” I was told, with a clearance rate of approximately 10.37%. Good luck getting your stolen SUV back. The SUBTYPE column, by the way, contains specific crime categories.
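For reference, a minimal pandas sketch along these lines would reproduce both of those answers. The file name is my own placeholder; the COUNT_, COUNT_CLEARED, and SUBTYPE columns come straight from the dataset:

```python
import pandas as pd

# Load the reported-crimes CSV downloaded from the Open Data Portal
# (file name is assumed; adjust to whatever you saved it as)
crimes = pd.read_csv("reported_crimes.csv")

# Overall clearance rate: cleared counts as a share of total counts
overall_rate = crimes["COUNT_CLEARED"].sum() / crimes["COUNT_"].sum() * 100
print(f"Overall clearance rate: {overall_rate:.2f}%")

# Clearance rate by SUBTYPE, sorted so the lowest comes first
by_subtype = (
    crimes.groupby("SUBTYPE")[["COUNT_", "COUNT_CLEARED"]]
    .sum()
    .assign(clearance_pct=lambda d: d["COUNT_CLEARED"] / d["COUNT_"] * 100)
    .sort_values("clearance_pct")
)
print(by_subtype.head())
```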
Helpfully, ChatGPT automatically provides the Python code it used to reach its conclusions. I was therefore able to confirm that the code made sense and then run it myself to compare results. I then asked some more complex questions that assumed the AI had noticed the dataset’s descriptive column names and their relationships to each other. Those were questions like:
Were reported crimes of all types more or less likely to be resolved (COUNT_CLEARED) in 2022 than in 2014?
…And:
What percentage of the total victim counts (COUNT_) of reported crimes was cleared (COUNT_CLEARED) when the SUBTYPE was Break & Enter-House?
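The pandas logic behind those questions isn’t much more involved. Here’s a rough sketch that builds on the snippet above; the REPORT_YEAR column name is my assumption, the others are from the dataset:

```python
# Clearance rates for 2014 vs. 2022 (REPORT_YEAR is an assumed column name)
yearly = (
    crimes[crimes["REPORT_YEAR"].isin([2014, 2022])]
    .groupby("REPORT_YEAR")[["COUNT_", "COUNT_CLEARED"]]
    .sum()
)
yearly["clearance_pct"] = yearly["COUNT_CLEARED"] / yearly["COUNT_"] * 100
print(yearly)

# Clearance rate for the Break & Enter-House subtype
bne = crimes[crimes["SUBTYPE"] == "Break & Enter-House"]
bne_rate = bne["COUNT_CLEARED"].sum() / bne["COUNT_"].sum() * 100
print(f"Break & Enter-House clearance rate: {bne_rate:.2f}%")
```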
GPT was able to provide accurate answers in each case. Of course, given that I’m perfectly capable of answering those questions myself using “old-fashioned” Python data analytics tools, why do we need GPT at all? Well, let me show you the next level of question I asked (this time based on the Toronto Key Metrics Dashboard dataset):
Can you identify any more interesting insights or trends in the data that we haven’t yet discussed?
This gave GPT the space it needed to really shine. Here’s what it told me:
After analyzing the trends based on the ‘category’ and ‘year’ columns, several interesting insights emerge:
Trends by Category Over Years: The data shows the average and total values of the measures within each category for each year. This can provide an overview of how different aspects of the city’s operations and services have evolved over time.
Significant Changes in Certain Years: For the ‘City Services’ category, there’s a notable increase in both the mean and total measure values starting from the year 2007. This suggests a substantial shift or expansion in the scope or scale of city services measured from that year onwards.
Variation in Number of Measures Recorded: The ‘count’ column indicates the number of measures recorded per category each year. This varies across years, suggesting either changes in the number of metrics tracked or variations in data collection practices over time.
Potential for Further Breakdowns: The data could be further broken down to identify specific measures within each category that have seen significant changes. This can provide more detailed insights into specific areas of city operations and services.
Analysis Across Other Categories: Similar trends could be explored in other categories like ‘Community Vulnerability’, ‘Economic Development’, ‘Environmental Sustainability’, etc., to understand how different aspects of the city’s operations have been performing. To gain deeper insights, it would be helpful to conduct a more granular analysis of specific measures within each category, looking at how they have changed over the years and what factors might be driving those changes.
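GPT’s category-by-year breakdown is easy enough to spot-check with a few lines of pandas. Here’s a rough sketch, assuming the dashboard export is saved as toronto_key_metrics.csv and that its measure values live in a column named value (both of those names are my guesses; ‘category’ and ‘year’ are the columns GPT referred to):

```python
import pandas as pd

# Key-metrics dashboard export (file and value-column names are assumptions)
metrics = pd.read_csv("toronto_key_metrics.csv")

# Average, total, and number of measures per category per year --
# the same breakdown GPT described
summary = (
    metrics.groupby(["category", "year"])["value"]
    .agg(["mean", "sum", "count"])
    .reset_index()
)

# Focus on City Services to check the reported post-2007 jump
print(summary[summary["category"] == "City Services"])
```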
This kind of insight is important, because there’s no way I’m going to catch every interesting trend from a document containing many thousands of lines of data. And because this particular dataset is only one of many hundreds available from the City of Toronto website. And Toronto is only a single municipality out of many. And I have it on good authority that there are a couple of other levels of government that are generating their own data.
Regularly feeding all that data to an AI tool through an automated script means there will be at least one pair of “eyes” on everything. Sure, AI will miss stuff, and a fair proportion of what it does get will be over-the-edge crazy. But at least we’ll be in the game, because our scripts can deliver reports in readily-consumable formats that will include content for us to audit and productive new avenues to explore.
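As a sketch of what that automated script might look like: the file name, model choice, and prompt wording below are all placeholders of mine, and the snippet assumes the OpenAI Python client with an OPENAI_API_KEY set in the environment.

```python
import pandas as pd
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def summarize_dataset(path: str) -> str:
    """Produce a compact text summary a model can reason about."""
    df = pd.read_csv(path)
    return df.describe(include="all").to_csv()


def ask_for_insights(summary: str) -> str:
    """Send the summary to a model and ask for trends worth auditing."""
    response = client.chat.completions.create(
        model="gpt-4o",  # model choice is an assumption
        messages=[
            {"role": "system",
             "content": "You are a data analyst reviewing municipal open data."},
            {"role": "user",
             "content": f"Identify notable trends or anomalies in this dataset summary:\n{summary}"},
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    # File name is a placeholder; point it at any downloaded open-data CSV
    report = ask_for_insights(summarize_dataset("toronto_key_metrics.csv"))
    print(report)
```

Scheduled with cron or a similar tool, a script like this could sweep every new dataset drop and file a short report for a human to audit.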
The AI-based workflow won’t replace every shoe-leather-intensive task. But it can considerably reduce the time and access necessary to dig deeply into at least some darkened doorways.
Just to see what might come out the other end, I prompted GPT with this request for each of the two datasets we’ve been discussing: