Date: 2023-12-23

I’m given to believe that there’s a widespread, existential panic within the legacy news industry. The death of traditional advertising models, along with a shifting technology landscape that encourages anyone to publish anything, whenever and however they choose - described by Paul Wells as “disintermediation” - has disrupted everything. Well, perhaps not quite everything. Because there’s also a growing sense that, as a class, legacy media creators don’t represent the public they’re supposed to serve. So why tune in if it’s all about having annoying elitists nag us?

But if we all stop trusting or even consuming the old broadcasters and publishers, will we lose our connection to our civic environment? And who will fact-check and hold to account the government officials and departments that seek to oversee us?

Why not you and me? Why can’t citizen journalists leverage the incredible power of our data tools to automate the job? We’ve all got access to large language models like OpenAI’s GPT, and there’s no end of valuable data waiting to be discovered. And - to their credit - governments at all levels are regularly generating truckloads of public data to help us track their policies and actions.

This post is meant to illustrate the kind of digital investigative journalism that I think is possible. In fact, that’s the reason I built this publication in the first place.

Through the City of Toronto’s Open Data Portal, I downloaded two CSV datasets:

Reported Crimes from the Toronto Police Service

The key metric dashboard from the City of Toronto

I uploaded each dataset to ChatGPT-4 and began by asking some basic questions to confirm that GPT actually understood the data. For instance:

What percentage of the total counts ( COUNT_ ) of reported crimes was cleared ( COUNT_CLEARED )?

…To which the answer was 41.70%. And:

Which SUBTYPE had the lowest COUNT_CLEARED percentage?

“Auto Theft,” I was told, with only approximately 10.37% of reported cases cleared. Good luck getting your stolen SUV back. The SUBTYPE column, by the way, contains specific crime categories.
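For anyone who wants to check figures like these by hand, here is a minimal sketch of the two calculations using only Python’s standard library. The column names COUNT_, COUNT_CLEARED and SUBTYPE come from the dataset as described above; the helper name `clearance_by_key` and the sample rows are invented for illustration and are not the code ChatGPT produced.

```python
import csv
import io
from collections import defaultdict

def clearance_by_key(rows, key):
    """Sum COUNT_ and COUNT_CLEARED per value of `key`; return {value: percent cleared}."""
    totals = defaultdict(lambda: [0.0, 0.0])  # value -> [reported, cleared]
    for row in rows:
        totals[row[key]][0] += float(row["COUNT_"])
        totals[row[key]][1] += float(row["COUNT_CLEARED"])
    return {value: 100 * cleared / reported
            for value, (reported, cleared) in totals.items() if reported}

# Invented rows standing in for the real download; the numbers are not real.
SAMPLE = """SUBTYPE,COUNT_,COUNT_CLEARED
Auto Theft,80,8
Auto Theft,20,2
Assault,100,70
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Question 1: what share of all reported crimes was cleared?
overall = (100 * sum(float(r["COUNT_CLEARED"]) for r in rows)
           / sum(float(r["COUNT_"]) for r in rows))

# Question 2: which SUBTYPE has the lowest cleared percentage?
by_subtype = clearance_by_key(rows, "SUBTYPE")
lowest = min(by_subtype, key=by_subtype.get)

print(f"Overall: {overall:.2f}% cleared")
print(f"Lowest: {lowest} at {by_subtype[lowest]:.2f}%")
```

To run it against the real file, replace the `io.StringIO(SAMPLE)` source with the opened CSV. Note that the rates are computed from summed counts rather than by averaging per-row percentages, which would weight small rows too heavily.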

Helpfully, ChatGPT automatically provides the Python code it used to reach its conclusions. I was therefore able to confirm that the code made sense and then run it myself to compare results. I then asked some more complex questions that assumed the AI had noticed the dataset’s descriptive column names and their relationships to each other. Those were questions like:

Were reported crimes of all types more or less likely to be resolved ( COUNT_CLEARED ) in 2022 than in 2014?

…And:

What percentage of the total victim counts ( COUNT_ ) of reported crimes was cleared ( COUNT_CLEARED ) when the SUBTYPE was Break & Enter-House?
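Both of these questions come down to filtering rows and then dividing cleared counts by reported counts. The sketch below shows one way to do that in plain Python. It assumes the year lives in a column called REPORT_YEAR, which is my guess rather than something stated above; the function name `clearance_rate` and the sample numbers are likewise made up for the example.

```python
import csv
import io

def clearance_rate(rows, **filters):
    """Percent of COUNT_ that was COUNT_CLEARED among rows matching every filter."""
    matched = [r for r in rows
               if all(r[col] == val for col, val in filters.items())]
    reported = sum(float(r["COUNT_"]) for r in matched)
    cleared = sum(float(r["COUNT_CLEARED"]) for r in matched)
    return 100 * cleared / reported if reported else float("nan")

# Invented rows; REPORT_YEAR is an assumed column name.
SAMPLE = """REPORT_YEAR,SUBTYPE,COUNT_,COUNT_CLEARED
2014,Break & Enter-House,50,15
2014,Assault,50,35
2022,Break & Enter-House,60,9
2022,Assault,40,26
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# csv.DictReader yields strings, so the year filter is a string too.
rate_2014 = clearance_rate(rows, REPORT_YEAR="2014")
rate_2022 = clearance_rate(rows, REPORT_YEAR="2022")
house_rate = clearance_rate(rows, SUBTYPE="Break & Enter-House")

trend = "less" if rate_2022 < rate_2014 else "more"
print(f"2014: {rate_2014:.2f}%  2022: {rate_2022:.2f}%  ({trend} likely to be cleared in 2022)")
print(f"Break & Enter-House: {house_rate:.2f}% cleared")
```

Keyword filters keep the same function usable for any column, so a year-and-subtype question is just a call with two arguments.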

GPT was able to provide accurate answers in each case. Of course, given that I’m perfectly capable of answering those questions myself using “old fashioned” Python data analytics tools, why do we need GPT at all? Well, let me show you the next level of question I asked (this time based on the Toronto Key Metrics Dashboard dataset):

Can you identify any more interesting insights or trends in the data that we haven’t yet discussed?

This gave GPT the space it needed to really shine. Here’s what it told me:

After analyzing the trends based on the ‘category’ and ‘year’ columns, several interesting insights emerge:

Trends by Category Over Years: The data shows the average and total values of the measures within each category for each year. This can provide an overview of how different aspects of the city’s operations and services have evolved over time.

Significant Changes in Certain Years: For the ‘City Services’ category, there’s a notable increase in both the mean and total measure values starting from the year 2007. This suggests a substantial shift or expansion in the scope or scale of city services measured from that year onwards.

Variation in Number of Measures Recorded: The ‘count’ column indicates the number of measures recorded per category each year. This varies across years, suggesting either changes in the number of metrics tracked or variations in data collection practices over time.

Potential for Further Breakdowns: The data could be further broken down to identify specific measures within each category that have seen significant changes. This can provide more detailed insights into specific areas of city operations and services.

Analysis Across Other Categories: Similar trends could be explored in other categories like ‘Community Vulnerability’, ‘Economic Development’, ‘Environmental Sustainability’, etc., to understand how different aspects of the city’s operations have been performing.

To gain deeper insights, it would be helpful to conduct a more granular analysis of specific measures within each category, looking at how they have changed over the years and what factors might be driving those changes.

This kind of insight is important, because there’s no way I’m going to catch every interesting trend from a document containing many thousands of lines of data. And because this particular dataset is only one of many hundreds available from the City of Toronto website. And Toronto is only a single municipality out of many. And I have it on good authority that there are a couple of other levels of government that are generating their own data.

Being able to regularly feed all that data to an AI tool through an automation script means that there will be at least one pair of “eyes” on everything. Sure, AI will miss stuff, and a fair proportion of what it does get will be over-the-edge crazy. But at least we’ll be in the game, because our scripts can deliver reports in readily consumable formats that will include content for us to audit and productive new avenues to explore.
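To make the idea concrete, here is a rough sketch of what such a script could look like. It walks a folder of downloaded CSV files, builds a prompt from each file’s column names and first few rows, and hands that prompt to whatever model you use. Everything here is hypothetical: `build_prompt`, `review_folder` and `fake_model` are names I made up, and the actual call to an AI service is deliberately left as a function you supply, since that part depends on your provider.

```python
import csv
from pathlib import Path
from typing import Callable

def build_prompt(csv_path: Path, max_rows: int = 20) -> str:
    """Describe one CSV (file name, header, first rows) and ask for trends."""
    with csv_path.open(newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        header = next(reader, [])
        # Take at most max_rows data rows as a preview.
        preview = [row for _, row in zip(range(max_rows), reader)]
    lines = [", ".join(header)] + [", ".join(row) for row in preview]
    return (
        f"Dataset: {csv_path.name}\n"
        f"Columns and first {len(preview)} rows:\n"
        + "\n".join(lines)
        + "\n\nIdentify any interesting insights or trends in this data."
    )

def review_folder(folder: Path, ask_model: Callable[[str], str]) -> dict[str, str]:
    """Send one prompt per CSV in `folder`; return {file name: model reply}."""
    reports = {}
    for csv_path in sorted(folder.glob("*.csv")):
        reports[csv_path.name] = ask_model(build_prompt(csv_path))
    return reports

def fake_model(prompt: str) -> str:
    """Stand-in for a real model call, so the script can be tested offline."""
    return f"(stub reply to a {len(prompt)}-character prompt)"
```

Calling `review_folder(Path("downloads"), fake_model)` returns one stub report per CSV in a (hypothetical) `downloads` folder; swapping `fake_model` for a function that calls a real model is the only change needed. A preview of the first rows is a shortcut I chose to keep prompts small. For large files, uploading the whole dataset or sending pre-computed summaries would likely give better results.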

The AI-based workflow won’t replace every shoeleather-intensive task. But it can considerably reduce the time and access necessary to dig deeply into at least some darkened doorways.

Just to see what might come out the other end, I prompted GPT with this request for each of the two datasets we’ve been discussing:

Based on your insights from the Reported Crimes CSV, could you write me a detailed article that highlights the findings and offers theories to explain the trends?

The two articles GPT gave me are printed below. The writing is dead boring. But the fact that it’s possible at all is nearly miraculous.

Article: Trends and Insights from Toronto’s Reported Crimes Data

Introduction

The city of Toronto, as one of North America’s most vibrant urban centers, faces various challenges in maintaining public safety and order. An analysis of the “Reported Crimes” dataset from the Toronto Police Service offers valuable insights into the patterns and trends in crime and law enforcement responses over recent years.

Key Findings

Divisional Differences in Crime Reports: The data revealed significant disparities in crime report volumes across different police divisions. Division D32 had the highest number of reports at 2,960, indicating a possibly higher rate of criminal activity or better reporting mechanisms in the area.

Trends in Crime Resolution: In 2014, the percentage of crimes resolved ( COUNT_CLEARED ) was relatively high at 51.79%, but this rate dropped significantly by 2020, to 35.31%, and further to 32.39% in 2022. This trend suggests either an increase in the complexity of crimes, potential changes in police resources or strategies, or possibly a shift in the types of crimes being committed.

Variations by Crime Type: Among the subtypes of crimes, “Other Criminal Violations - Offensive Weapons” had the highest resolution rate at approximately 87.08%, while “Auto Theft” had the lowest at about 10.37%. This stark contrast could point to differences in the resources allocated to different crime types or the inherent difficulties in solving certain types of crimes.

Impact of Crime Subtypes: The data also showed that certain crime subtypes, such as “Break & Enter-House,” had a notably lower resolution rate (19.33%) compared to others. This might indicate the need for more focused strategies or resources in these areas.

Theoretical Explanations

Several theories can be proposed to explain these trends:

Resource Allocation and Prioritization: The variations in resolution rates across divisions and crime types might reflect the prioritization and allocation of police resources. High-resolution rates in certain areas or crime types suggest focused attention and resource allocation.

Community Engagement and Reporting Mechanisms: The high number of reports in certain divisions could be attributed to better community-police engagement and more efficient crime reporting mechanisms in those areas.

Evolving Crime Patterns: The decrease in overall crime resolution rates over the years might be due to evolving crime patterns, such as an increase in cybercrime or other complex crime types that are harder to solve with traditional methods.

Technological and Methodological Advances: The high clearance rates in specific crime categories like offensive weapons violations might be attributed to technological advancements and improved investigative methods.

Conclusion

The insights from the Toronto Police Service’s “Reported Crimes” dataset paint a complex picture of the city’s law enforcement challenges and successes. While certain areas and crime types show high efficiency in resolution, others indicate potential areas for improvement. The evolving nature of crime and law enforcement strategies suggests that continuous adaptation and resource allocation are crucial for maintaining public safety and effectively addressing crime in Toronto.

Article: Navigating the Waves of Change: Insights from Toronto’s Dashboard Key Indicators

Introduction

Toronto, a bustling metropolis, is often hailed as one of the world’s most livable cities. However, beneath this accolade lies a complex web of metrics and measures that offer a deeper understanding of its ever-evolving landscape. The Toronto Dashboard Key Indicators provide a rich dataset that reveals trends, challenges, and triumphs in various aspects of city life. This article delves into these indicators, uncovering the story they tell about Toronto’s journey through time.

1. The Rise and Fall of Construction Values

A striking trend in the data is the substantial increase in the value of construction permits, both residential and non-residential. Over the years, there has been a significant surge, suggesting a booming construction sector. This increase could be attributed to rising housing prices, inflated construction material costs, and an uptick in development activities. The city’s growing population and economic expansion likely fuel the demand for more residential and commercial spaces. However, it’s essential to consider whether these increases reflect actual construction activity growth or are merely a byproduct of escalating costs and market dynamics.

2. Water Revenue Ebb and Flow

Another intriguing insight is the marked improvement in water revenues billed for all water customers. This upward trend could be explained by a combination of factors: rate increases, enhanced efficiency in billing and collection processes, or an increase in water usage due to population growth. It’s an indicator of not just economic factors but also of changing consumption patterns and city management strategies.

3. The Recycled Materials Revenue Conundrum

On the flip side, the dataset shows a troubling decline in revenue from the sale of recycled materials. This downturn might be a consequence of fluctuating global market prices for recyclables, changes in the composition of recycled materials, or even the impact of international policies like China’s National Sword policy, which restricted the import of certain types of recyclable materials. This decline raises questions about the sustainability and profitability of current recycling programs.

4. Library Metrics: A Reflection of Changing Times

The data indicates a decrease in library materials circulation and visits, which could be symptomatic of broader societal shifts. With the digital revolution, there’s a possible transition from physical books to digital media. The decline could also reflect changes in public habits or possibly budget constraints impacting library services.

5. Food Bank Usage: A Measure of Community Vulnerability

An alarming trend in the data is the increase in food bank usage, signaling rising community vulnerability. This uptrend might be due to various socio-economic factors, including income inequality, unemployment, or insufficient social safety nets. It underscores the need for more robust community support systems.

Conclusion

The Toronto Dashboard Key Indicators offer a multifaceted view of the city’s health, growth, and challenges. They serve as a reminder that behind every statistic is a story of human experiences, policy impacts, and economic forces. Understanding these trends is crucial for policymakers, businesses, and citizens alike, as they navigate the complexities of urban life and strive towards a sustainable and prosperous future.

This analysis, while comprehensive, merely scratches the surface. It invites a deeper exploration and continual monitoring to adapt strategies that align with the city’s evolving needs and aspirations. In the dance of numbers, Toronto’s story continues to unfold, one indicator at a time.

