### Crime in Chicago

The Chicago Police Department makes details of all of its incidents available. For each one, there’s a record describing what kind of incident and crime it was, where it took place (thinly anonymized), and when it happened.

This data is available for more than a decade’s worth of crimes (a big file!) but I’ve used one subset of just over a month as a working dataset. What sorts of things can be learned from such data? One research project looked at seasonal patterns, and discovered that there are strong and consistent patterns over time.

I’m more interested in this dataset as a publicly available example of the kind of information that might be collected about terrorist incidents, IED explosions, and the like. Such data is of mixed type: some fields are numeric, others (times and dates) are cyclic, and still others are textual. So it’s not straightforward to apply knowledge discovery algorithms to them.

In what follows, I’m using a hashing technique to deal with the non-numeric fields, z-scoring to treat variation in each attribute as equally significant, and singular value decomposition to project the data into lower dimensions and to visualize it. To help understand which attributes are making an interesting difference, the resulting plot can be overlaid so that the colour of each point corresponds to the value of that particular attribute for the record corresponding to that point. (In an earlier post, I did a similar analysis for the complete START set of terrorist attacks.)
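The steps just described can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the post does not say which hashing scheme was used, so `hash_field` (a CRC32 bucket) is a stand-in, and the records, field choices, and function names are all made up for the example.

```python
import zlib
import numpy as np

def hash_field(value, buckets=1000):
    """Map a text value to a stable integer bucket (an assumed hashing scheme)."""
    return zlib.crc32(str(value).encode("utf-8")) % buckets

def zscore(X):
    """Centre each column and scale it to unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns become all zeros instead of NaN
    return (X - mean) / std

def svd_project(X, k=2):
    """Coordinates of each row in the first k singular directions."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

# Hypothetical records: (primary type, arrest, domestic, hour of day)
records = [
    ("THEFT", 0, 0, 14),
    ("BATTERY", 1, 1, 23),
    ("THEFT", 0, 0, 9),
    ("NARCOTICS", 1, 0, 20),
    ("BATTERY", 0, 1, 2),
]
X = np.array([[hash_field(t), a, d, h] for t, a, d, h in records], dtype=float)
Z = zscore(X)                 # every attribute now has comparable variation
coords = svd_project(Z, k=2)  # one 2-D point per incident, ready to plot
```

Each row of `coords` is the point that would be drawn for one incident. Note that the hour is treated here as a plain number, which ignores its cyclic nature.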

Incidents are labelled with the physical coordinates of where they took place, so one way to visualize the data is to plot each of the attributes against position. Here is a figure showing the distribution of incidents in space, labelled by their FBI crime descriptor:

There are patterns to be seen, but they are weak and hard to pick out.

The advantage of clustering using singular value decomposition (SVD) is that we can see the effects of all of the attributes on which incidents resemble others, without having to know anything in advance about the significance of any one of them.

Here’s the clustering derived from SVD:

Clustering of all incidents derived from SVD

It’s clear from this figure that there are clusters, i.e. not all crimes are one-offs, nor are all crimes essentially the same. But what properties of these incidents account for the 7 or so strong clusters? That’s where being able to overlay single attributes on the clusters can help.

For example, if we overlay time of day on the clustering like this:

Clusters overlaid with time of day

we see that time of day is somehow orthogonal to the clustering — it has some relevance, but each of the clusters has the same internal structure with respect to time. So this doesn’t help to explain the clustering but it does suggest that there is deeper structure that is connected to time.

On the other hand, if we overlay the clustering with whether or not each incident led to an arrest:

Clustering overlaid with arrests

we see that arrested or not plays a major differentiating role in the clustering. Similarly, if we overlay whether or not the incident was domestic:

Clustering overlaid with domestic or not

we see that this also makes a big difference in the clustering.

If we overlay with the primary description of the incident:

Clustering overlaid with the primary classification of the incident

then we see that another part of the clustering is explained. Notice that, while arrested or not varies from top to bottom, incident classification varies from left to right. In other words, macroscopically at least, there is no correlation between type of crime and arrest rate — they vary in an uncorrelated way — which is a good thing.
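The "top to bottom" versus "left to right" observation can also be checked numerically by correlating an attribute with each plotted axis. The sketch below is only illustrative: the coordinates are invented stand-ins for SVD output, `is_theft` is a hypothetical indicator for one crime type, and `axis_correlations` is a name introduced here.

```python
import numpy as np

def axis_correlations(coords, attribute):
    """Pearson correlation of one attribute with each projected axis."""
    attribute = np.asarray(attribute, dtype=float)
    return np.array([
        np.corrcoef(coords[:, j], attribute)[0, 1]
        for j in range(coords.shape[1])
    ])

# Made-up 2-D points standing in for the incident coordinates
coords = np.array([[-1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [1.0, -1.0]])
arrest = [1, 0, 1, 0]    # differs between upper and lower points
is_theft = [0, 0, 1, 1]  # differs between left and right points

arrest_corr = axis_correlations(coords, arrest)   # strong on axis 2 only
theft_corr = axis_correlations(coords, is_theft)  # strong on axis 1 only
```

An attribute that lines up with one axis and not the other, as in this toy case, is what the overlays show for arrests and crime type. A correlation against an arbitrary hashed code would be much cruder than one against a yes/no indicator.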

There’s a similar structure in the secondary crime description attribute:

Clustering overlaid with the secondary crime classification

We can get another sense of which attributes are driving the clustering by plotting the attributes, produced from the same SVD. In these plots, points far from the centre are more significant than those close to the centre, and those in the same direction from the centre are correlated. Furthermore, points representing incidents are “pulled” towards attributes for which they have large values. So this plot provides a sense of which attributes play the largest role in differentiating incidents, and how they fit together.

Variation among attributes aligned with variation among incidents

In fact, we can plot incidents and attributes in the same plot to make the connections obvious:

The relationship between attributes and incidents
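For readers who want to reproduce this kind of joint plot, here is a rough sketch of how both sets of points can come out of one decomposition. The scaling (both sides multiplied by the singular values) is an assumption, since the post does not state which convention was used, and the data is random filler.

```python
import numpy as np

def biplot_coordinates(Z, k=2):
    """Incident and attribute coordinates from a single SVD of Z."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    incidents = U[:, :k] * s[:k]      # one row per incident
    attributes = Vt[:k, :].T * s[:k]  # one row per attribute
    return incidents, attributes

# Random stand-in for a z-scored incident-by-attribute matrix
rng = np.random.default_rng(0)
Z = rng.standard_normal((50, 4))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)

incidents, attributes = biplot_coordinates(Z)

# Attributes far from the centre matter more in this projection
significance = np.linalg.norm(attributes, axis=1)
```

Plotting `incidents` and `attributes` on the same axes gives the combined view. Attributes whose rows point in a similar direction are the correlated ones.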

This dataset also shows how difficult some of the problems of anomaly detection are. Suppose we wanted to answer the question: which incident was the most unusual in this dataset? The SVD provides a theoretically well-motivated answer: the one whose representative point is farthest from the origin. However, looking at the clustering, this theoretical answer seems rather weak.
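The distance-from-origin criterion is simple to state in code. A minimal sketch, using invented coordinates in place of real SVD output and a hypothetical helper name:

```python
import numpy as np

def most_unusual(coords):
    """Index of the point farthest from the origin, plus all distances."""
    distances = np.linalg.norm(coords, axis=1)
    return int(np.argmax(distances)), distances

# Made-up projected incidents; the third sits far from the centre
coords = np.array([
    [0.1, 0.2],
    [-0.3, 0.1],
    [2.5, -1.0],
    [0.0, -0.4],
])
index, distances = most_unusual(coords)
```

This score looks only at distance from the origin and ignores whether the winning point has close neighbours.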