Many organisations have large datasets whose entities are records, perhaps records of transactions. In some settings, such as detecting credit-card fraud, sophisticated sets of rules have been developed to decide which records deserve further attention as potentially fraudulent. But what does an organisation do when it has a large dataset like this and hasn’t developed a model of what “interesting” records look like, yet would still like to focus attention on the “interesting” records, usually because there aren’t enough resources to look at every record individually?

One way to decide which records are interesting is to label a record as uninteresting if there are a lot of other records like it. I have developed ways to rank records by interestingness using this idea.
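To make the idea concrete, here is a minimal sketch in Python. It is not the method used for the ranking in this post, which isn't described here; it is a deliberately crude stand-in that reduces each field to a coarse "shape" and scores a record by how few other records share its shapes. The function names `shape` and `rank_by_rarity`, and the toy records, are invented for illustration.

```python
import math
import re
from collections import Counter


def shape(value):
    """Collapse a field value to a coarse pattern: letter runs -> 'A', digit runs -> '9'."""
    pattern = re.sub(r"[A-Za-z]+", "A", value.strip())
    return re.sub(r"[0-9]+", "9", pattern)


def rank_by_rarity(records):
    """Sort records (dicts of field -> string) from most to least unusual."""
    n = len(records)
    # Count how many records share each (field, shape) pair.
    counts = Counter((f, shape(v)) for r in records for f, v in r.items())

    def score(record):
        # Sum of surprisals: a shape shared by few records contributes a lot,
        # a shape shared by every record contributes nothing.
        return sum(-math.log(counts[(f, shape(v))] / n) for f, v in record.items())

    return sorted(records, key=score, reverse=True)


# Invented toy records, not taken from the contracts dataset.
toy = [
    {"description": "REPAIR PARTS", "supplier": "ACME PTY LTD"},
    {"description": "SPARE PARTS", "supplier": "BOLT PTY LTD"},
    {"description": "ENGINE PARTS", "supplier": "CRANE PTY LTD"},
    {"description": "BRAKE PARTS", "supplier": "DELTA PTY LTD"},
    {"description": "RADIO PARTS", "supplier": "439"},
]

ranked = rank_by_rarity(toy)
print(ranked[0]["supplier"])  # prints 439
```

A record whose supplier is purely numeric ends up on top because no other record shares that shape. A real ranking would need richer notions of similarity (dollar amounts, word rarity, dates) than this sketch uses.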

So when the Sydney Morning Herald published a dataset of Australian defence contracts (700,000 of them), I thought I would try my approach. The results are interesting. Here are the most unusual records from this ranking (the columns are contract number, description, contracting agency, start date, end date, amount, and supplier):

1. 1217666,REPAIR PARTS,Department of Defence,16-October-2002,,5872.52,L

This one comes at the top of the list because the supplier name is unusual, only a single letter.

2. 1120859,Supply of,Department of Defence,15-May-2002,,0,C & L AEROSPACE

This one has a very short description and an amount of $0.

3. 854967,EARTH MOVING EQUIPMENT PARTS FOR REPAIR,Department of Defence,21-May-2002,,2134.05,439

Unusual because the supplier name is a number

4. 956798,PRESSURE GAUGE (WRITE BACK SEE ROSS DAVEY),Department of Defence,11-September-2002,,1,WORMALD FIRE & SAFETY

Unusual because of the extra detail in the description and the cost of $1

5. 1053172,5310/66/105/3959.PURCHASE OF WASHER FLAT.*CANCELLED* 29/04/03,Department of Defence,12-February-2003,,0,ID INTERNATIONAL

Unusual because of the dollar value and because the description records the cancellation

6. 868380,cancelled,Department of Defence,14-June-2002,,0,REDLINE

Unusual again because of the description and dollar value

7. 1043448,tetanus immunoglobulin-human,Department of Defence,10-January-2003,,1,AUSTRALIAN RED CROSS

Unusual because of the low dollar value

8. 1014322,NATIONAL VISA PURCHASING,Department of Defence,18-October-2002,,26933.99,NAB 4715 2799 0000 0942

Unusual because the supplier is a bank account number (and so numeric); also a largish dollar value

9. 1023922,NATIONAL VISA PURCHASING,Department of Defence,18-September-2002,,25586.63,NAB 4715 2799 0000 0942

Same sort of pattern as (8): globally unusual but similar to (8); note that both are dated the 18th of the month

10. 968986,COIL RADIO FREQUENCY,Department of Defence,27-September-2002,,2305.6,BAE

Unusual because of the short supplier name and large dollar value

11. 887357,SWIMMING POOL COVER.,Department of Defence,07-May-2002,,7524,H & A TEC

Unusual supplier name and large (!!) dollar value; hope it’s a big pool

12. 1010554,NAB VISA CARD,Department of Defence,02-August-2002,,16223.19,NAB 4715 2799 0000 0942

Another numeric bank account number as supplier and large dollar amount

13. 1005569,Interest,Department of Defence,12-August-2002,,2222.99,NAB 4715 2799 0000 1494

And again

14. 925011,FLIR RECORDER REPPRODUCER SET REPAIR KIOWA,Department of Defence,16-August-2002,,1100,BAE

Short supplier name, long description with unusual words

15. 1012869,NAB VISA STATEMENT,Department of Defence,22-August-2002,,12934.87,NAB 4715 2799 0000 0942

Another financial transaction

16. 1073019,NATIONAL VISA,Department of Defence,03-February-2003,,10060.16,NAB 4715 2799 0000 0942

And again

17. 969039,SUSPENDERS WHITE,Department of Defence,30-September-2002,,41800,ADA

Short supplier name and very large dollar amount (hopefully not just one suspender)

18. 1097060,Purchase of Coveralls Flyers Lightweight Sage Green.,Department of Defence,11-February-2003,,18585.6,ADA

Again short supplier name and large dollar amount

19. 959232,SUPPLY OF COATS AND TROUSERS DPDU,Department of Defence,23-September-2002,,1032350,ADA

Again short supplier name and very (!!) large dollar amount

Clearly the process is turning up example records that seem to be quite unusual within this large set, and might sometimes be worth further investigation.

This technique can be applied to any record-based data. As well as providing a version of the data ranked by interestingness, it also provides a graphical view of the data and some indication of the density of unusual records compared to ordinary records. As the example shows, it also often turns up technical problems with the way the data was collected, since mistakes in fields, records with the wrong fields, or records with fields in the wrong place will usually show up as anomalous. Some of the top records are probably there not because they are really unusual but because something went wrong with the capture of the supplier names. So the technique can be used for quality control as well.
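Once a ranking has pointed at suspicious fields, the quality-control side can be followed up with explicit checks. The sketch below is an assumption about what such checks might look like, not part of the ranking technique: it parses one line in the column order given earlier and returns warnings for the kinds of problems seen in the top records. The name `check_line`, the field labels, and the thresholds are all hypothetical, and the date parsing assumes English month names.

```python
import csv
from datetime import datetime

# Column order as listed earlier in the post; the short labels are made up here.
FIELDS = ["contract", "description", "agency", "start", "end", "amount", "supplier"]


def check_line(line):
    """Return a list of quality-control warnings for one comma-separated record."""
    row = next(csv.reader([line]))
    if len(row) != len(FIELDS):
        # Catches records with missing fields or fields in the wrong place.
        return [f"expected {len(FIELDS)} fields, found {len(row)}"]
    rec = dict(zip(FIELDS, row))
    warnings = []

    supplier = rec["supplier"].strip()
    if len(supplier) < 2:
        warnings.append("supplier name is a single character or empty")
    elif not any(c.isalpha() for c in supplier):
        warnings.append("supplier name contains no letters")

    try:
        datetime.strptime(rec["start"], "%d-%B-%Y")
    except ValueError:
        warnings.append("start date does not parse")

    try:
        if float(rec["amount"]) <= 1:
            warnings.append("amount is $1 or less")
    except ValueError:
        warnings.append("amount is not a number")

    return warnings


print(check_line("1217666,REPAIR PARTS,Department of Defence,16-October-2002,,5872.52,L"))
```

Fixed rules like these only catch problems someone has already thought of, which is why they complement rather than replace a ranking that surfaces unexpected records.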
