Visualization gone serious
A few weeks back I blogged about research I was doing into the visualization of forensic data. It was well received, with some very interesting comments from readers (both of you!). However, the week after posting, I was asked to get involved in the prosecution of a man accused of various forms of grooming, sexual assault, voyeurism, etc. involving several teenage girls at his community centre.
The case has now concluded and the man was sentenced to 4 years in prison, so a good result. However, I won't name the case, as I refer to the victims and they deserve as much anonymity as possible.
The case revolved around a large amount of Facebook chat between the accused and the girls, and between the girls themselves. Some of the chat was quite damning: on the face of it, he was clearly using emotional blackmail to talk the girls, one in particular, out of coming forward about what had been happening.
His defense regarding the Facebook chats was that the girls had logged in as him and chatted among themselves, implicating him in wrongdoing.
I was asked to consider how Facebook works: could they log in as him on a different computer at the same time as he was logged in, would he have a record on his own machine, and what were the ‘relationships’ between the parties involved?
The word ‘relationships’ got me thinking: could we visualize the data to ‘see’ the relationships, and would that make it easier for a jury to understand and interpret? Now, it is easy to map out Facebook ‘Friends’. The excellent Facebook Visualizer and the Facebook transform in Maltego will both help with that task, but that doesn't really help us understand the activity between those people. Although I'm not much of a Facebook user, I have loads of buddies on Skype, and some of them I haven't spoken to in years. Just because the accused and Girls A, B and C were on each other's Facebook lists, and there was some chat between them, doesn't ‘a relationship make’!
I used IEF 4 (Internet Evidence Finder) to carve all the Facebook chats and fragments out of the 4 hard drives. It even did a great job on the accused’s Mac hard drive, and I was left with 4 CSV files containing thousands and thousands of chats. Now to make some sense of it.
I tidied up the CSVs, removing the metadata that I didn't need and essentially leaving just the FROM, TO and CHAT columns. Next, I imported this data into Maltego as an edge-weighted graph. I expected this to cluster the chats around the person who made them, and it worked better than expected.
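For anyone wanting to repeat the tidy-up step without a spreadsheet, here is a minimal Python sketch. It assumes the export has header columns literally named FROM, TO and CHAT (the real IEF column names may differ), throws everything else away, and counts the chats for each sender and recipient pair; that count is what an edge weight represents. The sample rows are made up, and this is not how Maltego itself builds its graph.

```python
import csv
import io
from collections import Counter

# Columns to keep. The post names FROM, TO and CHAT; the exact header
# names in a real IEF export are an assumption here.
KEEP = ("FROM", "TO", "CHAT")

def tidy_rows(csv_file):
    """Yield dicts holding only the FROM, TO and CHAT columns."""
    for row in csv.DictReader(csv_file):
        yield {key: row[key] for key in KEEP}

def edge_weights(rows):
    """Count chats per (FROM, TO) pair; each count is that edge's weight."""
    return Counter((row["FROM"], row["TO"]) for row in rows)

# Hypothetical sample export, including a metadata column to discard.
sample = io.StringIO(
    "FROM,TO,CHAT,TIMESTAMP\n"
    "Accused,Girl B,hi,2011-01-01 10:00\n"
    "Accused,Girl B,you there?,2011-01-01 10:05\n"
    "Girl B,Accused,yes,2011-01-01 10:06\n"
    "Accused,Friend,see you later,2011-01-02 18:00\n"
)

weights = edge_weights(tidy_rows(sample))
for (sender, recipient), count in weights.most_common():
    print(f"{sender} -> {recipient}: {count}")
```

The weights are directed here (Accused to Girl B is counted separately from Girl B to Accused), which keeps the "one way traffic" question answerable later.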
Fig 1 shows the recovered chats on the accused’s computer and who he was talking to. Each orange dot is a person he chatted with, and the surrounding green dots are the individual chats. The primary cluster, centre left, is the accused with all his chats; as it is his machine, we would expect this to be the largest cluster. As we can see, there are many chats with many different people; however, the eye is quickly drawn to the 2nd largest cluster, centre right. This is the person he talks to more than anyone else. Rolling the mouse over the orange dot at the centre of that cluster reveals, surprise, surprise, our 13-year-old Girl B. The 3rd largest, at the bottom, is his best friend, and the next, top right, is Girl A.
Fig 1: Recovered Facebook chats on the accused’s computer, clustered by chat partner
This graph gives us an excellent tool, beyond mere numbers and statistics, for showing who was important to him in a Facebook setting. The suggestion that this was just a girl, or girls, with a crush, and that the traffic was one way, is quashed by this graph: Girl B and Girl A are the 1st and 3rd most frequently contacted people on his extensive Facebook buddy list.
Encouraged by this success, I repeated the process on Girl B’s machine. This time, as there were many different chat partners, I also removed the chats with people she had chatted to only once or twice (the boy at school saying hi, a friend inviting her to a party, etc.) where the contact was never repeated. The results in Fig 2 are fascinating:-
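The ranking behind that claim can also be checked numerically rather than by eye. A minimal sketch, assuming the chats have already been reduced to (FROM, TO) pairs as described above; the function name rank_partners and the sample data are hypothetical, not figures from the case. Messages in either direction count towards a partner, which is roughly what cluster size reflects on the graph.

```python
from collections import Counter

def rank_partners(chats, owner):
    """Rank the owner's chat partners by messages exchanged in either direction.

    chats is a list of (FROM, TO) pairs; pairs not involving owner are ignored.
    """
    counts = Counter()
    for sender, recipient in chats:
        if sender == owner:
            counts[recipient] += 1
        elif recipient == owner:
            counts[sender] += 1
    return counts.most_common()

# Hypothetical data for illustration only.
chats = [
    ("Accused", "Girl B"), ("Girl B", "Accused"), ("Accused", "Girl B"),
    ("Accused", "Best friend"), ("Best friend", "Accused"),
    ("Girl A", "Accused"), ("Accused", "Girl A"),
    ("Accused", "Colleague"),
]
ranking = rank_partners(chats, "Accused")
for position, (partner, count) in enumerate(ranking, start=1):
    print(position, partner, count)
```

Counting both directions is deliberate: a crush that is "one way traffic" would show up as lots of messages from the girl and few replies, so a separate directed count (as in the tidy-up sketch) is worth keeping alongside it.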
Fig 2: Recovered Facebook chats on Girl B’s computer, with one-off chat partners removed
The primary cluster is of course Girl B herself, but no prizes for guessing which cluster is the accused. You’ve got it: the next biggest cluster, top left. In fact, there are almost twice as many chats with him as with any other person. Remember, we are talking about a teenage girl with lots of people to chat to, and he was chatting with her more than twice as much as her best friends at school.
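The filtering applied to Girl B’s chats, dropping partners she chatted to only once or twice, might look like this in Python. A sketch only: it assumes every (FROM, TO, CHAT) row on the machine involves its owner, and drop_one_off_partners, the threshold of three and the sample rows are all illustrative.

```python
from collections import Counter

def drop_one_off_partners(chats, owner, min_chats=3):
    """Keep only chats with partners the owner chatted with at least min_chats times.

    Assumes every (FROM, TO, CHAT) row involves the owner.
    """
    def partner(chat):
        sender, recipient, _text = chat
        return recipient if sender == owner else sender

    counts = Counter(partner(chat) for chat in chats)
    return [chat for chat in chats if counts[partner(chat)] >= min_chats]

# Hypothetical rows recovered from Girl B's machine.
chats = [
    ("Girl B", "Accused", "..."), ("Accused", "Girl B", "..."),
    ("Girl B", "Accused", "..."),
    ("Boy at school", "Girl B", "Hi"),
    ("Friend", "Girl B", "party Saturday?"), ("Girl B", "Friend", "maybe"),
    ("Best friend", "Girl B", "..."), ("Girl B", "Best friend", "..."),
    ("Best friend", "Girl B", "..."),
]
kept = drop_one_off_partners(chats, "Girl B")
print(sorted({row[0] for row in kept} | {row[1] for row in kept}))
```

The threshold is a judgment call; raising it thins the graph further but risks hiding a genuine, if short, contact.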
I then moved on to the relationships between all those involved. I again used Maltego and imported the chats from all the machines, but removed the chat text itself. This produced a link graph between the girls, the accused and their friends, also showing the connections between those friends. I will not present that graph, as it includes the names of the people involved, but it showed the accused front and centre, with chat connections to all the girls involved, as well as the connections between those girls and their friends.
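Stripping out the chat text and merging the machines boils down to collecting unique, undirected person-to-person links. A minimal sketch, assuming each machine’s chats are (FROM, TO, CHAT) tuples; relationship_links and the data are hypothetical. Storing each pair as a frozenset means the same conversation recovered from two machines, in either direction, produces only one link.

```python
def relationship_links(*machines):
    """Merge (FROM, TO, CHAT) rows from several machines into a set of
    undirected person-to-person links, discarding the chat text."""
    links = set()
    for rows in machines:
        for sender, recipient, _chat in rows:
            if sender != recipient:
                links.add(frozenset((sender, recipient)))
    return links

# Hypothetical rows from two machines; the first chat appears on both.
accused_pc = [("Accused", "Girl B", "..."), ("Girl A", "Accused", "...")]
girl_b_pc = [
    ("Girl B", "Accused", "..."),
    ("Girl B", "Girl A", "..."),
    ("Friend", "Girl B", "..."),
]

links = relationship_links(accused_pc, girl_b_pc)
for a, b in sorted(tuple(sorted(link)) for link in links):
    print(a, "-", b)
```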
I felt this would be very useful to a jury, so I included it in my report to the prosecution barrister. It went on to form part of the jury pack, so I can say that my graphs have made it to Court. Sadly, I was not called to give evidence on this occasion, as the defense agreed all our findings and signed a statement to that effect. A shame really, as I was looking forward to presenting this data in open Court and judging the jury's reaction. Not that I was expecting wild applause and whooping fist pumps, but it would have been interesting all the same.
So far I’ve been using Maltego, but I've been given a heads-up about other free tools that might do the same job. The main one is Gephi (thanks @danmcquillan for the tip), a superb, free graphing application for Windows or Mac that supports many different output graphs. So far I'm liking it, although it takes a little more preparation, as you need to define your Nodes and Edges before it can graph the links. I’ve also had problems with the Preview and output elements, which keep crashing; I really need to pop a message on the forums.
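As for defining Nodes and Edges for Gephi, one approach is to generate two CSV files from the chat pairs. This sketch assumes Gephi's spreadsheet import conventions of Id and Label columns for nodes and Source, Target, Type and Weight columns for edges (to my understanding; check them against your Gephi version). The function gephi_tables and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter

def gephi_tables(chats):
    """Build node and edge CSV text from (FROM, TO) chat pairs.

    Column names follow Gephi's spreadsheet import conventions as assumed
    in the lead-in; people's names double as node Ids.
    """
    weights = Counter(chats)
    people = sorted({person for pair in weights for person in pair})

    nodes = io.StringIO()
    writer = csv.writer(nodes, lineterminator="\n")
    writer.writerow(["Id", "Label"])
    for person in people:
        writer.writerow([person, person])

    edges = io.StringIO()
    writer = csv.writer(edges, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, "Directed", weight])

    return nodes.getvalue(), edges.getvalue()

# Hypothetical chat pairs; save each string as a .csv file for import.
chats = [("Accused", "Girl B"), ("Accused", "Girl B"), ("Girl B", "Accused")]
nodes_csv, edges_csv = gephi_tables(chats)
print(nodes_csv)
print(edges_csv)
```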
A Bump on the Node
For your information, the visualization industry seems to be dominated by university research groups ‘visualizing’ everything that moves and then posting the results on YouTube with no information about how it was done, except the message ‘Aren’t we clever!’.
However, if you want to learn about it, you appear to need a brain the size of a planet, a doctorate in statistics and a student card. It is a very difficult area for a beginner to start learning. For example, search Google for ‘What are Nodes and Edges?’. Go on, try it. The top link is Wikipedia, which presents you with a series of equations from graph theory. It's a nightmare.
Anyway, for those of you out there with a shriveled 40-something brain like me, a Node is an element, such as a person on my graphs, and the Edges are the links between them.
For example:
I am Nick Furneaux.
My friends are Ed, Toby and Chris
I talk to Ed and Toby
I never talk to Chris
The Nodes are:-
Nick
Ed
Toby
Chris
The Edges are:-
Nick - Ed
Nick - Toby
The graph would show links between me and Ed and Toby but Chris would be an unlinked orphan node floating around the graph on his own. Sorry Chris.
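The same toy example in a few lines of Python, purely for illustration: the nodes as a list, the edges as pairs, a quick check for orphans like Chris, and a count of how many links each node has (its degree). The helper names are made up for this sketch.

```python
# Nodes and Edges from the example above.
nodes = ["Nick", "Ed", "Toby", "Chris"]
edges = [("Nick", "Ed"), ("Nick", "Toby")]

def orphans(nodes, edges):
    """Return the nodes that appear in no edge."""
    linked = {person for edge in edges for person in edge}
    return [node for node in nodes if node not in linked]

def degree(nodes, edges):
    """Count how many edges touch each node."""
    counts = {node: 0 for node in nodes}
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts

print(orphans(nodes, edges))  # ['Chris']
print(degree(nodes, edges))   # {'Nick': 2, 'Ed': 1, 'Toby': 1, 'Chris': 0}
```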
Clear? Good.
Here endeth the lesson!