By Caroline Scharf & Uli Foessmeier, Tom Sawyer Software | October 12, 2016
Tom Sawyer Software is a Silver sponsor of GraphConnect San Francisco. Meet their team on October 13-14 at the Hyatt Regency SF.
The Offshore Leaks Database Challenge
The Panama Papers investigation and the resulting Offshore Leaks database present an interesting challenge for investigators.
If you're not familiar with this investigation, it was led by the International Consortium of Investigative Journalists (ICIJ) to expose the people behind companies and trusts incorporated in tax havens. While some offshore entities and trusts are legitimate, their anonymous nature makes money laundering, tax evasion, fraud and other crimes easier to carry out. For more information about the Offshore Leaks database, visit offshoreleaks.icij.org.
The Offshore Leaks database contains more than 320,000 entities, and duplicate entries are common. Navigating this volume of information, visualizing it in a form that can be digested and understood, and knowing which clues to look for are all challenges for anyone using this database.
Tom Sawyer Software specializes in helping businesses rapidly build sophisticated enterprise graph and data visualization applications to help make sense of and analyze their Big Data, such as the volume of information in the Offshore Leaks database.
In this first of two articles, we walk you through our Panama Papers example application, built with our flagship product Tom Sawyer Perspectives. We discuss two scenarios that can help you make sense of the Offshore Leaks data, so you can focus your investigation on suspicious people and companies, spot areas of potential fraud and make connections.
Using Tom Sawyer Perspectives to Focus Your Investigation
When you begin an investigation, you may know the person or network of people you want to investigate, such as a well-known political figure or celebrity, or you may know several individuals who you suspect are connected, or the name or address of a company.
In the first example scenario, we want to dive a little deeper into Vladimir Putin's inner circle, but searching the Panama Papers data for "Putin" yields no results. Instead, we search for one of Putin's advisers, Sergey Roldugin. The search finds two people with the same name and a third person with the same first and last names plus a middle name. Data integrity problems such as these duplicates are common in this database, so our example application includes a feature that automatically merges nodes with identical names, as well as the ability to merge nodes manually.
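To make the automatic merge step concrete, here is a minimal sketch of grouping records by identical name. It is not how Tom Sawyer Perspectives implements the feature; the function names, the normalization rule (case-folding and collapsing whitespace) and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict


def normalize(name):
    """Case-fold and collapse whitespace so trivially different spellings match."""
    return " ".join(name.split()).casefold()


def group_identical_names(nodes):
    """Group node ids by normalized name.

    `nodes` is a list of (node_id, name) pairs.
    Returns a dict mapping each normalized name to the list of ids that share it.
    """
    groups = defaultdict(list)
    for node_id, name in nodes:
        groups[normalize(name)].append(node_id)
    return dict(groups)


# Hypothetical records: ids and the middle-name placeholder are invented.
nodes = [
    (1, "Sergey Roldugin"),
    (2, "SERGEY  ROLDUGIN"),
    (3, "Sergey Middlename Roldugin"),
]

groups = group_identical_names(nodes)

# Only groups with more than one id are candidates for an automatic merge.
auto_merge = {name: ids for name, ids in groups.items() if len(ids) > 1}
```

In this sketch the two exact matches end up in one group, while the record with a middle name stays separate. That is the kind of case where a manual merge is still needed.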
After merging the three nodes, we see the number “7,” which indicates that there are seven connections between Roldugin and other entries in the database. Using the Load Connections feature, we expand the network to see a graph of these relationships.
We continue to load more and more connections as we look for clues. We chose to exclude the connections of intermediaries from our graph visualization because intermediaries typically have many connections and can clutter the diagram. It also seems doubtful that two companies are meaningfully connected simply because the same intermediary created both of them. So we continue focusing on connections between people, companies and addresses.
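The expansion described here can be sketched as repeatedly adding direct neighbors while skipping intermediaries. This is only an illustration under assumed names: the `load_connections` function, the adjacency-dictionary layout, and the node labels and types are hypothetical and do not reflect the product's API or the real data.

```python
def load_connections(graph, node_types, visible, excluded_types=("intermediary",)):
    """Return `visible` plus every direct neighbor whose type is not excluded.

    `graph` maps a node to a list of its neighbors (undirected).
    `node_types` maps a node to a type label such as "person" or "company".
    """
    expanded = set(visible)
    for node in visible:
        for neighbor in graph.get(node, ()):
            if node_types.get(neighbor) not in excluded_types:
                expanded.add(neighbor)
    return expanded


# Hypothetical toy network.
graph = {
    "person_a": ["company_1", "address_1"],
    "company_1": ["person_a", "intermediary_1", "company_2"],
    "address_1": ["person_a"],
    "company_2": ["company_1"],
    "intermediary_1": ["company_1", "company_9"],
    "company_9": ["intermediary_1"],
}
node_types = {
    "person_a": "person",
    "company_1": "company",
    "company_2": "company",
    "company_9": "company",
    "address_1": "address",
    "intermediary_1": "intermediary",
}

step1 = load_connections(graph, node_types, {"person_a"})
step2 = load_connections(graph, node_types, step1)
```

Because the intermediary is never added to the visible set, `company_9`, which is reachable only through it, never appears in the diagram either.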
In expanding the network this way, we begin to notice two distinct groups of connected entities, which our Symmetric layout has helped to highlight. The graph shows only one connection between these two groups. That seems like a good area to investigate a little deeper.
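In the article the single link between the two groups is spotted visually, with the help of the layout. The same pattern can also be checked programmatically: in graph terms, an edge whose removal disconnects the graph is called a bridge. The sketch below uses a standard depth-first bridge-finding approach on a hypothetical toy graph. It illustrates the idea only and is not part of the example application.

```python
def find_bridges(graph):
    """Return edges whose removal disconnects an undirected graph.

    `graph` maps each node to a list of neighbors and must not contain
    parallel edges. Each bridge is returned as a sorted (u, v) tuple.
    """
    order = {}    # discovery index of each visited node
    low = {}      # lowest discovery index reachable from the node's subtree
    bridges = []
    counter = 0

    def visit(node, parent):
        nonlocal counter
        order[node] = low[node] = counter
        counter += 1
        for neighbor in graph[node]:
            if neighbor == parent:
                continue
            if neighbor in order:
                # Back edge: the subtree can reach an earlier node.
                low[node] = min(low[node], order[neighbor])
            else:
                visit(neighbor, node)
                low[node] = min(low[node], low[neighbor])
                if low[neighbor] > order[node]:
                    # Nothing below `neighbor` reaches `node` or above.
                    bridges.append(tuple(sorted((node, neighbor))))

    for node in graph:
        if node not in order:
            visit(node, None)
    return bridges


# Hypothetical network: two triangles joined by the single edge c-d.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}

bridges = find_bridges(graph)
```

The recursive form is fine for a small on-screen network. For a graph the size of the full database, an iterative traversal would be needed to avoid Python's recursion limit.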