Nearly everyone in the UK knows by heart the best path to take them over to their favorite public house. But what about jotting down the shortest route to visit every pub in the country and return home safely? That is what we set out to do.
Okay, maybe every pub is overstating the goal. Pubs in the UK are closing shop or starting up, fresh and new, all of the time. Any route would be out-of-date by the time it was created. So we set a more modest goal: find the shortest route to visit some 24,727 stops found on the great Web site Pubs Galore - The UK Pub Guide.
This is a concrete target. But still an overstatement. Only a real local could possibly know every shortcut, slipping between buildings and along dark allies, to find the absolute best way to reach The Fiddler's Elbow or The Bald Faced Stag. This is well out of reach for a humble team of mathematicians.
Here we rely on the fantastic service provided by Google Maps . Ask Google for the shortest way to walk from The Elbow over to The Stag and it will respond with excellent step-by-step directions. The level of detail covered by Google Maps is amazing.
So this is our challenge. Using geographic coordinates of 24,727 pubs provided by Pubs Galore and measuring the distance between any two pubs as the length of the route produced by Google Maps, what is the shortest possible tour that visits all 24,727 and returns to the starting point?
Well, almost. We need to make one final assumption. It sounds like something only a mathematician would consider, but we have to assume that the route Google suggests for walking between The Fiddler's Elbow and The Bald Faced Stag is no shorter than the route a smart crow would fly. This makes it conceivable to solve the problem without actually asking Google for the distance to travel between each pair of pubs, an important consideration since there are 305,699,901 pairs and Google puts a cap of 2,500 distance requests per day.
This is the problem we have solved. The optimal tour has length 45,495,239 meters. To be clear, our main result is that there simply does not exist any pub tour that is even one meter shorter (measuring the length using the distances we obtained from Google) than the one produced by our computation. It is the solution to a 24,727-city traveling salesman problem (TSP).
The UK Pubs tour is easily the largest such road-distance TSP that has been solved to date, having over 100 times more stops than any road-distance example solved previously by other research groups.
The Big Picture
The work was carried out over the past two years. We, of course, did not have in mind to bring everything mathematics has to bear in order to improve the lot of a wandering pub aficionado. Rather, we use the UK pubs problem as a means for developing and testing general-purpose optimization methods. The world has limited resources and the aim of the applied mathematics fields of mathematical optimization and operations research is to create tools to help us to use these resources as efficiently as possible.
For general information on mathematical modeling and its impact on industry, commerce, medicine, and the environment, we point you to a number of societies that support mathematics research and education: American Mathematical Society , Mathematical Association of America , Mathematical Optimization Society , INFORMS (operations research), London Mathematical Society , and SIAM (applied mathematics).
William Cook, Combinatorics and Optimization, University of Waterloo, Canada
Daniel Espinoza , Gurobi Optimization, USA
Marcos Goycoolea , School of Business, Universidad Adolfo Ibanez, Chile
Keld Helsgaun , Computer Science, Roskilde University, Denmark
It is not easy to convey the structure and complexity of the optimal tour. A list of the 24,727 pubs, one after the other, in the correct order, resembles a good-sized phone book. Perhaps the best way to get a quick view is to look at a line drawing, where the solution is displayed without indicating the many destinations.
Click image for a larger view. You see that we obviously cannot walk several of the indicated routes: to reach the Isle of Man, Northern Ireland, and the islands of Scotland, the tour uses scheduled passenger-ferry routes provided by Google's direction services.
To show a detailed view, we make use of the Google Maps drawing tools to display an interactive version of the tour, where you can zoom in and pan from one region to another. The link is given below, but first a word of warning: the map contains a great deal of information and it can take a minute or so to load. We provide tips for using the map on thetour page.
Click image for an interactive map. If the map refuses to load for you, please have a look at thetour page for high-resolution screen shots, as well as further information about the route.
How do we know the tour is the shortest possible? Clearly we did not check every tour, one by one by one. Indeed, the first thing you learn about the TSP is that it is impossible to solve in this way. If you have N cities, then, starting from any point, you have N -1 possibilities for the second city. Then N -2 possibilities for the third city, and so on. The total number of tours is obtained by multiplying these values: N-1 x ( N -2) x ( N -3) x . . . x 3 x 2 x 1. Now this is a big number . For the pubs problem, it is roughly 1 followed by 100,000 zeroes. That is in an unimaginably large number of possibilities. Even for 50 cities, the world's fastest supercomputer has no hope of going through the full count of tours one by one to pick out the shortest.
But this by itself does not mean we can't possibility solve an example of the TSP. If you have 50 words to put into alphabetical order, you don't worry about the 50 x 49 x 48 x ... x 3 x 2 x 1 possible lists you could create. You just sort the words from first to last and build the one correct list among the huge number of possibilities.
For the TSP we don't know of any simple and fast solution method like we have for sorting words. And, for technical reasons, it is believed that there may be large, nasty TSP examples that no one can ever solve. (If you are interested in this and could use an extra $1,000,000, check out the P vs NP problem .) But if you need to plot a 50-point route for a holiday or to compute the order of 1,000 items on a DNA strand, then mathematics can help, even if you need the absolute shortest-possible solution.
The way to proceed is via a process known as the cutting-plane method . If you have twenty minutes to spare, there is a video explaining the method and how it is used to solve the TSP (in the pleasing voice of Siri). If you are in a hurry, here is how I try to describe the process in a short piece in Scientific American
The idea is to follow Yogi Berra's advice "When you come to a fork in the road, take it." A tool called linear programming allows us to do just this, assigning fractions to roads joining pairs of cities, rather than deciding immediately whether to use a road or not. It is perfectly fine, in this model, to send half a salesman along both branches of the fork.
The process begins with the requirement that, for every city, the fractions assigned to the arriving and departing roads each sum to one. Then, step-by-step, further restrictions are added, each involving sums of fractions assigned to roads. Linear programming eventually points us to the best decision for each road, and thus the shortest possible route.
Our pubs computation used a beefed-up version of theConcorde implementation of the TSP cutting-plane method. Even if you are in a hurry, you might want to see for yourself how the process solves smaller examples on an iPhone or iPad by downloading the free Concorde App .
In working with road data, we were faced with the additional challenge of finding the correct TSP solution even though we could not possibly ask Google for all 305,699,901 pub-to-pub distances. To handle this, we ran the cutting-plane method in tandem with a beefy variant of Keld Helsgaun's LKH code.
LKH combines a powerful local-search technique with a genetic algorithm to produce a high-quality tour, say of length U . Along the way, LKH discovers pairs of pubs that look promising to include in any short tour, so for these pairs we ask Google for the correct walking distances.
While this is going on, Concorde's cutting-plane method finds a fractional tour of value L . From the way this is constructed with linear programming, we know for sure that no TSP tour can have value less than L . During this process, Concorde also discovers pairs of pubs that look promising, in this case for fractional solutions, so we ask Google also for these distances.
Any new data obtained from Google is shared between LKH and Concorde, while both codes continue to look for better results. That is, we aim to decrease the value of U by finding better tours, and we aim to increase the value of L by adding further restrictions to the fractional linear-programming model. At any point, we know the optimal tour length is trapped between L and U , that is, we know the difference between the length of our tour and the length of an optimal tour is at most U - L . The name of the game is to reduce this gap U - L as quickly as we can.
Eventually, in the pubs computation, the algorithms inside LKH and Concorde became satisfied that they had an adequate collection of Google distances, LKH could find no further improvements in its tour, and the cutting-plane method in Concorde could only produce tiny improvements in the value of its fractional tour. At this point, we had L = 45,492,247 and U = 45,495,239, and thus a gap of 2,992 meters.
To finish off the problem, we then turned to Concorde's branch-and-bound search procedure. In this process, the collection of tours is repeatedly subdivided and the cutting-plane method is applied to the resulting TSP subproblems. The simplest form of the division is to select a pair of pubs, say The Black Dog and The Duke of Cornwall in Weymouth, and consider first only tours where the two pubs are visited consecutively, then consider only tours where, between the stops at The Dog and The Duke, we drop in on at least one other pub along the way. This selection divides the set of all tours neatly into two subsets.
In this this final phase of the computation, we processed a large, but manageable, collection of 4,231 subproblems. The total amount of computer time was 305.2 days on a single processor core of a Linux server. We didn't actually have to wait the full 10 months for the results, since we ran the search in parallel on up to 48 cores.
Click here to see a drawing of the search tree, where the position of a subproblem corresponds to the value of its fractional tour. For a closer look,here is a pdf file for the tree.
The computing time was reasonable enough to allow us to run the branch-and-bound phase a second time, using different settings to test the selection rule for the subdivisions. This second run processed a total of 5,687 subproblems in a total of 1381.1 days. Of course, it again produced the same optimal value of 45,495,239 meters.
If you are interested in creating your own local pub tour, the best bet for data is to go back to the original sources, Pubs Galore for locations and Google Maps for up-to-date walking distances. But the information provided by these sources changes over time. Therefore, to document the 24,727-stop TSP instance we have solved, we provide the raw data needed to reproduce the travel distances on thedata page.
Early computational studies focused on the most natural class of salesman problems: select an interesting group of cities, look up the point-to-point distances in a road atlas, and have a go at finding the shortest tour. Record-setting solutions were found by legendary figures in applied mathematics and computer science.
49 USA cities in 1954 by George Dantzig , Ray Fulkerson , and Selmer Johnson .
57 USA cities in 1970 by Michael Held and Richard Karp .
120 German cities in 1977 by Martin Groetschel .
The first reference, in particular, is widely viewed as the most important paper in the history of the broad fields of discrete optimization and integer programming. The links are to technical research papers. For lighter viewing, have a quick look at a slide show of these record-breaking results .
In the late 1970s, the focus switched to geometric examples of the TSP, where cities are points drawn on a sheet of paper and travel is measured by straight-line distances. The reasons were twofold. First, with over 100 stops it became difficult to obtain driving distances along road networks: printed road atlases included distances only for major cities. Second, there were classes of industrial problems that neatly fit into the geometric TSP setting. Indeed, the next world record, set in 1980 by Harlan Crowder and Manfred Padberg , consisted of locations of 318 holes that had to be drilled into a printed-circuit board .
Geometric TSP instances, arising in applications or from geographic locations, were gathered together in the TSPLIB by Gerhard Reinelt . This collection became the standard testbed for researchers. The largest of the instances, having 85,900 points arising in aVLSI application, was solved byApplegate et al. in 2006.
The geometric data sets are worthy adversaries, but the large industrial instances have points clustered into straight lines. These examples are punching below their weight, likely missing aspects of the complexity of the road TSP problems.