Blog

Spatial algorithms under the hood: Ramer-Douglas-Peucker

It has been a week since I started Recurse Center and it has been absolutely inspiring so far! I mostly spent my time exploring ideas and pairing with others on smaller coding challenges. This lead me to start working on something I wanted to do for a while: There is an immense toolket out there for Geographers and Urban Planners enabling them to perform methods like cartographic generalization or interpolation with the click of a button. But how do these methods work under the hood?

I want to use the next couple of weeks to explore some of these methods in more detail and compile them in a toolkit, starting with the Ramer-Douglas-Peucker algorithm. The Ramer-Douglas-Peucker algorithm (RDP) is used in cartographic generalization to simplify vector lines.

rdp.jpg

The function for the algorithm takes in an array of points (i.e. the line we want to simplify) as well as a "threshold" variable called epsilon. It starts by connecting the first and the last point of the original line with a "baseline". It then finds the point that is furthest away from that baseline. If that point is greater than epsilon, it will be kept and the function continues to recursively split the line into two segments and repeat the procedure. If the point is nearer from the baseline than epsilon, then all the points along the baseline can be discarded as they will be smaller than epsilon too. The output will be an array containing only those points that were marked as kept. Implemented in Python, the RDP function looks like this:

def RDP(line, epsilon):
   startIdx = 0
   endIdx = len(line)-1
   maxDist = 0 #var to store furthest point dist
   maxId = 0 #var to store furthest point index

   for i in range(startIdx+1,endIdx):
    d = perpendicularDistance(line[i], line[startIdx], line[endIdx])
    if d > maxDist:
      maxDist = d #overwrite max distance
      maxId = i #overwrite max index

   if maxDist > epsilon:
    l = RDP(line[startIdx:maxId+1], epsilon)
    r = RDP(line[maxId:], epsilon)

    results = np.vstack((l[:-1], r))
    return results

   else:
    results = np.vstack((line[0], line[endIdx]))
    return results

As you might have noticed, I use a perpendicularDistance() function to calculate the perpendicular distance from a line to a point. I did have to brush up on my Trigonometry knowledge there, but if you want to know how to calculate the distance or learn more about the Time Complexity, check out the Github Repo. If you would like to peek under the hood of a particular spatial method, feel free to reach out!

melanie imfeld