Generalising density functions using pygeos STRtree

In my post calculating spatial density in python generated three functions to generate different types of density measure. And in another post I tested the speeds of different nearest neighbour algorithms. These functions work, but onld for for point based geometries. In this post I’ll look at a more generalised set of functions to generate densities of any geometry in python using pygeos spatial index, or STRtree.

There is an ipython workbook here with all of the code used within this post.

After putting together the methods in my two previous posts, spatial density and polygons within a radius, became aware that the pygeos package had the functionality to do just what I wanted to do and using the STRtree function. It has the functions “nearest”, “query” and “bulk_query”. These are also included with geopandas spatial index if you have pygeos installed and will be included in Shapely 2.0 when released. But for now I’m using them with pygeos.

These functions may end up being slower than the previous ones, but they should work for linestrings, polygons and multipolygons as well as points. Making them much more useful. The updated functions are

dist_to_nearest

def dist_to_nearest(source, candidates, return_geom = False):
    source_py = pygeos.from_shapely(source.geometry)
    candidates_py = pygeos.from_shapely(candidates.geometry)
    tree = pygeos.STRtree(candidates_py)
    s_near, c_near = tree.nearest(source_py)

    dist = pygeos.distance(source_py[s_near.tolist()], candidates_py[c_near.tolist()])

    if return_geom:
        out = dist, candidates.loc[c_near.tolist() ,"geometry"]
    else:
        out = dist

    return out

within_radius

def within_radius(source, candidates, radius):
    source_py = pygeos.from_shapely(source.geometry)
    candidates_py = pygeos.from_shapely(candidates.geometry)

    tree = pygeos.STRtree(candidates_py)
    s_idx, c_idx = tree.query_bulk(source_py, predicate='dwithin', distance=radius)
    return np.bincount(s_idx)

average_within_radius

def average_within_radius(source, candidates, radius, Value):

    source_py = pygeos.from_shapely(source.geometry)
    candidates_py = pygeos.from_shapely(candidates.geometry)

    tree = pygeos.STRtree(candidates_py)
    s_idx, c_idx = tree.query_bulk(source_py, predicate='dwithin', distance=radius)

    avg = [candidates.loc[c_idx[s_idx == x],Value].mean() for x in range(0,max(s_idx)+1)]

    return avg

As you can see, they are all very short. And on small runs the first two seemed comparable to the earlier functions. But the last one “average_within_radius” is very, very slow because of my poor coding of the generation of avg.

On a smallish dataset, the within_radius is slower than balltree based function. So I’ll keep both.

dist_to_nearest is quicker than my balltree function (that was over engineered by a long way) and is more general. This is a win.

The new function based on pygeos is definitely not as good. Given that this is less useful for polygons I’m not going to spend more time on it.

Leave a Comment

Your email address will not be published. Required fields are marked *