在 python 中检查点是否在多边形内的最快方法是什么

小开

最佳答案

from shapely.geometry import Point
from shapely.geometry.polygon import Polygon


point = Point(0.5, 0.5)
polygon = Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])
print(polygon.contains(point))

从您提到的方法中，我只使用了第二种方法 path.contains_points，它工作得很好。在任何情况下，取决于测试所需的精度，我建议创建一个多边形内所有节点都为 True (如果不是，则为 False)的数字 bool 网格。如果你打算做一个测试很多点，这可能会更快(尽管注意这依赖于你在一个“像素”公差范围内进行测试) :

from matplotlib import path
import matplotlib.pyplot as plt
import numpy as np


first = -3
size  = (3-first)/100
xv,yv = np.meshgrid(np.linspace(-3,3,100),np.linspace(-3,3,100))
p = path.Path([(0,0), (0, 1), (1, 1), (1, 0)])  # square with legs length 1 and bottom left corner at the origin
flags = p.contains_points(np.hstack((xv.flatten()[:,np.newaxis],yv.flatten()[:,np.newaxis])))
grid = np.zeros((101,101),dtype='bool')
grid[((xv.flatten()-first)/size).astype('int'),((yv.flatten()-first)/size).astype('int')] = flags


xi,yi = np.random.randint(-300,300,100)/100,np.random.randint(-300,300,100)/100
vflag = grid[((xi-first)/size).astype('int'),((yi-first)/size).astype('int')]
plt.imshow(grid.T,origin='lower',interpolation='nearest',cmap='binary')
plt.scatter(((xi-first)/size).astype('int'),((yi-first)/size).astype('int'),c=vflag,cmap='Greens',s=90)
plt.show()

结果是这样的:

小开

你的测试很好，但它只测量一些特定的情况: 我们有一个多边形的许多顶点，和长数组的点检查他们在多边形。

而且，我想你不是在测量多边形内部的 matplotlib 方法与射线方法, 但是 Matplotlib 优化迭代与简单列表迭代

让我们做 N 个独立的比较(N 对点和多边形) ？

# ... your code...
lenpoly = 100
polygon = [[np.sin(x)+0.5,np.cos(x)+0.5] for x in np.linspace(0,2*np.pi,lenpoly)[:-1]]


M = 10000
start_time = time()
# Ray tracing
for i in range(M):
x,y = np.random.random(), np.random.random()
inside1 = ray_tracing_method(x,y, polygon)
print "Ray Tracing Elapsed time: " + str(time()-start_time)


# Matplotlib mplPath
start_time = time()
for i in range(M):
x,y = np.random.random(), np.random.random()
inside2 = path.contains_points([[x,y]])
print "Matplotlib contains_points Elapsed time: " + str(time()-start_time)

结果:

Ray Tracing Elapsed time: 0.548588991165
Matplotlib contains_points Elapsed time: 0.103765010834

Matplotlib 仍然要好得多，但并不比 Matplotlib 好100倍。现在让我们尝试更简单的多边形..。

lenpoly = 5
# ... same code

结果:

Ray Tracing Elapsed time: 0.0727779865265
Matplotlib contains_points Elapsed time: 0.105288982391

小开

如果速度是您所需要的，并且不存在额外的依赖性问题，那么您可能会发现 numba非常有用(现在它在任何平台上都非常容易安装)。您提出的经典 ray_tracing方法可以通过使用 numba @jit修饰符和将多边形转换为一个数字阵列来轻松地移植到 numba。代码应该是这样的:

@jit(nopython=True)
def ray_tracing(x,y,poly):
n = len(poly)
inside = False
p2x = 0.0
p2y = 0.0
xints = 0.0
p1x,p1y = poly[0]
for i in range(n+1):
p2x,p2y = poly[i % n]
if y > min(p1y,p2y):
if y <= max(p1y,p2y):
if x <= max(p1x,p2x):
if p1y != p2y:
xints = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
if p1x == p2x or x <= xints:
inside = not inside
p1x,p1y = p2x,p2y


return inside

第一次执行所需的时间比任何后续调用都要长一点:

%%time
polygon=np.array(polygon)
inside1 = [numba_ray_tracing_method(point[0], point[1], polygon) for
point in points]


CPU times: user 129 ms, sys: 4.08 ms, total: 133 ms
Wall time: 132 ms

在编译之后将减少到:

CPU times: user 18.7 ms, sys: 320 µs, total: 19.1 ms
Wall time: 18.4 ms

如果在函数的第一次调用时需要速度，那么可以使用 pycc在模块中预编译代码。将函数存储在 src.py 中，如:

from numba import jit
from numba.pycc import CC
cc = CC('nbspatial')




@cc.export('ray_tracing',  'b1(f8, f8, f8[:,:])')
@jit(nopython=True)
def ray_tracing(x,y,poly):
n = len(poly)
inside = False
p2x = 0.0
p2y = 0.0
xints = 0.0
p1x,p1y = poly[0]
for i in range(n+1):
p2x,p2y = poly[i % n]
if y > min(p1y,p2y):
if y <= max(p1y,p2y):
if x <= max(p1x,p2x):
if p1y != p2y:
xints = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
if p1x == p2x or x <= xints:
inside = not inside
p1x,p1y = p2x,p2y


return inside




if __name__ == "__main__":
cc.compile()

使用 python src.py构建并运行:

import nbspatial


import numpy as np
lenpoly = 100
polygon = [[np.sin(x)+0.5,np.cos(x)+0.5] for x in
np.linspace(0,2*np.pi,lenpoly)[:-1]]


# random points set of points to test
N = 10000
# making a list instead of a generator to help debug
points = zip(np.random.random(N),np.random.random(N))


polygon = np.array(polygon)


%%time
result = [nbspatial.ray_tracing(point[0], point[1], polygon) for point in points]


CPU times: user 20.7 ms, sys: 64 µs, total: 20.8 ms
Wall time: 19.9 ms

在我使用的 numba 代码中: ‘ b1(f8，f8，f8[ : ，: ])’

为了用 nopython=True编译，每个变量都需要在 for loop之前声明。

在预构建 src 代码中，行:

@cc.export('ray_tracing' , 'b1(f8, f8, f8[:,:])')

用于声明函数名及其 I/O var 类型、一个布尔输出 b1和两个浮点 f8以及一个浮点 f8[:,:]的二维数组作为输入。

编辑2021年1月4日

对于我的用例，我需要检查一个多边形内是否有多个点——在这种情况下，利用 numba 并行功能在一系列点上循环是有用的。上面的例子可以改为:

from numba import jit, njit
import numba
import numpy as np


@jit(nopython=True)
def pointinpolygon(x,y,poly):
n = len(poly)
inside = False
p2x = 0.0
p2y = 0.0
xints = 0.0
p1x,p1y = poly[0]
for i in numba.prange(n+1):
p2x,p2y = poly[i % n]
if y > min(p1y,p2y):
if y <= max(p1y,p2y):
if x <= max(p1x,p2x):
if p1y != p2y:
xints = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
if p1x == p2x or x <= xints:
inside = not inside
p1x,p1y = p2x,p2y


return inside




@njit(parallel=True)
def parallelpointinpolygon(points, polygon):
D = np.empty(len(points), dtype=numba.boolean)
for i in numba.prange(0, len(D)):
D[i] = pointinpolygon(points[i,0], points[i,1], polygon)
return D

注意: 预编译上面的代码将不支持 numba 的并行功能(并行 CPU 目标不支持 pycc/AOT编译)参见: < a href = “ https://github.com/numba/numba/questions/3336”rel = “ noReferrer”> https://github.com/numba/numba/issues/3336

测试:

import numpy as np lenpoly = 100 polygon = [[np.sin(x)+0.5,np.cos(x)+0.5] for x in np.linspace(0,2*np.pi,lenpoly)[:-1]] polygon = np.array(polygon) N = 10000 points = np.random.uniform(-1.5, 1.5, size=(N, 2))

对于72核机器上的 N=10000，返回:

%%timeit parallelpointinpolygon(points, polygon) # 480 µs ± 8.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

编辑:

修正循环从 0开始而不是从 1开始(多谢@mehdi) :

for i in numba.prange(0, len(D))

编辑:

跟进@mehdi 的比较，我在下面添加了一个基于 GPU 的方法。它使用来自 cuspatial库的 point_in_polygon方法:

import numpy as np import cudf import cuspatial N = 100000002 lenpoly = 1000 polygon = [[np.sin(x)+0.5,np.cos(x)+0.5] for x in np.linspace(0,2*np.pi,lenpoly)] polygon = np.array(polygon) points = np.random.uniform(-1.5, 1.5, size=(N, 2)) x_pnt = points[:,0] y_pnt = points[:,1] x_poly =polygon[:,0] y_poly = polygon[:,1] result = cuspatial.point_in_polygon( x_pnt, y_pnt, cudf.Series([0], index=['geom']), cudf.Series([0], name='r_pos', dtype='int32'), x_poly, y_poly, )

按照@Mehdi 的比较，对于 N=100000002和 lenpoly=1000，我得到了以下结果:

time_parallelpointinpolygon: 161.54760098457336 time_mpltPath: 307.1664695739746 time_ray_tracing_numpy_numba: 353.07356882095337 time_is_inside_sm_parallel: 37.45389246940613 time_is_inside_postgis_parallel: 127.13793849945068 time_is_inside_rapids: 4.246025562286377

硬件规格:

CPU 英特尔至强 E1240

Nvidia GTX 1070

备注:

cuspatial.point_in_poligon方法非常健壮和强大，它提供了处理多个和复杂多边形的能力(我猜是以牺牲性能为代价的)

numba方法也可以“移植”到 图形处理器上-看到一个比较将是有趣的，其中包括移植到@Mehdi (is_inside_sm)提到的最快方法的 cuda。

小开

我就把它放在这里，用 numpy 重写上面的代码，也许有人会觉得它很有用:

def ray_tracing_numpy(x,y,poly): n = len(poly) inside = np.zeros(len(x),np.bool_) p2x = 0.0 p2y = 0.0 xints = 0.0 p1x,p1y = poly[0] for i in range(n+1): p2x,p2y = poly[i % n] idx = np.nonzero((y > min(p1y,p2y)) & (y <= max(p1y,p2y)) & (x <= max(p1x,p2x)))[0] if p1y != p2y: xints = (y[idx]-p1y)*(p2x-p1x)/(p2y-p1y)+p1x if p1x == p2x: inside[idx] = ~inside[idx] else: idxx = idx[x[idx] <= xints] inside[idxx] = ~inside[idxx] p1x,p1y = p2x,p2y return inside

将光线追踪包装到

def ray_tracing_mult(x,y,poly): return [ray_tracing(xi, yi, poly[:-1,:]) for xi,yi in zip(x,y)]

测试了100000分，结果:

ray_tracing_mult 0:00:00.850656 ray_tracing_numpy 0:00:00.003769

小开

不同方法的比较

我找到了其他方法来检查一个点是否在一个多边形(给你)内。我只测试了其中的两个方法(is _ inside _ sm 和 is _ inside _ postgis) ，结果与其他方法相同。

多亏了@epifanio，我将这些代码并行化，并将它们与@epifanio 和@user3274748(ray _ trace _ numpy)方法进行了比较。注意，这两个方法都有一个 bug，所以我修复了它们，如下面的代码所示。

我还发现，用于创建多边形的代码不生成封闭路径 np.linspace(0,2*np.pi,lenpoly)[:-1]。因此，上述 GitHub 存储库中提供的代码可能无法正常工作。所以最好创建一个 关门了路径(第一个和最后一个点应该是相同的)。

密码

方法1: 多边形平行点

from numba import jit, njit import numba import numpy as np @jit(nopython=True) def pointinpolygon(x,y,poly): n = len(poly) inside = False p2x = 0.0 p2y = 0.0 xints = 0.0 p1x,p1y = poly[0] for i in numba.prange(n+1): p2x,p2y = poly[i % n] if y > min(p1y,p2y): if y <= max(p1y,p2y): if x <= max(p1x,p2x): if p1y != p2y: xints = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x if p1x == p2x or x <= xints: inside = not inside p1x,p1y = p2x,p2y return inside @njit(parallel=True) def parallelpointinpolygon(points, polygon): D = np.empty(len(points), dtype=numba.boolean) for i in numba.prange(0, len(D)): #<-- Fixed here, must start from zero D[i] = pointinpolygon(points[i,0], points[i,1], polygon) return D

方法2: ray _ trace _ numpy _ numba

@jit(nopython=True) def ray_tracing_numpy_numba(points,poly): x,y = points[:,0], points[:,1] n = len(poly) inside = np.zeros(len(x),np.bool_) p2x = 0.0 p2y = 0.0 p1x,p1y = poly[0] for i in range(n+1): p2x,p2y = poly[i % n] idx = np.nonzero((y > min(p1y,p2y)) & (y <= max(p1y,p2y)) & (x <= max(p1x,p2x)))[0] if len(idx): # <-- Fixed here. If idx is null skip comparisons below. if p1y != p2y: xints = (y[idx]-p1y)*(p2x-p1x)/(p2y-p1y)+p1x if p1x == p2x: inside[idx] = ~inside[idx] else: idxx = idx[x[idx] <= xints] inside[idxx] = ~inside[idxx] p1x,p1y = p2x,p2y return inside

方法3: Matplotlib 包含 _ points

path = mpltPath.Path(polygon,closed=True) # <-- Very important to mention that the path # is closed (default is false)

方法4: is _ inside _ sm (从给你得到)

@jit(nopython=True) def is_inside_sm(polygon, point): length = len(polygon)-1 dy2 = point[1] - polygon[0][1] intersections = 0 ii = 0 jj = 1 while ii<length: dy = dy2 dy2 = point[1] - polygon[jj][1] # consider only lines which are not completely above/bellow/right from the point if dy*dy2 <= 0.0 and (point[0] >= polygon[ii][0] or point[0] >= polygon[jj][0]): # non-horizontal line if dy<0 or dy2<0: F = dy*(polygon[jj][0] - polygon[ii][0])/(dy-dy2) + polygon[ii][0] if point[0] > F: # if line is left from the point - the ray moving towards left, will intersect it intersections += 1 elif point[0] == F: # point on line return 2 # point on upper peak (dy2=dx2=0) or horizontal line (dy=dy2=0 and dx*dx2<=0) elif dy2==0 and (point[0]==polygon[jj][0] or (dy==0 and (point[0]-polygon[ii][0])*(point[0]-polygon[jj][0])<=0)): return 2 ii = jj jj += 1 #print 'intersections =', intersections return intersections & 1 @njit(parallel=True) def is_inside_sm_parallel(points, polygon): ln = len(points) D = np.empty(ln, dtype=numba.boolean) for i in numba.prange(ln): D[i] = is_inside_sm(polygon,points[i]) return D

方法5: is _ inside _ postgis (从给你获得)

@jit(nopython=True) def is_inside_postgis(polygon, point): length = len(polygon) intersections = 0 dx2 = point[0] - polygon[0][0] dy2 = point[1] - polygon[0][1] ii = 0 jj = 1 while jj<length: dx = dx2 dy = dy2 dx2 = point[0] - polygon[jj][0] dy2 = point[1] - polygon[jj][1] F =(dx-dx2)*dy - dx*(dy-dy2); if 0.0==F and dx*dx2<=0 and dy*dy2<=0: return 2; if (dy>=0 and dy2<0) or (dy2>=0 and dy<0): if F > 0: intersections += 1 elif F < 0: intersections -= 1 ii = jj jj += 1 #print 'intersections =', intersections return intersections != 0 @njit(parallel=True) def is_inside_postgis_parallel(points, polygon): ln = len(points) D = np.empty(ln, dtype=numba.boolean) for i in numba.prange(ln): D[i] = is_inside_postgis(polygon,points[i]) return D

基准

1000万分的时机:

parallelpointinpolygon Elapsed time: 4.0122294425964355 Matplotlib contains_points Elapsed time: 14.117807388305664 ray_tracing_numpy_numba Elapsed time: 7.908452272415161 sm_parallel Elapsed time: 0.7710440158843994 is_inside_postgis_parallel Elapsed time: 2.131121873855591

这是密码。

import matplotlib.pyplot as plt import matplotlib.path as mpltPath from time import time import numpy as np np.random.seed(2) time_parallelpointinpolygon=[] time_mpltPath=[] time_ray_tracing_numpy_numba=[] time_is_inside_sm_parallel=[] time_is_inside_postgis_parallel=[] n_points=[] for i in range(1, 10000002, 1000000): n_points.append(i) lenpoly = 100 polygon = [[np.sin(x)+0.5,np.cos(x)+0.5] for x in np.linspace(0,2*np.pi,lenpoly)] polygon = np.array(polygon) N = i points = np.random.uniform(-1.5, 1.5, size=(N, 2)) #Method 1 start_time = time() inside1=parallelpointinpolygon(points, polygon) time_parallelpointinpolygon.append(time()-start_time) # Method 2 start_time = time() path = mpltPath.Path(polygon,closed=True) inside2 = path.contains_points(points) time_mpltPath.append(time()-start_time) # Method 3 start_time = time() inside3=ray_tracing_numpy_numba(points,polygon) time_ray_tracing_numpy_numba.append(time()-start_time) # Method 4 start_time = time() inside4=is_inside_sm_parallel(points,polygon) time_is_inside_sm_parallel.append(time()-start_time) # Method 5 start_time = time() inside5=is_inside_postgis_parallel(points,polygon) time_is_inside_postgis_parallel.append(time()-start_time) plt.plot(n_points,time_parallelpointinpolygon,label='parallelpointinpolygon') plt.plot(n_points,time_mpltPath,label='mpltPath') plt.plot(n_points,time_ray_tracing_numpy_numba,label='ray_tracing_numpy_numba') plt.plot(n_points,time_is_inside_sm_parallel,label='is_inside_sm_parallel') plt.plot(n_points,time_is_inside_postgis_parallel,label='is_inside_postgis_parallel') plt.xlabel("N points") plt.ylabel("time (sec)") plt.legend(loc = 'best') plt.show()

结论

最快的算法是:

1-is _ inside _ sm _ 並行

2-is _ inside _ postgis _ 並行

3-平行顶点多边形(@epifanio)

小开

奇偶规则的纯数字向量化实现

其他答案要么是缓慢的 python 循环，要么需要外部依赖或 cython 处理。

import numpy as np def points_in_polygon(polygon, pts): pts = np.asarray(pts,dtype='float32') polygon = np.asarray(polygon,dtype='float32') contour2 = np.vstack((polygon[1:], polygon[:1])) test_diff = contour2-polygon mask1 = (pts[:,None] == polygon).all(-1).any(-1) m1 = (polygon[:,1] > pts[:,None,1]) != (contour2[:,1] > pts[:,None,1]) slope = ((pts[:,None,0]-polygon[:,0])*test_diff[:,1])-(test_diff[:,0]*(pts[:,None,1]-polygon[:,1])) m2 = slope == 0 mask2 = (m1 & m2).any(-1) m3 = (slope < 0) != (contour2[:,1] < polygon[:,1]) m4 = m1 & m3 count = np.count_nonzero(m4,axis=-1) mask3 = ~(count%2==0) mask = mask1 | mask2 | mask3 return mask N = 1000000 lenpoly = 1000 polygon = [[np.sin(x)+0.5,np.cos(x)+0.5] for x in np.linspace(0,2*np.pi,lenpoly)] polygon = np.array(polygon,dtype='float32') points = np.random.uniform(-1.5, 1.5, size=(N, 2)).astype('float32') mask = points_in_polygon(polygon, points)

多边形尺寸为1000的100万个点需要44个点。

它的数量级比其他实现慢，但仍然比 python 循环快，而且只使用 numpy。

小开

inpoly是在 python 中执行多边形检查的黄金标准:

Https://github.com/dengwirda/inpoly-python

简单用法:

from inpoly import inpoly2 import numpy as np xmin, xmax, ymin, ymax = 0, 1, 0, 1 x0, y0, x1, y1 = 0.5, 0.5, 0, 1 #define any n-sided polygon p = np.array([[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax], [xmin, ymin]]) #define some coords coords = np.array([[x0, y0], [x1, y1]]) #get boolean mask for points if in or on polygon perimeter isin, ison = inpoly2(coords, p)

后端的 C 实现快如闪电