Eyram SchwingerDepartment of Mathematics, University of GhanaRalph TwumDepartment of Mathematics, University of GhanaThomas KatsekporDepartment of Mathematics, University of GhanaGladys SchwingerInstitute of Environment and Sanitation Studies, University of Ghana
Abstract
In this work, we designed algorithms for the point in polygon problem based on the ray casting algorithm using equations from vector geometry. The algorithms were implemented using the python programming language. We tested the algorithm against the point in polygon algorithms used by the shapely (and by extension geopandas) library and the OpenCV library using points from the google Open Buildings project. Our algorithm in pure python performed much better than the shapely implementation. It also performed better than the OpenCV implementation when combined with the Numba optimization library. We also performed simulations to verify that our algorithm performance was of the order
Keywordsβ point in polygon, ray casting, vector geometry, shapely, Numba
Introduction
The point in polygon problem is a fundamental geometric problem with a wide range of applications in computer graphics, painting simulation, robotics, geosciences, remote sensing, and geographic information systems. Therefore, efficient numerical algorithms for solving point in polygon computations are essential.
Many algorithms have been proposed (for instance in [7, 5, 10, 2, 8, 6]).Some of the methods include the winding number method [7, 10, 2, 6], which measures the number of times a polygon encloses a point, thesum of angles method [8, 6], where the sum of the interior angles of the polygon is used, and the ray castingmethod [8, 5]. The ray casting method involvesshooting a ray from the point in question in any direction and countinghow many times the ray intersects the boundaries of the polygon.Most of these methods assume the polygon is convex.
Several of the algorithms are computationally expensive and are sensitive to rounding and truncation errors.[6]states that the time complexities of the preprocessing step rangesfrom to In this paper, we design an algorithm for point in polygon computationusing vector geometric methods, with the advantage that the algorithmcan take advantage of processor parallelization.
The work is broken down in the following sections: The method section where we develop our algorithm from the vector geometric definition of the line. In the implementation section, we write the python code for the algorithm developed in methods then in the simulation section we test the algorithm on real world data and against other algorithms.
Method
Our proposed algorithm is based on the ray casting method. Suppose we wantto determine whether a point lies within a closed polygon with vertices , where In the ray casting method, a ray is drawn moving out from thepoint in a specific direction. The point lies inside the polygon if intersects the edges of an odd number of times.In vector geometry, one way of writing out the equation of a lineis the normal form. Let be a normal vector to theline. Note that a line in the 2D plane has two normals going in oppositedirections that can be denoted and with so can be either or Assuming is a known point on the line and is any point on the line, the equation of the line can be writtenas
(1) |
Every point on the line satisfies equation 1.Points that do not lie on the line do not satisfy the equation, hencewith points on opposite sides of the line having different signs.This is illustrated in Figure 1. The point is onone side of the line while the point is on the opposite side.Using the concept of perpendicular distance, the position vectorsof the points and can be written as and where represents the unit vector in the directionof and and are positive constants.Let Then and Substituting and into the equation of the linegives
Now ,since the points and lie on the line. Similarly
In our work, we choose to shoot a ray from the point with directionvector This is a ray shot horizontally andto the right of the point Then the normal vectors to the lineare We choose the vector Since the source of the ray is the vector is a point on the line. This gives the equation of the line Thus the line is the locus of points such that This equation reduces to
For the first constraint which is looking for ray crossings, if theray crosses the polygon at the edge connecting the vertices and then the points and willbe on opposite sides of the ray hence will have opposite signs.Let Hence if the ray crosses the edge , otherwise This test will however test for all edge crossings of the ray to theleft and right of .
The second constraint of the ray being cast towards the right fromthe point will be enforced by testing for the vertices with componentsgreater or equal to the component of the point We tested this contraint in two ways:
- 1.
Assume the points and are the endpoints of theedge where the ray cast from intersects the polygon Theconstraint is satisfied if is on the left of the line To test which side of the line the point lies, define the directionvector of the line then or Since we want to test which side of the line lies on, we choose to be the vector that forms an acute angle withthe vector We obtain the equation of the lineto be Note that the vector can be replaced by thevector in the equation of the line Then the point lies to the left of the edge if
- 2.
The vector form of the equation of the line is given as This means that
(2) (3) (4) Since the ray is a horizontal line passing through the point we know that the component of the point of intersection betweenthe edge and the ray is This gives
(5) We then substitute to get Theray is then understood to cross the edge if
This helps us to develop the following algorithm:
- 1.
Input: vertices of the polygon as a 2-dimensionalarray.
- 2.
Find the values where is thecomponent of
- 3.
Calculate sgn_chg
- 4.
Find where sgn_chg This means there is a ray crossingbetween the vertices and
See AlsoNumpy optimization with Numba - GeeksforGeeksNumba vs. Cython: A Technical Comparison - GeeksforGeeksBrandon Rohrer on LinkedIn: Numba rule of thumb #7: Pass return variables in as input arguments. Thisβ¦Optimizing Performance in Numba: Advanced Techniques for Parallelization - GeeksforGeeks - 5.
For each sgn_chgi found in 4, calculate
(6) (7) - 6.
Find
- 7.
If is odd, then contains
Implementation
The defined algorithm was implemented in python. The basic implementationis shown in Algorithms 1 and 2. In bothimplementations, lines 2-4 perform the border crossing test usingthe sign change algorithm. Once that is done, lines 6-7 in algorithm1 calculate the normal vector to thepoint pair where the sign change occured, then lines 9-12 determineif the calculated normal is in the same direction as the vector If it is not, the vector replaces Line 14 then calculates the sign of the point in relationto each of the lines. Since we are only shooting the rays in one direction,line 16 counts the number of polygon boundaries that are to the rightof the point or contain In algorithm 2,line 6 calculates the direction vectors of the point pair where the sign change occured, then line 7 calculates the parameter in equation 2 using equation 6.Line 8 then calculates the corresponding using equation 7.Line 9 then counts how many of the calculated are the same orto the right of the input
Simulation
To test the algorithms, we downloaded data from the Google Open Buildingsproject [12] and shapefile for the map of the Republic of Ghana from the Database of Global Administrative Areas [1]. From [12], we wanted data only for Ghana, so we downloaded the following cells: 0fd_buildings,0e3_buildings, 11d_buildings, 103_buildings, and 0ff_buildings as can be seen in Figure 2. The initial data containedapproximately 54 million points. Initial preprocessing to reduce thenumber of points was done by restricting to only points within thebounding box of Ghana downloaded from [1].
This reduced the number of points to 17,097,823. We run our algorithm with these points as the points and the map of Ghana as the polygon. Figure 3shows the polygon with the map of Ghana and the points to be tested.Figures 4 and 5 show theresults of all points for which our algorithm returned True and Falserespectively. It can be seen that the algorithm works very well.
The algorithm was also tested against other algorithms. Our initialsimulation was done on a Dell XPS 15 9570 with an Intel i7-8750H 6core/12 thread processor and 32GB of RAM running a Jupyter notebook withpython 3.10.8. On this setup, we tested our algorithm against thepoint in polygon algorithms used by shapely [4], geopandas[9] and OpenCV [3].Table 1 shows the execution time of our algorithmcompared to the three other algorithms. It can be seen that shapelyand geopandas took in excess of 16 hours to test their point in polygonalgorithms on the 17,097,823 points. In both shapely and geopandasthe contains method of the polygon and the withinmethod of the point were all tested with consistent results. Notethat geopandas uses shapelyβs point in polygon algorithm, hence thesimilarity in execution times. It is only the OpenCV point in polygonalgorithm that was faster than our algorithm. The OpenCV library iswritten in C++ and takes advantage of code optimization features.
Point in polygon algorithm | Execution time |
---|---|
Algorithm 1 | 18min 34s |
Algorithm 2 | 18min 30s |
shapely | 16h 2min 36s |
geopandas | 16h 38min 23s |
OpenCV | 13min 20s |
We therefore combined our algorithm with numba, a high performancepython compiler [11] with both parallel and non-parallelprocessing. These can be seen in Algorithms 3 for numbawithout parallelization and 4 for numba with parallelization.Algorithm 2 was also tested on python 3.11 whichis claimed to have faster execution of code based on its built-in optimizations.The results are shown in Table 2. Fromthe table it can be seen that our algorithm lends itself well to parallelization,as the execution time was cut sharply from 18min 30s down to 3min30s when all 12 processor threads are utilized. Even without parallelization,there was an improvement in execution time from 18min 30s to 13min17s which is slightly better than the execution time for the OpenCVimplementation. Again, our algorithm made some speed gains when runin python 3.11. Our algorithm was also run on a Dell Optiplex 9020with 8GB of RAM and an Intel i5-4590 CPU with 4 cores and 4 threads runningpython 3.10.8. The results are shown in Table 4.From the table it can be seen that despite the lower specificationsin processor and memory, the algorithm still performed appreciablywell. Optimizing the algorithm with numba on 4 cores produces runtimes of 8min 59s and in 16min 3s without parallelization.
Algorithm | Platform | Execution Time |
---|---|---|
Algorithm 3 | numba without parallelization (python 3.10.8) | 13min 17s |
Algorithm 4 | numba with parallelization (python 3.10.8) | 3min 30s |
Algorithm 2 | python 3.11 | 16min 25s |
OpenCV | python 3.10.8 | 13min 20s |
OpenCV | python 3.11 | 16min 45s |
Tables 3-8and Figures 6-7show the execution times of the various algorithms for different numberof points from 10 to 17,097,823. It can be seen that generally, shapelyand by extension geopandas are the worst performing methods. In Figure 6(a), the steep slope of the shapely algorithm makes it difficult tosee the performance of the other algorithms. The shapely algorithmis therefore removed in the (b) and (c) parts of the figure to properlyshow the performance of the other algorithms. Figure 6(b) shows the performance of the algorithm for logarithmically spacedpoints from 10β10,000,000 and for 17,097,823 points, while the (c) part shows theperformance of the algorithm for the logarithmically spaced points10, 100 and 1000 points which are not visible in (b). The shapelyexecution time is removed from subsequent plots to allow the executiontimes from the other algorithms to be visualized. It can be seen that,consistently the numba implementation with parallelization performsthe fastest, followed by the numba implementation without parallelization,then the OpenCV implementation, then Algorithm 2, followed by Algorithm1. In Figure 7, the executiontimes for the numba implementations are absent because numba was notsupported on the new python 3.11 as at the time of performing thesimulations. In Tables 7-8,we show the execution times of the algorithms on a linearly spacedscale from 10β10,000 in Table 7 andfrom 1,908,647 to 17,097,823 in Table 8.We perform a linear least squares fit of the data in Table 7to find the equation of the form , where is the numberof points and is the execution time using the matrix equation
(8) |
In Equation 8, is the vector of the number of points thatwere tested and is the vector of execution times. is the vector of ones with the same length as and The slopes and intercepts of the least squareslines of best fit are shown in Table 9. Figure 8shows the least squares lines and the original data used to calculatethem. Figure 9 shows the least squareslines overlaid on the execution times in Table 8to verify the linearity of the algorithms. It can be seen that thelargest deviations in execution times come from the numba algorithmwith parallelization which may be due to the calls to the multipleprocessor threads. Tables 10 and 11show the absolute and relative errors respectively in the executiontimes projected by the least squares lines for the points in Table8 and the actual execution times inthe table. We can say that it is possible to use the least squaresfit of the execution times in Table 7with between 10 and 10,000 points to predict the execution times fora larger number of points as contained in Table 8.In our case, the largest errors in absolute terms were in the opencv implementation 44.48 seconds error while the numba implementation with parallelization had the highest relative error. However and the least error was in the numba implementation without parallelization which was 11 seconds.
Algorithm | Algorithm_1 | Algorithm_2 | opencv | numba | numba_p | shapely |
---|---|---|---|---|---|---|
10 | 0.000674 | 0.000804 | 0.000792 | 0.000455 | 0.000193 | 0.039866 |
100 | 0.006780 | 0.007234 | 0.005199 | 0.004525 | 0.001524 | 0.384787 |
1000 | 0.067971 | 0.064967 | 0.050449 | 0.045648 | 0.014742 | 3.987965 |
10000 | 0.674826 | 0.660946 | 0.496855 | 0.458177 | 0.131555 | 37.005583 |
100000 | 6.749296 | 6.394145 | 5.031192 | 4.663205 | 1.239017 | 348.450866 |
1000000 | 67.799494 | 64.238579 | 49.027873 | 45.881565 | 12.510255 | 3460.231409 |
10000000 | 642.754256 | 636.792082 | 467.686868 | 453.594473 | 121.763848 | 34361.098605 |
17097823 | 1117.207423 | 1055.401415 | 800.032984 | 788.796931 | 202.510374 | 58475.832875 |
Algorithm | Execution Time |
---|---|
Algorithm 1 | 23min 57s |
Algorithm 2 | 23min 2s |
Algorithm 2 with numba without parallelization | 16min 3s |
Algorithm 2 with numba with parallelization | 8min 59s |
Points | Algorithm_1 | Algorithm_2 | opencv |
---|---|---|---|
10 | 0.000711 | 0.000719 | 0.000720 |
100 | 0.006478 | 0.005915 | 0.005638 |
1000 | 0.062355 | 0.056861 | 0.055172 |
10000 | 0.616584 | 0.566055 | 0.548117 |
100000 | 6.154858 | 5.680473 | 5.431181 |
1000000 | 61.618874 | 56.637895 | 56.335213 |
10000000 | 606.914380 | 566.261798 | 588.910343 |
17097823 | 1078.766585 | 1013.177792 | 1005.053892 |
Points | Algorithm_1 | Algorithm_2 | opencv | numba | numba_p |
---|---|---|---|---|---|
10 | 0.001023 | 0.003242 | 0.001362 | 0.002346 | 0.000979 |
100 | 0.009219 | 0.011926 | 0.006372 | 0.007208 | 0.001654 |
1000 | 0.086663 | 0.081737 | 0.044268 | 0.056429 | 0.015674 |
10000 | 0.865499 | 0.818598 | 0.441505 | 0.561596 | 0.157564 |
100000 | 8.678028 | 8.170648 | 4.419931 | 5.635208 | 1.825092 |
1000000 | 85.399976 | 82.598725 | 44.758200 | 55.967566 | 31.157190 |
10000000 | 850.251691 | 815.232475 | 421.485714 | 559.662381 | 309.712137 |
17097823 | 1459.900754 | 1387.815551 | 720.694399 | 951.991756 | 534.855275 |
Points | Algorithm_2 | Algorithm_1 | numba | numba_p | opencv | shapely |
---|---|---|---|---|---|---|
10 | 0.000797 | 0.000678 | 0.000455 | 0.000193 | 0.000885 | 0.040110 |
1120 | 0.085122 | 0.080887 | 0.053803 | 0.011901 | 0.060541 | 3.921390 |
2230 | 0.160281 | 0.170801 | 0.100509 | 0.021441 | 0.119104 | 7.746753 |
3340 | 0.218364 | 0.224381 | 0.150193 | 0.030485 | 0.176672 | 11.664765 |
4450 | 0.289474 | 0.298039 | 0.204831 | 0.047858 | 0.239670 | 15.478461 |
5560 | 0.366708 | 0.391248 | 0.251546 | 0.049396 | 0.283920 | 19.261769 |
6670 | 0.424220 | 0.449986 | 0.298660 | 0.058415 | 0.331706 | 23.099383 |
7780 | 0.497226 | 0.523772 | 0.348369 | 0.070565 | 0.387975 | 27.049710 |
8890 | 0.572252 | 0.640621 | 0.413792 | 0.081335 | 0.448136 | 30.752390 |
10000 | 0.639463 | 0.667802 | 0.446130 | 0.090945 | 0.494594 | 34.666056 |
Points | Algorithm_2 | Algorithm_1 | numba | numba_p | opencv |
---|---|---|---|---|---|
1908647 | 120.244296 | 129.822878 | 86.105501 | 17.027383 | 93.905902 |
3807294 | 239.848545 | 258.961195 | 171.758836 | 33.963956 | 187.311751 |
5705941 | 359.452793 | 388.099512 | 257.412170 | 50.900529 | 280.717599 |
7604588 | 479.057041 | 517.237829 | 343.065504 | 67.837102 | 374.123447 |
9503235 | 598.661289 | 646.376145 | 428.718838 | 84.773674 | 467.529295 |
11401882 | 718.265538 | 775.514462 | 514.372172 | 101.710247 | 560.935143 |
13300529 | 837.869786 | 904.652779 | 600.025506 | 118.646820 | 654.340991 |
15199176 | 957.474034 | 1033.791096 | 685.678841 | 135.583393 | 747.746839 |
17097823 | 1077.078283 | 1162.929413 | 771.332175 | 152.519965 | 841.152687 |
Algorithm | Slope | Intercept |
---|---|---|
Algorithm_2 | 0.000063 | 0.010103 |
Algorithm_1 | 0.000068 | 0.004402 |
numba | 0.000045 | 0.001039 |
numba_p | 0.000009 | 0.001607 |
opencv | 0.000049 | 0.008094 |
shapely | 0.003462 | 0.041119 |
Points | Algorithm_2 | Algorithm_1 | numba | numba_p | opencv |
---|---|---|---|---|---|
1908647 | 0.260000 | 0.010000 | 2.070000 | 5.350000 | 0.980000 |
3807294 | 1.700000 | 5.630000 | 4.950000 | 11.700000 | 6.710000 |
5705941 | 2.800000 | 12.620000 | 8.540000 | 13.730000 | 11.750000 |
7604588 | 3.680000 | 20.350000 | 8.090000 | 11.360000 | 17.100000 |
9503235 | 4.500000 | 21.730000 | 11.970000 | 14.110000 | 22.460000 |
11401882 | 5.150000 | 26.140000 | 6.060000 | 16.990000 | 27.720000 |
13300529 | 2.730000 | 30.980000 | 2.080000 | 23.980000 | 33.250000 |
15199176 | 13.500000 | 28.640000 | 4.270000 | 27.470000 | 39.010000 |
17097823 | 27.740000 | 33.020000 | 11.230000 | 28.680000 | 44.480000 |
Points | Algorithm_2 | Algorithm_1 | numba | numba_p | opencv |
---|---|---|---|---|---|
1908647 | 0.000000 | 0.000000 | 0.020000 | 0.240000 | 0.010000 |
3807294 | 0.010000 | 0.020000 | 0.030000 | 0.260000 | 0.040000 |
5705941 | 0.010000 | 0.030000 | 0.030000 | 0.210000 | 0.040000 |
7604588 | 0.010000 | 0.040000 | 0.020000 | 0.140000 | 0.050000 |
9503235 | 0.010000 | 0.030000 | 0.030000 | 0.140000 | 0.050000 |
11401882 | 0.010000 | 0.030000 | 0.010000 | 0.140000 | 0.050000 |
13300529 | 0.000000 | 0.040000 | 0.000000 | 0.170000 | 0.050000 |
15199176 | 0.010000 | 0.030000 | 0.010000 | 0.170000 | 0.060000 |
17097823 | 0.030000 | 0.030000 | 0.010000 | 0.160000 | 0.060000 |
Discussion
The implementation of a point in polygon computation using basic vector equations leads to a very useful algorithm which compares very well to the current algorithms that are in use. Moreover, we can scale the algorithm to higher dimensions by considering vector equations in these dimensions and modifying the algorithm accordingly.
We used a linear least squares fit for the values in Table 7 because the values for each point appear to show a linear relationship between the number of points and the execution times of the algorithm. We can infer that the algorithm running time is of linear order, but with an extremely small slope, which allows it to scale very well.
References
- [1]Global administrative areas, gadm database of global administrative areas,version 2.0.www.gadm.org, 2012.www.gadm.org.
- [2]DAlciatore and Rick Miranda.A winding number and point-in-polygon algorithm.Glaxo Virtual Anatomy Project Research Report, Department ofMechanical Engineering, Colorado State University, 1995.
- [3]G.Bradski.The OpenCV Library.Dr. Dobbβs Journal of Software Tools, 2000.
- [4]Sean Gillies etal.Shapely: manipulation and analysis of geometric objects, 2007β.
- [5]Eric Haines.Graphics gems IV, chapter Point in polygon strategies, page24β46.1994.
- [6]Jianqiang Hao, Jianzhi Sun, YiChen, Qiang Cai, and LiTan.Optimal reliable point-in-polygon test and differential codingboolean operations on polygons.Symmetry, 10:477, 2018.
- [7]Kai Hormann and Alexander Agathos.The point in polygon problem for arbitrary polygons.Computational Geometry, 20(3):131β144, November 2001.
- [8]Chong-Wei Huang and Tian-Yuan Shih.On the complexity of point-in-polygon algorithms.Computers and Geosciences, 23(1):109β118, 1997.
- [9]Kelsey Jordahl, JorisVan den Bossche, Martin Fleischmann, Jacob Wasserman,James McBride, Jeffrey Gerard, Jeff Tratner, Matthew Perry, AdrianGarciaBadaracco, Carson Farmer, GeirArne Hjelle, AlanD. Snow, Micah Cochran, SeanGillies, Lucas Culbertson, Matt Bartos, Nick Eubank, maxalbert, AlekseyBilogur, Sergio Rey, Christopher Ren, Dani Arribas-Bel, Leah Wasser,LeviJohn Wolf, Martin Journois, Joshua Wilson, Adam Greenhall, ChrisHoldgraf, Filipe, and François Leblanc.geopandas/geopandas: v0.12.2, 12 2022.
- [10]G.Naresh Kumar and Mallikarjun Bangi.An extension to winding number and point-in-polygon algorithm.IFAC-PapersOnLine, 51(1):548β553, 2018.
- [11]SiuKwan Lam, Antoine Pitrou, and Stanley Seibert.Numba: A llvm-based python jit compiler.In Proceedings of the Second Workshop on the LLVM CompilerInfrastructure in HPC, pages 1β6, 2015.
- [12]WSirko, SKashubin, MRitter, AAnnkah, YBouchareb, YDauphin, DKeysers,MNeumann, MCisse, and JQuinn.Continental-scale building detection from high resolution satelliteimagery.Technical report, 2021.
- [13]J.C. Tallack.Introduction to Vector Analysis.Cambridge University Press, 1970.Reprinted in 2009.
- [14]Stefan Vander Walt, JohannesL Schânberger, Juan Nunez-Iglesias,François Boulogne, JoshuaD Warner, Neil Yager, Emmanuelle Gouillart,and Tony Yu.scikit-image: image processing in python.PeerJ, 2:e453, 2014.