Handle millions of location points with Leaflet without crashing the browser
This post is also available in Bahasa at https://petaku-gis.github.io/docs/blog/
Web GIS is one way to publish a geographic information system so that it can be accessed flexibly across platforms without installing an application: simply by opening a browser, we can display and use geographic information, especially in the form of digital maps. Eleven years ago Volodymyr Agafonkin created an open source library called Leaflet, which so far has more than 700 contributors and 34 thousand stars on GitHub. Leaflet offers many features, from basic map visualizations to a huge number of community-built plugins.
Leaflet markers represent location points from spatial data on a map. Leaflet supports displaying markers out of the box, and they can be configured as needed. For a small amount of data this is no problem, but problems appear when we need to display a large amount of data: Leaflet creates more and more elements on the base map, and the browser has to work hard and may even crash.
This is not a Leaflet problem but a question of how we approach displaying a lot of data efficiently. With a large amount of data there are several approaches. One is data transformation, for example converting location points into a heatmap or a convex hull; another is reducing the data; and another is to use clustering techniques.
Clustering is the process of grouping several point or marker locations. There are several techniques and algorithms that can be used; here we will use a kd-tree, via the kdbush library, to search for the nearest neighbors of a point location.
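To make the idea of grouping nearby points concrete, here is a minimal sketch of greedy radius-based clustering. It is not the kdbush or supercluster API: the neighbor search is a brute-force scan standing in for the kd-tree, the distance is plain planar distance in the input units, and the names `Point`, `Cluster` and `clusterByRadius` are made up for this illustration.

```typescript
// Simplified stand-in for radius-based clustering (assumption: planar
// distance in the same units as the input coordinates).
interface Point {
  x: number; // longitude, or a projected x
  y: number; // latitude, or a projected y
}

interface Cluster {
  x: number; // centroid x of the merged points
  y: number; // centroid y of the merged points
  count: number; // number of points merged into this cluster
}

function clusterByRadius(points: Point[], radius: number): Cluster[] {
  const visited = new Set<number>();
  const clusters: Cluster[] = [];
  const radiusSquared = radius * radius;

  points.forEach((seed, i) => {
    if (visited.has(i)) return; // already absorbed by an earlier cluster
    visited.add(i);

    let sumX = seed.x;
    let sumY = seed.y;
    let count = 1;

    // Brute-force neighbor search: O(n) per seed.
    // A kd-tree index replaces exactly this scan with a much faster query.
    points.forEach((other, j) => {
      if (visited.has(j)) return;
      const dx = other.x - seed.x;
      const dy = other.y - seed.y;
      if (dx * dx + dy * dy <= radiusSquared) {
        visited.add(j);
        sumX += other.x;
        sumY += other.y;
        count += 1;
      }
    });

    clusters.push({ x: sumX / count, y: sumY / count, count });
  });

  return clusters;
}
```

Calling `clusterByRadius(points, 2)` merges every unvisited point within 2 units of a seed into one cluster and reports the centroid and the point count. The scan is quadratic in the worst case, which is the reason a spatial index matters once the data reaches millions of points.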
Here is a wonderful Medium post about the kinds of geospatial clustering techniques: https://towardsdatascience.com/geospatial-clustering-kinds-and-uses-9aef7601f386
Leaflet itself has marker cluster plugins to handle this case, but they are limited to the browser (client side): because these plugins are extensions of Leaflet, it is a little difficult to bring them to the server/backend side. supercluster is a library that does clustering independently of Leaflet, so the process can be moved out of the browser. The goal is to ease the browser’s work in rendering a lot of data and to keep our application from crashing.
Server side clustering
As mentioned in the previous section, supercluster is a library created by the Mapbox team that does clustering independently of Leaflet, so the process can be separated from the browser. To cluster large data without making the client/browser side hang or crash under heavy rendering, we move the clustering process from the browser to the server side. On the server we filter point locations by bounding box using PostGIS, where the bounding box is the one observed in the Leaflet view on the client side.
Comparing client-side and server-side clustering, the server-side approach reduces data transfer and leaves the client browser processing only a small amount of data.
With the server-side approach, the data processing on the server is roughly: select and filter -> clustering -> send response.
It is better to limit the maximum zoom of the map, so that the server does not run clustering at zoom levels where it is no longer necessary.
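On the server, the same limit can be enforced by clamping the zoom value before it reaches the clustering step. The sketch below is only an illustration: the constants `MIN_ZOOM` and `MAX_CLUSTER_ZOOM` and the function `clampZoom` are hypothetical names, and the value 16 is an arbitrary choice, not a figure taken from this project.

```typescript
// Hypothetical limits; pick values that suit your data and base map.
const MIN_ZOOM = 0;
const MAX_CLUSTER_ZOOM = 16;

function clampZoom(
  zoom: number,
  min: number = MIN_ZOOM,
  max: number = MAX_CLUSTER_ZOOM
): number {
  // Fall back to the most zoomed-out level on garbage input (NaN, Infinity).
  if (!Number.isFinite(zoom)) return min;
  // The client may send a fractional zoom; this sketch assumes the
  // clustering step works on integer zoom levels, so round down first.
  return Math.min(max, Math.max(min, Math.floor(zoom)));
}
```

A request for zoom 20 is then served with the clusters of zoom 16, so deep zoom levels do not trigger extra clustering work.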
So far supercluster can run in the browser or on the server using the Node.js runtime. It would be very interesting to port it to other languages and add concurrency to increase clustering speed, but for now supercluster is fast enough.
For the experiment we will create a very simple mapping system that maps the locations of Twitter users. The data comes from Kaggle, from the following dataset: link to dataset
The dataset is Twitter 1 million Connections User Location. The file is 38 MB, and when converted to GeoJSON it becomes 230 MB.
For the database we use Postgres with the PostGIS extension for spatial operations. In this experiment only one table is used, which stores the location points.
To seed the database, we have prepared a script at https://github.com/alfiankan/leaflet-server-side-marker-cluster.git; run it with the Node.js runtime.
Before running it, make sure you have updated the database configuration.
We will use the Node.js runtime and TypeScript to create the HTTP API. You can clone it from the repository we created.
Here is the explanation:
This class has a method that queries Postgres using the PostGIS envelope function, which selects the location points inside a given bounding box.
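As a rough sketch of what such a bounding-box query can look like, the function below builds a parameterized SQL statement around PostGIS `ST_MakeEnvelope`. The table name `locations`, the geometry column `geom`, the SRID 4326 and the names `BoundingBox`, `SqlQuery` and `buildEnvelopeQuery` are all assumptions for this example, not the schema or code of the repository.

```typescript
interface BoundingBox {
  west: number; // minimum longitude
  south: number; // minimum latitude
  east: number; // maximum longitude
  north: number; // maximum latitude
}

interface SqlQuery {
  text: string; // SQL with $1..$4 placeholders
  values: number[]; // parameters in placeholder order
}

// Assumed schema: table "locations" with a point geometry column "geom"
// stored in SRID 4326. Adjust both to your own database.
function buildEnvelopeQuery(bbox: BoundingBox): SqlQuery {
  return {
    text:
      "SELECT id, ST_X(geom) AS lng, ST_Y(geom) AS lat " +
      "FROM locations " +
      // && is the PostGIS bounding-box overlap operator.
      "WHERE geom && ST_MakeEnvelope($1, $2, $3, $4, 4326)",
    // ST_MakeEnvelope expects xmin, ymin, xmax, ymax, then the SRID.
    values: [bbox.west, bbox.south, bbox.east, bbox.north],
  };
}
```

Keeping the coordinates as parameters instead of concatenating them into the SQL string avoids injection through the query string. The `{ text, values }` shape is chosen so it can be handed to a Postgres client that accepts parameterized queries.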
Clustering Use Case
This class contains the clustering use case, built on the supercluster library, which uses kdbush (a kd-tree) for its clustering technique. The transformation to a GeoJSON DTO is also carried out in this step.
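The GeoJSON transformation step can be sketched as follows. This is an illustration under assumptions, not the repository's code: `ClusterRow`, `PointFeature` and `toFeatureCollection` are hypothetical names, and the `cluster` and `point_count` properties are borrowed from the shape supercluster emits for clusters; attaching them to single points as well is a choice made only for this sketch.

```typescript
// Hypothetical input: one row per cluster or single point.
interface ClusterRow {
  lng: number;
  lat: number;
  count: number; // 1 for a single point, more for a cluster
}

interface PointFeature {
  type: "Feature";
  geometry: { type: "Point"; coordinates: [number, number] };
  properties: { cluster: boolean; point_count: number };
}

interface FeatureCollection {
  type: "FeatureCollection";
  features: PointFeature[];
}

function toFeatureCollection(rows: ClusterRow[]): FeatureCollection {
  return {
    type: "FeatureCollection",
    features: rows.map(
      (row): PointFeature => ({
        type: "Feature",
        // GeoJSON coordinate order is [longitude, latitude].
        geometry: { type: "Point", coordinates: [row.lng, row.lat] },
        properties: { cluster: row.count > 1, point_count: row.count },
      })
    ),
  };
}
```

The resulting object can be serialized with `JSON.stringify` and sent as the response body, and the client can then decide per feature whether to draw a cluster bubble or a plain marker.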
To handle requests we use Express.js with a /points endpoint that takes the bounding box and zoom as query parameters, both of which we get from Leaflet.
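Before the bounding box and zoom reach the database, the raw query strings need to be validated. Below is a minimal, framework-independent sketch. The parameter names `bbox` and `zoom`, the `west,south,east,north` ordering (the same order Leaflet's `toBBoxString()` produces) and the names `ViewportQuery`, `toNumber` and `parseViewportQuery` are assumptions for this example, not the actual contract of the /points endpoint.

```typescript
interface ViewportQuery {
  west: number;
  south: number;
  east: number;
  north: number;
  zoom: number;
}

// Converts a raw query value to a finite number, or null if it is unusable.
function toNumber(raw: string | undefined): number | null {
  if (raw === undefined || raw.trim() === "") return null;
  const n = Number(raw);
  return Number.isFinite(n) ? n : null;
}

// Assumed request shape: /points?bbox=west,south,east,north&zoom=5
function parseViewportQuery(
  query: Record<string, string | undefined>
): ViewportQuery | null {
  const rawParts = (query["bbox"] ?? "").split(",");
  if (rawParts.length !== 4) return null;

  const [west, south, east, north] = rawParts.map((part) => toNumber(part));
  const zoom = toNumber(query["zoom"]);
  if (west == null || south == null || east == null || north == null || zoom == null) {
    return null;
  }
  // Rejecting inverted boxes is a design choice of this sketch.
  if (west > east || south > north) return null;

  return { west, south, east, north, zoom };
}
```

A handler can call `parseViewportQuery` on the request's query object and answer with a 400 status when it returns `null`, so malformed viewports never reach the query or the clustering step.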
The live demo is available at https://playground.petaku.xyz/clustering, the clustering playground of Petaku GIS Platform, a no-code, easy-to-use GIS platform under development.
Wrap it up
That’s one of many approaches to handling millions of point locations on Leaflet, using server-side clustering techniques. Of course there are many things that can be improved, so we keep exploring. Thank you…