Handle millions of location points with Leaflet without crashing the browser

Alfiankan
Apr 25, 2022
preview result

We’ve also published this post in Bahasa at https://petaku-gis.github.io/docs/blog/

Preface

Web GIS is one way to publish geographic information systems so they can be accessed across platforms without installing an application: simply by opening a browser, we can display and use geographic information, especially in the form of digital maps. Eleven years ago, Volodymyr Agafonkin created an open source library called Leaflet, which so far has more than 700 contributors and 34 thousand stars on GitHub. Leaflet has a lot of features, from basic map visualizations to a huge number of community-built plugins.

basic leaflet map

Leaflet Marker

Leaflet markers represent location points from spatial data on a map. Leaflet supports displaying markers out of the box, and they can be configured as needed. For a small amount of data this is no problem, but displaying a large amount of data causes trouble: Leaflet creates more and more elements on the base map, and the browser works harder and harder until it may even crash.

browser page crash

This is not a Leaflet problem; it is a question of how we approach displaying a lot of data efficiently. When we have a large amount of data, there are several approaches. One is data transformation, for example converting location points to a heatmap or a convex hull. Another is reducing the data. Yet another is to use clustering techniques.

Clustering

Clustering is the process of grouping several point locations or marker locations. There are several techniques and algorithms that can be used; here we will use a kd tree, via the kdbush library, to search for the nearest neighbors of a point location.

https://www.mathworks.com/matlabcentral/fileexchange/26649-kdtree-implementation-in-matlab
kd tree
find nearest and removing neighbor
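To make the "find nearest neighbors and remove them" idea concrete, here is a minimal sketch. It is an assumption-laden illustration, not the kdbush implementation: a brute-force neighbor scan stands in for the kd tree lookup, and the names Point, Cluster, and clusterPoints are hypothetical.

```typescript
// Minimal sketch of greedy radius clustering.
// Assumption: a brute-force neighbor scan replaces the kd tree lookup that
// kdbush would provide; the grouping idea is the same, only slower.

interface Point {
  x: number;
  y: number;
}

interface Cluster {
  x: number; // centroid x of the merged points
  y: number; // centroid y of the merged points
  count: number; // number of points merged into this cluster
}

function clusterPoints(points: Point[], radius: number): Cluster[] {
  const used: boolean[] = points.map(() => false);
  const clusters: Cluster[] = [];
  const r2 = radius * radius;

  for (let i = 0; i < points.length; i++) {
    if (used[i]) continue; // already absorbed by an earlier cluster
    used[i] = true;
    const seed = points[i];
    let sumX = seed.x;
    let sumY = seed.y;
    let count = 1;

    // Find every unused neighbor within the radius and remove it from the pool.
    for (let j = i + 1; j < points.length; j++) {
      if (used[j]) continue;
      const dx = points[j].x - seed.x;
      const dy = points[j].y - seed.y;
      if (dx * dx + dy * dy <= r2) {
        used[j] = true;
        sumX += points[j].x;
        sumY += points[j].y;
        count++;
      }
    }
    clusters.push({ x: sumX / count, y: sumY / count, count });
  }
  return clusters;
}
```

The inner loop is the part a kd tree speeds up: instead of scanning every remaining point, a spatial index only visits the points near the seed.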

Here is a wonderful Medium post about the kinds of geospatial clustering techniques: https://towardsdatascience.com/geospatial-clustering-kinds-and-uses-9aef7601f386

Leaflet itself has marker cluster plugins to handle this clustering case, but they are limited to the browser (client side): because these plugins are extensions of Leaflet, it is difficult to bring them to the server/backend side. supercluster is a library that does clustering independently of Leaflet, so the process can be moved out of the browser. The goal is to ease the browser’s work in rendering a lot of data so that our application does not crash.

Server side clustering

As mentioned in the previous section, supercluster is a library created by the Mapbox team that does clustering independently of Leaflet, so the process can be separated from the browser. To cluster large data without making the client/browser application hang or crash under heavy rendering, we move the clustering process from the browser to the server side, and we filter point locations with PostGIS using the bounding box of the current Leaflet view on the client side.

client side clustering
server side rendering

Comparing the two approaches, server-side clustering reduces data transfer and leaves the client browser processing only a small amount of data.

With the server-side approach, the server's data processing is, more or less, select filter -> clustering -> send response, as follows:

flow envelop and cluster
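The flow above can be sketched end to end in a few lines. This is only an in-memory illustration under stated assumptions: filterByBBox stands in for the PostGIS bounding box query, gridCluster is a simple grid clustering that stands in for supercluster, and every name here is hypothetical.

```typescript
// Sketch of the server flow: select filter -> clustering -> send response.
// Assumptions: the in-memory filter stands in for the PostGIS query and the
// grid clustering stands in for supercluster. All names are hypothetical.

interface LngLat {
  lng: number;
  lat: number;
}

interface BBox {
  west: number;
  south: number;
  east: number;
  north: number;
}

interface GridCluster {
  lng: number;
  lat: number;
  count: number;
}

// Step 1: keep only the points inside the current map view.
// (A view that crosses the antimeridian is not handled in this sketch.)
function filterByBBox(points: LngLat[], b: BBox): LngLat[] {
  return points.filter(
    (p) => p.lng >= b.west && p.lng <= b.east && p.lat >= b.south && p.lat <= b.north
  );
}

// Step 2: group points that fall into the same grid cell.
// Cells shrink as the zoom level grows, so clusters split apart when zooming in.
function gridCluster(points: LngLat[], zoom: number): GridCluster[] {
  const cell = 360 / Math.pow(2, zoom);
  const cells = new Map<string, { sumLng: number; sumLat: number; count: number }>();
  for (const p of points) {
    const key = `${Math.floor(p.lng / cell)}:${Math.floor(p.lat / cell)}`;
    const acc = cells.get(key) ?? { sumLng: 0, sumLat: 0, count: 0 };
    acc.sumLng += p.lng;
    acc.sumLat += p.lat;
    acc.count += 1;
    cells.set(key, acc);
  }
  const out: GridCluster[] = [];
  cells.forEach((c) => {
    out.push({ lng: c.sumLng / c.count, lat: c.sumLat / c.count, count: c.count });
  });
  return out;
}

// Step 3: build the payload that would be sent back to the browser.
function clusterViewport(points: LngLat[], bbox: BBox, zoom: number) {
  const visible = filterByBBox(points, bbox);
  return { total: visible.length, clusters: gridCluster(visible, zoom) };
}
```

The point of the design is that the response size depends on the number of clusters in view, not on the number of rows in the database.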

It is better to limit the max zoom of the map to avoid excessive and unnecessary clustering work.

So far supercluster can run in the browser or on the server using the Node.js runtime. It would be very interesting to port it to other languages and implement concurrency to increase clustering speed, but for now supercluster is fast enough.

Implementation

For the experiment we will create a very simple mapping system that maps the locations of Twitter users. The data comes from Kaggle, in the following dataset: link to dataset

The dataset is Twitter 1 Million Connections User Location. The file is 38 MB, and when converted to GeoJSON it becomes 230 MB.

For the database we use Postgres with the PostGIS extension for spatial operations. In this experiment, only one table is used, as follows:

To seed the database we have prepared a script at https://github.com/alfiankan/leaflet-server-side-marker-cluster.git; run it with the Node.js runtime:

node import.js

Make sure you have changed the database configuration before running it.

Creating API

We will use the Node.js runtime and TypeScript to create the HTTP API. You can clone it from the repository we created:

git clone https://github.com/alfiankan/leaflet-server-side-marker-cluster.git

Here is the explanation:

Location Repository

This class has a method for querying Postgres with the PostGIS extension, namely envelope, which is used to select the location points inside a given bounding box.
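As a rough idea of what such an envelope query can look like, here is a minimal sketch that builds a parameterized query around PostGIS's ST_MakeEnvelope. The table name locations, the columns id and geom, and the SRID 4326 are assumptions for illustration; the repository's actual schema may differ.

```typescript
// Sketch of a bounding box ("envelope") query builder.
// Assumptions: a table named `locations` with an `id` column and a `geom`
// column of type geometry(Point, 4326). These names are hypothetical.

interface BBox {
  west: number;
  south: number;
  east: number;
  north: number;
}

interface SqlQuery {
  text: string; // SQL with $1..$4 placeholders
  values: number[]; // placeholder values, in order
}

function buildEnvelopeQuery(bbox: BBox): SqlQuery {
  return {
    // ST_MakeEnvelope(xmin, ymin, xmax, ymax, srid) builds the rectangle;
    // `&&` keeps rows whose bounding box intersects it and can use a spatial index.
    text:
      "SELECT id, ST_X(geom) AS lng, ST_Y(geom) AS lat " +
      "FROM locations " +
      "WHERE geom && ST_MakeEnvelope($1, $2, $3, $4, 4326)",
    values: [bbox.west, bbox.south, bbox.east, bbox.north],
  };
}
```

An object of this shape (text plus values) is what a client such as node-postgres accepts in its query method, so user input never gets concatenated into the SQL string.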

Clustering Use Case

This class contains the use case for clustering with the supercluster library, which uses kdbush (kd tree) for its clustering technique. The transformation to a GeoJSON DTO is also carried out in this process.
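The GeoJSON transformation step can be sketched as a small pure function. The row shape (id, lng, lat) and the names LocationRow and toPointFeatures are assumptions for illustration, not the repository's actual DTO.

```typescript
// Sketch of turning database rows into GeoJSON Point features.
// Assumption: rows arrive as { id, lng, lat }; this shape is hypothetical.

interface LocationRow {
  id: number;
  lng: number;
  lat: number;
}

interface PointFeature {
  type: "Feature";
  geometry: { type: "Point"; coordinates: [number, number] };
  properties: { id: number };
}

function toPointFeatures(rows: LocationRow[]): PointFeature[] {
  const features: PointFeature[] = [];
  for (const row of rows) {
    // Skip rows with unusable coordinates instead of emitting invalid GeoJSON.
    if (!Number.isFinite(row.lng) || !Number.isFinite(row.lat)) continue;
    features.push({
      type: "Feature",
      // GeoJSON orders coordinates as [longitude, latitude].
      geometry: { type: "Point", coordinates: [row.lng, row.lat] },
      properties: { id: row.id },
    });
  }
  return features;
}
```

An array like this is the input supercluster expects: it can be passed to the index's load method, after which getClusters([west, south, east, north], zoom) returns the clusters and single points for a view.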

HTTP Handler

To handle requests we use Express.js with a /points endpoint that takes the bounding box and zoom we get from Leaflet as query params:

http://localhost:3000/points?west=-74.37604665756227&south=40.74525741379786&east=-74.34858083724977&north=40.75104456155781&zoom=16
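Before the handler queries the database, the query params need to be parsed and checked. The following is a minimal sketch of that step, written as a plain function so it does not depend on any framework. The names ViewportQuery and parseViewportQuery and the MAX_ZOOM value of 16 are assumptions for illustration.

```typescript
// Sketch of validating the /points query params before using them.
// Assumptions: params arrive as strings (as in a typical query object), and
// MAX_ZOOM = 16 is a hypothetical cap on the zoom level used for clustering.

interface ViewportQuery {
  west: number;
  south: number;
  east: number;
  north: number;
  zoom: number;
}

const MAX_ZOOM = 16;

function parseViewportQuery(query: Record<string, unknown>): ViewportQuery | null {
  // Read one param as a number; anything missing or non-numeric becomes NaN.
  const read = (key: string): number => {
    const raw = query[key];
    return typeof raw === "string" && raw.trim() !== "" ? Number(raw) : NaN;
  };

  const west = read("west");
  const south = read("south");
  const east = read("east");
  const north = read("north");
  const zoom = read("zoom");

  const all = [west, south, east, north, zoom];
  if (!all.every((n) => Number.isFinite(n))) return null;
  if (south > north || south < -90 || north > 90) return null;
  if (zoom < 0) return null;

  // Clamp the zoom so very deep zoom levels do not trigger extra clustering work.
  return { west, south, east, north, zoom: Math.min(Math.round(zoom), MAX_ZOOM) };
}
```

A handler could respond with HTTP 400 when this returns null and otherwise pass the result on to the repository and clustering steps.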

Map Client

To display the map and marker clusters we use Leaflet. In this experiment we only create an HTML file with a JS script in it for handling Leaflet. It contains an algorithm that adjusts the marker cluster icon and displays the total number of point locations in a cluster. The following is a piece of JavaScript from index.html:
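As a hedged illustration of the two client-side pieces described here (this is a sketch, not the script from index.html), the helpers below build the /points request from the current map view and decide how a cluster icon should look. The names BoundsLike, buildPointsUrl, and describeClusterIcon, as well as the size thresholds, are assumptions.

```typescript
// Sketch of two client-side helpers.
// Assumptions: all names and the icon size thresholds are hypothetical.

// Structural type matching the getters on Leaflet's LatLngBounds.
interface BoundsLike {
  getWest(): number;
  getSouth(): number;
  getEast(): number;
  getNorth(): number;
}

// Build the request URL for the current view.
function buildPointsUrl(baseUrl: string, bounds: BoundsLike, zoom: number): string {
  const params: Array<[string, number]> = [
    ["west", bounds.getWest()],
    ["south", bounds.getSouth()],
    ["east", bounds.getEast()],
    ["north", bounds.getNorth()],
    ["zoom", zoom],
  ];
  const qs = params
    .map(([key, value]) => `${key}=${encodeURIComponent(String(value))}`)
    .join("&");
  return `${baseUrl}?${qs}`;
}

interface ClusterIcon {
  sizePx: number; // icon diameter in pixels
  label: string; // text shown inside the icon
}

// Bigger clusters get a bigger icon and an abbreviated label.
function describeClusterIcon(pointCount: number): ClusterIcon {
  const sizePx = pointCount < 100 ? 30 : pointCount < 1000 ? 40 : 50;
  const label =
    pointCount >= 1000 ? `${Math.round(pointCount / 100) / 10}k` : String(pointCount);
  return { sizePx, label };
}
```

In a Leaflet page, the bounds and zoom would typically come from map.getBounds() and map.getZoom() inside a moveend listener, so a fresh request is made each time the user pans or zooms.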

Testing

The live demo can be accessed at https://playground.petaku.xyz/clustering, the clustering demo playground of the Petaku GIS Platform, a no-code, easy-to-use GIS platform in development.

Wrap it up

That’s one of many approaches to handling millions of point locations in Leaflet using server-side clustering. Of course there are many things that can be improved, so we keep exploring. Thank you…

Alfiankan

Lifetime learner | Backend Engineer at Soul Parking