For many regions of the world, there are no authoritative and openly accessible place names. Solving this problem would help all development and humanitarian work. And solving this problem might be possible with the right mix of people and skills. I’m considering to dedicate my work time to this. The present post explains the idea, with the aim of getting as much feedback as possible.
Problem: wait, no place names?
Approximately two billion people live in places that are not labeled on any map, like this village in the Democratic Republic of Congo. Google Maps shows “Walikale” as the address, but that’s a territory of almost 25,000 square kilometers, or the name of a town about 100 km away. Other maps and gazetteers are no better: OpenStreetMap, Who’s on First
The lack of place names has many negative effects, like slower development, difficult humanitarian work, or worse governance. Consider the example of bednet distributions against malaria. This is familiar to me, because I have worked at the Against Malaria Foundation for the last 2.5 years. Bednets are an essential tool to fight malaria, and so countries like the Democratic Republic of Congo distribute them to all households every three years. Without reliable lists of location names:
- Planning the distribution is hard: how to communicate where nets should go, and how many are needed in each place?
- People miss out on nets: entire villages might be left out if the distribution agents aren’t aware of them.
- Monitoring is hard: Typically, AMF selects a subset of locations randomly and sends independent monitoring teams there to verify the distribution. This only works if there are place names.
- Inconsistencies make everything difficult: For example, we want independent monitoring teams, but we can’t use their data if location names don’t match those used by the distribution teams.
- New technologies cannot be used: Distributions would benefit from better digital data collection tools, dashboards, satellite-based population estimates, etc. The adoption of these tools is often difficult because they require place names and boundaries.
An important consequence is that distributions cost more. Funders regularly buy +5% or +10% extra nets to ensure that there are no stockouts despite low-quality planning data. In addition, there is an engineering cost of dealing with missing and inconsistent data. When I was at AMF, a significant part of my work time was spent on these problems.
Bednet distributions are an area that I know well, but the example generalizes. Most non-profits, NGOs, and governments working in least developed countries would benefit from better place names. Better data would improve routine healthcare, make it easier to organize fair elections, and much more. Yet, such is nobody’s core responsibility, and thus nobody has solved the problem yet.
The goal
I want to build the world’s best dataset of place names, boundaries, and metadata (such as population estimates), with a focus on least developed countries.
The current state of the art looks roughly like the graphic below, which comes from the excellent overview State of the Gazetteer in 2023 by Who’s on First. Regions with orange dots have data on local administrative areas (e.g., village or town names/boundaries). They are missing in most of Africa and large parts of Asia. I want to make sure this map becomes fully covered.
The most useful data has places with on the order of 100 households. This is a typical rural village or a small urban neigborhood. At this level of detail, the data is fine-grained enough for detailled planning. And while practicioners can always use higher levels of the location hierarchy or merge places, the opposite is more difficult.
To make the data fully useful, we also want boundary polygons. This allows us to link places to other data sources, e.g., GPS coordinates or satellite imagery.
Finally, we want estimates of the coverage and quality of the data. We want to know what percentage of settlements and people are covered, and get a list of the unknown settlements. For metadata like population numbers, it would be great to have confidence intervals or some other measurement of accuracy. All data should also come with information on provenance and recency.
Candidate solutions
Solutions to this problem of missing place names involve identifying good data sources, cleaning and aggregating the data, publishing the result in all relevant places to ensure its use, and finally maintaining the data so it stays up-to-date.
A few candidate data sources:
- Governments, ministries of health.
- NGOs like the Against Malaria Foundation, but also local actors such as SANRU in DRC. These could provide not only place names, but also GPS coordinates of households or population estimates.
- The UN Office for Coordination of Humanitarian affairs (see for example their data sets at humdata.org).
- Academic initiatives such as the Columbia Population Research Center or GRID3
- Initiatives by for-profit organizations like Facebook.
- New satellite-based data sets like World Settlement Footprint and Sentinel-2 Land Use.
The people at the Who’s on First gazetteer have recently completed gathering and importing place data for India. I highly recommend their write-up of the Karmashapes project. Their approach and techniques might translate to other parts of the world.
Potential challenges
Building the world’s best place data set is not without challenges. Below is an initial bullet list. This is an area where feedback and ideas would be particularly welcome.
- There are many existing initiatives addressing parts of the problem. We need to better understand why they have not yet produced the data that we’d like to see.
- Authoritiative data might not exist. People have disputes around boundaries. Some places (e.g., South Sudan) might now know or want the concept of “village name”.
- There is a risk of colonialistic thinking, where we (the rich first-worlders) believe to know what less developed countries need. Also, the data is potentially dual-use: it can serve dictators as well as humanitarian organizations.
- The data set needs to be maintained to stay relevant.
- The data set should be published under an open license to maximize usage.
- It’s unclear how to make a living when working on this project.
Conclusion
With this early project write-up, I am primarily looking for feedback. Any thoughts are welcome: Is this a good idea? Why or why not? Is this an important problem to solve? How could it be done? Whom should I talk to? What is missing?
Don’t hesitate to reach out to me, Jonas Wagner jonas@purpureus.net.