Fixing IP Geolocation: An ISP Guide

November 3, 2022

DISCLAIMER: I am currently working for Google. This post is published in my personal capacity, without using any knowledge I may have obtained from my employment with them. All the information provided here comes from purely personal experience in dealing with this issue.


There are countless reasons why one might want to know the approximate location of a device based on its IP address. However, this information is not part of any Internet protocol, so third-party services try to determine, as accurately as possible, where each address probably is. There are plenty of free and commercial offerings that provide this to operators.

You will always run into incorrect data, however. First of all, there’s no one true way to obtain this information; everyone uses a mix of signals with varying degrees of success. You can ping an address from various locations and triangulate (which doesn’t work well), look at whois records and hope for the best, monitor which country or language visitors from these addresses pick on a website, collect GPS information from a cooperating (mobile) application, look at the ccTLD of the reverse DNS, etc.

But it’s not always easy: especially on IPv4, multiple devices can share one address. What about rarely used addresses? What about CGNAT, with up to hundreds of ISP subscribers behind a single address? What about a corporate VPN?

This problem (of bad data) is then amplified by decisions made with that data. What if a website only allows visitors with IP addresses from a single country? What if it uses the information for a fraud check (e.g. on a payment)? What if it switches to a language you don’t speak? What if it limits its functionality because it’s only available in specific countries?

ISPs are typically on the other end of the support phone call or e-mail, receiving a report and then having to fix it. But how do you fix this problem as a network? How do you ensure it’s not coming back? In this post, I’ll try to provide some best practices networks can use to address this problem.

Databases

The current state of IP geolocation is complicated. There are many databases that offer this service: some provide a file to download (e.g. every week), while others provide an HTTP API and charge per query. Finally, some companies have private, internal databases that only they can access. These databases don’t necessarily agree with each other, have different update frequencies and correction submission procedures, and there’s little standardization.

Some are more popular than others, so one can focus on the head rather than the long tail, but it’s still work that has to be done.

Also, keep in mind that websites may not update their databases often: they may use a copy that’s several months or years old, or update it much less frequently than its release cycle. Just because a new version is published does not guarantee its adoption.

Monitoring

First, we need to figure out that there’s an issue. A lot of networks today only learn about a problem when they receive a user complaint. That’s not very good. Depending on how much these issues affect an ISP, it may be worth investing in proactive monitoring of key IP address blocks, to get ahead of the users. It’s not the highest priority, I know.

The easiest way is to pick some free databases, regularly download their latest version (depending on their release schedule), and look your IP ranges up in them. If there’s an incorrect country (or state, etc. – it depends on how much you care), this can raise a (low-priority) issue to investigate and correct.
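
As a rough illustration, here’s a minimal sketch of such a check in Python, assuming you’ve downloaded MaxMind’s free GeoLite2 Country database (GeoLite2-Country.mmdb) and installed the maxminddb package; the prefixes and expected countries below are just examples:

import ipaddress
import maxminddb

# Hypothetical expectations: our blocks and where they should geolocate.
EXPECTED = {
    "193.5.16.80/29": "GR",
    "2a0d:3dc0:100::/48": "GR",
}

reader = maxminddb.open_database("GeoLite2-Country.mmdb")
for prefix, want in EXPECTED.items():
    # Query one representative address per block; a thorough monitor would
    # cover every sub-range a database may have split the block into.
    addr = next(ipaddress.ip_network(prefix).hosts())
    record = reader.get(str(addr)) or {}
    got = record.get("country", {}).get("iso_code")
    if got != want:
        print(f"mismatch: {prefix} geolocates to {got}, expected {want}")
reader.close()

Run it from cron on each release day and feed any output into your ticketing system.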

For API-only databases that charge per query, there’s little that can be done, but some offer a free quota. These typically have a /48 and /24 level of granularity, and a weekly update rate. Usually the free quota is enough if the requests are spread over the week. For important addresses, e.g. proxies or VPNs, specific queries can be configured.
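
For example, here’s a sketch that spreads the checks for a handful of important prefixes evenly across the week; the HTTP endpoint and response format are entirely made up, so substitute your provider’s actual API:

import json
import time
import urllib.request

PREFIXES = ["193.5.16.80/29", "2a0d:3dc0:100::/48"]  # blocks to watch
FREE_QUERIES_PER_WEEK = 1000                          # hypothetical quota
WEEK = 7 * 24 * 3600

assert len(PREFIXES) <= FREE_QUERIES_PER_WEEK
interval = WEEK / len(PREFIXES)  # one pass per week, evenly spaced

for prefix in PREFIXES:
    ip = prefix.split("/")[0]
    # Hypothetical API; real services differ in URL, auth, and schema.
    with urllib.request.urlopen(f"https://geoapi.example.com/v1/{ip}") as r:
        print(prefix, "->", json.load(r).get("country"))
    time.sleep(interval)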

For proprietary databases that belong to certain companies, it’s unfortunately not easy to arrive at an automated solution, especially without the ability to send HTTP requests from the addresses themselves.

I’m not aware of any OSS software that provides this ability, so I had to create my own tool. Perhaps it would make sense to release it as open source at some point.

Submitting Correction Reports

If there’s a problem, action needs to be taken to inform the databases of what’s actually happening. I like to use these steps proactively too: if an address block’s geolocation changes, or a new allocation is made, my procedures include submitting a correction report (if required) for the IP block.

Reports can be sent by anyone, and a lot of users send them for malicious reasons, so companies tend to treat them with caution.

MaxMind

For MaxMind, a very popular database that’s also available in many Linux distributions, the form is available here. I have submitted it many times, and my requests were always accepted. Changes were added to the database on the Tuesday following the approval e-mail. The form has to be used once per IP block, which can be of any size.

What I found is that their mechanisms to auto-detect the location typically work at a /48 & /24 level, so a mobile device whose GPS claims it is in Country A can set all the addresses in that range to Country A. This form can be used to create more specific entries. I’ve also discovered that if you submit an address range as a collection of /49s or /25s, it’s very unlikely that its location will change automatically.

I typically use a noc@ e-mail address when submitting, on a domain that’s fairly easy to tie to the IP addresses in question (e.g. the same as the abuse e-mail). Maybe this helps.

Google

The form seems to be this one. I don’t have nearly as much experience with this, so I don’t know of any useful tricks. According to the information in the form, updates may take over a month to fully propagate to all Google products and services.

Other databases

You can use iplocation.net to find many other databases and query them all in one place for a specific address. You can then find links to all of these databases on that page, and they all have a way to report incorrect data. The procedure is similar to MaxMind’s above. Here’s another list with some correction URLs that I am not updating or maintaining.

Proactive Information Sharing

Ideally, you never reach the correction phase at all: it means constantly and manually fixing issues, like a game of Whac-A-Mole. It’s time-consuming, and for large ISPs it just doesn’t scale. Especially on some networks, it’s very easy for a user (or a number of users) to do something that will misclassify a whole block of addresses.

It’s better to help geolocation databases get it right in the first place, and this can be achieved with authoritative information that clearly comes from the network operator and cannot be easily spoofed. Since there are networks that lie, not all of these signals are trusted everywhere, but they can significantly reduce the mistakes and the changes made based on the passive or active information the databases collect.

inet{6,}num

The first place to start is the whois objects created by networks for their addresses. For the RIPE region these are the inet6num and inetnum objects, but other RIRs have equivalent types. Databases try to download frequent dumps of these objects and use them in their calculations.

For whois services that don’t publish dumps on e.g. an FTP or web server, geolocation providers have to query this information, so updates can be slower. Also, since whois is human-readable by specification, it’s not always easy to parse. Thankfully, RIR databases are mostly well structured.
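
To see why parsing is tractable, here’s a small sketch that walks a (decompressed) RIPE split dump such as ripe.db.inetnum and extracts each object’s range and country attribute; a real parser would need to handle more edge cases (continuation lines, comments inside objects, etc.):

# Whois dump objects are "attribute: value" lines separated by blank lines.
def parse_objects(path):
    obj = {}
    with open(path, encoding="latin-1") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                if obj:
                    yield obj
                obj = {}
            elif not line.startswith(("%", "#", " ", "\t", "+")):
                key, _, value = line.partition(":")
                obj.setdefault(key.strip(), []).append(value.strip())
    if obj:
        yield obj

for o in parse_objects("ripe.db.inetnum"):
    if "inetnum" in o and "country" in o:
        print(o["inetnum"][0], o["country"][0])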

What I am providing below works for IP addresses served by RIPE, which likely has the best support. Expect fewer features or less accuracy with other RIRs.

Here’s one of my objects:

$ whois -r 2a0d:3dc0:200::1
inet6num:       2a0d:3dc0:200::/64
netname:        DNET-ZU-S-6
descr:          DaKnObNET - Zurich - Secure LAN
country:        CH
geoloc:         47.363745 8.531644
language:       EL
admin-c:        AAC138-RIPE
tech-c:         AAC138-RIPE
status:         ASSIGNED
mnt-by:         gr-daknob-1-mnt
created:        2018-12-09T15:46:17Z
last-modified:  2020-03-15T14:38:18Z
source:         RIPE

The most common field that should be set is country. From what I can tell, most services use it as an input to their final calculation. It’s also fairly standardized across most whois databases.

I also set geoloc, which for RIPE contains the coordinates (to any accuracy) of the physical location of this IP address block. This is very fine-grained information and does not obviously scale. It can, however, be set at the country level too. Just make sure you pick the center of a lake, to avoid sending the FBI to a farm in Kansas. From private conversations with geolocation databases, this field is very rarely used.

Finally, I set language, which is the language people behind this address are expected to speak. Since location does not equal language, this can be useful; unfortunately, most websites just rely on the location to pick a language. You can see above that this is a Swiss IP range with Greek-speaking users: the codes don’t have to match. You may also have e.g. DE in some parts of CH, and FR, IT, etc. in others. Anecdotal evidence suggests this sees minor use, but more than geoloc.

A lot of databases will check these objects before accepting correction requests submitted manually, to make sure they are legitimate.

It’s common for ISPs to have nested objects. They typically have one for their allocation, e.g. the /29 from RIPE, and then a hierarchy of allocations or assignments within that. Some databases will include all entries (and you have to use the most specific when querying, as sketched below), while others will only include one entry per address. This behavior is not standardized. It’s best to set at least the country field on all objects.
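
To illustrate the “most specific wins” behavior, here’s a small sketch that resolves an address against a nested hierarchy; the entries are illustrative:

import ipaddress

# A parent allocation and a more specific assignment, as in a whois tree.
ENTRIES = [
    ("2a0d:3dc0::/29", "CH"),
    ("2a0d:3dc0:100::/48", "GR"),
]

def lookup(ip):
    addr = ipaddress.ip_address(ip)
    best = None
    for prefix, country in ENTRIES:
        net = ipaddress.ip_network(prefix)
        # Among all covering prefixes, the longest one should win.
        if net.version == addr.version and addr in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, country)
    return best[1] if best else None

print(lookup("2a0d:3dc0:100::1"))  # GR, despite the covering /29 saying CH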

GeoFeeds

Geolocation Feeds, standardized in RFC 8805, are CSV files published by ISPs that contain the geolocation of their IP addresses. This is authoritative information, hopefully kept up to date, and it’s generally trusted much more than whois objects.

You can find an example file here, but it looks something like this:

# Prefix,A2C,Region,City,Postal Code
2a0d:3dc0::/48,EU,,,
2a0d:3dc0:100::/48,GR,GR-B,Veria,59100
193.5.16.0/26,EU,,,
193.5.16.80/29,GR,GR-B,Veria,59100
193.5.16.88/29,CH,CH-ZH,Zurich,8002
147.189.216.0/21,,,,

The file can have comments, starting with #, and data lines that contain the prefix (IPv6 or IPv4), the ISO 3166-1 alpha-2 code of the administrative region this address is in (typically the country), the region/state within the country as an ISO 3166-2 code (you can find these e.g. here), the city, and the postal code.

All fields except the prefix are optional (and can be left empty, as shown above), and the postal code SHOULD NOT be used in new feeds due to privacy concerns.

Due to some MAYs in the RFC, there are some edge cases here. For example, an anycast prefix has no real country, and one could use UN for that. For blocks that span e.g. the whole of Europe, one may want to use EU. The problem is that, for some implementations, these are not valid country codes, and they can do anything from ignoring these lines to rejecting and discarding the entire file. I haven’t found a published list of acceptable codes for each service, so it’s best to be conservative here. You don’t want the file discarded entirely because someone doesn’t support EU as a valid entry in the second field.

If a prefix is not used, it can have empty information (as in the last line above), and each geolocation provider interprets this differently.
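
Given how unforgiving some consumers can be, it’s worth sanity-checking the file before publishing it. Here’s a minimal sketch of such a check, assuming the feed is saved as geofeed.csv; it only covers the basics (the prefix parses, the country looks like an alpha-2 code) and flags the risky cases discussed above:

import csv
import ipaddress
import re

ALPHA2 = re.compile(r"^[A-Z]{2}$")

with open("geofeed.csv", newline="", encoding="utf-8") as f:
    for n, row in enumerate(csv.reader(f), start=1):
        if not row or row[0].startswith("#"):
            continue  # blank lines and comments are allowed
        row += [""] * (5 - len(row))  # trailing fields may be omitted
        prefix, country, region, city, postal = row[:5]
        try:
            ipaddress.ip_network(prefix)
        except ValueError:
            print(f"line {n}: invalid prefix {prefix!r}")
            continue
        if country and not ALPHA2.match(country):
            print(f"line {n}: invalid country code {country!r}")
        elif country in ("EU", "UN"):
            print(f"line {n}: {country} may be rejected by some consumers")
        if postal:
            print(f"line {n}: postal code set; discouraged for new feeds")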

This CSV file must be served over HTTPS and encoded as UTF-8. For more dynamic setups, it can be generated (e.g. via PHP or a small script) from a database (NetBox?) and ideally cached. It’s better to automate this than to update it manually.
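
As a sketch of what that automation could look like, here’s a Python version that builds the CSV from NetBox’s REST API; the NetBox URL, the token, and the geo_* custom field names on prefixes are all assumptions to adapt to your own IPAM:

import requests  # pip install requests

NETBOX = "https://netbox.example.net"          # hypothetical instance
HEADERS = {"Authorization": "Token 0123abcd"}  # hypothetical API token

def generate_geofeed():
    lines = ["# Prefix,A2C,Region,City,Postal Code"]
    resp = requests.get(f"{NETBOX}/api/ipam/prefixes/?limit=0",
                        headers=HEADERS, timeout=30)
    resp.raise_for_status()
    for p in resp.json()["results"]:
        cf = p.get("custom_fields") or {}
        lines.append(",".join([
            p["prefix"],
            cf.get("geo_country") or "",  # assumed custom fields
            cf.get("geo_region") or "",
            cf.get("geo_city") or "",
            "",  # leave the postal code empty (privacy)
        ]))
    return "\n".join(lines) + "\n"

print(generate_geofeed())

A cron job can write the output to a static file behind your web server, which gives you the caching essentially for free.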

The problem hindering GeoFeed adoption is that there’s no formal mechanism to discover the URL at which a feed is served. Several mechanisms have emerged to make it easier to find:

inet{6,}num

Some geolocation providers try to find a valid GeoFeed URL in whois objects. RIPE has a dedicated field, geofeed:

$ whois -r 2a0d:3dc0::/29
inet6num:       2a0d:3dc0::/29
netname:        GR-DAKNOB-20180828
descr:          Antonios A. Chariton
country:        CH
geofeed:        https://geo.daknob.net/as210312.csv
org:            ORG-AAC10-RIPE
admin-c:        AAC138-RIPE
tech-c:         AAC138-RIPE
status:         ALLOCATED-BY-RIR
mnt-by:         RIPE-NCC-HM-MNT
mnt-by:         gr-daknob-1-mnt
mnt-lower:      gr-daknob-1-mnt
mnt-routes:     gr-daknob-1-mnt
created:        2018-08-28T09:46:32Z
last-modified:  2022-03-27T16:35:12Z
source:         RIPE

It seems to be respected by many databases, and you will start receiving a lot of traffic to this endpoint from various user agents, some identifying a database explicitly, others being generic (e.g. Go, Python, NodeJS).

For other RIRs or non-RIR whois services, it has been reported that you can add the URL in a remarks (or equivalent) field, perhaps with the word “Geofeed” in front, and hope for the best, like this:

remarks:        Geofeed https://geo.daknob.net/as210312.csv

In some tests with RIPE, I haven’t observed any additional crawlers with the remarks field, so the geofeed attribute is probably enough.

This only needs to be added to the top-level object; there’s no need to include it in every more specific object. Databases typically treat it as authoritative for all addresses covered by that object.

MaxMind

To the best of my knowledge, and until recently at least, MaxMind did not make use of the geofeed field from whois. They have a section in their docs that asks you to e-mail them and request that your GeoFeed be added as a source. You’ll need to e-mail them every time you add or remove an IP range that the GeoFeed is authoritative for (but not for every file edit), e.g. if you acquire a new IPv6 block or sell your old IPv4 block.

My first inclusion request was unsuccessful, but a second one a few months later worked. It probably helped that I had included the geofeed field in my inet{6,}num objects, so they could verify the legitimacy of the request.

Upon approval, you should expect them to start respecting it on the following Tuesday, with their next release.

I’ve asked them whether EU or UN are a problem; they said they’d get back to me if there was an issue, but I haven’t heard back. It probably works.

Google

For my network, AS210312, I have access to Google’s ISP Portal. You can find out more here. If you meet their requirements, you can probably get access as well. Perhaps there’s some way to publish GeoFeeds to Google without it, but I haven’t looked into it.

Through this portal, you can submit the URL of your GeoFeed for your ASN. Within a few days, it will be crawled and then used. Invalid lines will be ignored, and you’ll be informed about the errors that occurred, e.g.:

The provided geocode identifier was not recognized. Please check for typos.
Problem value(s): EU,,,
Problem line: 2a0d:3dc0::/48,EU,,,

The rest of the file will be used normally, and it’s best to fix these errors.

In conclusion

Hopefully the information above is helpful and will lead to correcting current problems, preventing them from recurring (at least as often as they do now), and detecting them early.

None of this behavior is standardized (or published), and as databases compete on accuracy we’ll see more changes in their classification algorithms. This guide may need updates to keep up with new requirements. There’s also a lot of trial and error involved, so I’ll try to incorporate new knowledge here as I confirm it. Most of the database behavior presented here is empirical or anecdotal rather than official, and may change.