yammdb - just another .mmdb - Page 2

Comments

  • Neoon Community Contributor, Veteran
    edited May 2023

    @kait said:
    icmp flood the world :) Interesting project hope it goes well, will you do a comparison against maxmind and other ip2location services?

    I already wrote benchmark scripts to keep track of the builds.
    Basically, I grab a routing table dump and throw IPs against selected .mmdb files, including this one.

    Right now the .mmdb has a 74% hit rate if you take any IP from the internet and look it up.
    For the rest, I just don't have any data.

    However, I have no idea about the accuracy. To benchmark that, I would need a 3rd-party source that is nearly 100% certain where the IPs originate from.

    Which I haven't found yet.
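
For anyone who wants to try the hit-rate idea themselves, here is a minimal sketch. It is not the author's benchmark script: the function name `hit_rate`, the toy database, and the sample IPs are all made up for illustration. It only counts whether a lookup returns any record at all, which says nothing about accuracy.

```python
from typing import Callable, Iterable, Optional


def hit_rate(ips: Iterable[str], lookup: Callable[[str], Optional[dict]]) -> dict:
    """Count how many IPs return any record from a database lookup."""
    success = fail = 0
    for ip in ips:
        # A lookup that returns None means the database has no entry for this IP.
        if lookup(ip) is None:
            fail += 1
        else:
            success += 1
    total = success + fail
    percentage = (success / total * 100) if total else 0.0
    return {"fail": fail, "success": success, "percentage": percentage}


# Toy stand-in for an .mmdb reader: a plain dict keyed by IP (hypothetical data).
toy_db = {"192.0.2.1": {"continent": "EU"}, "198.51.100.7": {"continent": "NA"}}
sample_ips = ["192.0.2.1", "198.51.100.7", "203.0.113.9", "203.0.113.10"]

result = hit_rate(sample_ips, toy_db.get)
print(result)  # {'fail': 2, 'success': 2, 'percentage': 50.0}
```

With the `maxminddb` Python package installed, the `get` method of a reader opened via `maxminddb.open_database(...)` could probably be passed as `lookup` in place of `toy_db.get`.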

  • jimaek Member

    Which I haven't found yet.

    If you do please let me know cause I have the same problem https://github.com/jsdelivr/globalping/issues/264. No reliable dataset to benchmark against.

    And yes please don't try to ping the whole internet using Globalping :smiley:

  • Neoon Community Contributor, Veteran

    @jimaek said:

    Which I haven't found yet.

    If you do please let me know cause I have the same problem https://github.com/jsdelivr/globalping/issues/264. No reliable dataset to benchmark against.

    And yes please don't try to ping the whole internet using Globalping :smiley:

    Well, to be fair, correcting a city within a country is going to be very hard.
    However, I think the bigger problem is continents.

    There are a lot of entries that just give you the entirely wrong continent.
    In extreme cases, you can ask 3 different databases and each returns a different value.

    After that I would go down to countries, cities etc...
    For example, things my tool found:

    Corrected AS to EU (223.120.188.0, 8.0)
    https://db-ip.com/223.120.188.0
    https://ping.sx/ping?t=223.120.188.1

    No, it's not China.
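
A rough sketch of why a low latency can rule out a continent. It assumes the `8.0` in the tool output is a round-trip time in milliseconds measured from a European probe, which the post does not state explicitly. The probe and database coordinates below are hypothetical, and the 200 km/ms figure is the usual approximation for light in fiber (about two thirds of c). This is not the author's correction logic, just the physical bound behind it.

```python
import math

FIBER_KM_PER_MS = 200.0  # approximate speed of light in fiber


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    earth_radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def max_distance_km(rtt_ms):
    """Upper bound on the distance to the target: half the RTT at fiber speed."""
    return rtt_ms / 2 * FIBER_KM_PER_MS


def claim_is_possible(probe, claimed, rtt_ms):
    """True if the claimed location is within reach of the probe for this RTT."""
    return haversine_km(*probe, *claimed) <= max_distance_km(rtt_ms)


frankfurt = (50.11, 8.68)   # hypothetical probe location
beijing = (39.90, 116.40)   # location a database might claim

print(max_distance_km(8.0))                        # 800.0
print(claim_is_possible(frankfurt, beijing, 8.0))  # False
```

The bound only works one way: a low RTT proves the target is close to the probe, but a high RTT does not prove it is far away.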

  • Neoon Community Contributor, Veteran

    Some people who follow my GitHub repo gave me the idea to make an mtr-only geo database.
    I coded it in less than 24 hours. However, the hit rates were too low, and my brain had not yet managed to figure out where the fuck-up was.

    However, today I found the mapping error.
    db/mtr.mmdb {'fail': 126502, 'success': 849072, 'percentage': 87.03306976200678}

    From 64% to 74% and now an 87% hit rate, not bad.
    As usual, I put the .mmdb at https://yammdb.serv.app/mtr.mmdb

    This database is only 4.2 MB in size and contains only geo coordinates right now.
    I will add the usual info in a later build, such as country, continent etc.

    Plus, I will add a combined build of geo.mmdb and mtr.mmdb later, which first uses latency, then mtr, for better accuracy.

    Thanked by 1 BasToTheMax
  • Neoon Community Contributor, Veteran

    As requested, the files are now also available as CSV.
    https://yammdb.serv.app/geo.csv
    https://yammdb.serv.app/mtr.csv

    Thanked by 2 BasToTheMax tuc
  • mrTom Member

    any future updates planned? :)

  • Neoon Community Contributor, Veteran
    edited June 2023

    @mrTom said:
    any future updates planned? :)

    The build server broke because I uploaded the wrong config file, hence no update.
    I fixed it; the next fresh build should be up in roughly 22 hours.

    However, it does not include Paris and Zurich. I will run another one for these.

    Thanked by 1 mrTom
  • fart Member

    It seems like entire continents are set to the same country and location, so is this correctly set up or am I misunderstanding something?

  • Neoon Community Contributor, Veteran

    @fart said:
    It seems like entire continents are set to the same country and location, so is this correctly set up or am I misunderstanding something?

    The country and the city are set to wherever the closest server is,
    along with the latency measured from it.
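
A minimal sketch of that assignment rule, assuming "closest" means the probe with the lowest measured latency. The probe names, locations, and latencies are invented for illustration, and `locate_by_closest_probe` is a hypothetical helper, not a function from the project.

```python
def locate_by_closest_probe(latencies, probes):
    """Return the location of the lowest-latency probe, plus that latency."""
    if not latencies:
        return None  # nothing answered, so there is no data for this target
    best = min(latencies, key=latencies.get)
    return {**probes[best], "latency_ms": latencies[best]}


# Hypothetical probes and measurements for one target IP.
probes = {
    "fra": {"country": "DE", "city": "Frankfurt"},
    "nyc": {"country": "US", "city": "New York"},
    "sin": {"country": "SG", "city": "Singapore"},
}
latencies = {"fra": 92.4, "nyc": 11.8, "sin": 230.1}

print(locate_by_closest_probe(latencies, probes))
# {'country': 'US', 'city': 'New York', 'latency_ms': 11.8}
```

With only a handful of probes, every target in a large region resolves to the same probe city, which would explain whole continents showing one location.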

  • pinguin

    Just a thought but would it not be reasonable to write up a daemon for the probe which can be publicly hosted by anyone? this way you could really turn this into a community effort and in turn this could result in more valuable data overall

  • Neoon Community Contributor, Veteran

    @pinguin said:
    Just a thought but would it not be reasonable to write up a daemon for the probe which can be publicly hosted by anyone? this way you could really turn this into a community effort and in turn this could result in more valuable data overall

    Possible, however it would have to be written and field-tested.
    The current system doesn't support any public probes.
    Donations/Sponsors are welcome though.

    Practically, meh.
    Right now, it uses a preset list of probes, not random ones, which means a whole build is carried out on the same probes.

    These probes are reliable and have a high uptime.
    There is error correction built in: if a probe goes offline during a build, the system can reschedule its tests, up to a certain point.

    If this were built and integrated, I would still select probes by hand, which means not every probe added would ever be used. Same as before.

    It does not make sense to use every probe/server you have.
    From my point of view, it does not change much; it's just more work.

  • Neoon Community Contributor, Veteran

    Since I found differences yesterday, especially with Google, the build now runs around twice as many tests.
    Instead of 2.3 million we do 4.5 million. The .mmdb file is already compressed; however, the .csv files will nearly double in size.

    The build should be done by Sunday. The recent Friday build got stuck due to a bunch of network issues.

    Thanked by 1 BasToTheMax
  • Neoon Community Contributor, Veteran

    There is probably more delay on this.
    I adjusted the speed of the build, which is apparently hitting some limits.

    That caused 2 virtual servers to be "suspended". I can only guess why, since I have no info yet.
    There is active monitoring on each of the probes, so I can ensure that CPU, memory and I/O stay within reasonable limits.

    Usually we never hit more than 30% CPU usage.
    Despite that, they got suspended at about 15% CPU usage.

    My best guess would be network: I increased the number of pings in a batch,
    which hits some kind of limit that is not mentioned anywhere, shrugs.

    I updated the software to remove probes if they have been unreachable for minutes.
    This should prevent any further delays.

    This build won't include Warsaw and will only partially include Singapore, depending on whether it gets suspended/stopped again.

  • Neoon Community Contributor, Veteran

    Some info on the build delays.
    Recently, I changed the software to do at least one measurement per /24. If the prefix is bigger than a /24, e.g. a /20, we slice it into /24s and try to carry out at least one measurement per slice.

    This should increase accuracy and fix some of the issues where a subnet was internally routed to different geographic locations.

    However, Google is Google; they like to slice specific ranges into /26s.
    A good example of this is the IP range used by Google's resolvers.

    If you use the data right now, it points you to America, but the range is also used in Europe.
    So I had to patch the software to allow individual slicing per range.
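
The slicing described here can be sketched with Python's standard `ipaddress` module. This is an illustration under assumptions, not the project's code: `slice_prefix` and the `overrides` mapping are hypothetical names, and 8.8.8.0/24 is used purely as an example of a range that might get finer slicing.

```python
import ipaddress


def slice_prefix(prefix, default=24, overrides=None):
    """Split a prefix into /24 blocks, slicing selected blocks more finely."""
    overrides = overrides or {}
    net = ipaddress.ip_network(prefix)
    # Prefixes that are already a /24 or smaller are kept as they are.
    blocks = [net] if net.prefixlen >= default else net.subnets(new_prefix=default)
    result = []
    for block in blocks:
        finer = overrides.get(block)
        if finer is not None and finer > block.prefixlen:
            # This block has an individual rule, e.g. split the /24 into /26s.
            result.extend(block.subnets(new_prefix=finer))
        else:
            result.append(block)
    return result


# Hypothetical override: slice this one /24 into /26s.
overrides = {ipaddress.ip_network("8.8.8.0/24"): 26}

print(len(slice_prefix("10.0.0.0/20")))  # 16
print([str(n) for n in slice_prefix("8.8.8.0/24", overrides=overrides)])
# ['8.8.8.0/26', '8.8.8.64/26', '8.8.8.128/26', '8.8.8.192/26']
```

Each returned network would then get at least one measurement, so the number of tests grows with the number of slices.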

    The build is still running. It started just a few hours ago and should be done by Sunday.

    Thanked by 2 BasToTheMax fatchan
  • Neoon Community Contributor, Veteran

    Looks like I have no choice but to also do IPv6.
    I have data on every single IPv4 prefix that has at least one pingable IP.

    I am going to use that data to cross-check whether a given ASN has more than one geographic location.
    If it has, it gets ignored for now; the same goes for IPv6-only networks.

    For every other network, where all of the weekly measured data points to a single geographic location, I assume the IPv6 prefixes originate from that same location.

    No idea how reliable this is, but for a start it should hopefully be good enough.
    If anyone has ideas on this, please lemme know.
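
The heuristic above could look roughly like this. It is a sketch under assumptions: the input shapes (pairs of ASN and location, pairs of ASN and IPv6 prefix) and the name `infer_ipv6_locations` are invented for illustration, and the ASNs and prefixes come from the documentation ranges.

```python
from collections import defaultdict


def infer_ipv6_locations(ipv4_results, ipv6_prefixes):
    """Map IPv6 prefixes to a location when their ASN has exactly one IPv4 location.

    ipv4_results:  iterable of (asn, location) from the IPv4 measurements
    ipv6_prefixes: iterable of (asn, ipv6_prefix) to be located
    """
    locations_by_asn = defaultdict(set)
    for asn, location in ipv4_results:
        locations_by_asn[asn].add(location)

    inferred = {}
    for asn, prefix in ipv6_prefixes:
        seen = locations_by_asn.get(asn)
        # Skip IPv6-only ASNs (no IPv4 data) and ASNs seen in several places.
        if seen is None or len(seen) != 1:
            continue
        inferred[prefix] = next(iter(seen))
    return inferred


ipv4_results = [
    (64496, "Frankfurt"), (64496, "Frankfurt"),   # single location
    (64497, "Frankfurt"), (64497, "Singapore"),   # multiple locations, ignored
]
ipv6_prefixes = [
    (64496, "2001:db8:1::/48"),
    (64497, "2001:db8:2::/48"),
    (64498, "2001:db8:3::/48"),                   # IPv6-only ASN, ignored
]

print(infer_ipv6_locations(ipv4_results, ipv6_prefixes))
# {'2001:db8:1::/48': 'Frankfurt'}
```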

    Scanning would be next on the list.

  • Neoon Community Contributor, Veteran

    I have decided to discontinue this project. I was mainly just curious and wanted to learn how to build my own full-fledged .mmdb based on latency data. However, I don't see myself running this in the long run.

    The raw masscan data will still be available weekly at https://raw.serv.app
    However, I won't build any .mmdb files anymore.
    If you wanna build your own .mmdb, the code is available at https://github.com/Ne00n/latency-geolocator-4550

    Thanks to all the sponsors.

    /thread

    Thanked by 1 ralf
This discussion has been closed.