The death of MXRBL

jar Patron Provider, Top Host, Veteran

Hey friends,

A long time ago I pushed my RBL here as a good source for you to use on your mail servers: https://mxrbl.com. Today I'm here to announce that if you are using it, you should stop. It will continue to accept queries for a while, and I will reach out to people that I know are using it. In my Black Friday offer post I'm going to explain a bit more about what we're doing at MXroute to stop spam and why I believe the latest approaches are vastly superior to any usage of a traditional RBL.


Comments

  • Any rough idea when queries will no longer be accepted?

    Thanked by 1jar
  • but recurring means forever

  • jar Patron Provider, Top Host, Veteran

    @hsr said:
    Any rough idea when queries will no longer be accepted?

    Most likely I'll empty the zone, point the three NS to 1 cloud server, and return empty queries for a decade or more.

    Thanked by 2hsr adly
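
For anyone unsure why an emptied zone is a safe way to retire an RBL, here is a minimal sketch of how a mail server typically queries one: reverse the IPv4 octets, append the zone, and look up an A record. The function names and the `rbl.example.org` zone are hypothetical and used only for illustration; this is not taken from MXRBL's own tooling.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name looked up to check an IPv4 address against a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers with an A record for this IP.

    Any resolution failure (including NXDOMAIN) is treated as "not listed".
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Hypothetical zone name; substitute the zone your server is configured with.
    print(dnsbl_query_name("192.0.2.99", "rbl.example.org"))
```

Under this model, a zone with no entries answers every lookup as "not listed", so servers that still query it keep accepting mail instead of failing.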
  • quicksilver03 Member, Host Rep

    I've recently implemented my own RBL, and while it seems to work well enough for me, I'm looking forward to learning more about your new approach.

  • @jar said: In my Black Friday offer post I'm going to explain a bit more about what we're doing at MXroute to stop spam

    That will be interesting to read. AI based content analysis, hopefully?

    Thanked by 1jar
  • jar Patron Provider, Top Host, Veteran

    @eezcloud said:

    @jar said: In my Black Friday offer post I'm going to explain a bit more about what we're doing at MXroute to stop spam

    That will be interesting to read. AI based content analysis, hopefully?

    That would be the dream but I can’t self-host an LLM that would perform well enough or reach our scale at a reasonable overhead. It’s a series of efforts that combine into what I’m expecting to see received as one of the best filtering systems around, with content filters being the last line of defense.

  • @jar said: That would be the dream but I can’t self-host an LLM that would perform well enough or reach our scale at a reasonable overhead.

    I run rspamd on my mail server, and I'm continually frustrated by the endless need to manually adjust it to block the newest wave of easily recognizable refund scam, phishing, or extortion campaigns. I'll be eager to see what you are doing, as I am on the verge of writing a whole new spam filter built around AI (not necessarily LLM) techniques.

  • RBLs can still be quite an effective way to cut down the amount of spam, but they come with the risk of losing legitimate mail if you use them to reject mail, even when using the major/trusted ones. Over the years I've seen IPs of major, mostly legit senders (inboxes, transactional, etc.) appear in RBLs, even the more reputable ones.

    An LLM-based one would be good for filtering the phishing, SEO, etc. crap you receive from major free inbox providers, where an RBL would be useless since listing the IP would cause legitimate mail to get rejected.

    Hopefully in the next few years LLM-based ones will make sense to deploy and be cost-effective enough to justify for smaller providers.

  • @Razza said: you receive from major free inbox

    It is really unfair how much Gmail makes independent email servers go through when they can't (or don't want to) block the firehose of spam that they originate. They don't do anything to prevent Gmail accounts from sending the most blatant Nigerian prince, refund, and sextortion scams.

  • jar Patron Provider, Top Host, Veteran

    @eezcloud said:

    @Razza said: you receive from major free inbox

    It is really unfair how much Gmail makes independent email servers go through when they can't (or don't want to) block the firehose of spam that they originate. They don't do anything to prevent Gmail accounts from sending the most blatant Nigerian prince, refund, and sextortion scams.

    Gmail is extremely frustrating to the rest of us and that perfectly describes it.

  • @jar said: That would be the dream but I can’t self-host an LLM that would perform well enough or reach our scale at a reasonable overhead. It’s a series of efforts that combine into what I’m expecting to see received as one of the best filtering systems around, with content filters being the last line of defense.

    I've never worked on spam classification, but using LLMs for that is a bit of overkill, isn't it?

    Thanked by 2jar tentor
  • jar Patron Provider, Top Host, Veteran
    edited November 2024

    @itsdeadjim said:

    @jar said: That would be the dream but I can’t self-host an LLM that would perform well enough or reach our scale at a reasonable overhead. It’s a series of efforts that combine into what I’m expecting to see received as one of the best filtering systems around, with content filters being the last line of defense.

    I've never worked on spam classification, but using LLMs for that is a bit of overkill, isn't it?

    I think it would be perfect because it would be better at learning and adapting quickly to the continual changes spammers are making to try to circumvent filters. But I would only feel comfortable with it if it were a locally running model connected via an extremely low-latency private network to the server. Huge bottleneck potential, and via an API, a huge privacy violation.

    Thanked by 1itsdeadjim
  • itsdeadjim Member
    edited November 2024

    @jar said: I think it would be perfect because it would be better at learning and adapting quickly to the continual changes spammers are making to try to circumvent filters. But I would only feel comfortable with it if it were a locally running model connected via an extremely low-latency private network to the server.

    Yeah, I see your point. I meant that using some custom word embeddings and some light classifiers would probably be much more efficient to train/infer and probably perform better.

    Thanked by 1jar
  • @itsdeadjim said: much more efficient to train/infer and probably perform better.

    Yea, you don't need a full blown multi-billion parameter LLM to effectively use ML techniques.

  • The only real problem in training any kind of a model on real emails is that you need an effective way to remove PII from the dataset.
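
As a rough illustration of what a first-pass PII scrub could look like, here is a minimal sketch that masks email addresses and phone-like number runs with placeholder tokens before a message goes into a training set. The patterns and the `scrub` name are assumptions made for this example; they only catch structured identifiers and will miss names, postal addresses, and anything written out in free text.

```python
import re

# Deliberately simple patterns; real-world PII removal needs far more than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def scrub(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholder tokens."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    text = PHONE_RE.sub("<PHONE>", text)
    return text


if __name__ == "__main__":
    print(scrub("Contact bob@example.com or +1 (555) 010-9999 today"))
```

Keeping a placeholder instead of deleting the match preserves the signal that "this message contained a phone number", which may itself be useful to a classifier.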

  • itsdeadjim Member
    edited November 2024

    @eezcloud said: Yea, you don't need a full blown multi-billion parameter LLM to effectively use ML techniques.

    @jar has a point though, that an LLM has been trained on enough context to understand any language trick used by spammers in the future.

    I have a feeling that for such a simple task, lighter classifiers could still have enough power even without transformers. But worst case, a cut-down pretrained model with 200-300M parameters could also be more than enough.

    It has enough language understanding to generalize, and can be very easily fine-tuned to identify spam.

    Running in parallel on a fast engine like vLLM, it can have very low latency.

    As far as I can see, there are models along those lines on HF, like: https://huggingface.co/phishbot/ScamLLM https://huggingface.co/ggrizzly/roBERTa-spam-detection etc.

    @eezcloud said: The only real problem in training any kind of a model on real emails is that you need an effective way to remove PII from the dataset.

    Train another model to do that /jk
    IIRC there are many open datasets out there.

    Thanked by 2jar maverick
  • @itsdeadjim said: IIRC there are many open datasets out there.

    Yes, but many of them are 20 years old or so. Spam isn't quite the same as it was back then.

    Thanked by 1itsdeadjim
  • jar Patron Provider, Top Host, Veteran
    edited November 2024

    Bayesian learning is still the most popular open source algorithm and I still find it to be completely unable to keep up with current trends, just as I concluded in 2013. Even rspamd's fuzzy misses pretty much everything no matter how much you feed it. Spammers are changing their messages up too rapidly for simple approaches. LLMs are the only thing new that has a chance of changing the game on pure content filtering. If it can't at least seem to think like a human, it doesn't stand a chance in 2024. You can bet the spammers are using LLMs to help them out.

    Thanked by 2itsdeadjim maverick
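
For readers who haven't looked inside one, the Bayesian approach mentioned above boils down to per-token counting. This is a minimal sketch, assuming regex tokenization and add-one smoothing; the class and method names are invented for illustration, and it is not how rspamd or any particular filter implements it.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    def __init__(self) -> None:
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Count the tokens of one labelled message ("spam" or "ham")."""
        self.token_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_log_odds(self, text: str) -> float:
        """Return log(P(spam)/P(ham)) for the text; positive means more spam-like."""
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        totals = {k: sum(c.values()) for k, c in self.token_counts.items()}
        # Smoothed class prior.
        score = math.log((self.doc_counts["spam"] + 1) / (self.doc_counts["ham"] + 1))
        for tok in tokenize(text):
            # Add-one smoothing so unseen tokens do not zero out the product.
            p_spam = (self.token_counts["spam"][tok] + 1) / (totals["spam"] + len(vocab) + 1)
            p_ham = (self.token_counts["ham"][tok] + 1) / (totals["ham"] + len(vocab) + 1)
            score += math.log(p_spam / p_ham)
        return score


if __name__ == "__main__":
    nb = NaiveBayes()
    nb.train("claim your refund now", "spam")
    nb.train("lunch tomorrow at noon", "ham")
    print(nb.spam_log_odds("refund now"), nb.spam_log_odds("r3fund n0w"))
```

The weakness is visible even in this toy: a message made only of tokens the filter has never seen (for example `r3fund n0w`) gets almost no evidence either way, so trivial rewording is enough to slip under whatever threshold you pick.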
  • @jar said: Spammers are changing their messages up too rapidly for simple approaches

    Or they just embed it in an image, and fly right past every spam filter out there.

    Thanked by 2jar tentor
  • @eezcloud said:

    @jar said: Spammers are changing their messages up too rapidly for simple approaches

    Or they just embed it in an image, and fly right past every spam filter out there.

    There could/should at least be some OCR done on images, which I'm sure the big guys do in some form, but it isn't widely available for self-hosting and smaller providers.

  • @adly said: There could/should at least be some OCR done on images, which I'm sure the big guys do in some form, but it isn't widely available for self-hosting and smaller providers.

    I've played around with Postfix content filters, and it would be incredibly easy to add OCR and PDF text extraction, but without good training data to properly train a classifier, it isn't worth it.

  • An LLM to sort your mailbox could be handy. A better version of what Google does, sorting promotions and social posts from transactional email and personal messages.

  • You won't need a full LLM; an SLM (small language model) will be enough. Most modern models are already good enough. Some can be further trained/specialized or adapted to use multi-shot prompting. You might even be able to use multiple different SLMs.

  • I was actually skeptical about LLMs for spam prediction, but according to JPMorgan they work quite well. I just don't think it'd be cheap enough to justify

    Thanked by 1tentor
  • @blackjack4494 said: You won't need a full LLM; an SLM (small language model) will be enough.

    I tested the precursor to LLMs (BERT) and they are too stupid to be useful.

  • @bobert said:

    @blackjack4494 said: You won't need a full LLM; an SLM (small language model) will be enough.

    I tested the precursor to LLMs (BERT) and they are too stupid to be useful.

    So you tested a single SLM?
    Luckily there are some academic publications on the subject of SLMs for tasks like spam detection.

  • @bobert said: I tested the precursor to LLMs (BERT) and they are too stupid to be useful.

    Did you fine-tune it for this task?

  • I know the pain. Hardware costs should come down in a few years, which will eventually make local computation viable. Adaptability is a challenge, especially in the context of spam, because it changes very fast and today the common approach is offline learning. All the resources you put into making a model become stale over time due to the adaptability of spammers. If you're doing R&D, online learning techniques are something to explore. Also, speaking from experience, it's an unsolvable problem because it's always going to be a moving target. The only thing you can do is aim for perfection knowing it can never be perfect.
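
To make the offline vs. online distinction concrete, here is a minimal sketch of an online learner: a logistic regression over hashed tokens that takes one gradient step per labelled message, so it can be nudged every time a user marks something as spam or not spam. The bucket count, learning rate, and class name are assumptions chosen for the example, not a recommendation for production settings.

```python
import math
import re
import zlib

N_BUCKETS = 2 ** 16  # size of the hashed feature space (assumed value)


def features(text: str) -> list[int]:
    """Map each distinct token to a bucket index using a stable hash."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return sorted({zlib.crc32(t.encode()) % N_BUCKETS for t in tokens})


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogistic:
    def __init__(self, lr: float = 0.5) -> None:
        self.weights = [0.0] * N_BUCKETS
        self.bias = 0.0
        self.lr = lr

    def predict(self, text: str) -> float:
        """Return the estimated probability that the text is spam."""
        z = self.bias + sum(self.weights[i] for i in features(text))
        return sigmoid(z)

    def update(self, text: str, is_spam: bool) -> None:
        """Take a single gradient step on one labelled message."""
        error = self.predict(text) - (1.0 if is_spam else 0.0)
        self.bias -= self.lr * error
        for i in features(text):
            self.weights[i] -= self.lr * error


if __name__ == "__main__":
    model = OnlineLogistic()
    model.update("refund scam", is_spam=True)
    print(model.predict("refund scam"))
```

Because each update touches only the buckets present in one message, the model never needs a full retrain, though it inherits the usual online-learning risks such as being steered by mislabelled or adversarial feedback.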

  • @itsdeadjim said: Did you fine-tune it for this task?

    Yes, I specifically trained it on my own data.

  • kevinds Member, LIR

    @eezcloud said:
    Or they just embed it in an image, and fly right past every spam filter out there.

    Emails without text, just an image attached, automatically get a high spam score.
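
A rule like that can be sketched with nothing but the standard library. This is a minimal example, assuming messages are `EmailMessage` objects (when parsing raw mail, pass `policy=email.policy.default` so the parts support `get_content()`); the function name and the penalty value of 5.0 are hypothetical and not taken from any specific filter.

```python
from email.message import EmailMessage


def image_only_penalty(msg: EmailMessage, penalty: float = 5.0) -> float:
    """Return a score penalty when a message carries images but no readable text."""
    has_text = False
    has_image = False
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no content of their own
        maintype = part.get_content_maintype()
        if maintype == "image":
            has_image = True
        elif maintype == "text" and part.get_content().strip():
            has_text = True
    return penalty if has_image and not has_text else 0.0


if __name__ == "__main__":
    msg = EmailMessage()
    msg["Subject"] = "Special offer"
    msg.add_attachment(b"\x89PNG placeholder bytes", maintype="image",
                       subtype="png", filename="offer.png")
    print(image_only_penalty(msg))
```

One known gap in this sketch: an HTML part that contains nothing but an `<img>` tag still counts as text here, so a real rule would also need to strip markup before deciding the body is empty.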
