New on LowEndTalk? Please Register and read our Community Rules.

How do you sanitize the data you put into commercial LLMs?

As the title suggests, when you copy logs, errors, broken code and whatever else into a commercial clanker, how do you anonymize your personally identifiable information first?

Comments

  • rpqu Member

    Uh sed?

  • Alyx Member, Host Rep

    I would take bets that nobody really does that 🫣

  • zed Member

    wait what

  • nghialele Member

    So people actually do that? I just throw it in raw.

  • I usually use Notepad or Notepad++ with search & replace to bulk remove/replace data. For domains I usually substitute example.com; for IPs I usually substitute the words ipv4 or ipv6.
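
For anyone who would rather script that Notepad++ routine, here is a minimal Python sketch of the same search & replace. It assumes you list your own domains by hand (MY_DOMAINS below is a hypothetical placeholder list) and uses the standard-library ipaddress module to confirm a match really is an IP before swapping it for "ipv4" or "ipv6", so timestamps like 13:55:36 are left alone. The patterns are rough and will miss edge cases, so still eyeball the result before pasting it anywhere.

```python
import ipaddress
import re

# Placeholder values from the post: domains become example.com and IPs become
# the literal words "ipv4" / "ipv6".
DOMAIN_PLACEHOLDER = "example.com"

# Hypothetical list of your own domains. An explicit list is safer than a
# generic domain regex, which would also catch file names like index.php.
MY_DOMAINS = ["mysite.net", "panel.mysite.net"]

# Loose candidate patterns; every candidate is validated before replacement.
IPV4_CANDIDATE = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?!\.?\d)")
IPV6_CANDIDATE = re.compile(r"(?<![\w:])[0-9A-Fa-f:]*:[0-9A-Fa-f:]*(?![\w:])")


def _swap_if_ip(match: re.Match, placeholder: str) -> str:
    """Return the placeholder if the match parses as an IP, else keep it."""
    try:
        ipaddress.ip_address(match.group(0))
    except ValueError:
        return match.group(0)  # e.g. a 13:55:36 timestamp, not an address
    return placeholder


def redact(text: str) -> str:
    text = IPV4_CANDIDATE.sub(lambda m: _swap_if_ip(m, "ipv4"), text)
    text = IPV6_CANDIDATE.sub(lambda m: _swap_if_ip(m, "ipv6"), text)
    # Longest domains first so "panel.mysite.net" is not only half-replaced;
    # \b keeps "mysite.net" from matching inside e.g. "notmysite.net".
    for domain in sorted(MY_DOMAINS, key=len, reverse=True):
        pattern = rf"\b{re.escape(domain)}\b"
        text = re.sub(pattern, DOMAIN_PLACEHOLDER, text, flags=re.IGNORECASE)
    return text
```

Pipe a log through redact() and you get the same result as the manual replace, minus the clicking.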

  • edited 5:16PM

    @Alyx said:
    I would take bets that nobody really does that 🫣

    Yeah, fat chance people would trade any of that sweet, sweet efficiency gain for something boring like privacy or security. Not that the whole concept is likely to cross their minds at all, but even if it did: no way.

  • TrikeLike Member

    I'm sure someone (maybe you) could vibe-slop something together for this quickly. If that's too much work, I imagine you could just feed it through some lightweight local model that strips this info out first, then copy-paste the result into the prompt of your commercial chatbot of choice.

  • oloke Member, Host Rep

    @JohnFilch123 said:
    I usually use Notepad or Notepad++ with search & replace to bulk remove/replace data. For domains I usually substitute example.com; for IPs I usually substitute the words ipv4 or ipv6.

    Actually I do the same for now. Would love to know if there are better solutions. That said, I don't rely on LLMs for really sensitive things anyway.

    I think there are now companies specializing in stopping users from putting sensitive company data into clankers. I saw an ad for one on YouTube but can't remember the product's name.

  • @oloke said: Would love to know if there are better solutions

    I guess for really bulky logs etc. you can run them through your local LLM, which will sanitize them. Or you could probably vibe-code a web app in JS or something.

  • PolyAnthi Member

    I've seen more and more things like https://x.com/MaziyarPanahi/status/2073383825669849118 showing up lately on my Twitter feed (yes, I know it's been renamed X, cope).

    I imagine that would be the perfect use case: using local models to pre-clean what you send to commercial LLMs.
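
A rough sketch of that pre-cleaning step in Python, assuming an Ollama server running locally on its default port (11434) and a small model you have already pulled. The model name and the prompt wording are placeholders, not a recommendation. It posts the text to Ollama's /api/generate endpoint with streaming turned off and returns the model's rewritten text. A small model can miss things, so a regex pass plus a quick read-through is still a good idea.

```python
import json
import urllib.request

# Assumptions: Ollama on its default port and an already-pulled model;
# "llama3.1:8b" is only an example name.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.1:8b"

INSTRUCTIONS = (
    "Rewrite the text below, replacing every IP address, hostname, domain, "
    "email address, username, API key and password with a placeholder such as "
    "<IP_1> or <HOST_1>. Use the same placeholder for repeated values. "
    "Change nothing else and output only the rewritten text.\n\n"
)


def build_request(text: str) -> urllib.request.Request:
    """Build the POST request for Ollama's /api/generate endpoint."""
    payload = {
        "model": MODEL,
        "prompt": INSTRUCTIONS + text,
        "stream": False,                # one JSON object instead of a stream
        "options": {"temperature": 0},  # keep the rewrite as stable as possible
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def local_sanitize(text: str) -> str:
    """Send text to the local model and return its sanitized rewrite."""
    with urllib.request.urlopen(build_request(text), timeout=300) as resp:
        return json.loads(resp.read())["response"]
```

Whatever local_sanitize() returns is what you would paste into the commercial chatbot.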

  • Void Member

    So you guys don’t say YOLO and send all the logs and code as is?

  • @JohnFilch123 said:

    @oloke said: Would love to know if there are better solutions

    I guess for really bulky logs etc. you can run them through your local LLM, which will sanitize them. Or you could probably vibe-code a web app in JS or something.

    I'm also using Notepad++ and manually substituting things to keep the structure, but it sucks. I'm considering setting up an 8B local LLM for it, but I was hoping for a cheaper solution.

    @PolyAnthi said:
    I've seen more and more things like https://x.com/MaziyarPanahi/status/2073383825669849118 showing up lately on my Twitter feed

    That looks like the kind of thing I need, but I'd also need a workflow around it. I can use Open-WebUI for a local LLM, but my commercial clanker subscription only works through their own UI, so I guess I'd have to work out a solution that injects/extracts prompts/responses from a browser, and that's an increasing headache. Still, that's better than voluntarily giving critical PII to the world's biggest data vampires.
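
One way to keep the structure without the manual Notepad++ work is consistent, reversible pseudonymisation: every distinct value gets a stable token such as IP_1 or EMAIL_1, you paste the masked text into the clanker's own UI, then map the tokens in its reply back to the real values. Below is a minimal Python sketch under those assumptions; the two regex patterns are illustrative only, and you would add your own for hostnames, usernames, keys and so on.

```python
import re

# Illustrative patterns only; add your own for hostnames, usernames, keys...
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct sensitive value with a stable token (IP_1, EMAIL_1...).

    Repeated values get the same token, so the structure of the log survives.
    Returns the masked text and a token -> real value map for restore().
    """
    forward: dict[str, str] = {}  # real value -> token
    counts: dict[str, int] = {}   # how many tokens issued per kind

    for kind, pattern in PATTERNS.items():
        def swap(m: re.Match, kind: str = kind) -> str:
            value = m.group(0)
            if value not in forward:
                counts[kind] = counts.get(kind, 0) + 1
                forward[value] = f"{kind}_{counts[kind]}"
            return forward[value]

        text = pattern.sub(swap, text)

    return text, {token: value for value, token in forward.items()}


def restore(reply: str, reverse: dict[str, str]) -> str:
    """Put the real values back into the LLM's reply.

    Longest tokens go first so IP_10 is not mangled by the IP_1 replacement.
    """
    for token in sorted(reverse, key=len, reverse=True):
        reply = reply.replace(token, reverse[token])
    return reply
```

The round trip is: pseudonymize() the log, paste the masked text into the chatbot, then run restore() on its answer. The mapping never leaves your machine.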

  • PolyAnthi Member

    @CloudHopper said:

    @JohnFilch123 said:

    @oloke said: Would love to know if there are better solutions

    I guess for really bulky logs etc. you can run them through your local LLM, which will sanitize them. Or you could probably vibe-code a web app in JS or something.

    I'm also using Notepad++ and manually substituting things to keep the structure, but it sucks. I'm considering setting up an 8B local LLM for it, but I was hoping for a cheaper solution.

    @PolyAnthi said:
    I've seen more and more things like https://x.com/MaziyarPanahi/status/2073383825669849118 showing up lately on my Twitter feed

    That looks like the kind of thing I need, but I'd also need a workflow around it. I can use Open-WebUI for a local LLM, but my commercial clanker subscription only works through their own UI, so I guess I'd have to work out a solution that injects/extracts prompts/responses from a browser, and that's an increasing headache. Still, that's better than voluntarily giving critical PII to the world's biggest data vampires.

    It may be worth getting a clanker to build a browser extension that redacts data before it reaches said clanker.

  • rpqu Member
    edited 6:24PM

    @JohnFilch123 said:
    I usually use Notepad or Notepad++ with search & replace to bulk remove/replace data. For domains I usually substitute example.com; for IPs I usually substitute the words ipv4 or ipv6.

    Just replace them with words:
    cat /etc/hosts

    ...
    163.172.x.x mime scwpardc1
    51.15.x.x swamp scwamsdc2
    193.149.x.x crank fumo
    45.128.22x.x sing gcsg1
    69.161.22x.x sang gcsg2
    # space for gcsg3
    2a0d:8142:x:x:: stuart RO32
    173.231.3x.x futa fat32utah
    ...
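
The hosts-file trick is easy to automate. A minimal Python sketch, assuming an /etc/hosts-style file where the first name after each IP is the one you want the LLM to see; the addresses in the test are documentation IPs, not the ones from the post. IPs that are not in the file are left untouched.

```python
import re


def parse_hosts(hosts_text: str) -> dict[str, str]:
    """Map each IP in /etc/hosts-style text to the first name listed for it."""
    mapping: dict[str, str] = {}
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2:
            mapping.setdefault(fields[0], fields[1])
    return mapping


def replace_known_ips(text: str, mapping: dict[str, str]) -> str:
    """Swap every IP found in `mapping` for its name; other IPs are left alone."""
    if not mapping:
        return text
    # Longest first, plus guards so 10.0.0.1 never matches inside 10.0.0.10.
    ips = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w.:])(?:" + "|".join(map(re.escape, ips)) + r")(?!\.?\w)"
    )
    return pattern.sub(lambda m: mapping[m.group(0)], text)
```

Usage would be something like replace_known_ips(log_text, parse_hosts(open("/etc/hosts").read())). Ports survive, so 203.0.113.20:8080 becomes swamp:8080.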
    
  • @oloke said: Would love to know if there are better solutions

    I guess for really bulky logs etc. you can run them through your local LLM, which will sanitize them. Or you could probably vibe-code a web app in JS or something.

    @rpqu said: Just replace them with words

    If only I understood what you meant...

  • JohnFilch123 Member
    edited 6:28PM

    @CloudHopper said: cheaper solution

    I could vibe-code one :lol:, some kind of pastebin service that strips personal/private data, but I doubt anybody would use it much since people have a strong negative sentiment towards vibe coders.

  • rpqu Member

    @JohnFilch123 said:

    @rpqu said: Just replace them with words

    If only I understood what you meant...

    Say you have an nginx log -> sed your own IPs into their /etc/hosts names -> run a script that temporarily assigns common names from a dictionary as replacements for the remaining random IPs -> saves tokens
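
The second half of that pipeline, giving whatever IPs are left a temporary word from a dictionary, could look roughly like this in Python. The word list is a hypothetical example, only IPv4 is handled, and the mapping lives only in memory for the run. The same IP always gets the same word, so you can still tell which requests came from the same client.

```python
import ipaddress
import re

# Hypothetical word list; any short, distinct, digit-free words will do.
WORDS = ["apple", "birch", "cedar", "delta", "ember", "fjord", "grove", "heron"]

IPV4_RE = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?!\.?\d)")


def nickname_ips(text: str, words: list[str] = WORDS) -> tuple[str, dict[str, str]]:
    """Give each distinct IPv4 address left in `text` a word from `words`.

    The same IP always gets the same word. Once the list runs out, words are
    reused with a numeric suffix (apple2, birch2, ...). The mapping is returned
    so you can translate the LLM's answer back, then simply discarded.
    """
    names: dict[str, str] = {}
    n = len(words)

    def swap(m: re.Match) -> str:
        ip = m.group(0)
        try:
            ipaddress.IPv4Address(ip)
        except ValueError:
            return ip  # something like 999.1.2.3 is not an address; keep it
        if ip not in names:
            i = len(names)
            suffix = str(i // n + 1) if i >= n else ""
            names[ip] = words[i % n] + suffix
        return names[ip]

    return IPV4_RE.sub(swap, text), names
```

Run it after the /etc/hosts substitution so your own servers keep their familiar names and only strangers' IPs get dictionary words.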

  • rpqu Member
    edited 6:40PM

    @JohnFilch123 said:

    @CloudHopper said: cheaper solution

    I could vibe-code one :lol:, some kind of pastebin service that strips personal/private data, but I doubt anybody would use it much since people have a strong negative sentiment towards vibe coders.

    Vibebin.com regged
    Clankbin.com
    Crankbin.com

  • @rpqu said:

    @JohnFilch123 said:

    @rpqu said: Just replace them with words

    If only I understood what you meant...

    Say you have an nginx log -> sed your own IPs into their /etc/hosts names -> run a script that temporarily assigns common names from a dictionary as replacements for the remaining random IPs -> saves tokens

    Ah, got it now. I'd prefer a web app that does it automatically; using scripts is kinda tiring for me. But yeah, this is an option.

  • @rpqu said: Vibebin.com

    Nah, too expensive. I've got a few short domains; maybe I'll reuse one of those instead.

  • network Member

    You guys send your logs to AI manually? Just give Claude Code permission to run ssh * and give it the hostname.

  • network Member

    @network said:
    You guys send your logs to AI manually? Just give Claude Code permission to run ssh * and give it the hostname.

    Just make sure you have ssh agent running locally and passwordless sudo on the server.

  • rpqu Member

    @network said:
    You guys send your logs to AI manually? Just give Claude Code permission to run ssh * and give it the hostname.

    ssh, git, rsync, docker, scp, source, nohup, sudo, rm
