New on LowEndTalk? Please Register and read our Community Rules.
All new Registrations are manually reviewed and approved, so a short delay after registration may occur before your account becomes active.
How do you sanitize the data you put into commercial LLMs?
CloudHopper
Member
in General
As the title suggests, when you copy logs, errors, broken code and whatever else into a commercial clanker, how do you anonymize your personally identifiable information first?

Comments
Uh
sed? I would take bets that nobody really does that 🫣
wait what
So people actually do that? I just throw it in raw
I usually use Notepad or Notepad++ and use search & replace to bulk remove/replace data. For domains I usually substitute example.com, and for IPs I substitute the literal placeholder ipv4 or ipv6.
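If you'd rather script that search & replace than do it by hand, here is a rough Python sketch of the same idea. It is only an illustration under a few assumptions: the function name `sanitize` and the `domains` parameter are made up for this example, you pass in your own domains explicitly (guessing domains with a regex tends to clobber filenames like main.py), and the IP matching is best-effort, so a dotted version string such as 1.2.3.4 would also be replaced. It does not touch usernames, emails, tokens or anything else.

```python
import ipaddress
import re

# Candidate patterns only; ipaddress does the real validation below.
IPV4_CANDIDATE = re.compile(r"(?<![\w.])(?:\d{1,3}\.){3}\d{1,3}(?!\w|\.\d)")
IPV6_CANDIDATE = re.compile(
    r"(?<![\w:])(?:[0-9A-Fa-f]{0,4}:){2,7}[0-9A-Fa-f]{0,4}(?![\w:])"
)


def _replace_ip(match):
    """Swap a candidate for 'ipv4'/'ipv6' only if it parses as an address."""
    candidate = match.group(0)
    try:
        address = ipaddress.ip_address(candidate)
    except ValueError:
        # Timestamps (12:34:56), MAC addresses, etc. are left alone.
        return candidate
    return "ipv4" if address.version == 4 else "ipv6"


def sanitize(text, domains=()):
    """Replace the given domains with example.com and IPs with placeholders.

    `domains` is a hypothetical parameter: the list of your own domains.
    """
    for domain in domains:
        text = re.sub(re.escape(domain), "example.com", text, flags=re.IGNORECASE)
    text = IPV4_CANDIDATE.sub(_replace_ip, text)
    text = IPV6_CANDIDATE.sub(_replace_ip, text)
    return text


if __name__ == "__main__":
    line = (
        "2024-05-01 12:34:56 sshd: Failed login from 203.0.113.7:52144 "
        "to mail.myhost.net ([2001:db8::1])"
    )
    print(sanitize(line, domains=["myhost.net"]))
```

Subdomains are kept as they are (mail.myhost.net becomes mail.example.com), so hostnames in the log still line up with each other after the replacement.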
Yeah fat chance people would trade in any of that sweet, sweet efficiency gain for something boring like privacy or security. Not like the whole concept is likely to cross their minds at all but even if it did: No way.
I'm sure someone (maybe you) could vibe-slop something together for this quickly. If that's too much work, I imagine you could just feed it through some lightweight local model that strips this info out first, then copy-paste that into the prompt of your commercial chatbot of choice
Actually I do the same for now. Would love to know if there are better solutions. That said, I don't rely on LLMs for really sensitive things anyway.
I think there are now companies that specialize in stopping users from putting sensitive company data into clankers. I saw one ad on YouTube, can't remember the name of the product.
I guess for really bulky logs etc. you can run them through your local LLM, which will sanitize them. Or you can probably vibe code a web app in JS or something.
I've seen more and more things like https://x.com/MaziyarPanahi/status/2073383825669849118 show up lately on my twitter feed (yes I know it is renamed X, cope).
I imagine that would be the perfect use for them: using local models to pre-clean whatever you send to commercial LLMs.
So you guys don’t say YOLO and send all the logs and code as is?