How do you sanitize the data you put into commercial LLMs?
CloudHopper
Member
in General
As the title suggests, when you copy logs, errors, broken code and whatever else into a commercial clanker, how do you anonymize your personally identifiable information first?

Comments
Uh
sed?
I would take bets that nobody really does that 🫣
wait what
So people actually do that? I just throw it in raw
I usually use Notepad or Notepad++ and use search & replace to bulk remove/replace data. For domain I usually use example.com, for IPs I usually use ipv4 or ipv6.
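The manual search & replace described above could be scripted. A minimal sketch, assuming Python is available: the `scrub` function, the placeholder strings and the short TLD list are illustrative choices, not anything a poster in this thread actually uses. It only covers IPv4 (not IPv6), and the IPv4 pattern will also hit dotted version numbers such as 1.2.3.4, so review the output before pasting it anywhere.

```python
import re

# Four dot-separated groups of 1-3 digits (does not validate the 0-255 range).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Rough e-mail pattern; good enough for logs, not RFC-complete.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# Hostnames ending in a small, hand-picked TLD list (extend as needed).
DOMAIN = re.compile(r"\b(?:[a-z0-9-]+\.)+(?:com|net|org|io|dev)\b", re.IGNORECASE)


def scrub(text: str) -> str:
    """Replace e-mails, IPv4 addresses and domains with fixed placeholders."""
    text = EMAIL.sub("user@example.com", text)  # e-mails first, so the domain pass sees example.com
    text = IPV4.sub("ipv4", text)
    text = DOMAIN.sub("example.com", text)
    return text


if __name__ == "__main__":
    line = "203.0.113.7 - admin@mysite.net GET https://mysite.net/login"
    print(scrub(line))
```

Because every match collapses to the same placeholder, the structure of the log survives but you can no longer tell two hosts apart.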
Yeah fat chance people would trade in any of that sweet, sweet efficiency gain for something boring like privacy or security. Not like the whole concept is likely to cross their minds at all but even if it did: No way.
I'm sure someone (maybe you) could vibe-slop something together for this quickly. If that's too much work, I imagine you could just feed it through some lightweight local model that strips this info out first, then copy-paste that into the prompt of your commercial chatbot of choice
Actually I do the same for now. Would love to know if there are better solutions. That said, I don't rely on LLMs for really sensitive things anyway.
I think there are now companies specializing in stopping users from putting sensitive company data into a clanker. I saw an ad for one on YouTube, but I can't remember the name of the product.
I guess for really bulk logs etc. you can run them through your local LLM, which will sanitize them. Or you can probably vibe code a web app in JS or something.
I've seen more and more things like https://x.com/MaziyarPanahi/status/2073383825669849118 show up lately on my Twitter feed (yes, I know it's been renamed X, cope).
I imagine that would be the perfect utilisation, using local models to pre-clean your output to commercial LLMs.
So you guys don’t say YOLO and send all the logs and code as is?
I'm also using Notepad++ and actively substituting things to keep the structure, but it sucks. I'm considering setting up an 8b local LLM for it, but I was hoping for a cheaper solution.
That looks like the kind of thing I need, but I'd need a workflow for it too. I can use Open-WebUI for a local LLM, but my commercial clanker subscription only works via their own UI, so I guess I'd have to work out a solution that injects prompts into and extracts responses from a browser, and that's an increasing headache. Still, better that than voluntarily giving critical PII to the world's biggest data vampires.
May be worth looking at making a clanker make an extension to redact data from said clanker.
Just replace them with words
I guess for really bulk logs etc. you can run them through your local LLM, which will sanitize them. Or you can probably vibe code a web app in JS or something.
If only I understood what you mean....
I can vibe code one: some kind of pastebin service that will strip personal/private data. But I doubt anybody will use it much, since people have a strong negative sentiment towards vibe coders.
Let's say: nginx log -> sed your IPs using /etc/hosts -> run a script that temporarily assigns common names from a dictionary as replacements for the IPs -> saves tokens
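The pipeline above could look roughly like this in Python. This is a sketch under assumptions: the alias list, the `host-` prefix and the function names `pseudonymize` and `restore` are made up for illustration, and only IPv4 addresses are handled. Each distinct IP gets one stable short alias, so the model can still see which requests came from the same host, and the mapping stays on your machine so you can translate the answer back.

```python
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Hypothetical dictionary of short aliases; extend as needed.
NAMES = ["alpha", "bravo", "charlie", "delta", "echo"]


def pseudonymize(log: str, names=NAMES):
    """Replace each distinct IPv4 address with a stable alias.

    Returns the masked text and the real-IP -> alias mapping.
    """
    mapping = {}

    def alias(match):
        ip = match.group(0)
        if ip not in mapping:
            n = len(mapping)
            # Fall back to a numbered alias once the dictionary runs out.
            suffix = names[n] if n < len(names) else str(n)
            mapping[ip] = f"host-{suffix}"
        return mapping[ip]

    return IPV4.sub(alias, log), mapping


def restore(text: str, mapping) -> str:
    """Put the real IPs back into text that contains the aliases."""
    for ip, name in mapping.items():
        text = re.sub(rf"\b{re.escape(name)}\b", lambda _m, ip=ip: ip, text)
    return text


if __name__ == "__main__":
    log = "10.0.0.5 GET /\n10.0.0.9 GET /admin\n10.0.0.5 POST /login"
    masked, mapping = pseudonymize(log)
    print(masked)
    print(restore(masked, mapping) == log)
```

The `host-` prefix is there so that `restore` does not rewrite an ordinary word like "alpha" if the model happens to use it in its reply.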
Vibebin.com regged
Clankbin.com
Crankbin.com
Ah, got it now. I would prefer a web app that does it automatically; using scripts is kind of tiring for me. But yeah, this is an option.
Nah, too expensive. I have got a few short domains, maybe reuse them instead.
You guys send your logs to AI manually? Just give Claude Code permission to run ssh * and give it the hostname. Just make sure you have an ssh agent running locally and passwordless sudo on the server.
ssh, git, rsync, docker, scp, source, nohup, sudo, rm
You give Claude unfettered access to your digital life without a condom? 😲
0 fck given. Paste everything with passwords, tokens. All .cfg. .env goes directly into prompt.
Ask another commercial LLM to sanitize it for you
Seriously though, I would only not send passwords. The rest, nobody cares.
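If passwords and tokens are the one thing you want to keep out of the prompt, a small filter over .env-style files is enough to start with. A minimal sketch, assuming simple KEY=VALUE lines: the keyword list and the `<REDACTED>` marker are arbitrary choices, `redact_env` is a made-up name, and secrets embedded elsewhere (URLs with credentials, JSON blobs) are not caught.

```python
import re

# Key names that look like they hold a secret (case-insensitive, extend as needed).
SECRET_KEY = re.compile(r"(?i)(?:pass(?:word)?|secret|token|api_?key)")
# KEY=VALUE with an optional leading "export" and optional spaces around "=".
ASSIGNMENT = re.compile(r"^(\s*(?:export\s+)?)([A-Za-z_][A-Za-z0-9_]*)(\s*=\s*)(.*)$")


def redact_env(text: str) -> str:
    """Blank out the value of any assignment whose key name looks secret."""
    out = []
    for line in text.splitlines():
        m = ASSIGNMENT.match(line)
        if m and SECRET_KEY.search(m.group(2)):
            # Keep the key and the original spacing, drop only the value.
            line = f"{m.group(1)}{m.group(2)}{m.group(3)}<REDACTED>"
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "DB_HOST=db.internal\nDB_PASSWORD=hunter2\nexport API_KEY = abc123"
    print(redact_env(sample))
```

Keeping the key names intact means the model can still reason about which settings exist without ever seeing their values.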
Sanitize: do not make mistake 😭
"Hey ChatGPT, can you sanitize this data for me so I can ask Claude who will consult with Grok and I'll bring it back to you to unsanitize it so I can review it?"
I am using LiteLLM with guardrails and some PII masking that, for example, replaces IP addresses or domains. I am also trying to route to local LLMs when possible.
I remember OpenAI released some sort of tiny open model for PII sanitization, but I have no idea how well it works.
https://github.com/openai/privacy-filter