Comments
You do understand how inodes work, right?
Yes, but since you must be an expert, please explain the difference between running a public file/image hosting site, and me storing millions of my own tiny files?
Edit: In TeRmS oF iNoDe UsE
Dumbed down for noobs: each file uses at least one inode, even if it's only 1 byte, and it still occupies at least the smallest allocation block, unless you're using a magical filesystem like Reiser3 that packs tiny files into the tails of partially filled blocks, which only works if you kill your wife.
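You can see both effects on any Linux box; `tiny.txt` is a hypothetical throwaway file, and `stat -c` / `df -i` are the GNU coreutils versions:

```shell
# A 1-byte file still burns a whole inode, and on most filesystems
# the allocator still hands it at least one block of space.
printf 'x' > tiny.txt
stat -c 'inode=%i  size=%s bytes  blocks=%b' tiny.txt
df -i .        # total/used/free inodes for this filesystem
rm tiny.txt
```

`df -i` is the quickest way to check how close a box is to running out of inodes, long before `df -h` shows any problem.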
EDIT2:
You didn't explain "the difference between running a public file/image hosting site, and me storing millions of my own tiny files?"
I am pretty sure it's a completely different difference than the difference between two different differences.
You aren't serving your millions of tiny little files to a ton of assholes on public forums on a 512MB HDD service, but you're still an asshole. HTH.
What? Why would there be a difference in this case? Shared hosting and VPS hosting are both shared by many users. If anything, the VPS is worse, because it's not just running an optimized web stack; every tenant is doing something different.
Did you mean to say bare metal with a single user, just you? You should Google inodes and read up a bit. My first experience with inode issues was when trying to back up an Ubuntu server years ago. Fucking kernel bugs that dragged on for years. Frustrating as fuck. All kinds of memory available, but other limits got in the way.
Sorry, there's no difference. High inode usage is going to be an issue for Linux regardless of whether one person has a million files or 10 people have 100,000 each.
But to sum things up, lots of small files kill performance for storage and networking. They'd be unfairly sucking up resources vs other users.
You clearly don't know how to read very well so I will stop wasting my time with you.
Yea, I understand that, but I was just saying that in regards to inodes specifically, there wouldn't be a difference between public file hosting (which is in the ToS) and someone storing millions of tiny files on there for personal use
a difference in resource usage is understandable
Because the million tiny files for personal use case is small to nonexistent.
touch {0000001..1000000}.file
Now what?
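Scaled down to a thousand files so it runs in a blink, the same brace expansion makes the inode cost directly visible (directory name is hypothetical; `df -i` column layout is GNU):

```shell
# Create 1,000 empty files and watch the filesystem's inode counter move.
mkdir inode-test && cd inode-test
before=$(df -i . | awk 'NR==2 {print $3}')   # IUsed column
touch {0001..1000}.file
after=$(df -i . | awk 'NR==2 {print $3}')
echo "inodes consumed: $((after - before))"  # roughly 1,000 on a quiet box
cd .. && rm -r inode-test
```

At a million (or the billion in the LWN article), the same arithmetic is what runs an ext4 filesystem with default `mkfs` settings out of inodes while `df -h` still shows free space.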
> @eol said:
EDIT2:
Also, add 3 zeros ... see: One billion files on Linux - https://lwn.net/Articles/400629/
and (maybe) consider using XFS
Good read.
Yeah.
Would be interesting to know how NTFS would deal with it.
EDIT2:
fsck seems to be problematic on low mem machines though.
You just need to run
npm install
on a big library and you will get 100k files easily

Yeah ... I guess the salient point, with regard to loading up a VPS and thrashing I/O willy-nilly, might be taken to heart though.
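Counting is one `find` away if you want to check what a tree is actually costing you (`node_modules` here is just an example path):

```shell
# How many files (and therefore inodes) a directory tree holds.
find node_modules -type f | wc -l   # regular files only
find node_modules | wc -l           # files + dirs + symlinks, i.e. all inodes
```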
If a provider sees a correlation between file hosts / torrents and trouble, whether technical or administrative, then that's a business decision they should make as they see fit. I suspect due diligence to avoid impacting performance of the node is often in short supply, so it makes sense to head that off at the pass ... if they don't want no hassle in their castle.
That said, my "personal use" use case for myriad small files includes a neuroscience research paper corpus for machine learning projects with over 500,000 individual pdfs, all on an ext4 filesystem on a storage VPS. I'm of the opinion that files just sitting there is not (necessarily) going to cause any problems - but any serious work gets done on a dedi.
What prevents you from compressing them since you are not going to need all of them at the same time?
laziness, and impatience
EDIT2:
Seriously - I haven't given it much thought - but I want to keep the workflow as simple as possible, with minimal processing requirements on the VPS side - while still allowing straightforward access to an individual file on the VPS via browser ... in case I actually want to read it or something.
EDIT3:
Hard drive space is cheap, time is "precious" ... because instant gratification delayed, is instant gratification denied.
EDIT4:
I usually do make a tarball when transferring files for backup etc., since I've found that transferring many small files, e.g. via rsync, can for whatever reason trigger the "packets per second" alarm on some IDS monitors. I tend not to bother compressing the tarfiles, since I haven't seen great compression ratios for these files, and compressing a 100 GB tarfile can take a while.
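A minimal sketch of that workflow, with the host name and paths hypothetical: one uncompressed tarball turns millions of tiny per-file transfers into a single sequential stream, which is much gentler on disk seeks and packet-rate monitors alike.

```shell
# Bundle the small files first (no -z/-j: skip compression),
# then ship one big file. "backup" and all paths are made up.
tar -cf corpus.tar papers/
rsync -av --partial corpus.tar backup:/srv/archive/

# Or stream it with no local temp copy:
tar -cf - papers/ | ssh backup 'cat > /srv/archive/corpus.tar'
```

The `--partial` flag lets a resumed rsync pick up the big file where it left off instead of restarting from zero.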
EDIT5:
Just realized actually maybe the best reason I don't try to compress the pdf archive - and why I might see something like 15% reduction from compressing a large tarfile of pdfs (just tested with zstd as well as good old bzip2) ... most of the pdfs I have are already intrinsically compressed internally. D'oh!
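That's easy to verify empirically. Here gzip stands in for zstd/bzip2 and `corpus.tar` is a hypothetical tarball; already-compressed payloads (PDFs, JPEGs, zips) typically come out near 100% of the original size:

```shell
# Compare sizes before/after compression; a ratio near 100% means
# the data was already compressed internally.
orig=$(stat -c %s corpus.tar)     # hypothetical tarball of PDFs
gzip -k corpus.tar                # -k keeps the original around
comp=$(stat -c %s corpus.tar.gz)
echo "compressed size: $(( 100 * comp / orig ))% of original"
```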
EDIT6:
If you're interested in using machine learning to analyze pdfs about machine learning, check out http://arxiv-sanity.com/ - that'll get ya started
Now what, what?
Now what, what, what?
EDIT2: what now ...?
exit 0.
https://youtu.be/fbGkxcY7YFU
We are now taking PRE-ORDERS for Los Angeles, with 30% off your first term. Please note this is for space on our new servers that are coming soon. Depending on many factors, it may take several weeks to receive your service, but we are planning on having some servers by next week.
The Special 512 is also now back in stock.
Dallas is now available.
Pre-orders for Los Angeles should be filled by next week, if there are not any further delays.
Today only, make a big commitment before we go out of business.* Our first-ever 3 year deadpool special.
All special offers have been extended to 3-year terms (for one day). Only pay the 2-year price for a 3-year term on an already-low price! Also, we're taking further Los Angeles backorders. These are estimated to be fulfilled this week.
*No refunds. No guarantees that we will go out of business, sorry.