Sorting 10TB of mess (10+ years)
DeadlyChemist
Member
in Help
So, I have all my data, but finding anything is slow and a pain in the ass because it's a huge mess
pictures half sorted, pictures unsorted, documents all over, old programs, old backups, backups of backups
im planning to move everything to one spot, is there some software that would just go thru the 10TB and tag/sort/assist with everything as much as possible? i don't wanna spend next 100h going thru old stuff by hand if some AI/script can do most of the work
Comments
To be honest, if you won't spend the next 100 hours looking over the files, these files aren't actually important, and you're unlikely to look at them for the rest of your life.
So the solution is:
FORMAT D:
Poof!
Problem gone.
Dump it all into some cloud storage, then wait for AI to develop in a few years. Don't forget corporations want all that data to develop AI and grab your personal info; they don't care if it's unsorted, so just upload it into their cloud to help them out with your data
As a side note, if you wish to get dirty, you could upload all data into a large server from @hosthatch (I remember they offered huge cheap storage) and sort it there using commands such as "find" to move pictures (jpg, png) into a specific directory, documents (doc, docx) into another directory, and so on.
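For anyone who prefers a script to chaining "find" commands, here is a minimal Python sketch of the same idea: move files into one directory per type based on their extension. The extension-to-folder mapping and the function name are my own assumptions, not something from the posts above, and it assumes the destination lies outside the source tree. It defaults to a dry run so you can inspect the plan first.

```python
import shutil
from pathlib import Path

# Hypothetical mapping from extension to destination folder; extend as needed.
CATEGORIES = {
    ".jpg": "pictures", ".jpeg": "pictures", ".png": "pictures",
    ".doc": "documents", ".docx": "documents",
}


def sort_by_extension(source, dest, dry_run=True):
    """Plan (and optionally perform) moves into per-category folders.

    Returns a list of (old_path, new_path) pairs. Assumes dest is not
    located inside source.
    """
    source, dest = Path(source), Path(dest)
    moves, planned = [], set()
    # Snapshot the listing first so moving files does not disturb the walk.
    files = sorted(p for p in source.rglob("*") if p.is_file())
    for path in files:
        folder = CATEGORIES.get(path.suffix.lower())
        if folder is None:
            continue  # unknown type: leave it where it is
        target = dest / folder / path.name
        n = 1
        # Never overwrite: append _1, _2, ... when the name is already taken.
        while target.exists() or target in planned:
            target = dest / folder / f"{path.stem}_{n}{path.suffix}"
            n += 1
        planned.add(target)
        moves.append((path, target))
        if not dry_run:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(target))
    return moves
```

Running it once with dry_run=True and reading the returned pairs is a cheap way to catch surprises before anything is actually moved.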
https://github.com/qarmin/czkawka will help find duplicates. I found it particularly great for getting rid of backups of backups of backups. You can set one set of files to be the "reference" directory and it will delete all other copies except for the "reference" ones.
I run it on docker on my NASes: https://hub.docker.com/r/jlesage/czkawka
I am in the same scenario as you now, but I estimate it will take me longer than 100 hours... I got a NAS and am reorganising everything from scratch
Use hashing software to find duplicates, compress folders/projects with lots of small files, etc...
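If you want to see what "hashing to find duplicates" amounts to, here is a rough sketch using only the Python standard library. It is not how czkawka or any other tool works internally; it is just the basic technique: group files by size first (cheap), then hash only the files that share a size. The function names are made up for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups (lists) of files under root with identical content."""
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

It only reports groups and deletes nothing, so deciding which copy is the "reference" stays a manual step.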
The best way imo is to number the folders like 001_LET. In the rare case where you need to insert a folder in between, you can use letters so it becomes like this when sorted:
It is definitely a pain, but if the data is important to you, you will have to do it anyway
I'm in the exact same position.
For pictures, people like @KuJoe told me to load them into one of the major photo apps (Apple Photos, Google Photos, whatever) and use that instead of keeping them in YYYY-MM-DD Label directories.
But for 24TB of music, video, PDFs, zips...ugh.
And I keep growing it.
Although my progress has been slow, what I did was set up some NAS space as the Perfectly Organized Future and then I move things to it. For example, although I might have a copy of...oh call it Debian 11.iso...under random folders titled "misc", "debian isos", "linux isos", "linux isos to save", "linux isos backup", etc., I have only one destination so duplicates get sieved out.
I work on it a little while watching TV.
Sorting files takes a lot of effort. A while ago one of my HDDs crashed. I was able to recover 99% of the files, but the filenames were lost. I tried to sort them out and at some point I gave up. It takes too much time and drives you mad.
the problem would be, some pictures are sorted... some are not, some are just assets in games
i'd like something that would sort by tagging a folder with "lots of pictures" or "probably a program" or "iso images" or whatnot
that would make it easier and keep part of the structure
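I don't know of a tool that does exactly this, but the folder tagging idea is simple enough to sketch: count the file types directly inside a folder and label it by whichever kind dominates. Everything here (the extension groups, the 60% threshold, the label strings) is an assumption to tune against your own data.

```python
from collections import Counter
from pathlib import Path

# Hypothetical extension groups; adjust to your own data.
KINDS = {
    "pictures": {".jpg", ".jpeg", ".png", ".gif"},
    "program": {".exe", ".dll", ".msi"},
    "iso images": {".iso", ".img"},
    "documents": {".pdf", ".doc", ".docx", ".txt"},
}


def tag_folder(folder, threshold=0.6):
    """Label a folder by the kind of file that dominates its direct contents."""
    counts = Counter()
    total = 0
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue  # subfolders are not counted; tag them separately
        total += 1
        ext = path.suffix.lower()
        for kind, extensions in KINDS.items():
            if ext in extensions:
                counts[kind] += 1
                break
    if total == 0:
        return "no files"
    if counts:
        kind, n = counts.most_common(1)[0]
        if n / total >= threshold:
            return f"mostly {kind}"
    return "mixed"
```

It only looks at extensions, so a game folder full of picture assets would still come out as "mostly pictures"; a rule such as "any .exe nearby means program" would be a possible refinement.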
yeah the duplicates are probably easy to find, at least for pictures...
I dont have a single passive task left where i don't need my brain so i can focus on sorting my crap
unless you guys wanna jump in a call and we talk while i sort all of this crap...
OOOF
another point, some of my pictures are EXCELLENTLY sorted but some are just in random folders that don't make sense
oh yeah, 10TB with my 20ish upload, google says around 1100h, so like 1.5 months to just upload
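The estimate checks out if "20ish upload" means roughly 20 Mbit/s, which is my assumption here, along with decimal terabytes and no protocol overhead. A quick sketch of the arithmetic:

```python
def upload_hours(size_tb, upload_mbit_s):
    """Hours needed to upload size_tb terabytes at upload_mbit_s megabits/s."""
    size_bits = size_tb * 1e12 * 8           # decimal terabytes to bits
    seconds = size_bits / (upload_mbit_s * 1e6)
    return seconds / 3600


hours = upload_hours(10, 20)
print(f"{hours:.0f} h, about {hours / 24:.0f} days")  # 1111 h, about 46 days
```

Real uploads rarely sustain the full line rate, so treat this as a lower bound.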
https://www.reddit.com/r/DataHoarder/comments/cf2n2w/i_would_like_to_stop_hoarding_how_do_i_get_help/
I don't have a hoarding problem, a large amount of these things is still data i need and access (pictures, videos)
i just wanna sort it, i think once reduced it should be 4tb or so
I encountered the same issue many years ago. I had pictures on CDs for backups (yes, I am that old). Then I took it slowly and organised pictures in years and sub-directories of each month. I did this manually.
Now every year (in February) I make a habit of downloading the pictures and movies from all devices (cameras, smartphones, tablets, laptops), and create a new yearly directory with a sub-directory for each month. It takes me about a week to do this in my free time, but it's damn worth it for my family. This month in my free time I am scanning old pictures from my parents and my grandparents, because I had never thought about paper degrading over time.
For documents, I had each project structured in its own directory, simply because I saw each project as its own folder. That was easy and it is still easy to maintain. In future I may have to restructure this into years though.
Regarding upload, hell yeah! I totally understand you. In Ireland my upload is 1MB/s. I thought you had way better upload speeds, but it seems a low-end storage offer is not recommended for you. As others have guessed: a NAS is the way to go for sorting everything locally. After you finish the structure, if I may suggest, you could buy something like a 10TB online cloud lifetime backup during Black Friday offers, then upload it all encrypted using a single-board computer (like a NanoPi or Raspberry Pi) connected to your NAS; even if this takes 2 months, at least it consumes little electricity.
Regarding scripts, you could use "find" and run a command for each file, such as moving it into a directory by month ("2023/06/same-partial-hierarchy" or "2023/06/random-file-name" to not overwrite files). Here is some hint about how I think of approaching it. Surely it all depends on what you wish to achieve in the end with regards to envisioning your desired structure and hierarchy.
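As a rough illustration of the by-month idea, here is a Python sketch that files everything under "YYYY/MM/" and renames on collision instead of overwriting. It uses the file's modification time, which is my assumption; for photos the EXIF capture date would usually be more reliable, but reading it needs a third-party library. The function names are made up.

```python
import shutil
from datetime import datetime
from pathlib import Path


def month_target(path, dest):
    """Return dest/YYYY/MM/<name> based on the file's modification time."""
    stamp = datetime.fromtimestamp(Path(path).stat().st_mtime)
    return Path(dest) / f"{stamp:%Y}" / f"{stamp:%m}" / Path(path).name


def move_by_month(source, dest, dry_run=True):
    """Plan (and optionally perform) moves into year/month folders."""
    moves, planned = [], set()
    files = sorted(p for p in Path(source).rglob("*") if p.is_file())
    for path in files:
        target = month_target(path, dest)
        n = 1
        # Keep both files when names clash: img.jpg, img_1.jpg, img_2.jpg, ...
        while target.exists() or target in planned:
            target = target.with_name(f"{path.stem}_{n}{path.suffix}")
            n += 1
        planned.add(target)
        moves.append((path, target))
        if not dry_run:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(target))
    return moves
```

Keep in mind that copying files between disks can reset modification times on some setups, so it is worth spot-checking a few results before trusting the dates.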
In the end allow me to congratulate you. Having 10TB of history is not a burden, but a huge success. This means you have a lot of activity and memories and stories to share with your family or friends. Don't worry if it is a mess; your satisfaction will be greater once you finish reorganising it. Take it slow and don't rush; but instead try to enjoy maintaining your memories. Please do not share your data with corporations; it would be sad for me to know someone shared 10TB of personal info with Google or Microsoft or some other marketing or AI-investing entity.
well, my camera makes 50-300gb files and certain things i do want to keep in raw format.
my structure for certain photos is perfect, but sometimes when i take a few pictures i never make a folder and in a year or two it lands in an unsorted pile alongside downloads, stuff i need and stuff i don't.
probably reduced it should be 4tb.
ordered a 6tb HDD with cosmetic damage a few days ago for 100€, it's a WD one
the plan is to move about 1tb of files onto it (still unsorted), then use windirstat to get a rough idea where the size is coming from and sort (or delete) the biggest files.
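If you ever want the "where is the size coming from" view without a GUI, the same question can be answered with a short script. This is not what windirstat does internally, just a rough stand-in: list the biggest files and total the size per top-level folder. Function names are hypothetical.

```python
import heapq
from pathlib import Path


def largest_files(root, n=20):
    """Return the n largest files under root as (size_in_bytes, path) pairs."""
    sizes = ((p.stat().st_size, p) for p in Path(root).rglob("*") if p.is_file())
    return heapq.nlargest(n, sizes)


def size_by_top_level(root):
    """Total bytes per direct child of root, largest first."""
    root = Path(root)
    totals = {}
    for p in root.rglob("*"):
        if p.is_file():
            top = p.relative_to(root).parts[0]  # first path component under root
            totals[top] = totals.get(top, 0) + p.stat().st_size
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

On a multi-terabyte disk the walk itself takes a while, so it makes sense to run it once and save the output to a text file.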
The worst part about this is probably going thru old pictures like exes, dead people now and whatever else mental drama.....
I have the life policy of keeping 100% of the pictures (unless 2 are nearly identical)
so if you send me nudes i'm taking them to the grave lol
after i have reduced the size (let's say from 10TB to 6TB) i will start going thru pictures and videos and sorting them, i still have some old old stuff from like 2015 or whatever unsorted.
after im done with pictures, i'll leave the other stuff for now (whatever i want to keep, but dont wanna deal sorting)
and backup and push the files somewhere
in 2 years after im making more than 6.50€/h i'll get proper 2 NAS solution with syncthing or something
i definitely want redundant storage + space for versioning
let's hope i can set some arguments, i see where it would fail
for example if i have some programs, and the files might be exact across versions...
i will first dump all pictures/videos into 1 then remove duplicates i guess?
Don't run those commands on directories containing source code (e.g. if bootstrap.min.css exists in multiple projects, removing the duplicates will break the user interface for all apps except the first one).
For documents/images/videos/movies etc. it is fine I guess. After removing duplicates, you will have to sort manually anyway.
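One way to act on that warning is to have any dedup script skip files that sit inside something that looks like a code project. A minimal sketch, assuming a project can be recognised by marker files such as ".git" or "package.json" in one of the parent folders; the marker list and function name are my guesses and will miss projects that use neither.

```python
from pathlib import Path

# Hypothetical markers that suggest a folder is a source-code project.
PROJECT_MARKERS = {".git", "package.json", "Makefile", "requirements.txt"}


def inside_project(path, root):
    """True if any folder between path and root (inclusive) holds a marker."""
    path, root = Path(path).resolve(), Path(root).resolve()
    for folder in path.parents:
        if any((folder / marker).exists() for marker in PROJECT_MARKERS):
            return True
        if folder == root:
            break  # do not look above the tree being cleaned up
    return False
```

A duplicate remover would then ignore every file for which inside_project(file, root) is true, so bootstrap.min.css stays in each project that ships it.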
yeah i guess first sorting out pictures...
Take your time, don't try to do it all at once, and don't stress out over it. It's worth the effort to organize all that stuff, and you might find some hidden treasures in there. Good luck!
@DeadlyChemist These don't exactly answer your question but I find these helpful:
https://www.tagspaces.org/ (can connect to object storage inc. MinIO)
https://www.jam-software.com/treesize_free
yeah already found some p***
will see, i also heard from chatgpt that tabbles apparently can do some of the work
Just one note on numbering... Leave some room upfront: 0010_LET, 0020_Google,... When you need to insert something, insert it in the middle: 0015_VPS. This way the listing stays much more pleasant to the eye for much longer than when you add an A and break the alignment
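The gapped numbering is easy to automate if you ever rename a batch of folders. A small sketch, with a step of 10 and a 4-digit prefix taken from the example above; the function names are hypothetical.

```python
def numbered_names(names, step=10, width=4):
    """Prefix names with zero-padded numbers that leave gaps: 0010_, 0020_, ..."""
    return [f"{(i + 1) * step:0{width}d}_{name}" for i, name in enumerate(names)]


def name_between(before, after, name, width=4):
    """Build a folder name whose number sits halfway between two neighbours."""
    low = int(before.split("_", 1)[0])
    high = int(after.split("_", 1)[0])
    middle = (low + high) // 2
    if middle <= low:
        raise ValueError("no free number left between these two folders")
    return f"{middle:0{width}d}_{name}"


print(numbered_names(["LET", "Google"]))                  # ['0010_LET', '0020_Google']
print(name_between("0010_LET", "0020_Google", "VPS"))     # 0015_VPS
```

With a step of 10 you can halve a gap three times before running out of room, at which point renumbering the whole level is the fallback.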
i tried tabbles and it's worthless AND you have to pay...
slowly going thru some files, it's painful
As Sigmund Freud would say, throw your burden into mount Doom, Frodo. (Or was it JRR Tolkien?)
Since I did, I enjoy peace of mind and sleep well at night:
Yes. I had the same problem earlier. There are tools for that.
https://www.disksorter.com/disksorter_overview.html
Results are good, but it takes some time to get used to the resulting data structure on your storage.
I also use Everything and XYplorer a lot.
Try those too, they will speed up sorting dozens of times over.
My issue is even worse. I have gigs of data unsorted spread across 6-7 laptops and 6 desktops. I tried to sort but gave up.
No wonder Hunter Biden asked someone else to sort his home made p0rn collection and corrupt payments on his laptop, he had gigabytes of it and was probably getting in the way of his crack/hooker habit.
Even I had a data hoarding problem. Eventually I just deleted everything except critical stuff like documents and code. Now I have less than 20 GB of data. It's easy to back up and the system is always fast.
Like how each line of code is a liability to maintain in future, I felt each file was a liability and headache. Now I am more mindful of storing data.
Yeah but my 4tb is spread along the 10tb... i need to fish it out, but i need to sort it.
Same problem incoming, i took 10GB of pictures and video today
I will sort it when i arrive home
But also zip it up and put the backup somewhere... as i dont have a dedicated backup drive.
this will become a problem in a year or two where i'll find and unpack this backup and be like, do I have original?
Pics get backup, other stuff does not (usually)
So yeah...
Same for me, but i'm slowly putting it on 2 machines, it's a mess....
Will try i guess
okay, so more questions to y'all
how do you sort pictures?
do you include country/date in the folder name? or do you have folders for countries?
also, what's your method for encrypting media? is there an easy something that will just make an executable and whatever file to open it? so i can just double click and get my files back?
i mean veracrypt works but i need veracrypt installed...
For pics I have folders by year and within that folders by a particular trip or a non-trip catch all folder. That works really well for me.
agh okay, i kinda don't want to sort by year, but then again, unsure...
I'm looking at file tags now, pretty handy metadata, could add like names of people or whatever to the files, but the problem is, well, i can't find a program to easily add it, they all use their own formats... smh
i think to simply go with this format for now onwards, to the point i don't need to worry about it for next 10 years
what happened (comment, optional) | (where) [when]
School trip (7th class) # (berlin) [23.12.1960]
Night out # (london) [03.11.1960 18.00 - 04.12.1960 02.00]
should be easy to parse with regex whenever i need to do it
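For what it's worth, here is a sketch of what that regex parsing could look like. The template line uses "|" as the separator while the two examples use "#", so this pattern accepts either; that, the field names, and the assumption that dates are day.month.year with an optional hour.minute are all my reading of the examples, not a fixed spec.

```python
import re
from datetime import datetime

FOLDER_RE = re.compile(
    r"^(?P<what>[^(#|]+?)\s*"           # what happened
    r"(?:\((?P<comment>[^)]*)\)\s*)?"   # optional (comment)
    r"[#|]\s*"                          # separator, either # or |
    r"\((?P<where>[^)]*)\)\s*"          # (where)
    r"\[(?P<when>[^\]]*)\]$"            # [when]
)


def parse_when(text):
    """Parse '23.12.1960' or a 'start - end' range into a list of datetimes."""
    results = []
    for part in text.split(" - "):
        part = part.strip()
        for fmt in ("%d.%m.%Y %H.%M", "%d.%m.%Y"):
            try:
                results.append(datetime.strptime(part, fmt))
                break
            except ValueError:
                continue
        else:
            raise ValueError(f"unrecognised date: {part!r}")
    return results


def parse_folder_name(name):
    """Return a dict of the fields, or None if the name does not fit the format."""
    match = FOLDER_RE.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    fields["when"] = parse_when(fields["when"])
    return fields
```

Folders that do not match simply return None, which makes it easy to list everything that still needs renaming.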
might put location at the end or remove it altogether, not sure, it's a lot of folders...
still need to figure out how to sort weird things
for example things that are in an order... like visiting a certain place 10x, going on a date with a certain person 10x or simply my home city at 10 different dates/sights
no clue, still roughly sorting files, can decide on the file/folder names at the end