Sorting 10TB of mess (10+ years)

So, I have all my data, but finding anything is slowly becoming a pain in the ass because it's a huge mess:
pictures half sorted, pictures unsorted, documents all over, old programs, old backups, backups of backups.

I'm planning to move everything to one spot. Is there some software that would just go through the 10TB and tag/sort/assist with everything as much as possible? I don't want to spend the next 100 hours going through old stuff by hand if some AI/script can do most of the work.


Comments

  • yoursunny Member, IPv6 Advocate
    edited June 2023

    To be honest, if you won't spend the next 100 hours looking over the files, these files aren't actually important, and you'll likely never look at them for the rest of your life.

    So the solution is:
    FORMAT D:

    Poof!
    Problem gone.

  • default Veteran
    edited June 2023

    Dump it all into some cloud storage, then wait for AI to develop in a few years. Don't forget corporations want all that data to develop AI and grab your personal info; they don't care if it's unsorted, so just upload it into their cloud to help them out with your data :wink:

    As a side note, if you wish to get dirty, you could upload all data into a large server from @hosthatch (I remember they offered huge cheap storage) and sort it there using commands such as "find" to move pictures (jpg, png) into a specific directory, documents (doc, docx) into another directory, and so on.
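
    For what it's worth, a minimal sketch of that "find" approach (the paths and extensions are just examples; --backup=numbered is GNU mv, used so nothing gets silently overwritten):

    # Move pictures and documents into per-type directories (example paths).
    mkdir -p sorted/pictures sorted/documents
    find . -type f \( -iname '*.jpg' -o -iname '*.png' \) \
        -exec mv --backup=numbered -t sorted/pictures {} +
    find . -type f \( -iname '*.doc' -o -iname '*.docx' \) \
        -exec mv --backup=numbered -t sorted/documents {} +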

  • bdlbdl Member
    edited June 2023

    https://github.com/qarmin/czkawka will help find duplicates. I found it particularly great for getting rid of backups of backups of backups. You can mark one set of files as the "reference" directory, and it will delete every copy outside of it.

    I run it in Docker on my NASes: https://hub.docker.com/r/jlesage/czkawka
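
    A minimal sketch of how a container like that is launched (an assumption, not gospel: jlesage images serve their GUI over a built-in web VNC, typically on port 5800; check the image docs for the exact port and volume mappings):

    # Run czkawka in Docker with the data to scan mounted as a volume.
    docker run --rm \
        -p 5800:5800 \
        -v /path/to/data:/storage:rw \
        jlesage/czkawka
    # Then open http://localhost:5800 in a browser.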

  • FAT32 Administrator, Deal Compiler Extraordinaire

    I am in the same scenario as you now, but I estimate it will take me longer than 100 hours... I got a NAS and I'm reorganising everything from scratch.

    Use hashing software to find duplicates, and compress those folders/projects with lots of small files, etc...
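
    Something like this is the idea (a sketch assuming GNU coreutils; it only lists files sharing a checksum, so nothing is deleted without review):

    # Group files by SHA-256; identical files print together, separated by
    # blank lines (the first 64 characters of each line are the hash).
    find . -type f -print0 | xargs -0 sha256sum \
        | sort | uniq -w64 --all-repeated=separate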

    The best way IMO is to number the folders, like 001_LET. In the rare case where you need to insert a folder in between, you can add a letter, so it becomes like this when sorted:

    001_LET
    001A_VPS
    002_Google
    

    It is definitely a pain, but if the data is important to you, you will have to do it anyway.

  • raindog308 Administrator, Veteran

    I'm in the exact same position.

    For pictures, people like @KuJoe told me to load them into one of the major photo apps (Apple Photos, Google Photos, whatever) and use that instead of keeping them in YYYY-MM-DD Label directories.

    But for 24TB of music, video, PDFs, zips...ugh.

    And I keep growing it.

    Although my progress has been slow, what I did was set up some NAS space as the Perfectly Organized Future, and then I move things to it. For example, although I might have a copy of... oh, call it Debian 11.iso... under random folders titled "misc", "debian isos", "linux isos", "linux isos to save", "linux isos backup", etc., I have only one destination, so duplicates get sieved out.

    I work on it a little while watching TV.

  • Sorting files takes a lot of effort. A while ago one of my HDDs crashed; I was able to recover 99% of the files, but the filenames were lost. I tried to sort them out and at some point I gave up. It takes too much time and drives you mad.

  • @default said:
    Dump it all into some cloud storage, then wait for AI to develop in a few years. Don't forget corporations want all that data to develop AI and grab your personal info; they don't care if it's unsorted, so just upload it into their cloud to help them out with your data :wink:

    As a side note, if you wish to get dirty, you could upload all data into a large server from @hosthatch (I remember they offered huge cheap storage) and sort it there using commands such as "find" to move pictures (jpg, png) into a specific directory, documents (doc, docx) into another directory, and so on.

    The problem would be: some pictures are sorted... some are not, and some are just assets in games.
    I'd like something that would sort by tagging a folder with "lots of pictures" or "probably a program" or "ISO images" or whatnot.
    That would make it easier and keep part of the structure.

    @FAT32 said: Use hashing software to find duplicates, and compress those folders/projects with lots of small files, etc...

    Yeah, the duplicates are probably easy to find, at least for pictures...

    @raindog308 said: I work on it a little while watching TV.

    I don't have a single passive task left that doesn't need my brain, during which I could focus on sorting my crap...
    unless you guys wanna jump on a call and we talk while I sort all of this.

    @moodwriter said: Sorting files takes a lot of effort. A while ago one of my HDDs crashed; I was able to recover 99% of the files, but the filenames were lost. I tried to sort them out and at some point I gave up. It takes too much time and drives you mad.

    OOOF

    Another point: some of my pictures are EXCELLENTLY sorted, but some are just in random folders that don't make sense.

  • @default said: As a side note, if you wish to get dirty, you could upload all data into a large server from @hosthatch (I remember they offered huge cheap storage) and sort it there using commands such as "find" to move pictures (jpg, png) into a specific directory, documents (doc, docx) into another directory, and so on.

    Oh yeah: 10TB over my 20ish Mbit/s upload. Google says around 1,100 hours (10 TB × 8 ÷ 20 Mbit/s ≈ 4 million seconds), so like 1.5 months just to upload.

  • I don't have a hoarding problem; a large amount of this is still data I need and access (pictures, videos).
    I just wanna sort it. Reduced, I think it should be 4TB or so.

  • default Veteran
    edited June 2023

    @DeadlyChemist said:

    @default said: As a side note, if you wish to get dirty, you could upload all data into a large server from @hosthatch (I remember they offered huge cheap storage) and sort it there using commands such as "find" to move pictures (jpg, png) into a specific directory, documents (doc, docx) into another directory, and so on.

    Oh yeah: 10TB over my 20ish Mbit/s upload. Google says around 1,100 hours, so like 1.5 months just to upload.

    I encountered the same issue many years ago. I had pictures on CDs as backups (yes, I am that old). Then I took it slowly and organised the pictures into yearly directories with sub-directories for each month. I did this manually.

    Now every year (in February) I make a habit of downloading the pictures and movies from all devices (cameras, smartphones, tablets, laptops) and creating a new yearly directory with a sub-directory for each month. It takes me about a week of my free time, but it's damn worth it for my family. This month, in my free time, I am scanning old pictures from my parents and grandparents, because I never thought about paper degrading over time.

    For documents, I had each project structured in its own directory, simply because I saw each project as its own folder. That was easy, and it is still easy to maintain. In the future I may have to restructure this into years, though.

    Regarding upload: hell yeah, I totally understand you. In Ireland my upload is 1MB/s. I thought you had way better upload speeds, but it seems a low-end storage offer is not for you. As others have said, a NAS is the way to go for sorting everything locally. After you finish the structure, if I may suggest, you could buy something like a 10TB lifetime cloud backup during the Black Friday offers, then upload it all encrypted using a single-board computer (like a NanoPi or Raspberry Pi) connected to your NAS; even if this takes 2 months, at least it consumes little electricity.

    Regarding scripts, you could use "find" and apply a structure to each file, like moving into directories by month ("2023/06/same-partial-hierarchy" or "2023/06/random-file-name", so files don't overwrite each other). The sketch below is a hint at how I would approach it. Of course, it all depends on what you wish to achieve in the end, with regard to your desired structure and hierarchy.
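
    Something along these lines (a rough sketch, assuming GNU date and an example /unsorted source path; --backup=numbered keeps clashing names from overwriting each other):

    # File everything into sorted/YYYY/MM based on each file's mtime.
    find /unsorted -type f | while IFS= read -r f; do
        d=$(date -r "$f" +%Y/%m)      # e.g. 2023/06
        mkdir -p "sorted/$d"
        mv --backup=numbered "$f" "sorted/$d/"
    done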

    In the end, allow me to congratulate you. Having 10TB of history is not a burden but a huge success. It means you have a lot of activity and memories and stories to share with your family and friends. Don't worry if it is a mess; your satisfaction will be greater once you finish reorganising it. Take it slow and don't rush; instead, try to enjoy maintaining your memories. Please do not share your data with corporations; it would be sad for me to know someone shared 10TB of personal info with Google or Microsoft or some other marketing or AI-investing entity.

  • varwww Member
    edited June 2023
    # Dry run
    fdupes -r DirectoryToDeleteDuplicateFilesFrom/
    
    # Delete
    # On duplicate -> Keeps first copy, deletes the other copies without confirmation
    fdupes -r -d -N DirectoryToDeleteDuplicateFilesFrom/
    
    # Find and Delete Empty Directories Recursively in Current Directory
    find ./ -type d -empty -delete
    
    @default said: In the end, allow me to congratulate you. Having 10TB of history is not a burden but a huge success. It means you have a lot of activity and memories and stories to share with your family and friends. Don't worry if it is a mess; your satisfaction will be greater once you finish reorganising it. Take it slow and don't rush; instead, try to enjoy maintaining your memories. Please do not share your data with corporations; it would be sad for me to know someone shared 10TB of personal info.

    Well, my camera makes 50-300GB files, and certain things I do want to keep in raw format.
    My structure for certain photos is perfect, but sometimes when I take a few pictures I never make a folder, and in a year or two they land in an unsorted pile alongside downloads, stuff I need, and stuff I don't.

    Reduced, it should probably be 4TB.

    Ordered a 6TB HDD with cosmetic damage a few days ago for 100€; it's a WD one.
    The plan is to move about 1TB of files onto it (still unsorted), then use WinDirStat to get a rough idea where the size is coming from and sort (or delete) the biggest files.

    The worst part about this is probably going through old pictures: exes, people who have died, and whatever other mental drama...
    I have a life policy of keeping 100% of my pictures (unless two are nearly identical),
    so if you send me nudes, I'm taking them to the grave lol

    After I have reduced the size (let's say from 10TB to 6TB), I will start going through pictures and videos and sorting them; I still have some old, old stuff from like 2015 or whatever unsorted.

    After I'm done with pictures, I'll leave the other stuff for now (whatever I want to keep but don't wanna deal with sorting)

    and then back up and push the files somewhere.

    In 2 years, once I'm making more than 6.50€/h, I'll get a proper 2-NAS solution with Syncthing or something.

    I definitely want redundant storage + space for versioning.

  • @varwww said:

    # Dry run
    fdupes -r DirectoryToDeleteDuplicateFilesFrom/
    
    # Delete
    # On duplicate -> Keeps first copy, deletes the other copies without confirmation
    fdupes -r -d -N DirectoryToDeleteDuplicateFilesFrom/
    
    # Find and Delete Empty Directories Recursively in Current Directory
    find ./ -type d -empty -delete
    

    Let's hope I can set some arguments; I can see where it would fail.
    For example, if I have some programs, the files might be identical across versions...
    I will first dump all pictures/videos into one place, then remove duplicates, I guess?

  • varwww Member

    @DeadlyChemist said:

    @varwww said:

    # Dry run
    fdupes -r DirectoryToDeleteDuplicateFilesFrom/
    
    # Delete
    # On duplicate -> Keeps first copy, deletes the other copies without confirmation
    fdupes -r -d -N DirectoryToDeleteDuplicateFilesFrom/
    
    # Find and Delete Empty Directories Recursively in Current Directory
    find ./ -type d -empty -delete
    

    Let's hope I can set some arguments; I can see where it would fail.
    For example, if I have some programs, the files might be identical across versions...
    I will first dump all pictures/videos into one place, then remove duplicates, I guess?

    Don't run those commands on directories containing source code (e.g. if bootstrap.min.css exists in multiple projects, it will break the user interface for all apps except the first one).

    For documents/images/videos/movies, etc., it is fine, I guess. After duplicate removal, you will have to sort manually anyway.

  • @varwww said:

    @DeadlyChemist said:

    @varwww said:

    # Dry run
    fdupes -r DirectoryToDeleteDuplicateFilesFrom/
    
    # Delete
    # On duplicate -> Keeps first copy, deletes the other copies without confirmation
    fdupes -r -d -N DirectoryToDeleteDuplicateFilesFrom/
    
    # Find and Delete Empty Directories Recursively in Current Directory
    find ./ -type d -empty -delete
    

    Let's hope I can set some arguments; I can see where it would fail.
    For example, if I have some programs, the files might be identical across versions...
    I will first dump all pictures/videos into one place, then remove duplicates, I guess?

    Don't run those commands on directories containing source code (e.g. if bootstrap.min.css exists in multiple projects, it will break the user interface for all apps except the first one).

    For documents/images/videos/movies, etc., it is fine, I guess. After duplicate removal, you will have to sort manually anyway.

    Yeah, I guess I'll sort out the pictures first...

  • jlet88 Member

    Take your time, don't try to do it all at once, and don't stress out over it. It's worth the effort to organize all that stuff, and you might find some hidden treasures in there. Good luck!

  • Kassem Member

    @DeadlyChemist These don't exactly answer your question but I find these helpful:

    https://www.tagspaces.org/ (can connect to object storage inc. MinIO)
    https://www.jam-software.com/treesize_free

  • @jlet88 said:
    Take your time, don't try to do it all at once, and don't stress out over it. It's worth the effort to organize all that stuff, and you might find some hidden treasures in there. Good luck!

    yeah already found some p*** :joy:

    @Kassem said:
    @DeadlyChemist These don't exactly answer your question but I find these helpful:

    https://www.tagspaces.org/ (can connect to object storage inc. MinIO)
    https://www.jam-software.com/treesize_free

    Will see. I also heard from ChatGPT that Tabbles can apparently do some of the work.

  • MrEd Member

    @FAT32 said:
    I am in the same scenario as you now, but I estimate it will take me longer than 100 hours... I got a NAS and I'm reorganising everything from scratch.

    Use hashing software to find duplicates, and compress those folders/projects with lots of small files, etc...

    The best way IMO is to number the folders, like 001_LET. In the rare case where you need to insert a folder in between, you can add a letter, so it becomes like this when sorted:

    001_LET
    001A_VPS
    002_Google
    

    It is definitely a pain, but if the data is important to you, you will have to do it anyway.

    Just one note on numbering... leave some room up front: 0010_LET, 0020_Google, ... When you need to insert something, insert in the middle: 0015_VPS. This way the sorted view stays much more pleasant on the eye for much longer than when you add an A and break the alignment :)

  • I tried Tabbles and it's worthless, AND you have to pay...
    Slowly going through some files; it's painful.

  • davide Member
    edited July 2023

    As Sigmund Freud would say: throw your burden into Mount Doom, Frodo. (Or was it J.R.R. Tolkien?)

    Since I did, I enjoy peace of mind and sleep well at night:

    $ df -h
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sda1        55G   15G   38G  28% /
    tmpfs           199M  1.1M  198M   1% /run
    
    @DeadlyChemist said: I'm planning to move everything to one spot. Is there some software that would just go through the 10TB and tag/sort/assist with everything as much as possible? I don't want to spend the next 100 hours going through old stuff by hand if some AI/script can do most of the work.

    Yes. I had the same problem earlier. There are tools for that.
    https://www.disksorter.com/disksorter_overview.html

    The results are good, but it takes some time to get used to this data structure on your storage.

    I also use Everything and XYplorer a lot.
    Try those too; they will speed up sorting dozens of times.

  • asterisk14 Banned
    edited July 2023

    My issue is even worse. I have gigs of unsorted data spread across 6-7 laptops and 6 desktops. I tried to sort it but gave up.

    No wonder Hunter Biden asked someone else to sort the home-made p0rn collection and corrupt payments on his laptop; he had gigabytes of it, and it was probably getting in the way of his crack/hooker habit.

  • varwww Member

    I had a data hoarding problem too. Eventually I just deleted everything except critical stuff like documents and code. Now I have less than 20GB of data. Easy to back up, and the system is always fast. :)

    Just as each line of code is a liability to maintain in the future, I felt each file was a liability and a headache. Now I am more mindful about storing data.

  • @varwww said:
    I had a data hoarding problem too. Eventually I just deleted everything except critical stuff like documents and code. Now I have less than 20GB of data. Easy to back up, and the system is always fast. :)

    Just as each line of code is a liability to maintain in the future, I felt each file was a liability and a headache. Now I am more mindful about storing data.

    Yeah, but my 4TB is spread across the 10TB... I need to fish it out, and for that I need to sort it.

    Same problem incoming: I took 10GB of pictures and video today.
    I will sort it when I arrive home,
    but also zip it up and put the backup somewhere... as I don't have a dedicated backup drive.

    This will become a problem in a year or two, when I'll find and unpack this backup and be like: do I still have the original?

    Pics get backed up; other stuff does not (usually).

    So yeah...

    @asterisk14 said:
    My issue is even worse. I have gigs of unsorted data spread across 6-7 laptops and 6 desktops. I tried to sort it but gave up.

    Same for me, but I'm slowly consolidating it onto 2 machines. It's a mess...

    @desperand said:

    @DeadlyChemist said: I'm planning to move everything to one spot. Is there some software that would just go through the 10TB and tag/sort/assist with everything as much as possible? I don't want to spend the next 100 hours going through old stuff by hand if some AI/script can do most of the work.

    Yes. I had the same problem earlier. There are tools for that.
    https://www.disksorter.com/disksorter_overview.html

    The results are good, but it takes some time to get used to this data structure on your storage.

    I also use Everything and XYplorer a lot.
    Try those too; they will speed up sorting dozens of times.

    Will try, I guess.

  • Okay, so more questions for y'all:
    how do you sort pictures?

    Do you include the country/date in the folder name, or do you have folders per country?

    Also, what's your method for encrypting media? Is there something easy that just makes an executable plus a data file, so I can double-click and get my files back?
    I mean, VeraCrypt works, but I need VeraCrypt installed...
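
    One possible route (a sketch, not from the thread: it assumes 7-Zip with its self-extractor module installed, and the resulting .exe self-extracts on Windows without anything else installed; 7-Zip itself can still open it elsewhere):

    # Build an encrypted, self-extracting archive. -p prompts for a
    # password; -mhe=on also encrypts the file names inside.
    7z a -sfx -p -mhe=on pictures_backup.exe pictures/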

  • TimRoo Member

    @DeadlyChemist said:
    Okay, so more questions for y'all:
    how do you sort pictures?

    Do you include the country/date in the folder name, or do you have folders per country?

    For pics, I have folders by year, and within those, a folder per trip plus a non-trip catch-all folder. That works really well for me.

    @TimRoo said: For pics, I have folders by year, and within those, a folder per trip plus a non-trip catch-all folder. That works really well for me.

    Agh, okay. I kinda don't want to sort by year, but then again, unsure...
    I'm looking at file tags now; pretty handy metadata. I could add names of people or whatever to the files, but the problem is, well, I can't find a program that adds them easily; they all use their own formats... smh
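
    If it helps, one standards-based route (a sketch, assuming ExifTool; IPTC keywords are metadata most photo managers read, rather than an app-specific format; names and file here are hypothetical):

    # Add keyword tags to a photo.
    exiftool -keywords+="alice" -keywords+="berlin" photo.jpg
    # Read them back later.
    exiftool -keywords photo.jpg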

  • I think I'll simply go with this format from now on, to the point where I don't need to worry about it for the next 10 years:

    what happened (comment, optional) | (where) [when]
    School trip (7th class) # (berlin) [23.12.1960]
    Night out # (london) [03.11.1960 18.00 - 04.12.1960 02.00]

    Should be easy to parse with a regex whenever I need to (rough sketch below).
    Might put the location at the end or remove it altogether; not sure, it's a lot of folders...
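
    Something like this would split the three fields back out (assuming the "what # (where) [when]" separators stay exactly as above):

    # Split a folder name into its what/where/when parts with sed.
    echo 'School trip (7th class) # (berlin) [23.12.1960]' |
        sed -E 's/^(.*) # \((.*)\) \[(.*)\]$/what=\1|where=\2|when=\3/'
    # Prints: what=School trip (7th class)|where=berlin|when=23.12.1960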

    Still need to figure out how to sort weird things,
    for example things that are in an order... like visiting a certain place 10x, going on a date with a certain person 10x, or simply my home city on 10 different dates/sights.

    No clue; still roughly sorting files. I can decide on the file/folder names at the very end.
