Mongo for relatively big amounts of documents

Traffic · June 2015

I am willing to use MongoDB as a document storage for a project I'm working on. Data will rarely be read, but will be written in big amounts per day (>1 million documents per day).

Does anyone here have experience with such environments? What kind of setup/servers do you have for this?

Thanks in advance.

albertdb · June 2015

Sharding is your friend.

Traffic · June 2015

@albertdb said:
Sharding is your friend.

Thanks for your reply. Yes, I was thinking of using sharding, but I was hoping to get ideas on specific configurations suitable for handling this amount of data from real life experience.

Of course, I will stress test the system before going live, but I want to have an idea of what would be a good starting point.

deadbeef · June 2015

Rarely read -> disk based database. Cassandra is probably a good choice but it depends on the specifics.

ehab · June 2015

i had sensors push 10's of values in intervals per minute 24 hours a day, using sharding, replication, and didn't have any problems. normal centos server with at least 4GB ram is good enough.

Traffic · June 2015

@deadbeef said:
Rarely read -> disk based database. Cassandra is probably a good choice but it depends on the specifics.

It's just information about the client performing the request. The only things I require are being able to "find" a record from a unique ID and being able to search even with only full field match - although those would be only made from time to time, manually, so no problem if they must be queued to the user.

The data is ephemeral (30 days), so 30-60million rows should be able to be handled without problems.

@ehab said:
i had sensors push 10's of values in intervals per minute 24 hours a day, using sharding, replication, and didn't have any problems. normal centos server with at least 4GB ram is good enough.

I think this data is a bit larger than just sensor data, however your reply still helps me make an idea of the kind of HW I'd need.

deadbeef · June 2015

@Traffic said:
The data is ephemeral (30 days), so 30-60million rows should be able to be handled without problems.

Ah! Well, Mongo will do then

MarkTurner · June 2015

I've scaled Mongo quite large but you need to think a lot about memory.

Another option is Elasticsearch

deadbeef · June 2015

@MarkTurner said:
Another option is Elasticsearch

Just be careful with it, because it is not a dabatabe in the sense that it does NOT guarantee data integrity. In its docs its says it cannot be used as a "source of truth" and you need to have the data on a database for that.

raindog308 · June 2015

@joepie91 will be along shortly to explain why using Mongo is always a mistake.

Traffic · June 2015

@raindog308 said:
joepie91 will be along shortly to explain why using Mongo is always a mistake.

And what is your opinion? Is it a mistake? I think there's a use case for every storage system.

raindog308 · June 2015

I personally have no use for a storage system that admits unreliability.

Traffic · June 2015

@raindog308 said:
I personally have no use for a storage system that admits unreliability.

And what would be your suggestion?

dlaxotn2 · June 2015

@deadbeef said:

I wouldn't use elasticsearch as a general purpose db. It's tailored to search.
But if you had to search, es is awesome.

joepie91 · June 2015

raindog308 said: @joepie91 will be along shortly to explain why using Mongo is always a mistake.

Traffic said: And what is your opinion? Is it a mistake?

Yes.

Mongo...

... loses data (1, 2)
... in fact, for a long time, ignored errors by default and assumed every single read succeeded no matter what (which on 32-bits systems led to losing all data silently after some 3GB, due to MongoDB limitations)
... is slow, even at its advertised usecases, and claims to the contrary are completely lacking evidence (3, 4)
... forces the poor habit of implicit schemas in nearly all usecases (4)
... has locking issues (4)
... is not ACID-compliant (5)
... is a nightmare to scale and maintain
... isn't even exclusive in its offering of JSON-based storage; PostgreSQL does it too, and other (better) document stores like CouchDB have been around for a long time (6, 7)

... so realistically, there's nothing it's good at, and a bunch of stuff it's outright bad at.

Traffic said: I think there's a use case for every storage system.

That's nonsense. There is absolutely nothing that prevents a piece of software from being objectively bad. And MongoDB is such an objectively bad piece of software - there are no usecases that aren't better solved by alternative options. It lives purely off hype.

Traffic said: And what would be your suggestion?

PostgreSQL, most likely. That, or Cassandra. It depends on the kind of data.

dlaxotn2 · June 2015

i agree with ^ 100%
i think the hype comes from the fact that it was written partially in js

Traffic · June 2015

@joepie91 Thanks a lot for your insights regarding MongoDB and your advice on storage systems. It's been really useful.

I think I now have enough information to start making trials. More input is welcome of course, and thanks a lot to everyone who posted in this thread so far.

black · June 2015

What are you actually writing? You said "documents." If it's actually documents then Cassandra is not good for you. If it's like a location to a document or some sort of URL that points to the document, then you're fine. Here are the data types supported by Cassandra http://docs.datastax.com/en/cql/3.0/cql/cql_reference/cql_data_types_c.html . Make sure you understand the difference between NoSQL and other SQL like language in terms of what you're able to query.

Cassandra is pretty fast and scales linearly. For a distributed system, it's fairly easy to manage.

Traffic · June 2015

black said: What are you actually writing?

I need to save the data from each click (hit), and be able to recover it later with an unique ID. The click information is an array of data.

Traffic said: It's just information about the client performing the request. The only things I require are being able to "find" a record from a unique ID and being able to search even with only full field match - although those would be only made from time to time, manually, so no problem if they must be queued to the user.

black · June 2015

Traffic said: I need to save the data from each click (hit), and be able to recover it later with an unique ID. The click information is an array of data.

Cassandra should be fine then.

jcaleb · June 2015

joepie91 said: .. so realistically, there's nothing it's good at, and a bunch of stuff it's outright bad at.

it is good for doing small experiments. not good for anything to be used in real life

joepie91 · June 2015

jcaleb said: it is good for doing small experiments. not good for anything to be used in real life

Well, no, not even really that.

A small experiment is generally one of two things:

Something that might grow out to a 'real' project; if you're using MongoDB, you're going to have a bad time.
Experimenting with a new technology/concept to learn about it; you're basing your learning (indirectly) on a technology that you won't be able to deploy in production.

In both cases, you're better off using something that is either already production-ready, or likely will be production-ready in the near future. There's not really a point in experimenting with something that you can't use in production anyway.

Howdy, Stranger!

Categories

In this Discussion

Mongo for relatively big amounts of documents

Comments

Howdy, Stranger!

Quick Links

Categories

In this Discussion

Mongo for relatively big amounts of documents

Comments