Anyone using Solr?

I'm currently working on a project that will use Solr for search. I was wondering if anyone here has used it in production, how well it has performed and any issues you might have encountered.

In my case, it would be to index article titles, body text and possibly one other short text field.

I am also wondering if you have had any problems running this on a VPS (in this case, a KVM instance).
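For the indexing described above (titles, body text and one more short field), here is a minimal sketch of what a Solr update looks like over HTTP, using only the Python standard library. The core name `articles`, the host, and the field names `title`, `body` and `summary` are assumptions; those fields must exist in your schema (or Solr must run in schemaless mode). The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical Solr location and core name; adjust to your install.
SOLR_CORE_URL = "http://localhost:8983/solr/articles"


def build_update_request(articles, commit=True):
    """Build (but do not send) a POST to Solr's JSON update handler.

    `articles` is a list of dicts with hypothetical keys id, title, body
    and an optional summary (the "one other short text field").
    """
    docs = [
        {
            "id": str(a["id"]),
            "title": a["title"],
            "body": a["body"],
            "summary": a.get("summary", ""),
        }
        for a in articles
    ]
    # commit=true makes the documents searchable once the request finishes.
    url = SOLR_CORE_URL + "/update" + ("?commit=true" if commit else "")
    return urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_update_request([{"id": 1, "title": "Hello", "body": "First post"}])
    print(req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

For bulk loads, leaving `commit=False` on each batch and committing once at the end is usually cheaper than committing every request.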

Comments

  • I have Elasticsearch, Solr and Sphinx running here on VPSes (KVM/OpenStack), but each KVM has 64GB of RAM, so I'm not sure how well that translates to smaller instances.

    It's a massive Java application; it won't have issues if you give it enough resources. For tiny stuff you might be better off with SaaS, because Solr is beyond overkill.

  • I don't know too much about Solr, but I have it running as part of a repository software. I haven't seen issues with the indexes.

    It runs on a KVM instance with Tomcat. Specs: 6GB RAM, 4 vCores X5690 @ 3.47GHz

    If I were you, I'd take a look at Elasticsearch.

  • IonSwitch_Stan (Member, Host Rep)

    Apache Solr is extremely popular and used in many large production sites. As @hzr mentioned, it is a Java application that needs heap management and, at scale, some proficiency with JVM-based operations. There is no reason it would run better or worse inside a virtual machine than any other database.

    I found this blog post that more or less covers the major pain points I have experienced on some large installs (https://lucidworks.com/2010/01/21/the-seven-deadly-sins-of-solr/).

    I agree entirely with @hzr: for your first install, I would consider a managed search service such as AWS Elasticsearch.
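To make the heap-management point concrete, here is a sketch that builds a `bin/solr start` command with an explicit heap via its `-m` flag, which sets both `-Xms` and `-Xmx`. The install path `/opt/solr` and the 50% heap fraction are hypothetical, not a Solr recommendation; whatever RAM the heap doesn't take is left to the OS page cache, which Lucene relies on to keep index files hot.

```python
import math

SOLR_BIN = "/opt/solr/bin/solr"  # hypothetical install path


def solr_start_command(total_ram_gb, heap_fraction=0.5, port=8983):
    """Return an argv list that starts Solr with a fixed JVM heap.

    heap_fraction is an assumed share of total RAM for the heap; the rest
    stays available to the OS page cache.
    """
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    heap_mb = math.floor(total_ram_gb * 1024 * heap_fraction)
    # -m is passed through as both the minimum and maximum heap size.
    return [SOLR_BIN, "start", "-p", str(port), "-m", f"{heap_mb}m"]


if __name__ == "__main__":
    print(" ".join(solr_start_command(8)))
    # To run it: subprocess.run(solr_start_command(8), check=True)
```

Setting the minimum and maximum heap to the same value avoids the JVM resizing the heap under load, which is one reason a single `-m` value is convenient here.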

  • We used Solr, then dumped it for Elasticsearch: much more flexible, and the cluster support was much better. We were running a 3-node configuration with 16GB per server, about 1 million documents and about 10-20 searches per second with lots of updates, and never had an issue.

  • I used Solr on a 4GB dedi back in the day. It might be OK on a 2GB VPS, but not with much less than that. If you search "solr vs elastic search" you'll find many comparisons. Elastic's main advantage seems to be scalability across server clusters. If your dataset is small enough for a single server, Solr can be OK.

  • @IonSwitch_Stan said:
    I found this blog post that more or less covers the major pain points I have experienced on some large installs (https://lucidworks.com/2010/01/21/the-seven-deadly-sins-of-solr/).

    "Throw more RAM at it", "Don't be afraid of going SaaS", "Let it do what it's supposed to", "Don't run it on same system as service itself"

    What's next? "It'll only hurt for a while until you get used to it"

  • @WSS said:

    @IonSwitch_Stan said:
    I found this blog post that more or less covers the major pain points I have experienced on some large installs (https://lucidworks.com/2010/01/21/the-seven-deadly-sins-of-solr/).

    "Throw more RAM at it", "Don't be afraid of going SaaS", "Let it do what it's supposed to", "Don't run it on same system as service itself"

    What's next? "It'll only hurt for a while until you get used to it"

    That's shitty Java for you; next they'll make a Flash-based one.

  • Thanks, everyone. You're giving me some food for thought, here. I'm familiar with ElasticSearch, as I have set up clusters for it at my job. Given that this is a smaller, self-funded project, I was trying to keep costs down by using a single instance of Solr.

    But, like I stated, I've got some more to consider, now.

  • willie (Member)
    edited December 2017

    Since both are Java-based, you need around 512MB to 1GB of RAM just to get in the door. After that, though, the memory hogginess isn't that bad compared with other possible implementations. Search engines are like databases in that they inherently use a lot of RAM for caching if traffic is high, and the lookup load is high if you use features like faceted search. But the main index structures in Lucene (the Java library that Solr and Elastic both use) are managed manually inside the code, so they're not getting thrashed around by the GC all the time.

    Solr is fairly easy to set up and use. I haven't used ES.

    Added: if you want something smaller you could try Xapian (xapian.org).
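Faceted search, mentioned above as a source of lookup load, is just a set of extra parameters on a normal Solr query. A sketch, assuming the same hypothetical `articles` core and a hypothetical `category` field: it builds a `/select` URL with `facet=true` and one `facet.field` per field (searching `title` and `body` through the `edismax` parser), and pairs up the facet counts, which Solr's JSON response returns by default as a flat `[value, count, value, count, ...]` list.

```python
from urllib.parse import urlencode

SOLR_CORE_URL = "http://localhost:8983/solr/articles"  # hypothetical core


def faceted_query_url(text, facet_fields, rows=10):
    """Build a Solr /select URL that searches title/body and facets on the given fields."""
    params = [
        ("q", text),
        ("defType", "edismax"),
        ("qf", "title^2 body"),  # weight title matches above body matches
        ("rows", rows),
        ("facet", "true"),
        ("wt", "json"),
    ]
    # facet.field may be repeated, once per field to count.
    params += [("facet.field", f) for f in facet_fields]
    return SOLR_CORE_URL + "/select?" + urlencode(params)


def pair_facet_counts(flat):
    """Turn Solr's flat [value, count, value, count, ...] list into (value, count) pairs."""
    return list(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    print(faceted_query_url("solr vps", ["category"]))
```

Each faceted field adds counting work to every query, which is where the extra memory and CPU willie mentions comes from.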

  • xyz (Member)
    edited December 2017

    I used to run Sphinxsearch on a 128MB OpenVZ VPS (alongside nginx, PHP and MariaDB for a typical dynamic application getting around 100k-200k pageviews a day). It worked fairly well, though index rebuilds were a little painful.
    I configured it to use on-disk indexes; the index was around 40MB at the time, if I recall correctly, but I'm not sure if the current version of Sphinxsearch still supports that option.
    Regardless, I don't think any of the Java engines would've had a chance on a 128MB RAM VPS, though I've never tried them.

  • I agree 128MB wouldn't work for Java. @jaypeesmith, if you can say more about your application, it could help generate better advice here. There are some other alternatives available too. I remember deciding against Sphinx for my own requirements, but yours could be different. There is also CLucene, a C++ implementation of Lucene. I don't know if there are Solr-like wrappers for it.

  • jaypeesmith (Member)
    edited December 2017

    @willie said:
    I agree 128mb wouldn't work for java. @jaypeesmith if you can say more about your application, it could help generate better advice here. There are some other alternatives available too. I remember deciding against sphinx for my own requirements but yours could be different. There is also CLucene which is a C++ implementation of Lucene. I don't know if there are solr-like wrappers for it.

    Sure. It's a Django web app, utilizing Haystack & Solr for search. Solr would run on a separate instance. In my dev setup, it's running on a 4GB instance, with 2GB dedicated to Solr. In production, I was looking at having Solr run on its own instance, starting out with either an 8GB or a 16GB KVM VPS, with half of the memory dedicated to Solr itself.

    Thanks.
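For the setup jaypeesmith describes (Django with Haystack, Solr on its own instance), Haystack only needs to be pointed at the remote Solr core. A minimal `settings.py` excerpt, assuming Haystack's Solr backend (which also requires the `pysolr` package); the private address `10.0.0.5` and the core name `articles` are placeholders.

```python
# settings.py (excerpt); address and core name are hypothetical placeholders
HAYSTACK_CONNECTIONS = {
    "default": {
        "ENGINE": "haystack.backends.solr_backend.SolrEngine",
        # Solr runs on a separate instance, reached over the private network.
        "URL": "http://10.0.0.5:8983/solr/articles",
        # Seconds to wait on Solr before giving up on a request.
        "TIMEOUT": 60 * 5,
    },
}
```

After model or schema changes, `./manage.py rebuild_index` clears and reindexes everything, while `./manage.py update_index` refreshes documents without clearing first.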

  • jaypeesmith said: I was looking at having Solr run on its own instance, starting out with either an 8GB or a 16GB KVM VPS, with half of the memory dedicated to Solr itself.

    If Solr is on its own instance using half the memory, what is the other half for? Solr doesn't spew stuff into memory constantly; rather, it just wants to hold its index in memory, and the index is approximately the same size as the document set. I have a few tricks for caching just the hottest parts of the index if you're memory-constrained.

    If you have a significant-size doc set and query load, and you reindex with any frequency, you might consider a dedi. Indexing my old 10GB-ish doc set took 8+ hours on an old AMD X2 processor, so it would still take several hours on a modern one. That could be an issue on a VPS without dedicated cores.
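A back-of-the-envelope version of the sizing advice above, as a sketch: how much RAM is left for the OS page cache once the JVM heap and an assumed 1GB OS reserve are taken, compared against an index assumed to be roughly the size of the document set (willie's rule of thumb), plus a linear extrapolation of reindex time from the ~10GB in 8+ hours figure. The reserve and linear scaling are assumptions; because the quoted time is "8+ hours", treat the extrapolation as a lower bound on comparable hardware.

```python
def page_cache_headroom_gb(total_ram_gb, heap_gb, index_gb, os_reserve_gb=1.0):
    """RAM left over after heap, an assumed OS reserve, and the index itself.

    A negative result means the whole index cannot stay in the page cache.
    """
    return total_ram_gb - heap_gb - os_reserve_gb - index_gb


def reindex_hours(doc_set_gb, measured_gb=10.0, measured_hours=8.0):
    """Linearly extrapolate reindex time from one measured run."""
    rate_gb_per_hour = measured_gb / measured_hours
    return doc_set_gb / rate_gb_per_hour


if __name__ == "__main__":
    # Hypothetical: 16GB KVM, half for the heap, ~5GB index.
    print(page_cache_headroom_gb(16, 8, 5))
    print(reindex_hours(4))
```

With those hypothetical numbers about 2GB would be spare, so the index fits in cache; on the 8GB option with a 4GB heap, the same 5GB index would leave the headroom negative.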
