Friday, 12 July 2013

How to compile a custom similarity class for Solr / Lucene using Eclipse (for someone who hasn't touched Java for a long time...)


Ok, so you have come to the realisation that the Solr scoring algorithm is not quite doing what you need for the task at hand. You have scoured the net for possible solutions and even pestered the nerds on the #solr IRC channel. After exhausting all the possibilities you realise you are going to have to compile a new similarity class for Solr and tweak it to your needs.

Note: It's been a long while since I did anything Java related. I welcome comments and suggestions - especially if the method outlined below seems a bit weird. I am writing this because there is little documentation on how this is done, and I wish there had been something to get me started in this area.

Assumptions: I assume that you are familiar with Eclipse and have it up and running (many people use Eclipse, with one of its many plugins, for web development that does not involve Java, e.g. PHP or Ruby).

To get up and running you will need some library files from your Solr distribution. They are contained in the ".war" file that ships with it. I recommend taking the file (outlined below) from the same Solr version that you are going to run the compiled similarity class on.

You are looking for a file called 

apache-solr-4.0.0.war   (your version numbers may be different)

This file usually resides in the "dist" folder.

Make a folder in your Eclipse "workspace"

eg :

%> mkdir ~/workspace/solr_war

copy the file here 

%> cp /path/to/apache-solr-4.0.0.war ~/workspace/solr_war

unpack the "war" file

%> cd  ~/workspace/solr_war
%> unzip apache-solr-4.0.0.war

.... stuff happens!

Ok, now that part is done we can move on to the Eclipse part.

Fire up Eclipse.

When it has loaded click

File -> New -> Java Project

give the project a new name eg:

MyNewSimilarityClass

click "Next"

click "Libraries"

click "Add External Jars"

navigate to ~/workspace/solr_war/WEB-INF/lib

select ALL jar files in this folder and click "OK" 

then click "Finish"

Eclipse is now set up for you to create a new class, compile it and export it to a jar file.

--------------

Creating a new class

In Eclipse - on the left hand side where you have your new project 

Right click -> New -> Class

Name the class eg : MyNewSimilarityClass

and click finish.

At this point you will have the stub of a class in your Eclipse editor window: an empty class declaration with the name you chose.

You will probably want to change this so that your class extends the DefaultSimilarity class,
and then you can simply override the methods you need.

In my case I wanted to remove IDF (Inverse Document Frequency) from the scoring algorithm.
My class ended up something like this (remember that the class name in the code has to match the name you gave the class in Eclipse; mine is called MyDefaultSimilarity) ...


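package org.apache.lucene.search.similarities;

// DefaultSimilarity lives in this same package, so no import is needed.
public class MyDefaultSimilarity extends DefaultSimilarity {

  // Return a constant so that document frequency no longer affects the score.
  @Override
  public float idf(long docFreq, long numDocs) {
    return 1.0f;
  }

}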

What your code contains may well be different from mine depending on your use case. There are other methods in DefaultSimilarity that can be overridden, and there are other scoring implementations you could extend instead. Please refer to the Solr wiki and browse the Lucene search similarities package to find out more.
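To see why returning a constant from idf() changes the ranking, here is a small standalone sketch. It does not use Lucene at all: classicIdf is my assumption of the usual 1 + ln(numDocs / (docFreq + 1)) shape, and the class name IdfSketch and the document counts are made up for illustration.

```java
// Standalone sketch: no Lucene on the classpath, it only mirrors the idea.
public class IdfSketch {

    // Assumed classic shape: rare terms (low docFreq) get a larger weight.
    public static float classicIdf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Same idea as the override in the article: every term weighs the same.
    public static float flatIdf(long docFreq, long numDocs) {
        return 1.0f;
    }

    public static void main(String[] args) {
        long numDocs = 1000;       // hypothetical index size
        long rareDocFreq = 9;      // hypothetical term found in few documents
        long commonDocFreq = 499;  // hypothetical term found in many documents

        System.out.println("classic, rare term:   " + classicIdf(rareDocFreq, numDocs));
        System.out.println("classic, common term: " + classicIdf(commonDocFreq, numDocs));
        System.out.println("flat, rare term:      " + flatIdf(rareDocFreq, numDocs));
        System.out.println("flat, common term:    " + flatIdf(commonDocFreq, numDocs));
    }
}
```

With the classic shape the rare term gets the bigger weight; with the flat version both terms come out equal, so only the other scoring factors separate the documents.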

Building a JAR file for use with Solr

This one is nice and easy!

Right click on your Java project

go to Export -> Java -> JAR file

Name the jar file, and pick the file destination

Click "Finish"

You will now have a jar file that can be used with your Solr distribution.
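If you want to double check that the export really contains your class before handing the jar to Solr, something like the following sketch could help. It only uses java.util.jar from the JDK; the class name JarCheck and its two helper methods are hypothetical names of mine, not part of Solr or Eclipse.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper: checks whether a jar contains a given compiled class.
public class JarCheck {

    // Turns a fully qualified class name into the entry path used inside a jar.
    public static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns true if the jar at jarPath has an entry for the given class.
    public static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryNameFor(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarCheck <jar file> <fully.qualified.ClassName>");
            System.exit(1);
        }
        System.out.println(containsClass(args[0], args[1]) ? "found" : "missing");
    }
}
```

You would run it with the path to your exported jar and the full class name (package included). If it reports "missing", the usual suspect is a package name that does not match the one declared at the top of your class.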

Using a JAR file with Solr

Your new JAR file needs to be copied into the "lib" folder of your instance folder.
This folder usually sits in the same directory as your solr.xml file, so go to the folder where that file is
located and create "lib" if it does not exist yet, eg:

%> mkdir /path/to/instancedir/lib

then copy the JAR file here

%>  cp /path/to/myjarfile.jar /path/to/instancedir/lib/

Now that your jar file is in place you just need to make sure that Solr is configured to use it.

use your favorite text editor to open solr.xml

%> vi /path/to/instancedir/solr.xml

and see that the following is in place

<solr persistent="true" sharedLib="lib">

Note the sharedLib="lib" attribute. If you have an unusual directory structure
you should specify the path to your lib folder here; otherwise make sure it is as above.

Finally, make sure that schema.xml is configured to use the new class.

In my version of Solr the following lines are near the bottom of the schema.xml file:




  <!--
     <similarity class="com.example.solr.CustomSimilarityFactory">
       <str name="paramkey">param value</str>
     </similarity>
    -->


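Uncomment this and change it to use the fully qualified name of your new class (package plus class name). The paramkey line is only a placeholder, so it is dropped here. For the class shown earlier that would be:

     <similarity class="org.apache.lucene.search.similarities.MyDefaultSimilarity"/>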


Restart Solr to start using your new class! Hope this helps Nick ...