Ok so you have come to the realisation that the Solr scoring algorithm is not quite doing what you need for the task at hand. You have scoured the net for possible solutions, even pestered the nerds on the #solr IRC channel. After exhausting all the possibilities you realise you are going to have to compile a new similarity class for Solr and tweak it to your needs.
Note: It's been a long while since I did anything Java related. I welcome comments and suggestions, especially if the method outlined below seems a bit weird. I am writing this because there is little documentation on how this is done, and I wish there had been something to get me started in this area.
Assumptions: I assume that you are familiar with Eclipse and have it up and running (many people use Eclipse with one of its many plugins, e.g. PHP or Ruby, for web development that does not involve Java).
In order to get up and running you will need some files from the distribution of Solr that you are running. These files are contained within a ".war" file that comes with your Solr distribution. I recommend using the file (outlined below) from the same Solr version you are going to use the compiled similarity class with.
You are looking for a file called
apache-solr-4.0.0.war (your version numbers may be different).
This file usually resides in the "dist" folder.
Make a folder in your Eclipse "workspace", e.g.:
%> mkdir ~/workspace/solr_war
Copy the file there:
%> cp /path/to/apache-solr-4.0.0.war ~/workspace/solr_war
Unpack the "war" file:
%> cd ~/workspace/solr_war
%> unzip apache-solr-4.0.0.war
.... stuff happens!
OK, now that part is done we can move on to the Eclipse part.
Fire up Eclipse.
When it has loaded, click
File -> New -> Java Project
Give the project a new name, e.g.:
MyNewSimilarityClass
Click "Next"
Click "Libraries"
Click "Add External JARs"
Navigate to ~/workspace/solr_war/WEB-INF/lib
Select ALL jar files in this folder and click "OK"
Then click "Finish"
At this point Eclipse is set up for you to create a new class, compile it and export it to a jar file.
--------------
Creating a new class
In Eclipse, on the left hand side where you have your new project:
Right click -> New -> Class
Name the class, e.g.: MyNewSimilarityClass
and click "Finish".
At this point you will have an empty stub of a class in your Eclipse window.
You will probably want to change this so that your class extends the DefaultSimilarity class,
and then you can simply override the functions you need.
In my case I wanted to disable IDF (Inverse Document Frequency) in the scoring algorithm,
so my class ended up something like this ...
package org.apache.lucene.search.similarities;

public class MyDefaultSimilarity extends DefaultSimilarity {

    @Override
    public float idf(long docFreq, long numDocs) {
        return 1.0f;
    }
}
What your code contains may well be different from mine depending on your use case. There are other functions in DefaultSimilarity that can be overridden, and there are other scoring implementations you could extend instead. Please refer to the Solr wiki and browse the Lucene search similarities packages to find out more.
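To see what returning a constant from idf actually changes, here is a small standalone sketch (no Lucene jars needed to run it). It compares the formula documented for DefaultSimilarity.idf, log(numDocs/(docFreq+1)) + 1, with the flat 1.0f from the override above. The class name IdfComparison and the sample numbers are made up purely for illustration; treat it as a rough demonstration, not as Lucene's own code.

```java
// Standalone illustration only: it does not use or replace any Lucene class.
final class IdfComparison {

    // Formula documented for DefaultSimilarity.idf: log(numDocs / (docFreq + 1)) + 1
    static float defaultIdf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // What the override in this post returns: the same weight for every term.
    static float flatIdf(long docFreq, long numDocs) {
        return 1.0f;
    }

    public static void main(String[] args) {
        long numDocs = 1000; // hypothetical index size
        for (long docFreq : new long[] {1, 10, 100, 999}) {
            System.out.printf("docFreq=%d default=%.3f flat=%.3f%n",
                    docFreq, defaultIdf(docFreq, numDocs), flatIdf(docFreq, numDocs));
        }
    }
}
```

With the default formula, terms that appear in fewer documents get a larger weight; with the override every term gets the same weight, so document frequency no longer influences the score.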
Building a JAR file for use with SOLR
This one is nice and easy!
Right click on your Java project
Go to Export -> Java -> JAR file
Name the jar file, and pick the file destination
Click "Finish"
You will now have a jar file that can be used with your Solr distribution.
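If you want to double-check that the export really put your class inside the jar, under the path that matches its package, something like the following sketch can help. It only uses java.util.jar from the JDK. JarCheck is a hypothetical helper name of my own, and the jar path and class name in the usage comment are just the placeholder values from this post.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper: checks whether a jar contains a given compiled class.
final class JarCheck {

    // Converts a fully qualified class name to the entry path used inside a jar.
    static String entryPath(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns true if the jar at jarPath contains the compiled class.
    static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryPath(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage (placeholder values):
        //   java JarCheck /path/to/myjarfile.jar org.apache.lucene.search.similarities.MyDefaultSimilarity
        boolean found = containsClass(args[0], args[1]);
        System.out.println((found ? "found " : "missing ") + args[1]);
    }
}
```

If the class is reported missing, the usual suspects are a package declaration that does not match the folder the source file lives in, or the class not being ticked in the export dialog.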
Using a JAR file with SOLR
Your new JAR file needs to be copied into the "lib" folder of your instance folder.
This folder usually sits in the same directory as your solr.xml file, so go to the folder where that file is
located and create it if it is not already there, e.g.:
%> mkdir /path/to/instancedir/lib
Then copy the JAR file there:
%> cp /path/to/myjarfile.jar /path/to/instancedir/lib/
Now that your jar file is in place you just need to make sure that Solr is configured to use it.
Use your favorite text editor to open solr.xml
%> vi /path/to/instancedir/solr.xml
and see that the following is in place:
<solr persistent="true" sharedLib="lib">
Note the sharedLib="lib" attribute. If you have a weird directory structure
you should specify it here; otherwise ensure it is as above!
Finally, make sure that schema.xml is configured to use the new class.
In my version of Solr the following lines are near the bottom of the schema.xml file:
<!--
<similarity class="com.example.solr.CustomSimilarityFactory">
  <str name="paramkey">param value</str>
</similarity>
-->
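Uncomment these lines and change the class attribute to the fully qualified name of your new class (package plus class name; for the class shown earlier that would be org.apache.lucene.search.similarities.MyDefaultSimilarity):
<similarity class="com.example.solr.MySimilarityClass">
  <str name="paramkey">param value</str>
</similarity>
Restart Solr to start using your new class!
Hope this helps
Nick ...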