Setting up NetBeans for Mahout on Cloudera (CentOS system)

This is a simple way to set up NetBeans for writing Java programs with Apache Mahout. We will not use Maven. On my machine, Cloudera 5.x is installed through parcels, but the steps are the same if you have another version of Cloudera installed with packages or parcels. These steps also work if you have downloaded and installed the Mahout distribution (say, mahout-distribution-0.9.tar.gz) directly from the Apache Mahout download site instead of installing it from the Cloudera repository.

1. Download and install NetBeans from the NetBeans website. If in doubt, download the full version. JDK 7 must already be installed. Otherwise, download the JDK 7 + NetBeans bundle from Oracle and install both together. You can install NetBeans as a local user or as root; it is up to you. If you install it as root, it will be available to all users.
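Everything below assumes JDK 7 or newer, so it can help to confirm which Java runtime you actually have. The following is a small sketch that reads the standard java.version and java.home system properties. It understands both the old ‘1.7.0_55’ version format and the newer ‘11.0.2’ format. The class name JdkCheck and its majorVersion helper are illustrative names only and are not part of NetBeans or Mahout.

```java
// Sketch: print the running JDK and check that it is Java 7 or newer.
class JdkCheck {

    // Major version from a java.version string:
    // old scheme "1.7.0_55" -> 7, newer scheme (Java 9+) "11.0.2" -> 11.
    static int majorVersion(String version) {
        String[] parts = version.split("[._-]");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println("java.version = " + version);
        System.out.println("java.home    = " + System.getProperty("java.home"));
        if (majorVersion(version) >= 7) {
            System.out.println("OK: Java 7 or newer");
        } else {
            System.out.println("Too old: install JDK 7 first");
        }
    }
}
```

Compile and run it with the same JDK that NetBeans uses (javac JdkCheck.java, then java JdkCheck).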

2. Locate the Mahout jar files on your machine with the ‘locate’ command, as shown below. In our case, the output of locate shows that the Mahout jar files are in the folder /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/mahout. (If you installed Mahout directly by expanding the Mahout distribution, the Mahout libraries will be under the expanded tar.gz folder.)

[ashokharnal@master ~]$ locate -r mahout-core-.*job.jar
/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/mahout/mahout-core-0.8-cdh5.0.0-job.jar
[ashokharnal@master ~]$
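‘locate’ searches a database that is only refreshed periodically (by updatedb). A freshly installed Mahout may therefore not show up yet. The same search can be done from Java. Below is a minimal sketch that uses the Java 7 NIO.2 file-walking API and the same regular expression as the locate command above. The class name JarFinder and the default start folder /opt/cloudera/parcels are assumptions made for illustration. Pass another folder, for example your expanded mahout-distribution folder, as the first argument if needed.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch: walk a directory tree and collect files whose names match a pattern.
class JarFinder {

    // Same expression as `locate -r mahout-core-.*job.jar`, applied to the file name only.
    static final Pattern MAHOUT_JOB_JAR = Pattern.compile("mahout-core-.*job\\.jar");

    static List<Path> find(Path root, final Pattern pattern) throws IOException {
        final List<Path> hits = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (pattern.matcher(file.getFileName().toString()).matches()) {
                    hits.add(file);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Skip unreadable or missing entries instead of aborting the walk.
                return FileVisitResult.CONTINUE;
            }
        });
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: Cloudera parcels usually live under /opt/cloudera/parcels.
        Path root = Paths.get(args.length > 0 ? args[0] : "/opt/cloudera/parcels");
        for (Path jar : find(root, MAHOUT_JOB_JAR)) {
            // The parent directory is the 'mahout' folder to use in NetBeans.
            System.out.println(jar + "  (folder: " + jar.getParent() + ")");
        }
    }
}
```

The parent folder printed for each match is the ‘mahout’ folder used in the following steps.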

3. Start NetBeans (Applications–>Programming–>NetBeans IDE) and, on the NetBeans menu, click Tools–>Libraries. Click the ‘New Library’ button to add a new library.

Figure 1: The ‘New Library’ dialogue box of NetBeans. The name of the library is unimportant.

This will create an empty library. To add jar files, select ‘mahout’ and click the Add Jar/Folder button. You will have to pick jar files from three folders, one inside the other. First, go to the folder where the jar file mahout-core-0.8*job.jar is located. This folder is ‘mahout’. On my machine it looks as below:

Figure 2: The ‘mahout’ folder with jar files on my machine with Cloudera 5.x. Note also the ‘lib’ folder, which contains plenty of jar files, and the ‘hadoop’ folder under ‘lib’, which also contains jar file(s).

Add all jar files from this folder to the newly created ‘netbeans mahout library’. There will be around six jar files.

Figure 3: Selecting six jar files from the mahout folder to add to the ‘netbeans mahout library’.

The added jar files will appear as in Figure 4.

Figure 4: Six jar files added to the newly created NetBeans mahout library.

Click the Add Jar/Folder button again and open the ‘lib’ folder under the mahout folder from which you just picked the six Mahout jar files (see Figure 2 above). The ‘lib’ folder contains a ‘hadoop’ folder and plenty of jar files. Add all these jar files to the netbeans mahout library. Next, add the jar file(s) from the ‘hadoop’ folder to the same library (we add all jar files just to be on the safe side).

Figure 5: All jar files added to the newly created NetBeans library.
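The three folders we just went through (mahout, mahout/lib and mahout/lib/hadoop) can also be collected into one classpath string. This is handy if you ever want to compile or run the program outside NetBeans with javac -cp or java -cp. The sketch below is only an illustration under that assumption. MahoutClasspath is a hypothetical name. The default path is the CDH 5.0.0 parcel folder from my machine.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: join the jars from mahout/, mahout/lib/ and mahout/lib/hadoop/
// (the same jars added to the NetBeans library) into one classpath string.
class MahoutClasspath {

    // Jar files directly inside 'dir' (not recursive), sorted by path.
    static List<Path> jarsIn(Path dir) throws IOException {
        List<Path> jars = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return jars; // folder absent in this installation: nothing to add
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path p : stream) {
                jars.add(p);
            }
        }
        Collections.sort(jars);
        return jars;
    }

    static String classpath(Path mahoutDir) throws IOException {
        Path[] folders = {
            mahoutDir,
            mahoutDir.resolve("lib"),
            mahoutDir.resolve("lib").resolve("hadoop")
        };
        StringBuilder sb = new StringBuilder();
        for (Path folder : folders) {
            for (Path jar : jarsIn(folder)) {
                if (sb.length() > 0) {
                    sb.append(File.pathSeparator); // ':' on Linux
                }
                sb.append(jar.toString());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from this post's CDH 5.0.0 parcel; pass your own as args[0].
        Path mahout = Paths.get(args.length > 0 ? args[0]
                : "/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/mahout");
        System.out.println(classpath(mahout));
    }
}
```

The code sticks to Java 7 APIs (try-with-resources, NIO.2), so it matches the JDK 7 assumed in step 1.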

Click OK to close the Ant Library Manager. We are now ready to write a Mahout program. In NetBeans, click File–>New Project and, in the dialogue box, select Java–>Java Application as below. Click Next.

Figure 6: Creating a Java Application project. Click Next.

In the next window, name your Java project. We will call it RecommenderIntro. By default, the main Java class gets the same name as the project. Click the Finish button.

Figure 7: Creating the RecommenderIntro project.

In the RecommenderIntro.java tab, replace all its contents with the following code (copy and paste). The program is taken from the Manning book Mahout in Action.


import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class RecommenderIntro {

    public static void main(String[] args) throws Exception {
        // Preferences file on the local file system (not HDFS): userID,itemID,preference
        DataModel model = new FileDataModel(new File("/home/ashokharnal/Downloads/input.csv"));
        // Similarity between users: Pearson correlation of their ratings
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // Neighbourhood: the 2 users most similar to the target user
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
        // Top 1 recommendation for user 1
        List<RecommendedItem> recommendations = recommender.recommend(1, 1);
        for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
        }
    }
}

In the line that creates the FileDataModel, specify the data file ‘input.csv’ with its full path. The file is on the local file system, not on Hadoop (HDFS). Each line holds a user ID, an item ID and that user's preference (rating) for the item. The data in ‘input.csv’ is shown below. You can copy and paste it into your text editor to create the data file.

1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
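A mistake in this file only shows up when the Mahout program runs. A quick check of the file beforehand can therefore save time. Here is a minimal sketch that reads comma-separated userID,itemID,preference lines, as in input.csv above, into per-user maps. It reports the line number of any line that does not have exactly three fields. PreferenceCsv is a hypothetical helper and is not part of Mahout. Mahout's FileDataModel itself accepts more variations than this (tab separators, for example), so this only checks the layout used in this post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Sketch: read "userID,itemID,preference" lines into user -> (item -> preference),
// failing fast with the line number when a line does not have that layout.
class PreferenceCsv {

    static Map<Long, Map<Long, Double>> parse(BufferedReader in) throws IOException {
        Map<Long, Map<Long, Double>> prefs = new TreeMap<>();
        String line;
        int lineNo = 0;
        while ((line = in.readLine()) != null) {
            lineNo++;
            line = line.trim();
            if (line.isEmpty()) {
                continue; // tolerate blank lines, e.g. at the end of the file
            }
            String[] f = line.split(",");
            if (f.length != 3) {
                throw new IllegalArgumentException(
                        "line " + lineNo + ": expected userID,itemID,preference but got: " + line);
            }
            // Long.parseLong / Double.parseDouble throw NumberFormatException on bad numbers.
            long user = Long.parseLong(f[0].trim());
            long item = Long.parseLong(f[1].trim());
            double preference = Double.parseDouble(f[2].trim());
            Map<Long, Double> items = prefs.get(user);
            if (items == null) {
                items = new TreeMap<>();
                prefs.put(user, items);
            }
            items.put(item, preference);
        }
        return prefs;
    }

    public static void main(String[] args) throws IOException {
        // Default path is the one used in RecommenderIntro; pass your own file as args[0].
        Path csv = Paths.get(args.length > 0 ? args[0] : "/home/ashokharnal/Downloads/input.csv");
        try (BufferedReader in = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            for (Map.Entry<Long, Map<Long, Double>> e : parse(in).entrySet()) {
                System.out.println("user " + e.getKey() + " rated " + e.getValue());
            }
        }
    }
}
```

For the data above, the first line printed should be user 1 rated {101=5.0, 102=3.0, 103=2.5}.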

We need to add the netbeans mahout library to this project. Under Projects, expand RecommenderIntro, right-click Libraries and click Add Library. In the Add Library dialogue box, select mahout. We are then ready to compile and run the project (see Figure 8 below).

Figure 8: Adding the netbeans mahout library to our project.

Click Run–>Clean and Build Project and then press F6 to run the project. You should get the following result:
RecommendedItem[item:104, value:4.257081]
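The sketch below shows where item 104 and the value 4.257081 come from. It is a simplified plain-Java re-implementation, not Mahout's code, of what the program above configures:

- Pearson correlation between users, computed over the items both have rated.
- A neighbourhood made of the 2 most similar users.
- A similarity-weighted average of the neighbours' ratings for the items user 1 has not rated.

One detail is an assumption about Mahout's internals and is marked in the code: estimates backed by a single neighbour are dropped. Names such as UserBasedSketch and estimates are hypothetical. With this data, user 1's nearest neighbours are user 4 (similarity 1.0) and user 5 (about 0.945). The estimate for item 104 is therefore (1.0 × 4.5 + 0.945 × 4.0) / (1.0 + 0.945) ≈ 4.257.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified re-implementation (not Mahout's code) of user-based recommendation:
// Pearson similarity, N nearest users, similarity-weighted average of their ratings.
class UserBasedSketch {

    // The 21 ratings of input.csv above, as space-separated user,item,rating triples.
    static final String INPUT_CSV =
          "1,101,5.0 1,102,3.0 1,103,2.5 2,101,2.0 2,102,2.5 2,103,5.0 2,104,2.0 "
        + "3,101,2.5 3,104,4.0 3,105,4.5 3,107,5.0 4,101,5.0 4,103,3.0 4,104,4.5 "
        + "4,106,4.0 5,101,4.0 5,102,3.0 5,103,2.0 5,104,4.0 5,105,3.5 5,106,4.0";

    // Builds user -> (item -> rating).
    static Map<Long, Map<Long, Double>> load(String triples) {
        Map<Long, Map<Long, Double>> prefs = new TreeMap<>();
        for (String t : triples.trim().split("\\s+")) {
            String[] f = t.split(",");
            Long user = Long.valueOf(f[0]);
            if (!prefs.containsKey(user)) prefs.put(user, new TreeMap<Long, Double>());
            prefs.get(user).put(Long.valueOf(f[1]), Double.valueOf(f[2]));
        }
        return prefs;
    }

    // Pearson correlation over the items both users rated; NaN when it is undefined
    // (no common items, or all of one user's common ratings are equal: 0/0).
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        List<Long> common = new ArrayList<>();
        double meanA = 0, meanB = 0;
        for (Long i : a.keySet()) {
            if (b.containsKey(i)) { common.add(i); meanA += a.get(i); meanB += b.get(i); }
        }
        if (common.isEmpty()) return Double.NaN;
        meanA /= common.size();
        meanB /= common.size();
        double sxy = 0, sxx = 0, syy = 0;
        for (Long i : common) {
            double dx = a.get(i) - meanA, dy = b.get(i) - meanB;
            sxy += dx * dy; sxx += dx * dx; syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    // The n users most similar to 'user', most similar first (NaN similarities skipped).
    static List<Long> neighbours(Map<Long, Map<Long, Double>> prefs, long user, int n) {
        final Map<Long, Double> sim = new HashMap<>();
        for (Long other : prefs.keySet()) {
            double s = (other == user) ? Double.NaN : pearson(prefs.get(user), prefs.get(other));
            if (!Double.isNaN(s)) sim.put(other, s);
        }
        List<Long> ranked = new ArrayList<>(sim.keySet());
        Collections.sort(ranked, new Comparator<Long>() {
            @Override
            public int compare(Long x, Long y) {
                return Double.compare(sim.get(y), sim.get(x)); // descending similarity
            }
        });
        return ranked.subList(0, Math.min(n, ranked.size()));
    }

    // Estimated rating for each item 'user' has not rated, from the n nearest neighbours.
    static Map<Long, Double> estimates(Map<Long, Map<Long, Double>> prefs, long user, int n) {
        Map<Long, Double> mine = prefs.get(user);
        Map<Long, double[]> acc = new TreeMap<>(); // item -> {sum(sim*rating), sum(sim), count}
        for (Long nb : neighbours(prefs, user, n)) {
            double s = pearson(mine, prefs.get(nb));
            for (Map.Entry<Long, Double> e : prefs.get(nb).entrySet()) {
                if (mine.containsKey(e.getKey())) continue; // already rated: not a candidate
                if (!acc.containsKey(e.getKey())) acc.put(e.getKey(), new double[3]);
                double[] a = acc.get(e.getKey());
                a[0] += s * e.getValue();
                a[1] += s;
                a[2] += 1;
            }
        }
        Map<Long, Double> result = new TreeMap<>();
        for (Map.Entry<Long, double[]> e : acc.entrySet()) {
            double[] a = e.getValue();
            // Assumption: as we understand Mahout's user-based recommender, an estimate
            // resting on a single neighbour is discarded; we do the same.
            if (a[2] >= 2) result.put(e.getKey(), a[0] / a[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Estimates for user 1 with a 2-user neighbourhood, as in RecommenderIntro.
        System.out.println(estimates(load(INPUT_CSV), 1L, 2));
    }
}
```

Running it should print estimates for items 104 and 106 only. Item 104 comes out at about 4.2571, the highest, in line with the RecommendedItem line above. Item 106 comes out at essentially 4.0. Item 105 is dropped because, among the two neighbours, only user 5 rated it.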
