Mahout on a single Cloudera machine

If you are running Cloudera in pseudo-distributed mode on a single machine (say, one with 8 GB of RAM), you may find it hard to run Mahout programs from the command line: the MapReduce jobs they spawn, one after another, soon run out of Java heap space.

Here is how to get around this. As root, open /etc/hadoop/conf/mapred-site.xml and look for the following property:

  <property>
    <name>mapred.child.java.opts</name>
    <value> -Xmx207577017</value>
  </property>
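Before editing, you may want to check what the file currently says. The following is a minimal Python 3 sketch, assuming Python is available on the machine; read_property and xmx_mib are hypothetical helper names, not part of Hadoop. It reads one property from a mapred-site.xml document and converts the -Xmx figure, which here is given in plain bytes, to MiB (207577017 bytes is roughly 198 MiB).

```python
import re
import xml.etree.ElementTree as ET


def read_property(xml_text, name):
    """Return the stripped <value> text of the named property, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def xmx_mib(java_opts):
    """Convert an -Xmx setting (bytes, or with a k/m/g suffix) to MiB; None if absent."""
    m = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if m is None:
        return None
    scale = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}[m.group(2).lower()]
    return int(m.group(1)) * scale / 1024 ** 2
```

To use it on the real file, pass the text of open("/etc/hadoop/conf/mapred-site.xml").read() to read_property together with the name mapred.child.java.opts, then hand the result to xmx_mib.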

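Replace that property with the following. It keeps the same heap size and adds garbage-collector tuning. All the JVM options must go into a single, space-separated value: a Hadoop property holds only one value, so listing the options in separate <value> elements does not combine them. Also, NewRatio takes a number, so it is written -XX:NewRatio=12 rather than -XX:+NewRatio=12 (the JVM refuses to start with the latter).

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx207577017 -XX:NewRatio=12 -XX:+UseParallelGC -XX:+UseParallelOldGC</value>
    <!-- -d32 (disabled) -->
  </property>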
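Two pitfalls are easy to hit when editing this property. First, Hadoop's configuration loader expects one value per property (as we understand it, when several <value> elements are present a later one overwrites an earlier one). Second, HotSpot -XX options come in two forms, -XX:+Name or -XX:-Name for boolean switches and -XX:Name=value for valued ones, so mixing them as in -XX:+NewRatio=12 is rejected at JVM startup. The Python sketch below flags both; check_java_opts is a hypothetical helper and its checks are simple heuristics, not a full validator.

```python
import re
import xml.etree.ElementTree as ET

# A "+" or "-" switch followed by "=value" mixes the boolean and valued
# forms of an -XX option (e.g. "-XX:+NewRatio=12"), which the JVM rejects.
BAD_XX = re.compile(r"-XX:[+-]\w+=")


def check_java_opts(xml_text, name="mapred.child.java.opts"):
    """Return a list of problems found in the named property (empty if none)."""
    problems = []
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() != name:
            continue
        values = prop.findall("value")
        # Hypothetical rule of thumb: exactly one <value> per property.
        if len(values) != 1:
            problems.append(f"{name} has {len(values)} <value> elements; expected 1")
        # Inspect each whitespace-separated JVM option in every <value>.
        for v in values:
            for opt in (v.text or "").split():
                if BAD_XX.match(opt):
                    problems.append(f"malformed JVM option: {opt}")
    return problems
```

Run against the four-<value> version with -XX:+NewRatio=12, it reports two problems; against a single value such as "-Xmx207577017 -XX:NewRatio=12 -XX:+UseParallelGC -XX:+UseParallelOldGC" it returns an empty list.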

Restart all Hadoop services. With luck, your Mahout programs will now run from the command line.