Mahout on a single Cloudera machine

If you are running Cloudera in pseudo-distributed mode on a single machine (say, with 8GB of RAM), you may find it difficult to run Mahout programs from the command line: the map-reduce jobs they spawn, one after another, very soon run out of Java heap space.

Here is how you can avoid it. As root, open the file /etc/hadoop/conf/mapred-site.xml and look for the following value:

    <value> -Xmx207577017</value>

Replace it with:

    <value> -Xmx207577017</value>
    <!-- <value> -d32</value> -->
    <value> -XX:NewRatio=12</value>
    <value> -XX:+UseParallelGC</value>
    <value> -XX:+UseParallelOldGC</value>
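
As far as I know, Hadoop reads a single <value> per <property>, so if the separate elements above do not all take effect, the same flags can be combined into one space-separated value. Here is a minimal sketch, assuming the property in question is mapred.child.java.opts (the name is not given above and may differ in your Cloudera version). Note that NewRatio takes a number, so it is written without the + sign:

    <property>
      <!-- assumed property name; check what your own mapred-site.xml uses -->
      <name>mapred.child.java.opts</name>
      <value>-Xmx207577017 -XX:NewRatio=12 -XX:+UseParallelGC -XX:+UseParallelOldGC</value>
    </property>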

Restart all Hadoop services. Your Mahout programs should now run from the command line.

