Wednesday, September 29, 2010

emr: Cannot run program "bash": error=12, Cannot allocate memory

Moving one of our jobs from Hive 0.4 / Hadoop 0.18 to Hive 0.5 / Hadoop 0.20 on Amazon EMR, I ran into a weird error in the reduce stage, something like:

Task: attempt_201007141555_0001_r_000009_0 - The reduce copier failed
at org.apache.hadoop.mapred.Child.main(
Caused by: Cannot run program "bash": error=12, Cannot allocate memory
at java.lang.ProcessBuilder.start(
at org.apache.hadoop.util.Shell.runCommand(
at org.apache.hadoop.fs.DF.getAvailable(
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(
at org.apache.hadoop.mapred.MapOutputFile.getInputFileForWrite(
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$
Caused by: error=12, Cannot allocate memory
at java.lang.UNIXProcess.&lt;init&gt;(
at java.lang.ProcessImpl.start(
at java.lang.ProcessBuilder.start(
... 8 more

There's some discussion of this in a thread on the EMR forums.

From Andrew's response to the thread:

The issue here is that when Java tries to fork a process (in this case bash), Linux allocates as much memory as the current Java process, even though the command you are running might use very little memory. When you have a large process on a machine that is low on memory this fork can fail because it is unable to allocate that memory.

The workaround here is to either use an instance with more memory (m2 class), or reduce the number of mappers or reducers you are running on each machine to free up some memory.
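As an aside, you can see whether this strict accounting applies on a given node by checking the kernel's overcommit policy and comparing free memory against the task JVM heap sizes. A quick sketch, assuming standard Linux procfs paths:

```shell
# Sketch: inspect the kernel's overcommit policy on the affected node.
# 0 = heuristic overcommit, 1 = always allow, 2 = strict accounting
# (under 2, fork must be able to reserve roughly the parent JVM's
# committed memory, so a large heap can make even a tiny command fail
# with error=12 / ENOMEM).
cat /proc/sys/vm/overcommit_memory

# Compare available memory against the mapper/reducer JVM heap sizes.
free -m
```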

Since the task I was running was reduce heavy, I chose to just drop the number of mappers from 4 to 2. You can do this pretty easily with the EMR bootstrap actions.

My job ended up looking something like this:

elastic-mapreduce --create --name "awesome script" \
--num-instances 8 --instance-type m1.large \
--hadoop-version 0.20 \
--bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
--args "-s," \
--hive-script --arg s3://....../script

(the bootstrap-action lines are the relevant parts).
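The value after "-s," got cut off above. Assuming the standard configure-hadoop syntax of -s,&lt;key&gt;=&lt;value&gt; pairs written into hadoop-site.xml on each node, and the stock Hadoop 0.20 property name for map slots, the full invocation would look something like this (a sketch, not the exact original command):

```shell
# Sketch (assumption): cap map slots at 2 per tasktracker via the
# configure-hadoop bootstrap action, freeing memory for the reduce-heavy
# stage. The -s flag sets a hadoop-site.xml property on every node.
elastic-mapreduce --create --name "awesome script" \
  --num-instances 8 --instance-type m1.large \
  --hadoop-version 0.20 \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
  --args "-s,mapred.tasktracker.map.tasks.maximum=2" \
  --hive-script --arg s3://....../script
```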

1 comment:

Kyri said...

We had the same problem spawning a virus scanner via exec to check uploaded binaries. Real pain in the ass. We made it a service in the end.