FAILED: Error in metadata: org.jets3t.service.S3ServiceException: Failed to sanitize XML document destined for handler class org.jets3t.service.impl.rest.XmlResponsesSaxParser$ListBucketHandler null 'null' -- ResponseCode: -1, ResponseStatus: null, RequestId: null, HostId: null
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
There's some discussion of this on the AWS forums. The underlying cause is that Hive runs out of memory while trying to build the partition list.
A workaround is to increase HADOOP_HEAPSIZE. This can be done by modifying hadoop-user-env.sh with an EMR bootstrap action. On an m1.large instance, 2 GB seems to do the trick for us.
Upload a script like the following somewhere in S3:
#!/bin/bash
if [ $# -lt 1 ]; then
  SIZE="2048"
else
  SIZE=$1
fi
echo "HADOOP_HEAPSIZE=${SIZE}" >> /home/hadoop/conf/hadoop-user-env.sh
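The optional first argument overrides the 2048 MB default. As a quick off-cluster sanity check, the script's argument handling can be exercised locally; the `heap_line` helper below is hypothetical and just mirrors the script's logic without touching the EMR config file:

```shell
# Mirrors the bootstrap script's argument handling: with no argument
# it falls back to 2048, otherwise it uses the size you pass in.
heap_line() {
  if [ $# -lt 1 ]; then
    SIZE="2048"
  else
    SIZE=$1
  fi
  echo "HADOOP_HEAPSIZE=${SIZE}"
}
```

For example, `heap_line` prints `HADOOP_HEAPSIZE=2048`, while `heap_line 4096` prints `HADOOP_HEAPSIZE=4096`.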
You can now run this bootstrap action as part of your job:
elastic-mapreduce --create --alive \
--name "large partitions..." --hive-interactive \
--num-instances 1 --instance-type m1.large \
--hadoop-version 0.20 \
--bootstrap-action s3://<bucket/path>/set-hadoop-heap.sh
You should now be able to load your partitions.
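If partitions still fail to load, it's worth confirming the bootstrap action actually appended the setting. A minimal sketch, assuming the default EMR config path (run on the master node over SSH; `CONF_FILE` and `check_heapsize` are hypothetical names for illustration):

```shell
# Check that the bootstrap action appended HADOOP_HEAPSIZE to the
# Hadoop user env file. CONF_FILE defaults to the standard EMR path
# but can be overridden for testing elsewhere.
CONF_FILE="${CONF_FILE:-/home/hadoop/conf/hadoop-user-env.sh}"

check_heapsize() {
  # Prints the HADOOP_HEAPSIZE line if present; exits non-zero if not.
  grep '^HADOOP_HEAPSIZE=' "$CONF_FILE"
}
```

A missing line here means the bootstrap action didn't run (or ran against a different config directory), so the heap increase never took effect.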