The script will download files from an s3 url, concatenate them together, zip up the results and send it as an attachment to a specified email address. It sends email through smtp.mail.com, using account credentials you specify.
I wanted to make it easy to just append an additional step to any existing job, not requiring any additional machine setup or dependencies. I was able do this by making use of amazon's script-runner (s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar). The script-runner.jar step will let you execute an arbitrary script from a location in s3 as an emr job step.
As I mentioned, the intended usage is to run it as a job step with your hive script, passing it in the location of the resulting report.
E.g.:
elastic-mapreduce --create --name "my awesome report ${MONTH}" \
--num-instances 10 --instance-type c1.medium --hadoop-version 0.20 \
--hive-script --arg s3://path/to/hive/script.sql \
--args -d,MONTH=${MONTH} --args -d,START=${START} --args -d,END=${END} \
--jar s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar \
--args s3://path/to/emr-mailer/send-report.rb \
--args -n,report_${MONTH} --args -s,"my awesome report ${MONTH}" \
--args -e,awesome-reports@company.com \
--args -r,s3://path/to/report/results
Above you can see I'm starting a hive report as normal, then simply appending the script-runner step, calling the emr-mailer send-report.rb, telling it where the report will end up, and details about the email.
The full source code is available on github as emr-mailer.
The script is pretty simple, but let me know if you have any suggestions for improvements or other feedback.
No comments:
Post a Comment