How to clean up custom AMIs in order to use them with AWS Elastic Beanstalk

The issue

This time I needed to make some modifications to an application managed by AWS Elastic Beanstalk. I had to modify something on the host system, which meant creating a new AMI for Elastic Beanstalk to use. At first I didn’t clean up the EC2 instance before creating the AMI. As a result, newly launched instances already contained some application code and, above all, some old Elastic Beanstalk configuration. Unfortunately, not all of that configuration was overridden during the initial deployment. In my case the name of the attached SQS queue wasn’t updated (for my observations on the SQS queue configuration, see the additional comments at the end of this post).

The solution

You have to delete certain directories before creating the AMI. I couldn’t find any official AWS tutorials or Stack Overflow posts about which directories to delete, so I want to summarize it here. It’s difficult to give general instructions because Elastic Beanstalk supports a huge number of different setups (Web server environment vs. Worker environment, Docker vs. Multi-container Docker, Go vs. .NET vs. Java vs …). You can use the following commands as a starting point. If you want to add something, feel free to leave a comment!

So let’s start:

Delete the directory containing the application code:

rm -rf /opt/elasticbeanstalk/

Depending on which platform (Go, Java, Python,…) you are using, you should also delete the directory containing the executables. In my case it was Python, which Elastic Beanstalk also installs under /opt:

rm -rf /opt/python/

Elastic Beanstalk uses different software as the proxy server for processing HTTP requests. For Python it’s Apache. Visit https://docs.aws.amazon.com/elasticbeanstalk/latest/platforms/platforms-supported.html#platforms-supported.python to see which platforms use Apache, nginx or IIS in the preconfigured AMIs.

So also keep in mind the directories containing configuration files for Apache, nginx or IIS when cleaning up. For Apache you find them at:

/etc/httpd/

The most important files are probably:

  • /etc/httpd/conf/httpd.conf
  • /etc/httpd/conf.d/wsgi.conf
  • /etc/httpd/conf.d/wsgi_custom.conf (if you modified the wsgi settings)
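To see quickly which of these files are actually present on the instance before creating the AMI, a small check like the following might help. It is only a sketch: the function name list_httpd_configs and its optional root prefix argument are my own inventions for illustration (the prefix makes it possible to try the function against a copy of the filesystem), and the file list is simply the three Apache files named above.

```shell
# Report which of the Apache config files named above exist.
# "list_httpd_configs" and its root-prefix argument are hypothetical helpers,
# not something provided by Elastic Beanstalk.
list_httpd_configs() (
  root="${1:-}"   # empty prefix means the real filesystem
  for f in \
    /etc/httpd/conf/httpd.conf \
    /etc/httpd/conf.d/wsgi.conf \
    /etc/httpd/conf.d/wsgi_custom.conf
  do
    if [ -f "${root}${f}" ]; then
      echo "present: ${f}"
    else
      echo "missing: ${f}"
    fi
  done
)
```

Calling list_httpd_configs without arguments checks the running system. Whether a file that is present has to be reset or can stay depends on what you changed on the instance.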


Optional:

Delete the log files created and filled by Elastic Beanstalk (to avoid seeing old log entries in the Elastic Beanstalk GUI during the initial deployment):

rm /var/log/eb-activity.log /var/log/eb-cfn-init-call.log /var/log/eb-cfn-init.log /var/log/eb-commandprocessor.log /var/log/eb-publish-logs.log /var/log/eb-tools.log

If you are using Elastic Beanstalk as a worker environment with an SQS queue attached, you can delete the corresponding log directory, too:

rm -rf /var/log/aws-sqsd/
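The commands above can be collected into one function so that the cleanup is repeatable. This is a minimal sketch for the Python setup described in this post, not a general solution: the function name eb_ami_cleanup, the root prefix argument and the dry-run switch are hypothetical additions of mine. The list of paths is exactly the one from the steps above, so adjust it for other platforms.

```shell
# Remove the Elastic Beanstalk leftovers named in this post.
# Arguments (both hypothetical conveniences, not Elastic Beanstalk features):
#   $1  root prefix, empty for the real filesystem
#   $2  dry run: 1 (default) only prints, 0 actually deletes
eb_ami_cleanup() (
  root="${1:-}"
  dry_run="${2:-1}"
  for path in \
    /opt/elasticbeanstalk \
    /opt/python \
    /var/log/eb-activity.log \
    /var/log/eb-cfn-init-call.log \
    /var/log/eb-cfn-init.log \
    /var/log/eb-commandprocessor.log \
    /var/log/eb-publish-logs.log \
    /var/log/eb-tools.log \
    /var/log/aws-sqsd
  do
    target="${root}${path}"
    # Skip anything that does not exist on this instance.
    [ -e "$target" ] || continue
    if [ "$dry_run" = "1" ]; then
      echo "would remove ${target}"
    else
      rm -rf -- "$target"
      echo "removed ${target}"
    fi
  done
)
```

Running eb_ami_cleanup "" 1 as root prints what would be deleted, and eb_ami_cleanup "" 0 performs the deletion. The dry run is the default because the function removes directories recursively.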

Additional comments

I was really surprised that I didn’t find anything about the configuration file for the SQS queue. The only more detailed information about Elastic Beanstalk and SQS queues I found was https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html which wasn’t very helpful for me but still interesting to read (especially regarding the HTTP headers for processing SQS messages).

The configuration is saved at /etc/aws-sqsd.d/default.yaml and has the following format:

---
http_connections: 10
http_port: 80
verbose: false
inactivity_timeout: 9
healthcheck: TCP:80
environment_name: My-ElasticBeanstalk-Environment-Example
queue_url: https://sqs.us-east-1.amazonaws.com/123456789/my-sqs-queue
dynamodb_ssl: true
quiet: false
via_sns: false
retention_period: 60
sqs_ssl: true
threads: 50
mime_type: application/json
error_visibility_timeout: 2
debug: false
http_path: /
sqs_verify_checksums: true
connect_timeout: 2
visibility_timeout: 10
keepalive: true

During the initial deployment this file was not updated. Deleting the file before creating the AMI didn’t help either. I assume that it is generated from files in /opt/elasticbeanstalk. Using grep to find out which configuration files default.yaml is generated from didn’t yield anything. After a later deployment (manual or automatic), the file was updated with the correct SQS queue name. I assume this applies to the other settings, too.

If you know how this YAML file is generated, please leave a comment. I would be very interested to know the details.
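To check on a freshly launched instance whether the queue name is stale, the queue_url value can be read straight from the file. The following is only a sketch under the assumption that the file has the flat key/value layout shown above. The function names sqsd_queue_url and sqsd_queue_url_matches are hypothetical helpers, not part of aws-sqsd.

```shell
# Print the queue_url value from an aws-sqsd config file.
# Assumes the flat "key: value" layout shown above.
sqsd_queue_url() (
  config="${1:-/etc/aws-sqsd.d/default.yaml}"
  # Print whatever follows the top-level "queue_url:" key.
  sed -n 's/^queue_url:[[:space:]]*//p' "$config"
)

# Succeed only if the configured queue URL equals the expected one.
#   $1  expected queue URL
#   $2  optional path to the config file
sqsd_queue_url_matches() (
  expected="$1"
  actual="$(sqsd_queue_url "${2:-/etc/aws-sqsd.d/default.yaml}")"
  [ "$actual" = "$expected" ]
)
```

For example, sqsd_queue_url_matches "https://sqs.us-east-1.amazonaws.com/123456789/my-sqs-queue" || echo "stale queue_url" reports when the instance still carries the queue from the AMI.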