For a project today I had to use some files that I remembered were stored in an S3 bucket.

However, I did not remember their exact location. The easiest way to find them would be to get a list of all the files in all the folders across all the buckets on S3 and export it to a text file. I could then simply search that file for the files I needed.

Here is an easy way to do this:

s3cmd is invaluable for this kind of thing:

$ s3cmd ls -r s3://yourbucket/ | awk '{print $4}' > objects_in_bucket.txt
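
That covers a single bucket. Since I wanted every bucket, a small shell loop over s3cmd ls (which lists your buckets) works too; this is a quick sketch, assuming bucket names without spaces:

$ for bucket in $(s3cmd ls | awk '{print $3}'); do s3cmd ls -r "$bucket"; done | awk '{print $4}' > objects_in_all_buckets.txt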

I also discovered a new tool called s5cmd. Apparently it is blazing fast and seems to have most of the functionality found in s3cmd and s4cmd.

s5cmd is a very fast S3 and local filesystem execution tool. It supports a multitude of operations, including tab completion and wildcard support for files, which can be very handy in your object storage workflow when working with large numbers of files.
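
If you want to try it, s5cmd publishes prebuilt binaries on its GitHub releases page. With a Go toolchain installed, something along these lines should also work (the module path shown is for the current v2 series, so check the project README for your version):

# one way to install s5cmd, assuming a recent Go toolchain
go install github.com/peak/s5cmd/v2@latest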

# using s4cmd
s4cmd ls -r s3://tempbin/ | awk '{print $4}' > tempbin.txt

# using s5cmd
s5cmd ls 's3://tempbin/*' | awk '{print $4}' > tempbin.txt

Download a single S3 object

s5cmd cp s3://bucket/object.gz .
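
As with regular cp, you can also give it a different local destination; the paths below are just examples:

s5cmd cp s3://bucket/object.gz /tmp/object-copy.gz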

Download multiple S3 objects

Suppose we have the following objects:

s3://bucket/logs/2020/03/18/file1.gz
s3://bucket/logs/2020/03/19/file2.gz
s3://bucket/logs/2020/03/19/originals/file3.gz
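
To download all of them at once, s5cmd's wildcard support makes it a one-liner; this is a sketch, with logs/ as an example destination directory:

s5cmd cp 's3://bucket/logs/2020/03/*' logs/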