List of All Files in an S3 Bucket Recursively
For a project today I had to use some files that I remembered were stored in an S3 bucket.
However, I did not remember exactly where they were stored. The easiest way to find them was to get a list of all the files in all the folders of the bucket, export it to a text file, and then simply search that list for the files I needed.
Here is an easy way to do this:
s3cmd is invaluable for this kind of thing:
$ s3cmd ls -r s3://yourbucket/ | awk '{print $4}' > objects_in_bucket.txt
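The awk '{print $4}' part works because s3cmd's recursive listing prints four columns: date, time, size, and the full object URL, so the fourth field is the object path. A sample line of output (the object here is hypothetical) looks like:
2020-03-18 10:23   1234567   s3://yourbucket/path/to/file.gz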
I also discovered a new tool called s5cmd, which is apparently blazing fast and seems to have most of the functionality of s3cmd and s4cmd.
s5cmd is a very fast S3 and local filesystem execution tool. It supports a multitude of operations, including tab completion and wildcard support for files, which can be very handy in your object storage workflow when working with large numbers of files.
# using s4cmd
s4cmd ls -r s3://tempbin/ | awk '{print $4}' > tempbin.txt
# using s5cmd
s5cmd ls 's3://tempbin/*' | awk '{print $4}' > tempbin.txt
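Note the quotes around the wildcard, which keep the shell from trying to expand it. s5cmd's listing uses a similar column layout, with the object name again in the fourth field; a line of output looks roughly like this (hypothetical object):
2020/03/18 10:23:45   1234567   path/to/file.gz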
Download a single S3 object
s5cmd cp s3://bucket/object.gz .
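You can also point the copy at a local directory or a new file name; a quick sketch (the local paths here are hypothetical):
s5cmd cp s3://bucket/object.gz downloads/
s5cmd cp s3://bucket/object.gz renamed.gz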
Download multiple S3 objects
Suppose we have the following objects:
s3://bucket/logs/2020/03/18/file1.gz
s3://bucket/logs/2020/03/19/file2.gz
s3://bucket/logs/2020/03/19/originals/file3.gz
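A single wildcard copy should download all three; s5cmd's documentation notes that the directory hierarchy after the first wildcard is preserved, so the files land under logs/18/, logs/19/, and logs/19/originals/:
s5cmd cp 's3://bucket/logs/2020/03/*' logs/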