I am currently implementing an image storage architecture for my service.
According to an article I read, it is a good idea to move all
image upload and download traffic to external cloud object storage:
https://medium.com/@jgefroh/software-architecture-image-uploading-67997101a034
I noticed that there are many cloud object storage providers:
- Amazon S3
- Google Cloud Storage
- Microsoft Azure Blob Storage
- Alibaba Object Storage
- Oracle Object Storage
- IBM Object Storage
- Backblaze B2 Object
- Exoscale Object Storage
- Aruba Object Storage
- OVH Object Storage
- DreamHost DreamObjects
- Rackspace Cloud Files
- Digital Ocean Spaces
- Wasabi Hot Object Storage
My first choice was Amazon S3 because
almost all of my system infrastructure is located on AWS.
However, I see a lot of problems with this object storage.
(Please correct me if I am wrong on any point below.)
1) Expensive log delivery
AWS charges for all operational requests. If I have to pay for every request, I would like to see all request logs, and I would like to get these logs as fast as possible. S3 does provide log delivery, but with a big delay, and each log is delivered as a separate file into another S3 bucket, so each log is a separate S3 write request. Write requests are more expensive: they cost approximately $5 per 1M requests. Another option is to trigger AWS Lambda whenever a request is made, but that is also an additional cost of $0.20 per 1M Lambda invocations. In summary, in my opinion log delivery for S3 requests is way too expensive.
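S3 server access logs are plain-text files with one space-separated record per request, starting with the bucket owner, bucket, time, remote IP, requester, request ID, operation, key, request URI, HTTP status, error code, bytes sent and object size. A minimal Python sketch for getting a per-IP, per-status view out of log files that have already been downloaded might look like this. It parses only those leading fields, uses two made-up sample lines (owner, bucket and request IDs are placeholders), and the field order should be checked against the current AWS log format documentation before relying on it.

```python
import re
from collections import Counter

# Leading fields of an S3 server access log record, in the documented order.
# Only the first 13 fields are captured; everything after them is ignored.
LOG_RE = re.compile(
    r'(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<remote_ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'"(?P<request_uri>[^"]*)" (?P<status>\S+) (?P<error_code>\S+) '
    r'(?P<bytes_sent>\S+) (?P<object_size>\S+)'
)


def parse_line(line):
    """Return a dict of the leading fields, or None if the line does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


def requests_per_ip_and_status(lines):
    """Count (remote_ip, status) pairs, e.g. to spot one IP producing many 403s."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            counts[(rec["remote_ip"], rec["status"])] += 1
    return counts


# Two made-up records in the access log format (not real log data).
sample = [
    'owner123 my-bucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 - REQID1 '
    'REST.GET.OBJECT img/a.png "GET /img/a.png HTTP/1.1" 403 AccessDenied 243 - 7 - "-" "curl/7.58.0" -',
    'owner123 my-bucket [06/Feb/2019:00:00:39 +0000] 192.0.2.3 - REQID2 '
    'REST.GET.OBJECT img/b.png "GET /img/b.png HTTP/1.1" 403 AccessDenied 243 - 5 - "-" "curl/7.58.0" -',
]
print(requests_per_ip_and_status(sample))  # Counter({('192.0.2.3', '403'): 2})
```

This only helps after the logs arrive, so it does nothing about the delivery delay described above.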
2) Cannot configure a maximum object size (Content-Length) for a whole bucket
I have not found a way to configure a maximum object size (Content-Length) restriction for a whole bucket. In short, I want to be able to block uploads of files larger than a specified limit for a chosen bucket. I know that it is possible to specify the Content-Length of an uploaded file in presigned PUT URLs, but I think this should be configurable globally for a whole bucket.
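For browser uploads, S3's presigned POST (rather than PUT) supports a `content-length-range` condition in its policy document. This is the closest built-in mechanism, although it is still set per upload and not per bucket. A minimal sketch of building such a policy in Python follows; the bucket name, key prefix and 5 MiB limit are placeholders, and the SigV4 fields (`x-amz-credential`, `x-amz-date`, etc.) and the signature itself are deliberately left out.

```python
import base64
import json
from datetime import datetime, timedelta, timezone


def build_post_policy(bucket, key_prefix, max_bytes, expires_in=timedelta(minutes=5)):
    """Build and base64-encode an S3 browser-upload (POST) policy document.

    The "content-length-range" condition makes S3 reject uploads larger than
    max_bytes. SigV4 fields and the signature are NOT added here.
    """
    expiration = datetime.now(timezone.utc) + expires_in
    policy = {
        "expiration": expiration.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [
            {"bucket": bucket},
            ["starts-with", "$key", key_prefix],
            ["content-length-range", 1, max_bytes],  # min and max size in bytes
        ],
    }
    return base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")


encoded = build_post_policy("my-bucket", "uploads/", 5 * 1024 * 1024)
decoded = json.loads(base64.b64decode(encoded))
print(decoded["conditions"][2])  # ['content-length-range', 1, 5242880]
```

With boto3, the same condition can be passed in the `Conditions` argument of `generate_presigned_post`, which also handles the signing.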
3) Cannot configure a request rate limit per IP address per minute directly on a bucket
Because all S3 requests are chargeable, I would like to be able
to limit the number of requests that can be made on my bucket from one IP address.
I want to prevent massive uploads and downloads from one IP address,
and I want this to be configurable for a whole bucket.
I know that this functionality can be provided by AWS WAF attached to CloudFront,
but requests inspected by WAF are way too expensive!
You have to pay $0.60 per 1M inspected requests,
while direct Amazon S3 requests cost $0.40 per 1M requests,
so it makes no economic sense to use AWS WAF as a rate limiter for S3 requests
as "wallet protection" against DoS attacks.
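Wherever it ends up running (for example in a proxy in front of the bucket), per-IP rate limiting itself is simple. A minimal in-process sliding-window sketch in Python, assuming a single process and a caller-supplied clock so it can be tested without waiting; the class name is hypothetical:

```python
import time
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Sliding window: at most `limit` allowed requests per `window` seconds per IP."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits = defaultdict(deque)  # ip -> timestamps of recently allowed requests

    def allow(self, ip):
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


fake_now = [0.0]
limiter = PerIpRateLimiter(limit=3, window=60.0, clock=lambda: fake_now[0])
print([limiter.allow("198.51.100.7") for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 61.0
print(limiter.allow("198.51.100.7"))  # True
```

The state lives in one process's memory, so several proxy instances behind a load balancer would each enforce their own limit. nginx's `limit_req` module (`limit_req_zone` keyed on `$binary_remote_addr`) provides per-IP limiting out of the box, using a leaky-bucket method rather than a sliding window.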
4) Cannot create a "one-time upload" presigned URL
Generated presigned URLs can be used multiple times as long as they have not expired.
This means that you can upload a file many times using the same presigned URL.
It would be great if the AWS S3 API provided a way to create "one-time upload" presigned URLs. I know that I can implement such "one-time upload" functionality myself,
for example as described at https://serverless.com/blog/s3-one-time-signed-url/
but in my opinion such functionality should be provided directly by the S3 API.
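A minimal sketch of the one-time idea, assuming uploads pass through your own service or proxy, which checks a token before forwarding the request to S3. This is not the approach from the linked post, and the class and method names are hypothetical:

```python
import secrets
import threading
import time


class OneTimeUploadTokens:
    """Hypothetical in-memory store: each token authorizes exactly one upload of one key."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._lock = threading.Lock()
        self._tokens = {}  # token -> (object_key, expires_at)

    def issue(self, object_key):
        token = secrets.token_urlsafe(32)
        with self._lock:
            self._tokens[token] = (object_key, self.clock() + self.ttl)
        return token

    def consume(self, token, object_key):
        """True only for the first valid use; the token is removed on any attempt."""
        with self._lock:
            entry = self._tokens.pop(token, None)  # atomic take-and-delete
        if entry is None:
            return False
        key, expires_at = entry
        return key == object_key and self.clock() < expires_at


store = OneTimeUploadTokens(ttl_seconds=300)
token = store.issue("uploads/avatar.png")
print(store.consume(token, "uploads/avatar.png"))  # True
print(store.consume(token, "uploads/avatar.png"))  # False
```

Removing the token under the lock means only the first of several concurrent requests can succeed, and a token used with the wrong key is burned as well. With more than one instance, the dictionary would have to be replaced by a shared store that supports an atomic delete.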
5) Every request to S3 is chargeable!
Let's say you create a private bucket.
No one else can access the data in it, however...
anybody on the internet can send bulk requests to your bucket,
and Amazon will charge you for all those forbidden 403 requests!
It is not very comfortable that someone can "drain my wallet"
at any time just by knowing the name of my bucket!
It is far from secure, especially if you give someone
a direct S3 presigned URL containing the bucket address.
Everyone who knows the name of a bucket can send bulk requests that return 403 and drain my wallet!
Someone already asked this question here, and I guess it is still a problem:
https://forums.aws.amazon.com/message.jspa?messageID=58518
In my opinion, forbidden 403 requests should not be chargeable at all!
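To put a number on the "drain my wallet" scenario, here is a small Python calculation using the prices quoted earlier in this question ($0.40 per 1M GET-class requests, $5 per 1M PUT-class requests). It assumes, as described above, that forbidden requests are billed like any other request; the flood rate of 1,000 requests per second for one day is a hypothetical example.

```python
def request_cost_usd(requests_per_second, seconds, usd_per_million):
    """Cost of a sustained request flood at a flat per-million-request price."""
    total_requests = requests_per_second * seconds
    return total_requests / 1_000_000 * usd_per_million


ONE_DAY = 24 * 60 * 60  # 86,400 seconds, so 86.4M requests at 1,000 req/s
# Prices as quoted in the question; check current S3 pricing for your region.
print(round(request_cost_usd(1000, ONE_DAY, 0.40), 2))  # GET-class: 34.56
print(round(request_cost_usd(1000, ONE_DAY, 5.00), 2))  # PUT-class: 432.0
```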
6) Cannot block network traffic to S3 via NACL rules
Because every request to S3 is chargeable,
I would like to be able to completely block
network traffic to my S3 bucket at a lower network layer.
Because S3 buckets cannot be placed in a private VPC,
I cannot block traffic from a particular IP address via NACL rules.
In my opinion, AWS should provide such NACL rules for S3 buckets
(and I mean network ACLs, not S3 ACLs, which only act at the application layer).
Because of all these problems, I am considering using nginx
as a proxy for all requests made to my private S3 buckets.
Advantages of this solution:
Disadvantages of this solution:
- I have to route all traffic to S3 through my EC2 machines and scale the nginx EC2 instances with an Auto Scaling group.
- Direct traffic to the S3 bucket is still possible from the internet for anyone who knows my bucket name!
(There is no way to hide an S3 bucket in a private network.)
MY QUESTIONS
Do you think that this approach, with an nginx reverse proxy in front of the object storage, is a good one?
Or is it better to just find an alternative cloud object storage provider and not proxy object storage requests at all?
I would be very grateful for recommendations of alternative storage providers.
For each recommendation, the following information would be appreciated:
Object storage provider name
A. What is the price for INGRESS traffic?
B. What is the price for EGRESS traffic?
C. What is the price for REQUESTS?
D. What payment options are available?
E. Are there any long-term agreements?
F. Where are the data centers located?
G. Does it provide an S3-compatible API?
H. Does it provide full access to all request logs?
I. Does it provide a configurable rate limit per IP address per minute for a bucket?
J. Does it allow hiding the object storage in a private network, or allowing network traffic only from particular IP addresses?
In my opinion, a PERFECT cloud object storage provider should:
1) Provide access logs of all requests made on a bucket (IP address, response code, Content-Length, etc.)
2) Provide the possibility to rate limit bucket requests per IP address per minute
3) Provide the possibility to cut off traffic from malicious IP addresses at the network layer
4) Provide the possibility to hide object storage buckets in a private network or give access only to specified IP addresses
5) Not charge for forbidden 403 requests
I would be very thankful for all answers, comments and recommendations.
Best regards
Using nginx as a reverse proxy for cloud object storage is a good idea for many use cases, and you can find guides online on how to do it (at least for S3).
I am not familiar with all the features offered by every cloud storage provider, but I doubt that any of them will give you all the features and flexibility you get with nginx.
Regarding your disadvantages:
- Scaling is always an issue, but benchmark tests show that nginx can handle a lot of throughput even on small machines.
- There are solutions for that in AWS. First make your S3 bucket private, and then you can:
Note that both solutions for your second problem require some development.
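One documented AWS option for the second point (assumed here, not necessarily the one the answer had in mind) is a bucket policy that denies every request that does not arrive through a specific VPC endpoint, using the `aws:SourceVpce` condition key. That way only the nginx instances inside the VPC can use the bucket. A minimal Python sketch that builds such a policy; the bucket name and endpoint ID are placeholders:

```python
import json


def vpce_only_bucket_policy(bucket, vpce_id):
    """Deny all S3 actions on `bucket` unless the request comes through `vpce_id`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnlessFromVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # bucket-level actions
                    f"arn:aws:s3:::{bucket}/*",  # object-level actions
                ],
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


# Placeholder bucket name and endpoint ID.
print(json.dumps(vpce_only_bucket_policy("my-image-bucket", "vpce-1a2b3c4d"), indent=2))
```

The JSON string could then be applied with boto3's `put_bucket_policy(Bucket=..., Policy=...)`. Be careful: a blanket `Deny` like this also blocks console and admin access from outside the endpoint. Requests from elsewhere are still rejected with 403 at the S3 API rather than being dropped at the network layer.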