Built.io Blog

Using a Storage Service for a User Facing Website


Storage services are convenient. Using one is a lot easier than maintaining an in-house storage solution. But if your website relies heavily on users uploading and downloading files through a web interface, an in-house solution may look significantly more attractive to you. If you've split your product into a REST server and a web interface, the challenge might be even more daunting.

We faced this problem for a product we're developing. It has a REST server and a web interface, and to get to the REST server you have to go through the web interface. We began with an in-house solution:

in_house1.png

Needless to say, it was cumbersome, manual, and just plain clunky.

We then decided on a storage service solution. At raw engineering, we love Amazon Web Services, so S3 was our natural choice. We had to decide on the best way to transfer a file from the browser to the S3 bucket. We settled on pre-authenticated uploads and pre-signed S3 queries.

storage_service.png

The details

S3 allows browser-based uploads using POST. You create an HTML upload form whose fields carry the authentication and verification data for the upload. The form only works for a set period of time, until its expiry time. Here's an example using Python on the server side.

The key to the upload form is a “policy” document. This policy document contains the conditions under which an upload can take place. Almost every field in your upload form needs a matching condition in the policy document; leaving one out will result in a failed upload. The following code demonstrates the creation of a bare-minimum policy document that expires after one hour:

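import base64
import hmac
import json
from datetime import datetime, timedelta, timezone
from hashlib import sha1

# bucket and key_prefix are supplied by the caller
gmtime = datetime.now(timezone.utc)  # current time in UTC
expiry_interval = timedelta(hours=1)
exp = gmtime + expiry_interval
exp_time = exp.strftime("%Y-%m-%dT%H:%M:%SZ")

policy_doc = {
    "expiration": exp_time,
    "conditions": [
        {"bucket": bucket},
        ["starts-with", "$key", key_prefix],
        {"acl": "private"},
        ["starts-with", "$success_action_status", ""]
    ]
}
# b64encode takes bytes and returns bytes
enc_policy_doc = base64.b64encode(json.dumps(policy_doc).encode("utf-8"))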

As we can see, the document is a JSON object with two top-level elements: “expiration” and “conditions”. The expiration holds the expiry time for this upload, set here to one hour from the current time. The conditions element is an array of conditions for the form. The order of the conditions matters here, since they are matched with the fields in the form. The “starts-with” condition on $key specifies the prefix that the key of the object stored in S3 must begin with; $key refers to the form field named key. You can add more conditions, as long as they correspond to the entries in the form.
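To make the structure concrete, here is a minimal sketch that wraps the policy construction in a function and adds one more condition. The function name `build_policy` and its parameters are hypothetical, not part of any AWS library; `content-length-range` is a standard S3 POST policy condition that bounds the upload size in bytes and needs no matching form field.

```python
import base64
import json
from datetime import datetime, timedelta, timezone


def build_policy(bucket, key_prefix, lifetime=timedelta(hours=1),
                 max_bytes=10 * 1024 * 1024, now=None):
    """Return (policy_dict, base64_encoded_policy) for a browser upload.

    build_policy and its parameters are illustrative names only.
    """
    now = now or datetime.now(timezone.utc)
    policy = {
        "expiration": (now + lifetime).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [
            {"bucket": bucket},
            ["starts-with", "$key", key_prefix],
            {"acl": "private"},
            ["starts-with", "$success_action_status", ""],
            # Reject uploads larger than max_bytes.
            ["content-length-range", 0, max_bytes],
        ],
    }
    encoded = base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")
    return policy, encoded


# A fixed "now" keeps the example reproducible.
policy, encoded = build_policy(
    "yourbucket", "uploads/",
    now=datetime(2012, 7, 19, 15, 0, 0, tzinfo=timezone.utc),
)
print(policy["expiration"])  # 2012-07-19T16:00:00Z
```

Decoding `encoded` with `base64.b64decode` and `json.loads` gives back the same dictionary, which is a handy way to inspect a policy when an upload is rejected.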

Once the policy document is prepared, we create a signature out of it using our S3 secret key:

# secret_key and enc_policy_doc must both be bytes; sha1 comes from hashlib
signature = base64.b64encode(hmac.new(secret_key, enc_policy_doc, sha1).digest())
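The signing step can be sketched as a small self-contained function. The name `sign_policy`, the secret, and the sample policy below are placeholders for illustration; the only technique used is the HMAC-SHA1 over the base64-encoded policy described above.

```python
import base64
import hmac
from hashlib import sha1


def sign_policy(secret_key, encoded_policy):
    """HMAC-SHA1 the base64 policy and return the base64 signature as text."""
    digest = hmac.new(secret_key.encode("utf-8"),
                      encoded_policy.encode("utf-8"), sha1).digest()
    return base64.b64encode(digest).decode("ascii")


# Placeholder values for illustration only.
secret = "example-secret-key"
encoded_policy = base64.b64encode(b'{"conditions": []}').decode("ascii")

signature = sign_policy(secret, encoded_policy)
# Signing a modified policy gives a different signature, so a form whose
# policy field was edited in the browser no longer matches its signature.
tampered = sign_policy(secret, encoded_policy + "x")
print(signature != tampered)
```

Returning text rather than bytes makes the value easy to drop straight into the hidden `signature` field of the form.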

Now we can proceed to create the html form to be used for upload:

<form action="https://yourbucket.s3.amazonaws.com" method="post" enctype="multipart/form-data">
    <input type="hidden" name="key" value="${key}">
    <input type="hidden" name="AWSAccessKeyId" value="${acc_key}">
    <input type="hidden" name="acl" value="private">
    <input type="hidden" name="policy" value="${enc_policy_doc}">
    <input type="hidden" name="signature" value="${signature}">
    <input type="hidden" name="success_action_status" value="200">
    <input type="file" name="file">
</form>

The form has a number of hidden fields that are used to authenticate the upload using the policy document we prepared earlier. Pay particular attention to the order of these fields, and make sure that the file input field is last.
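One way to fill in the `${...}` placeholders on the server is Python's `string.Template`, which happens to use the same syntax. This is only a sketch: the post does not say which template engine was used, and `render_upload_form`, the `bucket` placeholder, the submit button, and the sample values are all assumptions.

```python
import html
from string import Template

FORM_TEMPLATE = Template("""\
<form action="https://${bucket}.s3.amazonaws.com" method="post" enctype="multipart/form-data">
    <input type="hidden" name="key" value="${key}">
    <input type="hidden" name="AWSAccessKeyId" value="${acc_key}">
    <input type="hidden" name="acl" value="private">
    <input type="hidden" name="policy" value="${enc_policy_doc}">
    <input type="hidden" name="signature" value="${signature}">
    <input type="hidden" name="success_action_status" value="200">
    <input type="file" name="file">
    <input type="submit" value="Upload">
</form>
""")


def render_upload_form(**fields):
    """Escape every value for use in an HTML attribute, then fill the form."""
    escaped = {name: html.escape(str(value), quote=True)
               for name, value in fields.items()}
    return FORM_TEMPLATE.substitute(escaped)


# Sample values only; real ones come from the policy and signing steps.
form_html = render_upload_form(
    bucket="yourbucket",
    key="uploads/report.pdf",
    acc_key="AKIAACCESSKEY",
    enc_policy_doc="eyJleGFtcGxlIjogdHJ1ZX0=",
    signature="c2lnbmF0dXJl",
)
print(form_html)
```

Escaping the values guards against a key that contains quotes or angle brackets breaking out of the attribute.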

Next up is setting up a way to download the file from S3. The usual way to do this is with AWS’s REST calls. Unfortunately, doing this from a user's browser is difficult, as it involves creating authentication headers for the calls. S3 provides an alternative called query string request authentication. We prepare a URL that performs a GET, with query parameters for authentication. The URL contains three query parameters: AWSAccessKeyId, Expires, and Signature. AWSAccessKeyId is the AWS access key. Expires is a Unix timestamp that denotes when the request will expire. Finally, Signature authenticates the request. The signature is constructed as shown in the sample below:

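import base64, calendar, hmac, hashlib, urllib.parse

resource = "/%s/%s" % (bucket, key)
# gmtime is in UTC, so use timegm; mktime would treat it as local time
expires = calendar.timegm((gmtime + expiry_interval).timetuple())
sig_data = "GET\n\n\n%d\n%s" % (expires, resource)  # verb, MD5, type, expires, resource
digest = hmac.new(secret_key, sig_data.encode("utf-8"), hashlib.sha1).digest()
signature = urllib.parse.quote_plus(base64.b64encode(digest))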

The signature has to be URL-encoded so that it can be used in a URI. This means replacing the + symbol with %2B, the forward slash / with %2F, and the = symbol with %3D. The resulting URL, usable in a browser, would look something like this:

http://yourbucket.s3.amazonaws.com/uploads/file_name.ext?AWSAccessKeyId=AKIAACCESSKEY&Expires=1342713594&Signature=N5r32QomDR9EXXcVB92FM%3D
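Putting the download steps together, a URL like the one above could be produced by a helper along these lines. This is a sketch under assumptions: `presigned_get_url` and its argument names are hypothetical, the credentials are placeholders, and `urlencode` is used to do the percent-encoding of the signature.

```python
import base64
import hmac
from hashlib import sha1
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def presigned_get_url(bucket, key, access_key, secret_key, expires):
    """Build a query-string-authenticated GET URL.

    expires is a Unix timestamp in seconds. The function and argument
    names are illustrative.
    """
    path = quote(key)  # percent-encode the key, keeping "/" as is
    resource = "/%s/%s" % (bucket, path)
    string_to_sign = "GET\n\n\n%d\n%s" % (expires, resource)
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"), sha1).digest()
    # urlencode percent-encodes "+", "/" and "=" in the signature.
    query = urlencode({
        "AWSAccessKeyId": access_key,
        "Expires": expires,
        "Signature": base64.b64encode(digest).decode("ascii"),
    })
    return "https://%s.s3.amazonaws.com/%s?%s" % (bucket, path, query)


url = presigned_get_url("yourbucket", "uploads/file_name.ext",
                        "AKIAACCESSKEY", "example-secret-key", 1342713594)
params = parse_qs(urlsplit(url).query)
print(params["Expires"])  # ['1342713594']
```

Parsing the URL back with `parse_qs` is a quick check that the signature survived encoding: the decoded value should be the original base64 text.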

So, there you go. Now you can upload and download directly from S3.
