Demystifying direct uploads from the browser to Amazon S3 - with a full example in 167 lines of code

November 11, 2015 Amazon S3 JavaScript Node.js jQuery jQuery File Upload

If your web application stores user-uploaded files in Amazon S3, it usually makes sense to upload them directly from the browser. Otherwise the uploads tie up your server’s bandwidth and CPU time. In particular, Heroku has a hard request timeout of 30 seconds, so large uploads, or uploads over a user’s slow connection, can be impossible to complete through an app deployed to Heroku.

Implementing this may sound like a big complicated job, but in fact, it’s feasible without resorting to third-party libraries.

For cross-checking, here is the official documentation on browser-based uploads.

Bucket and user configuration

First, you need to create a bucket to hold your files in the S3 management console.

Also, you should create a minimal-privilege user instead of using your own credentials. You do that at the Amazon IAM management console.

After you create a user and save its key pair, you must declare its permissions by adding an IAM policy. The minimal policy that is required to upload files to S3 is:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::YOUR_BUCKET_ID_HERE/*"
            ]
        }
    ]
}

Note that this is different from the upload policy. The IAM policy limits what someone holding both the access key and the secret key can do. Thus, if someone steals that key pair, the damage is limited to writing objects and their ACLs in this one bucket. (Remember that with your account’s root AWS key pair, an attacker can use all of AWS’s services, which can get extremely expensive.)

Cross-Origin Request Support

CORS (cross-origin resource sharing) is a browser security mechanism: an AJAX call to another origin is only allowed if the target server approves the calling origin. So you must tell the S3 bucket that uploads will come from your website’s origin. Otherwise the browser will consider the requests unsafe and reject them.

You only have to do it once for each bucket, so I suggest you do it through the management console, though there’s an API call for that, too.

The minimal permissions are: allow POST requests from your application’s domain.

<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <CORSRule>
        <AllowedMethod>POST</AllowedMethod>
        <AllowedOrigin>https://yourdomain.com</AllowedOrigin>
        <AllowedHeader>*</AllowedHeader>
    </CORSRule>
</CORSConfiguration>

How the upload works

In HTTP terms, the upload is a simple POST request to an S3 endpoint.

The request contains the file, a filename (key, in S3 terms), some metadata, and a signed policy (more about that later).

The HTTP API is very straightforward (it’s not called a simple storage service for nothing). For example, it will not check whether a file already exists under the key you’re uploading to; it will silently overwrite it.

One concept that’s good to know is that S3 doesn’t have a directory structure. The structure you see in GUI clients is just a convenience. Internally, every file uploaded to S3 is referenced by a flat string key. This means you don’t need to bother with creating directories for your files.

What is a policy and how to construct one

The policy is like a ticket that permits the client to upload something to your S3 bucket. It’s a JSON document listing the conditions that the fields of the upload request must satisfy.

The policy is signed with your application’s secret key. The secret key must be concealed, which is why the policy has to be constructed and signed on the server. In the simplest of scenarios, I guess, you can pre-sign a single policy for all future uploads, and get around having a server part to your app.

But, typically, it is important to only allow a safe subset of keys to be uploaded, ideally a specific key, because this is the only access control you get when uploading files. Without a proper restriction, a client can replace any file in the bucket with their upload.
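One way to pin an upload to a specific key is to let the server choose the key instead of trusting the client’s filename. The sketch below is only an illustration of that idea: the `safeUploadKey` name and the `uploads/` prefix are assumptions, not part of the original example. It puts a random component in front of a sanitized filename so that two uploads never collide.

```javascript
var crypto = require('crypto');

// Hypothetical helper: builds a unique S3 key from a client-supplied name.
// The random segment makes overwriting an existing object practically impossible.
function safeUploadKey(originalName) {
  // Keep only the last path segment, then replace anything unusual.
  var base = String(originalName).split(/[\\/]/).pop();
  var clean = base.replace(/[^A-Za-z0-9._-]/g, '_') || 'file';
  var random = crypto.randomBytes(16).toString('hex');
  return 'uploads/' + random + '/' + clean;
}
```

The returned key would then be used both in the policy and in the form parameters sent back to the browser.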

Another thing is controlling file size and ensuring the file has proper access level - typically public-read.
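These restrictions are expressed as policy conditions. As a rough sketch (the `uploadConditions` helper and its option names are made up for illustration), a `content-length-range` condition bounds the file size, and a `starts-with` condition constrains the `Content-Type` form field. With a `Content-Type` condition in place, the upload form must also include a matching `Content-Type` field.

```javascript
// Hypothetical helper that builds the access, size and type conditions of a policy.
function uploadConditions(options) {
  var conditions = [
    // The uploaded object must be publicly readable.
    { acl: 'public-read' },
    // The file must be between 0 and maxBytes bytes, inclusive.
    ['content-length-range', 0, options.maxBytes]
  ];
  if (options.contentTypePrefix) {
    // For example, 'image/' accepts image/png, image/jpeg and so on.
    conditions.push(['starts-with', '$Content-Type', options.contentTypePrefix]);
  }
  return conditions;
}
```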

The policy has an expiration time, after which it becomes invalid. The best practice is to construct the policy on-demand just before the upload and set the policy lifetime to a minimum, like a couple of minutes. This minimizes the risk of someone reusing the policy.

I found it most convenient to have the backend prepare the entire set of parameters, rather than having the policy logic on the backend and the parameter logic on the frontend.

Here’s a minimal Node.js module to generate credentials:

var crypto = require('crypto');

// This is the entry function that produces data for the frontend
// config is hash of S3 configuration:
// * bucket
// * region
// * accessKey
// * secretKey
function s3Credentials(config, filename) {
  return {
    endpoint_url: "https://" + config.bucket + ".s3.amazonaws.com",
    params: s3Params(config, filename)
  }
}

// Returns the parameters that must be passed to the API call
function s3Params(config, filename) {
  var credential = amzCredential(config);
  var policy = s3UploadPolicy(config, filename, credential);
  // Buffer.from replaces the deprecated `new Buffer(...)` constructor
  var policyBase64 = Buffer.from(JSON.stringify(policy)).toString('base64');
  return {
    key: filename,
    acl: 'public-read',
    success_action_status: '201',
    policy: policyBase64,
    'x-amz-algorithm': 'AWS4-HMAC-SHA256',
    'x-amz-credential': credential,
    'x-amz-date': dateString() + 'T000000Z',
    'x-amz-signature': s3UploadSignature(config, policyBase64, credential)
  }
}

// Returns the current UTC date as YYYYMMDD
function dateString() {
  return new Date().toISOString().slice(0, 10).replace(/-/g, '');
}

function amzCredential(config) {
  return [config.accessKey, dateString(), config.region, 's3/aws4_request'].join('/')
}

// Constructs the policy
function s3UploadPolicy(config, filename, credential) {
  return {
    // 5 minutes into the future
    expiration: new Date((new Date).getTime() + (5 * 60 * 1000)).toISOString(),
    conditions: [
      { bucket: config.bucket },
      { key: filename },
      { acl: 'public-read' },
      { success_action_status: "201" },
      // Optionally control content type and file size
      // {'Content-Type': 'application/pdf'},
      ['content-length-range', 0, 1000000],
      { 'x-amz-algorithm': 'AWS4-HMAC-SHA256' },
      { 'x-amz-credential': credential },
      { 'x-amz-date': dateString() + 'T000000Z' }
    ],
  }
}

// Returns the raw (binary) HMAC-SHA256 of `string` as a Buffer
function hmac(key, string) {
  return crypto.createHmac('sha256', key).update(string).digest();
}

// Signs the policy with the credential
function s3UploadSignature(config, policyBase64, credential) {
  var dateKey = hmac('AWS4' + config.secretKey, dateString());
  var dateRegionKey = hmac(dateKey, config.region);
  var dateRegionServiceKey = hmac(dateRegionKey, 's3');
  var signingKey = hmac(dateRegionServiceKey, 'aws4_request');
  return hmac(signingKey, policyBase64).toString('hex');
}

module.exports = {
  s3Credentials: s3Credentials
}

As you can see, we’re not doing any API calls here or anything permanent, so the policy signature can be generated as many times as necessary and it doesn’t have to be persisted.
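To show how the module might be exposed to the browser, here is a sketch of a request handler in the style of Node’s built-in `http` module. The `/s3_credentials` path matches the one used by the frontend code later in this post; the `makeCredentialsHandler` name and the way the configuration is passed in are assumptions for illustration. A real app would also authenticate the user first.

```javascript
// Hypothetical factory: wraps an s3Credentials-style function in a plain
// Node.js (req, res) handler answering GET /s3_credentials?filename=...
function makeCredentialsHandler(s3Credentials, config) {
  return function (req, res) {
    // req.url is only a path, so a dummy base is needed to parse it.
    var parsed = new URL(req.url, 'http://localhost');
    var filename = parsed.searchParams.get('filename');
    var valid = req.method === 'GET' &&
      parsed.pathname === '/s3_credentials' &&
      Boolean(filename);
    if (!valid) {
      res.writeHead(400, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'expected GET /s3_credentials?filename=...' }));
      return;
    }
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(s3Credentials(config, filename)));
  };
}
```

Such a handler could be passed to `http.createServer(...)`, or adapted to whatever framework your backend already uses.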

The most complicated part of all this is generating the signature. Take note that the intermediate hashing steps use binary keys, and only the final result is hex-encoded.
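The point about binary intermediate keys is easy to get wrong, so here is a small standalone sketch of the derivation chain. The secret key, date, region and policy string are dummy values, and `deriveSigningKey` is a name chosen for this illustration. It also reproduces the common mistake of hex-encoding an intermediate key, which yields a different (and therefore invalid) result.

```javascript
var crypto = require('crypto');

// digest() without an encoding returns a Buffer, i.e. the raw binary MAC.
function hmacSha256(key, data) {
  return crypto.createHmac('sha256', key).update(data).digest();
}

// Derives the signing key; every intermediate key stays binary.
function deriveSigningKey(secretKey, date, region) {
  var dateKey = hmacSha256('AWS4' + secretKey, date);
  var regionKey = hmacSha256(dateKey, region);
  var serviceKey = hmacSha256(regionKey, 's3');
  return hmacSha256(serviceKey, 'aws4_request');
}

// Dummy inputs, for illustration only.
var signingKey = deriveSigningKey('EXAMPLE_SECRET', '20151111', 'us-east-1');
var signature = hmacSha256(signingKey, 'base64-policy-goes-here').toString('hex');

// The mistake: passing a hex string instead of the binary key to the next step.
var dateKey = hmacSha256('AWS4EXAMPLE_SECRET', '20151111');
var rightRegionKey = hmacSha256(dateKey, 'us-east-1');
var wrongRegionKey = hmacSha256(dateKey.toString('hex'), 'us-east-1');

console.log(signature.length);                      // 64 (hex characters)
console.log(rightRegionKey.equals(wrongRegionKey)); // false
```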

Actually uploading the file

Once you have the parameters, all you need to send the file is a simple AJAX POST request. The only important detail is that the file field must be the last in the form data.
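For reference, a plain upload without any plugin could look roughly like the sketch below. It assumes an environment that provides `FormData` and `fetch` (any modern browser), and a credentials response shaped like the output of `s3Credentials` above (`endpoint_url` plus `params`). The function names are made up for this example.

```javascript
// Builds the multipart body: all S3 parameters first, the file field last.
function buildS3FormData(params, file) {
  var form = new FormData();
  Object.keys(params).forEach(function (name) {
    form.append(name, params[name]);
  });
  // S3 ignores any form fields that come after the file.
  form.append('file', file);
  return form;
}

// Sends the POST request; s3Data is the JSON returned by the backend.
function uploadToS3(s3Data, file) {
  return fetch(s3Data.endpoint_url, {
    method: 'POST',
    body: buildS3FormData(s3Data.params, file)
  }).then(function (response) {
    if (!response.ok) {
      throw new Error('Upload failed with status ' + response.status);
    }
    // With success_action_status set to 201 the body is an XML document.
    return response.text();
  });
}
```

Here `file` would typically be a `File` object taken from an `<input type="file">` element.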

How to integrate into jQuery File Upload

jQuery File Upload is a popular jQuery plugin for file uploads. I bet you would like to use it for this upload, too - and it is indeed the easiest way to set up a user-friendly upload form.

// Requires jQuery and blueimp's jQuery.fileUpload

// Configuration
var bucket = 'browser-upload-demo';
// client-side validation by fileUpload should match the policy
// restrictions so that the checks fail early
var acceptFileType = /.*/i;
var maxFileSize = 1000000;
// The URL to your endpoint that maps to s3Credentials function
var credentialsUrl = '/s3_credentials';

window.initS3FileUpload = function($fileInput) {
  $fileInput.fileupload({
    acceptFileTypes: acceptFileType,
    maxFileSize: maxFileSize,
    url: 'https://' + bucket + '.s3.amazonaws.com',
    paramName: 'file',
    add: s3add,
    dataType: 'xml',
    done: onS3Done
  });
};

// This function retrieves s3 parameters from our server API and appends them
// to the upload form.
function s3add(e, data) {
  var filename = data.files[0].name;
  $.ajax({
    url: credentialsUrl,
    type: 'GET',
    dataType: 'json',
    data: {
      filename: filename
    },
    success: function(s3Data) {
      // Attach the signed parameters, then start the actual upload
      data.formData = s3Data.params;
      data.submit();
    }
  });
}

// Example of extracting information about the uploaded file
// Typically, after uploading a file to S3, you want to register that file with
// your backend. Remember that we did not persist anything before the upload.
function onS3Done(e, data) {
  var s3Url = $(data.jqXHR.responseXML).find('Location').text();
  var s3Key = $(data.jqXHR.responseXML).find('Key').text();
};

This code works alongside all other fileUpload features, most importantly progress events.
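As an example of progress reporting, the plugin triggers a `fileuploadprogressall` event on the file input, carrying `loaded` and `total` byte counts. A minimal sketch, assuming a hypothetical `onPercent` callback that updates your UI:

```javascript
// Converts byte counts into a whole-number percentage.
function percentComplete(loaded, total) {
  if (!total) {
    return 0;
  }
  return Math.floor((loaded / total) * 100);
}

// Hooks overall progress reporting onto an initialized fileupload input.
function attachProgress($fileInput, onPercent) {
  $fileInput.on('fileuploadprogressall', function (e, data) {
    onPercent(percentComplete(data.loaded, data.total));
  });
}
```

You would call `attachProgress($fileInput, ...)` right after `initS3FileUpload($fileInput)`, for instance to set the width of a progress bar.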

Demo

To demonstrate all of this, I made a minimal working uploader with source on GitHub.

I think that S3 upload is one of those things that are fairly easy to implement yourself, but hard to wrap into a flexible library (for example, find one that matches your frontend/backend language pair). So don’t use a library! It’s just a couple hundred lines of code.
