James Cridland

How to only allow Cloudfront, and your own development team, to see your publicly-available EC2…

The way many of us use AWS Cloudfront and EC2 together is to have an origin server on EC2, and then a Cloudfront distribution over the top of it. That means that your website is available in two places: at apublicurl.com, and on the origin server at aprivateurl.com, so that Cloudfront can see it.

The benefit is that you can use your origin server for developmental purposes and to double-check what you can see before it’s cached by Cloudfront.

The downside is that anyone else can see your origin server too, and some spiders might come along and start spidering it. And that would be a mistake.

Yes, you *can* fiddle about with private subnets and things, but that breaks a few things: first, it means you can’t call your own scripts via aprivateurl.com, which can be useful for cron; second, it means you can’t give the origin version to testing bots; and third, it means that your own dev team can’t see the origin server without a lot of messing about, and that’s bad.

Here’s the quick way to put controls on who can see it and who can’t.

Cloudfront

Set up a unique header that Cloudfront will send to your origin server. Mine sends something like X-Privateaccess: opensesame. Do this in AWS Cloudfront by clicking on the “Origins and Origin Groups” tab in your distribution, and adding the header to your Origin Custom Headers.

Your origin server

Test for the presence of this origin custom header. If it’s not present, then tell the visitor to go away (or redirect to your public URL). The way I do this in PHP is, knowing the public URL:

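// $url['public'] is assumed to hold your public URL, set elsewhere in your code.  
if (!isset($_SERVER['HTTP_X_PRIVATEACCESS'])) {  
  // No Cloudfront header: redirect the visitor to the public site.  
  header('Location: '.$url['public'], true, 301);  
  exit;  
}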

That will conditionally block any traffic coming to your website without the header in place. (I’ve done it here, rather than in .htaccess, since there are some times where I don’t mind about that — so this is in a particular place in my codebase rather than everywhere).

In particular, do something similar in your robots.txt if you can (this assumes your robots.txt is generated by PHP rather than served as a static file). Here’s mine:

if (!isset($_SERVER['HTTP_X_PRIVATEACCESS'])) {  
  //This hasn't come via Cloudfront. So go away.  
  echo "User-agent: *\nDisallow: /\n";  
  exit;  
}  
... rest of robots.txt goes here

There’s one more thing to do…

Your dev team’s browser

Download an extension that adds an HTTP header to your traffic, and set that header in the extension. (You could even set different values of the header for different people, so the server can automatically tell who is accessing it).
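If you do go down the route of different header values for different people, the lookup could look something like the sketch below. It is written in JavaScript to match the extension code further down, and the same idea ports easily to the PHP check on the origin server. The names and values here (HEADER_VALUES, identifyCaller, "opensesame-alice" and so on) are made up for illustration and are not part of my setup.

```javascript
// Hypothetical mapping of secret header values to whoever uses them.
// Every name and value here is a placeholder: choose your own.
const HEADER_VALUES = new Map([
  ["opensesame", "cloudfront"],
  ["opensesame-alice", "alice"],
  ["opensesame-bob", "bob"],
]);

// Returns the name of the caller, or null if the header is
// missing or carries a value we do not recognise.
function identifyCaller(headerValue) {
  if (typeof headerValue !== "string") {
    return null;
  }
  return HEADER_VALUES.get(headerValue) ?? null;
}

console.log(identifyCaller("opensesame-alice")); // a known value
console.log(identifyCaller("a-wild-guess"));     // an unknown value
```

A null result would be treated exactly like a missing header: go away, or redirect to the public URL.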

For ease of use, here’s some code for a full Chrome extension, since I didn’t trust any of the existing extensions…

Save this as manifest.json:

{  
  "name": "Simple header adding thing",  
  "description": "Simple Chrome adding thing that does things",  
  "version": "1.0",  
  "manifest_version": 2,  
  "icons": {"128": "logo_128.png"},  
  "permissions": [  
    "http://yourorigin.example.com/*",  
    "webRequest",  
    "webRequestBlocking"  
  ],  
  "background": {  
    "scripts": ["background.js"]  
  }  
}

In the above, change the base URL to that of your origin server.

Save this as background.js:

chrome.webRequest.onBeforeSendHeaders.addListener(  
  details => {  
    // Add the secret header to each outgoing request.  
    details.requestHeaders.push({name: 'X-Privateaccess', value: 'opensesame'});  
    return {requestHeaders: details.requestHeaders};  
  },  
  // The host permission in manifest.json limits this to your origin server.  
  {urls: ["<all_urls>"]},  
  ['blocking', 'requestHeaders']  
);

In the above, change the additional header name and value to match the ones you set in Cloudfront.

…and make a colourful icon as a 128x128 PNG file, named logo_128.png to match the manifest. Save all three files in a folder. In the Chrome Extension menu, turn developer mode on, and “load unpacked”. Job done.
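One caveat: recent Chrome releases have been phasing out Manifest V2 extensions, so the version above may refuse to load for you. Below is a rough sketch of how the same idea might look as a Manifest V3 background.js, using the declarativeNetRequest API instead of webRequest. This is a different technique from the one I used, not a drop-in copy. It assumes your manifest.json has "manifest_version": 3, "permissions": ["declarativeNetRequest"], "host_permissions": ["http://yourorigin.example.com/*"] and "background": {"service_worker": "background.js"}. The buildHeaderRule helper and the three constants are names I have made up for illustration.

```javascript
// Assumed values: swap in your own origin host, header name and value.
const ORIGIN_HOST = "yourorigin.example.com";
const HEADER_NAME = "X-Privateaccess";
const HEADER_VALUE = "opensesame";

// Build a single declarativeNetRequest rule that sets the header
// on requests to the origin host.
function buildHeaderRule(host, name, value) {
  return {
    id: 1,
    priority: 1,
    action: {
      type: "modifyHeaders",
      requestHeaders: [{ header: name, operation: "set", value: value }],
    },
    condition: {
      // "||" anchors the match to the domain name; "^" is a separator.
      urlFilter: "||" + host + "^",
      // main_frame has to be listed explicitly to cover page loads.
      resourceTypes: [
        "main_frame", "sub_frame", "stylesheet", "script",
        "image", "font", "xmlhttprequest", "media", "other",
      ],
    },
  };
}

// Register the rule only when running inside Chrome as an extension.
if (typeof chrome !== "undefined" && chrome.declarativeNetRequest) {
  chrome.runtime.onInstalled.addListener(() => {
    chrome.declarativeNetRequest.updateDynamicRules({
      removeRuleIds: [1], // replace any earlier copy of the rule
      addRules: [buildHeaderRule(ORIGIN_HOST, HEADER_NAME, HEADER_VALUE)],
    });
  });
}
```

Because the rule uses the "set" operation, the header is overwritten rather than appended, so you will not end up with duplicates.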

Hey presto — your dev team can now see your origin server, and so can Cloudfront, but nobody else can.

Drawbacks

This is good, but isn’t perfect. A call to your origin server will still, of course, hit your origin server: just long enough to tell whatever it is to go away. This won’t give you protection from attacks, and calls won’t be cached. But it’s rather better than letting the “wrong” website address get into search engines.