When you run several production services, there are many point-in-time metrics of interest. While these can be calculated for the present instant, their dynamic properties, such as how they change over time, are hard to measure this way. For some metrics there are easy-to-use tools, like Google Analytics for page visits, but other metrics are much harder to graph. For example, counting Facebook comments or G+ +1s requires several HTTP calls.
AWS provides several APIs to access CloudWatch, supporting a wide range of technology stacks. For my use case I used a simple Bash script to collect and upload the data, packaged into a Docker container.
I’ve used a simple Dockerfile which installs some dependencies and the AWS CLI bundle, does some configuration, and then runs the monitoring script every hour. This is all boilerplate.
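As a rough illustration of the "run the script every hour" part, here is a minimal sketch of a loop-based entrypoint. It assumes the container uses a plain Bash loop rather than cron; `run_periodically` and the `/monitor.sh` path are hypothetical names, not taken from the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical container entrypoint helper: run a command, wait, repeat.
# Usage: run_periodically <interval_seconds> <max_runs (0 = forever)> <command...>
run_periodically() {
  local interval="$1" max_runs="$2" runs=0
  shift 2
  while true; do
    # A failed run is logged but does not stop the loop.
    "$@" || echo "run failed with status $?" >&2
    runs=$((runs + 1))
    if [ "$max_runs" -gt 0 ] && [ "$runs" -ge "$max_runs" ]; then
      break
    fi
    sleep "$interval"
  done
}

# In the container this could be invoked as (script path is a placeholder):
#   run_periodically 3600 0 /monitor.sh
```

The `max_runs` argument exists mainly so the loop can be tried out locally with a short interval before it is left running indefinitely.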
My aws_config/config looks like this:
And the aws_config/credentials is:
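Both files use the AWS CLI's INI-style format. A minimal sketch that generates placeholder versions of them follows; the `default` profile, the region passed in, and the `write_aws_config` helper are assumptions for illustration, and the key values are placeholders to be replaced with a real access key.

```shell
#!/usr/bin/env bash
# Writes placeholder AWS CLI config and credentials files into a directory.
# Usage: write_aws_config <target_dir> <region>
write_aws_config() {
  local dir="$1" region="$2"
  mkdir -p "$dir"

  # Non-secret settings: default region and output format.
  cat > "$dir/config" <<EOF
[default]
region = $region
output = json
EOF

  # Secrets: replace the placeholders with the generated access key.
  cat > "$dir/credentials" <<'EOF'
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
EOF
  # Keep the credentials file readable by its owner only.
  chmod 600 "$dir/credentials"
}

# Example (directory name matches the one mentioned in the text):
#   write_aws_config aws_config us-east-1
```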
Don’t forget to generate an AWS Access Key, and attach a policy that allows putting monitoring data:
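A policy for this purpose only needs the `cloudwatch:PutMetricData` action. The sketch below writes such a policy document to a file; the helper name, the user name, and the policy name are placeholders, so adjust them to your own account.

```shell
#!/usr/bin/env bash
# Writes an IAM policy document that only allows publishing metric data.
# Usage: write_put_metric_policy <output_file>
write_put_metric_policy() {
  cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "cloudwatch:PutMetricData",
      "Resource": "*"
    }
  ]
}
EOF
}

# Attaching it could then look like this (user and policy names are placeholders):
#   write_put_metric_policy policy.json
#   aws iam put-user-policy --user-name monitoring-uploader \
#     --policy-name put-metric-data --policy-document file://policy.json
```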
The monitoring script
This is the part where things get a bit more interesting. The upload command itself is a simple one-liner, but calculating the metric value can be quite tricky. For a basic reference, this is a sample (and not very useful) script that uploads the current time:
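A minimal sketch of such an upload, wrapping `aws cloudwatch put-metric-data` in a small function. The function name, the `Example/Monitoring` namespace, and the `CurrentTime` metric name are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Uploads a single data point to CloudWatch as a custom metric.
# Usage: upload_metric <namespace> <metric_name> <value>
upload_metric() {
  local namespace="$1" metric_name="$2" value="$3"
  aws cloudwatch put-metric-data \
    --namespace "$namespace" \
    --metric-name "$metric_name" \
    --value "$value"
}

# Example: publish the current Unix time (namespace and metric name are placeholders).
#   upload_metric "Example/Monitoring" "CurrentTime" "$(date +%s)"
```

Keeping the upload in one function means the more involved metric calculations only have to produce a number and hand it over.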
For some practical examples, let’s count some social metrics from a site with a URL-list sitemap.
To count the sum of the Facebook shares:
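The general pattern is to walk the sitemap, ask a count endpoint about each URL, and add up the results. The sketch below shows that pattern only: `COUNT_ENDPOINT`, the `shares` field name, and the `fetch_count` helper are hypothetical placeholders rather than a confirmed Facebook API, and the sitemap is assumed to be a plain-text list with one URL per line. It also assumes `curl` and `jq` are installed.

```shell
#!/usr/bin/env bash
# Placeholder endpoint; replace with the real count service you use.
COUNT_ENDPOINT="https://counts.example.com/"

# Hypothetical fetcher: prints the count for one URL.
# Assumes the endpoint returns JSON with a numeric "shares" field.
fetch_count() {
  curl -s -G --data-urlencode "id=$1" "$COUNT_ENDPOINT" | jq -r '.shares // 0'
}

# Reads URLs from stdin (one per line) and prints the sum of their counts.
sum_counts() {
  local total=0 url count
  # The "|| [ -n ... ]" part keeps a last line that lacks a trailing newline.
  while IFS= read -r url || [ -n "$url" ]; do
    [ -z "$url" ] && continue            # skip blank lines
    count="$(fetch_count "$url" </dev/null)"
    # Treat anything that is not a plain non-negative integer as 0.
    case "$count" in
      ''|*[!0-9]*) count=0 ;;
    esac
    total=$((total + count))
  done
  echo "$total"
}

# Example (sitemap URL is a placeholder):
#   curl -s https://example.com/sitemap.txt | sum_counts
```

Because a failed or malformed response counts as 0, a single bad URL lowers the total instead of aborting the whole run.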
The same for Twitter shares:
And lastly for G+ +1s, which is a bit more tricky:
Retrieving monitoring data is pretty straightforward using the official library, as there are only a handful of required parameters. First, the library must be included in the page:
Then some global configuration is needed, as the region and the keys have to be set on a global object.
And then the actual call:
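For a quick check outside the browser, roughly the same query can be issued with the AWS CLI. This is a sketch of that command-line alternative, not the JavaScript call itself; the function name, the hourly period, and the `Average` statistic are assumptions, and the namespace and metric name in the example are placeholders.

```shell
#!/usr/bin/env bash
# Fetches hourly averages of a custom metric between two timestamps.
# Usage: get_hourly_average <namespace> <metric_name> <start_iso8601> <end_iso8601>
get_hourly_average() {
  local namespace="$1" metric_name="$2" start="$3" end="$4"
  aws cloudwatch get-metric-statistics \
    --namespace "$namespace" \
    --metric-name "$metric_name" \
    --start-time "$start" \
    --end-time "$end" \
    --period 3600 \
    --statistics Average
}

# Example (all values are placeholders):
#   get_hourly_average "Example/Monitoring" "FacebookShares" \
#     2015-01-01T00:00:00Z 2015-01-02T00:00:00Z
```

The namespace, metric name, time range, period, and statistic shown here are the handful of required parameters for this kind of query.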
Also don’t forget that you’ll need to use an AWS key that has the required permission to read the metric statistics:
At this point, you have all the data you need, and you can plot or use it however you’d like. In my example I’ve plotted it on a very simple line chart.
Limitations of AWS
During this experiment, I’ve encountered several shortcomings in the AWS APIs that severely limit the usefulness of this monitoring approach. These are:
- Currently it is not possible to limit the GetMetricStatistics policy to a single metric; it is an all-or-nothing switch.
- There is no way to rate limit an access key. Currently, if an adversary obtains your AWS Access and Secret keys, she can flood the API with requests, which effectively costs you money without limit.
- At the time of writing, AWS CloudWatch retains data for 14 days, so that’s the oldest point you can get back.
The effect of the first two is that you can’t publicly disclose your AWS keys, as you might inadvertently disclose other metric statistics (#1), and you also expose your account balance to attack (#2). Of course, these can be mitigated by hosting an API and enclosing the keys inside it, but then you need to take care of scaling it (for the data collection you already need a server, but it does not need to be scaled). Architecturally, it would be far better if you could just distribute the keys and it would just work.
The current data retention limit (#3) severely impacts the usefulness of historical graphing. While it allows the examination of short-term dynamic properties, it makes CloudWatch less useful as an analytical platform. It should be possible to retrieve older data, even for a fair price.
I started to examine CloudWatch hoping that it could be made into a universal and architecturally clean analytics solution. While it lives up to this promise in some ways, like ease of use, it eventually falls short in both aspects. Its main use is to monitor instances and service health, and while it is very good at this, some important features are missing before it can be used for historical monitoring. That said, for visualization of short-term dynamics it is still a usable, easily scriptable and versatile solution.