Sharing hints, tips, experience, ideas and other cool stuff about Amazon Web Services



More announcements coming.

If you have been following What’s new in AWS, you probably noticed there have been a lot of announcements during re:Invent. Some of them got time to shine during the keynotes, some were released during the day, but we’ve noticed that there were two planned releases that were only mentioned in breakout sessions.

We’ve written them down for you here.

CloudFormation Drift Detection

This first one was talked about in DEV317 – Deep Dive on AWS CloudFormation (video, slides).

Drift Detection is a CloudFormation feature that will be released (soon) in 2018 and will make it possible to view and detect changes made outside of CloudFormation to resources managed by CloudFormation.

This will make it possible to be confident that the changeset that CloudFormation wants to execute actually matches the changes that will occur. It will also help you to proactively investigate why there were changes outside of CloudFormation in the first place.

Service Discovery (for ECS)

The second announcement was a complete session: CON403 – Introducing Service Discovery for Amazon ECS (no slides/video yet).

Service Discovery consists of two parts, a new feature in Route53 and a new feature in ECS, and will be released in Q1 2018 (the description of the talk even mentions January).

In Route53 there will be a new API to create Namespaces, define Services and register Instances (in this case, instances means ECS tasks) under them. These services can be queried via DNS. Route53 supports A records (returning up to 8 IPs per query) and SRV records (for IP + port combinations).
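Based on what was shown in the session, the flow could look roughly like the sketch below. It only builds the request payloads (all names and IDs are placeholders); in practice they would be passed to the announced Route53 APIs.

```python
# Sketch of the announced namespace -> service -> instance flow.
# Only the request payloads are built here; all values are placeholders.

create_namespace = {
    "Name": "internal.example.com",  # DNS namespace the services live under
}

create_service = {
    "Name": "web",  # would become queryable as web.internal.example.com
    "DnsConfig": {
        "DnsRecords": [
            {"Type": "SRV", "TTL": 10},  # SRV returns IP + port combinations
        ]
    },
}

register_instance = {
    "ServiceId": "srv-placeholder",
    "InstanceId": "task-1",  # for ECS, an "instance" is a task
    "Attributes": {
        "AWS_INSTANCE_IPV4": "10.0.0.12",
        "AWS_INSTANCE_PORT": "8080",
    },
}

print(create_service["DnsConfig"]["DnsRecords"][0]["Type"])
```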

ECS can use these APIs to register your tasks. It will even go a step further and not only use the Route53 health checks, but also update Route53 on ECS events like service scaling and task health.

Both the management and service registry APIs will be available from Route53, so you can also build your own integration for workloads outside of ECS.

Windows servers patching with AWS EC2 Systems Manager


Amazon EC2 Systems Manager is a collection of capabilities that helps you automate management tasks such as collecting system inventory, applying operating system patches, automating the creation of Amazon Machine Images (AMIs), and configuring operating systems and applications at scale. It is available at no cost to manage both your EC2 and on-premises resources!

Amazon EC2 Systems Manager relies on the Amazon Simple Systems Management Service (SSM) agent being installed on the guests. The SSM agent is pre-installed on Windows Server 2016 instances and on Windows Server 2003–2012 R2 instances created from AMIs published after November 2016. You need at least SSM agent version 2.0.599.0 installed on the target EC2 instance.

In this article we will focus on using Systems Manager to apply Windows Updates to EC2 instances. Patch management is always an operational pain point, so it’s welcome that AWS offers a solution.

You start by creating groups of instances by applying a tag called ‘Patch Group’. Then you create a group of patches by forming a patch baseline that includes and excludes the patches you require (or use the AWS default patch baseline). Finally, you create a maintenance window to have your patch baseline attached and applied to a patch group. The actual ‘Patch Now’ run command is nothing more than an API call, so there’s no obligation to use Maintenance Windows. Personally I’m a fan of Rundeck, so I’ll show you how to have the patches applied to the instances using both methods.

Configure your instances

The SSM agent running inside the Windows guest OS requires permissions to connect to AWS EC2 Systems Manager. We grant these rights by creating an EC2 service role with the ‘AmazonEC2RoleforSSM’ policy attached. Then you can attach this role to your instances. The instance also needs an outbound internet connection to be able to reach SSM. This can be either through an Internet Gateway or a NAT Gateway (or NAT instance).
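A sketch of that role setup as API payloads (the role name is arbitrary; in practice these would be passed to IAM’s create_role and attach_role_policy calls):

```python
import json

# Trust policy that lets EC2 instances assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

create_role = {
    "RoleName": "ec2-ssm-role",  # arbitrary name
    "AssumeRolePolicyDocument": json.dumps(trust_policy),
}

# Attach the AWS managed policy mentioned above.
attach_policy = {
    "RoleName": "ec2-ssm-role",
    "PolicyArn": "arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM",
}

print(attach_policy["PolicyArn"])
```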

If you have done this right, your instance(s) should pop up under ‘Managed Instances’ in the EC2 console:

Take note of the SSM Agent version. As said earlier, it must be at least version 2.0.599.0. The Systems Manager service also requires a “Patch Group” tag on the EC2 instance. The value can be anything you want to specify, but the key must be exactly Patch Group. Note that the key is case sensitive.

If done correctly, your tag will be picked up by SSM. You can confirm this on the ‘Managed Instances’ page:


Patch Baselines

AWS provides a default Patch Baseline called ‘AWS-DefaultPatchBaseline’. It auto-approves all critical and security updates with a ‘critical’ or ‘important’ severity seven days after they have been released by Microsoft. If you’re happy with that, you can use this baseline. If not, you can simply create your own according to your requirements: set approval for specific products and patch classifications, exclude a specific KB, etc.
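A sketch of such a custom baseline, expressed as the payload for SSM’s CreatePatchBaseline call (the name and the rejected KB are placeholders):

```python
# Custom patch baseline sketch; values are placeholders.
baseline = {
    "Name": "windows-custom-baseline",
    "ApprovalRules": {
        "PatchRules": [{
            "PatchFilterGroup": {
                "PatchFilters": [
                    {"Key": "CLASSIFICATION", "Values": ["CriticalUpdates", "SecurityUpdates"]},
                    {"Key": "MSRC_SEVERITY", "Values": ["Critical", "Important"]},
                ]
            },
            "ApproveAfterDays": 7,  # mirrors the AWS default baseline behaviour
        }]
    },
    "RejectedPatches": ["KB0000000"],  # placeholder for a KB you want to exclude
}

print(baseline["ApprovalRules"]["PatchRules"][0]["ApproveAfterDays"])
```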

Once you’re happy with your baseline, you can hit ‘Create’. Now assign it to one or more Patch Groups (or make it the default baseline and throw away the AWS one). Hit the ‘Actions’ menu and choose ‘Modify Patch Groups’.

Type the names of the Patch Groups you defined when tagging your instances.

Your baseline is now attached to the specified patch groups. You can now start evaluating your instances against the baseline, and update them accordingly.


Applying the patch baseline to a specific instance or to a patch group is nothing more than executing an AWS SSM run command. You can schedule this run command through AWS SSM ‘Maintenance Windows’, a cron job on a server (like Rundeck), or manually through the AWS Console.

Let’s first check everything manually. In the AWS EC2 console, go to ‘Run Commands’ and create a new Run Command. Select the ‘AWS-ApplyPatchBaseline’ command document and pick an instance to run it on. For the ‘Operation’, choose ‘Scan’. This will evaluate the instance against the baseline without installing anything yet.

Once the run command finishes, you can go back to the ‘Managed Instances’ page. Highlight the instance(s) on which the run command was executed and click on the ‘Patch’ tab. Here you can see the result of the scan:

To actually install the missing updates, execute the same run command document, but now with the ‘Install’ operation. This will install the missing KBs to the instances and reboot them if needed.

Or execute the following aws cli command to accomplish the same:
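As a sketch, the equivalent operation through boto3 (the instance ID is a placeholder; the actual call is commented out so the sketch stays self-contained):

```python
# Parameters for SSM's SendCommand, equivalent to the console run above.
params = {
    "DocumentName": "AWS-ApplyPatchBaseline",
    "InstanceIds": ["i-0123456789abcdef0"],  # placeholder instance ID
    "Parameters": {"Operation": ["Install"]},  # or ["Scan"] to evaluate only
}
# boto3.client("ssm").send_command(**params)

print(params["Parameters"]["Operation"][0])
```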

Maintenance Windows

Instead of manually starting a run command or cron job, we can also use the AWS-provided Maintenance Windows feature. Systems Manager Maintenance Windows let you define a schedule for when to perform actions on your instances, such as patching the operating system. Each Maintenance Window has a schedule, a duration, a set of registered targets, and a set of registered tasks.

Before actually creating a Maintenance Window, we must configure a Maintenance Window role. We need this so Systems Manager can execute tasks in Maintenance Windows on our behalf. So we go to the IAM page and create a new role. We pick an “EC2 service role” type and make sure to attach the “AmazonSSMMaintenanceWindowRole” policy to it. Once the role is created, we must modify it: click “Edit Trust Relationships” and add “ssm.amazonaws.com” as an extra Service principal to the existing policy:
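In JSON form, the resulting trust policy ends up allowing both EC2 and SSM to assume the role; a sketch:

```python
# Trust policy after the edit: both EC2 and SSM may assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": ["ec2.amazonaws.com", "ssm.amazonaws.com"]},
        "Action": "sts:AssumeRole",
    }],
}

print(trust_policy["Statement"][0]["Principal"]["Service"])
```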

Back to SSM now to actually create the Maintenance Window. Give it a useful name and specify your preferred schedule. I’m setting ‘every 30 minutes’ just for demonstration purposes, but in a real setup you would most probably choose something like ‘every Sunday’. You can also configure your own cron expression.
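A sketch of what such a window looks like as API parameters (SSM’s CreateMaintenanceWindow; the name is arbitrary):

```python
# Maintenance Window sketch; durations are in hours.
maintenance_window = {
    "Name": "patch-window",
    "Schedule": "rate(30 minutes)",  # demo value; e.g. "cron(0 2 ? * SUN *)" for Sundays at 02:00
    "Duration": 4,                   # the window stays open for 4 hours
    "Cutoff": 1,                     # stop scheduling new tasks 1 hour before the end
    "AllowUnassociatedTargets": False,
}

print(maintenance_window["Schedule"])
```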

This leaves us now with an empty Maintenance Window: there are no tasks nor targets associated yet.

To assign targets to the Maintenance Window, click on the “Register new targets” button on the “Targets” tab. We dynamically select the targets by using the “Patch Group” tag.

We will now have an ID linked to our “dev” Patch Group. This “Window Target ID” is used in the next step.

From the “Tasks” tab of the Maintenance Window, click on “Schedule new task”. Pick the “AWS-ApplyPatchBaseline” document. Under “Registered Targets”, select the correct Window Target ID. For the operation, select “Install”. For the “Role”, select the IAM role with the AmazonSSMMaintenanceWindowRole policy attached to it (the one we created earlier). Set your preferred concurrency level and register the task by clicking the blue button. The end result should look like this:

Now we have to wait for the schedule of the Maintenance Window. In this example we specified ‘every 30 minutes’ as a schedule, so the waiting shouldn’t take too long. Under the ‘History’ tab of the Maintenance Window you can follow all actions. The Maintenance Window will simply launch a Run Command, so you could go to that console screen too. If you enabled logging to S3, you could find the output of the Run Command over there. If not, you can view a (truncated) output via the Run Command itself:

If we now go back to the “Managed Instances” page and look at the “Patch” tab of our test instance, we will see it is not missing any updates anymore!


Success! Another check on the Automation checklist!



Cloudar as a Next Generation MSP

Historically, Managed Service Providers (MSPs) had the broad responsibility of taking care of customer environments from the bare metal up to the Operating System and sometimes the Application level. In the new world of Cloud Computing, things are changing at a rapid pace and responsibilities are shifting. Where the datacenter and physical layer of cloud computing are the sole responsibility of AWS, MSPs are moving up the stack and will focus on cloud security, consultancy, application-level monitoring, cost control and high availability.

What can one expect from a (as AWS tends to call it) Next Generation MSP? It all starts with knowledge. An MSP has engineers that are both thoroughly trained and certified on AWS. At Cloudar we have a development track that makes sure all our engineers are AWS Certified. We not only have a high percentage of Associates, but also a good number of Professionals, and even one holding all five certifications. This creates an internal ecosystem, facilitated by Slack, where knowledge is shared and customer issues are quickly discussed and solved. Cloudar is an Advanced Consultancy Partner, with a broad network within AWS.

Where automation was important in traditional managed hosting, it is vital in an AWS environment. And thanks to the broad range of APIs available on AWS, the sky is the limit. Now why is automation that important nowadays? I see three reasons. First, in terms of cost: Build Once, Deploy Many. If you need to repeat a task, script it. It will be cheaper in the long run. Second, and this is often overlooked, in terms of security. A lot of security issues stem from human error. By scripting repeatable processes, and peer reviewing these scripts, the chances of creating unintentional security holes are greatly diminished. Third, in terms of usability. When using proper source control and deployment tools, everyone can deploy new environments or applications with the click of a button. A next-gen MSP will apply all these skills to set up and manage your environment.

A traditional MSP was mostly concerned with threshold-based monitoring. Are my servers still online, is my hardware healthy and are my disks not full? While some of these are still very relevant, monitoring will shift more towards application-level monitoring. From uptime of servers to uptime of applications. At Cloudar, on top of traditional threshold-based monitoring, we offer Application Performance Monitoring and even Real User Monitoring. This way you not only know whether all components of your application are healthy, but also that the application itself works within expected boundaries. We have our standard set of monitoring tools to deal with this, but are also happy to assist you in using third party tools like Datadog or New Relic. You will get your own dashboard to check your environment health at any time.

It has never been easier to build highly available environments. In the old days, it took weeks if not months to setup multi-datacenter solutions. From ordering hardware, configuring global load balancing, storage replication, VMWare SRM… AWS has all infrastructural requirements built in. This means it has become second nature to always start from a high availability scenario with at least two Availability Zones in mind. From there on, a next-gen MSP will look at your workloads and determine what the best way is to run them in the cloud. In all this, Cloudar acts as your trusted partner, and determines what the best course of action is. This can range from a traditional lift and shift, over cloud optimized to a new cloud native deployment together with one of our application development partners.

Controlling costs is in the DNA of AWS, and we made it our own to do the same. This means we will not only design the most cost-effective environments for our customers, but also continuously assess whether this is still the case during the lifetime of your setup. We do this in two ways. Primarily, we follow up on all new AWS announcements and check what the impact can be for our customer base. A small example can be seen in a previous blog post: About AWS and Saving Money. If we see ways you can save money by using new services, we will let you know. Second, through the use of CloudCheckr. This tool will scan your environment and make recommendations on downsizing instances, unused resources, buying RIs and all other cost-saving options.

As you can see, things are changing in MSP land. An MSP does not solely host your servers anymore. It is your partner in a cloud world that lives by the principles and processes of DevOps. Cloudar is born in the cloud. Many of the changes traditional MSPs need to make to stay on board are in our DNA.

Cloudar, a short recap…

Exactly two years ago today, Senne Vaeyens and I hired Ben Bridts as our first employee at Cloudar.
What started out as a great idea transformed into a solid business model and shaped Cloudar as the company it is today. With 15 dedicated full-time engineers on the payroll we are now able to provide first class support and advice in AWS.
Trusted by many customers (from startups to large enterprises), we have proven our expertise in Amazon Web Services, DevOps and Managed Services along the way, and we’re planning to take this to an even higher level in 2017.
In 2017 we will:
– Grow our business and keep on extending the team (feel free to contact me if you would like to join)
– Take our solid partnership with AWS to the next level
– Establish more vendor partnerships with AWS Technology Partners
– Extend our customer base
– Obtain several AWS Competencies, including the AWS Managed Services, DevOps and Big Data Competencies.
– Improve customer service and provide top-notch AWS support and expertise to our customers
– Explore new markets and technologies

All of this wouldn’t be possible without the help of our great team, so please join me in giving a big thumbs up for the entire Cloudar Team.

Cloudar team meme

If anyone would like to know more about the services Cloudar can provide to you as a customer, feel free to drop me a mail or send me a PM.



Using the Application Load Balancer and WAF to replace CloudFront Security Groups

If you’ve been using a Lambda function to update security groups that grant CloudFront access to your resources, you may have seen problems starting to appear over the last few days. There are now 32 IP ranges used by CloudFront, and you can add only 50 rules to a security group. This seems fine, but if you want to allow both HTTP and HTTPS, you’ll have to split the 64 rules over two groups. This may limit you in other ways, as you can add only 5 security groups to a resource.

You can replace this Lambda with the recently launched WAF (web application firewall) for ALB (Application Load Balancers).

Here is how to do that (assuming you already have a CloudFront distribution and an Application Load Balancer set up).

CloudFront configuration

  1. Go to the “Origins” tab of the Distribution you want to use and edit the origin that’s pointing to your ALB.
  2. Add a new Origin Custom Header. You can use any header name and value you like; I opted for “X-Origin-Verify” with a random value.
    edit origin
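Any sufficiently random value will do; one way to generate one (the header name and the length are just our choice):

```python
import secrets

# Generate a hard-to-guess value for the custom origin header.
header_value = secrets.token_hex(16)  # 32 hex characters
print(header_value)
```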

WAF/ALB Configuration

  1. Go to the WAF service page and create a new Web ACL
  2. Give the ACL a name and select the region and name of your ALB
    acl config
  3. Create a new “String matching condition”. We’ll create one called “cloudfront-origin-header” that will match when our custom header has the same random value.
  4. (Optional) If you want to allow your own IP without the secret header, for testing purposes, add an “IP match condition” that will match the IPs you trust. We have named that condition “trusted-ips”.
    ip condition
  5. Now we can create a rule to allow requests that match the conditions we created. Click on “Create rule”  to create a rule for all requests with our custom header.
    header rule
  6. (Optional) Do the same for a rule with the IP condition
    trusted ip rule
  7. Configure the ACL to allow the rules we just created and block all requests that don’t match any rules
    acl create
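The string matching condition from step 3 can be sketched as API parameters (the ByteMatchTuple shape used by the regional WAF API; the secret value is a placeholder):

```python
# One tuple of a byte match condition: match when the X-Origin-Verify
# header equals our secret value exactly. The value is a placeholder.
byte_match_tuple = {
    "FieldToMatch": {"Type": "HEADER", "Data": "X-Origin-Verify"},
    "TargetString": b"our-random-value",
    "TextTransformation": "NONE",
    "PositionalConstraint": "EXACTLY",
}

print(byte_match_tuple["FieldToMatch"]["Data"])
```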


If you surf directly to the ALB with an untrusted IP address, you should now see a 403 page:


However, when you add the Custom header, or go through CloudFront, you are allowed to visit the website:



This service is very new, so while setting this up, we ran into some rough edges. We’ve opened  a support request so that AWS can look into fixing those.

  • You can’t see the ACLs you created inside a region (WAF for CloudFront is a global service) if you use the CLI. According to the documentation, you should be able to do this if you override the endpoint URL. At the time of writing this gives errors. If you want to check whether this has been fixed, you can use this command: aws waf list-web-acls --endpoint-url
  • Currently there are no metrics available for the WAF inside a region (even though you have to specify a metric name for the rules and conditions you create).
  • If there are no healthy hosts in the target group of your ALB, you will always get a 503 error response, even if the request gets blocked by the WAF.

Troposphere helper functions

Here at Cloudar we write a lot of CloudFormation to provision AWS resources. We really like the way CloudFormation creates resources in parallel and how it provides an easy way to clean up all created resources.

However, writing CloudFormation can be a bit of a pain. Even though AWS made this a lot easier with YAML support, for big templates we still use Troposphere. Troposphere is a Python package that provides a simple (one-to-one) mapping to CloudFormation. It has some advantages like offline error checking, but its greatest asset is that it can be used in combination with a real programming language.

Having Python available to write templates leads to writing helper functions that simplify some verbose constructs and bundle commonly used resources. Today we’re happy to publish these as an open source package.

You can find our troposphere helpers on PyPI or on GitHub.

Here is an example of how it can simplify CloudFormation code:

This is the normal way to add Option Settings to an ElasticBeanstalk configuration in Troposphere:

Using our helper function this can be reduced to:
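As an illustration of the kind of helper involved (a sketch, not the exact API of the published package): a function that expands a nested dict into the verbose list of option setting dicts that an ElasticBeanstalk configuration needs.

```python
# Sketch of a helper that turns a readable nested dict into the verbose
# Namespace/OptionName/Value list CloudFormation expects. Names are illustrative.
def option_settings(settings):
    return [
        {"Namespace": namespace, "OptionName": name, "Value": value}
        for namespace, options in sorted(settings.items())
        for name, value in sorted(options.items())
    ]

settings = option_settings({
    "aws:autoscaling:asg": {"MinSize": "1", "MaxSize": "4"},
    "aws:elasticbeanstalk:environment": {"EnvironmentType": "LoadBalanced"},
})
print(len(settings))  # 3
```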

And this is only one of the different functions in there (and we expect to keep adding to this over time).

Do you have any helper functions you’ve written? Let us know!

Using Route53 to support on-demand applications

One of the best ways to save money on AWS is turning resources off when you don’t use them. This is pretty easy to automate if you have consistent usage patterns (like an application that’s only used during business hours), but can be harder if the usage is very irregular (for example an application that’s only used a few times per quarter).

We recently worked with a customer that had some applications that could be without usage for months. To be more cost efficient, they were looking for a solution where:

  • They could turn off as many instances and services as possible
  • The users could start the application with one button click if they needed to use it
  • The users didn’t have AWS credentials

We came up with the following solution to satisfy these requirements, and if you’re running the same kind of applications, maybe you can also reduce costs by implementing this.

Failover Diagram

This solution works by taking advantage of Route53 health checks. We’ve split up our infrastructure in two parts: an always-on part that uses low-cost or usage-based services to provide the user with a way to start the real application, and a part that can be started and stopped on demand.

We configure the on-demand part to be the primary resource in Route53 and the always-on part as a failover. This way the traffic will be routed to the real application if it’s online, and the user will get a static webpage that gives him the option to start the application if it’s not.
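The failover configuration could be sketched as the two record sets below (domain names, IDs and health check are placeholders; Z2FDTNDATAQYW2 is the fixed hosted zone ID that CloudFront aliases use):

```python
# Sketch of the failover record sets passed to Route53's
# ChangeResourceRecordSets. All names and IDs are placeholders.
CLOUDFRONT_ZONE_ID = "Z2FDTNDATAQYW2"  # fixed zone ID for CloudFront aliases

primary = {
    "Name": "app.example.com.",
    "Type": "A",
    "SetIdentifier": "on-demand-application",
    "Failover": "PRIMARY",
    "HealthCheckId": "placeholder-health-check-id",
    "AliasTarget": {
        "HostedZoneId": CLOUDFRONT_ZONE_ID,
        "DNSName": "d1111111111111.cloudfront.net.",  # distribution in front of the app
        "EvaluateTargetHealth": False,
    },
}

secondary = {
    "Name": "app.example.com.",
    "Type": "A",
    "SetIdentifier": "start-page",
    "Failover": "SECONDARY",
    "AliasTarget": {
        "HostedZoneId": CLOUDFRONT_ZONE_ID,
        "DNSName": "d2222222222222.cloudfront.net.",  # distribution of the static page
        "EvaluateTargetHealth": False,
    },
}

print(primary["Failover"], secondary["Failover"])
```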

If we look at how this would go if the application is offline, these are the steps that would happen:

  1. The user requests the DNS record from Route53. Because the real application is offline, Route53 will respond with the record set of the fallback CloudFront distribution.
  2. The user requests a page from CloudFront. CloudFront will get this from S3 and serve it to the user. This page contains an explanation of why the application is not available and a button to start it.
  3. When the user clicks the button, it uses JavaScript to call the API Gateway and invoke a Lambda function.
  4. The Lambda function calls Service Catalog or CloudFormation (depending on your environment) to start the real application.
  5. When the application has started, the health check will pass, and Route53 will start returning the record set for the CloudFront distribution that is linked to the application.
  6. When the user uses the new DNS record, it will go through the second CloudFront distribution to the real application.
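Step 4 above could be sketched as a minimal Lambda handler (stack name and template URL are placeholders; the CloudFormation call is commented out so the sketch stays self-contained):

```python
# Minimal sketch of the Lambda behind the "start application" button.
def handler(event, context):
    params = {
        "StackName": "on-demand-app",  # placeholder
        "TemplateURL": "https://s3.amazonaws.com/placeholder/app.template",
    }
    # boto3.client("cloudformation").create_stack(**params)
    return {"statusCode": 200, "body": "starting"}

print(handler({}, None)["statusCode"])  # 200
```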

Some things to keep in mind.

This is only a high-level overview of a possible solution. To implement this, you would also have to consider the following:

  • After starting the Application, the static webpage should refresh the page, to force the browser to do a new DNS lookup.
  • CloudFront will cache errors for 5 minutes by default. Decreasing this will make the failover go faster.
  • The TTL of a CloudFront DNS record is 60 seconds

No Limit?

No, no limits, we’ll reach for the sky!
No valley too deep, no mountain too high
No no limits, won’t give up the fight
We do what we want and we do it with pride
No no, no no no no, no no no no, no no there’s no limit!
No no, no no no no, no no no no, no no there’s no limit!

No limit? Well, actually there is. Several, actually. And that became painfully clear yesterday, when I was scripting the new environment for one of our customers. This time not using Troposphere, so it can more easily be managed by people who are not Python-savvy.

What they need is not that special. They want to be able to deploy identical environments fast and easy. Not very complex environments either. Mainly EC2 and RDS. Say 10 servers and 5 DB instances.

But you know how it goes. All servers in an environment have different disk layouts. Different instance types. Different availability zones. And while the requirement now is to deploy completely identical environments, you know the day will come when someone will come up to you and ask: why are we using SSD disks in our Dev environment? Why are those partitions so large in Test? So it’s best to be prepared and allow for some flexibility. The plan was to create a CloudFormation template and deploy it using Ansible. All configurable parameters can then be put in Ansible in an easy YAML structure instead of, for example, a JSON parameter file.

So I started writing the code to create one server and its backend RDS instance, thinking: if I get this straightened out, it’s just a matter of copy-pasting it for most other servers and instances, and setting server-specific parameter values in Ansible. Well, pretty soon I hit the first AWS limit: one can only have 60 parameters in a CloudFormation template. I had many more. Bummer. I first looked into nested stacks to overcome this limit, but as you can’t pass parameters straight through to a child stack, they were not the answer here. They are an answer to a different problem though, but more on that later.

The best way to work around parameter limits is mappings. It’s not ideal though, as my goal was to configure new environments only by creating a new playbook in Ansible, never having to touch the template code. Unfortunately, that is not an option. I now create a mapping per environment and configure most variables there. The environment to deploy is passed as a parameter, which is then used to search through the mapping, and values are read using the Fn::FindInMap function. Pretty much as shown below:
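A sketch of the pattern as a JSON-style template fragment (environment names, keys and instance types are illustrative):

```python
# CloudFormation template fragment built as a Python dict; the environment
# parameter selects a column of the mapping via Fn::FindInMap.
template = {
    "Parameters": {
        "Environment": {"Type": "String", "AllowedValues": ["dev", "prod"]},
    },
    "Mappings": {
        "EnvironmentMap": {
            "dev":  {"WebInstanceType": "t2.micro", "DbInstanceType": "db.t2.micro"},
            "prod": {"WebInstanceType": "m4.large", "DbInstanceType": "db.m4.large"},
        }
    },
    "Resources": {
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "InstanceType": {
                    "Fn::FindInMap": ["EnvironmentMap", {"Ref": "Environment"}, "WebInstanceType"]
                },
                # ImageId, SubnetId etc. omitted for brevity
            },
        }
    },
}

print(template["Mappings"]["EnvironmentMap"]["dev"]["WebInstanceType"])
```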

So yeah, I was pretty pleased with the result. I was able to rewrite my code and transfer a lot of parameters to mappings. A new environment would now mean creating a new entry in the map. Not that big a deal. And hey, one can have a hundred mappings per template. We will never have that many environments. We are golden! Well… until I started to copy and paste all mapping entries… There I hit the second limit: one can only have a maximum of 63 mapping attributes. OK, that is 33 more than what is stated in the official documentation, but with the variables I wanted and the number of servers, that was not nearly enough.

Now what? Well, back to the nested stacks. While they are not an answer to the parameter limit, they are to the mapping attribute limit. When I create a child template for each type of server with its RDS instance, I don’t need that many mapping attributes per template, and all is well again. You can also pass parameters from parent to child, like this:
And in the child stack you declare the parameter again and pick it up with a Ref:
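Both halves sketched as JSON-style fragments (names and the template URL are illustrative):

```python
# Parent stack: pass the Environment parameter down to the nested stack.
parent_resource = {
    "WebServerStack": {
        "Type": "AWS::CloudFormation::Stack",
        "Properties": {
            "TemplateURL": "https://s3.amazonaws.com/placeholder/webserver.template",
            "Parameters": {"Environment": {"Ref": "Environment"}},
        },
    }
}

# Child stack: declare the same parameter and pick it up with a Ref.
child_fragment = {
    "Parameters": {"Environment": {"Type": "String"}},
    "Resources": {
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "Tags": [{"Key": "Environment", "Value": {"Ref": "Environment"}}],
            },
        }
    },
}

print(parent_resource["WebServerStack"]["Type"])
```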
Granted, at first sight it adds more complexity to the code. On the other hand, it makes the code more modular, and we are probably now safe from some other limits, like the number of resources per template, the maximum size of your template file or the total amount of swirly brackets you can have in one template. Actually, I made that last one up, but for a complete list you can check the AWS documentation.

About AWS and saving money, new EBS disks, backups… and beer

AWS is one of the few companies that actually try to take less money from their valued customers. Take the Trusted Advisor as an example: a web application available to everybody with an AWS account that will tell you where you are spending too much money on AWS resources. Resources you don’t use or that are underutilized. It will tell you exactly how much money you can save by downgrading instances, removing idle load balancers or downsizing EBS volumes.

We at Cloudar try to incorporate that same philosophy. We actively seek for ways to make our customers pay less money for more or less the same level of service.

One way we could save quite some money for one of our customers recently, was by starting to use the newly introduced HDD drives. They are the answer to all your backup to disk needs.

Previously, using magnetic storage (standard) was not always a valid option. The max throughput was considerably lower than that of an ssd volume. Another issue was the maximum volume size, which was 1 TiB. If you needed large amounts of data stored on disk, this was not always enough.

So we have a customer who was, for the reasons mentioned above, backing up Oracle to disk on ssd. Right when we received the message from AWS that sc1 storage was available, we contacted our customer and asked them whether they wanted to cut their EBS cost for backup to disk in four.

They did.

Looking at the figures, sc1 is the ideal volume for backups. It is cheap (one fourth of the price of ssd) and has a high throughput. In fact, it has a considerably higher throughput than a standard ssd volume! This comes at the expense of IOPS, where ssd is still king. But for backups, random IOPS are not important; throughput is.

And not only is it one fourth of the price of an ssd disk, it is even cheaper than S3 standard storage. And it is a lot cheaper than what you pay for the S3 storage that holds your snapshots. In case you were unaware: storing snapshots is not at normal S3 cost; it’s more than 3 times the cost of S3 storage. All things you need to keep in mind when finding the best scenario for you.


So that day, our customer’s DBA started to write his backups to the new sc1 volume. The result? No change. It took exactly the same amount of time. What does this tell us? First, that the disk probably was not the bottleneck in this scenario; it’s more likely that Oracle could not deliver the data fast enough to hit any limits at the disk level. Second, that sc1 is a valid alternative to ssd in distinct cases. Third, the customer now pays 1/4 of the price for his backups. It saves him a few thousand dollars per month. He is happy.

I would urge you to try it out for yourself. Just add an sc1 volume (or an st1, depending on the scenario) and do the test. It’s cheap to test and easy to throw away if it doesn’t suit your needs.

So always be on the lookout for new AWS announcements. One day, you will be able to save some dollars. Dollars you can then spend on other cool AWS features. Or, of course, on beer.


P.S. Prices in this post are for eu-west-1. They can differ per region and are subject to change.

Automating Windows migrations to AWS with Double Take Move and Ansible


When you’re a cloud reseller/architect, you often get contacted by customers who want to migrate their infrastructure to AWS.

Although I’m not really a fan of the lift-and-shift way of working, sometimes there is no way around it.


Instead of spending hours on installing, configuring, exporting, importing, etc., we can now really get things going by using Double Take Move and Ansible.

For this article you need some basic knowledge of Ansible.

A good place to start is the Ansible documentation.


Double Take Move is really well made and very user friendly!

And the license cost of the product is easily forgotten when you don’t have to spend hours on exporting, importing and troubleshooting these kinds of moves.




Prep work

Ah yes, there is always some prep work to do (more if you’re not already using Ansible).


Let’s first configure the AWS environment: create the needed VPC, subnets, VPNs, security groups, roles, etc.


Make sure that the VPC CIDR block and subnets match your current setup exactly.

(And also create the target servers in the correct subnets.)

DHCP option set

Make sure that you create a DHCP option set if the servers you are migrating are on DHCP.



Public IPs

For connectivity to work, you may need to attach some public IPs to your source and target servers.

We used this mostly in Azure-to-AWS migrations.


Ansible Setup

We have been using Ansible for quite some time now, and we find that installing it on CentOS or Ubuntu is the best way to go.

Below are the basic steps for CentOS.

  • We mostly use at least Ansible version 2, so we need to enable the epel-testing repository to install Ansible. Edit the file under /etc/yum.repos.d/epel-testing.repo to enable it, then run the commands below.


Accept the defaults for the keygen, or change them to your way of working (you can then push this key out to Linux servers, which are not in scope for this blog).

  • Install pywinrm

  • If your Windows systems are in a domain (most of them normally are), install the Kerberos dependencies

  • You will also need the Python bindings for this
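The install commands themselves are missing from the original; a sketch of what the bullets above refer to, with CentOS package names as an assumption:

```shell
# Python WinRM client used by Ansible's windows modules
sudo pip install pywinrm

# Kerberos system dependencies (CentOS package names)
sudo yum install -y krb5-workstation krb5-libs krb5-devel python-devel gcc

# Python Kerberos bindings for pywinrm
sudo pip install pywinrm[kerberos]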


Please read the Kerberos documentation carefully, as you really need this to be correct and working.



Edit the /etc/krb5.conf file and change it to reflect your domain
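The example file is missing from the original; a minimal sketch, where CORP.EXAMPLE.COM and the domain controller name are placeholders for your own domain:

```ini
# /etc/krb5.conf - minimal sketch
[libdefaults]
  default_realm = CORP.EXAMPLE.COM

[realms]
  CORP.EXAMPLE.COM = {
    kdc = dc01.corp.example.com
    admin_server = dc01.corp.example.com
  }

[domain_realm]
  .corp.example.com = CORP.EXAMPLE.COM
```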



When that is done, you can test that the connection is working by running the command below.
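The command was lost from the original post; given the "nothing is returned" remark that follows, it was presumably a kinit against your domain (user and realm here are placeholders):

```shell
kinit ansible@CORP.EXAMPLE.COM
```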



If nothing is returned, don’t panic: that means it worked!

You can then check your Kerberos ticket with the command
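The command is missing from the original; listing your cached tickets is done with:

```shell
klist
```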




Under your /etc/ansible directory there is a hosts file.

It contains some examples of how to use an Ansible inventory file.

Create yours any way you like.

But for these migrations you can do something like this
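The inventory example itself was lost; a sketch of the layout the following paragraphs assume, with placeholder hostnames and IPs:

```ini
# /etc/ansible/hosts - example inventory
[sourceservers]
srcweb01.corp.example.com
srcsql01.corp.example.com

[targetservers]
10.0.1.10
10.0.1.11
```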



For each group you create here you must create a credential file with the same name, so in this case sourceservers.yml and targetservers.yml.

Store these under /etc/ansible/group_vars.

Content of these files for local users
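The file content was lost from the original; a sketch using Ansible’s standard WinRM connection variables (the username and password are placeholders):

```yaml
# /etc/ansible/group_vars/sourceservers.yml - local user sketch
ansible_user: ansible
ansible_password: YourPasswordHere
ansible_port: 5986
ansible_connection: winrm
ansible_winrm_server_cert_validation: ignore
```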



Content of the file for domain users
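Again a reconstruction, since the original block was lost; the same variables, but with a domain user and the Kerberos transport (realm is a placeholder):

```yaml
# /etc/ansible/group_vars/sourceservers.yml - domain user sketch
ansible_user: ansible@CORP.EXAMPLE.COM
ansible_port: 5986
ansible_connection: winrm
ansible_winrm_transport: kerberos
ansible_winrm_server_cert_validation: ignore
```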



You can also add it in the hosts file like so
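The inline variant was also lost; group variables can live directly in the hosts file in a `:vars` section, for example:

```ini
[sourceservers:vars]
ansible_user=ansible@CORP.EXAMPLE.COM
ansible_port=5986
ansible_connection=winrm
```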



Windows Configuration

Make sure your target servers are as identical as possible to your source servers.

So same OS, service pack, IP and disk layout, and you’re good to go. (Oh, and don’t rename your target server to the source server’s name just yet; Double-Take will complain and will not continue. But a useful name is a good way to identify the target server.)

You will need to do the steps below on all source and target servers. For the target servers you could create an AMI to deploy from, depending on how many servers you need to migrate.

  • Configure WinRM on all Windows machines that you need to migrate (the ConfigureRemotingForAnsible.ps1 script from the Ansible site can be used for this)
  • Also make sure you have at least version 3 of PowerShell installed, so basically check all your servers older than Server 2012.
  • Preferably create an “ansible” user on those systems and allow it to connect through WinRM (there is a local group called WinRMRemoteWMIUsers__; add the user to this group, and also to the local Administrators group, else you will not be able to do everything that is needed here)
  • Because Ansible spawns a lot of connections, I found that increasing the MaxShellsPerUser parameter for WinRM gives fewer problems.

Command :
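The command was lost from the original; the usual way to raise this WinRM quota (the value 50 is an assumption, tune it to your environment):

```powershell
winrm set winrm/config/winrs '@{MaxShellsPerUser="50"}'
```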

Hint: you can combine the above in the ConfigureRemotingForAnsible.ps1 that you download from the Ansible site, by adding the winrm command at the bottom of the script.

I found that in most cases you will need to reboot the server for it all to work correctly.



Of course we need to modify some firewall rules here and there.

Make sure that ansible can reach your servers on 5986 tcp.

Also make sure that source and target servers can speak with each other directly over ports 6320 and 6325, both TCP and UDP.

The double take console will also need to speak with all servers on these ports.


Note: of course, make sure that all other needed rules, routes, VPNs, etc. are in place for your servers.


Test, test, test

We can now test the connection to the Windows servers.

(If you are using the domain credentials make sure you have a valid Kerberos ticket first.)

Run the commands below: the first verifies the source server connections, the second the target server connections.
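The commands themselves are missing from the original; with the inventory groups from earlier, Ansible’s win_ping module is the standard check:

```shell
# Verify the source server connections
ansible sourceservers -m win_ping

# Verify the target server connections
ansible targetservers -m win_ping
```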


Double take console

I’m not going to go into details here, but on the machine where you have installed the Double-Take console you add all the servers (source and target), attach the licenses to them, and set up full server replication jobs with the parameters of your choice.

Wait before failing over; we will need some more playbooks first, depending on your server licensing.



On to the interesting stuff (unless you want to manually install the Double-Take software on all servers, in which case go do that now 🙂).


I downloaded the Double-Take software, unzipped the /setup/dt/x64 folder, and placed it in an S3 bucket. If you have 32-bit servers, also extract the 32-bit folder. The examples below only use the 64-bit installer; if the need arises we can also create a 32-bit playbook.

Make sure the files are public, else you will not be able to download them on the source servers; for the target servers, use the read-only S3 policy attached to a role.

Before uploading, also modify the DTSetup.ini file to allow a quiet installation. (Modify it any way you want, but make sure that the disk queue folder has around 20 GB of free space.)



When the above is done, we can continue to write our playbooks.

Write the following playbook and place it in the /etc/ansible directory.
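The playbook itself was lost from the original; a hypothetical reconstruction (the filename installdoubletake.yml and the role name are assumptions):

```yaml
# /etc/ansible/installdoubletake.yml - hypothetical sketch
- hosts: sourceservers:targetservers
  roles:
    - doubletake
```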



Then create the following directory structure
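The directory listing is missing; assuming the usual Ansible role layout for the role named in the playbook above, it amounts to:

```shell
mkdir -p /etc/ansible/roles/doubletake/tasks
```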


Then create the following “role” and save it as main.yml.
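The role content was lost; a sketch of what it plausibly did, based on the text (bucket name, paths and the silent-install switch are assumptions, not the author’s exact code):

```yaml
# /etc/ansible/roles/doubletake/tasks/main.yml - hypothetical sketch
- name: Download the 64-bit Double-Take installer from S3
  win_get_url:
    url: https://s3-eu-west-1.amazonaws.com/your-bucket/x64/DTSetup.exe
    dest: C:\Temp\DTSetup.exe

- name: Install Double-Take silently using the modified DTSetup.ini
  win_command: C:\Temp\DTSetup.exe /s
```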


Now we can test the Double-Take installation like this.

Or, if you encrypted the credential files with Vault, then:
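The commands are missing from the original; assuming the playbook is called installdoubletake.yml, they would look like:

```shell
ansible-playbook installdoubletake.yml

# With Vault-encrypted group_vars files
ansible-playbook installdoubletake.yml --ask-vault-pass
```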


If everything is working as it should, Double-Take is now installed everywhere. Nice and fast, no?



Windows Licensing


It all depends on what you want to do, but this example will change the Windows activation to the AWS KMS servers, thus using the AWS licensing instead of your own.

Source : “Unable to activate Windows”

Ok that will be a lot of manual work, so let’s not do that.


Since Ansible is still a work in progress, I found that the win_unzip module does not work all the time.

Therefore I chose to put the EC2Install.exe in an S3 bucket as well.

(I wanted to download the latest EC2Config service from Amazon, unzip it, and then install it; if win_unzip works better in the future I’ll make an update.)


Write the following playbook
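This playbook was also lost; a hypothetical reconstruction following the same pattern as the Double-Take install (filename and role name are assumptions):

```yaml
# /etc/ansible/windowsactivation.yml - hypothetical sketch
- hosts: targetservers
  roles:
    - windowsactivation
```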



Then create the following directory structure
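Assuming the usual Ansible role layout again, the listing amounts to:

```shell
mkdir -p /etc/ansible/roles/windowsactivation/tasks
```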


Then write the following main.yml and place it in the directory above.
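The tasks were lost; a sketch based on the text above (bucket name and the installer’s silent switch are assumptions):

```yaml
# /etc/ansible/roles/windowsactivation/tasks/main.yml - hypothetical sketch
- name: Download the EC2Config installer from S3
  win_get_url:
    url: https://s3-eu-west-1.amazonaws.com/your-bucket/EC2Install.exe
    dest: C:\Temp\EC2Install.exe

- name: Install the EC2Config service silently
  win_command: C:\Temp\EC2Install.exe /quiet
```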


DNS forwarder

If you have a domain running, you probably also have Windows DNS. Because you’re now moving to AWS, we need to change the forwarder to AWS.

!The script below will replace all your forwarders! If you don’t want this, there are also ‘Add-DnsServerForwarder’ and ‘Remove-DnsServerForwarder’.


So maybe create a group in the Ansible hosts file: [activedirectory]


To find the IP of the forwarder, take your VPC CIDR block’s base address and change the last octet to 2.

Example: for a VPC with CIDR 10.0.0.0/16, the DNS forwarder is at 10.0.0.2.

source : (VPC Subnets –> subnet sizing)


Create the below playbook under /etc/ansible
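The playbook was lost from the original; a sketch, where the filename setdnsforwarder.yml follows the name used later in the post and 10.0.0.2 assumes a 10.0.0.0/16 VPC:

```yaml
# /etc/ansible/setdnsforwarder.yml - hypothetical sketch
- hosts: activedirectory
  tasks:
    - name: Replace all DNS forwarders with the VPC resolver
      win_shell: Set-DnsServerForwarder -IPAddress 10.0.0.2
```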





Right, we have the necessary components now. Let’s do the failover to AWS.

In the Double-Take console, start failing over your servers. It’s best to start with the core servers, like AD, then maybe SQL and Exchange.

Then application servers and web servers… it’s really up to you.


When everything has failed over, check whether Ansible is able to reach your servers.

(With Kerberos or the local user.)

Also, it can get confusing now, because your target servers are now your source servers! 🙂


Anyway, run the setdnsforwarder.yml first to make sure you have internet access.

Then run the windowsactivation.yml


Everything should now reboot and come back online, activated against the AWS KMS server.

Since this is a repeatable process, you can first do a test failover, try things out, tune where needed, and then do the actual failover.


If you have questions or just don’t want do to this yourself, contact us by email or phone (+32 3 450 80 30).


Go Automate Something!