3 critical steps to a well-architected Cloud Migration project


When embracing the change that accompanies any platform modernization project, it’s important to have a workable plan and execution strategy for cloud migration. Many organizations have some understanding of the business impact of their data failing or being breached, but do they fully understand what it means to not fully execute and test their migration plans? Below are 3 best practices for developing a solid backup strategy and preparing your organization for the worst.

Failing to Plan is Planning to Fail

The most critical part of any cloud migration is the planning process. Abraham Lincoln is often credited with saying, “Give me 6 hours to chop down a tree and I will spend the first 4 sharpening the axe.” The same principle applies here. When building a cloud migration plan at Fortified Data, we go through an extensive checklist that determines the organization’s data maturity and uncovers the client’s disaster recovery requirements. Almost all of the categories on that checklist are addressed before we move a single bit of data. We get an understanding of what data needs to move, how the destination could be impacted by the new load, and where the risk points are; then we mitigate those risk points by adjusting the plan. Developing a plan requires a stepped approach: create the plan, test the plan, then adjust the plan for success.
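As a rough illustration only (not Fortified Data’s actual checklist), a pre-migration plan can be captured as structured data so that every identified risk point has a mitigation before any data moves. The checklist items and names below are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ChecklistItem:
    """One pre-migration question with its identified risk and planned mitigation."""
    question: str
    risk: str = ""
    mitigation: str = ""

@dataclass
class MigrationPlan:
    client: str
    items: list[ChecklistItem] = field(default_factory=list)

    def unmitigated_risks(self) -> list[ChecklistItem]:
        # Any item with a named risk but no mitigation blocks the migration.
        return [i for i in self.items if i.risk and not i.mitigation]

# Hypothetical checklist entries, loosely following the questions in the article.
plan = MigrationPlan(
    client="ExampleCo",
    items=[
        ChecklistItem("What data needs to move?", risk="Unknown data volume"),
        ChecklistItem("How will the destination handle the new load?",
                      risk="Undersized target", mitigation="Load test the target tier"),
        ChecklistItem("What are the disaster recovery (RPO/RTO) requirements?"),
    ],
)

for item in plan.unmitigated_risks():
    print(f"Blocker: {item.question} ({item.risk})")
```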

To protect against a failure, we design backup strategies and rollback plans to support business data if the migration isn’t successful. We then test the migration further so that downstream applications can be validated once the migration is complete. Again, all of this happens before we move any data.
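One common form of post-migration validation, sketched here as an example rather than as the firm’s specific process, is to reconcile simple metrics such as per-table row counts between the source and the migrated destination before downstream applications are repointed. The connection strings and table names below are placeholders, and the sketch assumes an ODBC driver is available via pyodbc.

```python
import pyodbc  # assumes an ODBC driver for the source and target databases

# Placeholder connection strings; real values depend on the environment.
SOURCE_DSN = "DSN=source_db"
TARGET_DSN = "DSN=target_db"

TABLES = ["dbo.Orders", "dbo.Customers"]  # hypothetical tables to reconcile

def row_count(conn, table: str) -> int:
    """Return the row count for a single table."""
    cursor = conn.cursor()
    cursor.execute(f"SELECT COUNT(*) FROM {table}")
    return cursor.fetchone()[0]

def reconcile(tables) -> list[str]:
    """Compare source vs. target counts and report any mismatches."""
    mismatches = []
    with pyodbc.connect(SOURCE_DSN) as src, pyodbc.connect(TARGET_DSN) as tgt:
        for table in tables:
            source_rows, target_rows = row_count(src, table), row_count(tgt, table)
            if source_rows != target_rows:
                mismatches.append(f"{table}: source={source_rows}, target={target_rows}")
    return mismatches

if __name__ == "__main__":
    problems = reconcile(TABLES)
    if problems:
        raise SystemExit("Migration validation failed:\n" + "\n".join(problems))
    print("Row counts match; downstream application validation can proceed.")
```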

Expect a Disaster, then Plan to Recover

A disaster recovery plan can be quite complex, and you should never wing it. Every organization needs to have a plan for failure in place before executing an enterprise-level server migration. When thinking about a backup strategy, we examine 2 factors: the Recovery Point Objective (RPO) and the Recovery Time Objective (RTO). The RPO defines the point in time to which data must be recoverable from backups; the RTO defines how long it can take until systems are completely back up and running. Ultimately, these two numbers tell us how much data you’re willing to lose in the event of a failure and how long your systems can be down. For some organizations, these variables are government regulated, but for most it’s a numbers game that equates directly to lost revenue.
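As a simple back-of-the-envelope illustration (all numbers are hypothetical, not recommendations), the backup schedule bounds the worst-case data loss: with transaction log backups every N minutes, the worst-case RPO is roughly the log backup interval, and the RTO is bounded by the time to restore and bring applications back online.

```python
# A back-of-the-envelope check that a backup schedule meets RPO/RTO targets.
# All numbers are hypothetical illustrations.

log_backup_interval_min = 15      # transaction log backup every 15 minutes
restore_time_min = 90             # measured time to restore the full + log chain
app_startup_min = 20              # time to bring applications back online

rpo_target_min = 30               # business tolerates losing at most 30 minutes of data
rto_target_min = 120              # business tolerates at most 2 hours of downtime

worst_case_data_loss = log_backup_interval_min          # data since the last log backup
worst_case_downtime = restore_time_min + app_startup_min

print(f"Worst-case data loss: {worst_case_data_loss} min (target {rpo_target_min} min)")
print(f"Worst-case downtime:  {worst_case_downtime} min (target {rto_target_min} min)")

assert worst_case_data_loss <= rpo_target_min, "Backup schedule does not meet the RPO"
assert worst_case_downtime <= rto_target_min, "Recovery steps do not meet the RTO"
```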

Some organizations simply can’t take an extended maintenance window, so we use technology such as database mirroring or transaction log shipping to act as a hot-standby server (the data is synchronized between the systems throughout the pre-migration phase). We wait until the final moments before the actual migration, shut down all applications, then flip the switch and make the move during a defined window of time. In reality, we’re simply turning on a server we’ve pre-staged. There are a few strategies like this that significantly reduce the risk of downtime, and we work closely with our clients so the business is not affected in a significant way.
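A cutover along these lines, sketched here with hypothetical helper functions rather than any specific tooling, boils down to an ordered sequence: stop writes, ship and apply the final log backup, bring the standby online, and repoint applications, all while timing the work against the agreed maintenance window.

```python
import time

# Hypothetical helpers standing in for real tooling (backup jobs, connection
# string updates, application orchestration). Each would be replaced by the
# organization's own scripts or migration tool.

def stop_application_writes():
    print("Applications stopped; no new transactions accepted.")

def take_and_ship_final_log_backup():
    print("Final transaction log backup taken and copied to the standby.")

def restore_final_log_and_recover_standby():
    print("Standby applied the final log and is online for read/write.")

def repoint_applications(new_server: str):
    print(f"Connection strings updated to {new_server}; applications restarted.")

def cutover(standby_server: str, window_minutes: int = 30):
    """Run the cutover steps and report whether they fit the maintenance window."""
    start = time.monotonic()
    stop_application_writes()
    take_and_ship_final_log_backup()
    restore_final_log_and_recover_standby()
    repoint_applications(standby_server)
    elapsed_min = (time.monotonic() - start) / 60
    print(f"Cutover completed in {elapsed_min:.1f} minutes "
          f"(window: {window_minutes} minutes).")

cutover("standby-sql-01")
```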

Test your theory, then test it some more. Then have someone else test it just to be sure

It’s not enough to simply have a plan. All too often, organizations go through their migration checklist and think, “Yep, we have a backup. Check.” Many times they haven’t validated the backup by running a test restore. The only good backup is one that has been restored and validated to work the way it’s meant to.
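A minimal sketch of that discipline, using hypothetical helper functions rather than a specific backup tool, is to restore each backup to a scratch server on a schedule and run integrity checks against the restored copy; a backup only counts as “good” once that loop has passed.

```python
# A minimal restore-test loop. The helper functions are hypothetical stand-ins
# for whatever restore and integrity-check commands the platform provides.

BACKUPS = ["orders_full_2024-01-01.bak", "orders_log_2024-01-01_0600.trn"]  # placeholders

def restore_to_scratch(backup_file: str) -> str:
    """Restore the backup onto a scratch server; return the restored database name."""
    print(f"Restoring {backup_file} to the scratch server...")
    return "orders_restore_test"

def run_integrity_checks(database: str) -> bool:
    """Run consistency checks and a few known queries against the restored copy."""
    print(f"Running integrity checks on {database}...")
    return True  # in practice, inspect the checker's actual output

def validate_backups(backups) -> None:
    for backup in backups:
        database = restore_to_scratch(backup)
        if not run_integrity_checks(database):
            raise RuntimeError(f"Backup {backup} failed validation; it is not a good backup.")
    print("All backups restored and validated.")

validate_backups(BACKUPS)
```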

There are many tools on the market to help with the migration process, whether it be from on-premises to the cloud or even from the cloud back to on-premises. Third-party tools are great: they can reduce the workload for a resource-strapped team and provide guidance through a complicated process. But no matter which tool you use, it can’t make up for the major mistakes we’ve seen result from a lack of proper planning and testing. An untested plan often produces an unexpected result.

Take the case of MySpace, which lost 12 years of data due to a server migration error last week. Not only did the company lose credibility in the market, but years of music and cultural artifacts were deleted in one technical process, something that never should have happened. We have to ask how one company can fail on one backup and lose 12 years’ worth of data. There must have been multiple issues happening at the same time: the migration wasn’t planned well and it wasn’t tested, and if you’re losing 12 years’ worth of data, there’s a fundamental disaster recovery problem that needs to be addressed.