As the Director of Cloud and Virtualization Operations at Datapipe Inc., my day consists of overseeing and managing cloud technologies and operations. We manage and support multiple cloud and virtualization platforms, all of which need some sort of storage, whether it’s local to the hosts, shared, or global object-based storage.
Any given day is filled with global operational activities that demand attention and quick resolution by our support teams, including new deployments, upgrades, downgrades, capacity management, configuration changes, feature enhancements, and other customer-driven changes.
In addition to our normal daily activities, occasionally we need to find time to respond to unplanned outages or system failures. In a cloud environment, a single system failure may or may not affect the active workloads, depending on the type of outage, but if the storage subsystem is impacted, then all workloads hosted on that storage are impacted.
Let’s face it: when hardware fails, it impacts the workloads running on that hardware, so we are constantly repairing and replacing hardware. Underneath all of this sits storage. Storage is where data lives, and it needs to be available regardless of the status of the hardware tiers above it (i.e., the servers, hypervisors, etc.). In a virtualized environment such as the Datapipe cloud, we need this storage up, running, and performing at all times.
Let me give you some history around our cloud storage experience. Uptime, as you may know, is the most critical part of any computer system. Five nines (99.999%) uptime is very important in multi-tenant service provider environments, as getting a downtime window that is agreeable to every customer hosted in this environment is next to impossible and should not be a requirement in this day and age.
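To put five nines in concrete terms, a quick back-of-the-envelope calculation shows just how small the downtime budget is (a minimal sketch; the 365-day year is a simplifying assumption):

```python
# Downtime budget implied by an availability target.
# Illustrative helper, not part of any vendor tooling.

def downtime_minutes_per_year(availability: float) -> float:
    """Minutes of allowed downtime in a 365-day year at a given availability."""
    return (1.0 - availability) * 365 * 24 * 60

five_nines = downtime_minutes_per_year(0.99999)
print(f"Five nines allows ~{five_nines:.2f} minutes of downtime per year")
# → Five nines allows ~5.26 minutes of downtime per year
```

That works out to roughly 26 seconds per month, which is why a maintenance window agreeable to every tenant is effectively off the table.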
We have used several white-box storage technologies from various vendors that promised performance and in-line upgrades with minimal downtime. What we found was that commodity storage is cheap and easy to deploy, but that’s about as far as it goes.
As soon as you put demanding workloads on a shared storage environment, the reality starts to surface quickly. The number one concern is noisy neighbors, where one workload competes for IOPS and throughput against the rest of the workloads, causing system-wide performance issues. We looked at many different storage solutions and even implemented them in lab environments to gauge performance and reliability. Our search came to a pleasant end when we discovered SolidFire.
We chose SolidFire for its agility in provisioning, full API access, and ease of scale, and because scaling it is non-impactful to the current deployment. SolidFire eliminated our noisy-neighbor issues, guarantees IOPS per volume, and, most importantly, provides second-to-none support. Their technical support team leaves nothing to chance, proactively monitoring multiple storage units in multiple data centers around the globe. We found an answer to all our heartaches with a single vendor: SolidFire.
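As an illustration of the API access and per-volume guaranteed IOPS described above, here is a minimal sketch that builds a `CreateVolume` request body for SolidFire’s JSON-RPC Element API. The volume name, account ID, sizes, QoS numbers, and endpoint version are illustrative assumptions; consult the Element API reference for the versions and limits your cluster supports.

```python
# Sketch: build a SolidFire Element API CreateVolume request with
# per-volume QoS (minIOPS is the guaranteed floor). All specific
# values below are hypothetical examples, not recommendations.
import json

def create_volume_request(name, account_id, size_gb, min_iops, max_iops, burst_iops):
    """Return the JSON-RPC body for a CreateVolume call with QoS settings."""
    return {
        "method": "CreateVolume",
        "params": {
            "name": name,
            "accountID": account_id,
            "totalSize": size_gb * 1024**3,  # the API expects bytes
            "enable512e": True,
            "qos": {
                "minIOPS": min_iops,      # guaranteed floor per volume
                "maxIOPS": max_iops,      # sustained ceiling
                "burstIOPS": burst_iops,  # short-term burst allowance
            },
        },
        "id": 1,
    }

payload = create_volume_request("tenant-a-vol01", 42, 100, 1000, 5000, 8000)
print(json.dumps(payload, indent=2))
# POST this body to https://<cluster-mvip>/json-rpc/<version> with cluster
# admin credentials; the response includes the new volumeID.
```

Because every volume carries its own QoS floor, one tenant bursting cannot starve another, which is exactly the noisy-neighbor fix described above.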
I am sure ZFS and semi-SSD-based storage have a place in dedicated development and backup arenas, but not in high-performance, reliability-demanding deployments such as Datapipe’s Stratosphere cloud. Now when I need to do a code update or add capacity on a cloud storage subsystem, I open a tunnel and SolidFire does the rest. No outage, no maintenance window required. It’s totally worry-free, allowing me to reduce the complexity of my operations and focus instead on those other daily operations that aren’t as streamlined and do cause me worry!