It’s been years since I last used my good ol’ 35mm film Canon EOS Rebel G. Back then, you would wait days for your film to be developed to see whether you got the shots you wanted. When you did finally get shots you liked, you went looking for albums to store all those prints. Today, many photographers are faced with a similar but different storage problem. Instead of looking for shelf space for all those albums, photographers are looking for efficient, high-performance, cost-effective data storage devices and solutions that meet their complex needs. This article doesn’t come close to covering all the options out there, but I’ve tried to highlight some of the more mainstream options we hear about so often.
A little about me before you take my word for it. No, I’m not a long-time professional photographer. Instead, I’m a new, up-and-coming photographer who got started in this line of business when I began shooting dancers on stage for my wife’s dance studio about 12 years ago. I started out with an old Nikon D90, and a lot has changed since then. My real background is in the IT world: I’ve been a systems administrator for about 30 years now. I’m a Doctor of Business Administration in IT Management student at Capella University, and I hold a Master of Science in Information Assurance and Security from Norwich University, a Bachelor of Science in Management of Information Technology from DeSales University, and three Associate of Applied Science degrees from Northampton Community College, also in the IT field. Let’s just leave it as … I’m a born IT nerd and I LOVE the big data realm.
Let’s face it, you LOVE that shiny new 45.7MP Nikon Z7 II (Yes! I’m a die-hard Nikon shooter), or you went even bigger with the 61.0MP Sony A7R IV. I bet you were thinking, I’m going to love the quality of these images and how much I’ll be able to re-compose them, since they’re so HUGE and I have all these pixels to play with. But I bet the minute you started downloading the images you were thinking, holy smokes! What am I going to do with all these files? How am I possibly going to store them all? We’re talking about RAW images in the 50MB-per-shot range. To put it in perspective, a standard 1,249-image senior session that I do comes to an AVERAGE of 40GB. Let’s say you’re just a weekend shooter doing 4 sessions every weekend, or about 208 sessions a year; that’s just over 8TB of data in just ONE year. Want to keep all your images for the past… 3 years? Well, now you’re talking about 24TB of USABLE drive space (we’ll come back to usable vs. raw storage in a bit).
Your first bit of homework
To really dive into storage solutions and figure out which one works best for you, you first need to identify a few things.
- How big is a single image from my camera in RAW uncompressed format?
- How many shots do I take in a single session?
- Do I cull my images and keep only the good images or do I keep everything?
- How many sessions do I intend to shoot in a year?
- How long do I hope to hold ACTIVE images in a local repository?
Once you’ve answered these, you should have a pretty good idea of how much data storage you actually need. Take the number of images you get from a single session (or how many you intend to keep), times the size of a single image, times the number of sessions you intend to do in a year, times the number of years you want to maintain access to those images. Once you get that number, divide it by 1024. This converts MBs of storage to a more usable number (GBs). If you’re over a thousand, which you most likely are, divide by 1024 again to get TBs. This is most likely the number we’re going to work with. BUT... don’t stop there. Remember, you should always account for growth. Do you think you’re gonna get a new camera? How about more sessions in a year? So, let’s just take that number and add … 25%, or whatever feels comfortable to you.
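If you’d rather not do that math by hand, here’s a quick Python sketch of the same recipe. The function name is just my own illustration, and so are the example inputs: I’m assuming the 1,249-image session from earlier averages about 32.8MB per image (which is what 40GB spread over 1,249 shots works out to), 4 sessions every weekend for 52 weekends, and 3 years of retention.

```python
def storage_needed_tb(images_per_session, mb_per_image, sessions_per_year,
                      years, growth=0.25):
    """Estimate usable storage with the recipe above: multiply everything out
    in MB, divide by 1024 twice (MB -> GB -> TB), then add growth room."""
    total_mb = images_per_session * mb_per_image * sessions_per_year * years
    total_tb = total_mb / 1024 / 1024
    return total_tb * (1 + growth)

# Illustrative inputs (assumptions): 1,249 images at ~32.8MB each (~40GB per
# session), 4 sessions every weekend for 52 weekends, kept for 3 years.
print(f"{storage_needed_tb(1249, 32.8, 4 * 52, 3, growth=0):.1f} TB before growth")
print(f"{storage_needed_tb(1249, 32.8, 4 * 52, 3):.1f} TB with 25% growth")
```

That lands at roughly 24TB before growth and roughly 30TB after. Remember that this is per dataset: your active and secondary storage each need that much.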
Before we get into devices, I feel it’s imperative that we talk briefly about how you should protect your data, not just where you’ll store it. If you talk to any IT professional, or anyone who has ever been through a data-loss event, I’d bet they would tell you to follow the 3-2-1 backup strategy.
The 3-2-1 backup strategy simply states that you should have 3 copies of your data (your production data and 2 backup copies) on two different types of media (for example, disk and tape), with one copy off-site for disaster recovery. As photographers, we’re generally making some sort of change to our images once they hit our computers, so we always want to be in the habit of keeping 3 copies of our data. The first copy is your actively used and manipulated copy. This is the one most of us would be using with Lightroom or whatever photo software you use. The second copy would normally also be on-site and would house all your original images plus your manipulated files (PSDs, XMP files from Lightroom, etc.). This second copy is your first failsafe in case you accidentally delete or overwrite a file in your active dataset. Remember! This second dataset should NEVER be used as your active dataset. If you need to manipulate a file, always do it in your first, active dataset.
The third dataset is a major talking point reserved for a different article altogether. It’s all about keeping a copy stored offsite. Whether that copy lives on a USB hard drive in the trunk of your car, at a neighbor’s or family member’s house, or on an enterprise data storage provider’s cloud storage service, you want it someplace other than where you work on your data.
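To make the 3-2-1 rule concrete, here’s a small Python sketch that checks a backup plan against it. The class and field names are hypothetical. Deciding what counts as a "different media" is up to you: here I’ve simply labeled the two on-site copies as "disk" and the off-site copy as "cloud."

```python
from dataclasses import dataclass

@dataclass
class DataCopy:
    """One copy of your photo library (names and fields are illustrative)."""
    label: str
    media: str     # kind of media holding it, e.g. "disk", "tape", "cloud"
    offsite: bool  # True if it lives somewhere other than where you work

def meets_3_2_1(copies):
    """Check a backup plan against the 3-2-1 rule."""
    enough_copies = len(copies) >= 3                    # 3 copies of the data
    enough_media = len({c.media for c in copies}) >= 2  # on 2 different media
    has_offsite = any(c.offsite for c in copies)        # 1 copy off-site
    return enough_copies and enough_media and has_offsite

plan = [
    DataCopy("active (Lightroom working set)", "disk", offsite=False),
    DataCopy("secondary (originals + edits)", "disk", offsite=False),
    DataCopy("offsite", "cloud", offsite=True),
]
print(meets_3_2_1(plan))      # True
print(meets_3_2_1(plan[:2]))  # False: only two copies, none off-site
```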
Keep in mind that you need sufficient storage on both your active and secondary datasets to keep copies of everything you need. In the example above, following my 25% rule, I would need 30TB of usable drive storage in my active storage unit as well as in my secondary storage unit.
The real reason why you need backups
I’ll talk about this a bit more when I get to RAID devices in the next section, but keep in mind that backups aren’t just for mechanical or electrical problems; more importantly, they protect you from human error. Over my 30 years of experience, the biggest culprit of data loss hasn’t been server failures or unexpected disk failures, but human error. Clicking and dragging a folder to an unknown destination, deleting a folder you thought was empty, or overwriting files with edits when they were in fact the originals. Poof! You just lost all that time and effort! Having that second and third dataset can really be a lifesaver.
A second article is coming soon that covers data backup options like drive-to-drive, drive-to-NAS, NAS-to-drive, NAS-to-cloud, computer-to-cloud, etc.
MTBF and AFR
Mean time between failures (MTBF) is the predicted elapsed time between inherent failures of a mechanical or electronic system, during normal system operation. MTBF can be calculated as the arithmetic mean (average) time between failures of a system. The term is used for repairable systems, while mean time to failure (MTTF) denotes the expected time to failure for a non-repairable system.
Seagate and many other manufacturers no longer use the industry-standard Mean Time Between Failures (MTBF) to quantify average disk drive failure rates. Instead, hard drive manufacturers are changing to another standard: Annualized Failure Rate (AFR). MTBF has proven useful in the past, but it is flawed. MTBF is a statistical term that expresses reliability in power-on hours (POH) and is often a specification associated with hard drive mechanisms.
It was originally developed for the military and can be calculated several different ways, each yielding substantially different results. It is common to see MTBF ratings between 300,000 and 1,200,000 hours for hard disk drive mechanisms, which might lead one to conclude that the specification promises roughly 34 to 137 years of continuous operation. This is not the case! The specification is based on a large (statistically significant) number of drives running continuously at a test site, with the data extrapolated according to various known statistical models.
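Here’s a quick Python sketch of the difference between the tempting reading of MTBF and what it actually describes. The function names and the 1,000-drive fleet are my own illustration: MTBF is hours of operation per failure across a population, so dividing a fleet’s total drive-hours by MTBF gives the number of failures you’d expect in a year.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours of continuous operation

def naive_years(mtbf_hours):
    """The misleading reading: treat MTBF as one drive's lifespan."""
    return mtbf_hours / HOURS_PER_YEAR

def expected_failures_per_year(mtbf_hours, drive_count):
    """The population reading: a fleet racks up drive_count * 8,760
    drive-hours per year, and MTBF is hours per failure, so dividing
    gives the expected number of failures in that year."""
    return drive_count * HOURS_PER_YEAR / mtbf_hours

for mtbf in (300_000, 1_200_000):
    print(f"{mtbf:,} h MTBF: naive {naive_years(mtbf):.0f} years, "
          f"~{expected_failures_per_year(mtbf, 1_000):.1f} failures/yr per 1,000 drives")
```

Even a 1,200,000-hour MTBF works out to about 7.3 expected failures a year in a hypothetical 1,000-drive fleet. That’s not "137 years" for any one drive; yours could just as easily be one of those seven.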
The MTBF is estimated from the failure rate observed over a few weeks or months, so it is not representative of how long your individual drive, or any individual product, is likely to last. Nor is the MTBF a warranty; it represents the relative reliability of a family of products. A higher MTBF merely suggests a generally more reliable and robust family of mechanisms (depending on the consistency of the statistical models used). Historically, the field MTBF, which includes all returns regardless of cause, is typically 50-60% of the projected MTBF.
Seagate and WD’s new standard is AFR. AFR is similar to MTBF and differs only in units. While MTBF is the probable average number of service hours between failures, AFR is the probable percentage of failures per year, based on the manufacturer’s total number of installed units of a similar type. In other words, AFR estimates the percentage of products that will fail in the field due to a supplier cause in one year. Seagate has transitioned from average measures to percentage measures.
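Because the two figures differ only in units, you can convert between them if you assume a model. The sketch below assumes a constant failure rate (the exponential model). That assumption is mine, not necessarily how Seagate or WD compute their published numbers, and real drives tend to fail more often very early and very late in life. The function names are illustrative.

```python
import math

HOURS_PER_YEAR = 8_760

def afr_from_mtbf(mtbf_hours):
    """AFR as a fraction, assuming a constant failure rate (exponential
    model): the chance a drive fails within one year of 24/7 operation."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

def mtbf_from_afr(afr):
    """Inverse conversion: AFR fraction back to MTBF hours."""
    return -HOURS_PER_YEAR / math.log(1 - afr)

spec_mtbf = 1_200_000
for label, mtbf in (("spec sheet", spec_mtbf),
                    ("field, at 50% of spec", spec_mtbf * 0.5)):
    print(f"{label}: AFR of about {afr_from_mtbf(mtbf) * 100:.2f}%")
```

For large MTBF values, AFR is close to 8,760 / MTBF. So a drive that only delivers about half its projected MTBF in the field, per the 50-60% figure above, has roughly double the AFR: about 1.45% instead of 0.73%.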
MTBF quantifies the probability of failure for a product; however, when a product is first introduced, this rate is often a predicted number, and only after a substantial amount of testing or extensive use in the field can a manufacturer provide demonstrated, or actual, MTBF measurements. AFR makes it easier to set service plans and spare-unit strategies.
Hard drive reliability is closely related to temperature. Drives are designed around an ambient operating temperature of 86°F (30°C). Temperatures above 122°F (50°C) or below 41°F (5°C) decrease reliability. Directed airflow of up to 150 linear feet per minute is recommended for high-speed drives.
Hard Drive Manufacturer Reliability Results
Whenever you’re in the market for storage drives, it’s in your best interest to check Backblaze’s reliability results. In case you don’t know who Backblaze is, they’re what I would consider an open-source, community-driven cloud storage provider. They use open-hardware storage servers supplied by a company called 45Drives (more on them later) and drives from all sorts of manufacturers to run their data storage systems. Check out their latest Drive Stats results on the Backblaze website.
Across the thousands of drives in their configurations, they keep a record of every single drive failure and showcase it in their quarterly results. This is by far the first place I look before I buy drives of any type for myself or my customers.
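Backblaze’s quarterly numbers are AFRs calculated from real drive-days, not from a manufacturer’s MTBF model. Here’s a minimal Python sketch of that calculation as Backblaze describes it in its Drive Stats reports. The function name is mine, and the drive count and failure numbers in the example are made up for illustration.

```python
def annualized_failure_rate(failures, drive_days):
    """AFR in percent from drive-days, the way Backblaze describes the
    calculation in its Drive Stats reports:
        AFR = failures / (drive_days / 365) * 100
    Counting drive-days (one drive running for one day) weights drives that
    were added or retired partway through the period fairly."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return failures / drive_years * 100

# Hypothetical drive model: 10,000 drives running a full 90-day quarter,
# with 25 failures during that quarter.
print(f"{annualized_failure_rate(25, 10_000 * 90):.2f}%")
```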
Now that you have an idea of what MTBF and AFR are, we should talk a little about RAID and how it helps fill the gaps that MTBF/AFR leave, giving you some peace of mind through hardware redundancy. One BIG note to understand: RAID IS NOT A BACKUP!!! Let me say that one more time:
RAID IS NOT A BACKUP METHOD!
OK, now that that’s out of the way, let’s talk about what RAID actually is.
Storage Devices Overview
With so many different options out there, how in the world do you know where to even start? Well, let’s start with the basics and go from there. I’ll briefly talk about some of the advantages and disadvantages (in my humble opinion) of each as they relate to a professional photographer’s workflow.
The dreaded USB hard drive
We’ve all been there at one point or another. We’ve used those single external USB spinning hard drives as either our active or secondary datasets. The USB storage device is probably one of the most versatile storage options out there. Ranging from 2TB to 14TB, at price points from $59.99 to $3327.99 respectively, USB drives have always been one of the easiest ways to expand data storage. Their cost, connectivity options, and compatibility remain some of the biggest advantages of these simple solutions. However, they have a very real disadvantage. Most of these external units hold what we IT professionals classify as “B” grade drives: drives that would typically be considered “desktop” rated and therefore not made for longevity or large workloads. Keep in mind that almost all of the high-capacity devices like this use spinning disks, have a significantly lower MTBF (or higher AFR) than enterprise-class drives, and aren’t built for the workloads those drives are rated for.
Since these all use spinning disks, you should be very aware of the specified MTBF/AFR values of each and every drive you buy. Understand that under real-world conditions, with these drives plugged in all the time, their actual service life could very well be 3 years or less. At the 3-year point, you should be looking at moving your data to a new drive. It should also be noted that not all drives are created equal; some could very well last less than 1 year of real-world use.
If you want the simplicity of a USB drive but need something more robust, take a look at the WD My Book Duo.
That all being said, USB drives still have their proper applications and use cases. For example, I currently have seven 14TB WD Elements drives, which fulfill my current requirements for offsite storage. I should note that this is just ONE way I’m meeting those offsite requirements, and I’m currently looking at other options.
Currently, I’m using six 14TB Seagate Exos enterprise-class hard drives in a RAID 5 array (more on RAID later). I then replicate that data to a NAS device with eight 10TB drives. For my offsite needs, I rotate these 14TB WD Elements drives monthly to a neighbor’s house.
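Since I promised to come back to usable vs. raw storage, here’s a quick Python sketch of what that six-drive RAID 5 array actually gives you. It assumes standard RAID 5, where one drive’s worth of capacity goes to parity, and it ignores filesystem and NAS overhead. The function names are just illustrative. I’ve left the NAS out because its RAID level isn’t stated here.

```python
def raid5_usable_tb(drive_count, drive_tb):
    """RAID 5 spreads one drive's worth of parity across the array, so
    usable capacity is (n - 1) x drive size, before filesystem overhead."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drive_count - 1) * drive_tb

def decimal_tb_to_binary_tb(tb):
    """Drive makers sell decimal terabytes (10**12 bytes); the divide-by-1024
    recipe earlier in this article works in binary units (2**40 bytes)."""
    return tb * 10**12 / 2**40

usable = raid5_usable_tb(6, 14)  # six 14TB drives
print(f"raw {6 * 14}TB, usable {usable}TB "
      f"(~{decimal_tb_to_binary_tb(usable):.1f}TB in divide-by-1024 units)")
```

So 84TB of raw disk becomes 70TB usable, or about 63.7TB in the divide-by-1024 units used for the sizing estimate. That comfortably covers a 30TB target. Also note that a single "14TB" drive holds only about 12.7TB in those same units.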
Pros: Cost-effective, versatile, operating system agnostic
Cons: Subpar MTBF/AFR, little to no hardware redundancy, clunky to keep track of what’s on each drive when a dataset spans more than one, less-than-desired data throughput, and terrible overall performance for real-time use with Adobe Lightroom.