Data tapes are always quoted with a pair of storage capacities. For example, LTO-7 capacity is listed as 6.0 TB / 15.0 TB. The lower number is the native capacity and the higher number is the compressed capacity. Here we look at what the difference is and how the compression is actually achieved.
- LTO, 3590, 3592 and DDS tape capacities are quoted in pairs.
- Lower value = native capacity. It's guaranteed.
- Higher value = compressed capacity. It's data dependent.
- Compression works by eliminating the redundancies in your data.
Ratios - The Easiest Part of Compression
Let's look at LTO-7 tape media, which has a native capacity of 6.0 TB. Its compressed capacity is 15.0 TB, which is 2.5 times the native figure.
This multiplier is the compression ratio. The LTO consortium assumes that your data will compress 2.5 times, so a 6.0 TB tape can store up to 15.0 TB of data. Up to LTO-5, a compression ratio of 2 was assumed; the 2.5 figure applies from LTO-6 onward. Sony's AIT SDX tape drives already advertised a compression ratio of about 2.6 back in 1999.
- Compressed capacity = Native capacity x Compression Ratio.
This is the easiest part, but also the trickiest. You may or may not achieve the assumed compression ratio, because it depends on your data.
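The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `effective_capacity_tb` is made up for this example, and the 6.0 TB native figure is the published LTO-7 value. The ratios in the loop are sample inputs, not measurements.

```python
def effective_capacity_tb(native_tb: float, ratio: float) -> float:
    """Effective capacity = native capacity x achieved compression ratio."""
    if native_tb < 0 or ratio <= 0:
        raise ValueError("native_tb must be >= 0 and ratio must be > 0")
    return native_tb * ratio


LTO7_NATIVE_TB = 6.0  # published LTO-7 native capacity in TB

# Sample ratios: 1.0 = incompressible data, 2.5 = the assumed LTO-7 ratio.
for ratio in (1.0, 2.0, 2.5):
    capacity = effective_capacity_tb(LTO7_NATIVE_TB, ratio)
    print(f"ratio {ratio}:1 -> {capacity:.1f} TB on one tape")
```

Plugging in the ratio you actually measure on your own data, rather than the assumed 2.5, gives a more realistic estimate of how much one tape will hold.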
Compression Is Data Dependent
Compression works by removing redundancies in the data. If your data is highly repetitive, it will be highly compressible. If the data has no repeated patterns, it will be less compressible. If you are storing a ZIP, RAR, or gzipped tar (.tar.gz) file, for example, you will see very little compression because the data is already compressed. You might still see it shrink slightly because the drive uses a different compression algorithm.
So the most compressible data is data with repeating, redundant patterns. If your data has such patterns, you may see a compression ratio of 7 or 8. One example is a set of virtual machines that all contain the same OS files: that identical data does not have to be stored in full each time, so it compresses well.
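The effect of redundancy on the ratio can be tried out with a small Python sketch. Note the assumption: it uses Python's `zlib` (DEFLATE) purely as a stand-in, not the algorithm a tape drive implements, so the numbers will differ from what a drive achieves. Random bytes play the role of already-compressed data, and `compression_ratio` is a helper defined only for this example.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size (higher means more compressible)."""
    return len(data) / len(zlib.compress(data))


# Highly repetitive data: the same short pattern repeated many times.
repetitive = b"the same OS file block " * 50_000

# Random bytes have no patterns to remove, much like already-compressed files.
random_like = os.urandom(len(repetitive))

print(f"repetitive data: {compression_ratio(repetitive):.1f}:1")
print(f"random data:     {compression_ratio(random_like):.2f}:1")
```

The repetitive input should report a very high ratio, while the random input stays at about 1:1, which is the same behaviour you see when writing already-compressed archives to tape.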
Native Capacity Is Guaranteed, Compressed Capacity Is Data Dependent
Native capacity is the guaranteed capacity of the tape (it is quoted in TB, not TiB) that you get irrespective of whether your data is compressible or not. Compressed capacity depends on the nature of your data. You may see a compression ratio of 1 (no compression) for data that's already compressed, or 8 to 10 for data that's highly repetitive.
The good news is that all the compression happens transparently, using your tape drive's processing power. So you don't have to worry about compression or about it slowing down your server; all of that is taken care of by the tape drive.
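Because the native figure is in decimal terabytes, tools that report sizes in TiB will show a smaller number for the same tape. A minimal Python sketch of the conversion follows; the helper name `tb_to_tib` is made up for this example and 6.0 TB is the published LTO-7 native capacity.

```python
TB = 10**12   # decimal terabyte, the unit used for tape capacities
TIB = 2**40   # binary tebibyte, the unit many operating systems report


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes to binary tebibytes."""
    return tb * TB / TIB


native_tb = 6.0  # published LTO-7 native capacity
print(f"{native_tb} TB = {tb_to_tib(native_tb):.2f} TiB")
```

The gap is roughly 9 percent, so a tape that looks "short" in a TiB-based report may still be delivering its full native capacity.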