Big Data and Open-Source Protection



Great presentation by Mark Coletta, Senior Product Manager, NetBackup, Veritas, at Percona Live. Mark shared his best practices for protecting today's big data and open-source workloads.

Why is this important? Big data is driving significant open-source database adoption. In conjunction with digital transformation initiatives, enterprises are pursuing hybrid multi-cloud infrastructure to avoid vendor lock-in, maintain agility, and reduce cost.

NoSQL/open-source databases are now the norm. 78 percent of organizations report using NoSQL or open source databases. 50 percent of organizations are using Hadoop/Apache HBase. Here’s how the databases currently in use stack up:

  • MySQL = 59%
  • MongoDB = 54%
  • PostgreSQL = 27%
  • Cassandra = 19%
  • MariaDB = 15%

Data protection challenges stem from big data analytics (identified by 40 percent of respondents), open-source databases (28 percent), virtualized environments (16 percent), public cloud (12 percent), and hyper-converged infrastructure (4 percent).

Governance has become an issue for corporate management. As such, business strategy now includes policy management, risk identification, evaluation, reporting/auditing, compliance, and measures to ensure conformity of policies and laws. You need to think about how governance and risk fit together to meet compliance requirements. Are you able to remove data if asked?

Given the criticality of data to the organization, companies have to think about natural disasters with a regional impact, like hurricanes, tornadoes, floods, and fires. They must have backups available in the event of a natural disaster.

Likewise, there are localized business impacts: unplanned outages caused by a power failure, network outage, software error, or cybersecurity incident such as ransomware. Attacks are becoming more frequent, and according to the FBI, ransomware attacks cost companies an average of $2.3 million.

There are a number of data protection strategy considerations:

  • Downtime – mean time to recover, and the business cost of a site being down while customers go elsewhere
  • Protection/recovery tasks – and the budget/cost to perform them
  • Infrastructure – how much redundancy and storage, how big the network pipes are, and how to handle scale
  • Complexity and management – how many DBAs and network administrators are required

Mark suggests thinking about two ends of the spectrum: recovery point objective (RPO) and recovery time objective (RTO). For the recovery point, identify when the incident occurred and how far back you have to go to recover (i.e., the last time you had usable data). For recovery time, ask how long it will take to make the system fully operational again. You need to evaluate priorities from a business application perspective.
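To make RPO and RTO concrete, here is a minimal sketch. The function names and figures are illustrative assumptions, not from the talk:

```python
from datetime import timedelta

def worst_case_rpo(backup_interval: timedelta) -> timedelta:
    """Worst-case data loss: an incident strikes just before the next
    backup, so you can only roll back to the previous one."""
    return backup_interval

def total_rto(detect: timedelta, restore: timedelta, validate: timedelta) -> timedelta:
    """Time until fully operational: detect the failure, restore the
    data, then validate that the system actually works."""
    return detect + restore + validate

# Nightly backups mean up to 24 hours of data can be lost.
print(worst_case_rpo(timedelta(hours=24)))   # 1 day, 0:00:00
# 30 min to detect + 3 h to restore + 30 min to validate = 4 h of downtime.
print(total_rto(timedelta(minutes=30), timedelta(hours=3), timedelta(minutes=30)))   # 4:00:00
```

Tightening either number usually costs money: a shorter RPO means more frequent backups or replication, and a shorter RTO means more standby infrastructure.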

Mark ran through three types of protection/recovery to evaluate:

Manual Backup/Recovery

This may be as simple as running a script or plugging in a drive.

Pros:


  • Fewer infrastructure resources

  • Less planning is required

  • Upfront costs are lower.


Cons:

  • You are not protected against human error or malicious activity

  • The data you lose may be unrecoverable or unable to be recreated

  • There is no compliance verification

  • Customer confidence will be low

  • RTO is high and there is little to no RPO

  • There’s no long-term retention of data

  • It’s very time-consuming for personnel to be doing repetitive, manual processes
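As a concrete (and deliberately bare-bones) illustration of the manual approach, a "backup" might be nothing more than a timestamped directory copy. This is a hypothetical sketch, not a recommended practice:

```python
import shutil
import time
from pathlib import Path

def manual_backup(data_dir: str, backup_root: str) -> Path:
    """Copy the data directory to a timestamped folder.

    Note everything that is missing: no catalog, no verification,
    no long-term retention policy, and no protection against someone
    deleting the copies -- exactly the drawbacks listed above.
    """
    stamp = time.strftime("%Y%m%d_%H%M%S")
    dest = Path(backup_root) / f"backup_{stamp}"
    shutil.copytree(data_dir, dest)
    return dest
```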


Snapshot/Replication

With snapshots and replication, you build intelligence into your protection capabilities: a snapshot gives you a local point-in-time view of the data, and replication sends data from one host/location to another.


Pros:

  • Multiple copies of the data
  • Minimal or no data loss
  • Short RTO and RPO
  • Failover/failback (automatic or manual)
  • Fast restore from snapshot


Cons:

  • Does not protect against human error or malicious activity
  • High hardware and network cost – double storage space and pipelines are required to move the data around
  • High maintenance costs
  • Higher complexity
  • Snapshot retention requirements
  • Susceptible to data corruption
  • No long-term retention capabilities
  • Planning and design consideration is necessary
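The human-error weakness above can be sketched with a toy key-value store. This is purely illustrative (real systems use copy-on-write snapshots and log shipping, not deep copies):

```python
import copy

class TinyStore:
    """Toy key-value store illustrating snapshots vs. replication."""

    def __init__(self):
        self.data = {}
        self.replicas = []   # other TinyStore instances mirroring every write

    def put(self, key, value):
        self.data[key] = value
        for replica in self.replicas:   # synchronous replication:
            replica.data[key] = value   # near-zero RPO, but double storage

    def snapshot(self):
        """Point-in-time copy; later writes leave it untouched."""
        return copy.deepcopy(self.data)

primary, standby = TinyStore(), TinyStore()
primary.replicas.append(standby)

primary.put("balance", 100)
snap = primary.snapshot()   # freeze the state
primary.put("balance", 0)   # oops -- the mistake propagates to the replica too

print(standby.data["balance"])  # 0   (replication faithfully copied the error)
print(snap["balance"])          # 100 (the snapshot still holds the old value)
```

Replication alone gives you availability, not recoverability: a bad write reaches every copy, while only the point-in-time snapshot preserves the pre-error state.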

Backup Software

Dedicated backup software automates much of the process, such as implementing the policies and procedures of your data protection strategy. The software can cover many different platforms – operating systems, file systems, and databases.
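To show what "setting up policies" might look like, here is a hypothetical policy record of the kind backup software lets you define once and apply across platforms. All field names and values are invented for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BackupPolicy:
    """Hypothetical backup policy definition (illustrative only)."""
    name: str
    schedule_cron: str    # when backups run, in cron syntax
    retention_days: int   # long-term retention window
    verify: bool          # run a verification pass after each backup
    targets: tuple        # hosts or databases the policy covers

nightly = BackupPolicy(
    name="mysql-nightly",
    schedule_cron="0 2 * * *",   # 02:00 every night
    retention_days=365,
    verify=True,
    targets=("db1.example.com", "db2.example.com"),
)
print(nightly.name, nightly.retention_days)  # mysql-nightly 365
```

The point of the record is that retention, verification, and scheduling become declarative policy rather than ad hoc scripts, which is what enables the compliance verification listed below.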


Pros:

  • Long-term retention
  • Resources based on use
  • Compliance verification
  • Confidence in recoverability
  • RTO/RPO based on needs
  • Can leverage snapshots
  • Protects against human error or malicious activities


Cons:

  • Initial investment – time and capital
  • Planning and design consideration
  • Media failure is still a possibility over the long-term
  • Cost to manage – training and education

When deciding what the right strategy is for your business, ask: how much is your data worth, and how much are you willing to risk?

The mean cost of an unplanned outage is $8,851 per minute, or roughly $530,000 per hour (Ponemon, "The Cost of Data Center Outages"). The average outage lasts 3.5 hours. Can your company afford a $1.8 million hit to your bottom line?
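The arithmetic behind that last figure, as a quick sanity check using the per-hour number quoted above:

```python
# Ponemon figures cited above: ~$530,000 per hour, average outage of 3.5 hours.
cost_per_hour = 530_000
avg_outage_hours = 3.5

total = cost_per_hour * avg_outage_hours
print(f"${total:,.0f}")   # $1,855,000 -- roughly the $1.8 million cited
```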
