1. TrilioVault 3.0 Release Notes

July 23, 2018

1.1. TrilioVault 3.0 Release Notes

This document provides information on TrilioVault 3.0, highlighting new features, enhancements and known issues at the time of release.

1.2. Release Scope

TrilioVault release 3.0 introduces new features and capabilities, including support for S3 storage targets, capture of the tenant’s network topology, expanded cloud lifecycle management support with Red Hat director, and more.

1.2.1. New support for S3 storage targets, including AWS, Ceph and Red Hat Ceph Storage (RHCS)

Feature Use Case Customer benefit

Amazon Web Services (AWS) S3 support

Backup to AWS S3
  • Reduce costs by leveraging lower cost cloud storage for backup data
DR with AWS S3 Geo Replication
  • Leverage AWS built-in replication to recover remotely in case of a geographic disaster
Trilio stores directly to S3
  • Eliminate the need for a cloud gateway to transfer local data to the cloud

Ceph with RGW (S3) support

Leverage software defined storage
  • Reduce costs and eliminate vendor lock-in by moving from HW to SW defined storage.
  • Standardize on Ceph storage with Cinder Block as the source and Ceph RGW as the target
DR using Ceph replication
  • Use own infrastructure and Ceph replication capabilities to recover remotely in case of a geographic disaster

1.2.1.1. Description

Since the introduction of Amazon S3, object storage has quickly become the storage of choice for cloud platforms. It offers reliable, highly scalable storage on commodity hardware, and is widely used for archival, backup and disaster recovery, web hosting, documentation and other use cases.

TrilioVault incorporates Linux’s Filesystem in Userspace (FUSE) with patent-pending processing to optimize data handling on the object store. As a result, TrilioVault provides the same functionality with an S3 target as with an NFS backup target, including:

  • Incremental forever
  • Snapshot retention policy with automatic retirement
  • Synthetically full, mountable snapshots
  • Efficient restores with minimal staging area requirements
  • Scalable solution that linearly scales with compute nodes without adding any performance or data bandwidth bottlenecks
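
The relationship between "incremental forever" and "synthetically full" snapshots can be illustrated with a minimal Python sketch. This is not TrilioVault's implementation; the block-map model and the names `synthesize_full` and `retire_oldest` are assumptions made purely for illustration.

```python
from typing import Dict, List

# Hypothetical model: each snapshot records only the blocks that changed
# since the previous snapshot, keyed by block index.
Snapshot = Dict[int, bytes]


def synthesize_full(chain: List[Snapshot]) -> Snapshot:
    """Overlay snapshots oldest-first so that later writes win."""
    full: Snapshot = {}
    for snap in chain:
        full.update(snap)
    return full


def retire_oldest(chain: List[Snapshot]) -> List[Snapshot]:
    """Retire the oldest snapshot by folding it into its successor."""
    if len(chain) < 2:
        return list(chain)
    merged = dict(chain[0])
    merged.update(chain[1])
    return [merged] + chain[2:]


chain = [
    {0: b"A0", 1: b"B0", 2: b"C0"},  # initial full backup
    {1: b"B1"},                      # incremental: block 1 changed
    {2: b"C2"},                      # incremental: block 2 changed
]
latest_view = synthesize_full(chain)
```

In this model, retiring the oldest snapshot does not change what the latest snapshot looks like, which is the property a retention policy with automatic retirement relies on.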

1.2.2. New backup and restore of tenant’s private network topology

Feature Use Case Customer benefit

Tenant’s Network Topology Restore

Disaster Recovery, Cloud Migration
  • Facilitate recovery of all tenant private networks
  • Reduce recovery time with tenant self-service, automated action

1.2.2.1. Description

Another milestone in release 3.0 is the ability to protect the tenant’s network space. With this, TrilioVault helps tenants recover the entire network topology, including:
  • Networks
  • Subnets
  • Routers
  • Static Routes
  • Ports
  • Floating IPs

Taking advantage of this additional backup could not be simpler, as tenants have nothing to do. The tenant’s entire network topology is automatically included in every snapshot of every workload. This ensures the data is there when needed, eliminates the risk of human error in configuring another aspect of protection, and keeps things simple.

For recovery, tenants may use a point-in-time snapshot from any workload. A new option under Selective Restore restores the network topology. TrilioVault recreates the entire tenant network topology from scratch, exactly the way it was at the time of backup. It defines the private networks with their subnets, recreates the routers, adds the correct interfaces to each router and adds static routes to the router if applicable. Furthermore, because floating IPs are preserved, users are guaranteed that when a VM which used a floating IP is restored, the same floating IP is available and will be assigned to that VM.

An important consideration in restoring tenants’ networks is that their public network interface may very well have changed. This is always the case in a disaster recovery scenario. For that reason, TrilioVault will stop short of connecting the new private networks to the public one, allowing tenants to take this last step manually.
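
The restore order described above can be sketched in Python as follows. This is only an illustration of the sequencing, not TrilioVault code: the `topology` dictionary layout, the step strings and the function name `restore_plan` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical captured topology; the field names are illustrative only.
topology: Dict = {
    "networks": ["net-a"],
    "subnets": [{"name": "sub-a", "network": "net-a"}],
    "routers": [
        {
            "name": "r1",
            "interfaces": ["sub-a"],
            "static_routes": ["10.1.0.0/24 via 10.0.0.254"],
        }
    ],
}


def restore_plan(topo: Dict) -> List[str]:
    """Build an ordered list of restore steps: networks, subnets, routers."""
    steps: List[str] = []
    for net in topo["networks"]:
        steps.append(f"create network {net}")
    for sub in topo["subnets"]:
        steps.append(f"create subnet {sub['name']} on {sub['network']}")
    for router in topo["routers"]:
        steps.append(f"create router {router['name']}")
        for iface in router["interfaces"]:
            steps.append(f"attach {iface} to {router['name']}")
        for route in router["static_routes"]:
            steps.append(f"add static route {route} to {router['name']}")
    # No external gateway step is generated: connecting the private
    # networks to the public network is left to the tenant.
    return steps


plan = restore_plan(topology)
```

The plan intentionally ends before any connection to the public network, mirroring the manual last step left to tenants.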

Note:

  • To eliminate conflicts, the tenant’s space must have no networking components defined. The restore will fail if any conflict is found, and the network will be reinstated to what it was prior to the attempted restore.
  • As always, Network Topology restore is fully enabled programmatically as well as through the GUI.
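
The fail-and-reinstate behavior in the first note can be modeled with a short Python sketch. It assumes, for illustration only, that a conflict means a component name already exists in the tenant's space; the function name `restore_into` and the set-based state are hypothetical and do not reflect the TrilioVault API.

```python
from typing import List, Set, Tuple


def restore_into(existing: Set[str], snapshot_items: List[str]) -> Tuple[Set[str], bool]:
    """Return (resulting components, success flag)."""
    before = set(existing)   # state to reinstate if the restore fails
    working = set(existing)
    for item in snapshot_items:
        if item in working:
            # Conflict found: abandon the restore and reinstate prior state.
            return before, False
        working.add(item)
    return working, True


# Restore into an empty tenant space succeeds.
empty_result = restore_into(set(), ["net-a", "sub-a", "router-1"])
# Restore into a space that already defines "net-a" fails and rolls back.
conflict_result = restore_into({"net-a"}, ["sub-a", "net-a", "router-1"])
```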

image0

1.2.3. New High Availability Cluster Architecture with easier than ever Configurator

Feature Use Case Customer benefit

TrilioVault built-in HA cluster

Resilience to TVM node failure
  • Business critical backups continue even if one TVM node fails
Resilience to underlying host failure
  • Best practice is to distribute the TVM cluster across 2 or 3 different physical hosts. In this case business critical backups continue even if one underlying host fails.
Load balancing
  • Better utilization of underlying hosts when the cluster is physically distributed

TrilioVault centralized deployment with improved GUI

Centralized deployment process
  • Deployment is easier and faster: In the past, the configuration process had to be repeated for each TVM node. Now, once TVM nodes are installed, deployment of the cluster automatically configures all TVM nodes.
Troubleshooting
  • Easier troubleshooting with a collapsible Ansible output tab
Reconfiguration
  • Easier reconfiguration through a single centralized tab

1.2.3.1. Description

Architecture

Starting with release 3.0, TrilioVault is deployed using a built-in high availability (HA) cluster architecture, supporting a single node or a three-node cluster. The three-node cluster is the recommended best practice for fault tolerance and load balancing. The deployment is HA-ready even with a single node, which allows expansion to three nodes at a later time. For that reason, TrilioVault requires an additional IP for the cluster even in a single-node deployment. The cluster IP (also known as the virtual IP, or VIP) is used to manage the HA cluster and to register the TrilioVault service endpoint in the Keystone service catalog.
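
The sizing and addressing rules above (one or three nodes, plus an additional cluster IP) can be expressed as a small pre-deployment check. The following Python sketch is an assumption-laden illustration, not part of the TrilioVault Configurator; the function name `validate_cluster` and the example addresses are hypothetical.

```python
import ipaddress
from typing import List


def validate_cluster(node_ips: List[str], virtual_ip: str) -> None:
    """Raise ValueError if the planned cluster layout breaks the rules."""
    if len(node_ips) not in (1, 3):
        raise ValueError("expected a single node or a three-node cluster")
    nodes = [ipaddress.ip_address(ip) for ip in node_ips]
    vip = ipaddress.ip_address(virtual_ip)
    if len(set(nodes)) != len(nodes):
        raise ValueError("node IPs must be unique")
    if vip in nodes:
        # The VIP is an additional address, even for a single node.
        raise ValueError("the cluster VIP must differ from every node IP")


# A single-node deployment still needs its own VIP.
validate_cluster(["192.0.2.11"], "192.0.2.10")
```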

The TrilioVault installation and deployment process handles all the necessary software (e.g. HAProxy) so users don’t have to manage it on their own.

image1

The TVM nodes cannot be installed as VMs under the same OpenStack cloud being protected. They need to be outside of OpenStack on one or more independent KVM hosts. Ideally these KVM hosts would be managed as a virtualized infrastructure using oVirt/RHV, virt-manager or other management tools.

Configuration GUI

The centralized deployment feature is accompanied by a new and improved GUI featuring a Grafana-based dashboard, configuration details that are easy to view and modify, and Ansible output that is easy to view at a collapsible level of detail.

image2

Figure : New Configurator Screen - Dashboard

image3

Figure : New Configurator Screen - Config Details

image4

Figure : New Configurator Screen - Ansible Output

1.2.4. Expanded lifecycle cloud management support

Feature Use Case Customer benefit

Red Hat director integration
  • TrilioVault deployment, updates and upgrades can be rolled through Red Hat director
  • At this time support is limited to RHOSP10 with NFS target

Deploying TrilioVault
  • Faster deployment: Customers relying on Red Hat OpenStack Platform director to install and update their cloud can now use the same tool to deploy TrilioVault
  • One less tool to learn when users are already comfortable with director
Restoring TrilioVault
  • If needed, TrilioVault components can be re-installed using Red Hat director itself. No other manual intervention is required.
TrilioVault persistency through OpenStack upgrading/updating
  • TrilioVault persists through OpenStack upgrades and updates using director

Debian packaging
  • Deploy TrilioVault components on Ubuntu platforms using APT repository

Deploying TrilioVault with Mirantis MCP
  • Seamless installation for Ubuntu based deployments utilizing Ubuntu package management tools
  • Automate TrilioVault deployment by using Trilio-provided shell scripts or by adapting your Ansible environment using Trilio-provided Ansible examples

Improved Ansible Automation

Any TrilioVault deployment
  • Faster time to deploy as Ansible is more efficient
  • Deployments are easier to troubleshoot

1.2.4.1. Description

1.2.5. Red Hat director integration

Red Hat OpenStack Platform (RHOSP) director integration allows customers to deploy TrilioVault using the same lifecycle management tool they use for the cloud itself. The integration supports both a first-time overcloud deployment and an overcloud that is already deployed. Release 3.0 supports RHOSP long-lived version 10. Support for long-lived version 13 is expected to follow soon.

Trilio extends our thanks to Red Hat staff for reviewing and assisting us with the integration process. As explained in the KB article, the integration follows Red Hat’s published guidelines precisely.

1.2.6. Mirantis distribution with Debian packaging

Mirantis field personnel and customers who are looking to deploy TrilioVault 3.0 can now do this through familiar Ubuntu package management tools.

1.2.7. Improved Ansible Automation

The TrilioVault configuration process has been completely rearchitected using Ansible. In the last few years, Ansible has grown in popularity as a preferred configuration management tool, and TrilioVault uses Ansible playbooks extensively to configure the TrilioVault cluster.

Ansible modules are designed to be idempotent, so the TrilioVault configuration can be run any number of times to change or reconfigure the TrilioVault cluster.
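
To make the idea of idempotent configuration concrete, here is a minimal Python sketch. It is not Ansible and not TrilioVault code; the function name `apply_settings` and the setting keys are hypothetical. The point is that a second run with the same desired state reports no change.

```python
from typing import Dict, Tuple


def apply_settings(current: Dict[str, str], desired: Dict[str, str]) -> Tuple[Dict[str, str], bool]:
    """Return the updated settings and whether anything actually changed."""
    updated = dict(current)
    changed = False
    for key, value in desired.items():
        if updated.get(key) != value:
            updated[key] = value
            changed = True
    return updated, changed


desired = {"backup_target": "nfs", "cluster_size": "3"}  # illustrative keys
state, first_changed = apply_settings({}, desired)
state, second_changed = apply_settings(state, desired)
```

Because the second run finds the desired state already in place, it changes nothing, which is what makes repeated reconfiguration safe.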

1.2.8. Enhancement Requests

Release 3.0 includes the following requested enhancements:

# Reference Description Resolution
1 MCP-TV-RQ1

Passive provisioning of Keystone catalog records (eliminate the requirement for Admin privileges in managing endpoints in the Keystone catalog)

While registering API endpoints, TVault now checks whether the respective service and endpoints are already present, and does not override them if they are. The requirement for Admin privileges has been eliminated.

2 MCP-TV-RQ2

APT packaging of TrilioVault extensions for OpenStack

Debian packaging is now supported.

3 MCP-TV-RQ3

REST API endpoint for TrilioVault Controller service configuration

Configurator API documentation has been added to the deployment guide.
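
The "check first, do not override" behavior described for MCP-TV-RQ1 can be sketched as follows. This Python example models the service catalog as a plain dictionary; it does not call the Keystone API, and the function name `ensure_endpoint`, the service name and the URLs are hypothetical.

```python
from typing import Dict


def ensure_endpoint(catalog: Dict[str, str], service: str, url: str) -> bool:
    """Register the endpoint only if the service has no record yet.

    Returns True when a new record was created, False when an existing
    record was found and left untouched.
    """
    if service in catalog:
        return False
    catalog[service] = url
    return True


# Illustrative catalog where an operator pre-provisioned the record.
catalog = {"workloads": "https://vip.example:8780/v1"}
created = ensure_endpoint(catalog, "workloads", "https://other.example:8780/v1")
```

Since the existing record is never overwritten, an operator with Admin privileges can provision the catalog entries ahead of time.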

1.3. Deprecated Functionality

# Topic Description Alternative
1 Swift target

With the introduction of S3 support, we have deprecated Swift as a target and it is no longer supported.

This is due to multiple performance challenges combined with declining demand for Swift based systems.

NFS and S3

1.4. Known Issues

This release contains the following known issues which are tracked for a future update.

# Case # Description Workaround
1 TVAULT-2516

If the wlm-workloads service is stopped on the primary node, restore will get stuck.

If the wlm-workloads service is stopped on the primary node, the restore remains in the restoring state. If the wlm-workloads service is restarted later, the restore fails with the error “Restore did not finish successfully”.

Restart wlm-workloads service of that node
2 TVAULT-2539 Global job scheduler status fluctuates
  1. Change the parameter “global_job_scheduler_override” in workloadmgr.conf to True on all the nodes.
  2. Restart wlm-api on all nodes.

3 TVAULT-2542 If the virtual IP switches over during snapshot creation, the snapshot remains in ‘executing’ status
  1. Restart RabbitMQ on secondary nodes
  2. Restart wlm-workloads on secondary nodes
4 TVAULT-2558 Errors “OSError: [Errno 2] No such file or directory” may be observed during snapshot creation with NFS backup target
  1. Append “lookupcache=none” to the “vault_storage_nfs_options” parameter in /etc/tvault-contego/tvault-contego.conf on OpenStack compute nodes and in /etc/workloadmgr/workloadmgr.conf on TVM nodes.
  2. Restart tvault-contego service on all compute nodes and wlm-api service on all TVM nodes.
5 TVAULT-2592 On some browsers, the Grafana panel of the Configurator asks for security permissions. Open a new tab with https://virtualip:3001 and add the SSL exception to get the dashboard working.
6 TVAULT-2604 RabbitMQ: Data Replication failed after primary node goes into standby and reverts back to active mode
  1. Restart RabbitMQ on secondary nodes
  2. Restart wlm-workloads on secondary nodes
7 TVAULT-2609 “Volume type mapping” is missing from selective restore when browsed from the restore tab in the UI. The option is visible when it is opened from the Project/Backups/Workloads/Snapshots drop-down list option for “Selective Restore”.
8 TVAULT-2616

TVault reconfiguration may fail after deleting existing TVM node and adding newly created TVM node.

The following error will be shown:
fatal: [TVM_3]: FAILED! =>
{“changed”: false, “msg”: “Unable to restart service MySQL: Job for mariadb.service failed because a timeout was exceeded. See "systemctl status mariadb.service" and "journalctl -xe" for details.\n”}
  1. Reinitialize Database from UI.
  2. On all TVault nodes: rm /etc/galera_cluster_configured
  3. Reconfigure with valid values
9 TVAULT-2620 Galera may become inconsistent if reconfiguring without reinitializing the database
  1. Delete the file “/etc/galera_cluster_configured” from all three nodes
  2. Reinitialize TVault
10 TVAULT-2624 Snapshot remains in executing/uploading state if wlm-workloads service is stopped on the node where the snapshot got scheduled. No error is shown. Restart wlm-workloads service of that node
11 TVAULT-2625 TVault reconfiguration might fail intermittently at configuring TrilioVault cluster and cause the cluster to go into inconsistent state.
  1. Reinitialize Database from UI.
  2. On all TVault nodes: rm /etc/galera_cluster_configured
  3. Reconfigure with valid values
12 TVAULT-2627 If a network port goes down for any node in a multi-node setup, the pacemaker service gets stopped on that node. When the network port comes back up, the node fails to join the cluster. Restart the pacemaker service of that particular node.
13 TVAULT-2629 Network restore does not proceed from the UI if there is no network available on the setup. Perform the network restore using the CLI.
14 TVAULT-2614 After upgrading to 3.0 release, email settings are not imported. Manually configure the email settings
15 NOTE When using Red Hat director with an existing TrilioVault deployment, the existing deployment must be cleaned up manually before the upgrade.
  1. Uninstall all old tvault-contego-api, tvault-horizon-api, python-workloadmgclient pip packages from all controller nodes
  2. Uninstall all tvault-contego extension pip packages and clean the /home/tvault directory on all compute nodes
  3. Make sure the /usr/lib/python2.7/site-packages/ directory does not have any old egg-info directories for tvault packages on any overcloud node (compute and controller nodes)
16 NOTE In Horizon UI, backups admin nodes tab, a node may not be visible. Log in to that particular node and restart wlm-workloads service.
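
For the TVAULT-2558 workaround, the edit to the “vault_storage_nfs_options” value can be sketched as a small Python helper. This is an illustrative sketch only: the function name `append_mount_option` and the sample option strings are hypothetical, and the helper works on the option string alone rather than on the configuration files themselves.

```python
def append_mount_option(options: str, option: str) -> str:
    """Append a mount option to a comma-separated list if it is missing."""
    parts = [part.strip() for part in options.split(",") if part.strip()]
    if option not in parts:
        parts.append(option)
    return ",".join(parts)


# Hypothetical existing value of vault_storage_nfs_options.
existing_options = "nolock,soft"
new_options = append_mount_option(existing_options, "lookupcache=none")
```

Running the helper again on its own output leaves the value unchanged, so the workaround can be applied repeatedly without duplicating the option. The affected services still need to be restarted as described in the workaround.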