Dell PowerEdge MX7000 EMC OpenManage Enterprise-Modular Edition Version 1.10.2 - Page 102



are working, running the promote task disrupts workloads that are still running on the lead chassis computes. For information about
relocating working components (that is, computes and network switches) from the failed lead, see list item 3.c, "Steps that are
required to restore the failed lead before putting it into production."
b.
After determining that the lead chassis has failed and is inaccessible, you must remotely shut down power to the lead chassis or
physically remove the chassis from the stack before running the "promote" task on the backup. If the lead chassis is not turned off or
removed from the stack before the promote task, the failed or partially failed lead chassis may revive after the backup is promoted
and create a situation with multiple leads. Multiple leads can create confusion and interference in managing the group.
2.
Running the "promote" task on the backup lead chassis:
a.
If the lead chassis is up and running, the backup chassis web interface blocks the "promote" task. Ensure that the lead has failed
and is inaccessible before initiating the promote task on the backup. The backup may erroneously block the "promote" task when the
lead is still accessible through the private network but is not reachable on the public user management network. In such cases, the
OME-Modular RESTful API can be used to run the promote task forcefully. For more information, see the RESTful API guide.
b.
A job is created after the "promote" operation is started. The job may take 10-45 minutes to complete, depending on the number of
chassis in the group and the amount of configuration that has to be restored.
c.
If the lead chassis is configured to forward alerts to external destinations (email, trap, system log), any alerts that components in
the group generate while the lead is down are available only locally in their respective hardware or alert logs. During the lead
outage, the alerts cannot be forwarded to the configured external destinations. The outage is the period between the lead failure and
the successful promotion of the backup.
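The outage definition in item 2.c can be turned into a simple filter when reviewing local logs afterward. The following is a minimal sketch, not part of OME-Modular: the `Alert` type, the function name, and the idea of collecting alerts into a Python list are all hypothetical, and treating the window as starting at the failure time and ending just before the promotion time is an assumption.

```python
from datetime import datetime
from typing import List, NamedTuple


class Alert(NamedTuple):
    """Hypothetical record for an alert read from a local hardware or alert log."""
    timestamp: datetime
    message: str


def alerts_not_forwarded(
    alerts: List[Alert],
    lead_failed_at: datetime,
    backup_promoted_at: datetime,
) -> List[Alert]:
    """Return alerts raised during the lead outage.

    The outage is the period between the lead failure and the successful
    promotion of the backup. Alerts in this window were only logged locally.
    Assumption: the window includes the failure time and excludes the
    promotion time.
    """
    return [a for a in alerts if lead_failed_at <= a.timestamp < backup_promoted_at]
```

Alerts returned by such a filter are the ones an administrator may want to forward manually once the new lead is in place.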
3.
Expected behavior after the "promote" task:
a.
The backup chassis becomes the lead and all the member chassis are accessible as they were on the earlier lead chassis. After the
"promote" task, references to the old lead chassis exist as a member of the same group. The references are created to prevent
any disruption to the working computes in the old lead in a lead chassis management module (MM) failure situation.
The "promote" task rediscovers all the members in the group. If any member chassis is inaccessible, the chassis is still
listed on the lead home page with a broken connection and the available repair options. You can use the repair option to add the
member chassis again or remove the chassis from the group.
b.
All firmware baselines or catalogs, alert policies, templates or identity-pools, and fabrics settings are restored as they were on the
failed lead chassis. However, the following exceptions and limitations apply:
1.
Any configuration changes made on the failed lead within the 90-minute window that is needed for copying to the backup
may not be copied completely to the backup and, in that case, are not restored completely after the "promote" task.
2.
The in-progress and partially copied jobs that are associated with templates/identity-pools continue to run. You can perform
one of the following tasks:
a.
Stop the running job.
b.
Reclaim any identity-pool assignments.
c.
Restart the job to redeploy the template.
3.
Any template that is attached to an occupied slot through the lead before the backup takes over as the new lead is not
deployed on the existing sled when it is removed or reinserted. For the deployment to work, the administrator must detach the
template from the slot, reattach the template to the slot, and remove or reinsert the existing sled. Alternatively, insert a new sled.
4.
Any firmware catalogs that were created with automatic catalog updates on a schedule are restored as manual-update catalogs. Edit
each catalog and set the automatic update method and the update frequency again.
5.
Alert policies with stale or no references to devices on the old lead are not restored on the new lead.
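The 90-minute limitation in item 1 of this list can be checked with simple time arithmetic when deciding which recent changes to verify on the new lead. The following is a minimal sketch under stated assumptions: the function name and the way change and failure times are obtained are hypothetical, and treating a change made exactly 90 minutes before the failure as still at risk is a conservative assumption, not documented behavior.

```python
from datetime import datetime, timedelta

# Window stated in the limitation: changes need up to 90 minutes to reach the backup.
SYNC_WINDOW = timedelta(minutes=90)


def may_be_incompletely_restored(change_time: datetime, lead_failed_at: datetime) -> bool:
    """Return True if a configuration change falls inside the copy window.

    A change made at most 90 minutes before the lead failed may not have
    been copied completely to the backup, so it should be verified on the
    new lead after the "promote" task.
    """
    if change_time > lead_failed_at:
        # The change is timestamped after the failure; it is outside this check.
        return False
    return lead_failed_at - change_time <= SYNC_WINDOW
```

For example, with a lead failure at 12:00, a template edited at 11:00 is flagged for verification, while one edited at 10:00 is not.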
c.
Steps that are required to restore the failed lead before putting it into production:
1.
On the new lead, turn off the failed lead chassis remotely before performing the "promote" task on the backup. If the chassis is not
turned off, the partially failed lead may come online and cause a situation of multiple leads. There is limited support for automatic
detection of and recovery from this situation. If the earlier lead comes online and automatic recovery is possible, the earlier lead is
forced to join the group as a member.
2.
On the new lead, remove the earlier lead chassis from the group to remove references to it.
3.
On the old lead, gain physical access to the failed lead chassis as soon as possible and unstack it from the group. If any
templates with identity-pool assignments were deployed to computes on the old lead, reclaim the identity-pool
assignments from those computes. Reclaiming the identity-pool assignments is required to prevent any network identity collision
when the old chassis is put back into production.
4.
Do not delete fabrics from the old lead chassis, because deleting the fabrics can lead to network loss once the old lead is added back
to the network.
5.
On the old lead, run a force "reset configuration" using the following REST API payload:
URI: /api/ApplicationService/Actions/ApplicationService.ResetApplication
Method: POST
Payload: {"ResetType": "RESET_ALL", "ForceReset": true}
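The forced reset request in step 5 could be assembled as shown in the following sketch, which uses only the Python standard library. The URI, method, and payload come from the step above. The host name, the helper name `build_reset_request`, and the use of an `X-Auth-Token` session header are assumptions for illustration; see the RESTful API guide for the authentication method your release supports. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# URI taken from step 5 above.
RESET_PATH = "/api/ApplicationService/Actions/ApplicationService.ResetApplication"


def build_reset_request(host: str, auth_token: str) -> urllib.request.Request:
    """Build (but do not send) the forced "reset configuration" request.

    host is the management address of the old lead (hypothetical value).
    auth_token is a session token; sending it in an X-Auth-Token header is
    an assumption, so confirm the authentication scheme in the API guide.
    """
    payload = {"ResetType": "RESET_ALL", "ForceReset": True}
    return urllib.request.Request(
        url=f"https://{host}{RESET_PATH}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": auth_token,  # assumed authentication header
        },
    )


if __name__ == "__main__":
    # Placeholder host and token; replace them before use.
    request = build_reset_request("old-lead.example.com", "<session-token>")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Because this operation resets the configuration of the old lead, inspecting the printed request before passing it to `urllib.request.urlopen` gives a chance to confirm that it targets the old lead and not the new one.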
d.
Relocate the working components of the old lead to other chassis in the group: