NSX-T Edge Deletion Failed

Manually Cleaning Up Orphaned Edge Nodes

What? Cleaning Up Stale Edge Nodes from NSX-T Manager

I recently ran into an issue where a host and its underlying storage failed in one of my environments. This host had NSX-T Edge nodes residing on it, and when the host and storage failed, NSX-T Manager lost access to the Edge nodes on it, and the Edge nodes were stuck in a DELETION FAILED state.

This issue should not really impact production environments, thanks to shared storage, vSphere HA, and other mechanisms for VM restoration. This environment was a lab with none of those safeguards, and as a result the nodes were inaccessible from NSX-T Manager.

I should also note that the DELETION FAILED state only appeared after I attempted to delete the Edge nodes in the UI. Refer to the image below.

Edge nodes stuck with DELETION FAILED

This article walks through 2 approaches to cleaning up the orphaned nodes: using the API, and manually cleaning up the Corfu database. It’s important to remember that cleaning up the database yourself should really be done with support; however, I detail the process in this article.

Step 1: Using the API to clean up nodes in the DELETION FAILED state

There are 2 API endpoints that can be leveraged to clean up the nodes:

  • https://nsxtManagerFQDN/api/v1/transport-nodes/
  • https://nsxtManagerFQDN/api/v1/fabric/nodes

The first option in my case did not show the orphaned Edge nodes, only a host transport node was displayed.

api transport-nodes not listing edge nodes

The second URI displayed the orphaned Edge nodes.

edge nodes listed in fabric/nodes
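The lookup above can also be scripted. Below is a minimal Python sketch, assuming the manager accepts HTTP basic authentication and that the fabric/nodes response is a JSON object with a results list whose entries carry resource_type, id and display_name fields. The function names and the verify_tls switch are my own illustration and are not part of NSX-T.

```python
import base64
import json
import ssl
import urllib.request


def fetch_fabric_nodes(manager_fqdn, username, password, verify_tls=True):
    """GET /api/v1/fabric/nodes and return the parsed JSON body."""
    url = f"https://{manager_fqdn}/api/v1/fabric/nodes"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )
    context = ssl.create_default_context()
    if not verify_tls:
        # Assumption: a lab manager with a self-signed certificate.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)


def edge_node_ids(listing):
    """Return (display_name, id) pairs for Edge nodes in a listing.

    Assumes the response shape described in the lead-in: a "results"
    list whose entries have "resource_type", "id" and "display_name".
    """
    return [
        (node.get("display_name"), node["id"])
        for node in listing.get("results", [])
        if node.get("resource_type") == "EdgeNode"
    ]
```

Calling edge_node_ids(fetch_fabric_nodes("nsxtManagerFQDN", "admin", "yourPassword")) should then give the same IDs that are highlighted in the image, provided the response has the assumed shape.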

Now, using the ID highlighted in the image, you should be able to delete the node by issuing a DELETE request to https://nsxtManagerFQDN/api/v1/fabric/nodes/nodeUUID, as can be seen in the image below.

deleting nodes results in 200 ok

Notice in the request, I got a 200 response, so in theory the node should have been deleted. However, when searching for the node in NSX-T Manager, the node still appears.

Unfortunately, deleting the node with API did not work either.
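Since a 200 response did not mean the node was actually gone, it may be worth checking the listing again after every DELETE. The sketch below is a rough illustration of that idea, assuming basic authentication and a fabric/nodes listing shaped as a JSON object with a results list of entries that each have an id field. Both function names are hypothetical.

```python
import base64
import urllib.request


def build_delete_request(manager_fqdn, node_uuid, username, password):
    """Build (but do not send) a DELETE request for one fabric node."""
    url = f"https://{manager_fqdn}/api/v1/fabric/nodes/{node_uuid}"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url, method="DELETE", headers={"Authorization": f"Basic {token}"}
    )


def node_still_listed(listing, node_uuid):
    """Return True if node_uuid still appears in a fabric/nodes listing.

    Assumes the listing is the parsed JSON of GET /api/v1/fabric/nodes,
    with a "results" list whose entries have an "id" field.
    """
    return any(
        node.get("id") == node_uuid for node in listing.get("results", [])
    )
```

The request can be sent with urllib.request.urlopen, after which a fresh GET of fabric/nodes passed to node_still_listed shows whether the entry really went away. In my case it would have returned True.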

Step 2: Cleaning up stale Edge nodes using Corfu-browser

The next option I used to clean up the database was Corfu-browser. This is generally not recommended without support; however, as it was a lab, I pushed on.

Step 2a: Log into NSX-T Manager as root

First you will need to use a terminal client and log into the NSX-T Manager as root. Once logged in, navigate to /opt/vmware/corfu-tools.

use putty to log into nsx-t manager

Step 2b: Use Corfu-browser

Next, whilst in this directory, issue the command in the snippet below, ensuring you change the hostname to suit your NSX-T Manager node IP.

java -Dlog4j.configurationFile=/opt/vmware/corfu-tools/corfu-browser-log4j2.xml -cp "/opt/vmware/corfu-tools/corfu-editor-1.0-jar-with-dependencies.jar:/opt/vmware/proton-tomcat/webapps/nsxapi/WEB-INF/lib/*:/usr/tomcat/lib/*" com.vmware.nsx.management.tools.corfu.CorfuEditorMain -hostname 192.168.63.55 -port 9000 printTable -role nsx-manager -objectType EdgeNode > /tmp/EdgeNodeIDs

After issuing this command, there will be a file called EdgeNodeIDs located at /tmp/EdgeNodeIDs. The file will have a lot of information in it, however, the field you will require is the externalId field. Below is a snippet of what the file should look like.

VALUE: com.vmware.nsx.management.edge.node.model.EdgeNode@5517e36b[
  allocationList=<null>,
  pendingMsgBusRegistration=false,
  autoDeployed=true,
  nodeUserSettings=com.vmware.nsx.management.edge.node.model.EdgeNodeUserSettings@157544e5[
    cliPassword=password,
    rootPassword=password,
    cliUsername=admin,
    auditUsername=<null>,
    auditPassword=<null>
  ],
  nodeConfigSettings=com.vmware.nsx.management.edge.node.model.EdgeNodeConfigSettings@1c0db15[
    managementPortSubnets={
      com.vmware.nsx.management.edge.lrouter.ports.model.SubnetModel@713cdd61[
        prefixLength=24,
        ipAddresses={
          192.168.63.60
        },
        ipConfigs=java.util.ArrayList@494e9f73{

        },
        raPrefixTime=<null>
      ]
    },
    hostname=en4-mgmt.shank.com,
    defaultGatewayAddresses={
      192.168.63.1
    },
    searchDomains={
      shank.com
    },
    ntpServers={
      192.168.63.101
    },
    dnsServers={
      192.168.63.101
    },
    formFactor=MEDIUM,
    enableSsh=true,
    allowSshRootLogin=true,
    syslogServers=<null>
  ],
  vsphereConfig=com.vmware.nsx.management.edge.node.model.VsphereDeploymentConfig@7aff0c30[
    vcId=bd0270b9-7ee3-4a3e-b3d3-9d79f5888204,
    managementNetworkId=dvportgroup-23005,
    computeId=domain-c19037,
    storageId=datastore-22030,
    dataNetworkIds={
      dvportgroup-74021,
      dvportgroup-74021
    },
    hostId=host-22026,
    computeFolderId=<null>,
    advancedConfiguration=<null>
  ],
  reservationInfo=com.vmware.nsx.management.edge.node.model.ReservationInfo@4af58e7b[
    memoryReservation=com.vmware.nsx.management.edge.node.model.MemoryReservation@5d33742[
      reservationPercentage=100
    ],
    cpuReservation=com.vmware.nsx.management.edge.node.model.CPUReservation@7cf3544[
      reservationInShares=HIGH_PRIORITY,
      reservationInMhz=0
    ]
  ],
  nodeType=com.vmware.nsx.management.fabricnode.common.FabricNodeTypeEnum@6818ccfe[
    name=EdgeNode,
    name=EDGE_NODE,
    ordinal=1
  ],
  externalId=aea3d092-6f60-4c87-901b-1d0f74b0ea66,
  ipAddresses=java.util.ArrayList@4087fff9{
    192.168.63.60
  },
  tags=<null>,
  displayName=edge2,
  description=,
  createUser=admin,
  lastModifiedUser=admin,
  createTime=1628460474598,
  lastModifiedTime=1632872454818,
  systemResourceFlag=false,
  revision=6,
  touched=false,
  id=com.vmware.nsx.management.common.IdentifierImpl@37539d92[
    objectType=Node,
    stringId=<null>,
    uuid=aea3d092-6f60-4c87-901b-1d0f74b0ea66
  ],
  nonMonotonicRevision=6
]

The field we require can also be seen in the API calls run earlier; however, there it is just called id. You can also get the ID from the NSX-T Manager UI.
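Because the dump is long, pulling out just the externalId and displayName of each record can save some scrolling. The following is a small Python sketch, assuming every record starts with a line beginning VALUE: and that externalId and displayName each sit on their own line, as in the snippet above. The function name is my own and the parsing is deliberately naive.

```python
import re

# One "key=value," pair per line, as seen in the printTable output.
EXTERNAL_ID = re.compile(r"^\s*externalId=([0-9a-fA-F-]{36}),?\s*$")
DISPLAY_NAME = re.compile(r"^\s*displayName=(.*?),?\s*$")


def extract_edge_nodes(dump_text):
    """Return one dict per record with its externalId and displayName.

    Assumes each record begins with a line starting "VALUE:"; any
    lines before the first such marker are ignored.
    """
    nodes = []
    current = None
    for line in dump_text.splitlines():
        if line.startswith("VALUE:"):
            current = {"externalId": None, "displayName": None}
            nodes.append(current)
            continue
        if current is None:
            continue
        match = EXTERNAL_ID.match(line)
        if match:
            current["externalId"] = match.group(1)
            continue
        match = DISPLAY_NAME.match(line)
        if match:
            current["displayName"] = match.group(1)
    return nodes
```

Reading /tmp/EdgeNodeIDs into a string and passing it to extract_edge_nodes gives a short list that can be compared against the node names shown in the UI before anything is removed.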

Using the ID, issue the following command:

java -Dlog4j.configurationFile=/opt/vmware/corfu-tools/corfu-browser-log4j2.xml -cp "/opt/vmware/corfu-tools/corfu-editor-1.0-jar-with-dependencies.jar:/opt/vmware/proton-tomcat/webapps/nsxapi/WEB-INF/lib/*:/usr/tomcat/lib/*" com.vmware.nsx.management.tools.corfu.CorfuEditorMain -hostname 192.168.63.55 -port 9000 removeEntries -role nsx-manager -objectType EdgeNode -uuid "aea3d092-6f60-4c87-901b-1d0f74b0ea66"

Once again, ensure you change the hostname and uuid to suit your environment. Once run, you should see output similar to below.

root@nsxmgr:/opt/vmware/corfu-tools# java -Dlog4j.configurationFile=/opt/vmware/corfu-tools/corfu-browser-log4j2.xml -cp "/opt/vmware/corfu-tools/corfu-editor-1.0-jar-with-dependencies.jar:/opt/vmware/proton-tomcat/webapps/nsxapi/WEB-INF/lib/*:/usr/tomcat/lib/*" com.vmware.nsx.management.tools.corfu.CorfuEditorMain -hostname 192.168.63.55 -port 9000 removeEntries -role nsx-manager -objectType EdgeNode -uuid "aea3d092-6f60-4c87-901b-1d0f74b0ea66"
SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Overriding NSX service type to: nsx-manager
- [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Table mapping mechanism is enabled.
- [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Processing all classes in package : com.vmware.nsx.management
Reflections took 2975 ms to scan 105 urls, producing 2333 keys and 18905 values
- [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Processing all classes in package : com.vmware.nsx.pace
Reflections took 139 ms to scan 2 urls, producing 52 keys and 123 values
- [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Processing all classes in package : com.vmware.nsxapi
Reflections took 415 ms to scan 3 urls, producing 166 keys and 9685 values
- [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Processing all classes in package : com.vmware.vmc
Reflections took 256 ms to scan 696 urls, producing 0 keys and 0 values
- [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Processing all classes in package : com.vmware.nsx.csm
Reflections took 254 ms to scan 696 urls, producing 0 keys and 0 values
- [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] ObjectTypeRegistry is initialized.
No registered metrics logger provided.
Corfu runtime version source(ee70bb3) initialized.
Bootstrap Layout Servers [192.168.63.55:9000]
setCacheDisabled: Deprecated, please set parameters instead
enableTls: Deprecated, please set parameters instead
- [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Trying to connect with TLS support
connect: runtime parameters CorfuRuntime.CorfuRuntimeParameters(maxWriteSize=2147483647, bulkReadSize=10, fastLoaderTimeout=PT30M, holeFillRetry=10, holeFillRetryThreshold=PT1S, holeFillTimeout=PT10S, cacheDisabled=true, maxCacheEntries=0, maxCacheWeight=0, cacheConcurrencyLevel=0, cacheExpiryTime=9223372036854775807, followBackpointersEnabled=false, holeFillingDisabled=false, writeRetry=5, trimRetry=2, checkpointRetries=5, streamBatchSize=10, checkpointReadBatchSize=5, runtimeGCPeriod=PT20M, clusterId=null, systemDownHandlerTriggerLimit=60, layoutServers=[], invalidateRetry=5, priorityLevel=HIGH, codecType=ZSTD, metricsEnabled=true)
Connecting to Corfu server instance, layout servers=[192.168.63.55:9000]
Construct ssl context based on the following information:
Key store file path: /config/cluster-manager/cluster-manager/private/keystore.jks.
Key store password file path: /config/cluster-manager/cluster-manager/private/keystore.password.
Trust store file path: /config/cluster-manager/cluster-manager/public/truststore.jks.
Trust store password file path: /config/cluster-manager/cluster-manager/public/truststore.password.
Connect Async 192.168.63.55:9000
channelActive: Outgoing connection established to: /192.168.63.55:9000 from id=/192.168.63.55:33962
userEventTriggered: unhandled event SslHandshakeCompletionEvent(SUCCESS)
channelRead: Handshake Response received. Removing readTimeoutHandler from pipeline.
channelRead: node id matching is not requested by client.
channelRead: Handshake succeeded. Server Corfu Version: [source(ee70bb3)]
channelRead: Removing handshake handler from pipeline.
Unavailable or unrecognised attach API : java.lang.ClassNotFoundException: com.sun.tools.attach.VirtualMachine
Detected JVM data model settings of: 64-Bit HotSpot JVM with Compressed OOPs
Connected to new cluster gTXV62MwQCic9iNWjY9fIQ
connect: client version source(ee70bb3), server version is source(ee70bb3)
- [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Successfully connected to Corfu server(s) '192.168.63.55:9000'.
ObjectBuilder: open Corfu stream nsx-manager Node 2db0 id f3cb5120-7734-3d61-bf5f-765d58f3e026
ObjectBuilder: open Corfu stream string-audit id 5a3b0d28-4435-3c1a-bd1b-189e1ae2066f
About to remove the following entries:
============================================================
com.vmware.nsx.management.common.IdentifierImpl@3da0180a[
  objectType=Node,
  stringId=<null>,
  uuid=aea3d092-6f60-4c87-901b-1d0f74b0ea66
]
============================================================
********************************************
********************************************
PRESS ANY KEY TO CONTINUE OR CTRL-C TO ABORT
********************************************
********************************************

Successfully removed 1 entries.

Once the process is complete, the node should be removed and the UI should reflect this. Notice, there are only 3 nodes now.

Repeat this process for all remaining stale entries.
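When there are several stale entries, generating the command line for each UUID helps avoid copy and paste mistakes. Here is a minimal Python sketch that only builds and prints the commands. It reuses the paths, port and arguments from the command above, while the function name and the strict UUID check are my own additions. It does not run anything against Corfu.

```python
import shlex
import uuid

CORFU_TOOLS = "/opt/vmware/corfu-tools"
# Classpath copied from the command used in this article.
CLASSPATH = ":".join([
    f"{CORFU_TOOLS}/corfu-editor-1.0-jar-with-dependencies.jar",
    "/opt/vmware/proton-tomcat/webapps/nsxapi/WEB-INF/lib/*",
    "/usr/tomcat/lib/*",
])


def remove_entry_command(manager_ip, node_uuid):
    """Return the removeEntries command as an argument list."""
    # Reject anything that is not a canonical hyphenated UUID.
    if str(uuid.UUID(node_uuid)) != node_uuid.lower():
        raise ValueError(f"not a canonical UUID: {node_uuid}")
    return [
        "java",
        f"-Dlog4j.configurationFile={CORFU_TOOLS}/corfu-browser-log4j2.xml",
        "-cp", CLASSPATH,
        "com.vmware.nsx.management.tools.corfu.CorfuEditorMain",
        "-hostname", manager_ip,
        "-port", "9000",
        "removeEntries",
        "-role", "nsx-manager",
        "-objectType", "EdgeNode",
        "-uuid", node_uuid,
    ]


if __name__ == "__main__":
    # Example values taken from this article; replace with your own.
    stale_nodes = ["aea3d092-6f60-4c87-901b-1d0f74b0ea66"]
    for node_uuid in stale_nodes:
        print(shlex.join(remove_entry_command("192.168.63.55", node_uuid)))
```

Printing the commands rather than executing them keeps the confirmation prompt interactive, so each removal still has to be acknowledged by hand on the NSX-T Manager.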

Conclusion

From time to time you may face stale Corfu database entries for Edge nodes in NSX-T Manager. It’s important you attempt to clean them up using either the UI or API before jumping straight into the database. Hopefully this article has assisted you with cleaning up Edge nodes stuck in the DELETION FAILED state.

There are other examples of utilizing corfu-browser, one example is here.
