NSX-T Uplink profiles and TEP subnets

Ahhh NSX-T… the next generation of VMware's Software Defined Networking stack – now replacing NSX-V. Pulling your hair out getting it to work? Me (and 2 others I know) too…

EDIT:

Further research into this has turned up that the below is in fact NOT correct for 3.1.3 onwards. The TEP subnets have now been consolidated – which is a good thing because routing between TEPs would likely have led to a heap of extra traffic.
The Significant Bits blog goes into far more detail here

If you do go down the road of limiting the VLANs on the “Edge trunk” instead of using 0-4094, you will need to include every VLAN the edges use. In my example below, that would mean “1600, 1651, 1652”. VLAN 1601 is no longer needed as we have consolidated the TEP subnets.
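If you want to build that trunk range from a list rather than typing it by hand, here is a minimal Python sketch. The helper name `trunk_range` is my own invention for illustration, and I am assuming the port group takes the usual comma / hyphen range string (the same format as “0-4094”).

```python
def trunk_range(vlans):
    """Collapse VLAN IDs into a comma/hyphen range string, e.g. "1600,1651-1652"."""
    ids = sorted(set(vlans))
    if not ids:
        raise ValueError("at least one VLAN ID is required")
    for vlan in ids:
        if not 0 <= vlan <= 4094:
            raise ValueError(f"VLAN {vlan} is outside 0-4094")

    ranges = []
    start = prev = ids[0]
    for vlan in ids[1:]:
        if vlan == prev + 1:
            # Still inside a consecutive run, so just extend it.
            prev = vlan
            continue
        ranges.append((start, prev))
        start = prev = vlan
    ranges.append((start, prev))

    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


# VLANs taken from the example in this post.
EDGE_VLANS = [1600, 1651, 1652]
print(trunk_range(EDGE_VLANS))
```

With the three VLANs from my example this gives a range string you can paste into the trunk port group instead of 0-4094.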

Slightly older but still relevant info:

OK, let's quickly go over some background info:
This is not a “How to install NSX-T: The complete guide” or “How to migrate from NSX-V to NSX-T” because these have been very well covered. I don’t see the need to add another blog going over the same things.
What I will say is that the following are EXCELLENT resources; once you know what you are looking at and join them together, you can get a working cluster:

What this blog IS about, though, is specifically some of the settings that none of these mention, OR that have changed significantly in version 3.1.3 such that those guides no longer work as written.
My environment is vSphere 7.0U3, ESXi 7.0U3, VDS 7.0.3 and NSX-T 3.1.3

Very simply / the TLDR of this is:
You need a SEPARATE VLAN and TEP IP Pool for the Host Transport Zone and the Edge Transport Zone.
Yes, you still need an NSX-T “External” VLAN-based segment

I want to thank 2 very good friends of mine who helped work all this out and get me up and running – Nick from DevLAN.io and Damien from Rendrag.net (i.e. they did all the leg work and were nice enough to share their answers to allow me to write this)

VDS Port Groups

One thing a lot of the documents were missing is what the VDS port groups look like, so let's take a peek behind the curtain:

So what do we have here? We have my 5 NSX-T segments and 1 VLAN trunk. I’m going to focus on PG-Edge-Uplinks.
This should be a standard VDS port group, with VLAN trunking enabled.
You CAN break this out into narrower, per-VLAN trunks and get more granular – and I would suggest it. I just haven’t done that yet but intend to.

Just for completeness, let's see what this looks like from the NSX-T side:

We will circle back to how these are used in a moment.

Physical VLANs

In my setup, I have created 2 VLANs for transport:
1600 – Hosts
1601 – Edges
They are standard VLANs with an MTU of 1600 (or higher), trunked to all switches / hosts.
My next step would be to make these into port groups on the VDS.
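Why 1600 (or higher)? The overlay traffic is Geneve encapsulated, so every guest frame picks up extra headers on the transport VLAN. Here is a rough back-of-the-envelope sketch in Python. I am assuming a 1500 byte guest MTU and an IPv4 underlay (neither is stated above), and the function names are made up for illustration. Geneve options are variable length, which is why you want some headroom rather than the bare minimum.

```python
INNER_MTU = 1500        # assumed guest MTU, not stated in this post
INNER_ETHERNET = 14     # the guest's own Ethernet header travels inside the tunnel
GENEVE_BASE = 8         # fixed Geneve header, before any options
UDP_HEADER = 8
OUTER_IPV4 = 20         # assumes an IPv4 underlay with no IP options


def required_transport_mtu(inner_mtu=INNER_MTU, geneve_options=0):
    """Smallest transport MTU that carries a full-size guest packet unfragmented."""
    return (inner_mtu + INNER_ETHERNET + GENEVE_BASE
            + geneve_options + UDP_HEADER + OUTER_IPV4)


def option_headroom(transport_mtu, inner_mtu=INNER_MTU):
    """Bytes left over for Geneve options at a given transport MTU."""
    return transport_mtu - required_transport_mtu(inner_mtu)


print(required_transport_mtu())   # bare minimum with no Geneve options
print(option_headroom(1600))      # spare bytes on a 1600 MTU transport VLAN
```

Under those assumptions the bare minimum works out to 1550 bytes, so a 1600 MTU leaves 50 bytes spare for Geneve options. If your guests tag their own VLANs, allow another 4 bytes for that.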

TEP IP Pools

The other thing that tripped me up was the TEP IP pools. All the documentation says to use a single pool for this, but you actually need 2: one for hosts and one for edges.

These need to be routable to each other as well, because tunnels will be built between the subnets. In my case, inter-VLAN routing is done on my core switch.
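To sanity check a two-pool layout like this before typing it into NSX-T, something like the following Python sketch can help. The subnets and addresses here are placeholders I have made up (my real pools are not listed in this post), and the helper names are hypothetical. It only uses the standard library `ipaddress` module.

```python
import ipaddress

# Placeholder subnets for illustration only, substitute your own.
HOST_TEP_CIDR = "10.16.0.0/24"   # would sit on VLAN 1600 (hosts)
EDGE_TEP_CIDR = "10.16.1.0/24"   # would sit on VLAN 1601 (edges)


def check_tep_pools(host_cidr, edge_cidr):
    """Return both networks, or raise if the two pools overlap."""
    host = ipaddress.ip_network(host_cidr)
    edge = ipaddress.ip_network(edge_cidr)
    if host.overlaps(edge):
        raise ValueError(f"TEP pools overlap: {host} and {edge}")
    return host, edge


def needs_routing(remote_tep, local_net):
    """True if the remote TEP is off-subnet, so the tunnel must cross a router."""
    return ipaddress.ip_address(remote_tep) not in local_net


host_net, edge_net = check_tep_pools(HOST_TEP_CIDR, EDGE_TEP_CIDR)
# A host TEP building a tunnel to an edge TEP (placeholder address).
print(needs_routing("10.16.1.11", host_net))
```

If `needs_routing` comes back true for the host to edge direction, then something (my core switch, in my case) has to route between the two VLANs, and the 1600 MTU has to hold across that routed hop too.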

Host Setup / Host profiles

For completeness, this is my host setup (System > Fabric > Nodes > ESXi Node)

Uplink Profiles

Here is where it started to click. We need 2 Uplink profiles – one for the hosts and one for the edges, each using their own TEP Pools. Note the 2 VLANs used for transport here.
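If it helps to see that pairing written down, here is a tiny Python sketch of the two profiles as plain data. This is not the NSX-T API, just a way to picture it. The profile and pool names are invented for illustration; only the transport VLANs (1600 and 1601) come from my setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UplinkProfile:
    name: str            # hypothetical profile name
    transport_vlan: int  # VLAN the TEP traffic rides on
    tep_pool: str        # hypothetical TEP IP pool name


def validate(profiles):
    """Check that no two profiles share a transport VLAN or a TEP pool."""
    vlans = [p.transport_vlan for p in profiles]
    pools = [p.tep_pool for p in profiles]
    if len(set(vlans)) != len(vlans):
        raise ValueError("two profiles share a transport VLAN")
    if len(set(pools)) != len(pools):
        raise ValueError("two profiles share a TEP pool")
    return {p.name: p for p in profiles}


profiles = validate([
    UplinkProfile("host-uplink-profile", 1600, "host-tep-pool"),
    UplinkProfile("edge-uplink-profile", 1601, "edge-tep-pool"),
])
print(profiles["edge-uplink-profile"].transport_vlan)
```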

Edge Transport Nodes

OK, here is where the magic REALLY happens and where I got stuck. On the edge nodes, you need to create 2 switches, with different profiles, because this is how the edges build tunnels to the hosts.
If you have them all on one subnet, the tunnels are not created. So you have to separate them.

And a sprinkle of OSPF

Astute readers will have noted that I linked to an OSPF article above. That’s because I have 0 BGP devices in my lab – it’s all OSPF
Everything is the same, except BGP is disabled and OSPF is enabled. Here is a picture of my interfaces and OSPF so you can visualise it but