[Difx-users] HPC using Ethernet vs Infiniband

Geoff Crew gbc at haystack.mit.edu
Tue Sep 21 09:27:05 EDT 2021


Our experience with IB on our first cluster build-out at Haystack was
not a happy one: we had a very difficult time finding hardware and
software versions that would work together on our (then) rather
heterogeneous network.  In the end someone drew the short straw and
climbed a very long and time-consuming learning curve to make it work.

For our next build-out we went with ethernet and many things magically
"worked", and we were even able to get our IB switch reflashed to do
ethernet.  "Worked" means not having to spend time troubleshooting any
of the stuff that was killing us on IB, and we got on with our lives.
We're still waiting on an upgrade to a new OS for the Mark6 playback
units to get certain high-performance NICs to work, but we're managing.

However, performance is not what is advertised.  While I suppose there
are things I could investigate and tune in the ethernet fabric to
improve matters, I came to the conclusion (after some link tests) that
the real culprit was affinity.  Specifically, line-speed tests get the
nominal performance (40 Gbps) because the test application takes care
to set up the appropriate affinity between the network driver and the
processing core running the network benchmarking application.  Simple
test applications (mpispeed, &c) that do not do this get about half
that (20 Gbps).
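
For what it's worth, here is a rough sketch (Python, purely for
illustration; the interface name is a placeholder and none of this is
DiFX code) of the sort of NIC-to-core pinning those line-speed
benchmarks arrange for themselves:

#!/usr/bin/env python3
# Sketch only: pin the current process to the CPUs local to a given NIC,
# the kind of affinity setup a line-speed benchmark does for itself.
import os

IFACE = "ens1f0"   # hypothetical 40 GbE interface name

def nic_numa_node(iface):
    """NUMA node the NIC's PCIe slot hangs off (-1 if unknown)."""
    with open(f"/sys/class/net/{iface}/device/numa_node") as f:
        return int(f.read().strip())

def node_cpus(node):
    """Parse the kernel cpulist for a NUMA node, e.g. '0-7,16-23'."""
    cpus = set()
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        for part in f.read().strip().split(","):
            lo, _, hi = part.partition("-")
            cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus

node = nic_numa_node(IFACE)
if node >= 0:
    os.sched_setaffinity(0, node_cpus(node))   # pin to NIC-local cores
print("running on CPUs:", sorted(os.sched_getaffinity(0)))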

As there is no provision in DiFX for this, I gave up worrying about
getting optimal performance.  I have seen suggestions that the more
recent OpenMPI libraries may do better with affinity, but I haven't had
the leisure to investigate.  We're more people-limited than
computer-limited at this time.
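
If someone does find the time, the newer OpenMPI releases expose
explicit binding options on the mpirun command line.  A rough sketch of
driving them from a launcher script (the rank count and benchmark
binary are placeholders, and this is not the usual DiFX invocation):

#!/usr/bin/env python3
# Sketch only: run a test job with OpenMPI's explicit binding knobs.
import subprocess

cmd = [
    "mpirun",
    "--bind-to", "core",      # pin each MPI rank to a core
    "--map-by", "numa",       # distribute ranks across NUMA nodes
    "--report-bindings",      # print the bindings OpenMPI actually chose
    "-np", "8",               # placeholder rank count
    "./mpispeed",             # placeholder test application
]
subprocess.run(cmd, check=True)

The --report-bindings output is a quick way to check whether the ranks
actually ended up on the cores one intended.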

However, we're ultimately limited by Mark6 playback (16 Gbps), so
again there hasn't been much point in optimizing things that won't
matter.

On 9/20/21 3:28 PM, Walter Brisken via Difx-users wrote:
>
> Hi DiFX Users,
>
> In the not so distant future we at VLBA may be in the position to 
> upgrade the network backbone of the VLBA correlator.  Currently we 
> have a 40 Gbps Infiniband system dating back about 10 years. At the 
> time we installed that system, Infiniband showed clear advantages, 
> likely driven by RDMA capability which offloads a significant amount 
> of work from the CPU.  Now it seems Ethernet has RoCE (RDMA over 
> Converged Ethernet) which aims to do the same thing.
>
> 1. Does anyone have experience with RoCE?  If so, is this as easy to 
> configure as the OpenMPI page suggests?  Any drawbacks of using it?
>
> 2. Has anyone else gone through this decision process recently? If so, 
> any thoughts or advice?
>
> 3. Has anyone run DiFX on an RoCE-based network?
>
>     -Walter
>
> -------------------------
> Walter Brisken
> NRAO
> Deputy Assistant Director for VLBA Development
> (505)-234-5912 (cell)
> (575)-835-7133 (office; not useful during COVID times)
>
-- 
Geoff Crew
gbc at haystack.mit.edu


