[Difx-users] HPC using Ethernet vs Infiniband

Bill Boas bill.boas at gmail.com
Tue Sep 21 10:25:43 EDT 2021


Walter, Adam, et al.,

The OpenFabrics Alliance, www.openfabrics.org, developed the software for
both IB and RoCE. I suggest contacting the Alliance; your questions may well
get useful responses on both, for and against.

One useful and rarely mentioned fact: at the physical cable level, the SERDES
for both Ethernet and IB is identical in the NVIDIA (née Mellanox) chips and
adapter cards, and the physical cable latency difference is the serialization
time for serial (Ethernet) versus parallel (IB).

So the criteria to consider lie primarily in the software distributions and
the host interfaces, mostly PCIe. The options to evaluate here are NVIDIA,
Cornelis (née Intel's Omni-Path, IB by another label) and, most recently, UCX
and CXL, both follow-ons from IB and OpenFabrics, which incidentally is coming
up on 20 years since its conception. There is also GigaIO, which is physically
a PCIe fabric.
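
For what it's worth, on the software side much of the choice reduces to which
transport the MPI layer is pointed at. As a minimal sketch only, assuming an
OpenMPI 4.x build with UCX support and NVIDIA/Mellanox ConnectX adapters (the
device name, GID index, and application name below are illustrative
placeholders, not a recommendation), running over RoCE can look like:

  # Confirm the adapter appears as an RDMA device and exposes RoCE GIDs
  ibv_devices
  show_gids        # helper script shipped with NVIDIA/Mellanox OFED

  # Launch over RoCE by selecting the UCX PML and naming the Ethernet-mode
  # port of the adapter (mlx5_0 port 1 and GID index 3 are placeholders)
  mpirun --mca pml ucx \
         -x UCX_NET_DEVICES=mlx5_0:1 \
         -x UCX_IB_GID_INDEX=3 \
         -np 64 ./your_mpi_binary

The same launch line works over an IB port, which is part of the point: with
UCX the fabric choice becomes largely a runtime parameter rather than a
different software stack.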

Bill.
Bill Boas
ex-Co-Founder OpenFabrics Alliance
M: 510-375-8840

On Mon, Sep 20, 2021 at 4:05 PM Adam Deller via Difx-users <
difx-users at listmgr.nrao.edu> wrote:

> I've spoken to people about RoCE, but I'm not sure if any of them have gone
> ahead and taken the plunge on it yet.  I'll ask around to update myself.
>
> Cheers,
> Adam
>
>
>
> On Tue, 21 Sept 2021 at 05:28, Walter Brisken via Difx-users <
> difx-users at listmgr.nrao.edu> wrote:
>
>>
>> Hi DiFX Users,
>>
>> In the not-so-distant future, we at the VLBA may be in the position to
>> upgrade the network backbone of the VLBA correlator.  Currently we have a
>> 40 Gbps Infiniband system dating back about 10 years.  At the time we
>> installed that system, Infiniband showed clear advantages, likely driven by
>> its RDMA capability, which offloads a significant amount of work from the
>> CPU.  Now it seems Ethernet has RoCE (RDMA over Converged Ethernet), which
>> aims to do the same thing.
>>
>> 1. Does anyone have experience with RoCE?  If so, is this as easy to
>> configure as the OpenMPI page suggests?  Any drawbacks of using it?
>>
>> 2. Has anyone else gone through this decision process recently?  If so,
>> any thoughts or advice?
>>
>> 3. Has anyone run DiFX on an RoCE-based network?
>>
>>         -Walter
>>
>> -------------------------
>> Walter Brisken
>> NRAO
>> Deputy Assistant Director for VLBA Development
>> (505)-234-5912 (cell)
>> (575)-835-7133 (office; not useful during COVID times)
>>
>> _______________________________________________
>> Difx-users mailing list
>> Difx-users at listmgr.nrao.edu
>> https://listmgr.nrao.edu/mailman/listinfo/difx-users
>>
>
>
> --
> !=============================================================!
> A/Prof. Adam Deller
> ARC Future Fellow
> Centre for Astrophysics & Supercomputing
> Swinburne University of Technology
> John St, Hawthorn VIC 3122 Australia
> phone: +61 3 9214 5307
> fax: +61 3 9214 8797
>
> office days (usually): Mon-Thu
> !=============================================================!
> _______________________________________________
> Difx-users mailing list
> Difx-users at listmgr.nrao.edu
> https://listmgr.nrao.edu/mailman/listinfo/difx-users
>

