
Windows file transfers are FAST



“Raid Owl”

100G Networking Part 1 –

Mikrotik 100G Switch –
ConnectX-5 –



 



47 Comments

  1. Before your wife introduces your nether regions to a 5 iron (realistically she's probably trying to save money, so it's cheaper to ruin a Goodwill wooden driver), consider a used Epyc Milan enterprise setup or a used Epyc Genoa setup, but you'll have to get her a Gucci bag first with Genoa.

    With Milan you get 128 PCIe lanes PER CPU installed, PCIe 4.0, and a lot of boards have seven PCIe 4.0 x16 slots. You get access to ECC DDR4-3200 and up, so mountains of memory are cheap. Core counts top out at 64 for Milan, which is fine; it only boosts to nearly 4 GHz all-core on a 64-core proc. Psshhh, slow, right? Many of the boards support bifurcation, so you can run a whole bunch of U.2 HBAs / PCIe switches for truly face-melting storage speeds if you really want to go there. Or you can be pedestrian and throw in three 8-port SAS HBAs with a total of 24 1.6 TB 12G SAS SSDs. Threadripper is cool, undoubtedly. But there is something about looking at a flashy Corvette that only costs $130,000 from a $1.6M excavator that builds car factories, which only cost you $61K at a gov auction and earns your company a Corvette every time you book a job. Take a look at Milan.
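    For a rough sense of scale, here is a quick back-of-the-envelope calculation of those lane counts; it is a minimal sketch that assumes standard PCIe 4.0 signaling (16 GT/s per lane, 128b/130b encoding), so real-world throughput will land somewhat lower:

    ```python
    # Rough PCIe bandwidth arithmetic for the Milan lane counts mentioned above.
    # Assumes standard PCIe 4.0 signaling: 16 GT/s per lane, 128b/130b encoding.

    GT_PER_LANE = 16e9            # 16 GT/s per PCIe 4.0 lane
    ENCODING = 128 / 130          # 128b/130b line-encoding efficiency

    lane_gbps = GT_PER_LANE * ENCODING / 1e9   # ~15.75 Gb/s usable per lane
    x16_gBps = lane_gbps * 16 / 8              # ~31.5 GB/s per x16 slot
    all_lanes_gBps = lane_gbps * 128 / 8       # ~252 GB/s across 128 lanes

    print(f"Per lane:  {lane_gbps:.1f} Gb/s")
    print(f"x16 slot:  {x16_gBps:.1f} GB/s")
    print(f"128 lanes: {all_lanes_gBps:.0f} GB/s aggregate")
    ```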

  2. @raidowl, to get closer to 100G speed you will need SMB Multichannel, SMB Direct, and jumbo frames on all NICs in the path, set with a minimum MTU of 9000. Use Microsoft's diskspd tool to test the transfer, as it can be configured to use every core of your PC; Windows File Explorer copy tasks are notoriously single-threaded. If you are testing from a VM, you will also need to do PCI passthrough with the NIC or set up SR-IOV for the card and the VM adapter.
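    As a concrete illustration of driving the link with multiple threads instead of a single Explorer copy, here is a minimal Python wrapper around Microsoft's diskspd. The share path, file size, and thread/queue-depth values are placeholders (not settings from the video), and diskspd.exe has to be downloaded separately:

    ```python
    # Hedged sketch: hammer an SMB share with multiple threads using Microsoft's
    # diskspd rather than a single-threaded Explorer copy. The target path and
    # all tuning values below are placeholders -- adjust for your own setup.
    import subprocess

    TARGET = r"\\nas\benchmark\testfile.dat"   # hypothetical SMB path

    cmd = [
        "diskspd.exe",
        "-c10G",    # create a 10 GiB test file
        "-b1M",     # 1 MiB block size (large sequential I/O)
        "-t8",      # 8 worker threads
        "-o8",      # 8 outstanding I/Os per thread
        "-d30",     # run for 30 seconds
        "-Sh",      # disable software and hardware write caching
        "-w0",      # 100% reads (use -w100 for a write test)
        TARGET,
    ]

    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
    ```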

  3. As far as I'm aware, part of the issue is using the standard Windows file-transfer program. I've seen RoboCopy and ChoEazyCopy both suggested as faster copying methods for Windows, so maybe you might want to look into those?
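    For reference, a minimal sketch of a multithreaded RoboCopy run driven from Python; the source and destination paths and the thread count are placeholders:

    ```python
    # Hedged sketch: multithreaded copy with robocopy (ships with Windows).
    # Paths are placeholders; /MT:16 runs 16 copy threads in parallel.
    import subprocess

    SRC = r"D:\video_projects"        # hypothetical source folder
    DST = r"\\nas\archive\projects"   # hypothetical destination share

    result = subprocess.run([
        "robocopy", SRC, DST,
        "/E",       # copy subdirectories, including empty ones
        "/MT:16",   # 16 parallel copy threads
        "/R:1",     # retry once on failure
        "/W:1",     # wait 1 second between retries
    ])

    # robocopy exit codes 0-7 indicate success; 8 or higher means failures.
    print("robocopy exit code:", result.returncode)
    ```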

  4. I am glad you are upgrading your setup. It is kinda weird how, for the longest time, we were always told that you don't need faster than gigabit because your WAN connection is a lot slower than that anyway. Now I have 10G internet at home and actually need to upgrade my home network, because most of the devices on it don't support that. I kinda feel like your home LAN should be at least 2-3x your WAN connection, which means I need to seriously do some upgrades. Do I need those speeds? Sure don't, but I do try to utilize it as best I can. I have a Tor node set up, and I have my own NAS and server to account for as well. I am really hopeful that now that gigabit and faster speeds are becoming more common, we will finally move on from the idea that gigabit is fast enough. I would love to see SMPTE 2110 proliferate for virtual systems, and I do think it is getting easier and easier to just run your own homelab.

  5. RoCE requires not only NIC support but switch support as well to make 100 gig possible, as the switch will apply proper QoS and pass packets faster. At some conferences people claimed that DPDK is also needed, but I am not sure if that is true… Also, if you are using RoCEv1 instead of RoCEv2, you should get slightly faster speeds.

    Intel claims that their iWARP needs only NIC support, not switch support, but I never had a chance to test it out.

  6. I had a similar issue with McSpecial-firmware'd NICs for my E810s. I have the -CQDA2, which is 100G total; you'd want the -2CQDA2 to be able to use both ports at 100G. That was after my little failure with a couple of Dell cards that claimed to be Intel but had to go back because nothing I tried would get the firmware to update. Seems even these would be cheaper than actual Nvidia ConnectX-6 Dx cards.

  7. Five things:

    1) (Side rant: ConnectX-5 (MCX556A-ECAT) cards are Ethernet-only, so the eBay listing is misleading because the ports aren't VPI, but that's beside the point.)

    But this is a PCIe 3.0 x16 card, and starving it of bandwidth by dropping it down to an x8 configuration disproportionately kills the card's performance.

    57.6 Gbps out of a possible 64 Gbps is actually about 90% efficiency, which is pretty darn good.
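    For anyone who wants to check those numbers, a quick sanity check (the 64 Gb/s figure is the raw 8 GT/s × 8 lanes rate, before 128b/130b encoding):

    ```python
    # Quick check of the PCIe 3.0 x8 numbers quoted above.
    # Raw signaling is 8 GT/s per lane; 128b/130b encoding leaves ~98.5% usable.
    raw_per_lane = 8.0                           # Gb/s raw per PCIe 3.0 lane
    usable_per_lane = raw_per_lane * 128 / 130   # ~7.88 Gb/s per lane

    x8_raw = raw_per_lane * 8        # 64 Gb/s, the commonly quoted figure
    x8_usable = usable_per_lane * 8  # ~63 Gb/s after encoding overhead

    measured = 57.6                  # Gb/s observed in the video
    print(f"x8 raw:     {x8_raw:.0f} Gb/s")
    print(f"x8 usable:  {x8_usable:.1f} Gb/s")
    print(f"efficiency: {measured / x8_raw:.0%} of raw, "
          f"{measured / x8_usable:.0%} of usable")
    ```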

    2) If you want to test it at its full bandwidth, take out your video card and put the Mellanox card in the primary PCIe slot and disable PCIe bifurcation from x16 to x8/x8 (unless it will disable said bifurcation automatically when it detects that it doesn't have a second PCIe card installed).

    (This is the reason why my AMD Ryzen 9 5950X compute cluster nodes are running fully headless now: I had a GPU in when I initially set up the system, but then pulled it out and dropped in my Mellanox ConnectX-4 100 Gbps IB cards instead.)

    3) Also, based on my testing, I had tried to keep the Mellanox card and a GPU in the 5950X at the same time. At first, I tried putting the Mellanox card into the primary slot and the GPU in the secondary slot (using an Asus X570 motherboard), and it complained that the GPU wasn't in the primary slot, so it forced me to swap them. With that, I was only giving said Mellanox ConnectX-4 100 Gbps IB card PCIe 3.0 x4 or maybe 4.0 x4 (which doesn't matter, because the card is only a PCIe 3.0 card anyway, so it would've dropped down to PCIe 3.0 x4 speeds even in a PCIe 4.0 x4 slot).

    Instead of getting 97.6 Gbps like I can when it is in the primary PCIe 4.0 x16 running @ 3.0 x16 speeds, it dropped all the way down to 14 Gbps (out of a possible 32 Gbps).

    So, moral of the story:
    When you starve the card of bandwidth, the drop in the bandwidth that you'll actually be able to get doesn't fall off linearly. (It's a shallow exponential decrease.)

    4) Once you have it in your primary PCIe slot, the total bandwidth that's available to it will increase, and at that point, you can START your performance tuning of the connection/card.

    To do that, you can START by increasing the MTU from the default 1500 for Ethernet to 9000 (if your card, being Ethernet-only, supports it).

    That can help. If it DOESN'T support an MTU of 9000 bytes, then you can try the IB datagram-mode max of 4096 bytes minus 4 bytes for the header = 4092 bytes.

    SOMETIMES playing around with the MTU can help, but if you plan on moving a bunch of really small files in addition to a lot of really big video files, the larger MTU can hinder your small-file transfer performance. At that point it might be worth storing the small files in a .zip or 7-Zip archive (don't even bother compressing it), sending that over the network, and then unpacking it on the server at the other end; a sketch of that trick follows below.

    That CAN be faster (sometimes).

    (I've played with all sorts of variations of this, and it REALLY depends on what you're trying to do.)
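    A minimal sketch of that pack-then-send trick, using Python's standard zipfile module with placeholder paths; the point is that the network sees one large sequential transfer instead of thousands of tiny ones:

    ```python
    # Hedged sketch: bundle a directory of small files into an uncompressed
    # .zip (ZIP_STORED), ship the single archive, and unpack on the other end.
    # All paths are placeholders.
    import shutil
    import zipfile
    from pathlib import Path

    SRC_DIR = Path(r"D:\project\small_files")       # hypothetical local folder
    ARCHIVE = Path(r"D:\project\small_files.zip")
    DEST = Path(r"\\nas\incoming\small_files.zip")  # hypothetical SMB destination

    # 1) Pack without compression -- the goal is fewer round trips, not smaller data.
    with zipfile.ZipFile(ARCHIVE, "w", compression=zipfile.ZIP_STORED) as zf:
        for f in SRC_DIR.rglob("*"):
            if f.is_file():
                zf.write(f, f.relative_to(SRC_DIR))

    # 2) Ship the single archive over the share as one sequential copy.
    shutil.copyfile(ARCHIVE, DEST)

    # 3) Unpack on the receiving side (run there, or against the share):
    # with zipfile.ZipFile(DEST) as zf:
    #     zf.extractall(r"E:\incoming\small_files")
    ```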

    5) After that, the final performance-tuning bit you can try is to set the data-transfer process's affinity mask using Process Lasso or something akin to it, so that the NIC/port is bound to a specific CPU core. Otherwise, if the data is floating around inside the CPU die (core hopping), that will rob you of performance when the Windows scheduler can't figure out which core should handle the data I/O.
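    A minimal sketch of that kind of pinning, assuming the third-party psutil package; the process name and core list are placeholders, not a recommendation for any particular layout:

    ```python
    # Hedged sketch: pin a running transfer process to specific cores, along the
    # lines of what Process Lasso does. Requires "pip install psutil"; the process
    # name and core list below are placeholders.
    import psutil

    TARGET_NAME = "robocopy.exe"   # hypothetical transfer process to pin
    CORES = [2, 3]                 # hypothetical cores to restrict it to

    for proc in psutil.process_iter(["name"]):
        if (proc.info["name"] or "").lower() != TARGET_NAME:
            continue
        try:
            before = proc.cpu_affinity()
            proc.cpu_affinity(CORES)   # restrict scheduling to the chosen cores
            print(f"PID {proc.pid}: affinity {before} -> {proc.cpu_affinity()}")
        except psutil.Error as err:
            print(f"PID {proc.pid}: could not set affinity ({err})")
    ```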

    RDMA and RDMA over Converged Ethernet are an art; it takes work to leverage the technology in an efficient and effective manner.

    And if you read the official performance-tuning guide from Mellanox (read: Nvidia), they'll tell you all sorts of stuff you can try, but it really only works if the network is dominated by that specific type of workload. If you have a mixed-workload network (which most people do), then implementing the recommendations from said tuning guide can actually hurt performance. The defaults are the worst for all of the workloads uniformly (i.e. unoptimised for ANY workload), but as a result they all perform decently well; whereas if you tune for a specific workload and a workload shows up on the network that ISN'T the one it's tuned for, that workload will suffer.

    You can try these things.

    Reply/tag me if you have any further questions.

    (I've been running my 100 Gbps IB network since 2018.)

  8. Awesome video! What happens if you copy 5 files at the same time? Also, I'm going to get you one of my cards so you can see if this is a drive/OS queuing issue.

  9. What is your firewall? I.e., in your network, does 100 Gb/s go through it? Layer 3 ASICs may also help, especially compared to the weak MikroTik; you've gotta play weird games to not be limited by their stuff. However, with a layer 3 switch, not having authoritative DHCP may be an issue, and setting up proper ACLs will tear your hair out. Wait… sorry, no offense.

  10. I am, by no means, an expert in this field, so the information I'm about to provide is just theoretical, based on other information I've seen. Your real-world speed might be bottlenecked by the processing power of your CPU… I went to look up the LTT video where I saw this, watched the whole thing, and came back to this one. I hit play, forgetting the whole reason I went to look at the LTT video, and the next words out of your mouth were about getting a Threadripper. Oh yeah… I have a half-typed comment that I haven't submitted yet…

    LOL

  11. Excellent video!
    I suggest you try Tuxera's Fusion File Share SMB; they can provide you a demo license. On the storage side, try Xinnor; I guess Graid is out of scope.
    My two cents.

  12. 100 Gig?? I mean, I love it, don't get me wrong… but what the hell are you planning to do with 100 Gb?? 😳 And such a big switch too!? You've definitely made me curious.

  13. SMB is such a frustration… you have bonding of two NICs, your simple SMB method (nice to learn new tricks). But why, after getting so much working well, is the file-transfer speed still stuck in the Middle Ages?

  14. Great timing, I am currently upgrading to 100G (/25G) as well 🙂

    Just bought a CRS510, which will hopefully arrive soon. My server has a 100G NIC and my workstation is currently at 40G…

    I already benchmarked it using a DAC cable and found out that my CPU and RAM are limiting the speed to ~26 Gbit/s, but it is still good enough for my use case (I will have other clients and servers using 10/25G connections).
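    A simple way to check whether a single core is the limiter is to compare one stream against several parallel streams. A hedged sketch using iperf3 follows; iperf3 must be installed on both ends (with "iperf3 -s" running on the other machine), and the server address is a placeholder:

    ```python
    # Hedged sketch: compare 1 vs. several parallel iperf3 streams to see whether
    # one CPU core is capping throughput. The server address is a placeholder.
    import subprocess

    SERVER = "192.168.1.10"   # hypothetical NAS/server address

    for streams in (1, 4, 8):
        print(f"--- {streams} parallel stream(s) ---")
        subprocess.run([
            "iperf3",
            "-c", SERVER,        # connect to the iperf3 server
            "-P", str(streams),  # number of parallel TCP streams
            "-t", "10",          # 10-second test
        ], check=True)
    ```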

  15. So, using Adaptec 8 Series RAID controllers… my off-the-cuff speed estimate is 100 MB/s per drive, 15 drives in RAID 6 on x8 PCIe. Depending on file size/type/content it goes slower (50 GB of PDFs suck) or faster (an 80 GB video). On average I can move 30 TB in under 20 hours… but that's with ConnectX-3 56 Gb/s MULTICHANNEL. If trying multichannel, I found a separate network/subnet for port 1 and port 2 worked, but I also had good results with one subnet when first starting the steep learning curve. There aren't a lot of consumer user manuals available. Also, the Windows transfer-rate graph shows the transfer in bytes per second, so multiply by 8: 2,500 MB/s ≈ 20 Gb/s. And on basic Ethernet I'm seeing 10-20% overhead depending on the type of file (all those tags added to the data packet). But I'm still learning, so I may have just wasted a few kilowatts typing. Edit: multichannel may work in your current hardware config; try enabling it in Windows PowerShell on all machines with dual NICs.
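    To make those back-of-the-envelope numbers explicit (the 10-20% Ethernet overhead figure above is an estimate from the comment, not a measurement):

    ```python
    # Back-of-the-envelope math for the numbers above.

    # Windows shows copy speed in bytes/second; the wire is quoted in bits/second.
    windows_graph_MBps = 2500
    print(f"{windows_graph_MBps} MB/s on the graph ~= "
          f"{windows_graph_MBps * 8 / 1000:.0f} Gb/s on the wire")

    # Average rate implied by "30 TB in under 20 hours".
    tb, hours = 30, 20
    avg_MBps = tb * 1e6 / (hours * 3600)
    print(f"{tb} TB in {hours} h ~= {avg_MBps:.0f} MB/s sustained "
          f"(~{avg_MBps * 8 / 1000:.1f} Gb/s)")

    # Apply the comment's 10-20% Ethernet overhead estimate to that rate.
    for overhead in (0.10, 0.20):
        print(f"With {overhead:.0%} overhead, wire rate ~= "
              f"{avg_MBps * 8 / 1000 * (1 + overhead):.1f} Gb/s")
    ```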

  16. I think you need to go 100G bonded from the workstation to the NAS; then the bottleneck will be the NVMe? Your cards are dual-port already from the looks of it, so go for it and slay the performance bottlenecks once and for all. People are genuinely interested and this trend is not going away; plenty of people will want ultra-fast workstation/NAS connections. Maybe you can sync to a second NAS somehow too, and have it sync in the background over 56G or something.

  17. Thanks for confirming!! I went through a similar journey. I switched from 10G to 100G and had it all running at x8 and x16.
    It was a rabbit hole to get to even 3 GB/s with SMB, while, like you, I expected 7 to 10 GB/s.
    It was so complex, and I wonder, are we the only ones??? There is almost no good info out there, and I think we are at the forefront of homelab.

  18. 0:00 "I have a problem…" finding a shirt that didn't come from the bottom of the pile of (clean?) laundry. Jkjkjk, I'm just jealous that I'm not your "full 100G connection with RDMA that actually gets 100G speeds." Fun video. Nice work, thanks!

  19. Interesting vid, just a guess:
    1) You might actually be limited by the CPU, because there's a ton of processing handling networking, storage, and the transfer process itself.
    Maybe you'll need to configure CPU affinity to dedicate specific CPU cores to specific tasks? (It's a very common practice on workstations.)
    2) You might be limited by Windows Explorer doing the file transfer.
    Maybe try something else like Total Commander or FreeFileSync?

  20. So why do you approach this from a 1-to-1 perspective? Why are you not testing with multiple devices transferring data, as is intended?

  21. Hey Raid Owl, I have to say this: I LOVE your videos. You explain the issues you run into extremely well, and I really appreciate how you walk through and troubleshoot them. I think it's awesome that you listen to viewer feedback and react accordingly. On the last video you made about this, everyone said "time to upgrade your hardware!" and you saw that, thought about it, and did it your way while explaining why you made the decisions you did. You're extremely relatable: you do things for the fun of experimenting and explain your mistakes and/or lack of knowledge with humility. We all benefit and learn with you because of it, thank you. So often you'll see content creators who curate their content to show themselves in the best light, the smartest, most capable people who never make mistakes and never show the issues they encounter, even if there were millions. Of course there isn't anything wrong with presenting concise content like that (a 10-hour video of troubleshooting wouldn't be all that entertaining), but I feel you personally have a fantastic mix of explaining issues while still being concise.

    Just want to say again, I really appreciate how you create your content. I personally have a homelab setup very similar to yours, I am considering the pros and cons of the exact same switch you bought, and I am curious about the exact levels of performance you are trying to achieve. I'm gaining a lot from your work and I just have to say again: appreciate you so much! Keep it up!

    BTW if you upgrade from your MikroTik switch help me out and sell me yours! haha. Joking but maybe not really lol.

    Also REALLY funny comedy in this video. You had me laughing as the Threadripper slowly faded in. Amazing. I would LOVE to see you do this workstation build as that's also something I've been curious about doing. Let's see it!

    Best of luck and looking forward to more videos!

  22. Great video. I'm only doing 25 Gb SMB Direct myself. I also had to go direct and bypass the 25 Gb ports on my UniFi switch because it doesn't support RoCE. You can run the iWARP type of RDMA, but that still runs over TCP and isn't nearly as efficient as RoCE.

    While this doesn't seem like an issue for you, since you aren't running a bunch of NVMe drives, be aware that Storage Spaces runs into major bottlenecks when striping NVMe arrays of 3 or more drives. I tried default settings and custom-tuned settings, and it just couldn't get anywhere close to the maximum. If you make a striped array in Disk Management instead, the speeds immediately jump to double or more what Storage Spaces is capable of. I'll be making a thread showing some speed differences on L1Techs this weekend if you want to check it out. The interesting part is that it basically shows Storage Spaces has significant overhead in how it processes data. It works great for large HDD arrays and even some cache disks, but it cannot reach the limit of what multiple Gen4 NVMe drives are capable of.
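    A crude sequential-read check you could run against both a Storage Spaces volume and a Disk Management striped volume to compare them; the file path is a placeholder, and the test file should be larger than RAM so the OS cache doesn't skew the result:

    ```python
    # Hedged sketch: rough sequential-read throughput of a volume. Create a large
    # test file on the volume first; the path below is a placeholder. Note that
    # the OS page cache can still inflate re-read numbers.
    import os
    import time

    TEST_FILE = r"E:\bench\bigfile.bin"   # hypothetical file on the volume under test
    CHUNK = 8 * 1024 * 1024               # 8 MiB reads, large enough to stream

    size = os.path.getsize(TEST_FILE)
    start = time.perf_counter()
    with open(TEST_FILE, "rb", buffering=0) as f:   # skip Python-level buffering
        while f.read(CHUNK):
            pass
    elapsed = time.perf_counter() - start

    print(f"Read {size / 1e9:.1f} GB in {elapsed:.1f} s "
          f"-> {size / elapsed / 1e9:.2f} GB/s")
    ```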

  23. Ah cool, we tried it in a VM and with RAM disks; I was curious about those. In the end it's often more about available lanes/lane speed, storage speeds, and sometimes even CPU power. Also, files are WEIRD: if there are a bunch of little files instead of one big one, you can lose out even on a high-speed connection, with CONSTANT fluctuations.
    I am curious if those higher speeds you found can be used to run backup pushes from an active server. Maybe in a full server environment you can at least increase the rate at which you back up your configurations.

  24. Would be interesting to deep-dive into the TrueNAS performance stuff, disable compression, etc. Windows Server is such a licensing hell.

  25. When running SSDs/NVMe drives in a pool under Windows, remember that you need to jump through a lot of hoops to get TRIM to work.
    I found that out the hard way myself.
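    A minimal sketch for checking whether TRIM (delete notifications) is enabled on Windows, driving fsutil from Python; "DisableDeleteNotify = 0" means TRIM is on, and changing the setting needs an elevated prompt:

    ```python
    # Hedged sketch: query Windows TRIM status via fsutil. The per-filesystem
    # argument works on recent Windows versions; this only queries, it does not
    # change anything.
    import subprocess

    for fs in ("NTFS", "ReFS"):
        out = subprocess.run(
            ["fsutil", "behavior", "query", "DisableDeleteNotify", fs],
            capture_output=True, text=True,
        )
        print(fs, "->", out.stdout.strip() or out.stderr.strip())

    # To enable TRIM for NTFS from an elevated prompt, the equivalent would be:
    #   fsutil behavior set DisableDeleteNotify NTFS 0
    ```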

  26. Crying on 1 Gbit Ethernet 😒😒😒
