SQM for 1 Gbps Lines With OpenWrt

Pictured Gigabit Switch: TP-Link 8-Port Gigabit Ethernet Switch. Amazon Referral Link: https://www.amazon.com/gp/product/B00K4DS5KU/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=B00K4DS5KU&linkCode=as2&tag=stopl02-20&linkId=afa6ee32eda065c194d2a161f3799c99
Pictured Access Point: Ubiquiti Unifi 6 Lite. Official Link: https://store.ui.com/collections/unifi-network-access-points/products/unifi-ap-6-lite (The Unifi 6 seems to be in short supply. You may have to settle for a Unifi AP-AC-Lite for now)
Pictured OpenWrt Device: NanoPi R4S. Official Link: https://www.friendlyarm.com/index.php?route=product/product&product_id=284 It can also be found on AliExpress. AliExpress Link: https://www.aliexpress.com/item/1005001941753177.html?spm=a2g0s.9042311.0.0.57064c4dsBfzcu
1.1 Introduction and Why?
The diagram above demonstrates how you would install a more powerful OpenWrt PC or ARM PC as a router into your network.  
The reason to do this is to stop bufferbloat at higher bandwidths with SQM (Smart Queue Management). Consumer routers currently can't usually push past 350 Mbps with luci-app-sqm enabled, because the SQM algorithm cake uses a lot of CPU processing power. The only way to get close to 1 Gbps with SQM is to build your own router or use hardware like the NanoPi R4S.
Building your home network infrastructure like this is more reliable than using consumer routers, which try to put the modem, routing, and wireless all in one box.
What is Bufferbloat and why stop it?
Bufferbloat is the lag or ping spikes in video games or Zoom calls caused when you or someone else uses up all your bandwidth. The cause could be torrenting, 4K streaming, bulk downloads, or even a speedtest. SQM algorithms (fq_codel or cake), which are available on OpenWrt, can mitigate these ping spikes and ensure low latency even under full load. Overall, you sacrifice a little maximum speed (5-10%) for guaranteed low latency.
1.2 What Hardware For The OpenWrt Router?
If you're new to this I recommend option 1 for price, power consumption, and performance. Advanced users could try the x86 computer route!
Hardware Option 1: The NanoPi R4S (~$85 with accessories and case)
The predecessor, the R2S, was known to handle ingress up to 465 Mbps and egress up to 750 Mbps (Source). The R4S is more powerful, with 6 cores and 1GB of RAM. It is also a low-power device! I consider it the best bang for your buck for Gigabit SQM.
My ISP gives me 800Mbps DL and 35Mbps UL. My NanoPi R4S running FriendlyWrt can do SQM with fq_codel at these speeds without any issue. The NanoPi R4S performance with fq_codel should be similar to the screenshot below at 800 Mbps down.
 Pictured: htop of the 6 core NanoPi R4S with SQM (fq_codel) at 800mbps. The numbers 0-5 represent the load of each cpu core.
Although cake is the better algorithm, it is more CPU intensive and seems to cap out around 638 Mbps on the R4S, so I'd recommend fq_codel for connections faster than 600 Mbps.
Pictured: htop of the 6 core NanoPi R4S with SQM (cake). It capped out at 638Mbps when I have 800 Mbps from the ISP. Notice the CPU load was heavier than fq_codel!
Installation is easy. You just need to flash a microSD card with FriendlyWrt. FriendlyELEC has a tutorial here: https://wiki.friendlyelec.com/wiki/index.php/NanoPi_R4S#Install_OS
All you have to do to install is...
1. Plug a microSD card into your computer.
2. Download rk3399-sd-friendlywrt-5.10-YYYYMMDD.img.zip
3. Get win32diskimager and launch it.
4. In win32diskimager, select the image file you downloaded and the drive letter of your microSD card. Then flash!
5. After flashing is done, eject the microSD card and unplug it.
6. Plug the microSD card into the NanoPi R4S.
7. Hook up WAN to your modem. Hook up LAN either to your switch (which connects to a computer) or directly to your computer.
8. Power on. Wait about 3 minutes.
9. On the computer connected to the switch or to the NanoPi's LAN port, open a web browser and go to http://192.168.2.1 to access your router.
Performance Tweak 1: Enable packet steering under Network > Interfaces > Global network options. In OpenWrt 19 this was enabled by default. OpenWrt 21 turns it off by default, so you should turn it back on to utilize all of your R4S's cores.
Performance Tweak 2: Tweak processor affinity to improve performance even further. References used: 1, 2, 3
As of 2022.08.03 (FriendlyWrt image "rk3399-sd-friendlywrt-21.02-docker-20220803.img"), Performance Tweak 2 is no longer necessary because it is configured by default.
## Quick Reference
Binary = hex = cpu core
000001 = hex 1 = cpu core 0 (A53) selected
000010 = hex 2 = cpu core 1 (A53) selected
000100 = hex 4 = cpu core 2 (A53) selected
001000 = hex 8 = cpu core 3 (A53) selected
010000 = hex 10 = cpu core 4 (A72) selected
100000 = hex 20 = cpu core 5 (A72) selected
001111 = hex f = cpu cores 0, 1, 2 and 3 selected
111111 = hex 3f = cpu cores 0, 1, 2, 3, 4 and 5 selected
110000 = hex 30 = cpu cores 4 and 5 selected
Just write the selection in binary and convert it to hex: 1 = select that CPU core and 0 = unselect that CPU core.
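If you'd rather not do the binary-to-hex conversion by hand, the table above can be sketched as two small shell helpers. This is only an illustration: the function names cores_to_mask and mask_to_cores are made up for this example and are not part of FriendlyWrt or OpenWrt. They use plain POSIX shell arithmetic, so they should work in the router's shell as well as on a desktop.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this example).

# cores_to_mask: core numbers in, hex bitmask out.
# Bit N of the mask is set when core N is selected.
cores_to_mask() {
    mask=0
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))
    done
    printf '%x\n' "$mask"
}

# mask_to_cores: hex bitmask in, list of selected core numbers out.
mask_to_cores() {
    mask=$(( 0x$1 ))
    core=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="$out${out:+ }$core"
        fi
        mask=$(( mask >> 1 ))
        core=$(( core + 1 ))
    done
    printf '%s\n' "$out"
}

cores_to_mask 0 1 2 3   # prints f  (the four A53 cores)
cores_to_mask 4 5       # prints 30 (the two A72 cores)
mask_to_cores 3f        # prints 0 1 2 3 4 5
```

mask_to_cores is handy for reading back whatever smp_affinity or rps_cpus currently contains.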
## Step 1: Get CPU frequencies to confirm that cores 4 and 5 are the faster cores.
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq
## Step 2: Get IRQ #s (In my example they are 31 for eth0 and 88 for eth1. Yours may be different.)
grep eth /proc/interrupts
 31:     442034   72493104  337387798          0      47352          0     GICv3  44 Level     eth0
 88:          0          0 1463111677  994946912          0          0   ITS-MSI 524288 Edge      eth1
## Optional Step: List Cores Assigned Current Affinity
cat /proc/irq/31/smp_affinity
cat /proc/irq/88/smp_affinity
## Optional Step: List Cores Assigned Current Queues
cat /sys/class/net/eth0/queues/rx-0/rps_cpus
cat /sys/class/net/eth1/queues/rx-0/rps_cpus
## Step 3: The Performance Tweaks. Putting affinities on Faster A72 Cores. Queues to spread evenly on slower A53 Cores.
#ETH0 irq on core 4(a72 core)
echo -n 10 > /proc/irq/31/smp_affinity
#ETH1 irq on core 5(a72 core)
echo -n 20 > /proc/irq/88/smp_affinity
#ETH0 queues on all 4 A53 cores(0, 1, 2, 3)
echo -n f > /sys/class/net/eth0/queues/rx-0/rps_cpus
#ETH1 queues on all 4 A53 cores(0, 1, 2, 3)
echo -n f > /sys/class/net/eth1/queues/rx-0/rps_cpus
Just keep in mind that only the queues will spread across as many cores as you selected; the IRQ won't! You need to pick one core for the IRQ of each eth interface, which is why I kept mine on core 4 for eth0 and core 5 for eth1. If you restart Smart Queue Management or change SQM settings, the CPU affinity is reset and you will need to re-apply these settings.
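Since the settings get reset, it can help to keep Step 3 in a small script you can re-run. The sketch below is one possible way to do that, not something shipped with FriendlyWrt. The function names, the APPLY switch, and the INTERRUPTS_FILE override are all invented for this example. It looks the IRQ numbers up by interface name instead of hard-coding 31 and 88, and it assumes the last column of /proc/interrupts is exactly eth0 or eth1, as in my output above.

```shell
#!/bin/sh
# Hypothetical re-apply script for the Step 3 tweaks.
# INTERRUPTS_FILE can point at a saved copy for a dry run.
INTERRUPTS_FILE="${INTERRUPTS_FILE:-/proc/interrupts}"

# Print the IRQ number of the line whose last column is the interface name.
irq_for_iface() {
    awk -v dev="$1" '$NF == dev { sub(/:$/, "", $1); print $1; exit }' "$INTERRUPTS_FILE"
}

# apply_affinity IFACE IRQ_MASK RPS_MASK
# Writes the IRQ mask to smp_affinity and the queue mask to rps_cpus.
apply_affinity() {
    iface="$1"
    irq_mask="$2"
    rps_mask="$3"
    irq="$(irq_for_iface "$iface")"
    if [ -z "$irq" ]; then
        echo "no IRQ found for $iface" >&2
        return 1
    fi
    printf '%s' "$irq_mask" > "/proc/irq/$irq/smp_affinity"
    printf '%s' "$rps_mask" > "/sys/class/net/$iface/queues/rx-0/rps_cpus"
}

# Nothing is written unless APPLY=1 is set, so the file is safe to source.
if [ "${APPLY:-0}" = "1" ]; then
    apply_affinity eth0 10 f   # eth0 IRQ on core 4 (A72), queues on A53 cores
    apply_affinity eth1 20 f   # eth1 IRQ on core 5 (A72), queues on A53 cores
fi
```

Saved as, say, reapply-affinity.sh (a made-up name), it would be run on the router as root with APPLY=1 sh reapply-affinity.sh after any SQM change.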
That's it! All that is left is to configure SQM with fq_codel as shown here. There's no need to install luci-app-sqm because the FriendlyWrt image has everything already!
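For those who prefer the command line over the web interface, here is a rough sketch of the same SQM setup. It assumes the sqm-scripts config already has a queue section (addressed here as sqm.@queue[0]), that your WAN interface is eth0, and that you give up 10% of your line speed, which is on the cautious end of the 5-10% mentioned earlier. The helper name shaped_kbit is invented for this example. The script only prints the uci commands so you can review them before pasting them into the router.

```shell
#!/bin/sh
# shaped_kbit LINE_MBPS PERCENT -> shaper rate in kbit/s
# (hypothetical helper; SQM takes its rates in kbit/s)
shaped_kbit() {
    echo $(( $1 * 1000 * $2 / 100 ))
}

# My line is 800 Mbps down / 35 Mbps up; keep 90% of each.
DOWN_KBIT="$(shaped_kbit 800 90)"   # 720000
UP_KBIT="$(shaped_kbit 35 90)"      # 31500

# Print the commands instead of running them.
# eth0 as the WAN interface is an assumption; check yours first.
cat <<EOF
uci set sqm.@queue[0].interface='eth0'
uci set sqm.@queue[0].qdisc='fq_codel'
uci set sqm.@queue[0].script='simple.qos'
uci set sqm.@queue[0].download='$DOWN_KBIT'
uci set sqm.@queue[0].upload='$UP_KBIT'
uci set sqm.@queue[0].enabled='1'
uci commit sqm
/etc/init.d/sqm restart
EOF
```

Start low and raise the rates while watching latency under load. If ping spikes return, you went too high.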
If you'd rather try an OpenWrt build instead of FriendlyWrt, this may be of interest to you: https://github.com/quintus-lab/OpenWRT-Rockchip
Pictured: htop of the 6 core NanoPi R4S with SQM on under speedtest of 600Mbps
External Resources for NanoPi R4S
R4S Benchmarks by Van Tech Corner on Youtube: https://www.youtube.com/watch?v=t5xuTy1xn64
R4S Performance Tweaking: https://forum.openwrt.org/t/nanopi-r4s-rk3399-4g-is-a-great-new-openwrt-device/79143/406
Hardware Option 2: Any x86 Desktop
When I had a 600 Mbps connection, I tested a SEEED ODYSSEY - X86J4105, which has a CPU Mark score of ~3000, and it handled SQM with ease. The htop screenshot below shows it using at most 37% CPU under full load. If 600 Mbps uses at most 37% CPU, we can guess that a CPU Mark of 3000 should be able to handle 1000 Mbps connections.
TL;DR: It is probably safe to assume that any desktop PC with a CPU Mark of around 3000 or more on cpubenchmark.net can handle SQM at 1000 Mbps.
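The guess above is a simple linear extrapolation, which can be worked out like this. Treat it as a back-of-the-envelope estimate: it assumes CPU load grows in proportion to throughput, which I have not measured beyond 600 Mbps. The function name estimate_cpu_pct is made up for this example.

```shell
#!/bin/sh
# estimate_cpu_pct MEASURED_PCT MEASURED_MBPS TARGET_MBPS
# Scales the measured CPU load linearly to the target speed.
estimate_cpu_pct() {
    awk -v pct="$1" -v from="$2" -v to="$3" \
        'BEGIN { printf "%.1f\n", pct * to / from }'
}

estimate_cpu_pct 37 600 1000   # prints 61.7
```

Roughly 62% at 1000 Mbps would still leave some headroom, which is where the CPU Mark 3000 rule of thumb comes from.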
Pictured: htop of SEEED Odyssey X86J4105 PC running OpenWrt
The disadvantage of a regular desktop is power usage. Should you decide to use a PC, there are a few requirements.
1. Make sure it has two Gigabit Ethernet ports. If it has one already, you can add a second one with a Mini PCI-E Gigabit Network Adapter. (Amazon referral link).
2. You also want to make sure it has a CPU Mark of 3000 or more. You can check here: https://www.cpubenchmark.net/cpu_list.php
3. Preferably it would be a low-power device that draws under about 25 watts.
As for installing the software, OpenWrt has its own written guide here: https://openwrt.org/docs/guide-user/installation/openwrt_x86
4. After OpenWrt is set up and running, you just need to enable SQM like so: https://www.stoplagging.com/openwrt-method-fq_codel-cake/
5. As of OpenWrt 21, to utilize all of your router's cores you should enable packet steering under Network > Interfaces > Global network options, and enable irqbalance to improve performance even further. https://openwrt.org/docs/guide-user/services/irqbalance
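Step 5 can also be done over SSH. The sketch below assumes a stock OpenWrt 21 layout, where packet steering is the packet_steering option in the globals section of the network config and the irqbalance package ships a config section named irqbalance. Double-check both on your build before relying on it. The function name enable_multicore is invented for this example, and the script does nothing on a machine that doesn't have uci.

```shell
#!/bin/sh
# Hypothetical helper: enable packet steering and irqbalance via uci.
enable_multicore() {
    # Packet steering (same switch as the Global network options checkbox)
    uci set network.globals.packet_steering='1'
    uci commit network
    /etc/init.d/network reload

    # irqbalance is a separate package and is disabled in its config by default
    opkg update && opkg install irqbalance
    uci set irqbalance.irqbalance.enabled='1'
    uci commit irqbalance
    /etc/init.d/irqbalance enable
    /etc/init.d/irqbalance start
}

# Only run on a machine that actually has uci (i.e. the OpenWrt router).
if command -v uci >/dev/null 2>&1; then
    enable_multicore
else
    echo "uci not found; run this on the OpenWrt router"
fi
```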
Hardware Option 2.1: Seeed Odyssey x86 Mini PC (~$250-$300)
Should you decide to go with the pricey Seeed Odyssey, Seeed did a write-up about running OpenWrt on it from a USB device: https://wiki.seeedstudio.com/ODYSSEY-X86J4105-Installing-openwrt/
Personally, I ran mine on a 16GB M.2 SATA SSD, since NVMe isn't currently supported in the base x86 OpenWrt image. Instead of flashing a USB drive as instructed by Seeed, I flashed my M.2 SATA drive with balenaEtcher.
At 600 Mbps, the SEEED ODYSSEY - X86J4105 (CPU Mark score of ~3000) handles downloads with ease. The htop screenshot below shows it using at most 37% CPU under full load, so a CPU Mark of 3000 should in theory be enough for 1000 Mbps connections.
Pictured: Seeed Odyssey CPU usage under load at 600 Mbps download
1.3 What Access Point to Get?
I keep hearing raving reviews about the Ubiquiti APs and use one myself. I have extremely stable WiFi with these and never have to reboot them. Ubiquiti also advertises up to 200 concurrent users as well! If you have a recommendation better than these I'd like to know.
Official Link: https://store.ui.com/collections/unifi-network-access-points/products/unifi-ap-6-lite (As of 2/24/2021, shipping is free over $100. To get over $100 you can add a filler item.)
If you plan on having only one Ubiquiti AP, I recommend setting it up from your phone so you don't have to bother with more complicated things like AP controllers.
If you're on a budget and can't buy a dedicated AP, you can try turning your old router into an access point by putting it into AP mode instead of routing mode. This is important because the OpenWrt device, not your old router, should do the routing in order to prevent bufferbloat.
Facts about WiFi
If you need more coverage, you should get more APs, not one single AP with a bunch of antennas, because those are marketing gimmicks.
WiFi has limited range due to the physics of its frequency bands.
5 GHz can handle more bandwidth, but will usually have about half the range of 2.4 GHz.
1.4 Contact
If you need help or a consultation, please join my rocket.chat server at https://chat.stoplagging.com/invite/zaMu6X. After registering and logging in, you can message me (@Starfroz) by looking me up under the globe icon.
 
                                                    



