SQM for 1 Gbps Lines With OpenWrt

Pictured Gigabit Switch: TP-Link 8-Port Gigabit Ethernet Switch. Amazon Referral Link: https://www.amazon.com/gp/product/B00K4DS5KU/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=B00K4DS5KU&linkCode=as2&tag=stopl02-20&linkId=afa6ee32eda065c194d2a161f3799c99
Pictured Access Point: Ubiquiti Unifi 6 Lite. Official Link: https://store.ui.com/collections/unifi-network-access-points/products/unifi-ap-6-lite
If Ubiquiti is out of stock or you don't like them, the TP-Link EAP610 or their Omada line works just as well: https://amzn.to/3B7aIwj
Pictured OpenWrt Device: NanoPi R4S. Official Link: https://www.friendlyelec.com/index.php?route=product/product&path=69&product_id=288
2023.08.15 Update - I highly recommend the R6S ($140+$30 for power adapter) instead if you have 1Gbps or higher. It's 3x more powerful than an R4S and has dual 2.5G ports. The R4SE ($80+$10 for power adapter) can only handle up to 700Mbps on cake and could be a good alternative if you're not aiming for Gigabit.
The R6S has been tested to handle up to 1300Mbps on cake using fiber.google.com/speedtest to the San Francisco Comcast server on my own Comcast connection (I'm on the 1.2 Gbps DL plan). Others have confirmed the 1500-2000Mbps range with cake. In my real-world experience, though, a 900 Mbps cap seems to give the best results.
NanoPi R6S: https://www.friendlyelec.com/index.php?route=product/product&path=69&product_id=289 
For the R6S you will need to source your own 18W PD power supply and USB-C to USB-C PD cable.
As for the power supply, I'd recommend the CanaKit Raspberry Pi power supply over the one FriendlyElec provides: https://amzn.to/3T5XzuP
1.1 Introduction and Why?
The diagram above demonstrates how you would install a more powerful OpenWrt PC or ARM PC as a router into your network.  
The reason to do this is to stop bufferbloat at higher bandwidths with SQM (Smart Queue Management). Consumer routers currently can't usually push past 350 Mbps with luci-app-sqm enabled, because the SQM algorithm cake uses a lot of CPU processing power. The only way to get close to 1 Gbps with SQM is to build your own router or use hardware like the NanoPi R4S.
Building your home network infrastructure like this is more reliable than using a consumer router that tries to combine the modem, routing, and wireless in one box.
What is Bufferbloat and why stop it?
Bufferbloat is the lag or ping spikes in video games or Zoom calls caused when you or someone else uses up all your bandwidth. The cause could be torrenting, 4K streaming, bulk downloads, or even a speedtest. SQM algorithms (fq_codel or cake), which are available on OpenWrt, can completely mitigate these ping spikes and ensure low latency even under full load. Overall, you sacrifice a little max speed (5-10%) for guaranteed low latencies.
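As a rough illustration of that trade-off, here is a minimal sketch that turns a measured speed into a rate to enter in SQM. The function name sqm_rate_kbit and the default of 90% are illustrative assumptions (giving up 5-10% means keeping 90-95%); luci-app-sqm takes its rates in kbit/s. The example figures are the 920 Mbps download and 35 Mbps upload measured on my line.

```shell
#!/bin/sh
# Hypothetical helper: shaper rate in kbit/s for a measured speed in Mbps,
# keeping PERCENT of the line rate (90-95 matches the 5-10% sacrifice).
sqm_rate_kbit() {
    measured_mbps=$1
    percent=${2:-90}
    echo $(( measured_mbps * 1000 * percent / 100 ))
}

# Example with a 920 Mbps download / 35 Mbps upload line.
echo "download: $(sqm_rate_kbit 920 90) kbit/s"
echo "upload:   $(sqm_rate_kbit 35 90) kbit/s"
```

Treat the results as starting points only. The router's CPU may cap the usable rate well below these numbers.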
1.2 What Hardware For The OpenWrt Router?
If you're new to this I recommend option 1 for price, power consumption, and performance. Advanced users could try the x86 computer route!
Hardware Option 1: The NanoPi R4S / R4SE (~$100 with accessories and case)
The predecessor, the R2S, was known to handle ingress up to 465 Mbps and egress up to 750 Mbps (Source). The R4S is more powerful, with 6 cores and 4GB of RAM. The R4S is also a low-power device! I consider it the best bang for your buck for Gigabit SQM.
My measured bandwidth without SQM from my ISP is 920Mbps DL and 35Mbps UL. My NanoPi R4S running FriendlyWrt can do SQM with fq_codel (simplest.qos) at up to 800 Mbps without issue. With cake (piece_of_cake.qos) the best speed I could get with SQM was up to 700 Mbps.
Video: htop of the 6 core NanoPi R4S with SQM (fq_codel w/ simplest.qos). The numbers 0-5 represent the load of each CPU core.
Video: htop of the 6 core NanoPi R4S with SQM (cake w/ piece_of_cake.qos). The performance capped out around 750Mbps on my gigabit connection. Notice that CPU 5 almost hits 100% load. Cake is much more CPU intensive than fq_codel.
As you can see, although cake is the slightly better algorithm, it is more CPU intensive and seems to cap out at around 700 Mbps. I'd recommend fq_codel if you want to squeeze out the extra ~100 Mbps.
Installation is easy. You just need to flash a microSD card with FriendlyWrt. They have a tutorial here: https://wiki.friendlyelec.com/wiki/index.php/NanoPi_R4S#Install_OS
All you have to do to install is...
1. Plug a microSD card into your computer.
2. Download rk3399-sd-friendlywrt-5.10-YYYYMMDD.img.zip
3. Get win32diskimager and launch it.
4. In win32diskimager, select the image file that you downloaded and select your microSD drive letter. Then flash!
5. After flashing is done, eject the microSD card and unplug it.
6. Plug the microSD card into the NanoPi R4S.
7. Hook up WAN to your modem. Hook up LAN either to your switch, which connects to a computer, or directly to your computer.
8. Power on. Wait about 3 minutes.
9. On the computer that is connected to the switch or the NanoPi's LAN port, open a web browser and go to http://192.168.2.1 to access your router.
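If you are flashing from Linux instead of Windows, steps 3 and 4 can be approximated with unzip and dd. This is only a sketch under assumptions: the helper name flash_image is made up for illustration, it assumes the zip contains a single .img file, and the device path is whatever lsblk shows for your card.

```shell
#!/bin/sh
# Hypothetical helper for Linux: write a zipped FriendlyWrt image to a microSD card.
# $1 is the downloaded .img.zip, $2 is the card's block device.
# WARNING: dd overwrites the device completely, so double-check it with lsblk first.
flash_image() {
    image_zip=$1
    device=$2
    if [ ! -f "$image_zip" ]; then
        echo "image not found: $image_zip" >&2
        return 1
    fi
    if [ ! -b "$device" ]; then
        echo "not a block device: $device" >&2
        return 1
    fi
    # Stream the image out of the zip straight onto the card, then flush caches.
    unzip -p "$image_zip" | dd of="$device" bs=4M conv=fsync
    sync
}
```

It would be called as root, for example: flash_image rk3399-sd-friendlywrt-5.10-YYYYMMDD.img.zip /dev/sdX, where /dev/sdX is a placeholder for your card.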
Performance Tweak: Tweak processor affinity to improve performance even further (This is not needed after the 2022.08.03 firmware. See the note below). References used: 1, 2, 3
As of the 2022.08.03 image "rk3399-sd-friendlywrt-21.02-docker-20220803.img" from FriendlyWrt, it is no longer necessary to apply this performance tweak, since it is configured by default.
## Performance Tweaks Quick Reference
Binary = hex = cpu core
000001 = hex 1 = cpu core 0 (A53) selected
000010 = hex 2 = cpu core 1 (A53) selected
000100 = hex 4 = cpu core 2 (A53) selected
001000 = hex 8 = cpu core 3 (A53) selected
010000 = hex 10 = cpu core 4 (A72) selected
100000 = hex 20 = cpu core 5 (A72) selected
001111 = hex f = cpu cores 0, 1, 2 and 3 selected
111111 = hex 3f = cpu cores 0, 1, 2, 3, 4 and 5 selected
110000 = hex 30 = cpu cores 4 and 5 selected
Just use binary and convert it to hex, 1 = select that cpu core and 0 = unselect that cpu core.
## Step 1: Get CPU Frequencies to confirm that cores 4 and 5 are the faster cores.
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq
## Step 2: Get IRQ #s (In my example they are 31 for eth0 and 88 for eth1, yours may be different)
grep eth /proc/interrupts
 31:     442034   72493104  337387798          0      47352          0     GICv3  44 Level     eth0
 88:          0          0 1463111677  994946912          0          0   ITS-MSI 524288 Edge      eth1
## Optional Step: List Cores Assigned Current Affinity
cat /proc/irq/31/smp_affinity
cat /proc/irq/88/smp_affinity
## Optional Step: List Cores Assigned Current Queues
cat /sys/class/net/eth0/queues/rx-0/rps_cpus
cat /sys/class/net/eth1/queues/rx-0/rps_cpus
## Step 3: The Performance Tweaks. Put the IRQ affinities on the faster A72 cores and spread the queues across the CPU cores.
#ETH0 irq on core 4 (A72 core)
echo -n 10 > /proc/irq/31/smp_affinity
#ETH1 irq on core 5 (A72 core)
echo -n 20 > /proc/irq/88/smp_affinity
#ETH0 queues on all CPU cores (0-5, mask 3f). Per the table above, f would select only the A53 cores 0-3.
echo -n 3f > /sys/class/net/eth0/queues/rx-0/rps_cpus
#ETH1 queues on all CPU cores (0-5, mask 3f)
echo -n 3f > /sys/class/net/eth1/queues/rx-0/rps_cpus
If you restart Smart Queue Management or change SQM settings, the CPU affinity is reset and you will need to re-apply these settings.
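Since the settings have to be re-applied by hand, a small script can save looking up the masks and IRQ numbers each time. The following is a minimal sketch, not part of FriendlyWrt: the function names cores_to_mask, irq_for and apply_affinity are made up for illustration, and it assumes the interfaces are called eth0 and eth1 as in my example.

```shell
#!/bin/sh
# Turn a list of CPU core numbers into the hex mask used by smp_affinity and rps_cpus.
cores_to_mask() {
    mask=0
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))
    done
    printf '%x\n' "$mask"
}

# Read /proc/interrupts-style text on stdin and print the IRQ number for an interface.
irq_for() {
    awk -v ifname="$1" '$NF == ifname { sub(/:/, "", $1); print $1; exit }'
}

# Re-apply the Step 3 settings (run as root on the router, e.g. after SQM restarts).
apply_affinity() {
    irq0=$(irq_for eth0 < /proc/interrupts)
    irq1=$(irq_for eth1 < /proc/interrupts)
    printf '%s' "$(cores_to_mask 4)" > "/proc/irq/$irq0/smp_affinity"
    printf '%s' "$(cores_to_mask 5)" > "/proc/irq/$irq1/smp_affinity"
    all=$(cores_to_mask 0 1 2 3 4 5)
    printf '%s' "$all" > /sys/class/net/eth0/queues/rx-0/rps_cpus
    printf '%s' "$all" > /sys/class/net/eth1/queues/rx-0/rps_cpus
}
```

Nothing is changed until apply_affinity is called. cores_to_mask 4 5 prints 30, which matches the quick reference table.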
That's it! All that is left is to configure SQM with fq_codel as shown here. There's no need to install luci-app-sqm because the FriendlyWrt image has everything already!
If you're interested in trying the OpenWrt Version this may be of interest to you: https://github.com/quintus-lab/OpenWRT-Rockchip
External Resources for NanoPi R4S
R4S Benchmarks by Van Tech Corner on Youtube: https://www.youtube.com/watch?v=t5xuTy1xn64
R4S Performance Tweaking: https://forum.openwrt.org/t/nanopi-r4s-rk3399-4g-is-a-great-new-openwrt-device/79143/406
Hardware Option 2: Any x86 Desktop
When I had a 600 Mbps connection, I tested a SEEED ODYSSEY - X86J4105, which has a CPU Mark score of ~3000, and it handled SQM with ease. The htop screenshot below shows it using at most 37% CPU under full load. If 600 Mbps uses at most 37% CPU, then a CPU Mark of 3000 should in theory have enough headroom for a 1000 Mbps connection.
TL;DR: It is reasonably safe to assume that any desktop PC with a CPU Mark of around 3000 or more on cpubenchmark.net can handle SQM at 1000 Mbps.
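The back-of-the-envelope reasoning can be written out as a quick calculation. This sketch assumes CPU load scales linearly with throughput, which is a simplification I have not measured. The function name estimate_cpu_pct is made up for illustration.

```shell
#!/bin/sh
# Hypothetical estimate: scale a measured CPU load linearly to a target throughput.
# Arguments: measured CPU percent, measured Mbps, target Mbps.
# The result is rounded to the nearest whole percent.
estimate_cpu_pct() {
    echo $(( ($1 * $3 + $2 / 2) / $2 ))
}

# 37% CPU at 600 Mbps, scaled to 1000 Mbps.
estimate_cpu_pct 37 600 1000
```

Under that assumption, 1000 Mbps works out to roughly 62% CPU, which still leaves headroom.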
Pictured: htop of SEEED Odyssey X86J4105 PC running OpenWrt
The disadvantage of a regular desktop is power usage. Should you decide to use a PC, there are a few requirements.
1. Make sure it has two Gigabit Ethernet ports. If it has one already, you can add a second one with a Mini PCI-E Gigabit Network Adapter. (Amazon referral link).
2. You also want to make sure it has a CPU Mark of 3000 or more. You can check here: https://www.cpubenchmark.net/cpu_list.php
3. Preferably it would be a low-power device drawing less than about 25 watts.
As for installing the software, OpenWrt has its own written guide here: https://openwrt.org/docs/guide-user/installation/openwrt_x86
4. After OpenWrt is set up and running, you just need to enable SQM like so: https://www.stoplagging.com/openwrt-method-fq_codel-cake/
5. As of OpenWrt 21, to make use of all of your router's CPU cores you should enable packet steering under Network > Interfaces > Global network options, and enable irqbalance to improve performance even further: https://openwrt.org/docs/guide-user/services/irqbalance
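For those who prefer SSH over LuCI, the two settings in the last item can be sketched as below. This is an assumption-laden sketch for OpenWrt 21.02 or later: it assumes the stock globals section in /etc/config/network and the irqbalance section shipped in the package's own config file, so check with uci show network and uci show irqbalance first. The function name enable_multicore is made up for illustration.

```shell
#!/bin/sh
# Sketch for OpenWrt 21.02+ (run as root on the router). Nothing runs until the
# function is called.
enable_multicore() {
    # Same switch as Network > Interfaces > Global network options > Packet Steering.
    uci set network.globals.packet_steering='1'
    uci commit network
    /etc/init.d/network reload

    # Install irqbalance, enable it in its config, and start it now and at boot.
    opkg update && opkg install irqbalance
    uci set irqbalance.irqbalance.enabled='1'
    uci commit irqbalance
    /etc/init.d/irqbalance enable
    /etc/init.d/irqbalance start
}
```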
Hardware Option 2.1: Seeed Odyssey x86 Mini PC (~$250-$300)
Should you decide to go with the pricey Seeed Odyssey, they did a write-up about running OpenWrt on it from a USB device: https://wiki.seeedstudio.com/ODYSSEY-X86J4105-Installing-openwrt/
Personally, I ran mine on a 16GB M.2 SATA SSD since NVMe isn't currently supported in the base x86 OpenWrt image. Instead of flashing a USB drive as instructed by SEEED, I flashed my M.2 SATA drive with balenaEtcher.
On my 600 Mbps connection at the time, the SEEED ODYSSEY - X86J4105 (CPU Mark score of ~3000) handled 600 Mbps DL with ease. The htop screenshot below shows it using at most 37% CPU under full load, which suggests a CPU Mark of 3000 should also be able to handle a 1000 Mbps connection.
Pictured: Seeed Odyssey CPU Usage under load at 600Mbps Download
1.3 What Access Point to Get?
I keep hearing rave reviews about the Ubiquiti APs and use one myself. I have extremely stable WiFi with it and never have to reboot it. Ubiquiti also advertises up to 200 concurrent users! If you have a better recommendation, I'd like to know.
Official Link: https://store.ui.com/collections/unifi-network-access-points/products/unifi-ap-6-lite
If you plan on having only one Ubiquiti AP, I recommend setting it up from your phone so you don't have to bother with more complicated things like AP controllers.
If you're on a budget and can't buy a dedicated AP, you can try turning your old router into an access point by putting it into AP mode instead of routing mode. This is important because the OpenWrt device, not your old router, should be doing the routing to prevent bufferbloat.
Another option I've heard good things about is the TP-Link Omada EAP610: https://amzn.to/3RWWTY9 I have no real-world experience with these as I don't own any, but they are well regarded in the /r/homenetworking community.
Facts about WiFi
If you need more coverage, you should get more APs rather than a single AP with a bunch of antennas, because those are marketing gimmicks.
WiFi has limited range due to the physics of its frequency bands.
5 GHz can handle more bandwidth, but will usually have about half the range of 2.4 GHz.
 
1.4 Advanced Cake Configuration
This section is for my own reference. These settings were recommended by the official docs: https://openwrt.org/docs/guide-user/network/traffic-shaping/sqm-details#sqmqueue_discipline_tab
Under the Queue Discipline tab of SQM:
Enable the checkmark for advanced configuration and click Save & Apply.
This turns on squash_dscp, squash_ingress, ECN on ingress and NOECN on egress. Leave them at their defaults, as they are good the way they are. (If you have symmetrical fiber, then ECN can be enabled on egress.)
Next, check and enable "Dangerous Configuration", which is below the "Advanced Configuration" section. We are going to replace the default triple-isolate with per-host isolation. Here's a short explanation.
To quote the docs, by default, cake will use triple-isolate: “which will first make sure that no internal or external host will hog too much bandwidth and then will still guarantee for fairness for each host. In that mode, Cake mostly does the right thing. It would ensure that no single stream and no single host could hog all the capacity of the WAN link. However, it can’t prevent a BitTorrent client – with multiple connections – from monopolizing most of the capacity.” You can enable per-host isolation, which will identify all source/destination information.
To enable that, add the following to the “Advanced option strings” (in the Interfaces → SQM-QoS page; Queue Discipline tab, look for the Dangerous Configuration options):
For queueing disciplines handling incoming packets from the internet (internet-ingress): nat dual-dsthost ingress
For queueing disciplines handling outgoing packets to the internet (internet-egress): nat dual-srchost
For me that means that under Qdisc options (ingress) I wrote "nat dual-dsthost ingress", while under Qdisc options (egress) I wrote "nat dual-srchost".
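The same settings can also be applied over SSH instead of through LuCI. The sketch below assumes the option names used in the /etc/config/sqm file that sqm-scripts ships (qdisc_advanced, qdisc_really_really_advanced, iqdisc_opts, eqdisc_opts) and that your WAN shaper is the first queue section. Confirm both with uci show sqm before relying on it. The function name apply_cake_isolation is made up for illustration.

```shell
#!/bin/sh
# Sketch (run as root on the router). Nothing runs until the function is called.
# Assumes the first 'queue' section in /etc/config/sqm is the WAN shaper.
apply_cake_isolation() {
    # Equivalent of ticking "advanced" and "dangerous" configuration in LuCI.
    uci set 'sqm.@queue[0].qdisc_advanced=1'
    uci set 'sqm.@queue[0].qdisc_really_really_advanced=1'
    # Per-host isolation strings from the OpenWrt docs quoted above.
    uci set 'sqm.@queue[0].iqdisc_opts=nat dual-dsthost ingress'
    uci set 'sqm.@queue[0].eqdisc_opts=nat dual-srchost'
    uci commit sqm
    /etc/init.d/sqm restart
}
```

Keep in mind that restarting SQM resets the CPU affinity tweak on the R4S, so re-apply it afterwards if you use it.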
1.5 Contact
If you need help or a consultation, please join my rocket.chat server at https://chat.stoplagging.com/invite/zaMu6X. You can message me @Starfroz by looking me up under the globe icon after registering and logging in.
 
                                                    
