
SQM for up to 1 Gbps+ Lines With OpenWrt

Pictured Gigabit Switch: TP-Link 8-Port Gigabit Ethernet Switch (Amazon Referral Link)

Pictured Access Point - Ubiquiti UniFi 6 Pro (Official Link). Note: If Ubiquiti is out of stock or you don't like them, I've heard that the TP-Link EAP670 (Amazon Referral Link) from their Omada line works just as well. I've never tried it myself, since I've been using Ubiquiti APs.

Pictured OpenWrt Device: NanoPi R4SE (Official Link). For the power supply, I'd recommend the CanaKit Raspberry Pi power supply (Amazon Referral Link) over the one FriendlyElec provides. The SE version has eMMC, which means you can run off the device's built-in storage instead of a microSD card like its predecessor, the R4S.

The newer model is the NanoPi R6S (Official Link), which has a better CPU. You may need to source your own 18W USB-C PD power supply and a USB-C to USB-C PD cable.

Note: The NanoPis sold on Amazon are not from official sources. Buying from FriendlyElec gets you one directly from the manufacturer.

R4SE - Can do cake SQM up to 700 Mbps. Also has official OpenWrt software available.
R6S - Reportedly can do cake SQM up to 1500 Mbps (820-830 Mbps in my testing). Only has official FriendlyWrt software for now. Now hear me out: I know this YouTube video says it can do 1500 Mbps: https://www.youtube.com/watch?v=2bCf8Xchrfc . However, when I actually tested it on my 1.2 Gbps connection, I found that cake could not be set higher than 820-830 Mbps. If I set it any higher, I get regular high max ping spikes due to 100% CPU usage on one of the R6S cores.

If you're interested in speeds beyond 1400 Mbps with cake SQM, please refer to this page: https://wiki.stoplagging.com/books/technical-guides/page/sqm-for-beyond-1-gbps-lines-with-openwrt

Otherwise, if you're fine with up to 800 Mbps, let's proceed!

1.1 Introduction and Why?

The diagram above shows how to install a more powerful ARM PC, the NanoPi, as the router in your network. Building your home network infrastructure like this is more reliable and performs better than consumer routers, which try to put the modem, routing, and wireless all in one box.

The reason we want to do this is to stop bufferbloat at higher bandwidths with SQM (Smart Queue Management) turned on. Consumer routers usually can't push past 350 Mbps with cake or fq_codel SQM because they are limited by their underpowered CPUs. That's why the NanoPi is a solid choice: it's low power and has a CPU that can handle cake at 700-800 Mbps.

What is Bufferbloat and why stop it?

Bufferbloat is the lag or ping spikes in video games or Zoom calls that happen when you or someone else on your network uses up all your bandwidth. The cause could be torrenting, 4K streaming, bulk downloads, or even a speed test. SQM algorithms (fq_codel or cake), which are available on OpenWrt, can effectively eliminate these ping spikes and keep latency low even under full load. In exchange, you sacrifice a little maximum speed (5-10%) for consistently low latency.

NanoPi R4S / R4SE Performance

My measured bandwidth from my ISP without SQM is 920 Mbps download and 35 Mbps upload. My NanoPi R4S running FriendlyWrt can do SQM with fq_codel (simplest.qos) up to 800 Mbps without issue. With cake (piece_of_cake.qos), the best I could get with SQM was about 700 Mbps.
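The SQM page in LuCI asks for the download and upload limits in kbit/s. As a rough starting point (an assumption, not a rule), you can take about 90% of your measured speed, in line with the 5-10% sacrifice mentioned earlier, and then lower the limit further if the NanoPi's CPU becomes the bottleneck as described above. The hypothetical `sqm_rate_kbit` helper below only does that arithmetic:

```sh
#!/bin/sh
# Hypothetical helper: convert a measured speed in Mbps into an SQM limit in
# kbit/s, keeping only PERCENT of it (default 90) so the queue stays on the router.
sqm_rate_kbit() {
    mbps=$1
    percent=${2:-90}
    # Integer math: Mbps -> kbit/s is *1000, then apply the percentage.
    echo $(( mbps * 1000 * percent / 100 ))
}

# Example using the measurements on this page (920 Mbps down, 35 Mbps up).
echo "download: $(sqm_rate_kbit 920) kbit/s"
echo "upload:   $(sqm_rate_kbit 35) kbit/s"
```

With the 920/35 Mbps numbers above this suggests 828000 and 31500 kbit/s. On the R4S with cake, you would still need to drop the download limit to roughly 700000 kbit/s because of the CPU limit.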

Video: htop of the 6-core NanoPi R4S with SQM (fq_codel w/ simplest.qos). The numbers 0-5 represent the load on each CPU core.

Video: htop of the 6-core NanoPi R4S with SQM (cake w/ piece_of_cake.qos). Performance capped out around 750 Mbps on my gigabit connection. Notice that CPU 5 almost hit 100% load. Cake is much more CPU intensive than fq_codel.

As you can see, although cake is the slightly better algorithm, it is more CPU intensive and seems to cap out around 700 Mbps. I'd recommend fq_codel if you want to squeeze out the extra ~100 Mbps.

NanoPi R6S Performance

In summary, with cake I was able to push around 800-830 Mbps. If I set it any higher, I usually get small lag spikes.

NanoPi Software Installation

Installation is easy. You just need to flash a microSD card with FriendlyWrt. FriendlyElec has a tutorial for the R4SE here: https://wiki.friendlyelec.com/wiki/index.php/NanoPi_R4S#Install_OS

And a tutorial for the R6S here: https://wiki.friendlyelec.com/wiki/index.php/NanoPi_R6S#Install_OS_to_eMMC

All you have to do to install is...
1. Plug in a microSD card to your computer.
2. Download the appropriate image (usually the eflasher image) from the FriendlyElec wiki.
3. Download Win32 Disk Imager and launch it.
4. In Win32 Disk Imager, select the image file you downloaded and your microSD card's drive letter. Then flash!
5. After flashing is done, eject the microSD card and unplug it.
6. Plug the microSD card into your NanoPi and wait for it to flash (LEDs pictured below for reference).

(Image: NanoPi LED indicators during flashing)


7. Connect the WAN port to your modem. Connect the LAN port either to your switch (which connects to your computer) or directly to your computer.
8. Power on and wait about 3 minutes.
9. On the computer connected to the switch or to the NanoPi's LAN port, open a web browser and go to http://192.168.2.1 to access your router.

That's it! All that's left is to configure SQM with fq_codel. There's no need to install luci-app-sqm because the FriendlyWrt image already includes everything. You just need to enable SQM by following either the official OpenWrt guide or my guide.

Either way, if you want to fine-tune cake further, see section 1.4 Advanced Cake Configuration on this page: https://wiki.stoplagging.com/books/technical-guides/page/sqm-for-up-to-800-mbps-lines-with-openwrt#bkmrk-1.4-advanced-cake-co

1.3 What Access Point to Get?

I keep hearing rave reviews about Ubiquiti APs, and I use one myself. I have extremely stable WiFi with them and never have to reboot them. Ubiquiti also advertises support for up to 200 concurrent users! If you have a better recommendation, I'd like to know.

Ubiquiti UniFi 6 Pro (Official Link)

If you plan on having only one Ubiquiti AP, I recommend setting it up from your phone so you don't have to bother with more complicated things like AP controllers.

If you're on a budget and can't buy a dedicated AP, you can try turning your old router into an access point by putting it into AP mode instead of routing mode. This matters because the OpenWrt device, not your old router, should be doing the routing to prevent bufferbloat.

Another option I've heard good things about is the TP-Link EAP670 (Amazon Referral Link). I have no real-world experience with it since I don't own one, but I've heard it's a solid product in the /r/homenetworking community.

Facts about WiFi

If you need more coverage, get more APs rather than a single AP with a bunch of antennas; those are marketing gimmicks.

WiFi has limited range due to the physics of its frequency bands.

5 GHz can handle more bandwidth but usually has about half the range of 2.4 GHz.

1.4 Advanced Cake Configuration

This section is mainly for my own reference; these settings are recommended by the official docs: https://openwrt.org/docs/guide-user/network/traffic-shaping/sqm-details#sqmqueue_discipline_tab

It's not necessary, but if you want even better ping stability under load, it might be worthwhile!

Under the Queue Discipline tab of SQM, check the box for Advanced Configuration and click Save & Apply.

This turns on squash_dscp, squash_ingress, ECN on ingress, and NOECN on egress. Leave these at their defaults; they're good the way they are. (If you have symmetrical fiber, ECN can also be enabled on egress.)

Next, check the box for "Dangerous Configuration", which is below the "Advanced Configuration" section. We are going to replace triple-isolate with per-host isolation. Here's a short explanation.

To quote the docs, by default cake uses triple-isolate: "which will first make sure that no internal or external host will hog too much bandwidth and then will still guarantee for fairness for each host. In that mode, Cake mostly does the right thing. It would ensure that no single stream and no single host could hog all the capacity of the WAN link. However, it can't prevent a BitTorrent client – with multiple connections – from monopolizing most of the capacity." You can instead enable per-host isolation, which uses the source/destination host information so that a single host with many connections can't take more than its fair share.

To enable it, add the following to the "Advanced option strings" (on the Interfaces → SQM-QoS page, in the Queue Discipline tab, under the Dangerous Configuration options):

For queueing disciplines handling incoming packets from the internet (internet-ingress): nat dual-dsthost ingress

For queueing disciplines handling outgoing packets to the internet (internet-egress): nat dual-srchost

For me, that means I entered "nat dual-dsthost ingress" in Qdisc options (ingress) and "nat dual-srchost" in Qdisc options (egress).
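If you prefer SSH over LuCI, the same settings can probably be applied with `uci`. This is a config sketch, not a tested procedure: it assumes your SQM instance is the first `queue` section in /etc/config/sqm, and that your sqm-scripts version uses these option names for the checkboxes and text fields above. Run `uci show sqm` first and compare before committing.

```sh
# Assumes the SQM instance is the first 'queue' section in /etc/config/sqm.
uci set sqm.@queue[0].qdisc='cake'
uci set sqm.@queue[0].script='piece_of_cake.qos'
# "Advanced Configuration" checkbox
uci set sqm.@queue[0].qdisc_advanced='1'
# "Dangerous Configuration" checkbox
uci set sqm.@queue[0].qdisc_really_really_advanced='1'
# Per-host isolation option strings from this section
uci set sqm.@queue[0].iqdisc_opts='nat dual-dsthost ingress'
uci set sqm.@queue[0].eqdisc_opts='nat dual-srchost'
uci commit sqm
/etc/init.d/sqm restart
```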

1.5 Performance Tweaks (no longer relevant)

The R4S and R6S no longer need the performance tweaks below, as they're built into the firmware released after August 2022.

Performance Tweak: Adjust processor affinity to improve performance even further (not needed with the 2022.08.03 firmware or later). References used: 1, 2, 3

## Performance Tweaks Quick Reference R4S
Binary = hex = cpu core
000001 = hex 1 = cpu core 0 (A53) selected
000010 = hex 2 = cpu core 1 (A53) selected
000100 = hex 4 = cpu core 2 (A53) selected
001000 = hex 8 = cpu core 3 (A53) selected
010000 = hex 10 = cpu core 4 (A72) selected
100000 = hex 20 = cpu core 5 (A72) selected


001111 = hex f = cpu cores 0, 1, 2 and 3 selected
111111 = hex 3f = cpu cores 0, 1, 2, 3, 4 and 5 selected
110000 = hex 30 = cpu cores 4 and 5 selected

Just write the mask in binary and convert it to hex: 1 = select that CPU core, 0 = unselect it. The rightmost bit is core 0.
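The same conversion can be done on the router with shell arithmetic, since selecting core N just sets bit N of the mask. This `cpu_mask` function is a hypothetical helper, not something that ships with FriendlyWrt:

```sh
#!/bin/sh
# Hypothetical helper: build the hex mask for a list of core numbers.
# Core N sets bit N, e.g. core 4 -> binary 010000 -> hex 10.
cpu_mask() {
    mask=0
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))
    done
    printf '%x\n' "$mask"
}

cpu_mask 4 5       # R4S A72 cores        -> 30
cpu_mask 0 1 2 3   # R4S/R6S little cores -> f
cpu_mask 6 7       # R6S cores 6 and 7    -> c0
```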

## Performance Tweaks Quick Reference R6S
Binary = hex = cpu core
00000001 = hex 1 = cpu core 0 (A55) selected
00000010 = hex 2 = cpu core 1 (A55) selected
00000100 = hex 4 = cpu core 2 (A55) selected
00001000 = hex 8 = cpu core 3 (A55) selected
00010000 = hex 10 = cpu core 4 (A76) selected
00100000 = hex 20 = cpu core 5 (A76) selected
01000000 = hex 40 = cpu core 6 (A76) selected
10000000 = hex 80 = cpu core 7 (A76) selected

Combination reference

00001111 = hex 0f = cpu cores 0, 1, 2 and 3 selected
11111111 = hex ff = cpu cores 0, 1, 2, 3, 4, 5, 6 and 7 selected (all of them)
11110000 = hex f0 = cpu cores 4, 5, 6, 7 selected
00110000 = hex 30 = cpu cores 4, 5 selected
11000000 = hex c0 = cpu cores 6, 7 selected

## Step 1: Get the CPU frequencies to confirm which cores are the faster ones (cores 4 and 5 on the R4S, cores 4 to 7 on the R6S).
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq
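The command above prints the frequencies without saying which core each belongs to. This sketch prints each core number next to its maximum frequency in MHz (sysfs reports the value in kHz). The `list_max_freq` name and its optional directory argument, which only exists so it can be tried against a fake directory, are hypothetical:

```sh
#!/bin/sh
# Print each core's maximum frequency in MHz so the faster cores stand out.
# sysfs reports cpuinfo_max_freq in kHz. The optional argument (hypothetical)
# replaces the sysfs directory, e.g. to try the function on a fake tree.
list_max_freq() {
    base=${1:-/sys/devices/system/cpu}
    for f in "$base"/cpu[0-9]*/cpufreq/cpuinfo_max_freq; do
        [ -r "$f" ] || continue    # glob matched nothing or file unreadable
        core=${f#"$base"/cpu}      # strip the directory prefix and "cpu"
        core=${core%%/*}           # keep only the core number
        echo "cpu$core $(( $(cat "$f") / 1000 )) MHz"
    done
}

list_max_freq
```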

## Step 2: Get the IRQ numbers (in my example they are 31 for eth0 and 88 for eth1; yours may be different)
grep eth /proc/interrupts

 31:     442034   72493104  337387798          0      47352          0     GICv3  44 Level     eth0
 88:          0          0 1463111677  994946912          0          0   ITS-MSI 524288 Edge      eth1
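Instead of reading the IRQ numbers off by eye, you can pull them out of /proc/interrupts with awk. This sketch assumes, as in the output above, that each port appears on one line whose last column is exactly the interface name. If your device registers several IRQs per port under other names, the match will need adjusting. `irq_for` is a hypothetical helper:

```sh
#!/bin/sh
# Hypothetical helper: print the IRQ number of the /proc/interrupts line whose
# last column is exactly the given interface name (e.g. eth0). An optional
# second argument reads a saved copy of the file instead of /proc/interrupts.
irq_for() {
    awk -v dev="$1" '$NF == dev { sub(/:$/, "", $1); print $1 }' "${2:-/proc/interrupts}"
}
```

With the output shown above, `irq_for eth0` prints 31 and `irq_for eth1` prints 88, so the later steps could use `/proc/irq/$(irq_for eth0)/smp_affinity` instead of a hard-coded number.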

## Optional Step: List the cores currently assigned to each IRQ (affinity)
cat /proc/irq/31/smp_affinity
cat /proc/irq/88/smp_affinity
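These files print a hex mask, so a small decoder helps confirm which cores are actually selected. This `mask_to_cores` helper is hypothetical. It also strips the commas that systems with many CPUs put between 32-bit groups:

```sh
#!/bin/sh
# Hypothetical helper: decode a hex CPU mask (as printed by smp_affinity or
# rps_cpus) into the list of selected core numbers, e.g. 30 -> "4 5".
mask_to_cores() {
    hex=$(echo "$1" | tr -d ',')   # drop separators between 32-bit groups
    value=$(( 0x$hex ))
    core=0
    cores=""
    while [ "$value" -gt 0 ]; do
        # If the lowest bit is set, this core is selected.
        if [ $(( value & 1 )) -eq 1 ]; then
            cores="$cores $core"
        fi
        value=$(( value >> 1 ))
        core=$(( core + 1 ))
    done
    cores=${cores# }               # remove the leading space
    echo "${cores:-none}"          # a mask of 0 selects no cores
}

mask_to_cores 30   # cores 4 and 5       -> 4 5
mask_to_cores ff   # all eight R6S cores -> 0 1 2 3 4 5 6 7
```

On the router you would run something like `mask_to_cores "$(cat /proc/irq/31/smp_affinity)"`. For rps_cpus, a value of 0 (printed here as "none") means receive packet steering is off for that queue.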

## Optional Step: List the cores currently assigned to each receive queue (RPS)
cat /sys/class/net/eth0/queues/rx-0/rps_cpus
cat /sys/class/net/eth1/queues/rx-0/rps_cpus

## Step 3: The performance tweaks (R6S example). Pin the IRQs to the faster A76 cores and spread the receive queues across all cores.

# eth0 IRQ on cores 4 and 5 (A76). Use your eth0 IRQ number from Step 2 instead of 31.
echo -n 30 > /proc/irq/31/smp_affinity

# eth1 IRQ on cores 6 and 7 (A76). Use your eth1 IRQ number from Step 2 instead of 88.
echo -n c0 > /proc/irq/88/smp_affinity

# eth0 receive queue on all CPU cores
echo -n ff > /sys/class/net/eth0/queues/rx-0/rps_cpus

# eth1 receive queue on all CPU cores
echo -n ff > /sys/class/net/eth1/queues/rx-0/rps_cpus

# eth2 receive queue on all CPU cores
echo -n ff > /sys/class/net/eth2/queues/rx-0/rps_cpus

If you restart SQM or change any SQM settings, the CPU affinity will be reset and you will need to re-apply these tweaks.
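Because an SQM restart undoes these settings, it can help to keep Step 3 in a small script you re-run afterwards. The sketch below reuses the R6S values from Step 3 (eth0 IRQ on cores 4 and 5, eth1 IRQ on cores 6 and 7, RPS on all eight cores) but looks up the IRQ numbers by interface name instead of hard-coding 31 and 88. The script name, the `apply` argument (a safety switch so nothing is written by accident), and the `SYSROOT` variable (only there so it can be dry-run against a scratch directory) are all hypothetical. It also assumes each port appears in /proc/interrupts under its plain ethN name.

```sh
#!/bin/sh
# reapply-affinity.sh (hypothetical name): re-apply the Step 3 tweaks after SQM restarts.
# Usage on the router: sh reapply-affinity.sh apply
# SYSROOT is empty on the router; pointing it at a scratch directory allows a dry run.
SYSROOT=${SYSROOT:-}

# Write a hex mask to a procfs/sysfs file, skipping files that don't exist.
set_mask() {
    if [ -e "$2" ]; then
        printf '%s' "$1" > "$2" && echo "set $2 = $1"
    else
        echo "skip $2 (not found)"
    fi
}

# Find an interface's IRQ number from the last column of /proc/interrupts.
irq_for() {
    awk -v dev="$1" '$NF == dev { sub(/:$/, "", $1); print $1 }' "$SYSROOT/proc/interrupts"
}

apply_affinity() {
    # IRQs on the fast A76 cores (R6S values from Step 3).
    irq=$(irq_for eth0)
    [ -n "$irq" ] && set_mask 30 "$SYSROOT/proc/irq/$irq/smp_affinity"
    irq=$(irq_for eth1)
    [ -n "$irq" ] && set_mask c0 "$SYSROOT/proc/irq/$irq/smp_affinity"

    # Receive packet steering (RPS) on all eight cores for each port.
    for dev in eth0 eth1 eth2; do
        set_mask ff "$SYSROOT/sys/class/net/$dev/queues/rx-0/rps_cpus"
    done
}

# Only touch kernel settings when explicitly asked to.
if [ "${1:-}" = "apply" ]; then
    apply_affinity
fi
```

For the R4S you would swap in the R4S masks from the quick reference (for example 30 for cores 4 and 5) and drop eth2.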

1.6 Contact

If you need help or consultation, please join my Rocket.Chat server at https://chat.stoplagging.com/invite/zaMu6X. After registering and logging in, you can message me (@Starfroz) by looking me up under the globe icon.


External Resources for the NanoPi R4S and R6S

R4S Benchmarks by Van Tech Corner on Youtube: https://www.youtube.com/watch?v=t5xuTy1xn64

R6S Benchmarks by Van Tech Corner on Youtube: https://www.youtube.com/watch?v=2bCf8Xchrfc  

The video claims that the R6S can do 1500 Mbps. However, when I actually tested it on my 1.2 Gbps connection, I found that cake on the R6S could not be set higher than 820-830 Mbps. If I set it any higher, I get regular high max ping spikes due to 100% CPU usage on one of the R6S cores.

Similarly, the R4S couldn't push past 700 Mbps in my testing, even though the video says it could do more.

R4S Performance Tweaking: https://forum.openwrt.org/t/nanopi-r4s-rk3399-4g-is-a-great-new-openwrt-device/79143/406