In part one of this series, I covered the theory, pros, and cons of a Multi-WAN setup. In part two, I will show how to implement that setup in your own network. This will be a lengthy article.
- A router running either an enterprise or an open-source network OS.
- Enterprise: RouterOS, for example.
- Open source: VyOS or pfSense, for example.
- More than one uplink to the same ISP or a different ISP.
- Minimum 1GbE on all your devices’ interfaces/ports.
- The ISP’s CPE device set to bridge mode, so that there is no double/triple/quadruple NAT situation.
- For this example, I will use RouterOS v6 stable.
- Hardware: RB450Gx4.
- Only IPv4 config will be covered due to the lack of proper IPv6 support in RouterOS v6 stable at the time of this article, but it would essentially be the same thing.
- I will be using PCC and Nth together in order to achieve bandwidth aggregation.
Whatever is shown here using RouterOS can be replicated more or less on any other network OS like VyOS etc. Go through the vendor’s documentation if you are not on RouterOS.
- I will assume you have already taken care of the basic configuration such as securing the router, firewalling, NATting etc.
- Regarding NAT, use masquerade instead of src-nat. Masquerade clears the connection tracking entries for an interface when that interface goes down, which makes failover close to instantaneous.
- I will assume you have a basic understanding of computer networking and routing.
- I will assume you have read MikroTik’s documentation on PCC, Nth, Mangle etc.
- Only two uplinks (relevant to the example here).
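As a rough illustration of the masquerade assumption above, one NAT rule per uplink might look like the following. This is a minimal sketch that assumes the PPPoE client interfaces are named pppoe-out1 and pppoe-out2, as in the rest of this article. The comments on the rules are made up for readability.

```
/ip firewall nat
# One masquerade rule per uplink (interface names assumed from this article's example)
add action=masquerade chain=srcnat out-interface=pppoe-out1 comment="NAT for ISP1"
add action=masquerade chain=srcnat out-interface=pppoe-out2 comment="NAT for ISP2"
```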
Let us begin
Step 1: create a default route for each uplink, using the distance attribute to enable failover (if one link goes down, the route with the next-lowest distance is used).
/ip route
###ISP1 has lower distance and hence is the primary link###
add check-gateway=ping comment="Default Route for ISP1" distance=1 gateway=pppoe-out1
add check-gateway=ping comment="Default Route for ISP2" distance=2 gateway=pppoe-out2
###If you have more than two uplinks, you simply increase the distance as required like this:
###add check-gateway=ping comment="Example Route for ISP3" distance=3 gateway=pppoe-out3###
Step 2: take care of the MTU on the WAN interfaces. This applies to any networking device or OS. If your uplink uses a DHCP client or a static IP address, this is already taken care of by default with an MTU of 1500.
However, this is not the case with PPPoE, and on RouterOS there are what a user calls “ghost bytes”. I will discuss PPPoE MTU in a future article. For now, all you need to do is set the actual MTU of the underlying Ethernet interface (the interface your uplink is connected to) to 1520. As of RouterOS v7.2, MikroTik has fixed the ghost bytes, so set it to 1508 there instead (note that I did not bother to update the diagrams below).
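A minimal sketch of that change, assuming the two uplinks are plugged into ether1 and ether2 (hypothetical port names, use whichever ports your CPE devices are connected to). The port's L2MTU has to be at least as large as the MTU you set, and the ISP-side equipment has to accept the larger frames as well.

```
/interface ethernet
# RouterOS v6: 1520 on the physical ports underneath the PPPoE clients
set [ find default-name=ether1 ] mtu=1520
set [ find default-name=ether2 ] mtu=1520
# RouterOS v7.2 and later: use mtu=1508 instead, per the note above
```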
Step 3: leave MTU, MRU, and MRRU blank on the PPPoE interface to enable auto-negotiation. RouterOS will then automatically find the correct MTU and MRU values set by your ISP.
Step 4: now we get started with the multi-WAN rules that enable load balancing and bandwidth aggregation without breaking HTTP/HTTPS connections.
###Set Passthrough=no to reduce CPU usage for rules that do not need to be re-validated once they've been processed###
###connection-mark=no-mark to prevent re-marking of already marked connections and hence avoid wasting CPU cycles###
/ip firewall mangle
###Incoming connections through ISP1 must leave through ISP1###
add action=mark-connection chain=prerouting connection-mark=no-mark in-interface=pppoe-out1 new-connection-mark=ISP1_conn passthrough=no
###Incoming connections through ISP2 must leave through ISP2###
add action=mark-connection chain=prerouting connection-mark=no-mark in-interface=pppoe-out2 new-connection-mark=ISP2_conn passthrough=no
###I am assuming a 50/50 split ratio between the two ISPs###
###We are using dst-address-list=!not_in_internet && dst-address-type=!local to prevent marking LAN-to-LAN traffic###
###We can use PCC to handle HTTP/HTTPS traffic with "both-addresses" attribute to reduce chances of connections being marked more "randomly" which would break the connections as then connections would go through ISP1 and ISP2 more "randomly" and break.
###However in this case, I used "both-addresses-and-ports" based on my personal experience of traffic working just fine###
###For old school HTTP/HTTPS traffic###
###50% going to ISP1###
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-list=!not_in_internet dst-address-type=!local dst-port=80,443 in-interface-list=LAN new-connection-mark=ISP1_conn passthrough=yes per-connection-classifier=both-addresses-and-ports:2/0 protocol=tcp
###50% going to ISP2###
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-list=!not_in_internet dst-address-type=!local dst-port=80,443 in-interface-list=LAN new-connection-mark=ISP2_conn passthrough=yes per-connection-classifier=both-addresses-and-ports:2/1 protocol=tcp
###For new school HTTP3 traffic aka QUIC###
###50% going to ISP1###
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-list=!not_in_internet dst-address-type=!local dst-port=80,443 in-interface-list=LAN new-connection-mark=ISP1_conn passthrough=yes per-connection-classifier=both-addresses-and-ports:2/0 protocol=udp
###50% going to ISP2###
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-list=!not_in_internet dst-address-type=!local dst-port=80,443 in-interface-list=LAN new-connection-mark=ISP2_conn passthrough=yes per-connection-classifier=both-addresses-and-ports:2/1 protocol=udp
###If you have a third uplink, then the split ratio would be 3/0, 3/1, 3/2###
###Now we will use Nth for non HTTP/HTTPS traffic in order to achieve bandwidth aggregation###
###50% going to ISP1###
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-list=!not_in_internet dst-address-type=!local in-interface-list=LAN new-connection-mark=ISP1_conn nth=2,1 passthrough=yes
###50% going to ISP2###
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-list=!not_in_internet dst-address-type=!local in-interface-list=LAN new-connection-mark=ISP2_conn nth=2,2 passthrough=yes
###Now we will send the marked connections to their appropriate routing table###
###For our marked/split traffic###
add action=mark-routing chain=prerouting connection-mark=ISP1_conn in-interface-list=LAN new-routing-mark=to_ISP1 passthrough=no
add action=mark-routing chain=prerouting connection-mark=ISP2_conn in-interface-list=LAN new-routing-mark=to_ISP2 passthrough=no
###For the incoming traffic from WAN###
add action=mark-routing chain=output connection-mark=ISP1_conn new-routing-mark=to_ISP1 out-interface=pppoe-out1 passthrough=no
add action=mark-routing chain=output connection-mark=ISP2_conn new-routing-mark=to_ISP2 out-interface=pppoe-out2 passthrough=no
###Now finally we add the required routing tables###
/ip route
add check-gateway=ping comment="Load Balancing Route to ISP 1" distance=1 gateway=pppoe-out1 routing-mark=to_ISP1
add check-gateway=ping comment="Load Balancing Route to ISP 2" distance=1 gateway=pppoe-out2 routing-mark=to_ISP2
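The comment about a third uplink (3/0, 3/1, 3/2) can be sketched as follows. This is an untested illustration for TCP ports 80 and 443 only, with an even three-way split. ISP3_conn and to_ISP3 are assumed names that extend the naming used above, and pppoe-out3 is the example third interface from Step 1. The ISP1 and ISP2 rules here would replace their 2/0 and 2/1 counterparts, and the UDP rules would change in the same way.

```
/ip firewall mangle
# Incoming connections through ISP3 must leave through ISP3 (assumed third uplink)
add action=mark-connection chain=prerouting connection-mark=no-mark in-interface=pppoe-out3 new-connection-mark=ISP3_conn passthrough=no
# Three-way PCC split: each rule takes one of the three classifier buckets
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-list=!not_in_internet dst-address-type=!local dst-port=80,443 in-interface-list=LAN new-connection-mark=ISP1_conn passthrough=yes per-connection-classifier=both-addresses-and-ports:3/0 protocol=tcp
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-list=!not_in_internet dst-address-type=!local dst-port=80,443 in-interface-list=LAN new-connection-mark=ISP2_conn passthrough=yes per-connection-classifier=both-addresses-and-ports:3/1 protocol=tcp
add action=mark-connection chain=prerouting connection-mark=no-mark dst-address-list=!not_in_internet dst-address-type=!local dst-port=80,443 in-interface-list=LAN new-connection-mark=ISP3_conn passthrough=yes per-connection-classifier=both-addresses-and-ports:3/2 protocol=tcp
# Routing marks for the third uplink, mirroring the ISP1/ISP2 rules
add action=mark-routing chain=prerouting connection-mark=ISP3_conn in-interface-list=LAN new-routing-mark=to_ISP3 passthrough=no
add action=mark-routing chain=output connection-mark=ISP3_conn new-routing-mark=to_ISP3 out-interface=pppoe-out3 passthrough=no
/ip route
add check-gateway=ping comment="Load Balancing Route to ISP 3" distance=1 gateway=pppoe-out3 routing-mark=to_ISP3
```

Rule order matters in mangle, so keep the same order as the two-uplink block: all connection-mark rules first, then the routing-mark rules.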
That’s it. You now have bandwidth aggregation, load balancing, and stable HTTP/HTTPS connections using RouterOS.
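To check that the split is actually happening, the rule counters and the connection table can be inspected. The following read-only commands are a sketch in RouterOS v6 syntax and use the mark names defined in the rules above.

```
# Packet and byte counters per mangle rule; the rules for both ISPs should be counting up
/ip firewall mangle print stats
# Number of tracked connections currently marked for each uplink
/ip firewall connection print count-only where connection-mark=ISP1_conn
/ip firewall connection print count-only where connection-mark=ISP2_conn
# Routes in the per-ISP routing tables
/ip route print where routing-mark=to_ISP1
/ip route print where routing-mark=to_ISP2
```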
For P2P networking (assuming all the uplinks have a public IP), you’d need to use a script like this for UPnP to work correctly (enable it on only one WAN interface and let the script handle the rest).
- Use RFC 6890 to build the not_in_internet address list.
- RouterOS will automatically use the main routing table (default routes) should any uplink go down, so the “marked connections” will automatically be routed through whichever uplink is available even though for instance they are marked for ISP2, and ISP2 is down.
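A sketch of the not_in_internet address list, built from the IPv4 special-purpose blocks registered in RFC 6890. The comments are only labels for readability. 192.0.0.0/29 and 255.255.255.255/32 are also in the RFC but are left out here because they fall inside 192.0.0.0/24 and 240.0.0.0/4.

```
/ip firewall address-list
add address=0.0.0.0/8 list=not_in_internet comment="RFC 6890: this host on this network"
add address=10.0.0.0/8 list=not_in_internet comment="RFC 6890: private use"
add address=100.64.0.0/10 list=not_in_internet comment="RFC 6890: shared address space (CGNAT)"
add address=127.0.0.0/8 list=not_in_internet comment="RFC 6890: loopback"
add address=169.254.0.0/16 list=not_in_internet comment="RFC 6890: link local"
add address=172.16.0.0/12 list=not_in_internet comment="RFC 6890: private use"
add address=192.0.0.0/24 list=not_in_internet comment="RFC 6890: IETF protocol assignments"
add address=192.0.2.0/24 list=not_in_internet comment="RFC 6890: documentation (TEST-NET-1)"
add address=192.88.99.0/24 list=not_in_internet comment="RFC 6890: 6to4 relay anycast"
add address=192.168.0.0/16 list=not_in_internet comment="RFC 6890: private use"
add address=198.18.0.0/15 list=not_in_internet comment="RFC 6890: benchmarking"
add address=198.51.100.0/24 list=not_in_internet comment="RFC 6890: documentation (TEST-NET-2)"
add address=203.0.113.0/24 list=not_in_internet comment="RFC 6890: documentation (TEST-NET-3)"
add address=240.0.0.0/4 list=not_in_internet comment="RFC 6890: reserved"
```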
In part one, you may have noticed that the upload speed was lower than what I claimed. Here is the explanation.
- ISP1’s MTU = 1460, MRU = 1500
- ISP2’s MTU = 1500, MRU = 1500
The upload bandwidth never reached 300Mbps because ISP1’s MTU of 1460 caused packet fragmentation, which hurt throughput: upload averaged only 170Mbps. ISP2’s MTU is a straight 1500, and that link reached its advertised speed, which I verified by looking at the link rate on the router itself. Since the MRU on both links was 1500, download bandwidth reached the advertised speed on both links.
This is why MTU should be taken care of to prevent issues.
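One way to see what a link can carry without fragmentation is to ping through it with the do-not-fragment flag set. This is a rough sketch that assumes pppoe-out1 is the uplink under test. 1.1.1.1 is only an example of a reachable address, and in RouterOS the size value is the IP packet size including the IP header.

```
# Succeeds only if the path carries 1500-byte packets without fragmentation
/ping 1.1.1.1 interface=pppoe-out1 size=1500 do-not-fragment count=4
# If that fails, retry with smaller sizes (1460 here) to find the largest one that passes
/ping 1.1.1.1 interface=pppoe-out1 size=1460 do-not-fragment count=4
```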
Thanks for sharing this. This is working great!
I am a newbie to Mikrotik world & not a networking guy.
I had trouble with accessing my webserver from outside with an earlier config based on some guide. But your config solved this issue.
Wanted your help on a few things, if you may please?
1. I don’t have the not_in_internet_list setup & have excluded it from Mangle rules. Still the config works. How can that be? Am I missing something here.(I can share my config with you if needed)
2. What would be the hairpin NAT config so that LAN traffic can access webserver internally?
3. I have set up dst-nat for port forwarding on one of the pppoe interface. Can I add the same for my other pppoe interface & make it high availability for users accessing my services?
Thanks in advance.
1. Without that list, LAN to LAN traffic would fail as the traffic would get routed to WAN instead.
3. You can open it up for both WAN interfaces.
Thanks for the response Daryll
I have to convey that for the first time I was able to understand rules & their working apart from the not_in_internet list.
1. I am not able to understand which addresses should be in that list. Does !not_in_internet means in_internet based on plain logic? I have configured DHCP range on 10.10.10.0/24 for LAN interface(Ether-5) & rest are the IP’s assigned by ISP which are handled in PPPOE dialling interfaces.
2. Will setup Hairpin NAT based on the guide
3. I have opened up the WAN interface and found out that my ISP is redirecting port 80 to their Mikrotik switch login page 😐 Have dropped them an email for correction. Will update on this once it is resolved from ISP’s end. One good thing is, I will now try to set MTU as 1520 on 1 ISP for Jumbo Frame support(based on your both articles as I now know about ISP’s hardware)
1. ALL the subnets from RFC6890 should be added to the list.
Thanks for the info Daryll. Added the list. Not sure how to test. But looks good, IMHO.
I have a strange observation with MTU. Whatever MTU I set 1480, 1500, 1520 it reverts to 1500 / 1480 for one ISP & other ISP’s PPPOE authentication fails if MTU is changed from 1480. Is it supposed to work this way?
Follow exactly what I did in the article. I clearly mentioned to not specify the MTU manually on the PPPoE interface.
And even then your ONT needs to support jumbo frames.
guy I have zero knowledge of MikroTik. I set up the configuration by following your instructions, and I still have a problem with some HTTPS websites (just some of them, like http://www.teenee.com, which takes a long time to load) and some applications such as LINE messenger (sometimes it can’t download pictures and files).
Please advise if you have any solution that can fix this issue.
And can I use per-connection-classifier=src-address for all classifiers?
I read in some article that it can solve the HTTPS problem, right?
Without seeing your configuration, I can’t tell what you did wrong.
Dump the config with:
/export file myconfig
Email it to [email protected]
thank you very much for your fast reply here is myconfig
Your config is convoluted and wrong, why would you disable NAT Traversal helpers? What’s the point of VLAN 0’s? What’s with removeRoute? Where’s not_in_internet address list?
I would reset it to null config and start again from scratch.
thank you for reply guy
/ip firewall address-list
add address=10.0.0.0/8 list=LAN
removeRoute is from a script that auto-connects PPP and adds the route guy, here is the full script
I added it in the PPP profile
I have no idea about disabling NAT Traversal
I set it up from null config guy
Where’s RFC6890 subnets in not_in_internet? Why are you using scripts when routes can failover based on reachability? This is NOT null config!
Reset the router and start from scratch.
Why are you using scripts when routes can failover based on reachability?
It’s a dynamic IP and I have to use a script guy. I have no problem with PPPoE and a static IP from the WAN router
Where’s RFC6890 subnets in not_in_internet?
10.0.0.0/8 is this right guy?
/ip firewall address-list
add address=10.0.0.0/8 list=LAN
i was name it as LAN guy
here is my script after I cut all the /r/n to make it easy to read guy, thanks
guy after I disabled these 2 mangle rules
add action=mark-connection chain=prerouting comment="PCC TO HANDLE HTTP/HTTPS " connection-mark=no-mark disabled=no dst-address-list=!LAN dst-address-type=!local dst-port=80,443 in-interface=bridge-lan new-connection-mark=ncm-ais passthrough=yes per-connection-classifier=src-address:2/0 protocol=tcp
add action=mark-connection chain=prerouting connection-mark=no-mark disabled=no dst-address-list=!LAN dst-address-type=!local dst-port=80,443 in-interface=bridge-lan new-connection-mark=ncm-cat passthrough=yes per-connection-classifier=src-address:2/1 protocol=tcp
which in the example above are the rules under
###For old school HTTP/HTTPS traffic###
###50% going to ISP1###
it works without any problem guy, I don’t understand this
The issue I see with this kind of traffic identification is that you never know which port HTTP/HTTPS is going to be on in every network, so you may break some sites that use non-standard HTTP/HTTPS ports (other than 80 and 443). Another issue I see is that Nth isn’t really doing proper per-packet load sharing, because packets are being marked by connection and the Linux iptables mangle engine will respect that over any packet routing mark. If it really were doing per-packet load sharing, you would experience lots of protocol breaks, because the remote ends are not expecting a connection to arrive from a different path or IP address. They will ask: you don’t have an established connection with me? So the previous connection is dropped because it will time out, and a new one is established, and that is basically a retransmission.
I’ve been using this setup for years and never saw a problem with non-standard HTTP/HTTPs ports, as anyways the Nth rules mark the connection.
The Nth here is not per-packet load balancing. It is simply preferred to traditional PCC because it is more randomised on a per-packet basis for connection marking, and hence gives us better load distribution than traditional PCC alone, which can be proven with a multi-threaded download/upload.
If you want true native per-packet load balancing, you need your own ASN with a multi-homing IP Transit setup.
You’re welcome to use any variation of the original setup, but for me, I’m using this config in all production cases and no cases of broken traffic or protocols have been reported. You can also use Wireshark on the WAN interfaces and see that no traffic breaks/drops with my config.
Nice idea for load balance…
But how can I prioritize traffic (like VoIP or ICMP…etc or limit YouTube traffic) with queue trees and still get benefit from your loadbalancing configurations?
I opted for simple queues as it gets the job done for me. You’d have to research implementing queue trees, with something like this: https://forum.mikrotik.com/viewtopic.php?f=23&t=73214
Many thanks for this from another newbie with this use case, cool!
One thing I don’t understand: why is the classifier different for ISP1 for tcp?
[ISP1_conn, tcp]: per-connection-classifier=both-addresses:2/0
[ISP2_conn, tcp]: per-connection-classifier=both-addresses-and-ports:2/1
[ISP1_conn, udp]: per-connection-classifier=both-addresses-and-ports:2/0
[ISP2_conn, udp]: per-connection-classifier=both-addresses-and-ports:2/1
Thank you for pointing this out, it’s a typo. I’ve fixed it.
Nifty, got it. Just trying to puzzle out something similar at home. Thanks!
Thanks for your answer… but can you give me a hint how you do implement traffic prioritize with simple queues and using your load balance script? I mean, how to mangle what is already mangled!!?
I don’t know, I never looked into it.
This setup is working great since I had configured it last time.
I now need to add OpenVPN Server to this setup so that I can connect clients from outside to this network.
I have tried to setup OpenVPN server but client is unable to connect.
Can you help with any specific points that I should look out for or missing in such a scenario?
Thanks in advance.
There are many reasons for why it may not be working. I would suggest posting your issue and config dump on: forum.mikrotik.com
Got it Daryll. Thanks for the prompt response. Cheers.
I was trying to apply defconf filter rules in Firewall & I have observed 2 issues when I enable them:
1. Many packets are dropped in ! in LAN rule i.e. Rule # 5 specifically most of the traffic going to WAN 2
2. Connecting to a website take more time; about a couple of seconds more as compared to when all filter rules are off.
Can you please help understand what is going on? Any security rules that you suggest to secure the device?
Sharing config export : https://pastebin.com/Ttt46Ruh
Thanks a ton man!
First, thank you for taking the time to write this up!
Your comment block inside the code says that ‘both-addresses’ is needed to avoid breaking HTTPS, but you have fixed the typo by changing the property to ‘both-addresses-and-ports’ for all 4 cases. Did you mean to change it to ‘both-addresses’ instead?
Hi, how to QoS if we have implemented the Load Balancing rules already in the mangle?
Use simple queues for QoS of each egress interface.
Okay.. But if we have to identify them by protocols, do we still have to perform a mark connection again? Or we’ll directly use the mark packet and use the previous connection mark (ex: ISP1, ISP2).
I don’t understand your use case. But for routing, the packet mark is always required and the connection marking will be what determines which connections/packets gets marked.
I haven’t used this type of setup in two years. I recommend you test and find out what works best.
just seeing your website here and you are doing really good job..
hope you can keep inspire people.
I have a question for your mangle configuration,
if you do it and let’s say your 1st ISP is down, is the traffic going 100% percent to the 2nd ISP?
I am running the same PCC load balancing, holding on two ISP with 1Gbps and 500 Mbps Bandwidth coming in,
PCC both address with 3/0 and 3/1 to 1st ISP and 3/2 to second ISP.
based on my experience, whenever my 1st ISP down, the fail over is running and the traffic go to my 2nd ISP, but….. it’s only like 33%, since I divide the 66% remaining to 1st ISP (based on PCC configuration I set)
and I have to disable the PCC configuration first to have the 100% traffic going to 2nd ISP,
do you get my point? is there any light you can share with this issue
appreciate your concern and efforts on loving networking man.
100% of traffic should failover. If it doesn’t then something is wrong. Re-do the configuration from the ground up.
###Incoming connections through ISP1 must leave through ISP1###
add action=mark-connection chain=prerouting connection-mark=no-mark in-interface=pppoe-out1 new-connection-mark=ISP1_conn passthrough=no
For the above Rule, I see in some configurations that many people recommend chain=input
Can you tell me the difference in this case?
Input means destination = local host itself i.e. router. It will not mark traffic destined towards the hosts i.e. forward chain. The whole point of prerouting is to be direction-agnostic. Not sure who suggested you to use input, that’s plain stupid. Learn the netfilter packet flow properly.
The configuration is stable for me. I have more than 1000 clients and no problems right now. Thanks a lot! But I have one question, if we have one mangle for split connection (example : fb, yt, instagram) where we can write? I mean which order. Thank you for answer
Can you write same config for Ros 7?
I currently do not have the spare capacity to write fresh config and test for ROSv7. However, I know other people have done it successfully, and would recommend you ask in my public group here for help.