
Edge Router & BNG Optimisation Guide for ISPs

Last updated on June 5, 2023


This guide provides configuration instructions for MikroTik RouterOS, but the principles can be applied to other Network Operating Systems (NOSes) as well. The guide will be updated regularly as new technologies, use cases, and more efficient configurations are discovered.

Many ISPs around the globe use MikroTik RouterOS to provide access to their customers via BNGs over PPPoE and for various other roles such as edge routers. In this guide, I will explore common issues and solutions along with best practices.

This guide is also available on the APNIC Blog, but it is not frequently updated there. I recommend you follow the source here for the most up-to-date information.

A brief history of this project

  • The configuration was first tested and deployed on AS135756 (small-sized ISP) with its proprietor and my peer Mr. Varun Singhania.
  • In 2021-22, I tested the configuration further as a downstream customer on AS132559 (IP Transit provider & medium-sized ISP), where I was able to assess the impact and config changes both as an end-user and a consultant.
  • In 2022-23, I tested the configuration on my own network (AS149794), including the firewall rules, to ensure it would work in any environment as long as the instructions are followed. The tests confirmed that the configuration does not disrupt layer 4 protocols or cause problems for end-users in the last mile.

A few things to keep in mind

  • RouterOS is based on the Linux Kernel. As of RouterOS v7.7 it still uses legacy iptables for packet filtering instead of nftables, which has a negative impact on performance.
  • The guide will be focused on RouterOS v7 as it is the current version of RouterOS.
  • This guide assumes the reader has a basic understanding of typical use cases and technologies/protocols used in an ISP/Telco production environment.
  • This guide focuses on layer 2-4 configuration (and occasionally up to layer 7) by following various RFCs and BCOPs. It is not a network architecture guide, for which Kevin Myers’s guide is recommended.
  • Virtually everything in this article has been tested on RouterOS v7.6 (stable, with 7.6 RouterBOARD firmware).

Basic Router Terminology and overview

  • An edge or border router is an inter-AS router that is used for connecting different networks, such as transit, IXP, or PNIs.
    • It is important to keep an edge router stateless i.e. without connection tracking (stateful firewall filter rules or NAT), to avoid performance issues and vulnerability to DDoS attacks.
    • Do not use an edge router for customer delegation, as it will become stateful.
    • Do not confuse an edge router with a Provider Edge router, which is MPLS-specific terminology.
  • A core router is not typically present in modern networks that follow a collapsed core topology.
  • BNGs, also known as access layer routers, are used for customer delegation tasks such as PPPoE, DHCP, and CGNAT. They are stateful in nature. Some people may also refer to them as BRAS or NAS (Network Access Servers), all of which are synonyms in my opinion.

General Configuration Changes

Below are the general guidelines that should be applied on all MikroTik devices for optimal performance and security.

  • Upgrade RouterOS and the RouterBOARD firmware to the latest stable (or long-term, if available) v7 releases. Use this command to enable automatic firmware upgrades: “/system routerboard settings set auto-upgrade=yes”. Remember to reboot the router twice after a RouterOS upgrade so that the firmware is upgraded automatically as well.
  • Implement basic security measures, including reverse path filtering and TCP SYN cookies, both of which are found in IP>Settings.
    • For rp-filter, use strict mode when the device only sees symmetric routing. Use loose mode when the device sees asymmetric routing, or when in doubt.
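
As a minimal sketch, both settings can also be applied from the terminal. Loose mode is assumed here as the safe default; switch to strict only where routing is known to be symmetric.

```routeros
#Loose rp-filter is an assumption; use rp-filter=strict on symmetric paths#
/ip settings set rp-filter=loose tcp-syncookies=yes
```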

IPv6

IPv6 Router Advertisements (RAs) are used for SLAAC. MikroTik calls this feature Neighbor Discovery (ND), which is a bit confusing, because ND is an umbrella term covering various protocols and behaviours, not only RAs.

IPv6 RA (ND) is enabled by default on all interfaces in RouterOS. Disable it to stop the router from sending RAs out of interfaces on which you do not use SLAAC. This also prevents someone from obtaining an IPv6 address simply by connecting a host to a particular port or VLAN, and it reduces unnecessary BUM traffic in your network. We disable it using this command:
/ipv6 nd set [ find default=yes ] disabled=yes

You can enable IPv6 RA on a per-interface basis as and when required, i.e. if you set “advertise=yes” for an interface via IPv6>Address, then you need to configure RA/ND for that interface like the example below:
/ipv6 nd add interface=Management_VLAN

Interface Lists

Interface lists help us simplify firewall rule management by enabling us to refer to an entire list in a single rule instead of multiple rules for every interface.

An interface list should only contain layer 3 (L3) interfaces, i.e. interfaces with IP addressing attached to them, such as a physical port, an L3 sub-interface VLAN, an L3 bonding interface or a GRE interface.

The following are basic guidelines for which lists to create and what should be included on those lists:

  • “WAN” interface list should contain those interfaces used for connecting to transit, PNI, IXP, upstream peering.
  • “LAN” interface list should contain those interfaces used for downstream connectivity to your retail customers or IP Transit customers etc. You should include “dynamic” interfaces to account for PPPoE clients on BNGs.
  • “Intra-AS” interface list should contain those interfaces used for connecting one device to another device within the same network such as redundant connectivity between two routers horizontally.
  • “Management” interface list should contain those interfaces used exclusively for management.
  • Do not add bridge members individually into any list as they are purely Layer 2 (L2) interfaces.

However, it is important to note that when you are using bridges (discussed later in this article), interface placement depends on how you set up the bridge. If you are using a single bridge with physical/bonding interfaces as bridge members and no VLAN configuration, the bridge itself will be a member of “LAN”. But if you are using VLANs on top of the bridge, place the VLANs into the appropriate LAN/Intra-AS/Management list based on your local network topology. For example:
“Management VLAN” will be in the management list, or VLAN123 will be in the “intra-AS” or “LAN” list.

Figure-1 (LAN Include Dynamic)
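
The guidelines above could be sketched on the terminal roughly as follows. The interface names under the member section are placeholders for your own L3 interfaces, and you can skip any list that already exists on your device.

```routeros
#Create the lists; include=dynamic pulls in dynamic PPPoE client interfaces#
/interface list
add name=WAN
add name=LAN include=dynamic
add name=Intra-AS
add name=Management
#Interface names below are placeholders for your own L3 interfaces#
/interface list member
add interface=ether1 list=WAN
add interface="Main VLAN" list=LAN
add interface="Management VLAN" list=Management
```

Firewall rules can then match on in-interface-list or out-interface-list instead of naming each interface individually.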

Connection Tracking

  • Disable connection tracking on the edge router and enable loose TCP tracking on all routers using the following commands:
    “/ip firewall connection tracking set enabled=no”
    “/ip firewall connection tracking set loose-tcp-tracking=yes”
  • Use the recommended connection tracking timeout values to improve stability and performance, especially for UDP traffic like VoIP and gaming. If necessary, upgrade the router’s RAM to accommodate these values.
Figure-2 (Recommended Connection Tracking Timeout Values)

Miscellaneous

  • Give the router an accurate system clock by enabling the Network Time Protocol (NTP) client and specifying a reliable NTP server such as this example:
    “/system ntp client set enabled=yes server-dns-names=time.cloudflare.com”

MTU

To ensure reliable network performance, it is essential to configure the MTU consistently across all devices in the path in both L2 and L3. Inconsistent MTU configurations can result in dropped frames or strange behaviours. Additionally, it is essential to minimize IP fragmentation, properly deploy RFC4638, and ensure PMTUD is working for both IPv4 and IPv6. This will help to ensure reasonable auto-detected TCP MSS negotiation values.

Jumbo frames are the ideal approach to MTU configuration, as they future-proof your network for whatever protocols you may throw at it. You should encourage your provider, peers, and customers to also configure jumbo frames on their network.

Bigger frames = more data per frame, meaning fewer frames are required to transmit the same data and less CPU/resource utilisation, as the packets-per-second rate decreases.
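
To put a rough number on that, here is a back-of-the-envelope calculation you can paste into the RouterOS terminal. It assumes a 1 Gbps stream and ignores Ethernet framing overhead, so treat the results as orders of magnitude only: roughly 83,000 packets per second at 1500 bytes versus roughly 14,000 at 9000 bytes.

```routeros
#Packets per second = bits per second / (bytes per packet * 8)#
#1 Gbps with 1500-byte packets#
:put (1000000000 / (1500 * 8))
#1 Gbps with 9000-byte packets#
:put (1000000000 / (9000 * 8))
```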

Guidelines

Layer 2 MTU

L2 MTU, also known as the “underlay MTU” should be configured to the maximum supported value on physical interfaces such as Ethernet ports, SFP and wireless interfaces. This applies to any networking hardware, including routers, switches, and hypervisors. The maximum supported value may vary by vendor or model, but that is okay as the L3 MTU will handle the actual packet size negotiation.

However, make sure the interfaces are all set consistently to their maximum values, to minimise the number of distinct MTU profiles on the device. A switch chip or ASIC supports only a limited number of MTU profiles, and exceeding that number could hurt performance.

By properly configuring the L2 MTU, you can run any protocol you want (such as VXLAN, MPLS, VPLS, or WireGuard) and still have an MTU far greater than 1500 for layer 3 packets, thereby avoiding fragmentation completely on the intra-AS overlay.
Example:

  • Edge router (L2 MTU 10k) > BNG (L2 MTU 10k) > Switch (L2 MTU 10k) > Wireless AP (L2 MTU 2290) > Customer (L2 MTU for WAN will be 2290 as it is the smallest in the path)
  • Edge router (L2 MTU 9216) > BNG (L2 MTU 9216) > Switch (L2 MTU 9k) > OLT (L2 MTU 9k) > Customer (L2 MTU for WAN will be 9k as it is the smallest in the path)

Layer 3 MTU

Configure it to the maximum allowed value on all interfaces, including physical ports. If there is any L2 overhead, such as on a layer 3 sub-interface VLAN, the system automatically subtracts it from the underlay MTU and shows the resulting L2 MTU, which you can simply copy and paste into the L3 MTU parameter.

The basic gist of this is, we use the maximum allowed L3 MTU on intra-AS interfaces and even inter-AS.

This allows your downstream transit customers to talk to your network and your customers using jumbo frames. If you have enabled jumbo frames for a customer, inform them that their L3 MTU must match your L3 MTU.

But if, for example, you are configuring an interface towards your transit or IXP, you should ask your provider whether they support >1500 MTU and configure accordingly. Some transit providers and IXPs support 9000 MTU, so we take advantage of that when possible.

Some things to be careful of:

  • If using Stacked VLANs (QinQ), both S and C VLANs should have equal L3 MTU.
  • If your customer equipment does not support high jumbo frame sizes greater than 9000, then simply configure your L3 MTU to match theirs, which is usually 9000 (downstream customer port<>your port).

Example:

  • Edge router (L3 MTU 10k) > BNG (L3 MTU 10k) > Switch (L3 MTU 10k) > Wireless AP (L3 MTU 2290) > Customer (L3 MTU for WAN will be 2290 as it is the smallest in the path)
  • Edge router (L3 MTU 9216) > BNG (L3 MTU 9216) > Switch (L3 MTU 9k) > OLT (L3 MTU 9k) > Customer (L3 MTU for WAN will be 9k as it is the smallest in the path)

MTU Scripts

You can automate the MTU configuration using the scripts below. Run each one separately, as I did not add delays between them to let the changes settle. You must still manually configure the L2 MTU, L3 MTU and advertised L2 MTU for VPLS and other PPP interfaces.

#Run the ethernet MTU script first before the others#
#Script to autoconfigure max L2/L3 MTU on ethernet ports#
/interface ethernet
:foreach i in=[find] do={
  set $i l2mtu=[/interface get $i max-l2mtu]
  set $i mtu=[/interface get $i max-l2mtu]
}
#Script to autoconfigure max L3 MTU on Layer 3 sub-interface VLAN#
/interface vlan
:foreach i in=[find] do={
  set $i mtu=[/interface get $i l2mtu]
}
#Script to autoconfigure max L3 MTU on Bonding interfaces#
/interface bonding
:foreach i in=[find] do={
  set $i mtu=[/interface get $i l2mtu]
}
#Script to autoconfigure max L3 MTU on VXLAN#
/interface vxlan
:foreach i in=[find] do={
  set $i mtu=[/interface get $i l2mtu]
}
#Script to autoconfigure max L2/L3 MTU on Wireless interfaces#
/interface wireless
:foreach i in=[find] do={
  set $i l2mtu=2290
  set $i mtu=2290
}
#

The screenshots below are for reference only: they reflect my obsolete advice of using a 1600 MTU, and I will not be updating them.

Figure-3 [Ethernet MTU (Jumbo Frames on L2, L3 = 1500 for WAN and 1600 for LAN)]
Figure-4 [L3 MTU for Bonding interfaces (L3 = 1500 for WAN and 1600 for LAN)]
Figure-5 (VLAN L3 MTU = 1600)
Figure-6 [QinQ (Stacked VLANs) L3 MTU = 1600 on both]

Linux Bridge Approach

A Linux bridge is a kernel module that acts as a virtual network switch and is used to forward packets between connected interfaces (also known as bridge ports or members). Many network operators do not follow MikroTik’s official guidelines to properly implement L2/3 using a bridge, which results in degraded performance, as hardware offloading and/or bridge Fast Path/Fast Forward become unusable, along with the loss of L2 filtering capability.

To maximize performance benefits and give you L2 filtering capabilities, it is recommended by MikroTik to create a single bridge per device with all downstream (and intra-AS) interfaces (physical, LACP bonding etc) as bridge members. Tagged/untagged VLANs and hybrid VLANs can be configured using bridge VLAN filtering. Refer to vendor guidelines for model-specific configuration instructions.

If you created an LACP bonding interface between two routers (or switches) for redundancy, you can add the bond interface into the same bridge as a bridge member, where in turn either the bridge itself or the L3 sub-interface VLANs will be an interface list member depending on your topology as discussed in the previous interface lists section.

You can also add your management port to the bridge and segregate it with a VLAN, like the other ports, to help keep the configuration simple.

A separate bridge can also be created as a loopback interface without impacting physical interface performance. You can assign the “.0” IPv4 address to this interface along with the “::” IPv6 address of an IPv6 subnet for management, testing purposes or for using as the loopback IPs with OSPF.

Below is a sample configuration from a CCR1036 router using MikroTik guidelines along with sample interface lists:

#Create the bridges first so that the VLAN interfaces below can reference bridge1#
/interface bridge
add frame-types=admit-only-vlan-tagged name=bridge1 vlan-filtering=yes
#Loopback interface#
add arp=disabled name=loopback protocol-mode=none
#Layer 3 configuration such as IP addressing is attached to these interfaces#
/interface vlan
add interface=bridge1 mtu=10218 name="Main VLAN" vlan-id=20
add interface=bridge1 mtu=10218 name="Management VLAN" vlan-id=10
/interface bridge port
add bridge=bridge1 frame-types=admit-only-untagged-and-priority-tagged interface=ether1 pvid=20
add bridge=bridge1 frame-types=admit-only-untagged-and-priority-tagged interface=ether2 pvid=20
add bridge=bridge1 frame-types=admit-only-untagged-and-priority-tagged interface=ether3 pvid=20
add bridge=bridge1 frame-types=admit-only-untagged-and-priority-tagged interface=ether4 pvid=20
add bridge=bridge1 frame-types=admit-only-untagged-and-priority-tagged interface=ether5 pvid=20
add bridge=bridge1 frame-types=admit-only-untagged-and-priority-tagged interface=ether6 pvid=20
add bridge=bridge1 frame-types=admit-only-untagged-and-priority-tagged interface=ether7 pvid=20
add bridge=bridge1 frame-types=admit-only-untagged-and-priority-tagged interface=ether8 pvid=10
/interface bridge vlan
add bridge=bridge1 comment="Main VLAN" tagged=bridge1 vlan-ids=20
add bridge=bridge1 comment="Management VLAN" tagged=bridge1 vlan-ids=10
#Attaching IP addressing to the interfaces#
/ip address
add address=100.64.2.1/24 interface="Main VLAN" network=100.64.2.0
add address=103.176.189.0 comment="Public Loopback" interface=loopback network=103.176.189.0
add address=100.64.3.1/25 interface="Management VLAN" network=100.64.3.0
#Example for interface lists#
/interface list member
add interface="Main VLAN" list=LAN
add interface="Management VLAN" list="Management Interfaces"

R/M(STP)

I will not deep dive into how STP works, as that is outside the scope of a guide post like this one. However, a few quick things to keep in mind:

  • MikroTik allows us to selectively enable/disable STP/BPDU per-port if required. This may be needed in your network with complex layer 2 designs.

Multicast traffic on the bridge

I personally had a few challenges with multicast traffic/IGMP Snooping best practices, for which I had to reach out to MikroTik support for some clarity. Below are a few basic guidelines to follow based on what I gathered from MikroTik docs and their support team. This is of utmost importance for networks that make use of multicast routing and traffic for their IPTV services and similar.

  • Be mindful of IGMP Snooping (and IGMP Proxy/PIM) limitations such as tagged VLAN, and features depending on your local network topology.
  • Keep in mind that IPv6 SLAAC will break if you enable multicast querier, for which, you need RouterOS v7.7 onwards to work around this.
  • In a layer 2 network if you are using IGMP Snooping, it should be enabled on all the bridges (devices) involved.
  • You can also enable IGMP multicast querier on all the bridges, only one will get elected with the rest acting as failover in case a device fails.
  • If you are using PPPoE then there’s no such thing as true multicast, because whilst it may multicast on layer 3, it will not be true multicast on layer 2 due to the nature of PPPoE which is a tunnel over layer 2. If you are using DHCP (preferably) or IPoE, then this issue does not apply.
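
A minimal sketch of the snooping and querier settings described above, assuming the bridge is named bridge1 as in the sample configuration earlier in this guide. Check the limitations mentioned above against your topology before enabling this in production.

```routeros
#bridge1 is an assumed bridge name; repeat on every bridge in the L2 domain#
/interface bridge
set [find name=bridge1] igmp-snooping=yes multicast-querier=yes
```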

IPv4

I have noticed a lot of operators talking about how short they are on IPv4 addresses – Yet for unknown reasons they like to waste 2 extra addresses for every PTP or inter-router link by using a /30. Please, stop doing that and start using /31s for PTP links as per RFC3021.

However, RouterOS v6 and v7 do not support /31 natively, so the following is how we do it.

Example below:
Prefix: 103.176.189.0/31

#MikroTik to MikroTik PTP#

#Router A#
/ip address
add address=103.176.189.0 interface=ether1 network=103.176.189.1 comment="/31 Example"
#Router B#
/ip address
add address=103.176.189.1 interface=ether1 network=103.176.189.0 comment="/31 Example"
#Cross vendor PTP#

#Router A Cisco/Juniper/Huawei etc#
interface eth2 address 103.176.189.0/31
#Router B MikroTik side#
/ip address
add address=103.176.189.1 interface=ether1 network=103.176.189.0 comment="/31 Example"
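
One way to sanity-check the link from Router A in the example above: because the network parameter points at the peer, RouterOS should install a connected route towards the peer address, and the peer should answer pings. This is only a sketch using the example addresses from this section.

```routeros
#Run on Router A; expect a connected route to the peer address#
/ip route print where dst-address=103.176.189.1/32
#The peer should reply if both sides are configured as shown#
/ping 103.176.189.1 count=3
```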

IPv6

As per RFC6164, it is advised to use /127s on PTP links to avoid various forms of network attacks described in the RFC.

However, for ease of management and subnetting, I would advise not to subnet longer (smaller) than a /64. Please click here to learn more about IPv6 architecture and subnetting plan.

Note that on MikroTik, /127s do not work with BGP for unknown reasons and hence the longest prefix size we can use would be a /126.

Example below:
Prefix: 2400:7060::/126

#Advertise=no because we aren't using SLAAC#
/ipv6 address
add address=2400:7060::1/126 advertise=no comment="Peering with Transit" interface=ether1

If you look closely, you might have noticed that I avoided the all-zeroes interface ID “2400:7060::/126” and used “2400:7060::1/126” instead. The reason is that on some routers, using the “::” (all-zeroes) interface ID as an address on a link can cause strange behaviours.
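
For completeness, the other end of the same link would take another address from the same /126. This is a sketch that assumes the peer also runs RouterOS and uses ether1; picking ::2 is an illustrative choice, not something the RFC prescribes.

```routeros
#Peer side of the example /126; interface name and ::2 are assumptions#
/ipv6 address
add address=2400:7060::2/126 advertise=no comment="PTP example, peer side" interface=ether1
```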

Routing loops with RFC6890 space

In most networks, including my own personal home lab (AS149794), I have observed a lot of traffic where the source IP is an end host or CPE WAN IP (whether CGNAT or public), but the destination IP belongs to unused RFC6890 blocks. This is why I (and MikroTik themselves) created a forward rule to drop RFC6890 traffic from escaping to WAN.

Now let us step back and think about this: the majority of ISPs do not implement these filter rules, which means that customer traffic with dst-IP=RFC6890 is forwarded from the CPE to the BNGs, and from there the underlying L3/L2 paths carry it all the way to the edge router, where it goes on towards your transit or peers if there is a default route. If there is no default route or more specific route for a given dst-IP matching the RFC6890 blocks, the traffic simply loops back and forth until the TTL expires. At scale, with thousands of customers, that means wasted resources, CPU and bandwidth. To solve this with a quick fix, I derived a simple yet effective solution: route RFC6890 blocks to blackhole.

We route all RFC6890 space to blackhole directly on the edge routers for, well, edge cases, and we also do the same directly on the BNGs.

It will not impact your use of the private space for any given interface/servers etc – Because remember, more specific prefixes always win and hence your private /24s etc will always be preferred over the less specific /10 for example and hence will be accessible. Someone on the MikroTik forum has discussed this a bit, in the past.

IPv4

#RouterOS v7#
#Copy and paste these on both Edge and BNG routers#
/ip route
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=0.0.0.0/8
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=172.16.0.0/12
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=192.168.0.0/16
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=10.0.0.0/8
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=169.254.0.0/16
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=127.0.0.0/8
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=224.0.0.0/4
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=198.18.0.0/15
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=192.0.0.0/24
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=192.0.2.0/24
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=198.51.100.0/24
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=203.0.113.0/24
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=100.64.0.0/10
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=240.0.0.0/4
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=192.88.99.0/24
add blackhole comment="Blackhole route for RFC6890 (limited broadcast)" disabled=no dst-address=255.255.255.255/32
#RouterOS v6#
#Copy and paste these on both Edge and BNG routers#
/ip route
add type=blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=0.0.0.0/8
add type=blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=172.16.0.0/12
add type=blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=192.168.0.0/16
add type=blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=10.0.0.0/8
add type=blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=169.254.0.0/16
add type=blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=127.0.0.0/8
add type=blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=224.0.0.0/4
add type=blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=198.18.0.0/15
add type=blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=192.0.0.0/24
add type=blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=192.0.2.0/24
add type=blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=198.51.100.0/24
add type=blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=203.0.113.0/24
add type=blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=100.64.0.0/10
add type=blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=240.0.0.0/4
add type=blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=192.88.99.0/24
add type=blackhole comment="Blackhole route for RFC6890 (limited broadcast)" disabled=no dst-address=255.255.255.255/32

IPv6

#RouterOS v7#
#Copy and paste these on both Edge and BNG routers#
/ipv6 route
add blackhole comment="Blackhole route for RFC6890" disabled=no dst-address=::1/128
add blackhole comment="Blackhole route for RFC6890" disabled=no dst-address=::/128
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=64:ff9b::/96
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=::ffff:0:0/96
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=100::/64
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=2001::/23
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=2001::/32
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=2001:2::/48
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=2001:db8::/32
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=2001:10::/28
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=2002::/16
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=fc00::/7
add blackhole comment="Blackhole route for RFC6890 (aggregated)" disabled=no dst-address=fe80::/10
#In RouterOS v6, IPv6 blackhole is not supported#
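
To confirm the routes were installed, and to see the point about more specific prefixes for yourself, something like the following should work on RouterOS v7. Filtering on the comment relies on the comments used in the commands above, and 100.64.0.0/10 is just one example block to inspect.

```routeros
#List the blackhole routes added above by matching their comment#
/ip route print where comment~"RFC6890"
/ipv6 route print where comment~"RFC6890"
#Connected subnets inside the block show up as more specific routes#
/ip route print where dst-address in 100.64.0.0/10
```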

For BNG

QoS

Note: For 2023 and going forward it is recommended to migrate to FQ_Codel to minimise bufferbloat end-to-end on your network. I will share further details in the coming year as you need to deploy it on your entire network backbone for maximum efficiency, not just on BNG or customer simple queues.

There have been decades-long debates on which algorithm to use, and which method to implement the best possible QoS mechanism.

In my testing, I observed the following:

  • Capping on a per-customer basis using a single simple queue worked best
  • As for the algorithm of choice
    • I pick SFQ due to the observed low jitter/bufferbloat phenomenon on the customer side
  • Bufferbloat tested using this tool
    • Keep in mind, high bufferbloat = bad, low bufferbloat = good

I have not included a screenshot for every algorithm, as that is unnecessary. The test scenario was simple: SFQ compared against the rest of the algorithms. SFQ gave the best bufferbloat score in my testing.

Figure-7 (Simple Queue + PFIFO resulted in high bufferbloat)
Figure-8 (Simple Queue + SFQ resulted in low bufferbloat)
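
As a rough sketch of the setup that performed best in the testing above, a single simple queue per customer with an SFQ queue type could look like this. The queue type name, customer name, target address and 50M/50M limit are placeholders, not recommendations.

```routeros
#Define an SFQ queue type; the name is arbitrary#
/queue type
add kind=sfq name=sfq-example
#One simple queue per customer; target and max-limit are placeholders#
/queue simple
add name=customer-example target=100.64.0.10/32 max-limit=50M/50M queue=sfq-example/sfq-example
```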

PPPoE

Issues

  • Packet fragmentation due to an MTU/MRU below the standard 1500 bytes
    • Typically, ISPs use 1492 or 1480 or some other strange MTU size
    • Both BNG device and customer router need to make use of hacks like TCP MSS Clamping to work around this
    • PMTUD is simply unreliable as per RFC 8900
      • Gets worse with CGNAT because remote end-points cannot determine the MTU of your PPPoE customer behind it
  • Lack of proper routing for PPPoE Clients (Interfaces or Inter-VLANs)
    • Most assume that using a single profile for different PPPoE Servers running on different interfaces will work fine

Solutions

  • The real long-term solution is to migrate to DHCP, which completely avoids the performance and MTU issues that are exclusive to PPPoE and similar encapsulation protocols.
  • Deploy RFC 4638
    • Keep in mind that in a network, MTU affects the whole path of L2/L3 devices whether physical or virtual, as long as you follow the MTU section above, you should be good
    • Simply set MTU and MRU to 1500 inside PPPoE Server on the BNG
      • However, if you are interested in offering jumbo frames all the way to your peers/PNI/IXP etc., you can configure MTU/MRU to a fixed 9000 bytes. The reason for 9000 bytes for inter-AS traffic is explained here
        • In order for this to work correctly you need to strictly follow the MTU section
        • If using Wireless APs, then it would be 2290-8=2282 bytes
Figure-9 (PPPoE Server MTU/MRU & TCP MSS Clamping config)
  • Disable (and delete!) TCP MSS Clamping rules inside IP>Firewall>Mangle
    • Why set some arbitrary value when you can let the engine determine automatically to ensure optimal performance?
      • MikroTik has long allowed automatic TCP MSS clamping. Make use of PPP>Profile>Default* to enable TCP MSS Clamping directly on the PPPoE engine. This will do the work for any customer whose MTU/MRU is less than 1500.
    • On the customer side, not all routers can take advantage of RFC4638, such as TP-Link, Tenda etc. For them, MTU will remain capped at 1492.
      • The 1492 limitation on their end won’t cause fragmentation issues: packets are fragmented at the source (their router) before they exit the interface and reach the BNG, and TCP MSS clamping on the PPPoE engine takes care of anything coming in from the outside world toward the customer
      • I have observed 1500 MRU when pinging from the outside world, suggesting that some of these consumer routers support 1500 MRU
      • If they are using MikroTik, pfSense, VyOS etc, they can take advantage of RFC4638 aka 1500 MTU/MRU for their PPPoE Client
      • Some ONT/ONU devices have strange behaviour for MTU negotiation where they simply do not allow RFC4638 to work (even in bridge mode), only a few brands like GX, TP-Link, and Huawei have been found to be flawless in my personal testing.

Verify MTU config

If you have properly configured MTU and MSS Clamping as per the steps above, then you should see the following results when testing from customer-side using this tool:

Figure-10 (MTU and TCP MSS correctly working on the internet)
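
If the customer CPE happens to run RouterOS, a quick complementary check is a full-size ping with the don't-fragment bit set. This is only a sketch: 1.1.1.1 is just an example destination, and it assumes that RouterOS counts the IP header in the size parameter, so size=1500 should only succeed when the path really carries 1500-byte packets.

```routeros
#Should succeed when RFC 4638 is working end to end#
/ping 1.1.1.1 size=1500 do-not-fragment count=3
#On a 1492-byte PPPoE link the line above fails, but this one should pass#
/ping 1.1.1.1 size=1492 do-not-fragment count=3
```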

Extra Note on PPPoE

  • Create a single CGNAT pool on a per-BNG basis; you can use it for n number of PPPoE Servers on n number of interfaces
    /ip pool
    add name=CGNAT_Pool comment="100.64.0.0-9 is reserved for each PPPoE Server Gateway/Profile" ranges=100.64.0.10-100.127.255.255
    • Here we are reserving 100.64.0.0-9 for gateway IPs on a per-interface/PPPoE server basis, assuming we only have 10 VLANs/Interfaces
      • Reserve as per your local requirements
  • Local Address in PPP Profile = Gateway IP address
    • One common mistake is using the router’s public IP from the WAN interface as the local address, which I have seen lead to issues like traceroute failures or strange packet loss. You should use an address that does not exist in IP>Address
    • Each PPPoE Server needs unique profile/gateway in order to allow inter-VLAN communication between CPEs (which is needed to allow two customers behind a NATted IP to play a P2P Xbox game with each other on different VLANs) and will also ensure a clean network approach
      • If you have 100 PPPoE Servers, there should be 100 unique PPP Profiles with unique local addresses for each
    • Something like this for two servers:
/ppp profile
add change-tcp-mss=yes local-address=100.64.0.1 name=profile1 remote-address=CGNAT_Pool use-upnp=no
add change-tcp-mss=yes local-address=100.64.0.2 name=profile2 remote-address=CGNAT_Pool use-upnp=no
/interface pppoe-server server
add authentication=pap default-profile=profile1 interface=vlan20 keepalive-timeout=disabled max-mru=1500 max-mtu=1500 one-session-per-host=yes service-name=server1
add authentication=pap default-profile=profile2 disabled=no interface=vlan21 keepalive-timeout=disabled max-mru=1500 max-mtu=1500 one-session-per-host=yes service-name=server2
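
The two rules above (every profile has its own local address, and none of them is handed out by the pool) can be sanity-checked offline before deployment. This is only a sketch: the pool boundaries are copied from the example above, and `check_gateways` and `pool_size` are hypothetical helper names, not RouterOS features.

```python
import ipaddress

# Pool boundaries taken from the CGNAT_Pool example above.
POOL_START = ipaddress.ip_address("100.64.0.10")
POOL_END = ipaddress.ip_address("100.127.255.255")


def pool_size() -> int:
    """Number of addresses the pool can hand out to customers."""
    return int(POOL_END) - int(POOL_START) + 1


def check_gateways(gateways):
    """Return a list of problems found with the PPP profile local addresses."""
    problems = []
    seen = set()
    for text in gateways:
        addr = ipaddress.ip_address(text)
        if addr in seen:
            problems.append(f"{addr} is used by more than one profile")
        seen.add(addr)
        if POOL_START <= addr <= POOL_END:
            problems.append(f"{addr} overlaps the customer pool")
    return problems


print(pool_size())
print(check_gateways(["100.64.0.1", "100.64.0.2"]))   # the two profiles above
print(check_gateways(["100.64.0.1", "100.64.0.1", "100.64.0.10"]))
```

An empty list means the profiles follow both rules; the last call shows what a duplicated gateway and a gateway inside the pool would look like.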

CGNAT

Issues

  • The majority of ISPs use RFC1918 subnets for CGNAT, which can clash with subnets on the customer site
  • Breaks P2P traffic
  • Kills the end-to-end principle
  • Requires proper NAT traversal for various protocols including IPsec
  • Routing Loops will occur for any traffic coming from the outside destined towards the public IP pools

Solutions

  • Make use of the 100.64.0.0/10 subnet as it’s meant for CGNAT usage to prevent clashing on the customer site
  • Enable the NAT traversal Helpers on the Router like the following inside IP>Firewall>Service Ports
Figure-11 (NAT Traversal Helpers on RouterOS)
  • Use a simple netmap rule with IPsec passthrough (will allow customers to initiate IPsec outbound without issues) configured.
  • Use a single NAT rule for all CGNAT customers on a per BNG basis to reduce CPU usage.
    • /ip firewall nat add
      action=netmap chain=srcnat comment="CGNAT rule" dst-address-list=!not_in_internet ipsec-policy=out,none out-interface-list=WAN src-address-list=cgnat_subnets
      to-addresses=103.176.189.0/25
      • Here cgnat_subnets=address list containing CGNAT subnets aka 100.64.0.0/10
      • dst-address-list=!not_in_internet is self-explanatory, anything destined towards private subnets shouldn’t be NATted towards WAN
        • Customers should be able to talk to each other using their CGNAT IP. Xbox makes use of this, and it is mentioned in RFC 7021. This is (sort of) equivalent to the old-school days when everyone had a public IP and hence was reachable
    • Enable port forwarding for entire ranges (netmap algorithm + state tracking will handle what gets mapped where)
      • /ip firewall nat
        add action=netmap chain=dstnat comment="Port Forwarding Solution for CGNAT (TCP)" dst-address=103.176.189.0/25 dst-port=1024-65535 protocol=tcp to-addresses=100.64.0.0/10
        add action=netmap chain=dstnat comment="Port Forwarding Solution for CGNAT (UDP)" dst-address=103.176.189.0/25 dst-port=1024-65535 protocol=udp to-addresses=100.64.0.0/10

Below is what MikroTik support had to say about my port forwarding rules

Figure-12 (MikroTik support suggests my port forwarding rules are correct)
  • Avoid Deterministic NAT. The above configuration allows P2P traffic initiated from the inside to be reachable from the outside for applications that make use of ephemeral ports/UDP NAT punching/STUN etc
  • We were able to successfully seed the official Ubuntu Torrent behind the CGNAT with the above configuration, which means inbound P2P connections work once the mapping has been established from the inside
Figure-13 (BitTorrent Seeding Behind CGNAT)
  • We tried src-nat as the action in the srcnat chain, but it resulted in the NATted public IP constantly changing on the customer side and breaking things

Below is what MikroTik support had to say about netmap vs src nat as action for src nat chain

Figure-14 (Src nat = breaks P2P traffic | Netmap = static mapping per client IP)
  • Now we fix routing loops
    • We will use DST NAT to account for remaining traffic such as ICMP and NAT it to a loopback interface
      • Remember to add the bridge to LAN interface list & add the /31 to lan_subnets address list as well
/interface bridge
add arp=disabled comment="For Static Loop Protection" mtu=1500 name=loopback_1 protocol-mode=none
/ip address
add address=192.168.0.1/31 comment="For Static Loop Protection" interface=loopback_1 network=192.168.0.0
/ip firewall nat
add action=dst-nat chain=dstnat comment="Static Loop Protection" dst-address=103.176.189.0/25 to-addresses=192.168.0.1
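
To illustrate why netmap gives each client a static public IP, here is a small sketch of a netmap-style translation. It assumes RouterOS netmap behaves like the Linux NETMAP target (network bits replaced, host bits kept), which I have not confirmed from MikroTik documentation; the `netmap` function is a hypothetical model, not RouterOS code.

```python
import ipaddress


def netmap(address: str, to_network: str) -> str:
    """Map an address onto to_network by keeping only its host bits.

    Assumption: this mirrors the Linux NETMAP target, where the network
    part is overwritten and the host part is left untouched.
    """
    addr = int(ipaddress.ip_address(address))
    net = ipaddress.ip_network(to_network)
    host_bits = addr & int(net.hostmask)
    return str(ipaddress.ip_address(int(net.network_address) | host_bits))


# Each CGNAT address always lands on the same public address.
for client in ("100.64.0.10", "100.64.1.10", "100.64.0.200"):
    print(client, "->", netmap(client, "103.176.189.0/25"))
```

Under that assumption, many CGNAT addresses share one public address, but a given customer never hops between public addresses, which matches the behaviour described in Figure-14.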

Subscription Ratio Recommendation

In my extensive testing and observations, when using the above parameters and steps, I was able to have 200 users behind a /30 without any known complaints from them. BitTorrent worked as expected too. This is likely because not all 200 users will max out 65k connections at the same time and use up every IP:Port combination. Where will you find a CPE that can handle 65k NAT entries anyway?

So, tl;dr: you can use a /30 per 200 users as long as you follow the steps properly. To be future-proof and safe, ensure you provide IPv6 as well.
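
For a feel of what this ratio means, the sketch below computes the average number of public ports available per user and per protocol. It assumes all four addresses of the /30 are usable for NAT and that ports 1024-65535 are in play (the same range used in the port forwarding rules above); real usage is uneven, so this is an average, not a guarantee. The prefix is only an example.

```python
import ipaddress


def ports_per_user(public_prefix: str, users: int,
                   first_port: int = 1024, last_port: int = 65535) -> int:
    """Average public ports per user (per protocol), rounded down."""
    addresses = ipaddress.ip_network(public_prefix).num_addresses
    ports_per_address = last_port - first_port + 1
    return (addresses * ports_per_address) // users


# Example /30 carved out of the CGNAT public pool, shared by 200 users.
print(ports_per_user("103.176.189.0/30", 200))
```

That works out to roughly 1290 ports per user for TCP and again for UDP.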

End Result

Figure-15 (Your NAT Table should look as dead simple as this one)

Logging compliances for government and regulatory requirements

For CGNAT logging for compliance purposes, you can use Traffic Flow, which also provides an additional option for logging NAT events in its configuration.

IPv6

Issues

  • Addressing may not be optimally subnetted/broken down
  • ISP may only have something like a single /48 with 5000 customers downstream, which exceeds the 256 /56s available in a /48
  • Not following the proper guidelines for IPv6 deployment
  • Lack of persistent assignment feature on MikroTik
    • This applies to the majority of ISPs even though they may use Cisco, Juniper etc which supports persistent assignment configuration
  • Not properly ensuring that the customer’s WAN side gets a proper single /64
  • Forcing the customer to have only a single /64 on the LAN side instead of /56
  • MikroTik IPv6 RADIUS does not work correctly

Solutions

  • A proper IPv6 architecture and subnetting plan should be implemented
    • However, the logic is simple
      Ensure customers get /64 WAN side and /56 LAN side for home users
      Ensure customers get /64 WAN side and /48 LAN side for enterprise/SMEs/DC etc
  • Ensure you request an appropriate prefix allocation, based on your customer base, from your Regional Internet Registry/Local Internet Registry
  • Follow the proper guidelines and BCOPs
  • I came across a solution for the lack of persistent assignment on MikroTik, simply use the following script and schedule it to run every five minutes:
    #Please don't be stupid enough to set owner=Daryll#
    /system script
    add dont-require-permissions=no name=PPPoE-IPv6-Persistent owner=Daryll policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon source=\
    "/ipv6 dhcp-server binding;\r\
    \n:foreach i in=[find server~\"pppoe\"] do={\r\
    \n make-static \$i;\r\
    \n set \$i comment=[get \$i server];\r\
    \n set \$i server=all;\r\
    \n}"

    Use the scheduler for automating it:
    /system scheduler
    add interval=5m name=PPPoE-IPv6-Persistent-AutoUpdate on-event=PPPoE-IPv6-Persistent policy=ftp,reboot,read,write,policy,test,password,sniff,sensitive,romon start-time=startup

Now I will cover a simple configuration use-case where a BNG has exactly 1000 customers. The goal here is to ensure that the WAN side of each customer gets a /64 and the LAN side gets a /56.

  • Disable redirects
    /ipv6 settings set accept-redirects=no
  • Next, create two separate pools, one for the WAN side and one for the LAN side of the customer
    • /ipv6 pool
      add name=Customer-CPE-LAN prefix=2405:a140:8::/46 prefix-length=56
      add name=Customer-CPE-WAN prefix=2405:a140:f:d400::/54 prefix-length=64
      • Here, prefix-length specifies the prefix length each customer gets. As per standards, we are giving the WAN side a /64 and the LAN side a /56
  • And finally, configure the pools to each PPPoE Profile as below
    /ppp profile
    set *0 dhcpv6-pd-pool=Customer-CPE-LAN remote-ipv6-prefix-pool=Customer-CPE-WAN
    add name=profile2 dhcpv6-pd-pool=Customer-CPE-LAN remote-ipv6-prefix-pool=Customer-CPE-WAN
    • Remote IPv6 prefix is for the WAN side of the customer
    • DHCPv6 PD Pool is for the LAN side of the customer
Figure-16 (PPPoE IPv6 configuration)

That’s it, now the customers will dynamically get a routed /64 and routed /56 for WAN and LAN sides respectively.
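
To confirm that the pools above are large enough for the customer count, the number of delegations a pool can produce is simply 2 to the power of the difference in prefix lengths. A minimal sketch, with `delegations` being an illustrative helper name:

```python
import ipaddress


def delegations(pool_prefix: str, prefix_length: int) -> int:
    """Number of prefixes of size prefix_length that fit in pool_prefix."""
    pool = ipaddress.ip_network(pool_prefix)
    if prefix_length < pool.prefixlen:
        raise ValueError("delegated prefix cannot be larger than the pool")
    return 2 ** (prefix_length - pool.prefixlen)


print(delegations("2405:a140:8::/46", 56))        # LAN pool, /56 per customer
print(delegations("2405:a140:f:d400::/54", 64))   # WAN pool, /64 per customer
print(delegations("2405:a140:8::/48", 56))        # a lone /48 split into /56s
```

Both pools yield 1024 prefixes, enough for the 1000 customers in this example, while a single /48 yields only 256 /56s.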

Verify IPv6 config

If you have properly configured IPv6 as per the steps above, then you should see the following results when testing from customer-side using this tool:

Figure-17 (IPv6 working correctly)

Routing Loop prevention

If a customer goes offline (due to power loss etc.), traffic destined for that customer keeps arriving until the remote hosts time out, which increases CPU usage. To solve this, we simply route the aggregated customer prefixes to blackhole. Remember that in routing the most specific prefix always wins, so while a customer is online their more specific route is used. Should that more specific route disappear, the less specific (aggregated) blackhole route takes over, and all pending traffic is dropped immediately, which gives us optimal CPU usage.

#RouterOS v7 example#
/ip route
add blackhole comment="Blackhole route for Customer CGNAT pool" disabled=no dst-address=103.176.189.0/25
add blackhole comment="Blackhole route for Customer public pool" disabled=no dst-address=103.176.189.128/25
/ipv6 route
add blackhole comment="Blackhole route for Customer LAN pool" disabled=no dst-address=2405:a140:8::/46
add blackhole comment="Blackhole route for Customer WAN pool" disabled=no dst-address=2405:a140:f:d400::/54
#RouterOS v6 example#
/ip route
add type=blackhole comment="Blackhole route for Customer CGNAT pool" disabled=no dst-address=103.176.189.0/25
add type=blackhole comment="Blackhole route for Customer public pool" disabled=no dst-address=103.176.189.128/25
#In RouterOS v6, IPv6 blackhole is not supported#
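
The "more specific prefix wins" behaviour can be modelled with a toy longest-prefix-match lookup. This is a simplified sketch, not how RouterOS implements its FIB; the route table and next-hop labels are made up for illustration.

```python
import ipaddress


def lookup(routes: dict, destination: str):
    """Return the next hop of the longest matching prefix, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in routes.items()
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


routes = {
    "0.0.0.0/0": "upstream",                  # default route
    "103.176.189.128/25": "blackhole",        # aggregated customer pool
    "103.176.189.130/32": "pppoe-customer",   # dynamic route while online
}

print(lookup(routes, "103.176.189.130"))   # customer online
del routes["103.176.189.130/32"]           # customer goes offline
print(lookup(routes, "103.176.189.130"))   # falls back to the blackhole
```

Without the blackhole entry, the second lookup would fall through to the default route and send the traffic back upstream.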

Firewall/Security

Issues

  • Blocks inbound ports based on the false logic of “protecting” the customer
    • Port blocking does nothing to improve security, it only breaks legitimate traffic such as apps or games that use various methods for VoIP
    • Malware can make use of port 443 and that is the reality of modern-day malware anyway
  • Net Neutrality Violations
    • Such as blocking TCP/UDP traffic destined towards Cloudflare or Google Anycast DNS
  • Lacks basic DDoS protection
  • Lacks simple bogon filtering
  • Lacks basic rules such as dropping invalid traffic on the input chain
  • Lacks FastTracking for traffic destined towards your NATted pools
  • Connection tracking of customers having a public IPv4 address makes no sense and wastes CPU cycles
  • Incorrect ICMPv4/ICMPv6 filtering rules, such as rate limiting “fragmentation needed” messages, and then wondering why customers are facing strange PMTUD issues

Solutions

  • Remove most “port blocking” rules
    • Customer Site security should be handled on the customer site such as having proper basic firewalling on their Edge Routers
    • I’ve dropped some ports on the RAW table directly
  • Avoid Net Neutrality Violation unless otherwise enforced by your local state or central government
  • I’ve shared the rule for FastTracking NATted pools
  • I’ve shared the rule for reducing connection tracking impact on customers having public IPv4 address
  • I have crafted the ICMPv4/ICMPv6 rules manually to drop all deprecated ICMP types while accepting all valid ICMP types

Below are the generic firewall rules that should be deployed on the BNG to cover basic security grounds.

IPv4 Firewall

#First we take care of address lists#
/ip firewall address-list
#Enter all local subnets/public subnets applicable to your AS for the specific BNG where you've routed pools for use#
#Example I'm using only a /24 public+private pools for this specific BNG#
add address=103.176.189.0/24 comment="Public Pool" list=lan_subnets
add address=192.168.0.0/24 comment="Local interfaces" list=lan_subnets
#The usual CGNAT pool entire range#
add address=100.64.0.0/10 comment="CGNAT Pool" list=lan_subnets
#Here we will enter the public pool used for giving customers public IP addresses directly, this will be used for no-tracking to boost performance of customers having public IPv4 addresses and reduce load on the CPU of the BNG#
add address=103.176.189.128/25 comment="Public Pool" list=public_subnets
###Required for DDoS protection rules###
add list=ddos-attackers
add list=ddos-targets
###Bogon filtering addresses for each of the rules in RAW/Filter###
add address=0.0.0.0/8 comment=RFC6890 list=not_in_internet
add address=172.16.0.0/12 comment=RFC6890 list=not_in_internet
add address=192.168.0.0/16 comment=RFC6890 list=not_in_internet
add address=10.0.0.0/8 comment=RFC6890 list=not_in_internet
add address=169.254.0.0/16 comment=RFC6890 list=not_in_internet
add address=127.0.0.0/8 comment=RFC6890 list=not_in_internet
add address=224.0.0.0/4 comment=Multicast list=not_in_internet
add address=198.18.0.0/15 comment=RFC6890 list=not_in_internet
add address=192.0.0.0/24 comment=RFC6890 list=not_in_internet
add address=192.0.2.0/24 comment=RFC6890 list=not_in_internet
add address=198.51.100.0/24 comment=RFC6890 list=not_in_internet
add address=203.0.113.0/24 comment=RFC6890 list=not_in_internet
add address=100.64.0.0/10 comment=RFC6890 list=not_in_internet
add address=240.0.0.0/4 comment=RFC6890 list=not_in_internet
add address=192.88.99.0/24 comment="6to4 relay Anycast [RFC 3068]" list=not_in_internet
add address=255.255.255.255 comment=RFC6890 list=not_in_internet
add address=127.0.0.0/8 comment="RAW Filtering - RFC6890" list=bad_ipv4
add address=192.0.0.0/24 comment="RAW Filtering - RFC6890" list=bad_ipv4
add address=192.0.2.0/24 comment="RAW Filtering - RFC6890 documentation" list=bad_ipv4
add address=198.51.100.0/24 comment="RAW Filtering - RFC6890 documentation" list=bad_ipv4
add address=203.0.113.0/24 comment="RAW Filtering - RFC6890 documentation" list=bad_ipv4
add address=240.0.0.0/4 comment="RAW Filtering - RFC6890 reserved" list=bad_ipv4
add address=224.0.0.0/4 comment="RAW Filtering - multicast" list=bad_src_ipv4
add address=255.255.255.255 comment="RAW Filtering - RFC6890" list=bad_src_ipv4
add address=0.0.0.0/8 comment="RAW Filtering - RFC6890" list=bad_dst_ipv4
add address=224.0.0.0/4 comment="RAW Filtering - multicast" list=bad_dst_ipv4 disabled=yes
/ip firewall raw
add action=drop chain=prerouting comment="Drop DDoS src and dst address list" dst-address-list=ddos-targets src-address-list=ddos-attackers
add action=drop chain=prerouting comment="drop port 25 to prevent spam" port=25 protocol=tcp
add action=drop chain=prerouting comment="drop port 25 to prevent spam" port=25 protocol=udp
#Required at least in India to reduce call spam/scam#
add action=drop chain=prerouting comment="Drop outgoing SIP to block call centre scammers" port=5060,5061 protocol=tcp
add action=drop chain=prerouting comment="Drop outgoing SIP to block call centre scammers" port=5060,5061 protocol=udp
add action=accept chain=prerouting comment="Enable this rule for transparent mode" disabled=yes
#If you are using DHCP, change this to accept#
add action=drop chain=prerouting comment="defconf: Drop DHCP discover" dst-address=255.255.255.255 dst-port=67 in-interface-list=LAN protocol=udp src-address=0.0.0.0 src-port=68
add action=drop chain=prerouting comment="defconf: drop bad src IPs" src-address-list=bad_ipv4
add action=drop chain=prerouting comment="defconf: drop bad dst IPs" dst-address-list=bad_ipv4
add action=drop chain=prerouting comment="defconf: drop bad src IPs" src-address-list=bad_src_ipv4
add action=drop chain=prerouting comment="defconf: drop bad dst IPs" dst-address-list=bad_dst_ipv4
add action=drop chain=prerouting comment="defconf: drop non global from WAN" in-interface-list=WAN src-address-list=not_in_internet
add action=drop chain=prerouting comment="defconf: drop forward to private ranges from WAN" dst-address-list=not_in_internet in-interface-list=WAN
#Remember to properly enter all subnets in the lan_subnets list for both your AS public IPv4 blocks and CGNAT/local subnets#
add action=drop chain=prerouting comment="defconf: drop local if not from default IP range" in-interface-list=LAN src-address-list=!lan_subnets
add action=drop chain=prerouting comment="defconf: drop bad UDP" port=0 protocol=udp
add action=jump chain=prerouting comment="defconf: jump to TCP chain" jump-target=bad_tcp protocol=tcp
add action=jump chain=prerouting comment="defconf: jump to ICMP chain" jump-target=icmp protocol=icmp
#Rule for reducing connection tracking impact for public IPv4 customers, we no longer exclude RFC6890 bound packets as the route to blackhole rules take care of that#
add action=notrack chain=prerouting comment="Reduce load on conn_track" in-interface-list=LAN src-address-list=public_subnets
add action=accept chain=prerouting comment="defconf: accept everything else from LAN" in-interface-list=LAN
add action=accept chain=prerouting comment="defconf: accept everything else from WAN" in-interface-list=WAN
add action=drop chain=prerouting comment="defconf: drop the rest"
add action=drop chain=bad_tcp comment="defconf: TCP port 0 drop" port=0 protocol=tcp
add action=drop chain=bad_tcp comment="defconf: TCP flag filter" protocol=tcp tcp-flags=!fin,!syn,!rst,!ack
add action=drop chain=bad_tcp comment="defconf: TCP flag filter" protocol=tcp tcp-flags=fin,syn
add action=drop chain=bad_tcp comment="defconf: TCP flag filter" protocol=tcp tcp-flags=fin,rst
add action=drop chain=bad_tcp comment="defconf: TCP flag filter" protocol=tcp tcp-flags=fin,!ack
add action=drop chain=bad_tcp comment="defconf: TCP flag filter" protocol=tcp tcp-flags=fin,urg
add action=drop chain=bad_tcp comment="defconf: TCP flag filter" protocol=tcp tcp-flags=syn,rst
add action=drop chain=bad_tcp comment="defconf: TCP flag filter" protocol=tcp tcp-flags=rst,urg
add action=drop chain=icmp comment="Drop Source Quench (Deprecated)" icmp-options=4 protocol=icmp
add action=drop chain=icmp comment="Drop Alternate Host Address (Deprecated)" icmp-options=6 protocol=icmp
add action=drop chain=icmp comment="Drop Information Request (Deprecated)" icmp-options=15 protocol=icmp
add action=drop chain=icmp comment="Drop Information Reply (Deprecated)" icmp-options=16 protocol=icmp
add action=drop chain=icmp comment="Drop Address Mask Request (Deprecated)" icmp-options=17 protocol=icmp
add action=drop chain=icmp comment="Drop Address Mask Reply (Deprecated)" icmp-options=18 protocol=icmp
add action=drop chain=icmp comment="Drop Traceroute (Deprecated)" icmp-options=30 protocol=icmp
add action=drop chain=icmp comment="Drop Datagram Conversion Error (Deprecated)" icmp-options=31 protocol=icmp
add action=drop chain=icmp comment="Drop Mobile Host Redirect (Deprecated)" icmp-options=32 protocol=icmp
add action=drop chain=icmp comment="Drop IPv6 Where-Are-You (Deprecated)" icmp-options=33 protocol=icmp
add action=drop chain=icmp comment="Drop IPv6 I-Am-Here (Deprecated)" icmp-options=34 protocol=icmp
add action=drop chain=icmp comment="Drop Mobile Registration Request (Deprecated)" icmp-options=35 protocol=icmp
add action=drop chain=icmp comment="Drop Mobile Registration Reply (Deprecated)" icmp-options=36 protocol=icmp
add action=drop chain=icmp comment="Drop Domain Name Request (Deprecated)" icmp-options=37 protocol=icmp
add action=drop chain=icmp comment="Drop Domain Name Reply (Deprecated)" icmp-options=38 protocol=icmp
add action=drop chain=icmp comment="Drop SKIP (Deprecated)" icmp-options=39 protocol=icmp
/ip firewall filter
add action=accept chain=input comment="defconf: accept established,related,untracked" connection-state=established,related,untracked
add action=drop chain=input comment="defconf: drop invalid" connection-state=invalid
add action=accept chain=input comment="defconf: accept ICMP after RAW" protocol=icmp
add action=accept chain=input comment="defconf: accept UDP traceroute" port=33434-33534 protocol=udp
#Example to allow access to router's ports from all interfaces LAN/WAN#
add action=accept chain=input comment="Accept Winbox TCP" dst-port=65000 protocol=tcp
add action=accept chain=input comment="Accept API TCP" dst-port=8728 protocol=tcp
add action=accept chain=input comment="Accept API UDP" dst-port=8728 protocol=udp
add action=accept chain=input comment="Accept SNMP for internal use" dst-port=161 protocol=udp
add action=accept chain=input comment="Accept RADIUS UDP" dst-port=1700,1812,1813 protocol=udp
add action=accept chain=input comment="Accept RADIUS TCP" dst-port=1700,1812,1813 protocol=tcp
#End of example#
add action=drop chain=input comment="defconf: drop all not coming from LAN's interface list/subnets" in-interface-list=!LAN
#PPPoE Clients are excluded so as to not bypass queues, if using DHCP exclude src and dst address list of customer pool#
add action=fasttrack-connection chain=forward comment="Rule for NAT Acceleration behaviour (Will reduce CPU usage for NATted traffic)" in-interface=!all-ppp out-interface=!all-ppp
add action=accept chain=forward comment="allow already established connections" connection-state=established,related,untracked
add action=jump chain=forward comment="Jump to DDoS detection" connection-state=new in-interface-list=WAN jump-target=detect-ddos
add action=return chain=detect-ddos dst-limit=50,50,src-and-dst-addresses/10s
add action=return chain=detect-ddos dst-limit=50,50,src-and-dst-addresses/10s protocol=tcp tcp-flags=syn,ack
add action=add-dst-to-address-list address-list=ddos-targets address-list-timeout=10m chain=detect-ddos
add action=add-src-to-address-list address-list=ddos-attackers address-list-timeout=10m chain=detect-ddos
#This rule should be redundant as we are now routing RFC6890 to blackhole directly and hence I am commenting it out#
#add action=drop chain=forward comment="Drop tries to reach not public addresses from LAN" dst-address-list=not_in_internet in-interface-list=LAN out-interface-list=WAN#
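
Because the RAW rules drop WAN traffic whose source or destination is in not_in_internet, it is worth checking that none of your public pools accidentally overlaps that list. The sketch below copies the list from the rules above; `overlapping_bogons` is a hypothetical helper for an offline check, not something RouterOS provides.

```python
import ipaddress

# Copied from the not_in_internet address list above.
NOT_IN_INTERNET = [
    "0.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "10.0.0.0/8",
    "169.254.0.0/16", "127.0.0.0/8", "224.0.0.0/4", "198.18.0.0/15",
    "192.0.0.0/24", "192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24",
    "100.64.0.0/10", "240.0.0.0/4", "192.88.99.0/24", "255.255.255.255/32",
]


def overlapping_bogons(prefix: str):
    """Return the bogon entries that overlap the given prefix."""
    network = ipaddress.ip_network(prefix)
    return [
        bogon for bogon in NOT_IN_INTERNET
        if network.overlaps(ipaddress.ip_network(bogon))
    ]


print(overlapping_bogons("103.176.189.0/24"))   # public pool: expect no overlap
print(overlapping_bogons("100.64.0.0/10"))      # CGNAT range: listed on purpose
```

A public pool should return an empty list; the CGNAT and local ranges are expected to match, which is why they are handled through lan_subnets instead.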

IPv6 Firewall

I have now added a rule in the raw table to drop extension header types 0 and 43 as per this. The linked article also suggests dropping header 60, but I decided not to drop header 60 for the reasons stated in the re-tweet here. Please note, this only works in ROS v7.4 onwards, as a related bug was fixed in that version.

I have now also removed the forward rules completely to improve performance and moved them to the raw table.

/ipv6 firewall address-list
#Enter all the public prefixes that you've routed to this particular BNG#
#We will use this to block spoofed IPv6 coming from customers#
#We will also use this for no-tracking to boost performance of customers behind the public IPv6 addresses and reduce load on the CPU of the BNG#
#example#
add address=2405:a140:8::/46 comment="CPE-LAN-Pool" list=lan_subnets
add address=2405:a140:f:d400::/54 comment="CPE-WAN-Pool" list=lan_subnets
#Example of any IPv6 you're using on the BNG towards downstream switches/devices/VMs etc#
add address=2405:a140:e::/48 comment="Backbone-Pool" list=lan_subnets
#To prevent breaking link-local#
add address=fe80::/10 comment="Link-local" list=lan_subnets
#Add your BGP peers here, example below#
add address=2400:7000:1::/126 comment="Peering with Transit on VLAN100" list=bgp_peers
#Copy Paste all the following#
add address=::/3 comment="IPv6 invalids" list=not_in_internet
add address=4000::/3 comment="IPv6 invalids" list=not_in_internet
add address=6000::/3 comment="IPv6 invalids" list=not_in_internet
add address=8000::/3 comment="IPv6 invalids" list=not_in_internet
add address=a000::/3 comment="IPv6 invalids" list=not_in_internet
add address=c000::/3 comment="IPv6 invalids" list=not_in_internet
add address=e000::/4 comment="IPv6 invalids" list=not_in_internet
add address=f000::/5 comment="IPv6 invalids" list=not_in_internet
add address=f800::/6 comment="IPv6 invalids" list=not_in_internet
add address=fc00::/7 comment="IPv6 invalids" list=not_in_internet
add address=fe00::/9 comment="IPv6 invalids" list=not_in_internet
add address=fec0::/10 comment="IPv6 invalids" list=not_in_internet
add address=2001::/23 comment="IPv6 invalids" list=not_in_internet
add address=2001:2::/48 comment="IPv6 invalids" list=not_in_internet
add address=2001:10::/28 comment="IPv6 invalids" list=not_in_internet
add address=2001:db8::/32 comment="IPv6 invalids" list=not_in_internet
add address=2002::/16 comment="IPv6 invalids" list=not_in_internet
add address=3ffe::/16 comment="IPv6 invalids" list=not_in_internet
#We will use this to eliminate the need for stateful firewalling on IPv6 to catch spoofed traffic in the raw table instead of forward chain#
add address=2000::/3 list="global_unicast_prefix(es)"
add address=fe80::/10 list=allowed
add address=ff02::/16 comment="multicast" list=allowed
add address=fe80::/10 comment="defconf: RFC6890 Linked-Scoped Unicast" list=no_forward_ipv6
add address=ff00::/8 comment="defconf: multicast" list=no_forward_ipv6
add address=::1/128 comment="defconf: lo" list=bad_ipv6
add address=::ffff:0:0/96 comment="defconf: ipv4-mapped" list=bad_ipv6
add address=::/96 comment="defconf: ipv4 compat" list=bad_ipv6
add address=2001:db8::/32 comment="defconf: documentation" list=bad_ipv6
add address=2001:10::/28 comment="defconf: ORCHID" list=bad_ipv6
add address=2001::/23 comment="defconf: RFC6890" list=bad_ipv6
add address=::/128 comment="defconf: unspecified" list=bad_dst_ipv6
add address=::/128 comment="RAW Filtering" list=bad_src_ipv6
add address=ff00::/8 comment="RAW Filtering" list=bad_src_ipv6
/ipv6 firewall raw
#New rule to drop deprecated header type 0 & 43#
#Works only on ROS v7.4 onwards#
add action=drop chain=prerouting comment="Drop packets with extension header types 0, 43" headers=hop,route:contains
add action=accept chain=prerouting comment="defconf: RFC4291, section 2.7.1" dst-address=ff02::1:ff00:0/104 icmp-options=135:0-255 protocol=icmpv6 src-address=::/128
#Migrated this rule from the forward chain to make it more CPU efficient#
add action=drop chain=prerouting comment="defconf: rfc4890 drop hop-limit=1" hop-limit=equal:1 in-interface-list=!LAN protocol=icmpv6
add action=drop chain=prerouting comment="drop port 25 to prevent spam" port=25 protocol=tcp
add action=drop chain=prerouting comment="drop port 25 to prevent spam" port=25 protocol=udp
#This is required for traffic whereby the SRC may be Link-local and the DST is GUA for BGP peers particularly in IXPs#
add action=accept chain=prerouting comment="Accept all ICMPv6 traffic from BGP peers (Required for LL<>GUA packets)" icmp-options=!154:4-5 in-interface-list=WAN protocol=icmpv6 src-address-list=bgp_peers

add action=drop chain=prerouting comment="Drop invalids from WAN" dst-address-list="global_unicast_prefix(es)" in-interface-list=WAN src-address-list=not_in_internet
add action=drop chain=prerouting comment="Drop forwarded invalids from WAN" dst-address-list=not_in_internet in-interface-list=WAN src-address-list="global_unicast_prefix(es)"
add action=drop chain=prerouting comment="Drop invalids from LAN" dst-address-list="global_unicast_prefix(es)" in-interface-list=LAN src-address-list=not_in_internet
add action=drop chain=prerouting comment="Drop forwarded invalids from LAN" dst-address-list=not_in_internet in-interface-list=LAN src-address-list=lan_subnets

#This rule replaces the need for forward chain rule for doing the same thing#
add action=drop chain=prerouting comment="Drop spoofed traffic from LAN going towards Global Unicast" dst-address-list="global_unicast_prefix(es)" in-interface-list=LAN src-address-list=!lan_subnets
add action=accept chain=prerouting comment="defconf: enable for transparent firewall" disabled=yes
add action=drop chain=prerouting comment="defconf: drop bogon IP's" src-address-list=bad_ipv6
add action=drop chain=prerouting comment="defconf: drop bogon IP's" dst-address-list=bad_ipv6
add action=drop chain=prerouting comment="defconf: drop packets with bad src ipv6" src-address-list=bad_src_ipv6
add action=drop chain=prerouting comment="defconf: drop packets with bad dst ipv6" dst-address-list=bad_dst_ipv6
add action=accept chain=prerouting comment="defconf: accept local multicast scope" dst-address=ff02::/16
add action=drop chain=prerouting comment="defconf: drop other multicast destinations" dst-address=ff00::/8
add action=drop chain=prerouting comment="defconf: drop bad UDP" port=0 protocol=udp
add action=drop chain=prerouting comment="defconf: drop bad TCP" port=0 protocol=tcp
add action=jump chain=prerouting comment="defconf: jump to ICMP chain" jump-target=icmpv6 protocol=icmpv6
#Since all filtering for LAN is done in RAW, we do not need to have stateful tracking for LAN, and hence we are notracking all LAN originating/bound traffic after filtering#
add action=notrack chain=output comment="Reduce load on conn_track" out-interface-list=LAN
add action=notrack chain=prerouting comment="Reduce load on conn_track" in-interface-list=LAN
add action=notrack chain=prerouting comment="Reduce load on conn_track" dst-address-list=lan_subnets in-interface-list=WAN
add action=accept chain=prerouting comment="defconf: accept everything else from WAN" in-interface-list=WAN
add action=accept chain=prerouting comment="defconf: accept everything else from LAN" in-interface-list=LAN
add action=drop chain=prerouting comment="defconf: drop the rest"
add action=drop chain=icmpv6 comment="Drop FMIPv6 HI + FMIPv6 HAck - Deprecated (RFC5568)" icmp-options=154:4-5 protocol=icmpv6
/ipv6 firewall filter
add action=accept chain=input comment="defconf: accept established,related,untracked" connection-state=established,related,untracked
add action=drop chain=input comment="defconf: drop invalid" connection-state=invalid
add action=accept chain=input comment="defconf: accept ICMPv6" protocol=icmpv6
add action=accept chain=input comment="defconf: accept UDP traceroute" port=33434-33534 protocol=udp
add action=accept chain=input comment="defconf: accept DHCPv6-Client prefix delegation." dst-port=546 protocol=udp src-address=fe80::/10
#Example to allow access to router's ports from all interfaces LAN/WAN#
add action=accept chain=input comment="Accept Winbox TCP" dst-port=65000 protocol=tcp
add action=accept chain=input comment="Accept API TCP" dst-port=8728 protocol=tcp
add action=accept chain=input comment="Accept API UDP" dst-port=8728 protocol=udp
add action=accept chain=input comment="Accept SNMP for internal use" dst-port=161 protocol=udp
add action=accept chain=input comment="Accept RADIUS UDP" dst-port=1700,1812,1813 protocol=udp
add action=accept chain=input comment="Accept RADIUS TCP" dst-port=1700,1812,1813 protocol=tcp
#End of example#
add action=accept chain=input comment="allow allowed addresses" src-address-list=allowed
add action=drop chain=input comment="defconf: drop everything else not coming from LAN" in-interface-list=!LAN
#All forward rules have been migrated to the RAW table for BNGs, so better performance and no stateful tracking required for customers#
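
Since the IPv6 rules rely entirely on address lists in the RAW table, a quick offline check of the not_in_internet list helps confirm that it filters what you expect and leaves link-local, multicast, and real global unicast alone. The list is copied from above; `is_filtered` is an illustrative helper name.

```python
import ipaddress

# Copied from the IPv6 not_in_internet address list above.
NOT_IN_INTERNET_V6 = [
    "::/3", "4000::/3", "6000::/3", "8000::/3", "a000::/3", "c000::/3",
    "e000::/4", "f000::/5", "f800::/6", "fc00::/7", "fe00::/9", "fec0::/10",
    "2001::/23", "2001:2::/48", "2001:10::/28", "2001:db8::/32",
    "2002::/16", "3ffe::/16",
]


def is_filtered(address: str) -> bool:
    """True if the address falls inside any not_in_internet prefix."""
    addr = ipaddress.ip_address(address)
    return any(addr in ipaddress.ip_network(p) for p in NOT_IN_INTERNET_V6)


for sample in ("fe80::1", "ff02::1", "2400:7060::1", "2001:db8::1", "fd00::1"):
    print(sample, is_filtered(sample))
```

Link-local (fe80::/10) and multicast (ff00::/8) are deliberately absent from the list, so they are handled by the dedicated rules instead.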

For Edge Router

The purpose of the Edge router is to route as fast as possible. So, with that in mind, along with the basic general changes I’ve mentioned at the beginning of this article, the following should also be kept in mind:

  1. No NAT
  2. No connection tracking aka stateful firewalling (filter table on the firewall section)
    • If you enable stateful firewalling on the edge, the router will die in case of DDoS attacks or even just heavy traffic in general
  3. No fancy “features” (like Hotspot, PPPoE)
    • Use your BNG routers for any customer delegation that is required

BGP Optimisation

This is a work in progress section and at this point in time, I am writing based on my experience with Indian ISPs, so if you’re in the EU/US or other locations, you’re probably already implementing the following:

BGP Timers

Based on Huawei documentation here and here, I personally tested the following configuration and observed that BGP negotiation time and stability (during occasional link flaps/packet loss) improved significantly, so I would recommend network operators to set the same timers globally on their networks (for both eBGP and iBGP) – Keepalive time to 20s, Holdtime to 60s.

  • /routing bgp template
    set default as=149794 disabled=no hold-time=1m keepalive-time=20s

Preferably, convince your peers to configure the same timers on their end as well, at least for the individual BGP sessions between you and them.
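
Where a single session needs the timers set explicitly instead of inheriting them from the template, a per-connection override along these lines should work on RouterOS v7. Treat it as a sketch: the connection name, the peer address 192.0.2.1 and the peer AS 64500 are placeholders from the documentation ranges, not real peers. Keep in mind that the hold time actually used on a session is the lower of the two values the peers advertise, which is one more reason to agree on the timers with the other side.

```
/routing bgp connection
# "transit-a", 192.0.2.1 and AS 64500 are placeholders; the local AS is inherited from the default template
add name=transit-a remote.address=192.0.2.1 remote.as=64500 local.role=ebgp templates=default hold-time=1m keepalive-time=20s
# Once the session is established, inspect it to confirm the timers in use
/routing bgp session print detail
```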

Traffic Engineering and loop prevention

  • Always route your aggregated prefixes [say, a /24 or /22 (IPv4) or a /32 or /36 (IPv6)] to blackhole, for both IPv4 and IPv6, to prevent layer 3 loops. Stop disabling synchronisation on RouterOS v6; on RouterOS v7 it is mandatory anyway to either route the prefix to blackhole or have it assigned to an interface
    • This also reduces CPU usage whenever downstream routers/users/switches go offline while remote hosts/networks keep trying to establish connections: that traffic is routed to blackhole, times out immediately and saves resources.
      • In other words, there’s no sense in doing things that increase CPU usage (not routing to blackhole)
      • And there is no sense in avoiding loop prevention mechanisms
    • Example config on my own network (AS149794) on RouterOS v7
      /ip route
      add blackhole comment="Blackhole route" disabled=no dst-address=103.176.189.0/24


      /ipv6 route
      add blackhole comment="Blackhole Route" disabled=no dst-address=2400:7060::/32
      add blackhole comment="Blackhole Route" disabled=no dst-address=2400:7060::/48
  • If you have multi-homing transit
    • At the very least, always request a partial routing table from all the upstream providers you’re connected to. If the router can handle full tables from the upstreams, go for it!
      • This will ensure your router has the best paths to choose from
      • Stop going with the strange concept of taking only default routes from the upstreams and creating asymmetric routing conditions where outgoing traffic is going via Transit A and incoming traffic is coming in via Transit B.
    • Always advertise all your IP pools to all transit providers to help minimise asymmetric routing which in turn leads to high latency and possibly packet loss in rare cases
      • If you need traffic engineering, you can consider BGP based load balancing or local preferences with some automation like Pathvector
  • If you have a single homing setup
    • Still request a partial or full table, whichever fits your router’s specs, to future-proof the setup in case you plan to go multi-homed
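
For the local-preference approach to traffic engineering mentioned above, a minimal RouterOS v7 sketch could look like the following. It assumes two existing BGP connections named transit-a and transit-b (hypothetical names) and simply prefers routes learned from transit-a. The values 200 and 100 are arbitrary illustrations, and a real input filter would normally also reject bogons and overly specific prefixes before accepting anything.

```
/routing filter rule
# Higher local preference wins, so routes via transit-a are preferred
add chain=transit-a-in rule="set bgp-local-pref 200; accept"
add chain=transit-b-in rule="set bgp-local-pref 100; accept"
/routing bgp connection
# "transit-a" and "transit-b" are placeholder connection names
set [find name=transit-a] input.filter=transit-a-in
set [find name=transit-b] input.filter=transit-b-in
```

Local preference only influences outbound path selection. Inbound traffic still depends on what you advertise to each upstream, which is why all pools should be advertised to all transits.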

Filtering & Security

We only need to do broadly two things for filtering and security:

  1. Implement MANRS throughout your network (and business)
  2. Use the RAW table to drop the remaining bogon/rubbish traffic, similar to the rules used on the BNG. You can also use it for ACLs if you need them
    • CPU usage stays minimal when using the RAW table
    • Absolutely nothing on the filter table i.e. no stateful firewalling
      • The only exception here is we can use FastTrack for untracked traffic i.e. stateless traffic to improve IPv4 routing performance

IPv4 Firewall

#Disable conn_track for using FastTrack statelessly#
/ip firewall connection tracking
set enabled=no
/ip firewall address-list
#Enter all local subnets/public subnets applicable to your AS, use the full CIDR notation of the public IPv4 block assigned to you to avoid missing anything out, please avoid something like /30#
add address=103.176.189.0/24 comment="LAN subnets" list=lan_subnets
add address=192.168.0.0/24 comment="LAN subnets" list=lan_subnets
add address=0.0.0.0/8 comment=RFC6890 list=not_in_internet
add address=172.16.0.0/12 comment=RFC6890 list=not_in_internet
add address=192.168.0.0/16 comment=RFC6890 list=not_in_internet
add address=10.0.0.0/8 comment=RFC6890 list=not_in_internet
add address=169.254.0.0/16 comment=RFC6890 list=not_in_internet
add address=127.0.0.0/8 comment=RFC6890 list=not_in_internet
add address=224.0.0.0/4 comment=Multicast list=not_in_internet
add address=198.18.0.0/15 comment=RFC6890 list=not_in_internet
add address=192.0.0.0/24 comment=RFC6890 list=not_in_internet
add address=192.0.2.0/24 comment=RFC6890 list=not_in_internet
add address=198.51.100.0/24 comment=RFC6890 list=not_in_internet
add address=203.0.113.0/24 comment=RFC6890 list=not_in_internet
add address=100.64.0.0/10 comment=RFC6890 list=not_in_internet
add address=240.0.0.0/4 comment=RFC6890 list=not_in_internet
add address=192.88.99.0/24 comment="6to4 relay Anycast [RFC 3068]" list=not_in_internet
add address=255.255.255.255 comment=RFC6890 list=not_in_internet
add address=127.0.0.0/8 comment="RAW Filtering - RFC6890" list=bad_ipv4
add address=192.0.0.0/24 comment="RAW Filtering - RFC6890" list=bad_ipv4
add address=192.0.2.0/24 comment="RAW Filtering - RFC6890 documentation" list=bad_ipv4
add address=198.51.100.0/24 comment="RAW Filtering - RFC6890 documentation" list=bad_ipv4
add address=203.0.113.0/24 comment="RAW Filtering - RFC6890 documentation" list=bad_ipv4
add address=240.0.0.0/4 comment="RAW Filtering - RFC6890 reserved" list=bad_ipv4
add address=224.0.0.0/4 comment="RAW Filtering - multicast" list=bad_src_ipv4
add address=255.255.255.255 comment="RAW Filtering - RFC6890" list=bad_src_ipv4
add address=0.0.0.0/8 comment="RAW Filtering - RFC6890" list=bad_dst_ipv4
add address=224.0.0.0/4 comment="RAW Filtering - multicast" list=bad_dst_ipv4 disabled=yes
/ip firewall raw
add action=accept chain=prerouting comment="Enable this rule for transparent mode" disabled=yes
#If you are using DHCP, change this to accept#
add action=drop chain=prerouting comment="defconf: Drop DHCP discover on LAN" dst-address=255.255.255.255 dst-port=67 in-interface-list=LAN protocol=udp src-address=0.0.0.0 src-port=68
add action=drop chain=prerouting comment="defconf: drop bad src IPs" src-address-list=bad_ipv4
add action=drop chain=prerouting comment="defconf: drop bad dst IPs" dst-address-list=bad_ipv4
add action=drop chain=prerouting comment="defconf: drop bad src IPs" src-address-list=bad_src_ipv4
add action=drop chain=prerouting comment="defconf: drop bad dst IPs" dst-address-list=bad_dst_ipv4
add action=drop chain=prerouting comment="defconf: drop non global from WAN" in-interface-list=WAN src-address-list=not_in_internet
add action=drop chain=prerouting comment="defconf: drop forward to private ranges from WAN" dst-address-list=not_in_internet in-interface-list=WAN
#Remember that lan_subnets here should only include your public ranges not CGNAT#
add action=drop chain=prerouting comment="defconf: drop local if not from default IP range" in-interface-list=LAN src-address-list=!lan_subnets
add action=drop chain=prerouting comment="defconf: drop bad UDP" port=0 protocol=udp
add action=jump chain=prerouting comment="defconf: jump to TCP chain" jump-target=bad_tcp protocol=tcp
add action=jump chain=prerouting comment="defconf: jump to ICMP chain" jump-target=icmp protocol=icmp
add action=accept chain=prerouting comment="defconf: accept UDP traceroute" dst-address-type=local port=33434-33534 protocol=udp
add action=accept chain=prerouting comment="defconf: accept everything else from LAN" in-interface-list=LAN
add action=accept chain=prerouting comment="defconf: accept everything else from WAN" in-interface-list=WAN
add action=drop chain=prerouting comment="defconf: drop the rest"
add action=drop chain=bad_tcp comment="defconf: TCP port 0 drop" port=0 protocol=tcp
add action=drop chain=bad_tcp comment="defconf: TCP flag filter" protocol=tcp tcp-flags=!fin,!syn,!rst,!ack
add action=drop chain=bad_tcp comment="defconf: TCP flag filter" protocol=tcp tcp-flags=fin,syn
add action=drop chain=bad_tcp comment="defconf: TCP flag filter" protocol=tcp tcp-flags=fin,rst
add action=drop chain=bad_tcp comment="defconf: TCP flag filter" protocol=tcp tcp-flags=fin,!ack
add action=drop chain=bad_tcp comment="defconf: TCP flag filter" protocol=tcp tcp-flags=fin,urg
add action=drop chain=bad_tcp comment="defconf: TCP flag filter" protocol=tcp tcp-flags=syn,rst
add action=drop chain=bad_tcp comment="defconf: TCP flag filter" protocol=tcp tcp-flags=rst,urg
add action=drop chain=icmp comment="Drop Source Quench (Deprecated)" icmp-options=4 protocol=icmp
add action=drop chain=icmp comment="Drop Alternate Host Address (Deprecated)" icmp-options=6 protocol=icmp
add action=drop chain=icmp comment="Drop Information Request (Deprecated)" icmp-options=15 protocol=icmp
add action=drop chain=icmp comment="Drop Information Reply (Deprecated)" icmp-options=16 protocol=icmp
add action=drop chain=icmp comment="Drop Address Mask Request (Deprecated)" icmp-options=17 protocol=icmp
add action=drop chain=icmp comment="Drop Address Mask Reply (Deprecated)" icmp-options=18 protocol=icmp
add action=drop chain=icmp comment="Drop Traceroute (Deprecated)" icmp-options=30 protocol=icmp
add action=drop chain=icmp comment="Drop Datagram Conversion Error (Deprecated)" icmp-options=31 protocol=icmp
add action=drop chain=icmp comment="Drop Mobile Host Redirect (Deprecated)" icmp-options=32 protocol=icmp
add action=drop chain=icmp comment="Drop IPv6 Where-Are-You (Deprecated)" icmp-options=33 protocol=icmp
add action=drop chain=icmp comment="Drop IPv6 I-Am-Here (Deprecated)" icmp-options=34 protocol=icmp
add action=drop chain=icmp comment="Drop Mobile Registration Request (Deprecated)" icmp-options=35 protocol=icmp
add action=drop chain=icmp comment="Drop Mobile Registration Reply (Deprecated)" icmp-options=36 protocol=icmp
add action=drop chain=icmp comment="Drop Domain Name Request (Deprecated)" icmp-options=37 protocol=icmp
add action=drop chain=icmp comment="Drop Domain Name Reply (Deprecated)" icmp-options=38 protocol=icmp
add action=drop chain=icmp comment="Drop SKIP (Deprecated)" icmp-options=39 protocol=icmp
#Mangle rules for FastTracking stateless traffic#
/ip firewall mangle
add action=fasttrack-connection chain=prerouting
add action=fasttrack-connection chain=output
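
After pasting the rules above, the per-rule counters give a rough idea of what the RAW table is actually catching. This is a sketch using standard print commands; the counts in the comments are expectations for a setup like the one described here, not measured values.

```
# Packet and byte counters for every RAW rule
/ip firewall raw print stats
# Only the drop rules, to see how much bogon/spoofed traffic is being discarded
/ip firewall raw print stats where action=drop
# With connection tracking disabled, this should stay at 0
/ip firewall connection print count-only
# Overall CPU load, to compare before and after applying the rules
/system resource print
```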

IPv6 Firewall

/ipv6 firewall address-list
#Enter the aggregated public prefixes originating from your AS that you use along with link-local fe80::/10#
#example#
add address=2405:a140::/32 comment="AS Prefix" list=lan_subnets
add address=fe80::/10 comment="Link-local" list=lan_subnets
#Add your BGP peers here, example below#
add address=2400:7000:1::/126 comment="Peering with Transit on VLAN100" list=bgp_peers
#Copy Paste all the following#
add address=::/3 comment="IPv6 invalids" list=not_in_internet
add address=4000::/3 comment="IPv6 invalids" list=not_in_internet
add address=6000::/3 comment="IPv6 invalids" list=not_in_internet
add address=8000::/3 comment="IPv6 invalids" list=not_in_internet
add address=a000::/3 comment="IPv6 invalids" list=not_in_internet
add address=c000::/3 comment="IPv6 invalids" list=not_in_internet
add address=e000::/4 comment="IPv6 invalids" list=not_in_internet
add address=f000::/5 comment="IPv6 invalids" list=not_in_internet
add address=f800::/6 comment="IPv6 invalids" list=not_in_internet
add address=fc00::/7 comment="IPv6 invalids" list=not_in_internet
add address=fe00::/9 comment="IPv6 invalids" list=not_in_internet
add address=fec0::/10 comment="IPv6 invalids" list=not_in_internet
add address=2001::/23 comment="IPv6 invalids" list=not_in_internet
add address=2001:2::/48 comment="IPv6 invalids" list=not_in_internet
add address=2001:10::/28 comment="IPv6 invalids" list=not_in_internet
add address=2001:db8::/32 comment="IPv6 invalids" list=not_in_internet
add address=2002::/16 comment="IPv6 invalids" list=not_in_internet
add address=3ffe::/16 comment="IPv6 invalids" list=not_in_internet
add address=2000::/3 list="global_unicast_prefix(es)"
add address=fe80::/10 list=allowed
add address=ff02::/16 comment="multicast" list=allowed
add address=fe80::/10 comment="defconf: RFC6890 Linked-Scoped Unicast" list=no_forward_ipv6
add address=ff00::/8 comment="defconf: multicast" list=no_forward_ipv6
add address=::1/128 comment="defconf: lo" list=bad_ipv6
add address=::ffff:0:0/96 comment="defconf: ipv4-mapped" list=bad_ipv6
add address=::/96 comment="defconf: ipv4 compat" list=bad_ipv6
add address=2001:db8::/32 comment="defconf: documentation" list=bad_ipv6
add address=2001:10::/28 comment="defconf: ORCHID" list=bad_ipv6
add address=2001::/23 comment="defconf: RFC6890" list=bad_ipv6
add address=::/128 comment="defconf: unspecified" list=bad_dst_ipv6
add address=::/128 comment="RAW Filtering" list=bad_src_ipv6
add address=ff00::/8 comment="RAW Filtering" list=bad_src_ipv6
/ipv6 firewall raw
#New rule to drop deprecated header type 0 & 43#
#Works only on ROS v7.4 onwards#
add action=drop chain=prerouting comment="Drop packets with extension header types 0, 43 at network border" headers=hop,route:contains
add action=accept chain=prerouting comment="defconf: RFC4291, section 2.7.1" dst-address=ff02::1:ff00:0/104 icmp-options=135:0-255 protocol=icmpv6 src-address=::/128
#Migrated this rule from the forward chain in BNG to drop these packets on the network edge#
add action=drop chain=prerouting comment="defconf: rfc4890 drop hop-limit=1" hop-limit=equal:1 in-interface-list=!LAN protocol=icmpv6
add action=drop chain=prerouting comment="drop port 25 to prevent spam" port=25 protocol=tcp
add action=drop chain=prerouting comment="drop port 25 to prevent spam" port=25 protocol=udp
#This is required for traffic whereby the SRC may be Link-local and the DST is GUA for BGP peers, particularly in IXPs#
add action=accept chain=prerouting comment="Accept all ICMPv6 traffic from BGP peers (Required for LL<>GUA packets)" icmp-options=!154:4-5 in-interface-list=WAN protocol=icmpv6 src-address-list=bgp_peers
add action=drop chain=prerouting comment="Drop invalids from WAN" dst-address-list="global_unicast_prefix(es)" in-interface-list=WAN src-address-list=not_in_internet
add action=drop chain=prerouting comment="Drop forwarded invalids from WAN" dst-address-list=not_in_internet in-interface-list=WAN src-address-list="global_unicast_prefix(es)"
add action=drop chain=prerouting comment="Drop invalids from LAN" dst-address-list="global_unicast_prefix(es)" in-interface-list=LAN src-address-list=not_in_internet
add action=drop chain=prerouting comment="Drop forwarded invalids from LAN" dst-address-list=not_in_internet in-interface-list=LAN src-address-list=lan_subnets
add action=accept chain=prerouting comment="defconf: enable for transparent firewall" disabled=yes
#Drop anything from your network going towards the public internet if the source address does not match your allocated pools#
add action=drop chain=prerouting comment="Drop spoofed traffic from LAN going towards Global Unicast" dst-address-list="global_unicast_prefix(es)" in-interface-list=LAN src-address-list=!lan_subnets
add action=drop chain=prerouting comment="defconf: drop bogon IP's" src-address-list=bad_ipv6
add action=drop chain=prerouting comment="defconf: drop bogon IP's" dst-address-list=bad_ipv6
add action=drop chain=prerouting comment="defconf: drop packets with bad src ipv6" src-address-list=bad_src_ipv6
add action=drop chain=prerouting comment="defconf: drop packets with bad dst ipv6" dst-address-list=bad_dst_ipv6
add action=accept chain=prerouting comment="defconf: accept local multicast scope" dst-address=ff02::/16
add action=drop chain=prerouting comment="defconf: drop other multicast destinations" dst-address=ff00::/8
add action=drop chain=prerouting comment="defconf: drop bad UDP" port=0 protocol=udp
add action=drop chain=prerouting comment="defconf: drop bad TCP" port=0 protocol=tcp
add action=accept chain=prerouting comment="defconf: accept UDP traceroute" dst-address-type=local port=33434-33534 protocol=udp
add action=jump chain=prerouting comment="defconf: jump to ICMP chain" jump-target=icmpv6 protocol=icmpv6
add action=accept chain=prerouting comment="defconf: accept everything else from WAN" in-interface-list=WAN
add action=accept chain=prerouting comment="defconf: accept everything else from LAN" in-interface-list=LAN
add action=drop chain=prerouting comment="defconf: drop the rest"
add action=drop chain=icmpv6 comment="Drop FMIPv6 HI + FMIPv6 HAck - Deprecated (RFC5568)" icmp-options=154:4-5 protocol=icmpv6
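
Both the IPv4 and IPv6 rule sets above match on the WAN and LAN interface lists, so those lists have to exist and contain the right interfaces before the rules do anything useful. If they are not already defined on the router, a minimal sketch would be the following. The interface names are placeholders and must be replaced with your actual upstream-facing and downstream-facing interfaces.

```
/interface list
add name=WAN comment="Upstream/transit/IXP facing"
add name=LAN comment="Downstream facing (towards BNGs)"
/interface list member
# sfp-sfpplus1 and sfp-sfpplus2 are placeholder interface names
add interface=sfp-sfpplus1 list=WAN
add interface=sfp-sfpplus2 list=LAN
```

An interface that is in neither list will hit the final "drop the rest" rule, so it is worth double-checking list membership before enabling the rules remotely.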

Firewall Explanation

I will keep this concise. As stated earlier, I suggest you study how iptables functions in general, and study the packet flow, to understand which rule does what. With that said, here is a breakdown into simpler points:

  • I used this and this as the source for building the base for the firewall
    • MikroTik has taken care to conform to the relevant RFCs and not to break any legitimate protocol/traffic
  • IPv6 firewall rules are trickier and more complex, but rest assured that the rules in this article do not break any protocol/standard nor do they impact customer’s end-to-end reachability
  • We are dropping spoofed traffic
    • The RAW rules drop anything coming from WAN that’s spoofed (RFC 6890 addresses)
    • The RAW rules drop anything coming from LAN that does not match your public prefixes/internal subnets (aka the lan_subnets address list), so spoofed traffic is dropped before it can exit your network
    • Here’s an APNIC blog post detailing more on this subject
  • Next, we are dropping bad traffic such as TCP/UDP port 0 or bad TCP flags
  • The filter rules are pretty self-explanatory

Strange Anomalies

These are some strange behaviours that I could not explain. If you have further information, please reach out to me.

  1. NAT Leak
    • For example, let’s say we CGNAT 100.64.0.0/24 to customers with 103.176.189.0/25. Now, it’s common sense that anything WAN bound will have a source IP belonging to the /25 on the other end of the NAT. But nope, this isn’t always the case. What I have observed is, sometimes (meaning all the time if you have thousands of customers), the source IP would be the CGNAT subnet and the destination IP would be public, hence it “escapes” from the NAT engine.
    • This behaviour is NOT exclusive to MikroTik. I have observed the same thing on Ubuntu 20.04/Debian based distros, where the source IP is the NAT subnet and it escapes to the WAN interface with the destination IP being real-live public IPs
      • Solution: We simply use the Edge Router to drop anything coming from the BNG that’s not public. This is already taken care of in my configuration above; you just need to follow the instructions
    • I have been unable to find documentation or bug reports on this behaviour
  2. Netmap vs Src Nat
    • Publicly available documentation suggests simple definitions for both
      Src NAT = 1:Many binding
      Netmap = 1:1 binding
      • But for whatever reason, when using src NAT as the action for a public prefix, it keeps on changing the “NATted” public IP and hence the source IP on the WAN for the customers. This results in traffic breaking or in DDoS protection being triggered on sites such as those protected by Cloudflare
      • And for whatever reason, even though Netmap is meant for 1:1, it works for 1:Many bindings and it does not result in the constant changing of source IP for the customers
    • I have not found any technical information on why these behaviours occur or why netmap even works in the first place for 1:Many bindings
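
To make the two anomalies above concrete, here is a simplified sketch using the same example pools (100.64.0.0/24 behind 103.176.189.0/25). It is not the complete CGNAT rule set; the CGNAT section of this guide remains the reference for that. The comments and the use of the WAN/LAN interface lists on each router are assumptions for illustration.

```
# On the BNG: netmap used as a 1:Many binding for the example pools
/ip firewall nat
add action=netmap chain=srcnat src-address=100.64.0.0/24 to-addresses=103.176.189.0/25 out-interface-list=WAN comment="CGNAT via netmap (sketch)"
# On the edge router: optional rule with its own counter for leaked CGNAT sources.
# Move it above the generic "drop local if not from default IP range" rule,
# otherwise that rule will match first and this counter will stay at 0.
/ip firewall raw
add action=drop chain=prerouting in-interface-list=LAN src-address=100.64.0.0/10 comment="Count and drop leaked CGNAT sources"
```

A dedicated counter like this changes nothing functionally, since the generic rule already drops the same packets, but it makes it easy to see how often the NAT leak actually happens on a given network.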
Published in ISP, Networking

69 Comments

  1. Rupam Kumar Sharma

    Such a detailed address of the issues that is so important while in implementation…

    Thanks

  2. Hi Daryll, well done!!! It was the key to fixing a problem I had been trying to fix for a customer (ISP).
    I would like you to read about my problem: https://forum.mikrotik.com/viewtopic.php?f=2&t=176378
    I repeat, that problem is fixed now, thanks to you!

    I used the following command from your article (with a few modifications):
    /ip firewall nat
    add action=netmap chain=srcnat comment="NETMAP PPPoE" out-interface=sfp1-Internet src-address-list=Clientes_NAT to-addresses=PUBLIC/32

    I don’t understand the difference between using “srcnat action masquerade” (which wasn’t working) and using “Netmap” (which for sure ran perfectly fine from the first moment I applied it). I want to learn/understand why this way works.

    Thanks a lot.
    Regards from Argentina

    • I hope this helps you.

      Just note that you are missing parameters in your rule, re-check from my article again.

    • For the NAT Leak issue, it mostly looks like speculation on that thread in my opinion. No factual/documented information yet, but I’ll keep an eye on it.

  3. Stefan Müller

    Unfortunately, that is the case. When the conclusions are a bit more solid I will email support anyway; maybe they can resolve the mystery

  4. Good morning,
    could you share what was updated?
    thx 🙂

    • Well, it depends on when you last visited the site? 🙂

      Added:
      QoS/Bandwidth management suggestion, IPv6 for BNGs with PPPoE, IPv6 tweaks, IPv6 firewalling for both Edge and BNGs, slightly tweaked the IPv4 firewall rules for both, MTU section is finalised, CGNAT section is finalised. That’s about it I think.

  5. that is true :), it was the end of July.
    I received a notification from WordPress yesterday that a new post was added.
    As there was not any, I guessed it was due to the update of the blog.
    I don’t know if notifications are sent as well when the blog is updated

  6. Jeff

    First off, thank you so much for this! Im currently in the middle of a major network upgrade for our ISP and this post has been absolute gold. Ive learned a ton.

    Anyways, I have a quick question about the Firewall DDoS protection jump chain.

    add action=jump chain=forward comment="Jump to DDoS detection" connection-state=new in-interface-list=WAN jump-target=detect-ddos

    Why is it only on the forward chain and not the input chain as well? The Mikrotik help page has it this way as well (forward chain only) but Ive been running it on the input and forward chains for a while now. I havent had any issues but im curious if there is there any particular reason why you do not have DDoS on input chain?

    • Input chain means the router itself:
      1. There’s nothing it will do when the DDoS traffic is hitting the router, the link will still choke. You need to have proper DDoS mitigation from/with your upstream

      2. I’ve tested it for fun on the input chain and it ended up breaking traffic that’s destined towards the router itself such as DNS lookups.

      Hence there’s no point in applying those rules in the input chain. They are in the forward chain in order to protect your downstream users.

  7. Jeff

    Hey Daryll, got another question for you.

    I noticed that when disabling connection tracking on the Edge Router, the Mikrotik puts an auto RAW rule with action=no track for the prerouting and output at the very bottom of the RAW rule table.

    But with the prerouting action = no track being at the bottom, no packets are hitting that rule. So I assume the firewall is still tracking connections as they pass the RAW rules above it.

    The Mikrotik will not allow me to drag that rule to the top, but I can drag my other RAW rules below it and I can then see packets hitting that prerouting rule. But when I reboot the router, it places those rules back at the bottom again?

    Are you seeing the same thing? Or are your prerouting action= no track rules stay at the top? Im on long-term v6.48.6 btw

    • I believe MikroTik does conn_track disable by the means of no tracking via the raw table, but more likely than not, what you’re seeing is a bug. As long as the packets are hitting your other raw rules, it isn’t an issue. Check the connection tracking tab, if there’s nothing there, then we know for sure, connection tracking is disabled. And as long as you’re seeing the expected routing performance throughput, we can also safely assume connection tracking was disabled.

  8. Jeff

    So I just tried deleting all the rules and making sure I disable the connection tracking before input RAW rule set again. The Mikrotik still places those no track rules at the end of the rule set after reboot again. Is this a bug or is this a misunderstanding on my part?

    • I just tested it out on my personal router running v7.1.1 as I wrote this, I’m unable to replicate what you saw after rebooting. I’m 100% sure it’s a bug.

      I’d suggest a netinstall once with the latest long-term and ensure /system routerboard firmware is also running the latest long-term (rebooted twice for it to work).

  9. Jeff

    No I definitely have connections there. The reason I rebooted my router was to see if those connections would disappear after moving the rules. Ill submit bug to Mikrotik support Thanks for the clarification

  10. Jeff

    Yea, its a 6.48.6 bug. Im seeing it on 4 of my CCR1036’s and I duplicated it on my home RB4011. Ill be submitting this to support. Thanks again!

  11. Jeff

    Question: If you disable connection tracking, is there any real difference between a Filter rule vs a RAW rule? I get wanting to keep all FW rules to an absolute minimum, but if connection tracking is disabled, then from a performance perspective it would be the same, other than a filter rule giving you more flexibility in terms of being able to block at the input or forward chain, whereas a RAW rule is more generic.

    For example, if I still wanted to keep an ACL whitelist for input chain to router for security reasons, my rule would look like this.

    /ip firewall filter
    add action=drop chain=input comment="Drop ALL except from TRUSTED" src-address-list=!TRUSTED

    • First, a caveat, the filter table cannot work without connection tracking, it is by definition stateful and hence needs state tracking.

      Edge/Border routers are not supposed to have connection tracking enabled. They are routers meant to route and forward traffic inter-AS as seamlessly as possible, not to filter or act as a firewall. The most we can do on the edge of a network is drop bogon traffic, as we know for a fact that it should never enter a network to begin with and serves no functional purpose. (I will update the IPv6 raw table to drop some headers on the edge as per 2022 practices that serve no functional purpose as well)

      If you enable conn_track on the edge, the performance impact will be visible downstream to the customers and your eBGP router will just randomly reboot once customers saturate the conn_track table. On top of that, you’d be creating a butterfly effect of ugly NAT keep-alive or just keep-alive traffic to now choke not only the BNGs but also the eBGP routers and impact performance even further.

      The raw table gives us the ability to still firewall without the performance impact of stateful tracking. In other words, it’s stateless firewalling.

      So if you want to ACL access to the router, you can still use RAW like:
      /ip firewall raw
      add action=drop chain=prerouting comment="Drop ALL except from TRUSTED" src-address-list=!TRUSTED dst-address-list=[Your list containing IPs of the interface/router etc]

      However, although this is the best option available on MikroTik, it is not the currently accepted standard or best practice, as the world moves to eBPF/XDP (while MikroTik has been playing catch-up for the last 10 years):
      https://blog.cloudflare.com/how-to-drop-10-million-packets/

      You can also find in the above article some data that shows no-track (conn_track disabled) outperforms conn_track enabled.

      At the end of the day, if a high-performance Edge/Border router is what a network needs, it’s certainly something MikroTik cannot deliver at this point in time.

  12. JJT

    Under your IPv6 raw rules, is there supposed to be a !lan_subnets drop rule for Edge Router? I dont see it.

    • Create it if I missed it on the rules. Drop anything that’s not the public prefixes allocated to your network.

      Edit: fe80::/10 should also be a member of lan_subnets to avoid breaking link-local.

  13. Jeff

    Thats the answer I was looking for. Thank you!

    • I’ve tweaked the !lan raw IPv6 rules. Now it makes more sense and removes the need for forward chain rule on BNG and simplifies it for the edge router.

      However, should IANA ever make changes to the IPv6 blocks, you’d need to update this manually.

      /ipv6 firewall raw
      add action=drop chain=prerouting comment="Drop spoofed traffic from LAN going towards Global Unicast" dst-address=2000::/3 in-interface-list=LAN src-address-list=!lan_subnets

  14. Jeff

    Ahhhh….so to reinforce your point….and regarding that earlier bug we found, I noticed that those FW Raw ‘no track’ rules disappear inside the Raw table when I disabled my lone FW input rule with connection tracking disabled. All makes sense now.

  15. Jeff

    I see you have updates but its difficult to know what they are and where. Would it be possible to give some kind of changelog and/or highlight improvements/changes you have made?

    • I would need a systematic approach or maybe some WP plugin that can do the job. Do you know any? Writing a manual changelog for documentation this big is too much of a tedious task really.

  16. Riktam Basak

    Can you suggest a budget RADIUS server other than Radius Manager?

  17. Jeff

    Yea, I hear ya, just throwing the idea out there. The bold explanations do help quite a bit. The BGP Optimization section is a nice addition. I learn more each time I go through it.

    • Help spread the word, and share this article with other network operators and engineers, it benefits the ecosystem if everyone deployed best practices end-to-end.

      If you know somebody who can convert this guide into a Cisco and Juniper equivalent, that’d be great too.

  18. Jeff

    Absolutely! Myself and another poster (who introduced me to this blog) on r/mikrotik on Reddit as well as the Mikrotik forums take every chance we get to share this with others. Keep up the excellent work. Its very much appreciated!

    BTW, you have some minor typos you might wanna fix when you get a chance:

    The address list “global_unicast_prefix(es)” in your IPv6 raw rule doesnt paste properly in terminal

    add action=drop chain=prerouting comment=”Drop invalids from WAN” dst-address-list=global_unicast_prefix(es) in-interface-list=WAN src-address-list=not_in_internet

    Had to drop the parenthesis inside the address list name to get it to paste in terminally correctly like this “global_unicast_prefixes”

    • MikroTik bug. I think I need to add quotations. I’ll fix it.

  19. Jeff

    And one last thing:

    I see you’re doing away with the IPv4 ICMP raw filtering. Do you no longer see a benefit to filtering by ICMP types? Also, I do not see any further ICMP accept rules. Is that somehow accepted in the implied “accept everything else from LAN/WAN” rules?

    • Yes. The kernel rate-limits ICMP/ICMPv6 by default anyway, hence those rules are redundant and a waste of CPU. All ICMP/ICMPv6 is accepted; let the kernel handle rate limiting.

      Don’t miss the new RFC6890 section 🙂

  20. Jeff

    Yea I see the RFC6890 blackhole section, I think that part is awesome. I was doing that with my public subnet but using it for the RFC6890 is an excellent idea.

    In regards to ICMP, I get the rate limits but what about the allowed ICMP types? Arent there some deprecated and malicious ICMP types that should not be allowed? Or I guess in this case, you only allow specific ICMP types?

    • Yeah, I will edit the RFC6890 section and say “Inject these rules into any router/L3 switch that has a routing table” – Makes perfect sense if you really think about it.

      There are deprecated ICMP types, yes, but I haven’t seen any hard evidence of them doing any damage if they aren’t blocked, and even if somehow they could, again they are rate limited. So why waste CPU power anyway. As long as everything else is properly configured, the network should be secure.

      You’d need ICMP filtering maybe for DoD/DARPA stuff or something, but eh, not at ISP level in my opinion. I’ve removed all ICMP filtering in my own network and my home routers as well.

  21. Jeff Jeff

    Previously, I couldn’t believe how much garbage this rule was collecting.

    #This rule should be redundant as we are now routing RFC6890 to blackhole directly and hence I am commenting it out#
    #add action=drop chain=forward comment="Drop tries to reach not public addresses from LAN" dst-address-list=not_in_internet in-interface-list=LAN out-interface-list=WAN#

    It was on every single router, regardless of the type of network: there was always tons of garbage. And I couldn’t believe there was that much of it, everywhere. My guess is random misconfigurations and/or crap device code.

    I then implemented the blackhole routes and WOW... it’s mostly all gone now.

    And I say “mostly” because there’s one small caveat I noticed. One of my sites has a failover which is a double NAT through another provider. With the failover WAN IP being in the 192.168.x.x subnet, bandit packets are still hitting the above rule. Which makes sense if you think about it. It’s a minor issue, not that big a deal, especially on a site of its kind. But I thought it was worth mentioning.

    • It’s not just misconfiguration. For unknown reasons, my personal Windows 11, Debian-based, iOS and macOS devices all originate such packets. I never found an explanation.

      That’s expected behaviour in your specific site:
      A more specific route is always preferred over a less specific one. I’d leave it be; not much harm could come of it there.
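
      To illustrate the point, here is a minimal Python sketch of longest-prefix matching. The routing table, the 192.168.1.0/24 failover subnet and the next-hop labels are hypothetical, chosen only to mirror the double-NAT site described above; this is not RouterOS code.

```python
import ipaddress

def best_route(destination, routes):
    """Return the next hop of the most specific route covering destination, or None."""
    dst = ipaddress.ip_address(destination)
    matching = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in routes.items()
        if dst in ipaddress.ip_network(prefix)
    ]
    if not matching:
        return None
    # Longest prefix wins: /24 beats /16, which beats /0.
    return max(matching, key=lambda item: item[0].prefixlen)[1]

# Hypothetical table: an RFC6890 blackhole plus a connected failover WAN subnet.
routes = {
    "0.0.0.0/0": "primary-wan",
    "192.168.0.0/16": "blackhole",
    "192.168.1.0/24": "failover-wan",  # assumed subnet of the double-NAT uplink
}

print(best_route("192.168.1.50", routes))
print(best_route("192.168.77.1", routes))
```

      Under these assumptions, traffic to the failover subnet follows the connected /24 rather than the /16 blackhole, while the rest of 192.168.0.0/16 is still discarded.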

  22. Anav Anav

    Hi Daryll. In terms of netmap, the way I understand it in layman’s terms is that if one has a subnet of fixed public IPs being netmapped to a larger group of private IPs, what happens is what I call a slice or jump pattern of assignment. Initially I thought: okay, for a block of 256 public IPs, the first 256 private IPs are assigned to the first public IP. Wrong. It’s the 1st, 257th, 513th, etc. private IPs that get assigned to the same public IP. So it’s fair to say that the same set of private IPs (via slices or jumps) always gets the same public IP. Hope that helps.

    • I already knew that netmap ensures 1:1 mapping, i.e. 1.1.1.1 netmapped to 100.64.0.7 will persistently stay the same until a reboot or similar, which is perfect for P2P/STUN/ICE/WebRTC/TURN. But the question is: why does netmap with a public /24 work with a private /8, for example? The Linux man page suggests it shouldn’t.

      Edit: Wait a minute, this is “Anav” from the MikroTik forums, ain’t it? I’m just going to leave this here.
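
      The “slice” pattern described above can be sketched in Python. This assumes RouterOS netmap behaves like the Linux NETMAP target, which keeps the host bits of the original address and overwrites the network bits with those of the target prefix; that assumption is not confirmed in this thread. The 203.0.113.0/24 pool is a documentation prefix used purely for illustration.

```python
import ipaddress

def netmap(source, to_network):
    """Map source into to_network by keeping only the host bits of source.

    Assumption: mirrors the Linux NETMAP target, i.e.
    new = (source AND hostmask) OR network_address.
    """
    net = ipaddress.ip_network(to_network)
    src = int(ipaddress.ip_address(source))
    mapped = int(net.network_address) | (src & int(net.hostmask))
    return ipaddress.ip_address(mapped)

pool = "203.0.113.0/24"                      # hypothetical public pool
base = int(ipaddress.ip_address("100.64.0.0"))

# The 1st, 257th and 513th private addresses share the same low 8 bits.
for offset in (1, 257, 513):
    private = ipaddress.ip_address(base + offset)
    print(private, "->", netmap(str(private), pool))
```

      If that assumption holds, nothing in the arithmetic depends on the size of the source range, which would explain why a /24 target accepts sources from a much larger private block: every private address with the same low 8 bits lands on the same public address.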

  23. Johan Johan

    Hello
    That’s great work.
    I have a question: what is the real purpose of loose TCP tracking?
    Is it a separate form of tracking from the original connection tracking?

    • Loose tracking = yes means don’t pick up already established connections twice (or more). Saving CPU and resources.

  24. Jeff Jeff

    Question regarding IPv6:

    The biggest reason I have yet to deploy it is MikroTik’s limitation in being able to simultaneously queue both IPv4 and IPv6.

    How are you doing it? Is it easier since most ISPs in India use PPPoE?

    I just ran across the below from one of the big Mikrotik consultants using RADIUS via DHCP.

    https://stubarea51.net/2022/03/30/webinar-deploying-ipv6-for-wisps-and-fisps/

    • With PPPoE, it is easier. But I think you would need to give persistent IPv6 PD assignments (which you should be doing anyways), and then Queue on a per prefix basis where a customer is behind each of them.

      But if you’re going the DHCPv6 route – With Tik, there’s a problem. It can only hand out PD, but not addresses. Which means the customer will receive a /56 or shorter prefixes for LAN, but their WAN (Link prefix) will be null, unless you use a /64 on a per interface basis with SLAAC and configure the CPE to pick it up via SLAAC for WAN. But even then you’ll have a problem. SLAAC in Tik is not managed via RADIUS – So you won’t know which customer was assigned which address and so on.

      I’d suggest talking with stubarea51 consulting firm and let me know if you find a solution. I’ll add it to my guide.

      Matter of fact if you’re already doing DHCPv4, let me know the whole procedure (via emails), I’ll add that to my guide too – Like how did you set up DHCP Option 82, MAC binding, security. Did you use VRFs maybe? To repeat RFC1918 for different VLANs etc?

  25. Steven Steven

    Hi Daryll
    Thanks for this article
    I have an idea about routing loops: what about adding the RFC6890 prefixes to routing rules with a lookup-only table, so that the IP route in that table only sends them to a blackhole?

    • The whole point is to route less specifics to blackhole, which is applicable to both RFC6890 blocks and public pools.

      What is lookup supposed to serve? I don’t see the need to possibly (if I understood you) create a blackhole only table?

  26. Steven Steven

    Here is an example

    /routing table
    add disabled=no fib name=Blackhole
    /routing rule
    add action=lookup-only-in-table disabled=no dst-address=192.168.0.0/16 table=Blackhole
    add action=lookup-only-in-table disabled=no dst-address=10.0.0.0/8 table=Blackhole
    add action=lookup-only-in-table disabled=no dst-address=172.16.0.0/12 table=Blackhole
    add action=lookup-only-in-table disabled=no dst-address=255.255.255.255/32 interface=BNG table=Blackhole
    /ip route
    add blackhole comment=Blackhole disabled=no distance=1 dst-address=0.0.0.0/0 gateway="" pref-src="" routing-table=Blackhole scope=30 suppress-hw-offload=no \
    target-scope=10

    • I don’t see any reason to use a separate table. If anything, this would probably increase CPU usage, as the router now has to evaluate a routing rule for each subnet before the route lookup.

  27. Johannes Johannes

    Hi Daryll,

    First of all: thank you so much for this extensive, well documented piece of work!
    I’m used to building networks with Cisco and Juniper equipment but fell in love with MikroTik gear, built my own little IPv6-only AS with it and am now in the process of optimizing the basics. That’s when I arrived here and rethought my firewalling from the ground up thanks to your great work! So thanks again!

    I do have one question though: Running ROS7 and changed from stateful firewall filter rules to modified versions of your raw filters. While this did the trick to get rid of IPv4 connection tracking, I still have “living” entries in ipv6/firewall/connections – without any filter rules. And there is no ipv6/firewall/connection/tracking submenu to disable it manually. Am I missing something or is it not possible to turn off connection tracking for IPv6?

    Second, more general question: you seem to have updated this blog entry on June 29th. Is there some kind of changelog I could refer to for your changes? 😉 Otherwise I might have to download this entry to a local git repo to stay up to date 🙂

    Thanks again for your terrific work and best regards from Germany,

    Johannes

    • IPv6 connection tracking is automatically disabled on both ROSv6 and ROSv7 if there is nothing in /ipv6 firewall filter or mangle or nat(66). If it’s still tracking, then I suggest you contact MikroTik support, as that sounds like a bug.

      Minor typos were fixed on June 29th. I cannot keep changelogs of such a large documentation, but if you know of any WordPress plugin that can automate the job of a change log, then please by all means, do share, I’ll make use of it.

      I’m glad my guide was of use to your network. Furthermore, I hope you follow all the BCPs and BCOPs for your network to ensure a fully conforming and homogenous network!

  28. Mehhmet van der Loyer Mehhmet van der Loyer

    Are you positive you didn’t get this the wrong way round? Loose tracking = off, i.e. strict tracking enabled, will cause already established connections to not be picked up?
    Can you please expand on this?
    I would expect this setting to correspond to the netfilter loose tracking mode which has less sanity checks around NEW state packets, penultimate FIN packets, etc.

    • Loose tracking enabled = “non-strict” tracking. Loose tracking disabled = strict tracking. I am positive.

      You can find an elaboration here.

  29. while this was very good, sfq does not fix bufferbloat. fq_codel/cake do. the test you used measures fq not queue depth (aqm). try a packet capture on that test.

    • Hi Dave

      A few points to note and consider:
      1. This article dates back to 2021, on RouterOS v6, when fq_codel/CAKE did not exist on MikroTik.
      2. I did not claim that SFQ “fixes” bufferbloat, only that it reduced it.
      3. I do not have the time to test QoS any further. If you have hard data/configuration used on a BNG device serving at least 200 users, feel free to share the config, screenshots, and guidelines. I will consider adding that to the article and credit such a section to your name.

      Furthermore, I’m not a specialist in QoS, so I’m not sure what you mean by “fq not queue depth (aqm)”, but at the time of testing, the device had around 1k customers with at least 1Gig of traffic going in/out.

  30. There was a lot of uptake of fq_codel after it arrived in mikrotik. Very long thread over here about sfq vs cake in particular:

    https://forum.mikrotik.com/viewtopic.php?t=179307#p885613

    As for a guide, I will try to find someone deploying. Basically on cpe we are seeing cake ack-filter bandwidth XMbit diffserv4 on a simple queue on the up, and I don’t really know what is used on the bng side (seeing preseem/libreqos/cambium/

    Apologies for misconstruing your statement. FQ is what the test you used measures; queue depth (managed via AQM) is bufferbloat. FQ bypasses the queue-building flows.

    • Yes, I know MikroTik added fq_codel/CAKE etc in RouterOS v7, but back in 2021, RouterOS v7 was not production-ready and hence was not tested. I lack the time to implement it on my own personal network as well, but hopefully, I’ll get to it eventually.

      Most network operators in APAC that use MikroTik heavily for BNGs just use whatever queuing algorithm the MikroTik defaults come with, so SFQ, PFIFO etc. They do not spend time on in-depth testing and research into what method works best for their network, at least at the BNG level. This was the main reason for SFQ, as I observed it “just works” even if not perfect. There’s a psychological factor at play, details here.

      Ah, you mean, the “bufferbloat” test from DSLReports? Well, I can update that with a more proper web-based bufferbloat test site, sure, but I’ll need the “guide” though (config, data, screenshots, explanation to the reader etc).

      Feel free to email me directly for further discussion or message via my Telegram (both are on the left sidebar at the top).

  31. Mr_Black Mr_Black

    Hi Daryll
    I had an idea about NAT.
    Why do you use not_in_internet for the dst-address-list?
    Isn’t it enough to use the same list as the src-address-list for the destination, like this example?
    /ip firewall nat
    add action=netmap chain=srcnat comment="CGNAT rule" dst-address-list=!cgnat_subnets ipsec-policy=out,none out-interface-list=WAN src-address-list=cgnat_subnets to-addresses=103.176.189.0/25

    • not_in_internet address list contains all the RFC6890 subnets including the CGNAT subnet range in aggregated format.

      cgnat_subnets only contains either supernets or just the /10 subnet.

      That is why we need not_in_internet.
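
      A small Python sketch of the difference follows. The list contents are an illustrative subset (the real not_in_internet list in the guide is longer), and would_nat is a hypothetical helper that only models the dst-address-list=!LIST match, nothing else in the rule.

```python
import ipaddress

# Illustrative subsets only; the guide's actual address lists contain more entries.
NOT_IN_INTERNET = [ipaddress.ip_network(p) for p in (
    "10.0.0.0/8", "100.64.0.0/10", "172.16.0.0/12", "192.168.0.0/16",
)]
CGNAT_SUBNETS = [ipaddress.ip_network("100.64.0.0/10")]

def in_list(address, address_list):
    """True if address falls inside any prefix of the address list."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in address_list)

def would_nat(destination, excluded_list):
    """Model of dst-address-list=!excluded_list: NAT only if dst is NOT in the list."""
    return not in_list(destination, excluded_list)

for dst in ("8.8.8.8", "100.64.5.5", "10.1.2.3"):
    print(dst, would_nat(dst, CGNAT_SUBNETS), would_nat(dst, NOT_IN_INTERNET))
```

      With !cgnat_subnets, a packet destined for 10.1.2.3 would still match the NAT rule, whereas !not_in_internet excludes every non-routable destination in one list.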

  32. Jeff Jeff

    In your Edge Router Firewall section, I notice you’re now fasttracking stateless traffic.

    #Filter rules for FastTracking stateless traffic#
    /ip firewall filter
    add action=fasttrack-connection chain=input
    add action=fasttrack-connection chain=forward
    add action=fasttrack-connection chain=output

    But since connection tracking is disabled on Edge Routers, isn’t all traffic in essence FastTracked by default? It sounds kind of redundant to me, but I’m obviously missing something.

    • FastTrack never works “by default”. FastPath does, under limited conditions, but if you’re using firewall address lists, FastPath is out the window as well.

      For this reason, we manually “FastTrack” the traffic through the rules above. But I recently found it’s more efficient and cleaner to do this in the mangle table. I’ll update the article itself, but here’s the snippet:
      /ip firewall mangle
      add action=fasttrack-connection chain=prerouting
      add action=fasttrack-connection chain=output

  33. Singu Singu

    Hi Daryll. I’ve followed the MTU section of your guide on configuring RFC 4638. I’ve set the MTU and MRU to 1500 in the PPPoE server on the MikroTik, but I can only achieve a maximum MTU of 1492. I’ve already double-checked the MTU of the OLT and the MikroTik: the OLT MTU is set to its maximum value of 2000, and the MikroTik router has an MTU of 1500 and an L2 MTU of 2000. I’m not sure why it is not working. The ONUs are Huawei HG8145V5, so maybe the modem is not capable of RFC 4638.

    • The BNG L3 MTU should be 2000 on the physical port to match the OLT L2/L3 MTU of 2000. On the customer MikroTik router, the L2 MTU should be maxed out, but the L3 MTU will be 2000 to match the OLT. ONUs should be in bridge mode, but some ONUs, even in bridge mode, support neither baby jumbo frames nor RFC4638. Also, if you have switches between the BNG and OLT, they all need proper MTU config. Follow the sample in the MTU section.

      Migrate to DHCPv4/v6 to avoid MTU problems.
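
      As a rough way to reason about where the 1492 comes from, here is a Python sketch of the arithmetic. PPPoE adds 8 bytes per frame (a 6-byte PPPoE header plus a 2-byte PPP protocol field), so a 1500-byte PPP MTU needs every hop to carry at least 1508 bytes and to support RFC 4638. The hop values and the function name are hypothetical.

```python
PPPOE_OVERHEAD = 8        # 6-byte PPPoE header + 2-byte PPP protocol field
CLASSIC_PPPOE_MTU = 1492  # ceiling when RFC 4638 is not negotiated end to end

def achievable_ppp_mtu(hop_mtus, requested=1500, all_support_rfc4638=True):
    """Largest PPP MTU usable across every hop between the BNG and the CPE."""
    limit = min(hop_mtus) - PPPOE_OVERHEAD
    if not all_support_rfc4638:
        limit = min(limit, CLASSIC_PPPOE_MTU)
    return min(requested, limit)

# Hypothetical path: BNG port, OLT, ONU (Ethernet payload MTU per hop).
print(achievable_ppp_mtu([2000, 2000, 2000]))
print(achievable_ppp_mtu([2000, 2000, 1500]))
print(achievable_ppp_mtu([2000, 2000, 2000], all_support_rfc4638=False))
```

      In this model, a single hop limited to 1500 bytes, or a single device that lacks RFC 4638 support, is enough to pull the session back to 1492.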


© Daryll Swer and daryllswer.com, 2023. Unauthorized use and/or duplication of this material without express and written permission from this site’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Daryll Swer and daryllswer.com with appropriate and specific direction to the original content.