This post was co-authored by Verisign Distinguished Engineer Mike Hollyman and Verisign Director – Engineering Hasan Siddique. It is based on a lightning talk they gave at NANOG 87 in February 2023, the slides from which are available on the NANOG website.
At Verisign, we believe that continuous improvements to the safety and security of the global routing system are critical for the reliability of the Internet. As such, we’ve recently begun implementing Resource Public Key Infrastructure (RPKI) within our technology ecosystem as a step toward building a more secure routing system. In this post, we share our ongoing journey toward RPKI adoption and the lessons we’ve learned as an operator of critical Internet infrastructure.
While RPKI is not a silver bullet for securing Internet routing, the practical adoption of RPKI can deliver significant benefits. This will be a journey of deliberate, measured, and incremental steps toward a larger goal, but we believe the end result will be more than worth it.
Why RPKI and Why Now?
Under the Border Gateway Protocol (BGP), the Internet’s de facto inter-domain routing protocol for the last three decades, local routing policies decide where and how Internet traffic flows. Each network independently applies its own policies on what actions it takes, if any, with data that transits its network. For years, ‘routing by rumor’ served the Internet well. However, our growing dependence on the global Internet for sensitive and critical communications means that Internet infrastructure merits a more robust approach to protecting routing information. Preventing route leaks, mis-originations, and hijacks is a first step.
Verisign was one of the first organizations to join the Mutually Agreed Norms for Routing Security (MANRS) Network Operator Program in 2017. Ever since the establishment of the program, facilitating the validation of routing information – via an Internet Routing Registry (IRR) or RPKI – has been one of the key ‘actions’ of the MANRS program. Verisign has always been fully supportive of MANRS and its efforts to promote a culture of collective responsibility, collaboration, and coordination among network peers in the global Internet routing system.
Just as RPKI creates new protections, it also brings new challenges. Mindful of those challenges, but committed to our mission of upholding the security, stability, and resiliency of the Internet, Verisign is heading toward RPKI adoption.
Adopting RPKI ROV and External Dependencies
In his March 2022 blog post ‘Routing Without Rumor: Securing the Internet’s Routing System’, Verisign EVP & CSO, Danny McPherson, discussed how “RPKI creates new external and third-party dependencies that, as adoption continues, ultimately replace the traditionally autonomous operation of the routing system with a more centralized model. If too tightly coupled to the routing system, these dependencies may impact the robustness and resilience of the Internet itself.” Danny’s post also reviewed the importance of securing the global Internet BGP routing system, including using RPKI to help overcome the hurdles that BGP’s implicit trust model presents.
RPKI Route Origin Validation (ROV) is one critical step forward in securing the global BGP system to prevent mis-originations and errors from propagating invalid routing information worldwide. RPKI ROV helps move the needle towards a safer Internet. However, as Danny pointed out, this comes at the expense of creating a new external dependency within the operational path of Verisign’s critical Domain Name System (DNS) services.
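To make the ROV decision concrete, the following is a minimal sketch of the origin validation procedure described in RFC 6811. It is an illustration only, not Verisign's implementation: the `VRP` class and `validate_origin` function are hypothetical names, and the prefixes and AS numbers come from the ranges reserved for documentation. A route is "valid" when at least one covering entry matches both the origin AS and the maximum length, "invalid" when covering entries exist but none match, and "not-found" when nothing covers the prefix.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class VRP:
    """Validated ROA Payload: prefix, maximum length, authorized origin AS."""
    prefix: str
    max_length: int
    asn: int


def validate_origin(route_prefix: str, origin_asn: int, vrps: list[VRP]) -> str:
    """Classify a BGP route as 'valid', 'invalid', or 'not-found'."""
    route = ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ip_network(vrp.prefix)
        # An IPv4 route can only be covered by an IPv4 VRP, and likewise for IPv6.
        if route.version != vrp_net.version:
            continue
        # The VRP covers the route if the route is equal to or more specific than it.
        if not route.subnet_of(vrp_net):
            continue
        covered = True
        # A match needs the same origin AS and a length within maxLength.
        if vrp.asn == origin_asn and route.prefixlen <= vrp.max_length:
            return "valid"
    return "invalid" if covered else "not-found"


# Hypothetical data using documentation prefixes and AS numbers.
vrps = [VRP("192.0.2.0/24", 24, 64496)]
print(validate_origin("192.0.2.0/24", 64496, vrps))     # valid
print(validate_origin("192.0.2.0/24", 64511, vrps))     # invalid (wrong origin AS)
print(validate_origin("192.0.2.0/25", 64496, vrps))     # invalid (longer than maxLength)
print(validate_origin("198.51.100.0/24", 64496, vrps))  # not-found (no covering VRP)
```

What a router does with each state (for example, dropping "invalid" routes) is a local policy decision and is outside the scope of this sketch.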
RPKI Speed Bumps
At NANOG 87, we shared our concerns on how systemic and circular dependencies must be acknowledged and mitigated, to the extent possible. The following are some concerns and potential risks related to RPKI:
- RPKI has yet to reach the operational maturity of related, established routing protocols, such as BGP. BGP has been around for over 30 years, but comparatively, RPKI has been growing in the Internet Engineering Task Force (IETF) Secure Inter-Domain Routing Operations (SIDROPS) working group for only 12 years. Currently, RPKI Unique Prefix-Origin Pairs are seen for just over 40% of the global routing prefixes, and much of that growth has occurred only in the last four years. Additionally, as the RPKI system gains support, we see how it occasionally fails due to a lack of maturity. The good news is that the IETF is actively engaged in making improvements to the system, and it’s rewarding to see the progress being made.
- Every organization deploying RPKI needs to understand the circular dependencies that may arise. For example, publishing a Route Origin Authorization (ROA) in the RPKI system requires the DNS. Additionally, there are over 20 publishing points in the RPKI system today with fully qualified domain names (FQDNs) in the .com and .net top-level domains (TLDs). All five Regional Internet Registries (RIRs) use the .net TLD for their RPKI infrastructure.
- Adopting RPKI means taking on additional, complex responsibilities. Organizations that participate in RPKI inherit additional operational tasks for testing, publishing, and alerting on the RPKI system, and ultimately for operating net-new infrastructure. These 24/7 services are critical, however, when it comes to supporting a system that relates to routing stability.
- Ample resources are required to adequately monitor RPKI deployment. Real-time monitoring should be considered a basic requirement for both internal and external RPKI infrastructure. As such, organizations must allocate technical engineering resources and support services to meet this need.
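One way to start reasoning about the circular dependency described above is to inventory which TLDs your validator's publication points rely on. The sketch below groups repository URIs by the TLD of their hostname. The `hosts_by_tld` function and the example URIs are hypothetical; in practice the list would come from whatever your RPKI validator reports as its configured repositories.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def hosts_by_tld(uris):
    """Group the hostnames found in repository URIs by their top-level domain."""
    grouped = defaultdict(set)
    for uri in uris:
        host = urlsplit(uri).hostname
        if host is None:
            continue  # skip entries that carry no hostname
        host = host.rstrip(".")
        tld = host.rsplit(".", 1)[-1]
        grouped[tld].add(host)
    return {tld: sorted(hosts) for tld, hosts in grouped.items()}


# Hypothetical publication points (rsync and RRDP), not real repositories.
example_uris = [
    "rsync://rpki.example.net/repo/",
    "https://rrdp.example.net/notification.xml",
    "https://rpki.example.com/rrdp/notification.xml",
    "rsync://repo.example.org/ta/",
]

for tld, hosts in sorted(hosts_by_tld(example_uris).items()):
    print(tld, hosts)
```

A report like this does not remove the dependency, but it shows which DNS zones must be resolvable for a validator to refresh its data.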
Additional considerations include:
- the shared-fate dependency, that is, when all prefixes are signed with ROAs
- long-term engineering support
- the operational integration of RPKI systems
- operational experience of the RIRs as they now run critical infrastructure to support RPKI
- overclaiming with the RIR certification authorities
- lack of transparency for operator ROV policies
- inconsistency between open-source RPKI validator development efforts
- the future scale of RPKI
These items require careful consideration before implementing RPKI, not afterward.
To better manage potential risks in our journey toward RPKI adoption, we established ‘day zero’ requirements. These were firm conditions that had to be met before any further testing could occur, including monitoring data across multiple protocols, coupled with automated ROA/IRR provisioning.
The deliberate decision to take a measured approach has proved rewarding, leaving us better positioned to manage and maintain our data and critical RPKI systems.
Investing engineering cycles in building robust monitoring and automation has increased our awareness of trends and outages based on global and local observability. As a result, operations and support teams benefit from live training on how to respond to RPKI-related events. This has helped us improve operational readiness in response to incidents. Additionally, automation reduces the risk of human error and, when coupled with monitoring, introduces stronger guardrails throughout the provisioning process.
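As an illustration of how monitoring can act as a guardrail on automated provisioning, the sketch below compares the ROAs an operator intends to publish against the entries a validator actually observes, and reports any drift. This is not a description of Verisign's tooling: the `Roa` tuple, the `roa_drift` function, and the sample data are assumptions made for the example, and a real check would load both sides from a provisioning system and a validator export.

```python
from typing import NamedTuple


class Roa(NamedTuple):
    """A simplified ROA entry: prefix, maximum length, and origin AS."""
    prefix: str
    max_length: int
    asn: int


def roa_drift(intended, observed):
    """Return ROAs that are missing from, or unexpected in, the observed set."""
    intended_set, observed_set = set(intended), set(observed)
    return {
        "missing": sorted(intended_set - observed_set),
        "unexpected": sorted(observed_set - intended_set),
    }


# Hypothetical data using documentation prefixes and AS numbers.
intended = [
    Roa("192.0.2.0/24", 24, 64496),
    Roa("198.51.100.0/24", 24, 64496),
]
observed = [
    Roa("192.0.2.0/24", 24, 64496),
    Roa("203.0.113.0/24", 24, 64511),
]

drift = roa_drift(intended, observed)
if drift["missing"] or drift["unexpected"]:
    print("ROA drift detected:", drift)
```

Run on a schedule, a comparison like this could raise an alert both when a provisioning step silently fails and when an entry appears that nobody intended to publish.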
Balancing Our Mission with Adopting New Technology
Verisign’s core mission is to enable the world to connect online with reliability and confidence, anytime, anywhere. This means that as we adopt RPKI, we must adhere to strict design principles that won’t risk sacrificing the integrity and availability of DNS data.
Our path to RPKI adoption is just one example of how we continuously strive for improvement and implement new technology, all while ensuring we protect Verisign’s critical DNS services.
While there are obstacles ahead of us, at Verisign, we strongly advocate for consistent, focused discipline and continuous improvement. This means our course is set – we are firmly moving toward RPKI adoption.
Our goal is to improve Internet routing security programs through efforts such as technology implementation, industry engagement, standards development, open-source contributions, funding, and the identification of shared risks that need to be understood and managed appropriately.
Implementing RPKI at your own organization will require broad investment in your people, processes, and technology stack. At Verisign specifically, we have assigned resources to perform research, increased budgets, completed various risk management tasks, and allocated significant time to development and engineering cycles. While RPKI itself does not address all security issues, there are incremental steps we can collectively take toward building a more resilient Internet routing security paradigm.
As stewards of the Internet, we are implementing RPKI as the next step in strengthening the security of Internet routing information. We look forward to sharing updates on our progress.
Mike Hollyman is a Distinguished Engineer at Verisign.
Adapted from the original post which appeared on the Verisign Blog.