|Summary||Description of the techniques used to pinpoint delay and forwarding anomalies in traceroute results.|
Understanding data plane health is essential to improving Internet reliability and usability. For instance, detecting disruptions in peer and provider networks can identify repairable connectivity problems. Currently this task is time-consuming and involves a fair amount of manual observation, because an operator has poor visibility beyond their network’s border. In this paper we leverage existing public RIPE Atlas measurement data to monitor and analyze network conditions, creating no new measurements. We demonstrate a set of complementary methods to detect network disruptions using traceroute measurements and to report problems in near real time. A novel method of detecting changes in delay is used to identify congested links, and a packet forwarding model is employed to predict traffic paths and to identify faulty routers and links in cases of packet loss. In addition, aggregating results from each method allows us to easily monitor a network and correlate related reports of significant network disruptions, reducing uninteresting alarms. Our contributions consist of a statistical approach to providing robust estimation of Internet delays and the study of hundreds of thousands of link delays. We present three cases demonstrating that the proposed methods detect real disruptions and provide valuable insights, as well as surprising findings, on the location and impact of the identified events.
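To make the idea of robust delay estimation concrete, the following is a minimal sketch and not the method of the paper itself: the summary above does not specify the estimator, so the sketch assumes one common robust choice, the median with a confidence interval derived from a Wilson score interval on the median's rank. The function names `median_with_ci` and `delay_changed`, the sample values, and the `z=1.96` default are all hypothetical. A change is flagged only when the confidence intervals of a reference window and the current window do not overlap.

```python
import math
import statistics


def median_with_ci(samples, z=1.96):
    """Return (lower, median, upper) for a list of delay samples.

    Assumption: the bounds are order statistics whose ranks come from a
    Wilson score interval around the proportion 0.5.
    """
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    p = 0.5
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Convert the proportion bounds into indices of the sorted samples.
    lo_rank = max(0, math.floor((center - half) * n))
    hi_rank = min(n - 1, math.ceil((center + half) * n))
    return xs[lo_rank], statistics.median(xs), xs[hi_rank]


def delay_changed(reference, current, z=1.96):
    """Flag a change when the two median confidence intervals do not overlap."""
    ref_lo, _, ref_hi = median_with_ci(reference, z)
    cur_lo, _, cur_hi = median_with_ci(current, z)
    return cur_lo > ref_hi or cur_hi < ref_lo


# Hypothetical delay samples in milliseconds; the reference has one outlier.
reference = [10.0, 10.2, 9.8, 10.1, 9.9, 10.3, 9.7, 10.0, 10.1, 9.9, 55.0]
current = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 20.4, 19.7, 20.1, 20.0, 19.9]

print(median_with_ci(reference))
print(delay_changed(reference, current))
```

Because the estimate rests on order statistics rather than the mean, the single 55 ms outlier in the reference window does not move the interval, which is the property that makes this kind of estimator attractive for noisy traceroute delays.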