DatCat℠ Internet Measurement Data Catalog


Collection: CAIDA Code-Red Worm Dataset
non-sensitive summaries on worm spread


Collection Details

Summary: Information useful for studying the spread of the Code-Red worms, as observed by the UCSD Network Telescope in 2001, including infection start and end times, infection durations, latitude, longitude, Autonomous System (AS) and country locations for infected computers. The dataset consists of 2 parts: a July dataset, which covers July 19-20 and an August dataset, which covers July 30 to August 19. Possible uses include modeling and visualization of worm propagation. Statistics: 359,104 infected IP addresses in the July dataset and 4,478,473 infected IP addresses in the August dataset.
Motivation: To provide a set of data useful for studying the Code-Red worms. The data does not contain sensitive information and therefore can be made publicly available.
Data Start Time: 2001-07-19 00:01:12.242 UTC (+0000)
Data End Time: 2001-08-19 06:00:01.354 UTC (+0000)
Data Duration: 31 days 05:58:49.112 (2699929.112 s)
Creators: CAIDA Network Telescope Project - Code-Red
Primary contact: (none)
Keywords: background radiation, blackhole address space, CAIDA, Code-Red, Code-Redv2, CodeRed, CodeRedII, CodeRedv2, darknet, Internet worm, network telescope, passive, security, summary, worm
Used in publications: (none)
Member of collections: (none)
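
The Data Duration field follows directly from the Data Start Time and Data End Time. As a minimal check, the Python sketch below recomputes it from the two timestamps listed in this record; the variable names are illustrative only.

```python
from datetime import datetime, timezone

# Timestamps copied from the Data Start Time and Data End Time fields (UTC).
start = datetime(2001, 7, 19, 0, 1, 12, 242000, tzinfo=timezone.utc)
end = datetime(2001, 8, 19, 6, 0, 1, 354000, tzinfo=timezone.utc)

# Subtracting two aware datetimes yields a timedelta.
duration = end - start
duration_seconds = duration.total_seconds()

# Whole days plus the remaining part of a day, and the total in seconds.
print(duration.days, "days,", duration_seconds, "seconds in total")
```

The whole-day count and the total in seconds should agree with the Data Duration field.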
Description
This dataset contains information useful for studying the spread of the Code-Red version 2 and CodeRedII worms. The dataset consists of a publicly available set of files that contain summarized information that does not individually identify infected computers.

The first Code-Red worm (CRv1) began to infect hosts running unpatched versions of Microsoft's IIS webserver on July 12th, 2001. This version of the worm used a static seed for its random number generator. Around 10:00 UTC July 19th, a random seed variant (CRv2) appeared and spread. On August 4th, a new worm began to infect hosts by exploiting the same vulnerability as the original Code-Red worm. Although the new worm shared no code with the first, it contained in its source code the string "CodeRedII" and was thus named CodeRed II.

Caveats that apply to this data:

  • The .ida vulnerability utilized by the Code-Red worms was exploited via TCP connections to port 80. Because the UCSD Network Telescope did not respond to connection attempts, it received no payload that could be used to positively identify worm packets. All TCP SYN packets to port 80 received are included in these summaries, including non-worm traffic.
  • The DHCP Effect significantly impacts this dataset, particularly after the first 24 hours of each cycle of worm spread. Computers with changing IP addresses cause an order of magnitude difference between the number of IP addresses active in any 2-hour period and the number of IP addresses active in a week. This dataset does not include IP addresses, so keep in mind that a start/end time or duration does not necessarily uniquely identify an infected computer. It identifies only a newly active IP address, with no information about whether that IP address represents a computer previously known to be infected.
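
To illustrate the DHCP Effect described in the caveats above, here is a small Python sketch of a toy model. It is not derived from the dataset: the number of hosts, the 12-hour lease length, and the address labels are all assumptions chosen only to show how a fixed set of computers can appear as many more IP addresses over a week than in a short window.

```python
# Toy model (all values hypothetical, not taken from the dataset):
# every host keeps an address for LEASE_HOURS, then receives a new one.
HOSTS = 3
LEASE_HOURS = 12
WEEK_HOURS = 7 * 24


def address_of(host: int, hour: int) -> str:
    """Return a made-up address label for a host at a given hour."""
    lease_index = hour // LEASE_HOURS
    return f"host{host}-lease{lease_index}"


def distinct_addresses(first_hour: int, last_hour: int) -> set:
    """Collect every address seen in the half-open range [first_hour, last_hour)."""
    return {
        address_of(host, hour)
        for host in range(HOSTS)
        for hour in range(first_hour, last_hour)
    }


two_hour_count = len(distinct_addresses(0, 2))
week_count = len(distinct_addresses(0, WEEK_HOURS))

print("addresses active in a 2-hour window:", two_hour_count)
print("addresses active over a week:", week_count)
```

In this toy model the same three computers account for every address in both counts, which is why a count of newly active IP addresses over a long window should not be read as a count of infected computers.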
CAIDA Code-Red data is divided into a July dataset and an August dataset, which differ in the sources of data used and in which worms they contain.

Data included in these datasets:

  • Start and end time distributions: the cumulative distribution function (cdf) of the absolute/relative start/end times of each infected computer.
  • Duration distribution: the cumulative distribution function (cdf) of the infection duration of computers.
  • Country distribution: country location of infected computers, as determined by Digital Envoy's NetAcuity service.
  • A summary table with start and end time, top-level domain, country, latitude, longitude, and AS number, for each monitored IP address. The table of the July dataset also has AS names.
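
As a rough sketch of how a duration distribution like the one listed above could be derived from per-address start and end times, the Python example below builds an empirical cdf. The records, the timestamp format, and the function names are hypothetical; the actual file layout of the dataset is not described in this record.

```python
from datetime import datetime

# Hypothetical (start, end) pairs; the real summary table layout may differ.
records = [
    ("2001-07-19 10:05:00", "2001-07-19 12:05:00"),
    ("2001-07-19 11:00:00", "2001-07-19 11:30:00"),
    ("2001-07-19 13:00:00", "2001-07-20 01:00:00"),
    ("2001-07-19 14:00:00", "2001-07-19 16:00:00"),
]
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format, for illustration only


def duration_seconds(start: str, end: str) -> float:
    """Infection duration in seconds for one record."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds()


def empirical_cdf(values):
    """Return sorted (value, fraction of samples <= value) pairs."""
    ordered = sorted(values)
    total = len(ordered)
    fractions = {}
    for rank, value in enumerate(ordered, start=1):
        # For tied values the later (larger) rank overwrites the earlier one.
        fractions[value] = rank / total
    return sorted(fractions.items())


durations = [duration_seconds(start, end) for start, end in records]
cdf = empirical_cdf(durations)

for value, fraction in cdf:
    print(f"{value:>8.0f} s  {fraction:.2f}")
```

The same function applies to start or end times once they are converted to numbers, for example seconds since the first observation for the relative distributions.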

Data Use Restrictions

The data cannot be redistributed. Every six months, a summary of research findings must be reported back to CAIDA. A copy of all publications (including presentations) must be sent to CAIDA. The following citation must be used in publications:
  • Documents (including web pages and papers) must include this citation:
    The CAIDA Dataset on the Code-Red Worms - July and August 2001,
    David Moore and Colleen Shannon,
    http://www.caida.org/data/passive/codered_worms_dataset.xml.
  • Presentations: All users who create a publicly available presentation using data from this dataset must provide CAIDA with a copy of the publication and must use the full name of the dataset in the presentation:
    The CAIDA Dataset on the Code-Red Worms
For further restrictions and recommendations, see the Code-Red Worms dataset page.
For more information: http://www.caida.org/data/passive/codered_worms_dataset.xml

Citation

Please use the following BibTeX citation to cite this collection. Some parts are optional or may need to be edited. To use the \url{...} command for nice URL formatting, you must call \usepackage{url} in the LaTeX preamble.

@MISC{/collection/1-001P-M=CAIDA+Code-Red+Worm+Dataset,
  title = "{CAIDA Code-Red Worm Dataset (collection)}",
  author = "{CAIDA Network Telescope Project - Code-Red}",
  note = "\url{http://imdc.datcat.org/collection/1-001P-M=CAIDA+Code-Red+Worm+Dataset} (accessed on 2010-02-09)",
  abstract = "non-sensitive summaries on worm spread"
}

Record Details

Handle: imdc.datcat.org/collection/1-001P-M=CAIDA+Code-Red+Worm+Dataset
Contributor: CAIDA Automated Data Contributor
Contributed: 2006-05-31 20:35:36.915 UTC (+0000)
Last Modified: 2006-05-31 20:36:24.272 UTC (+0000)

Page generated at 2010-02-09 04:04:20 UTC
CAIDA Cooperative Association for Internet Data Analysis