Internet Measurement Data Catalog
DatCat, developed and run by CAIDA, is an Internet Measurement Data Catalog (IMDC), a searchable registry of information about network measurement datasets. It serves the global network research community by allowing anyone to find, annotate, and cite data contributed by others, and soon by allowing anyone to contribute new data.
The goals of DatCat IMDC are:
Finding data to use in network research has historically been difficult. By serving as a shared global resource where anyone can find the data needed for network analysis, DatCat mitigates a significant barrier to research.
Instead of relying on the data contributor alone to document the data, DatCat allows any researcher to annotate datasets with problems, features, or missing information they discover in the data, thereby increasing the utility of the datasets.
Reproducibility of results is a cornerstone of good science, but requires that the researcher's data is available to others. Similarly, to get the most meaningful comparison of analysis methodologies and algorithms, researchers must test them against the same data. By putting their data in DatCat or using data already in DatCat, and then citing the IMDC Handle in their published results, researchers can make it easier for others to obtain their data and validate their results or perform alternate analyses on the same data.
Note that IMDC does not store the data (or tools) itself, only metadata: descriptions of the data and instructions for obtaining it. Storage of the data itself remains in the hands of the contributor. As such, the data may or may not be freely available; it might, for example, reside on a password-protected server, or be obtainable only by asking its owner. IMDC does not dictate the terms of availability of the data; it just helps you with the first step of finding it.
DatCat development was supported by grant ANI-0137121 from the Advanced Networking Infrastructure program of the National Science Foundation.
See also the IMDC white paper.
Information in IMDC is organized as Objects, each of which describes a real-world object or idea. For the purposes of finding and obtaining data, the most important types of objects are:
The following typographic and styling conventions are used throughout DatCat. Their exact appearance will depend on your web browser.
| example | meaning | linked to |
|---|---|---|
| imdc | The name of an object in the catalog | The object's detail page |
| passive data | A description of a set of objects | Search results matching the description |
| Object Types | Help topic | The page for the help topic |
| info@datcat.org | Email address | A mailto: link |
| simple search | Other internal reference | An internal page |
| CAIDA | An external reference | An external site |
Object names within IMDC are not necessarily unique. Some objects (e.g., Locations) do not even have names. To uniquely identify objects, IMDC assigns each one a persistent Handle, for example imdc.datcat.org/data/1-0123-K=foobar.pcap.
IMDC Handles are designed to last forever, making them ideal for use in citing data in a research paper.
Prepending http:// to a handle produces the URL of the object's detail page.
Informally, the syntax of an IMDC Handle is:
| meaning: | instanceName | / | objectType | / | version | - | objectID | - | checksum | = | objectName |
|---|---|---|---|---|---|---|---|---|---|---|---|
| example: | imdc.datcat.org | / | data | / | 1 | - | 0123 | - | K | = | foobar.pcap |
instanceName: imdc.datcat.org. In the future, there may be other instances of IMDC systems with different names. Within an IMDC instance, the instanceName is sometimes omitted, in which case the current instance is implied.
objectType: data, package, location, etc.
version: 1. If changes to the handle syntax are made in the future, handles generated after that point may contain a new version number, but all handles generated previously will remain valid.
Note that 0 and 1 in a handle are the digits zero and one, not letters.
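The informal handle syntax above can be sketched as a small parser. This is only an illustration, not DatCat's own implementation: the function names `parse_handle` and `detail_url`, the `Handle` type, and the exact character classes in the pattern are assumptions made for this example. The checksum is extracted but not verified, since the checksum algorithm is not described here, and the sketch assumes the `=` separator is always present.

```python
import re
from typing import NamedTuple

# Pattern derived from the informal syntax table. The character classes are
# assumptions; the real rules may be stricter or looser.
_HANDLE_RE = re.compile(
    r"^(?:(?P<instance>[^/]+)/)?"  # instanceName, may be omitted within an instance
    r"(?P<object_type>[^/]+)/"     # objectType, e.g. data, package, location
    r"(?P<version>\d+)-"           # handle syntax version
    r"(?P<object_id>[^-]+)-"       # objectID
    r"(?P<checksum>[^=]+)="        # checksum (not verified here)
    r"(?P<name>.*)$"               # objectName
)

# The only instance name mentioned in the text.
DEFAULT_INSTANCE = "imdc.datcat.org"


class Handle(NamedTuple):
    instance: str
    object_type: str
    version: str
    object_id: str
    checksum: str
    name: str


def parse_handle(text: str, default_instance: str = DEFAULT_INSTANCE) -> Handle:
    """Split a handle into its parts, filling in an omitted instanceName."""
    match = _HANDLE_RE.match(text)
    if match is None:
        raise ValueError(f"not a recognizable IMDC handle: {text!r}")
    return Handle(
        instance=match["instance"] or default_instance,
        object_type=match["object_type"],
        version=match["version"],
        object_id=match["object_id"],
        checksum=match["checksum"],
        name=match["name"],
    )


def detail_url(handle: Handle) -> str:
    """Build the detail-page URL by putting http:// in front of the full handle."""
    return "http://{}/{}/{}-{}-{}={}".format(*handle)
```

For example, `parse_handle("data/1-0123-K=foobar.pcap")` fills in the implied instance, and passing the result to `detail_url` gives `http://imdc.datcat.org/data/1-0123-K=foobar.pcap`.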
Any logged-in user can annotate an IMDC object with a "note" containing additional information. Additionally, any user with permission to contribute can use the XML submission interface to add other types of annotations to IMDC objects. An annotation consists of a key (e.g. "note") and a value; the key describes how to interpret the value. A user does not need to own an object to annotate it, although there are restrictions on certain types of annotations. There is a set of predefined keys in a standard namespace, and contributors can also define new keys in their own namespace. Thus annotations allow great flexibility in the type of information that can be associated with an object, not limited by the built-in fields designed into IMDC objects or ownership of the objects.
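One way to picture the key/value model is as a small record type. The sketch below is purely illustrative and does not reflect the XML submission format or IMDC's internal schema: the `Annotation` class, its field names, the `"standard"` namespace label, the example users, and the `capture-tool` key are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical label for the namespace holding the predefined keys.
STANDARD_NAMESPACE = "standard"


@dataclass(frozen=True)
class Annotation:
    handle: str   # handle of the annotated object
    author: str   # user who wrote the annotation (need not own the object)
    key: str      # tells readers how to interpret the value, e.g. "note"
    value: str
    namespace: str = STANDARD_NAMESPACE  # contributors may define their own


def group_by_key(
    annotations: Iterable[Annotation],
) -> Dict[Tuple[str, str], List[str]]:
    """Collect annotation values under their (namespace, key) pair."""
    grouped: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for annotation in annotations:
        grouped[(annotation.namespace, annotation.key)].append(annotation.value)
    return dict(grouped)


# Two made-up annotations on the same object by different users.
example = [
    Annotation("data/1-0123-K=foobar.pcap", "alice", "note",
               "Timestamps drift in the second half."),
    Annotation("data/1-0123-K=foobar.pcap", "bob", "capture-tool",
               "tcpdump", namespace="bob"),
]
by_key = group_by_key(example)
```

Keeping the namespace alongside the key means a contributor-defined key can never collide with a predefined key of the same name.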
Although creating an account is not a prerequisite for browsing DatCat, we strongly encourage it. Creating an account is easy and free, and allows you to annotate catalog entries and customize your interaction with DatCat.
You must have an account to contribute information to IMDC.
IMDC is designed to work with any browser that supports standard HTML. IMDC does not require graphics, cookies, JavaScript, or CSS, but it takes advantage of those features when they are available to make the interface faster and more convenient, so we recommend using a browser with them enabled. All infrastructural IMDC text is in ASCII English, although user-contributed text may contain other languages and character sets. If you find a problem, let us know.