DatCat℠ Internet Measurement Data Catalog

Overview

What is DatCat?

DatCat, developed and run by CAIDA, is an Internet Measurement Data Catalog (IMDC), a searchable registry of information about network measurement datasets. It serves the global network research community by allowing anyone to find, annotate, and cite data contributed by others, and allowing anyone to contribute new data collections.

The goals of DatCat IMDC are:

  • To facilitate searching for and sharing of data among researchers

    Finding data to use in network research has historically been difficult. By serving as a shared global resource where anyone can find the data needed for network analysis, DatCat mitigates a significant barrier to research.

  • To enhance documentation of datasets via a public annotation system

    Instead of relying on the data contributor alone to document the data, DatCat allows any researcher to annotate datasets with problems, features, or missing information they discover in the data, thereby increasing the utility of the datasets.

  • To advance network science by promoting reproducible research

    Reproducibility of results is a cornerstone of good science, but it requires that the researcher's data be available to others. Similarly, the most meaningful comparison of analysis methodologies and algorithms comes from testing them against the same data. By putting their data in DatCat or using data already in DatCat, and then citing the IMDC Handle in their published results, researchers make it easier for others to obtain their data and validate their results or perform alternate analyses on the same data.

Note that IMDC does not store the data (or tools) itself, but only metadata, that is, descriptions of the data and instructions for obtaining it. The data itself remains in the hands of the contributor. As such, it may or may not be freely available; it might, for example, reside on a password-protected server, or require asking the owner of the data. IMDC does not dictate the terms of availability of the data; it just helps you with the first step of finding it.

DatCat development was supported by grant ANI-0137121 from the Advanced Networking Infrastructure program of the National Science Foundation.

See also the IMDC white paper.

Catalog Organization

Information in IMDC is organized as Objects, each of which describes a real-world object or idea. For the purposes of finding and obtaining data, the most important types of objects are:

Collection
The core of IMDC is the Collection object. A Collection describes a set of data files with a common purpose and/or collected as part of a single effort. When searching for data, Collections are almost always where you want to start.
Location
Location objects represent the method for obtaining a collection. Often this will be a URL linked directly to an external server hosting the collection, but it can also be text instructions (e.g., for collections that require human approval or agreement to an AUP).
For more detail and a complete list of object types, see the Object Types documentation.

Display Conventions

The following typographic and styling conventions are used throughout DatCat. Their exact appearance will depend on your web browser.

Example | Meaning | Linked to
imdc | The name of an object in the catalog | The object's detail page
wireless data collections | A description of a set of objects | Search results matching the description
Object Types | Help topic | The page for the help topic
info@datcat.org | Email address | A mailto: link
advanced search | Other internal reference | An internal page
CAIDA | An external reference | An external site

Handles

Object names within IMDC are not necessarily unique, and some objects (e.g., Locations) do not have names at all. To identify objects uniquely, IMDC assigns each one a persistent Handle, for example imdc.datcat.org/collection/1-0123-L=FooNet+Packet+Traces. IMDC Handles are designed to be permanent, which makes them ideal for citing data in a research paper. Prefixing a handle with http:// produces the URL of the object's detail page.
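
As a minimal sketch of the http:// rule, the Python function below turns a handle into the URL of its detail page. It assumes that a handle written without its instance name belongs to imdc.datcat.org (see instanceName below); the function name handle_to_url is hypothetical and not part of any DatCat API.

```python
# Assumption: imdc.datcat.org is the only instance name, so any handle that
# does not start with it is treated as having its instanceName omitted.
DEFAULT_INSTANCE = "imdc.datcat.org"


def handle_to_url(handle: str) -> str:
    """Return the detail-page URL for an IMDC Handle (hypothetical helper)."""
    handle = handle.strip()
    # Fill in the implied instance name when the handle omits it.
    if not handle.startswith(DEFAULT_INSTANCE + "/"):
        handle = DEFAULT_INSTANCE + "/" + handle
    # Prefixing http:// is all that is needed to form the URL.
    return "http://" + handle


print(handle_to_url("imdc.datcat.org/collection/1-0123-L=FooNet+Packet+Traces"))
# http://imdc.datcat.org/collection/1-0123-L=FooNet+Packet+Traces
```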

Informally, the syntax of an IMDC Handle is:

meaning: instanceName / objectType / version - objectID - checksum = objectName
example: imdc.datcat.org / collection / 1 - 0123 - L = FooNet+Packet+Traces
instanceName
Currently, the only valid instance name is imdc.datcat.org. In the future, there may be other instances of IMDC systems with different names. Within an IMDC instance, the instanceName is often omitted, in which case the current instance is implied.
objectType
The type of the object: collection, location, etc.
version
Version of the IMDC Handle syntax; must be 1. If changes to the handle syntax are made in the future, handles generated after that point may contain a new version number, but all handles generated previously will remain valid.
objectID
A short alphanumeric code, unique among objects of the same type within an IMDC instance. Vowels are not used, so characters that look like 0 and 1 are always the digits zero and one, never the letters O or I.
checksum
A single-character code used to detect malformed handles.
objectName
The URL-encoded name of the object. This is intended only as an aid for humans; as far as the system is concerned, it is optional.
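
The field layout above can be illustrated with a small Python parser. This is a sketch under stated assumptions, not DatCat's own code: it assumes the name was encoded the way urllib.parse.quote_plus encodes text (spaces as "+", with "/" and "=" escaped), that the checksum is one ASCII letter or digit, and that only handle-syntax version 1 exists. The checksum algorithm is not documented here, so the sketch checks only its shape and does not verify it. The names Handle, parse_handle, and format_handle are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote_plus, unquote_plus

DEFAULT_INSTANCE = "imdc.datcat.org"  # only instance name currently documented
SUPPORTED_VERSION = 1                 # only handle-syntax version defined so far
VOWELS = set("AEIOUaeiou")


@dataclass
class Handle:
    instance_name: str
    object_type: str
    version: int
    object_id: str
    checksum: str
    object_name: Optional[str]  # decoded name, or None if the handle omits it


def parse_handle(text: str) -> Handle:
    """Split an IMDC Handle into its fields (the checksum is NOT verified)."""
    # The optional objectName follows the first '='. Assuming quote_plus-style
    # encoding, the name itself cannot contain a raw '=' or '/'.
    body, sep, encoded_name = text.strip().partition("=")
    object_name = unquote_plus(encoded_name) if sep else None

    parts = body.split("/")
    if len(parts) == 3:
        instance_name, object_type, tail = parts
    elif len(parts) == 2:  # instanceName omitted: the current instance is implied
        instance_name = DEFAULT_INSTANCE
        object_type, tail = parts
    else:
        raise ValueError(f"unexpected number of '/' separators: {text!r}")

    fields = tail.split("-")
    if len(fields) != 3:
        raise ValueError(f"expected version-objectID-checksum, got {tail!r}")
    version_text, object_id, checksum = fields

    if version_text != str(SUPPORTED_VERSION):
        raise ValueError(f"unsupported handle syntax version: {version_text!r}")
    # objectID: alphanumeric and vowel-free, so 0 and 1 are always digits.
    if not re.fullmatch(r"[0-9A-Za-z]+", object_id) or VOWELS & set(object_id):
        raise ValueError(f"malformed objectID: {object_id!r}")
    # Assumed shape for the checksum: exactly one ASCII letter or digit.
    if not re.fullmatch(r"[0-9A-Za-z]", checksum):
        raise ValueError(f"checksum must be a single character: {checksum!r}")

    return Handle(instance_name, object_type, SUPPORTED_VERSION,
                  object_id, checksum, object_name)


def format_handle(h: Handle) -> str:
    """Rebuild the handle text, URL-encoding the optional name."""
    text = f"{h.instance_name}/{h.object_type}/{h.version}-{h.object_id}-{h.checksum}"
    if h.object_name is not None:
        text += "=" + quote_plus(h.object_name)
    return text


h = parse_handle("imdc.datcat.org/collection/1-0123-L=FooNet+Packet+Traces")
print(h.object_type, h.object_id, h.object_name)
# collection 0123 FooNet Packet Traces
```

Because the objectName is optional to the system, a handle such as collection/1-0123-L parses to the same object; only the human-readable name is missing. Rejecting unknown versions, rather than guessing at their layout, keeps the sketch safe if a new handle syntax is introduced later.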

Annotations

Any logged-in user can annotate any IMDC object with a "note" containing additional information.

User Accounts

Although creating an account is not a prerequisite for browsing DatCat, we strongly encourage it. Creating an account is easy and free, and allows you to annotate catalog entries and customize your interaction with DatCat.

You must have an account to contribute information to IMDC.

Browser Issues

IMDC is designed to work with any browser that supports standard HTML. IMDC does not require graphics, cookies, JavaScript, or CSS, but it will take advantage of those features if available to make the interface more convenient, faster, and generally more pleasant, so we recommend using a browser with those features enabled. All of IMDC's own interface text is in ASCII English, although user-contributed text may contain other languages and character sets. If you find a problem, let us know.