DatCat℠ Internet Measurement Data Catalog

Overview

Quick Start

For a quick introduction to using the IMDC web interface to find and obtain Internet measurement data, see Tutorial: Getting Data.

What is DatCat?

DatCat, developed and run by CAIDA, is an Internet Measurement Data Catalog (IMDC): a searchable registry of information about network measurement datasets. It serves the global network research community by allowing anyone to find, annotate, and cite data contributed by others, and, soon, to contribute new data.

The goals of the DatCat IMDC are:

  • To facilitate searching for and sharing of data among researchers

    Finding data to use in network research has historically been difficult. By serving as a shared global resource where anyone can find the data needed for network analysis, DatCat lowers a significant barrier to research.

  • To enhance documentation of datasets via a public annotation system

    Instead of relying on the data contributor alone to document the data, DatCat allows any researcher to annotate datasets with problems, features, or missing information they discover in the data, thereby increasing the utility of the datasets.

  • To advance network science by promoting reproducible research

    Reproducibility of results is a cornerstone of good science, but it requires that the researcher's data be available to others. Similarly, the most meaningful comparison of analysis methodologies and algorithms requires testing them against the same data. By registering their data in DatCat or using data already cataloged there, and then citing the IMDC Handle in their published results, researchers make it easier for others to obtain the data and validate their results or perform alternative analyses on it.

Note that IMDC does not store the data (or tools) itself, only metadata: descriptions of the data and instructions for obtaining it. The data itself remains in the hands of the contributor, so it may or may not be freely available; it might, for example, reside on a password-protected server or require asking its owner for access. IMDC does not dictate the terms under which data is available; it just helps you with the first step, finding the data.

DatCat development was supported by grant ANI-0137121 from the Advanced Networking Infrastructure program of the National Science Foundation.

See also the IMDC white paper.

Catalog Organization

Information in IMDC is organized as Objects, each of which describes a real-world object or idea. For the purposes of finding and obtaining data, the most important types of objects are:

Data
The core of IMDC is the Data object. A Data object describes a dataset in a single file in its most natural working form, even if the data is not made available directly in that form.
Data Collection
A set of Data objects with a common purpose and/or collected as part of a single effort. When searching, it is often more convenient to search for Collections as a unit than to search through thousands of individual Data objects.
Package
A Package object describes a collection of one or more data files, in a form that can be downloaded or otherwise made available. Package objects usually represent compressed archives of data files, but can be as simple as a single uncompressed data file, if that file is the downloadable form.
Location
Location objects represent the method for obtaining a package. Often this will be a URL linked directly to the package (external to IMDC), but it can also be text instructions (e.g., for packages that require human approval or agreement to an AUP).
For more detail and a complete list of object types, see the Object Types documentation.
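The relationships among these object types can be pictured as a small data model: a Data Collection groups Data objects, a Package contains one or more Data files, and each Location tells you how to obtain a Package. The Python sketch below is purely illustrative; the class names, field names, and the helper `locations_for` are hypothetical and do not reflect IMDC's actual schema. Each object carries a string standing in for its IMDC Handle (see Handles), since object names alone are not unique.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Data:
    """One dataset, described as a single file in its natural working form."""
    handle: str
    name: str


@dataclass
class DataCollection:
    """Data objects sharing a common purpose or collection effort."""
    handle: str
    name: str
    members: List[Data] = field(default_factory=list)


@dataclass
class Location:
    """How to obtain a package: a direct URL, or text instructions."""
    handle: str
    url: Optional[str] = None
    instructions: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption for this sketch: a Location needs at least one of the two forms.
        if self.url is None and self.instructions is None:
            raise ValueError("a Location needs a URL or instructions")


@dataclass
class Package:
    """A downloadable form (e.g. an archive) containing one or more Data files."""
    handle: str
    name: str
    contents: List[Data]
    locations: List[Location] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.contents:
            raise ValueError("a Package contains at least one data file")


def locations_for(data: Data, packages: List[Package]) -> List[Location]:
    """Collect the Locations of every Package that contains `data` (matched by handle)."""
    return [
        loc
        for pkg in packages
        if any(d.handle == data.handle for d in pkg.contents)
        for loc in pkg.locations
    ]
```

Finding data typically follows the path `locations_for` takes: from a Data object to the Packages that contain it, and from each Package to its Locations. Matching on the handle rather than the name reflects the fact that names need not be unique.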

Display Conventions

The following typographic and styling conventions are used throughout DatCat. Their exact appearance will depend on your web browser.

Example            Meaning                                Linked to
imdc               The name of an object in the catalog   The object's detail page
passive data       A description of a set of objects      Search results matching the description
Object Types       Help topic                             The page for the help topic
info@datcat.org    Email address                          A mailto: link
simple search      Other internal reference               An internal page
CAIDA              An external reference                  An external site

Handles

Object names within IMDC are not necessarily unique, and some objects (e.g., Locations) do not even have names. To identify objects uniquely, IMDC assigns each one a persistent Handle, for example imdc.datcat.org/data/1-0123-K=foobar.pcap. IMDC Handles are designed to last forever, which makes them ideal for citing data in a research paper. Prepending http:// to a handle produces the URL of the object's detail page.
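As a minimal sketch of that rule, assuming the handle is fully qualified (it includes its instanceName), the detail-page URL is simply the handle with http:// in front. The function name `detail_page_url` is hypothetical.

```python
def detail_page_url(handle: str) -> str:
    """Turn a fully qualified IMDC Handle into its detail-page URL."""
    # Leave strings that are already URLs unchanged.
    if handle.startswith("http://"):
        return handle
    return "http://" + handle


print(detail_page_url("imdc.datcat.org/data/1-0123-K=foobar.pcap"))
# prints http://imdc.datcat.org/data/1-0123-K=foobar.pcap
```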

Informally, the syntax of an IMDC Handle is:

meaning:  instanceName    / objectType / version - objectID - checksum = objectName
example:  imdc.datcat.org / data       / 1       - 0123     - K        = foobar.pcap
instanceName
Currently, the only valid instance name is imdc.datcat.org. In the future, there may be other instances of IMDC systems with different names. Within an IMDC instance, the instanceName is sometimes omitted, in which case the current instance is implied.
objectType
The type of the object: data, package, location, etc.
version
Version of the IMDC Handle syntax; must be 1. If changes to the handle syntax are made in the future, handles generated after that point may contain a new version number, but all handles generated previously will remain valid.
objectID
A short alphanumeric code, unique among objects of the same type within an IMDC instance. Vowels are not used, so characters that look like 0 and 1 are always the digits zero and one, never the letters O or I.
checksum
A single-character code used to detect malformed handles.
objectName
The URL-encoded name of the object. This is intended only as an aid for humans; as far as the system is concerned, it is optional.
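The syntax above can be turned into a small parser. This is a sketch under stated assumptions, not IMDC's own code: the names `Handle` and `parse_handle` are hypothetical; the checksum algorithm is not described here, so the sketch only checks that the checksum is one character and does not verify it; the objectID is assumed to be ASCII alphanumeric with no vowels, as described above; and the objectName is decoded with Python's `urllib.parse.unquote`. Because the objectName is optional, everything after the first "=" is treated as the name, and a handle with only two "/"-separated parts is taken to omit the instanceName.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote

# Per the text, this is currently the only valid instance name.
CURRENT_INSTANCE = "imdc.datcat.org"
VOWELS = set("AEIOUaeiou")


@dataclass(frozen=True)
class Handle:
    instance_name: str
    object_type: str
    version: int
    object_id: str
    checksum: str
    object_name: Optional[str]  # decoded name, or None if omitted


def parse_handle(text: str, current_instance: str = CURRENT_INSTANCE) -> Handle:
    """Split an IMDC Handle into its fields (the checksum is NOT verified)."""
    # The objectName, if present, follows the first "=".
    head, has_name, encoded_name = text.partition("=")

    parts = head.split("/")
    if len(parts) == 3:
        instance_name, object_type, code = parts
    elif len(parts) == 2:
        # instanceName omitted: the current instance is implied.
        instance_name = current_instance
        object_type, code = parts
    else:
        raise ValueError(f"malformed handle: {text!r}")

    fields = code.split("-")
    if len(fields) != 3:
        raise ValueError(f"expected version-objectID-checksum: {code!r}")
    version, object_id, checksum = fields

    if version != "1":
        raise ValueError(f"unsupported handle syntax version: {version!r}")
    # Assumed rule: ASCII letters and digits only, and no vowels.
    if not (object_id.isascii() and object_id.isalnum()) or VOWELS & set(object_id):
        raise ValueError(f"invalid objectID: {object_id!r}")
    if len(checksum) != 1:
        raise ValueError(f"checksum must be one character: {checksum!r}")

    name = unquote(encoded_name) if has_name else None
    return Handle(instance_name, object_type, int(version), object_id, checksum, name)


h = parse_handle("imdc.datcat.org/data/1-0123-K=foobar.pcap")
print(h.object_type, h.object_id, h.checksum, h.object_name)
# prints: data 0123 K foobar.pcap
```

Rejecting any version other than 1 matches the current rule; a parser written after a future syntax change would need to branch on the version field before interpreting the rest.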

Annotations

Any logged-in user can annotate an IMDC object with a "note" containing additional information. In addition, any user with permission to contribute can use the XML submission interface to add other types of annotations to IMDC objects. An annotation consists of a key (e.g., "note") and a value; the key describes how to interpret the value. A user does not need to own an object to annotate it, although certain types of annotations are restricted. A set of predefined keys lives in a standard namespace, and contributors can also define new keys in their own namespaces. Annotations therefore allow great flexibility in the information that can be associated with an object, unconstrained by the built-in fields of IMDC objects or by who owns them.
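The key/value model with namespaced keys can be sketched as follows. Everything here is hypothetical and illustrative: the class names, the `"std"` label for the standard namespace, and the in-memory index are assumptions, not the XML submission format or IMDC's storage. The point is that a key is a (namespace, name) pair, the value is interpreted according to that key, and the author of an annotation need not own its target.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List

# Hypothetical label for the standard namespace of predefined keys.
STANDARD_NS = "std"


@dataclass(frozen=True)
class AnnotationKey:
    namespace: str  # STANDARD_NS or a contributor-defined namespace
    name: str       # e.g. "note"


@dataclass(frozen=True)
class Annotation:
    target: str         # Handle of the annotated object
    key: AnnotationKey  # tells a reader how to interpret `value`
    value: str
    author: str         # need not be the owner of the target


class AnnotationIndex:
    """In-memory index of annotations, grouped by target object."""

    def __init__(self) -> None:
        self._by_target: DefaultDict[str, List[Annotation]] = defaultdict(list)

    def add(self, annotation: Annotation) -> None:
        self._by_target[annotation.target].append(annotation)

    def values(self, target: str, key: AnnotationKey) -> List[str]:
        """All values recorded for `key` on `target`, in insertion order."""
        return [a.value for a in self._by_target.get(target, []) if a.key == key]
```

Keying on the (namespace, name) pair is what lets contributor-defined keys coexist with the predefined ones without collisions. The restrictions IMDC places on certain annotation types are not modeled.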

User Accounts

Although creating an account is not a prerequisite for browsing DatCat, we strongly encourage it. Creating an account is easy and free, and allows you to annotate catalog entries and customize your interaction with DatCat.

You must have an account to contribute information to IMDC.

Browser Issues

IMDC is designed to work with any browser that supports standard HTML. IMDC does not require graphics, cookies, JavaScript, or CSS, but it takes advantage of those features when available to make the interface more convenient, faster, and generally more pleasant, so we recommend using a browser with those features enabled. All infrastructural IMDC text is in ASCII English, although user-contributed text may contain other languages and character sets. If you find a problem, let us know.


Software version 1.7.24
Page generated at 2008-08-20 02:22:25 UTC
CAIDA: Cooperative Association for Internet Data Analysis