DatCat℠ Internet Measurement Data Catalog


Contributing Metadata

News

2014-08-19: Individual Data and Packages are no longer supported. Users attempting to visit Handles for any old Data or Package will be redirected to the relevant Collection. Information from all old Data and Package objects was merged into the relevant Collections. To help you verify that no important information was lost, the previous version of the catalog is temporarily still available at legacy.datcat.org.

2013-02-05: The new DatCat submission system is now available. Where the old system required you to use special tools to document every Data file and Package of a Collection in great detail, the new system lets you describe a Collection as a whole by filling out a couple of relatively simple web forms. The offline submission generation tools are currently not supported on the new system.

Who can contribute?

Anyone with a login account can contribute metadata.

What should I contribute?

Contribute metadata about any network measurement data you have that you are potentially willing to make available to network researchers. For specific recommendations on what type of metadata to include, refer to CAIDA's web page on How to Document a Data Collection.

But I don't want to give unlimited access to my data.

IMDC does not store the data itself, only metadata to help researchers identify and find the data. Storage of the data is up to you, leaving you in complete control of access to the data.

How do I contribute?

You must be logged in to make a submission. To begin, visit the Submit link at the top of any page, choose to start with a Collection or Publication, and fill out the form.

Submissions are made in "staging areas", which allow you to insert, edit, view, and delete objects in the catalog before making them visible to other users. You may have multiple staging areas, but objects in one staging area cannot reference objects in other staging areas. When you are satisfied with the objects in a staging area, you can Activate that staging area to make its objects visible to other users. Once activated, objects can still be edited via the web interface, but they can never be deleted.
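The staging rules can be pictured as a small state model. The following is a minimal, hypothetical sketch, not DatCat's implementation: the `Catalog` and `CatalogObject` classes and the short handles are illustrative names, and it assumes (the text above does not say so) that a staged object may reference objects that are already active.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogObject:
    """Hypothetical stand-in for a cataloged object (Collection, Contact, ...)."""
    handle: str
    area: str                 # staging area the object was created in
    state: str = "staged"     # becomes "active" when its staging area is activated
    refs: list = field(default_factory=list)  # handles this object references


class Catalog:
    def __init__(self):
        self.objects = {}

    def insert(self, obj):
        for ref in obj.refs:
            target = self.objects[ref]
            # Assumption: active objects may be referenced from any staging
            # area; staged objects only from their own staging area.
            if target.state == "staged" and target.area != obj.area:
                raise ValueError(f"{obj.handle} cannot reference {ref}, "
                                 "which is staged in another staging area")
        self.objects[obj.handle] = obj

    def activate(self, area):
        """Make every object in one staging area visible to other users."""
        for obj in self.objects.values():
            if obj.area == area:
                obj.state = "active"

    def delete(self, handle):
        if self.objects[handle].state == "active":
            raise ValueError("activated objects can never be deleted")
        del self.objects[handle]
```

In this model, a Collection staged in one area can point at a Contact staged in the same area, but the same reference from a second staging area is rejected until the first area has been activated.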

If you are contributing data on behalf of some organization, consider using a role account instead of a personal account to make your submission.

Guidelines on writing submissions

To help people find your data (which is, after all, why you're contributing), try to fill in as much information as possible. But it is often better to contribute incomplete information than nothing at all, so don't let gaps stop you from contributing; you can always come back later and edit your contributions to add more information.

Fields may contain text in any language, provided that your browser uses the correct character encoding. Valid encodings include "UTF-8" and "ISO-8859-1". But remember that the primary language of DatCat is English, and some users may be unable to view non-western text, so you should try to avoid using non-western text in key fields (like "Name"), and provide western alternatives for any non-western text you use in other fields.

Before continuing, be sure you are familiar with the Object Type documentation.

Common Fields
Name (required) (3 to 256 characters)
The object's full name, which will be displayed to other users. If the name is not unique, it may be displayed with a numeric suffix to make it unique. Objects should be given names that are descriptive, but brief. Names on most objects are limited to 256 characters.
Creators
A list of contacts who created the real-world item described by this object, which does not necessarily include the contact who is contributing it to IMDC. If appropriate contact objects do not exist, you can create contacts without logins to use as creators.
Primary contact
The preferred contact for answering inquiries about the real-world item described by this object, particularly if that contact differs from the creator(s). For example, the primary contact could be a role contact (with a role email address) corresponding to a team, and the creators could be person contacts corresponding to members of the team.
Short description (required) (up to 128 characters)
The short description is the only free-form text that appears in search results, so try to write text that will describe the most important features of the object in a way that will give a broad categorization of the object as well as distinguish the object from similar objects. There is no need to repeat information that is included in the object's name or visible elsewhere on a search result page (e.g., a Collection's format and date).
Description (allows markup) (up to 4000 characters)
The description field can be up to 4000 characters, so be as descriptive as you can. The description appears only on an object's detail page. Please write something here, even if you use the Home page field to point at a description on another web page.
Home page (up to 1018 characters)
This should be the URL (including “http://” or other scheme) of a web page that describes the object, if there is one. You might want to use this if you already have an existing web page, or if many objects share a large identical description.
Keywords
A comma-separated list of words or phrases that users are likely to use when trying to find your object. For collections, it is not necessary to include the file format, since that can be searched separately. When appropriate, try to use keywords that are already being used, but also feel free to make up your own new keywords for ideas not covered by existing keywords.
Private ID (up to 256 characters)
If you are contributing information to IMDC that already exists in a separate database, you can use the private_id field of IMDC objects to store an identifier from that other database in order to record that relationship. IMDC will warn you if you attempt to insert objects with duplicate Private IDs. IMDC lets you search for your objects by private ID and get mappings of your private IDs to IMDC Handles. An object's Private ID is normally visible only to the object's contributor and IMDC administrators, although it may be transmitted unencrypted.
State
“active” objects are visible to everyone; “staged” objects are visible only to the contributor and reviewers, and can be deleted.
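If you prepare many submissions, it can help to check the common fields before pasting them into the web forms. Here is a minimal sketch, assuming that the limits listed above are counted in Unicode characters; the dictionary keys and the helpers `field_problems` and `parse_keywords` are hypothetical names, and treating keywords that differ only in case as duplicates is also an assumption.

```python
from urllib.parse import urlparse

# (minimum, maximum, required) for the Common Fields listed above.
# The keys are illustrative, not DatCat's internal field names.
LIMITS = {
    "name": (3, 256, True),
    "short_desc": (1, 128, True),
    "description": (0, 4000, False),
    "home_page": (0, 1018, False),
    "private_id": (0, 256, False),
}


def field_problems(fields):
    """Return human-readable problems with a dict of common-field values."""
    problems = []
    for key, (low, high, required) in LIMITS.items():
        value = fields.get(key) or ""
        if not value:
            if required:
                problems.append(f"{key} is required")
            continue
        if not low <= len(value) <= high:
            problems.append(f"{key} must be {low} to {high} characters")
    home = fields.get("home_page")
    if home and not urlparse(home).scheme:
        problems.append("home_page must include a scheme such as http://")
    return problems


def parse_keywords(text):
    """Split the comma-separated Keywords field into trimmed, distinct entries."""
    seen, keywords = set(), []
    for raw in text.split(","):
        keyword = " ".join(raw.split())  # trim and collapse inner whitespace
        if keyword and keyword.lower() not in seen:
            seen.add(keyword.lower())
            keywords.append(keyword)
    return keywords
```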
collection
The core of IMDC is the Collection object. A Collection is a set of closely related data files, usually collected as part of a single effort. A Collection may contain other Collections or Publications to indicate that it contains all the data contained by the others. For example, Collections named “F-root DNS traces” and “A-root DNS traces” might each contain hundreds of data files representing traces taken at the respective root DNS servers, while “Root DNS traces” might contain both of those Collections and thus indirectly contain all data contained by those Collections. Normally a Collection should contain either a non-zero file_count or one or more sub-collections, although during the contribution process a Collection is allowed to be empty temporarily.
Collection name (required) (up to 128 characters)
A good name for a collection object usually includes some indication of the purpose of the collection, and perhaps the type of data, date, and source location.
Motivation (up to 400 characters)
Describe the reason you thought it was useful to gather data objects together into this single Collection. This field is displayed only on the Collection's detail page.
Summary (required) (up to 800 characters)
The summary should describe the most important features of the Collection, to help users decide if the collection is potentially interesting to them. This may include descriptions of any or all of its contents, purpose, creators, location, timing, or how it was created. The summary will appear alone (without the description) in the “more information” view of Collection search results and may appear on the Browse page. Technical details should be put in the description, not the summary.
Description (allows markup) (up to 4000 characters)
The description is displayed after the summary on a collection's detail page, so it should not repeat the summary verbatim; instead, it should expand upon it. If you include statistics or other dated information for a collection that may grow in the future, be sure to mention the date on which that information was correct.
Data start time
The earliest time represented by data in the collection.
Data duration
The time period between the first and last data in the collection, even if there are gaps in the coverage. Use 0 seconds to indicate that the collected data represent a single instant; use the value “ongoing” to indicate that addition of new data files to the Collection is ongoing.
Data file count
Data file size
The number and size of data files included directly in this collection. I.e., do not include any data files already included in cataloged child-collections. For example, if annual collections “FooBar 2012”, “FooBar 2013”, etc. are assembled into a super-collection “FooBar”, then the annual collections should have file counts and sizes, but the super-collection should not.
format
The Format of data files included in this collection. Many common formats are already defined in DatCat, but if there is no definition for the one you need, you can contribute your own Data Format object.
Geographic location (up to 4000 characters)
Network location (up to 4000 characters)
Logistic location (up to 4000 characters)
Fill in whatever is relevant to the data. For example, the network or logistic location is important to a packet trace or active probe, but a Route-Views snapshot of the global routing table doesn't really have a location. To maximize the usefulness of geographic location for searching, be complete; e.g. enter “San Diego, California, US” instead of just “San Diego”. Line breaks in these fields are preserved in display.
Platform (up to 4000 characters)
The relevant hardware, software, and OS used to collect the data in this collection. Line breaks in this field are preserved in display.
Creation process (allows markup) (up to 4000 characters)
Describe the procedure used to collect data in this collection. Ideally, include all configuration parameters, and quote command lines verbatim if possible. General information common to all data files with the same format should usually be relegated to the Data Format's description.
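Because file counts and sizes are recorded only for files included directly in a collection, the overall total for a super-collection has to be summed over its sub-collections. A minimal sketch, using a hypothetical `Collection` class and illustrative sizes; it counts a sub-collection only once even if it is reachable through two parents (for example, if "Root DNS traces" and another super-collection both contain "F-root DNS traces").

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    name: str
    file_count: int = 0   # files included directly in this collection only
    file_size: int = 0    # bytes, likewise direct only
    children: list = field(default_factory=list)  # sub-collections


def totals(root):
    """Sum direct file counts and sizes over root and all sub-collections,
    visiting each sub-collection once even if it is reachable twice."""
    seen, stack = set(), [root]
    count = size = 0
    while stack:
        coll = stack.pop()
        if id(coll) in seen:
            continue
        seen.add(id(coll))
        count += coll.file_count
        size += coll.file_size
        stack.extend(coll.children)
    return count, size


# Annual collections carry the counts; the super-collection "FooBar" does not.
y2012 = Collection("FooBar 2012", file_count=365, file_size=4 * 10**9)
y2013 = Collection("FooBar 2013", file_count=365, file_size=5 * 10**9)
foobar = Collection("FooBar", children=[y2012, y2013])
print(totals(foobar))  # (730, 9000000000)
```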
publication
A Publication in IMDC describes a scholarly paper, article, or other publication, but more importantly for IMDC, it organizes the data used by the publication. Thus, it can be considered a kind of specialized Collection. Like a Collection, a Publication may contain other Collection or Publication objects in order to incorporate their contents.
Publication title (required) (up to 128 characters)
The full title of the paper, including correct capitalization.
Publication date (required) (up to 10 characters)
The date of publication, in “YYYY”, “YYYY-MM”, or “YYYY-MM-DD” format.
Venue (required) (up to 128 characters)
The name of the conference, journal, magazine, or other venue where the publication was published. Do not include the date; there is a separate field for that.
Summary (required) (up to 800 characters)
The summary should describe the Publication in a sentence or two; it should be shorter than the abstract. The summary will appear in several places where the abstract will not: the “more information” view of Publication search results, and possibly on the Browse page.
Data description (allows markup) (up to 4000 characters)
The abstract is displayed after the summary on a publication's detail page.
Data start time
The earliest time represented by data used by the publication.
Data duration
The time period between the first and last data used by the publication, even if there are gaps in the coverage. Use 0 seconds to indicate that the data used represent a single instant. Unlike a Collection, a Publication's data duration should not be marked as ongoing, since a publication could not have used data that did not exist at the time of publication.
format
The Format of data files used by this publication. Many common formats are already defined in DatCat, but if there is no definition for the one you need, you can contribute your own Data Format object.
Geographic location (up to 4000 characters)
Network location (up to 4000 characters)
Logistic location (up to 4000 characters)
Fill in whatever is relevant to the data. For example, the network or logistic location is important to a packet trace or active probe, but a Route-Views snapshot of the global routing table doesn't really have a location. To maximize the usefulness of geographic location for searching, be complete; e.g. enter “San Diego, California, US” instead of just “San Diego”. Line breaks in these fields are preserved in display.
Platform (up to 4000 characters)
The hardware, software version, and OS used to collect the data, if it's relevant to the data collection. Line breaks in this field are preserved in display.
Creation process (allows markup) (up to 4000 characters)
Describe the procedure used to collect data used in this publication. Include all configuration parameters, and quote command lines verbatim if possible.
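The date and duration rules above are simple enough to check mechanically. This is a sketch under stated assumptions: `parse_pub_date`, `data_duration`, and `check_duration` are hypothetical helpers, durations are represented as Python `timedelta` values, and rejecting impossible calendar dates such as 2013-02-30 is an extra check the text does not mention.

```python
import re
from datetime import date, datetime, timedelta

PUB_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?", re.ASCII)
ONGOING = "ongoing"


def parse_pub_date(text):
    """Accept "YYYY", "YYYY-MM", or "YYYY-MM-DD"; return the given parts as ints."""
    m = PUB_DATE.fullmatch(text)
    if m is None:
        raise ValueError(f"bad publication date: {text!r}")
    parts = [int(p) for p in m.groups() if p is not None]
    # Let datetime.date reject impossible months and days (e.g. 2013-02-30).
    date(*(parts + [1] * (3 - len(parts))))
    return tuple(parts)


def data_duration(timestamps):
    """Time between the first and last data, ignoring gaps in coverage."""
    return max(timestamps) - min(timestamps)


def check_duration(duration, is_publication):
    """Allow a non-negative timedelta; allow "ongoing" for Collections only."""
    if duration == ONGOING:
        if is_publication:
            raise ValueError("a Publication's data duration cannot be ongoing")
        return
    if duration < timedelta(0):
        raise ValueError("data duration cannot be negative")


# Data from Jan 1 to Jan 10 spans 9 days, even with nothing between Jan 3 and 10.
span = data_duration([datetime(2013, 1, 1), datetime(2013, 1, 3), datetime(2013, 1, 10)])
check_duration(span, is_publication=True)
```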
format
A Data Format (e.g., “pcap”) describes the file format used by raw data files, and a Package Format (e.g., “tar-gzip”) describes the file format used to bundle multiple data files together into a single file.
suffixes
A set of suffixes commonly used on the names of files of this format (e.g., “.pcap” or “.tar.gz”). Remember to include the leading period (if any).
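One reason to include the leading period is that a tool matching file names against suffixes will not mistake "notapcap" for a pcap file. A minimal sketch of such matching, choosing the longest matching suffix so that ".tar.gz" wins over ".gz". The `match_format` helper is hypothetical, and the "gzip" entry is illustrative, included only to show the longest-match choice.

```python
def match_format(filename, formats):
    """Pick the format whose suffix is the longest match for filename.

    formats maps a format name to its suffixes, each with its leading period.
    """
    best, best_len = None, 0
    for name, suffixes in formats.items():
        for suffix in suffixes:
            if filename.endswith(suffix) and len(suffix) > best_len:
                best, best_len = name, len(suffix)
    return best


FORMATS = {
    "pcap": [".pcap"],
    "tar-gzip": [".tar.gz", ".tgz"],
    "gzip": [".gz"],  # hypothetical entry, included to show longest-match
}
```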
contact
Contact objects describe a person or role. Contacts are used to describe the creators of collections and other cataloged items. Every IMDC login account is also a Contact.
Full name (required) (3 to 256 characters)
For a person, use the full name (e.g., the same name you would use as the author of a paper). For a role, include some form of the organization name, so that the role name is meaningful out of context (e.g., “FooNet Data Manager”, not just “Data Manager”).
location
A Location object represents a URL or other method for downloading or obtaining a single Collection of data.
Download URL (up to 1018 characters)
A URL pointing directly to a directory where the data in the collection can be downloaded. If there is no such URL, give instructions in download procedure instead. If there is a direct URL but it is password-protected or otherwise restricted, you can specify it here, but you should also set availability to “restricted” and give further instructions in download procedure. If you have the URL of a paper describing the data, put it in the Home page field of the Collection or Publication instead.
Availability (required)
“Free” if anyone may obtain the collection without restrictions, or “Restricted” if obtaining the collection requires a password or agreement to an AUP or has other restrictions. If “restricted”, you must give instructions in download procedure on how to obtain permission.
Download procedure (allows markup) (up to 1018 characters)
Additional text instructions on how to obtain the collection, if necessary. This can be omitted only if there is a download URL and there are no special instructions or requirements (i.e., availability is “free”). If your instructions include a URL, remember to use markup.
Geographic location (up to 4000 characters)
Logistic location (up to 4000 characters)
The location of the download server, both geographically and from an organizational viewpoint (e.g. “CAIDA” or “University of Freedonia”). These can help users predict what kind of performance to expect from the server. To maximize the usefulness of geographic location, be complete; e.g. enter “San Diego, California, US” instead of just “San Diego”. Line breaks in these fields are preserved in display.
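The rules tying Download URL, Availability, and Download procedure together can be summarized as a single check: a procedure is required unless there is a download URL and availability is “free”, and a “restricted” collection always needs one. A minimal sketch with a hypothetical `location_problems` helper; case-insensitive handling of the availability value is an assumption.

```python
MAX_LEN = 1018  # limit stated for both Download URL and Download procedure


def location_problems(download_url, availability, procedure):
    """Check a Location's download fields against the rules described above."""
    problems = []
    availability = availability.strip().lower()
    if availability not in ("free", "restricted"):
        problems.append("availability must be Free or Restricted")
    if len(download_url) > MAX_LEN or len(procedure) > MAX_LEN:
        problems.append(f"download URL and procedure are limited to {MAX_LEN} characters")
    # The procedure may be omitted only for a freely available download URL.
    if not procedure.strip() and not (download_url and availability == "free"):
        problems.append("a download procedure is required")
    return problems
```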

Markup

Some large text fields allow the contributor to include markup in the text. There are two choices for type of markup:
none
Line breaks are preserved, and every other sequence of whitespace is collapsed to a single space.
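The “none” rule can be expressed in a line of code. This sketch is an approximation, not DatCat's renderer: it assumes that a Windows-style "\r\n" counts as one line break and that spaces at the start or end of a line are collapsed rather than removed; `normalize_plain` is a hypothetical name.

```python
import re


def normalize_plain(text):
    """Keep line breaks; collapse every other run of whitespace to one space."""
    return "\n".join(re.sub(r"\s+", " ", line) for line in text.splitlines())
```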

HTML
Not true HTML, but a variation that allows a subset of standard HTML tags and attributes, plus the following additional tags:
imdcnum
Formats a number with thin spaces between sets of three digits for improved readability (in browsers with standards-compliant CSS support; in others, imdcnum has no effect). This is preferred over the use of commas or periods, which can be confused with decimal marks. For example, <imdcnum>87654321</imdcnum> will be rendered as 87 654 321 instead of 87,654,321.
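If you want to preview the grouping before submitting, the digit arithmetic is easy to reproduce. Note that this sketch swaps in a different technique from the one described above: it inserts literal U+2009 THIN SPACE characters into a string, whereas imdcnum reportedly achieves the effect with CSS. `group_digits` is a hypothetical helper that handles only a plain string of digits (no sign or decimal part).

```python
THIN_SPACE = "\u2009"


def group_digits(digits, sep=THIN_SPACE):
    """Separate a string of digits into groups of three, counting from the right."""
    head = len(digits) % 3 or 3  # size of the leftmost (possibly short) group
    groups = [digits[:head]] + [digits[i:i + 3] for i in range(head, len(digits), 3)]
    return sep.join(groups)
```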

imdcref
Creates a link to the detail page of an IMDC object. Exactly one of the self, handle, or xid attributes must be specified. When IMDC generates a page containing an <imdcref>, it looks up the object's name at that time and uses it as the link text.

For example, if an object has a marked-up field containing

<imdcref handle="/contact/1-0002-H">

then the generated detail page will contain HTML that renders as Ken Keys and links to the detail page for /contact/1-0002-H.

Another example: to contribute a new object that mentions its own handle in a marked-up field, even though you don't yet know what its handle will be, you could write something like

My handle is <imdcref self asHandle>

End tag: none

Attributes:

self
specifies that the link should refer to the object to which this field belongs.
handle=handle
specifies the IMDC Handle of the IMDC object to which the link will refer.
xid=xid
specifies the xid of the IMDC object to which the link will refer. The xid attribute is valid only in XML submission files, and must refer to an xid defined before the end of the current transaction.
asHandle
display the object's full Handle instead of the object's name.
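To catch mistakes such as giving both self and handle before you submit, you can scan your markup locally. A minimal sketch built on Python's standard `html.parser`, which lowercases attribute names (so asHandle arrives as "ashandle"); the `ImdcrefChecker` class and `find_imdcrefs` function are hypothetical, and this is not how IMDC itself parses markup.

```python
from html.parser import HTMLParser

TARGETS = ("self", "handle", "xid")
ALLOWED = set(TARGETS) | {"ashandle"}  # HTMLParser lowercases attribute names


class ImdcrefChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.refs = []  # one (target attribute, value, as_handle) per imdcref

    def handle_starttag(self, tag, attrs):
        if tag != "imdcref":
            return
        names = [name for name, _ in attrs]
        unknown = set(names) - ALLOWED
        if unknown:
            raise ValueError(f"unknown imdcref attribute(s): {sorted(unknown)}")
        targets = [n for n in names if n in TARGETS]
        if len(targets) != 1:
            raise ValueError("imdcref needs exactly one of self, handle, or xid")
        value = dict(attrs)[targets[0]]  # None for the valueless self attribute
        self.refs.append((targets[0], value, "ashandle" in names))


def find_imdcrefs(markup):
    checker = ImdcrefChecker()
    checker.feed(markup)
    checker.close()
    return checker.refs
```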


imdcsearch
Creates a link to a search result page. The objtype attribute must be specified, along with one or more other attributes to specify the search criteria. String search criteria allow * and ? wildcards; numeric search criteria require equality in order to match.

For example, if an object has a marked-up field containing

<imdcsearch objtype="collection" keywords="wireless">wireless data</imdcsearch>

then the generated detail page will contain HTML that renders as wireless data and links to a search result page as described.

Attributes marked as taking objref values in the list below refer to other objects. The value of these attributes may begin with handle: or name: to indicate that the rest of the value specifies the other object by handle or name. If neither of these prefixes is used, the entire value is treated as a name. For example,

<imdcsearch objtype="collection" contributor="handle:/contact/1-001W-1">Collections contributed by CAIDA</imdcsearch>

will generate HTML that renders as Collections contributed by CAIDA and links to a search result page of all collections whose contributor has the handle "/contact/1-001W-1".
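The objref prefix rule can be sketched as a small parser. `parse_objref` is a hypothetical helper, and treating the prefixes as case-sensitive is an assumption.

```python
def parse_objref(value):
    """Split an objref attribute value into ("handle" or "name", rest)."""
    for prefix in ("handle:", "name:"):
        if value.startswith(prefix):
            return prefix[:-1], value[len(prefix):]
    return "name", value  # no recognized prefix: the whole value is a name
```

Under this reading, the name: prefix is mainly useful when a name itself begins with "handle:".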

End tag: required

Attributes:

objtype=type
The type of object to search for: contact, dformat (data format), pformat (package format), collection, location, or annotation.
name=string
search by object's name
contributor=objref
search for objects with the specified contributor
creator=objref
search for objects with the specified creator
short_desc=string
search by object's short description
desc=string
search by object's description
email=string
for contact only: search by contact's email
cc=string
for contact only: search by contact's 2-letter ISO country code
org=string
for contact only: search by contact's organization
type=type
for contact only: person or role
for formats only: text, binary, or mixed
suffix=string
for formats only: search by format's file suffix
format=objref
for collection only: search for objects with the specified format
size=number
for collection only: search by object's size in bytes
geographic_loc=string
for collections only: search by object's geographic location
logistic_loc=string
for collections only: search by object's logistic location
network_loc=string
for collections only: search by object's network location
keywords=string[,string]...
for format and collection only: search for objects that have a keyword matching all of the strings in the comma-separated list
showform=1
link to a search form instead of a search result page. The form will be pre-filled-in with values according to the other imdcsearch attributes.
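The matching rules for search criteria (wildcards for strings, equality for numbers) can be approximated as follows. This sketch rests on several assumptions the text does not confirm: string matching is case-insensitive, a pattern must match the whole field value rather than a substring (so use *wireless* to find "wireless" anywhere), and keywords=a,b means each pattern must match at least one of the object's keywords. `wildcard` and `matches` are hypothetical names.

```python
import re


def wildcard(pattern):
    """Compile a pattern in which * matches any run of characters and ?
    matches exactly one; every other character is matched literally."""
    regex = "".join(".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
                    for ch in pattern)
    # Assumption: case-insensitive matching of the whole value.
    return re.compile(regex, re.IGNORECASE | re.DOTALL)


def matches(obj, criteria):
    """True if obj (a dict of field values) satisfies every search criterion."""
    for attr, wanted in criteria.items():
        have = obj.get(attr)
        if attr == "keywords":
            # Each comma-separated pattern must match at least one keyword.
            patterns = [wildcard(p.strip()) for p in wanted.split(",")]
            if not all(any(p.fullmatch(k) for k in have or ()) for p in patterns):
                return False
        elif isinstance(wanted, (int, float)):
            if have != wanted:  # numeric criteria require equality
                return False
        elif have is None or not wildcard(wanted).fullmatch(have):
            return False
    return True
```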

The accepted standard HTML tags are: a, b, big, blockquote, br, cite, code, dd, dfn, dl, dt, em, h4, h5, h6, hr, i, kbd, li, ol, p, pre, samp, small, strong, sub, sup, table, td, th, tr, tt, ul, var. The accepted standard HTML attributes are: colspan, href, rowspan, span.
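Before contributing, you can check a marked-up field against these lists. The sketch below only reports problems; the text says IMDC rejects particularly bad markup and silently cleans up the rest, and how it draws that line is not documented. The `MarkupAuditor` class and `audit` function are hypothetical, and the IMDC-specific tags are skipped here because their attributes follow their own rules.

```python
from html.parser import HTMLParser

ALLOWED_TAGS = set("""a b big blockquote br cite code dd dfn dl dt em h4 h5 h6 hr i
    kbd li ol p pre samp small strong sub sup table td th tr tt ul var""".split())
ALLOWED_ATTRS = {"colspan", "href", "rowspan", "span"}
IMDC_TAGS = {"imdcnum", "imdcref", "imdcsearch"}  # have their own attribute rules


class MarkupAuditor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag in IMDC_TAGS:
            return
        if tag not in ALLOWED_TAGS:
            self.problems.append(f"<{tag}> is not an accepted tag")
            return
        for name, _ in attrs:
            if name not in ALLOWED_ATTRS:
                self.problems.append(f"{name}= is not accepted on <{tag}>")


def audit(markup):
    auditor = MarkupAuditor()
    auditor.feed(markup)
    auditor.close()
    return auditor.problems
```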

Particularly bad cases of invalid markup will be rejected at contribution time. Less bad cases will be silently and automatically cleaned up.