| Field | Value |
|---|---|
| Summary | This dataset describes a measurement of Web API (DOM) browser features used on the open web. Each site in the Alexa 10k was visited using an automated measurement technique described in Snyder et al. |
| Duration | 29 days (2,505,600 s) |
| Network location | University of Illinois at Chicago |
| Platform | Ubuntu 16.04.1 LTS |
| Primary contact | Peter Snyder |
| Creators | Peter Snyder, Chris Kanich, Cynthia Taylor, Lara Ansari |
| Keywords | alexa 10k, privacy, web browser |
The structure of the dataset is documented in per-
The dataset records the use of Web API features in the Alexa 10k. Each domain is represented by a row in the *domains* table, and each Web API feature by a row in the *features* table. The W3C (or similar standards organization) document defining each feature is represented by a row in the *standards* table.
Each site in the Alexa 10k was visited multiple times under different configurations (first in an unmodified browser, then again with popular browser extensions installed, etc.). Each of these configurations (or test conditions) is represented by a row in the *conditions* table.
We visited each domain in the Alexa 10k five times under each browser configuration. Each of these visits is a row in the *crawls* table, and each page (URL) visited during a visit to a domain is a row in the *pages* table. Each Web API feature used by each page is a row in the *features_
Finally, the reported security vulnerabilities (CVEs) we were able to associate with each standard are described in the *cves* table.
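To make the relationships between these tables concrete, the following is a minimal sketch using Python's built-in `sqlite3` module. It assumes the tables are related through foreign keys as described above; every column name, and the name `feature_uses` for the page/feature table (whose real name is truncated in this description), is hypothetical, and the toy rows are invented for illustration. The query counts how many distinct domains used at least one feature from each standard under a given test condition.

```python
import sqlite3

# Hypothetical schema: table names follow the description above; all column
# names, and the table name "feature_uses", are assumptions for illustration.
SCHEMA = """
CREATE TABLE standards  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE features   (id INTEGER PRIMARY KEY, name TEXT,
                         standard_id INTEGER REFERENCES standards(id));
CREATE TABLE domains    (id INTEGER PRIMARY KEY, domain TEXT);
CREATE TABLE conditions (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE crawls     (id INTEGER PRIMARY KEY,
                         domain_id INTEGER REFERENCES domains(id),
                         condition_id INTEGER REFERENCES conditions(id));
CREATE TABLE pages      (id INTEGER PRIMARY KEY,
                         crawl_id INTEGER REFERENCES crawls(id), url TEXT);
CREATE TABLE feature_uses (page_id INTEGER REFERENCES pages(id),
                           feature_id INTEGER REFERENCES features(id),
                           num_calls INTEGER);
CREATE TABLE cves       (id TEXT PRIMARY KEY,
                         standard_id INTEGER REFERENCES standards(id));
"""

def domains_per_standard(conn, condition_label):
    """Count distinct domains that used at least one feature of each
    standard under the given test condition, ordered by standard name."""
    query = """
        SELECT s.name, COUNT(DISTINCT c.domain_id)
        FROM feature_uses fu
        JOIN pages p      ON p.id = fu.page_id
        JOIN crawls c     ON c.id = p.crawl_id
        JOIN conditions k ON k.id = c.condition_id
        JOIN features f   ON f.id = fu.feature_id
        JOIN standards s  ON s.id = f.standard_id
        WHERE k.label = ?
        GROUP BY s.name
        ORDER BY s.name
    """
    return conn.execute(query, (condition_label,)).fetchall()

# Toy data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO standards VALUES (?, ?)",
                 [(1, "DOM"), (2, "Canvas")])
conn.executemany("INSERT INTO features VALUES (?, ?, ?)",
                 [(1, "Document.prototype.createElement", 1),
                  (2, "HTMLCanvasElement.prototype.getContext", 2)])
conn.executemany("INSERT INTO domains VALUES (?, ?)",
                 [(1, "example.com"), (2, "example.org")])
conn.executemany("INSERT INTO conditions VALUES (?, ?)",
                 [(1, "unmodified"), (2, "extensions")])
conn.executemany("INSERT INTO crawls VALUES (?, ?, ?)",
                 [(1, 1, 1), (2, 2, 1), (3, 1, 2)])
conn.executemany("INSERT INTO pages VALUES (?, ?, ?)",
                 [(1, 1, "http://example.com/"), (2, 2, "http://example.org/"),
                  (3, 3, "http://example.com/")])
conn.executemany("INSERT INTO feature_uses VALUES (?, ?, ?)",
                 [(1, 1, 4), (1, 2, 1), (2, 1, 2), (3, 1, 3)])

print(domains_per_standard(conn, "unmodified"))
# [('Canvas', 1), ('DOM', 2)]
```

Before running a query like this against the real data, substitute the actual table and column names from the dataset's own documentation.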
The following is an abbreviated discussion of the methodology used to generate this dataset. A full description can be found in Snyder et al.
We first identified 1,390 distinct Web API features implemented in Firefox 46.0.1, and tied them to their 75 defining standards documents. This was done by reviewing the Firefox source and inspecting the DOM implemented in the browser.
Next, we built a browser extension that instruments the browser to count the number of times each of these browser features is used by a website.
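The description does not say exactly how the extension hooks each feature. One common way to count uses of an interface is to replace each method with a wrapper that increments a counter and then calls the original. The sketch below shows that general technique as a Python analogue; it is not the extension's code (which runs as JavaScript inside Firefox), and the `Document` class, its methods, and `instrument` are hypothetical stand-ins.

```python
import functools
from collections import Counter

# Number of calls observed per feature, keyed as "ClassName.method".
feature_counts = Counter()

def _make_counter(key, original):
    """Return a wrapper that records one call under `key`, then delegates."""
    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        feature_counts[key] += 1
        return original(*args, **kwargs)
    return wrapper

def instrument(cls, method_names):
    """Replace each named method on `cls` with a counting wrapper."""
    for name in method_names:
        original = getattr(cls, name)
        setattr(cls, name, _make_counter(f"{cls.__name__}.{name}", original))

class Document:
    """Hypothetical stand-in for a browser interface; not a real DOM."""
    def create_element(self, tag):
        return {"tag": tag}

    def query_selector(self, selector):
        return None

instrument(Document, ["create_element", "query_selector"])
doc = Document()
doc.create_element("div")
doc.create_element("span")
doc.query_selector("#main")
print(dict(feature_counts))
# {'Document.create_element': 2, 'Document.query_selector': 1}
```

Because the wrapper delegates to the original method, the page's behavior is unchanged while every call is tallied per feature.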
This browser extension also implements a spidering technique that attempts to elicit all the functionality used on unauthenticated parts of a website. The extension interacts with the visited page by clicking on elements, filling in form elements, and scrolling the page, among other common user activities. The extension watches the URLs that would have been visited by these actions (e.g., clicking on links), then selects 3 URLs with dissimilar paths to visit. The extension then visits these URLs and repeats the process.
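The description does not define how path dissimilarity is measured. The following is a minimal sketch of one way to choose 3 URLs with dissimilar paths. It assumes Jaccard distance between the sets of path segments, and a greedy rule that repeatedly picks the candidate farthest from those already chosen. The metric, the choice of the first candidate as the seed, and the function names are all hypothetical.

```python
from urllib.parse import urlparse

def path_segments(url):
    """Split a URL's path into its non-empty segments."""
    return tuple(seg for seg in urlparse(url).path.split("/") if seg)

def path_distance(a, b):
    """Jaccard distance between the path-segment sets of two URLs
    (assumed metric): 0.0 for identical sets, 1.0 for disjoint ones."""
    sa, sb = set(path_segments(a)), set(path_segments(b))
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)

def select_dissimilar(candidates, k=3):
    """Greedily pick up to k URLs, each time taking the candidate whose
    minimum distance to the already-chosen URLs is largest."""
    remaining = list(dict.fromkeys(candidates))  # dedupe, keep order
    if not remaining:
        return []
    chosen = [remaining.pop(0)]  # assumed seed: the first candidate seen
    while remaining and len(chosen) < k:
        best = max(remaining,
                   key=lambda u: min(path_distance(u, c) for c in chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen

candidates = [
    "https://example.com/news/world/1",
    "https://example.com/news/world/2",
    "https://example.com/news/sports/3",
    "https://example.com/shop/cart",
    "https://example.com/about",
]
print(select_dissimilar(candidates))
# ['https://example.com/news/world/1', 'https://example.com/shop/cart', 'https://example.com/about']
```

A greedy max-min rule like this avoids spending the 3 visits on near-duplicate pages such as successive articles under the same path prefix.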
The spidering technique described above is repeated depth-
We then installed the above spidering-
Finally, we visited each domain in the Alexa 10k five times with each browser, resulting in 96,609 domain visits (some sites were unreachable, either globally or from our measurement location). The features used on each domain were recorded and included in the present dataset.
1 data file (480 MiB)