
WoS XML

Access guide to the Web of Science Raw Data (XML)

About

The Web of Science (WoS) XML data provides the raw data behind the Web of Science Database for the years 1900-2022.

Terms and Conditions for Access and Use

  • The Web of Science XML data is intended for non-commercial, academic research.
  • The data is restricted to use by faculty, staff, students, and researchers at the Georgia Institute of Technology. Because the data's publisher must approve in advance any use or storage on devices physically located outside the United States, you must first seek and receive written approval from the Library's Data Scientist Librarian before using the data outside the United States. Commercial use of the data or its derivatives is strictly prohibited.
  • The data and its derivatives may not be shared outside the Georgia Institute of Technology, including with other universities, institutions, government agencies, or corporate entities.
  • You may no longer use the data set if your affiliation with Georgia Tech ends, including graduation, retirement, resignation, or termination.
  • Where users quote and excerpt Licensed Information in their work as permitted by the Agreement, they must appropriately cite and credit Clarivate as the source. Attribution to Clarivate and use of the Licensed Information must not categorize or identify Clarivate as an 'expert' in any context, and users must ensure that Licensed Information is not misrepresented or taken out of context. Without Clarivate's prior written consent, the Licensed Information shall not be filed with any securities authorities.

This is a large and complex data set, and the GT Library will continue to evolve its support for the product. We are currently at Phase 0.

Raw Data

  • We provide raw data access as compressed files (via Dropbox) and as uncompressed files (via Active Directory).
  • A signed researcher agreement is required; you can request access by contacting jay.forrest@library.gatech.edu.
  • End users are responsible for abiding by the terms and conditions for access and use, and for developing their own infrastructure and code to analyze the files. (See code examples below.)
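Because the files are large, a streaming parser that handles one record at a time keeps memory use bounded. The Python sketch below is one possible starting point, not a supported tool. It assumes the delivered files are gzip (.gz), zip (.zip), or plain XML, and that each record is a `<REC>` element (our understanding of the Clarivate WoS XML schema; verify it against the files and documentation you receive). The helper names `local_name`, `open_xml_streams`, and `iter_records` are hypothetical.

```python
import gzip
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import IO, Iterator


def local_name(tag: str) -> str:
    """Drop any XML namespace: '{http://...}REC' -> 'REC'."""
    return tag.rsplit("}", 1)[-1]


def open_xml_streams(path: Path) -> Iterator[IO[bytes]]:
    """Yield a binary stream for each XML document in a .gz, .zip, or plain file.

    Assumption: the delivered files come in one of these three forms.
    Each stream is closed when the caller moves on to the next one.
    """
    if path.suffix == ".gz":
        with gzip.open(path, "rb") as fh:
            yield fh
    elif path.suffix == ".zip":
        with zipfile.ZipFile(path) as zf:
            for name in zf.namelist():
                if name.lower().endswith(".xml"):
                    with zf.open(name) as fh:
                        yield fh
    else:
        with open(path, "rb") as fh:
            yield fh


def iter_records(stream: IO[bytes], record_tag: str = "REC") -> Iterator[ET.Element]:
    """Yield one record element at a time without holding the whole file in memory.

    Assumption: each record is an element whose local name is `record_tag`,
    and records are direct children of the root element.
    """
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and local_name(elem.tag) == record_tag:
            yield elem
            root.clear()  # drop finished records so memory stays flat
```

For example, `sum(1 for _ in iter_records(stream))` counts the records in each stream returned by `open_xml_streams(Path("some_file.xml.gz"))`, where the file name is a placeholder. Clearing the root after each record is a common ElementTree idiom for large files, so do any per-record work inside the loop rather than keeping references for later.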

Sandbox Data

  • WoS XML data is one of the data sets available in ProQuest TDM Studio, listed as Web of Science Metadata. TDM Studio allows you to pull slices of the data and analyze them with your own code.
    • Link to TDM Studio: https://tdmstudio.proquest.com/home
    • Register using your @gatech.edu account.


Code Examples
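As one illustration, the sketch below turns each record into a row holding its WoS ID, article title, and publication year, and writes the rows to a CSV file. The element paths (`UID`, `static_data/summary/titles/title[@type='item']`, and the `pubyear` attribute of `static_data/summary/pub_info`) reflect our understanding of the WoS XML schema and are assumptions to check against the schema documentation for your files; the function names `record_to_row` and `write_csv` and the three-column layout are hypothetical. It reads an uncompressed XML file, such as those available via Active Directory, and needs Python 3.8 or later for the `{*}` namespace wildcard.

```python
import csv
import xml.etree.ElementTree as ET
from typing import Dict, Optional

FIELDS = ["uid", "title", "pubyear"]  # hypothetical output columns


def record_to_row(rec: ET.Element) -> Dict[str, Optional[str]]:
    """Extract a few fields from one <REC> element.

    Assumed schema paths (verify against the schema documentation for your files):
      UID                                             WoS accession number
      static_data/summary/titles/title[@type='item']  article title
      static_data/summary/pub_info/@pubyear           publication year
    '{*}' matches any namespace (Python 3.8+). Missing elements come back as None.
    """
    uid = rec.findtext("{*}UID")
    title = rec.findtext("{*}static_data/{*}summary/{*}titles/{*}title[@type='item']")
    pub_info = rec.find("{*}static_data/{*}summary/{*}pub_info")
    year = pub_info.get("pubyear") if pub_info is not None else None
    return {"uid": uid, "title": title, "pubyear": year}


def write_csv(xml_path: str, csv_path: str, record_tag: str = "REC") -> int:
    """Stream records from an uncompressed XML file into a CSV; return the row count."""
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        context = ET.iterparse(xml_path, events=("start", "end"))
        _, root = next(context)  # first event: start of the root element
        for event, elem in context:
            # Compare local names so the namespace URI does not matter.
            if event == "end" and elem.tag.rsplit("}", 1)[-1] == record_tag:
                writer.writerow(record_to_row(elem))
                count += 1
                root.clear()  # discard processed records to bound memory
    return count
```

For example, `write_csv("records.xml", "records.csv")` (both file names are placeholders) returns the number of records written. Adding a column means adding another path lookup to `record_to_row` and a matching entry in `FIELDS`.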