Web of Science CHTC Tutorial

What follows is a guide to deploying a codebase developed by the UW-Madison Library Technology Group (LTG) for researchers interested in performing citation analysis on the Clarivate Web of Science (WOS) dataset. The dataset contains millions of article records and billions of cited references within those articles. That size poses challenges to standard models of analysis, but it is also what makes large-scale analysis of citation patterns possible.

This workflow addresses the problem of scale by drawing on the computing resources available at UW-Madison’s Center for High Throughput Computing (CHTC). Together, the codebase and the CHTC resources give researchers the precision and power to locate particular items within a massive dataset while keeping complete metadata for every record.

Who would use this code?

The process outlined in this guide is meant as a general introduction for any researcher interested in performing citation analysis with computational tools. The codebase is designed to extract a subset of article records from the WOS dataset and then trace each reference within each article. This allows researchers to find highly specific, related items within the dataset’s massive network of citations.

The inputs

Custom search results obtained through the WOS user interface
These search results serve as the criteria for selecting matching records from the WOS dataset

The outputs

A selection of article records drawn from the full dataset
The fullest available metadata for those article records
Full metadata records for the references cited by those article records
Unambiguous references to cited articles, linked by ID
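
The relationship between the inputs and outputs can be sketched in a few lines of Python. This is only an illustration of the general idea (select records by ID, then resolve each cited reference to its full record), not the LTG codebase itself. The `Record` class, its field names, the function names, and the toy data are all hypothetical, and the real WOS records and CHTC workflow are considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical, simplified stand-in for a WOS article record."""
    uid: str
    title: str
    year: int
    cited_uids: list[str] = field(default_factory=list)


def select_records(dataset, search_uids):
    """Keep only the records whose ID appears in the search results."""
    wanted = set(search_uids)
    return [rec for rec in dataset.values() if rec.uid in wanted]


def resolve_references(dataset, selected):
    """Map each selected record's ID to the full records it cites.

    Cited IDs that are missing from the dataset are skipped.
    """
    return {
        rec.uid: [dataset[c] for c in rec.cited_uids if c in dataset]
        for rec in selected
    }


# Toy data, invented for illustration only.
dataset = {
    "A1": Record("A1", "Recent study", 2015, ["B1", "B2"]),
    "A2": Record("A2", "Unrelated study", 2016, ["B2"]),
    "B1": Record("B1", "Older study", 1980, ["C1"]),
    "B2": Record("B2", "Another older study", 1975),
    "C1": Record("C1", "Early study", 1910),
}

# Assumption: the search results have been reduced to a list of record IDs.
search_uids = ["A1"]

selected = select_records(dataset, search_uids)
references = resolve_references(dataset, selected)

for uid, cited in references.items():
    print(uid, "cites", [(r.uid, r.year) for r in cited])
```

Because each reference resolves to a full record with its own list of cited IDs, the same lookup can be repeated on the cited records to follow a citation chain further back.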

The value of the outputs

The outputs of the analysis have great potential for revealing large-scale citation patterns extending from the present back through the year 1900. Researchers can analyze citation chains extending back over a century while preserving all metadata for every record in each chain. The results can thus support both broad network analysis and close examination of specific article records.
