What follows is a guide to deploying a codebase developed by the UW-Madison Library Technology Group (LTG) for researchers interested in performing citation analysis on the Clarivate Web of Science (WOS) dataset. Because it contains millions of article records and billions of cited references within those articles, this dataset is large enough to pose challenges to standard methods of analysis, yet that same size is what makes large-scale analysis of citation patterns possible.
This workflow addresses the scale issues by drawing on the computing resources available at UW-Madison’s Center for High Throughput Computing (CHTC). Together, the codebase and the CHTC resources give researchers the precision and power to locate particular items in a massive dataset while maintaining complete metadata detail for every record.
The process outlined in this guide is meant as a general introduction for any researcher interested in performing citation analysis with computational tools. The codebase is designed to extract a subset of article records from the WOS dataset and then trace each reference within each article. This allows researchers to find highly specific, related items within the dataset’s massive network of citations.
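The extract-then-trace idea can be illustrated with a minimal Python sketch. This is not the LTG codebase: the record layout, the `refs` field, and the function names `extract_subset` and `trace_references` are all hypothetical, and the five toy records stand in for a dataset that in practice is far too large to hold in a single dictionary.

```python
from collections import deque

# Hypothetical toy records; the real WOS schema and the LTG code differ.
RECORDS = {
    "A": {"title": "Network analysis of citations", "year": 2015, "refs": ["B", "C"]},
    "B": {"title": "Citation indexing", "year": 1990, "refs": ["C", "D"]},
    "C": {"title": "Early bibliometrics", "year": 1955, "refs": []},
    "D": {"title": "Scientific journals", "year": 1920, "refs": []},
    "E": {"title": "Unrelated study", "year": 2015, "refs": ["D"]},
}


def extract_subset(records, predicate):
    """Return the sorted IDs of records that satisfy the predicate."""
    return sorted(rid for rid, rec in records.items() if predicate(rec))


def trace_references(records, seed_ids, max_depth=None):
    """Follow cited references breadth-first, starting from the seed records.

    Returns the citation edges found as (citing_id, cited_id) pairs and a
    mapping from each reached record ID to its distance from the seeds.
    """
    edges = []
    depth_of = {rid: 0 for rid in seed_ids}
    queue = deque(seed_ids)
    while queue:
        current = queue.popleft()
        depth = depth_of[current]
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand records at the depth limit
        for cited in records[current]["refs"]:
            if cited not in records:
                continue  # reference points outside the toy dataset
            edges.append((current, cited))
            if cited not in depth_of:
                depth_of[cited] = depth + 1
                queue.append(cited)
    return edges, depth_of


# Step 1: extract a subset of article records (here, by a title keyword).
seeds = extract_subset(RECORDS, lambda rec: "citation" in rec["title"].lower())

# Step 2: trace each reference within each article in the subset.
edges, depth_of = trace_references(RECORDS, seeds)

for citing, cited in edges:
    print(f"{citing} ({RECORDS[citing]['year']}) cites {cited} ({RECORDS[cited]['year']})")
```

Because every edge refers back to full records by ID, the metadata for each item in a chain stays available alongside the network structure.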
The outputs of the analysis can reveal large-scale citation patterns extending from the present back to the year 1900. Researchers can follow citation chains spanning more than a century while preserving all metadata for every record in each chain. The results can thus support broad network analysis and close examination of specific article records at the same time.