You will need to set up a working location on your local computer: a folder that will contain copies of the two primary code bases used in this tutorial. For the remainder of this tutorial we will assume that you have created a new folder called web-of-science-export within your computer’s Desktop folder. We will refer to this location using the Unix-style path convention ~/Desktop/web-of-science-export.
If you are a Windows user, Git's line-ending settings can prevent your scripts from running correctly on the Unix-based CHTC systems and with the Bash commands. Before cloning wos-explorer, run these two commands to make sure your copy uses the correct line endings (you will not see anything happen, but if there is no error message it has likely worked):
git config --global core.autocrlf false
git config --global core.eol lf
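Because these commands print nothing, it can be reassuring to check a cloned folder for Windows line endings afterwards. The following is a minimal sketch, not part of either code base: the helper name `files_with_crlf` is hypothetical, and the list of file extensions is an assumption about which script types matter.

```python
from pathlib import Path


def files_with_crlf(root, patterns=("*.sh", "*.py", "*.sub", "*.dag")):
    """Return the files under root that contain Windows (CRLF) line endings.

    The default patterns are an assumption about which script types are
    present; adjust them to match the files in your clone.
    """
    offenders = []
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            # Read raw bytes so Python does not translate line endings.
            if path.is_file() and b"\r\n" in path.read_bytes():
                offenders.append(path)
    return sorted(offenders)
```

Calling `files_with_crlf("wos-explorer")` from the project folder after cloning should return an empty list if the settings took effect.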
On your computer, navigate in your terminal program to the project folder:
cd ~/Desktop/web-of-science-export
Clone the CHTC Recipes project:
git clone https://github.com/UW-Madison-Library/chtc-recipes.git
We will only be using the recipe/project in the sub-folder wos-findbyexport.
The CHTC Recipes themselves are collections of scripts for the CHTC’s Condor high throughput system. This code base includes Unix/Linux bash scripts, Python scripts and Condor scripts. The Python code requires the use of a custom Python library developed by the Libraries called the Web of Science Explorer. When running these scripts on the CHTC’s computing cluster, each compute node in the cluster will run the code using a Docker container. This container is predefined by the Libraries and includes all the dependencies required, namely Python 3 and the Web of Science Explorer package.
If you would like to use the Python code base on your own computer while testing out your own custom scripts, you will need to clone or download the project from GitHub and install it from the source code. Cloning and installing the wos-explorer package is a relatively quick and easy process.
Navigate back to the web-of-science-export directory and clone the wos-explorer project:
cd ~/Desktop/web-of-science-export
git clone https://github.com/UW-Madison-Library/wos-explorer.git
Next we will build this code base as a package so that it can be used by the CHTC recipe. Note that the python command below is assumed to be aliased to Python version 3. The Web of Science Explorer package and these tutorials have only been tested using Python 3.
cd wos-explorer
python setup.py sdist
A compressed tar file will appear in the dist directory that can be installed using the Python pip command:
cd dist/
pip install wos_explorer-0.8.0.tar.gz
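The tarball's file name embeds the package version, so one way to avoid typing it by hand is to look up whatever the build produced. This is only a sketch under stated assumptions: it is run from the wos-explorer project folder, the tarball name starts with `wos_explorer-`, and the helper names `find_sdist` and `install_sdist` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def find_sdist(dist_dir="dist", prefix="wos_explorer-"):
    """Return the most recently modified source tarball in dist_dir, or None."""
    candidates = sorted(
        Path(dist_dir).glob(f"{prefix}*.tar.gz"),
        key=lambda p: p.stat().st_mtime,
    )
    return candidates[-1] if candidates else None


def install_sdist(tarball):
    """Install the tarball with pip, using the interpreter running this script."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", str(tarball)],
        check=True,
    )
```

A typical use would be `tarball = find_sdist()` followed by `install_sdist(tarball)` once `tarball` is confirmed not to be `None`.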
Note that the version number in the generated Python package may be different. You can then import objects into your own Python scripts:
from wos_explorer.article_collection import ArticleCollection
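If the import fails, a quick way to tell whether the package was installed into the Python environment you are actually running is to ask the import system directly. The sketch below uses only the standard library; the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package with this name can be found for import."""
    return importlib.util.find_spec(package_name) is not None


# Prints True only if wos_explorer is visible to this interpreter.
print(is_installed("wos_explorer"))
```

A result of False usually means pip installed the package for a different Python interpreter than the one running your script.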
To get started, log in to the Web of Science database on the UW–Madison Library page.
To log in to the CHTC server, double-check that you are connected to WiscVPN, then enter the credentials you received during your CHTC onboarding meeting:
ssh <netid>@submit-1.chtc.wisc.edu
Enter your password when prompted (it will not appear on the screen). Note that your submit server number may vary. For example, your CHTC account may have been assigned to the submit-2 server. If you have trouble logging in, you may need to check with the CHTC Research Facilitators to verify that their servers are configured to allow connections from the VPN pool that you are using. The general UW-Madison VPN pool should permit access.
Once you are logged in you will see “CHTC” spelled out in oversized characters.
Next, make the directory where you will upload the wos-findbyexport scripts:
mkdir wos-findbyexport
Type exit to return to your own computer.
Once back on your computer, navigate to the directory in which you have the wos-findbyexport scripts and use a secure copy (scp) command to copy them to the CHTC submit server. A command like this will suffice:
scp -r ./* <netid>@submit-1.chtc.wisc.edu:/home/<netid>/wos-findbyexport
It will prompt you for your password. As it copies the files from your machine to the CHTC server they will print to the screen.
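Your NetID appears twice in that command, once as the login name and once in the remote path, and the two must match. As an illustration only, the same command can be assembled in Python so the NetID is supplied once. The helper name `build_scp_upload` and its default values are assumptions; substitute your own submit server if it differs.

```python
from pathlib import Path


def build_scp_upload(netid, local_dir=".", host="submit-1.chtc.wisc.edu",
                     remote_dir="wos-findbyexport"):
    """Build an scp argument list that copies the contents of local_dir.

    Hidden files (names starting with ".") are skipped, matching what the
    shell pattern ./* would select.
    """
    sources = sorted(
        str(p) for p in Path(local_dir).iterdir() if not p.name.startswith(".")
    )
    target = f"{netid}@{host}:/home/{netid}/{remote_dir}"
    return ["scp", "-r", *sources, target]
```

The returned list can be passed to `subprocess.run(..., check=True)`, which will still prompt for your password in the terminal.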
You are now ready to run the jobs on the CHTC server. For reference, the CHTC also provides its own instructions on starting a job on the CHTC server.
The first step is to log in to the submit server again, navigate to the wos-findbyexport directory you created, and run the Condor submit command followed by the .dag file you want it to execute:
condor_submit_dag wos-findbywosexport.dag
The .dag file will schedule which jobs to run, so for now, all you need to do is check in on the process periodically to make sure it continues to run correctly.
There are several commands that allow you to check on the process, each with their own features. The most basic one is:
condor_q
The CHTC has created an extended guide, "Learning About Your Jobs Using condor_q", on how to evaluate your jobs as they run using variations on the condor_q command. The guide lists the commands you can run to examine multiple aspects of the jobs as they run or to view them in formats that fit your needs.
Once you are sure the process has completed, you can begin viewing the output files to check the results. You will be looking for the JSON output files located in the CHTC’s staging file system. You can access the staging file system using the transfer.chtc.wisc.edu server.
Because the CHTC servers are not intended for storage, it is best practice to download your output files and then remove them from the server. You can use any secure file transfer program to do this. To fetch the files using the Unix secure copy utility, run the following command in your terminal window:
scp <USERNAME>@transfer.chtc.wisc.edu:/staging/<USERNAME>/findbyexportfile-matches/*.gz .
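Once the compressed files are on your computer, you can inspect them without unpacking them by hand. The sketch below uses only the Python standard library. The internal layout of the output files is not specified here, so the code makes an assumption: it first tries to parse each file as a single JSON document and, if that fails, treats each non-empty line as a separate JSON record. The helper names are hypothetical.

```python
import gzip
import json
from pathlib import Path


def load_gz_json(path):
    """Decompress a .gz file and parse its contents as JSON.

    Assumption: the file is either one JSON document or one JSON record
    per line. Anything else raises json.JSONDecodeError.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        text = handle.read()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def describe_downloads(directory="."):
    """Map each .gz file name in directory to the type of data it parsed to."""
    return {
        path.name: type(load_gz_json(path)).__name__
        for path in sorted(Path(directory).glob("*.gz"))
    }
```

Running `describe_downloads()` in the folder you copied the files into gives a quick check that every download decompresses and parses before you delete the originals from the server.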
After you have downloaded all the necessary files, remove everything from the submit server.