Imagery Viewer with data layers on the Permafrost Discovery Gateway

The Command Line

Although the terminal lacks a friendly GUI, running commands in it is faster than clicking through your IDE’s file explorer, and it’s highly useful for working with large datasets across machines. It’s unforgiving about capitalization and typos, and most commands can’t be undone, but it’s so worth it.

The following are my favorite terminal commands for branching repositories, counting files in large data directories, running scripts in the background, and more.

General Commands & tmux

These are general commands to navigate directories and shuffle them around. Modifying large dataset structures can overwhelm an IDE, so it’s best done in the terminal.

Additionally, running commands inside tmux lets processes keep running in the background, even if the connection to the server from VS Code is lost. Separate tmux sessions also let you run multiple processes at once, so you can transfer a directory in one session while running scripts in another, or just close your laptop while a script runs on the server without worrying about the laptop falling asleep.

| Use | Command |
| --- | --- |
| list files and directories one per line, with permissions, size, and date last modified | `ls -l` |
| list all files and directories, including hidden ones | `ls -a` |
| remove directory and all contents recursively | `rm -r {DIRECTORY}` |
| move file or directory | `mv file_name new_path/` |
| rename file or directory in current directory | `mv file_name new_name` |
| count entries (files and directories, excluding hidden ones) in the current directory | `ls -1 \| wc -l` (pay attention to ‘l’ versus ‘1’ here) |
| count files of any kind, recursively, from current dir | `find . -type f \| wc -l` |
| count files with a certain extension, recursively | `find . -type f -name "*.{EXTENSION}" \| wc -l` |
| check total data storage in `documents` directory | install Rust (which provides `cargo`) with: `curl https://sh.rustup.rs -sSf \| sh`, then install the package with: `cargo install dirstat-rs`, then run: `ds documents` |
| create symbolic link to folder | `ln -s /path/to/folder {LINK NAME}` |
| create new tmux session | `tmux` |
| detach from `tmux` session & leave it running in the background | `ctrl` + `b`, then `d` |
| attach to a specific `tmux` session | `tmux a -t {SESSION ID}` |
| list all active `tmux` sessions | `tmux ls` |
| kill `tmux` session | `tmux kill-session -t {SESSION ID}` |
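
As a rough sketch of the tmux workflow described above, a long-running job can be started in a named, detached session in one step instead of opening a session and detaching by hand. The function name `start_background_job`, the session name `transfer`, and the script `process_tiles.py` are made up for illustration.

```shell
# Start a command in a detached tmux session so it keeps running after
# the terminal or SSH connection closes.
#   $1 = session name (hypothetical, pick your own)
#   $2 = command to run inside the session
start_background_job() {
    tmux new-session -d -s "$1" "$2"
}

# Usage sketch (not run here):
#   start_background_job transfer 'python process_tiles.py'   # hypothetical script
#   tmux ls                        # confirm the session exists
#   tmux a -t transfer             # reattach; detach again with ctrl+b, then d
#   tmux kill-session -t transfer  # clean up when finished
```

With tmux's default settings, a session started this way ends on its own once the command exits, so a missing session in `tmux ls` usually means the job has finished.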

Count the files directly inside the current directory and directly inside each of its immediate subdirectories, printing one count per directory (run all as one command):

find . -maxdepth 1 -type d -print0 | sort -z | while IFS= read -r -d '' dir; do n=$(find "$dir" -maxdepth 1 -type f | wc -l); printf "%4d : %s\n" "$n" "$dir"; done
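
Because the inner `find` in the command above uses `-maxdepth 1`, files nested more than one level down are not counted. If a full recursive total per subdirectory is what you need, one possible variant drops that inner limit. This is a sketch for bash with GNU `find` and `sort`; the function name and the demo file names are invented for illustration.

```shell
# Print "count : directory" for each immediate subdirectory of $1
# (default: current directory), counting files at every depth below it.
count_files_recursive_per_dir() {
    root="${1:-.}"
    find "$root" -mindepth 1 -maxdepth 1 -type d -print0 | sort -z |
    while IFS= read -r -d '' dir; do
        # No -maxdepth here, so nested files are included in the count.
        n=$(find "$dir" -type f | wc -l)
        printf "%4d : %s\n" "$n" "$dir"
    done
}

# Demo on a throwaway directory (illustrative names only).
demo=$(mktemp -d)
mkdir -p "$demo/tiles/nested" "$demo/logs"
touch "$demo/tiles/a.tif" "$demo/tiles/nested/b.tif" "$demo/logs/run.log"
count_files_recursive_per_dir "$demo"
rm -r "$demo"
```

Here `tiles` is reported with 2 files because `nested/b.tif` is included, whereas the one-level command would report 1.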

GitHub

| Use | Command |
| --- | --- |
| cache GitHub credentials on the server so they don’t need to be re-entered | `git config --global credential.helper "cache --timeout=100000000"` |
| switch into branch `develop` | `git checkout develop` |
| push to branch `develop` | `git add {FILES}`, `git commit -m "{MESSAGE}"`, `git push origin develop` |
| create new branch | `git checkout -b {NewBranchName}` |
| create new branch from the `develop` branch | switch to `develop` branch, pull updates with `git pull`, then: `git checkout -b {NewBranchName} develop` |
| print all branches in repo | `git branch -a` |
| check current branch, uncommitted changes, and whether the branch is ahead of or behind its remote | `git status` |
| check recent commits | `git log` |
| merge `develop` branch into `main` | push changes from `develop`, then: `git checkout main`, then: `git merge develop` |
| display repo’s branching & commit history as a tree | `git log --graph` |
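
To see how the branching commands fit together, here is a minimal sketch that runs them end to end in a throwaway local repository. There is no remote in this demo, so the `git push` and `git pull` steps are left out. The function name, the branch name `feature-counts`, the file `notes.txt`, and the placeholder identity are all invented for illustration.

```shell
# Build a small repository in directory $1 and walk through:
# main -> develop -> feature branch -> merge back into develop -> merge into main.
demo_branch_workflow() {
    repo="$1"
    git init -q "$repo"
    git -C "$repo" symbolic-ref HEAD refs/heads/main      # name the first branch "main"
    git -C "$repo" config user.name "Example User"        # placeholder identity, local to this repo
    git -C "$repo" config user.email "user@example.com"

    # First commit on main, then create develop from it.
    echo "v1" > "$repo/notes.txt"
    git -C "$repo" add notes.txt
    git -C "$repo" commit -q -m "Initial commit"
    git -C "$repo" checkout -q -b develop

    # Create a new branch from develop and commit a change on it.
    git -C "$repo" checkout -q -b feature-counts develop
    echo "v2" >> "$repo/notes.txt"
    git -C "$repo" add notes.txt
    git -C "$repo" commit -q -m "Update notes"

    # Merge the feature branch into develop, then develop into main.
    git -C "$repo" checkout -q develop
    git -C "$repo" merge -q feature-counts
    git -C "$repo" checkout -q main
    git -C "$repo" merge -q develop
}

demo_repo=$(mktemp -d)
demo_branch_workflow "$demo_repo"
git -C "$demo_repo" log --graph --oneline --all   # history as a tree
rm -rf "$demo_repo"
```

In a real project you would run the same commands without `-C`, from inside the repository, and push each branch to the remote before merging.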

File & Directory Transfers

Options:

scp - best for small numbers of files and minimal complexity in commands
rsync - best for large numbers of files, especially with complex directory hierarchies; works well for transfers within a computer, between computers, or between a computer and a locally mounted Google Drive folder
globus - has a UI; best for transferring large numbers of files or directories between servers without using the command line or a script, but both the source and destination need to have globus endpoints

Examples

Use scp to copy a file or directory from a local machine to a remote machine (the -r option is required to copy a directory and is harmless for a single file)

scp -r /path/to/local/file/or/directory/ username@server.host.ucsb.edu:/path/to/destination

Copy all feather files from current directory to an account on the “Taylor” server

scp ./*.feather jscohen@taylor.bren.ucsb.edu:/Users/jscohen/data_features

Use rsync to copy a directory to another location on the same local machine

rsync -av /path/to/source/directory /path/to/destination/directory

Note: In -av, the -v (--verbose) option makes rsync report its progress throughout the transfer, and -a (--archive) is shorthand for the combined options -rlptgoD, which stand for:

| Option | Meaning |
| --- | --- |
| `-r`, `--recursive` | recurse into directories |
| `-l`, `--links` | copy symlinks as links |
| `-p`, `--perms` | preserve permissions |
| `-t`, `--times` | preserve modification times |
| `-g`, `--group` | preserve group |
| `-o`, `--owner` | preserve owner |
| `-D` | same as `--devices --specials`: also transfer device files and special files such as named sockets and fifos (pipes) |

rsync also supports options for more complex transfers, such as:
--exclude to omit certain files from the transfer
--update to skip files that are newer in the destination location
--remove-source-files to delete the files from the source directory after they are transferred to the destination
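
As an illustration of combining these options, the sketch below wraps one possible rsync call in a small function. The function name `sync_without_tmp` and the `*.tmp` pattern are assumptions chosen for the example, not part of any particular workflow.

```shell
# Copy the contents of directory $1 into directory $2, skipping *.tmp files
# and leaving alone any file that is already newer in the destination.
# The trailing slash on the source means "copy what is inside this
# directory" rather than the directory itself.
sync_without_tmp() {
    rsync -av --update --exclude='*.tmp' "$1"/ "$2"/
}

# Usage sketch (not run here; paths are placeholders):
#   sync_without_tmp /path/to/source/directory /path/to/destination/directory
```

Adding --remove-source-files to the same call would turn the copy into a move, so it is safest to try the command first with rsync's -n (--dry-run) option, which lists what would be transferred without changing anything.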
