Learning Objectives

Following this assignment, students should be able to:

  • use version control to keep track of changes to code
  • collaborate with someone else via a remote repository

Reading

Lecture Notes


How To

Work through the exercises in this assignment alongside the Version Control lecture notes. Start at the beginning of the notes and complete each exercise at the point where the notes link to it.

Exercises

  1. Set Up Git (15 pts)

    Let’s say that you’re working on analyzing fish scale size data one day. Unfortunately you weren’t using version control and your cat jumped all over your keyboard and managed to replace your analysis code with:

    asd;fljkzbvc;iobv;iojre,nmnmbveaq389320pr9c9cd
    
    ds8
    a
    d8of8pp
    

    before somehow hitting Ctrl-s and overwriting all of your hard work.

    Determined not to let this happen again, you’ve committed to using git for version control.

    Install Git for your operating system following the setup instructions. Then create a new repo in the class’s GitHub organization:

    1. Navigate to GitHub in a web browser and log in.
    2. Click the + at the upper right corner of the page and choose New repository.
    3. Choose the class organization (e.g., dcsemester) as the Owner of the repo.
    4. Fill in a Repository name that follows the form FirstnameLastname.
    5. Select Private.
    6. Select Initialize this repository with a README.
    7. Click Create Repository.

    Next, set up a project for this assignment in RStudio with the following steps:

    1. File -> New Project -> Version Control -> Git
    2. Navigate to your new Git repo on GitHub -> Click the Clone or download button -> Click the Copy to clipboard button.
    3. Paste the copied URL into the Repository URL: field.
    4. Leave Project directory name: blank; it is filled in automatically with the repo name.
    5. Under Create project as subdirectory of:, choose where to put the project.
    6. Click Create Project.
    7. Check to make sure you have a Git tab in the upper right window.
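
    For reference, creating a project from version control in RStudio is roughly what git clone does on the command line. The sketch below is only an illustration: a local bare repository stands in for your GitHub repo, and with a real repo you would paste the URL you copied instead of the local path.

```shell
set -e
work=$(mktemp -d)

# A local bare repository stands in for the GitHub repo (assumption for this sketch)
git init -q --bare "$work/FirstnameLastname.git"

cd "$work"
# Leaving off a directory name makes git name the folder after the repo.
# Git may warn that this stand-in repo is empty; a repo created with a README is not.
git clone -q "$work/FirstnameLastname.git"

# The clone remembers where it came from as the remote named origin
git -C FirstnameLastname remote -v
```
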
  2. First Solo Commit (15 pts)

    This is a follow up to Set Up Git.

    In fish-analysis.R, add a comment above the creation of fish_data_cat describing what this code does.

    Commit this change to version control with a good commit message. Then check to see if you can see this commit in the history.
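
    If you are curious what the Git tab is doing behind the scenes, the same stage, commit, and history steps look roughly like this on the command line. This is a minimal sketch run in a throwaway repository; the file contents, the commit message, and the name and email are placeholders, not part of the assignment.

```shell
set -e
# Throwaway repository so the sketch does not touch your real project
repo=$(mktemp -d)
cd "$repo"
git init -q

# Placeholder identity (assumption; use your own name and email)
git config user.name "Example Student"
git config user.email "student@example.com"

# Stand-in for the real script: a comment followed by one line of code
printf '# Placeholder comment describing the next line\nx <- 1\n' > fish-analysis.R

# Stage the file, then commit it with a descriptive message
git add fish-analysis.R
git commit -q -m "Add comment describing creation of fish_data_cat"

# Show the history, one commit per line
git log --oneline
```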

  3. Second Solo Commit (15 pts)

    This is a follow up to First Solo Commit.

    You discover that the device used to measure the scale lengths of the fish in Gaeta_etal_CLC_data.csv is not accurate for scales shorter than 1 mm. Use dplyr to remove the fish with a scalelength of less than 1 mm from fish_data_cat. The new dataset will have 4,029 rows.

    Commit this change to version control with a good commit message.

  4. Commit Multiple Files (15 pts)

    This is a follow up to Second Solo Commit.

    After talking to a colleague, you realize that Gaeta_etal_CLC_data.csv is only the first in a series of similar files that you will receive. To help keep track of files, you decide to number them. Rename the Gaeta_etal_CLC_data.csv file to Gaeta_etal_CLC_data_1.csv manually, using the Files tab in RStudio. You’ll also need to change the first line of fish-analysis.R so that the script will still work.

    To include all of these changes in a single commit, stage both data files and the saved R script and then commit with a good commit message.

    Git will initially think you’ve deleted Gaeta_etal_CLC_data.csv and created a new file, Gaeta_etal_CLC_data_1.csv. Once you click on both the old and new files to stage them, git will recognize the rename, combine the two entries into one, and mark it with an R.
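
    The rename behavior described above can be reproduced on the command line. The sketch below is an illustration in a throwaway repository, with made-up data rows and a placeholder identity; git add -A is used here to stage the deletion and the new file together.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"

# Stand-in data file with made-up rows (not the real dataset)
printf 'lakeid,length,scalelength\nA,100,2.5\nB,120,3.1\n' > Gaeta_etal_CLC_data.csv
git add Gaeta_etal_CLC_data.csv
git commit -q -m "Add fish data"

# Rename the file outside of git, as the Files tab would
mv Gaeta_etal_CLC_data.csv Gaeta_etal_CLC_data_1.csv

# Before staging: git reports a deleted file and an untracked file
git status --short

# Stage every change, including the deletion
git add -A

# After staging: the two entries should collapse into one line marked R
git status --short
```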

  5. Pushing Changes (20 pts)

    Now that you’ve set up your GitHub repository for collaborating with your colleague and made some changes, you’d better share some work with them so they can see what you’re doing.

    1. To look at the relationship between the length of each fish’s body and the size of its scale across the different lakes sampled in these data, create a scatterplot with length on the x-axis and scalelength on the y-axis, then color the points using lakeid.
    2. Commit this change.
    3. Once you’ve committed the change, click the Push button in the upper right corner of the window and then click OK when git is done pushing.
    4. You should be able to see the changes you made on GitHub.
    5. Email your teacher to let them know you’ve finished this exercise. Include in the email a link to your GitHub repository.
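
    The Push button corresponds roughly to git push on the command line. In the sketch below a local bare repository stands in for GitHub, and the file contents, commit message, and identity are placeholders, so treat it as an illustration rather than the exact commands RStudio runs.

```shell
set -e
work=$(mktemp -d)

# A local bare repository stands in for GitHub (assumption for this sketch)
git init -q --bare "$work/remote.git"

git init -q "$work/local"
cd "$work/local"
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"
git remote add origin "$work/remote.git"

# Placeholder for the script containing your new plotting code
printf '# plotting code would go here\n' > fish-analysis.R
git add fish-analysis.R
git commit -q -m "Add scatterplot of length vs. scalelength by lake"

# Push the current branch and remember origin as its upstream
git push -q -u origin HEAD

# List what the remote now holds; the hash should match your local commit
git ls-remote origin
```
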
  6. Pulling and Pushing (20 pts)

    This is a follow up to Pushing Changes.

    STOP: Make sure you sent your teacher an email with a link to your GitHub repository after the last exercise, and wait until your teacher has told you they’ve updated your repository before doing this one.

    While you were working on your plot of size among lakes, your colleague (who has suddenly developed some pretty impressive computational skills) wrote some code to generate a histogram of scale lengths. To get it you’ll need to pull the most recent changes from GitHub.

    1. On the Git tab click on the Pull button with the blue arrow. You should see some text that looks like:

      From github.com:ethanwhite/gryffindorforever
         1e24ac8..815e600  master     -> origin/master
      Updating 1e24ac8..815e600
      Fast-forward
       testme.txt | 1 +
       1 file changed, 1 insertion(+)
      create mode 100644 youareawesome.txt
      
    2. Click OK.
    3. You should see the new lines of code in your fish-analysis.R.

      ggplot(fish_data_cat, aes(x = scalelength, fill = length_cat)) +
        geom_histogram()
      
    4. Modify this code to look at narrower ranges of scale size classes by setting the bins argument to 80.
    5. Save this plot as scale_hist_by_length.jpg using ggsave.
    6. Commit the new code and resulting .jpg file by adding both files to the stage and committing with a good commit message, then push this to GitHub.
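
    The pull in step 1 can also be tried out on the command line. The sketch below is an illustration only: a local bare repository stands in for GitHub, a second clone plays the part of your colleague, and the file contents, commit messages, and identity are placeholders.

```shell
set -e
work=$(mktemp -d)

# Placeholder identity for every commit in this sketch (assumption)
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# A local bare repository stands in for GitHub
git init -q --bare "$work/remote.git"

# Your copy: make a first commit and push it
git clone -q "$work/remote.git" "$work/you"
cd "$work/you"
printf '# analysis code would go here\n' > fish-analysis.R
git add fish-analysis.R
git commit -q -m "Add analysis script"
git push -q -u origin HEAD

# Your colleague's copy: add a line and push it
git clone -q "$work/remote.git" "$work/colleague"
cd "$work/colleague"
printf 'geom_histogram()\n' >> fish-analysis.R
git commit -q -a -m "Add histogram of scale lengths"
git push -q

# Back in your copy: pull, accepting only a fast-forward
cd "$work/you"
git pull -q --ff-only

# Both commits now appear in your history
git log --oneline
```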