Version Control:

A Practical Introduction to Git and GitHub

Joseph V. Casillas

From Proposal to Publication:
Pathways to Open Science | 07-14-2021

What are we going to do today?

  1. What is version control and why do I care?

  2. Get familiar with Git/Github

  • Get familiar with Git/GitHub
  • Understand conceptually what it is, what it does, and what it is for
  • Learn how it can be a tool for our research (safety measures, promoting open science)
  • Open doors for collaboration

What is version control?

Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later

This process should be version controlled!

This project should be version controlled!

So what is version control?

Recording changes to a file or set of files over time so that you can recall specific versions later

Version control

How do we do it for reproducible research?

  • Git

  • GitHub

  • GitLab

  • Bitbucket

What you'll need...

  • Git
  • GitHub account
  • GitHub Desktop

GitHub

  • It's free

  • Use an academic email if you have one (more perks)

  • Give consideration to your username

    • avoid spaces, uncommon characters
    • shorter is better than longer
    • usually your real name (or related to it)
  • You should have this already 🙏

GitHub Desktop

  • This will make interacting with Git much easier

  • You can download the app here: https://desktop.github.com

  • You should have this already 🙏

What is Git?

  • Open source version control system
  • Keeps track of changes you make to files
  • Conceptually similar to Dropbox/Box/Google Drive/OneDrive/etc., "but for nerds"

Git

How does it work?

  • Git keeps track of incremental changes in project files

  • We can think of it like taking snapshots of a repository (project) 📷

  • We access Git via the command line
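The snapshot idea can be sketched in a few lines of Python. This is only a toy model with made-up names (`SnapshotRepo`, `commit`, `checkout`); real Git stores and identifies snapshots differently, but the principle of saving the state of the project and recalling it later is the same.

```python
import hashlib


class SnapshotRepo:
    """Toy model of a repository: every commit stores a snapshot of the files."""

    def __init__(self):
        self.commits = []  # (commit_id, message, snapshot), oldest first

    def commit(self, files, message):
        """Record a snapshot of `files` (a dict of name -> text) and return its id."""
        snapshot = dict(files)  # copy, so later edits do not rewrite history
        digest = hashlib.sha1(repr(sorted(snapshot.items())).encode("utf-8"))
        digest.update(message.encode("utf-8"))
        digest.update(str(len(self.commits)).encode("utf-8"))
        commit_id = digest.hexdigest()[:7]  # short id, loosely imitating Git
        self.commits.append((commit_id, message, snapshot))
        return commit_id

    def checkout(self, commit_id):
        """Return a copy of the files as they were at `commit_id`."""
        for cid, _message, snapshot in self.commits:
            if cid == commit_id:
                return dict(snapshot)
        raise KeyError(commit_id)


repo = SnapshotRepo()
files = {"README.md": "# my-repo\n"}
first = repo.commit(files, "Add README")

files["README.md"] += "Some new text.\n"
second = repo.commit(files, "Describe the project")

# Recall the project exactly as it was at the first snapshot.
old_files = repo.checkout(first)
```

Even after the second commit, `old_files` still holds the README as it was at the first snapshot, which is the "recall specific versions later" part of version control.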

What is GitHub?

  • An online interface to Git

  • A place to store repositories (projects)

  • Owned by Microsoft 😞

Walk-through

Profile, repos, public vs. private. It's not too late to catch up.

Git

a version control system that lets you manage and keep track of your source code history via repositories

GitHub

a cloud-based hosting service that lets you manage Git repositories

Exercise I

Your first repo

1. Create new repository

  • Option a) Click the New icon anywhere

  • Option b) Click on the profile icon in the
    upper right hand corner, from the
    dropdown menu select Your repositories,
    click the New icon

2. Name your repo my-repo and
check the box for Add a README file

3. Click Create repository

4. Click README.md

5. Click the pencil icon and edit the file by adding some text

6. Scroll to the bottom and click Commit changes

Exercise I - Review

Key ideas

  • Repositories are projects

  • We commit changes to the repository

  • README files have special status on GitHub

Tips

  • Remember to think carefully about names

  • Avoid spaces and uncommon characters (if possible)

Exercise II

Walk-through

Cloning

  1. Clone the my-repo repo onto your computer
    (GitHub Desktop: File > Clone repository)

  2. Edit the README.md file using a text editor
    and put some text in the body

  3. Save everything and commit the changes to your local copy
    with an informative message

  4. Push changes from your local (cloned) repo to origin (the cloud)

  5. Check GitHub to see if it worked

Exercise II - Review

Key ideas

  • We clone repositories from origin to our computers (local copies)

  • We work like normal and commit changes

  • When we finish we push changes from our local copy to origin

  • The changes are then visible on GitHub

Tips

  • Always think carefully about where projects live

  • Practice this procedure (clone > edit > commit > push) with dummy repos to get the hang of it

  • Don't be afraid to burn the house down
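For anyone curious about what GitHub Desktop is doing behind the scenes, the clone > edit > commit > push procedure corresponds roughly to the Git commands below. This is a sketch, not the only way to do it: the URL, the username and the commit message are placeholders, and the branch name `main` is an assumption (older repos often use `master`). The script only builds and prints the commands; it does not run them.

```python
import shlex


def clone_edit_commit_push(repo_url, folder, message, branch="main"):
    """Return the Git commands for one round of clone > edit > commit > push.

    Nothing is executed here; each command is a list of arguments.
    """
    return [
        # Copy the repository from origin to a folder on your computer.
        ["git", "clone", repo_url, folder],
        # (Edit README.md in a text editor at this point.)
        # Stage the edited file so it is included in the next commit.
        ["git", "-C", folder, "add", "README.md"],
        # Record a snapshot in the local copy, with an informative message.
        ["git", "-C", folder, "commit", "-m", message],
        # Send the new commit from the local copy to origin.
        ["git", "-C", folder, "push", "origin", branch],
    ]


# Hypothetical URL: replace "your-username" with your own GitHub username.
commands = clone_edit_commit_push(
    "https://github.com/your-username/my-repo.git",
    "my-repo",
    "Add project description to README",
)
for command in commands:
    print(" ".join(shlex.quote(part) for part in command))
```

To run the commands for real, each list could be passed to `subprocess.run(command, check=True)`, which requires Git to be installed and push access to the repo.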

What this means...

Exercise III

Collaborating with forks

  1. Fork the github_practice repo

  2. Clone repo to your computer

  3. Open the README.md file

  4. Edit README.md

  5. Commit changes

  6. Push changes to origin (your copy on GitHub)

  7. Submit a pull request

Walk-through

Exercise III - Review

Key ideas

  • We can fork other people's repos

  • We can then clone our forked copy and make changes

  • Edit > commit > push

  • We can send our modifications to the original repo via pull requests

Tips

  • Learn from other people by forking their projects 👍

  • It is extremely fulfilling to contribute to other people's projects 😄

  • If you get stuck/make a mistake you can always burn the house down 🔥

Git/GitHub

Metaphors and terminology

  • repo
  • commit
  • clone
  • push
  • pull
  • fork
  • pull request

there are more

Next steps...

  • Branches

  • Merge conflicts

  • The command line

Thank you!






Hint: continue for more tips and tricks!

More about projects

Workflows and projects

What does your workflow currently look like?

What do you organize for?

How do you do it?

Where do your files live?

RStudio projects

If your projects include R scripts, RMarkdown documents, HTML slides, etc., you should be using RStudio projects

  • What are they?

.Rproj files create an independent RStudio environment that limits the scope of your R session to the project in question.

  • Why use them?
  • They simplify organizing projects
  • They integrate well with GitHub
  • They promote a project-oriented workflow

What's in a project anyway?

Directory structure

Student

  • class notes
  • articles
  • misc. documents
  • final project
  • homework

Researcher

  • data
  • scripts
  • manuscript
  • slides
  • READMEs

A note on READMEs

  • Every folder/directory should have a README.md file that explains...
    • the purpose of the folder
    • the contents of the folder
    • any keys/glossaries necessary to understand the contents of the folder
    • timestamps
    • etc.
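One way to keep yourself honest about the README rule is a small script that lists the folders that do not have one yet. The sketch below uses only the Python standard library; the function name `folders_missing_readme` and the example folder names are made up for illustration, and hidden folders (such as `.git`) are skipped by assumption.

```python
import tempfile
from pathlib import Path


def folders_missing_readme(project_root):
    """Return the folders (relative to project_root) that lack a README.md."""
    root = Path(project_root)
    folders = [root] + [path for path in root.rglob("*") if path.is_dir()]
    missing = []
    for folder in folders:
        relative = folder.relative_to(root)
        # Skip hidden folders such as .git and anything inside them.
        if any(part.startswith(".") for part in relative.parts):
            continue
        if not (folder / "README.md").is_file():
            missing.append(relative.as_posix())
    return sorted(missing)


# Demonstration in a throwaway folder, so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    for name in ["data", "scripts", "manuscript"]:
        (project / name).mkdir()
    (project / "README.md").write_text("# Project overview\n")
    (project / "data" / "README.md").write_text("# Raw data and codebook\n")
    missing = folders_missing_readme(project)

print(missing)
```

In the demonstration, the project folder and `data` have READMEs, so the script reports `manuscript` and `scripts` as the folders that still need one.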

Say my name

Naming conventions

Why?

  • You probably don't spend much time thinking about how you name files and documents

  • You should

  • The idea is to follow a few simple guidelines that will facilitate organizing your projects and make the file structure easily searchable

How?

  • Use descriptive names

    • Bad: Experiment
    • Good: qualifying_paper_1
  • no capitals, no spaces (use _ or -)

    • Bad: Experiment for syntax class
    • Good: semantic_predictability_exp_1
  • no non-standard characters

    • Bad: Analysis of ð for ICPhS 2019
    • Good: spirantization_icphs_2019
  • Numbers: use at least 2 digits; write dates as year-month-day
    • Bad: Analysis 1
    • Better: 01_analysis
    • Best: 2019-02-20_data_download
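The mechanical parts of these guidelines (lowercase, no spaces, no non-standard characters, zero-padded numbers, year-month-day dates) can be sketched as small helper functions. The function names below are made up for illustration. Note that no script can make a name descriptive; that part is still up to you.

```python
import re
import unicodedata
from datetime import date


def clean_name(text):
    """Lowercase the text, drop non-ASCII characters, join words with underscores."""
    # Split accented letters into letter + accent, then drop whatever is not ASCII.
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    return "_".join(words)


def numbered_name(number, text):
    """Prefix a cleaned name with a number padded to at least two digits."""
    return f"{number:02d}_{clean_name(text)}"


def dated_name(day, text):
    """Prefix a cleaned name with a year-month-day date."""
    return f"{day.isoformat()}_{clean_name(text)}"


print(clean_name("Experiment for syntax class"))       # experiment_for_syntax_class
print(numbered_name(1, "Analysis"))                    # 01_analysis
print(dated_name(date(2019, 2, 20), "data download"))  # 2019-02-20_data_download
```

Zero-padded numbers and year-month-day dates have the added benefit that files sort in the intended order when listed alphabetically.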

GitHub integration

Tips and tricks

Nested projects

  • You can have RStudio projects inside of larger projects
  • Only the larger project is a repo
  • Ex.
    • Project: dissertation (← repo)
    • Nested project: production_semantic_processing_las
    • Nested project: prod_perc_bilabials_jphon

Alfred

  • Productivity app
  • Only on Mac (I think), free
  • Works like Spotlight search, but customizable
  • Set to search for .Rproj files
  • Allows you to quickly open and close RStudio projects

Advanced

Review

  • There are different types of workflows when using GitHub
  • Standard way:
    • Create repo
    • Clone local copy
    • Make changes
    • Commit/push changes to remote
  • When collaborating:
    • Fork repo
    • Clone local copy
    • Make changes
    • Commit/push changes to remote
    • Submit pull request to the original repo

Useful workflow when collaborating with many people

But...

...this isn't the most common method

...or the best method

What gives?

  • In the programming world, committing directly to master is a no-no. Why?

  • Most developers use Git as version control for software

  • The master branch is usually left for public releases

  • As such it should always work

What's the problem?

  • You might break something

  • You might get complicated merge conflicts

Example: Person A and Person B are working on the same file. Both make changes, and Person A submits a pull request to master for a file that has changed since they started.
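The situation above can be sketched as a toy "three-way merge": compare each person's version with the version they both started from. The function name `merge_file` is made up, and the sketch is deliberately simplified. It compares whole files, whereas real Git compares files line by line, so two people editing different parts of the same file can often be merged without a conflict.

```python
def merge_file(base, ours, theirs):
    """Toy three-way merge of one file's text.

    Returns (merged_text, conflict), where conflict is True when both
    versions changed the file in different ways.
    """
    if ours == theirs:
        return ours, False    # same result on both sides, nothing to decide
    if ours == base:
        return theirs, False  # only the other person changed the file
    if theirs == base:
        return ours, False    # only we changed the file
    return None, True         # both changed it differently: a human must decide


base = "Intro: first draft"
person_a = "Intro: first draft, with Person A's edits"
person_b = "Intro: rewritten by Person B"

# Person B's change arrives first and goes through without trouble.
master, conflict_b = merge_file(base, base, person_b)

# Person A started from `base`, but master has moved on since then.
merged, conflict_a = merge_file(base, master, person_a)
print(conflict_b, conflict_a)
```

Person B's change goes in cleanly, but Person A's pull request is flagged as a conflict because master no longer matches the version Person A started from.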

What's the solution?

Branches

  • Using branches gives us a way to work on different versions of a repository at one time
  • By default a new repo has one branch, traditionally called master (repos created on GitHub since late 2020 call it main)
  • Master is the definitive branch
  • We create new branches to experiment, make edits, create new features, etc., before merging them into master
  • When you create a branch off of master you’re essentially making a copy, or snapshot, of master at a specific point in time (like making a copy of a Word file, e.g., essay_final_version3_for_real.docx)
  • A branch should be used for making 'one logical change', usually to add a feature in software development
  • If a collaborator makes changes to master while you are working on a branch, you can pull in those changes
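The "copy of master at a specific point in time" idea can be sketched with a toy model in which a branch is simply a name that points at a snapshot. The class and method names below are made up, and real Git tracks far more than this, but the sketch shows why work on a branch leaves master untouched.

```python
class ToyRepo:
    """Toy model: a branch is just a name pointing at the latest snapshot on it."""

    def __init__(self):
        self.snapshots = []               # every snapshot ever committed
        self.branches = {"master": None}  # branch name -> index into snapshots
        self.current = "master"

    def commit(self, files):
        """Save a snapshot of `files` and move the current branch to it."""
        self.snapshots.append(dict(files))
        self.branches[self.current] = len(self.snapshots) - 1

    def branch(self, name):
        """Create `name` pointing at the current snapshot, then switch to it."""
        self.branches[name] = self.branches[self.current]
        self.current = name

    def switch(self, name):
        """Switch to an existing branch."""
        if name not in self.branches:
            raise KeyError(name)
        self.current = name

    def files(self):
        """Return a copy of the files on the current branch."""
        index = self.branches[self.current]
        return {} if index is None else dict(self.snapshots[index])


repo = ToyRepo()
repo.commit({"manuscript.md": "First draft"})

repo.branch("revisions")  # start from a copy of master
repo.commit({"manuscript.md": "Revised draft"})
on_branch = repo.files()

repo.switch("master")     # master has not changed
on_master = repo.files()
print(on_branch, on_master)
```

The revised draft exists only on the `revisions` branch; master still holds the first draft until the branch is merged, typically via a pull request.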

Yeah, but... academic writing is different

  • In many cases using branches may be overkill, depending on what you're working on, e.g., slides for class
  • But there are concrete examples where it makes sense to use branches...
    • when collaborating with others
    • when working on large projects
    • when making revisions to a manuscript
    • when making changes to your website, cv, conference presentation/poster
    • whenever breaking something small will have a large impact on the project
    • whenever the possibility of "going back" needs to be controlled and easily achieved

Summary - workflows

Old way

  • Working solo: commit/push to master

(medium danger)

  • Working as collaborator: commit/push to master

(danger!)

  • Working as outside collaborator: fork, commit/push to remote,
    pull request to master

(low danger)

Better way

  • Working solo: branch, commit/push, pull request
  • Working as collaborator: branch, commit/push, pull request
  • Working as outside collaborator: fork, branch, commit/push, pull request