GIT

French Wikipedia

Git has two data structures: an object database and a directory cache. There are four types of objects:

  • the blob object (for binary large object, denoting a set of raw data), which represents the contents of a file;
  • the tree object, which describes a file hierarchy. It consists of a list of blob objects and their associated information, such as the file name and permissions. It can recursively contain other trees to represent subdirectories;
  • the commit object (the result of the operation of the same name, meaning "to validate a transaction"[1]), which corresponds to a file tree enriched with metadata such as a description message, the author's name, etc. It also points to one or more parent commit objects to form a history graph;
  • the tag object, which is a way of giving an arbitrary name to a specific commit so that it can be identified more easily. It is generally used to mark certain commits, for example with a version number or name (2.1 or Lucid Lynx).

The object database can contain any type of object. An intermediate layer, using indexes (the checksums), links the objects in the database to the file tree.

Each object is identified by an SHA-1 checksum of its contents. Git computes the checksum and uses this value to determine the object's file name. The object is placed in a directory whose name corresponds to the first two characters of the checksum. The rest of the checksum then forms the file name for that object.

Git stores each revision of a file as a unique blob object. The relationships between blob objects are determined by examining the commit objects. In general, blob objects are stored in their entirety using zlib compression. This approach can quickly consume a large amount of disk space; for this reason, objects can be combined into packs, which use delta compression (that is, blobs are stored as differences relative to other blobs).


Git[2] is a distributed version-control system for tracking changes in any set of files, originally designed for coordinating work among programmers cooperating on source code during software development.[3] Its goals include speed, data integrity, and support for distributed, non-linear workflows.[4][5][6]

English Wikipedia

Git was created by Linus Torvalds in 2005 for development of the Linux kernel, with other kernel developers contributing to its initial development.[7] Since 2005, Junio Hamano has been the core maintainer. As with most other distributed version-control systems, and unlike most client–server systems, every Git directory on every computer is a full-fledged repository with complete history and full version-tracking abilities, independent of network access or a central server.[8] Git is free and open-source software distributed under GNU General Public License Version 2.

History

Git development began in April 2005, after many developers of the Linux kernel gave up access to BitKeeper, a proprietary source-control management (SCM) system that they had been using to maintain the project since 2002.[9][10] The copyright holder of BitKeeper, Larry McVoy, had withdrawn free use of the product after claiming that Andrew Tridgell had created SourcePuller by reverse engineering the BitKeeper protocols.[11] The same incident also spurred the creation of another version-control system, Mercurial.

Torvalds's design criteria for a replacement system included the following:

  • Take Concurrent Versions System (CVS) as an example of what not to do; if in doubt, make the exact opposite decision.[6]
  • Support a distributed, BitKeeper-like workflow.[6]
  • Include very strong safeguards against corruption, either accidental or malicious.[5]

These criteria eliminated every version-control system in use at the time, so immediately after the 2.6.12-rc2 Linux kernel development release, Torvalds set out to write his own.[6]

The development of Git began on 3 April 2005.[12] Torvalds announced the project on 6 April, and it became self-hosting the next day.[13][12] The first merge of multiple branches took place on 18 April.[14] Torvalds achieved his performance goals; on 29 April, the nascent Git was benchmarked recording patches to the Linux kernel tree at the rate of 6.7 patches per second.[15] On 16 June, Git managed the kernel 2.6.12 release.[16]

Torvalds turned over maintenance on 26 July 2005 to Junio Hamano, a major contributor to the project.[17] Hamano was responsible for the 1.0 release on 21 December 2005 and remains the project's core maintainer.[18]

Releases

Highlights of recent Git releases:[19]

  • Protocol version 2 is now the default
  • Some new config tricks
  • Updates to git sparse-checkout
  • Introducing init.defaultBranch
  • Changed-path Bloom filter
  • Experimental SHA-256 support
  • Negative refspecs
  • New git shortlog tricks

Characteristics

Git's design is a synthesis of Torvalds's experience with Linux in maintaining a large distributed development project, along with his intimate knowledge of file-system performance gained from the same project and the urgent need to produce a working system in short order. These influences led to the following implementation choices:[20]

Strong support for non-linear development
Git supports rapid branching and merging, and includes specific tools for visualizing and navigating a non-linear development history. In Git, a core assumption is that a change will be merged more often than it is written, as it is passed around to various reviewers. In Git, branches are very lightweight: a branch is only a reference to one commit. With its parent commits, the full branch structure can be constructed.
Distributed development
Like Darcs, BitKeeper, Mercurial, Bazaar, and Monotone, Git gives each developer a local copy of the full development history, and changes are copied from one such repository to another. These changes are imported as added development branches and can be merged in the same way as a locally developed branch.[21]
Compatibility with existing systems and protocols
Repositories can be published via Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), or a Git protocol over either a plain socket or Secure Shell (ssh). Git also has a CVS server emulation, which enables the use of existing CVS clients and IDE plugins to access Git repositories. Subversion repositories can be used directly with git-svn.[22]
Efficient handling of large projects
Torvalds has described Git as being very fast and scalable,[23] and performance tests done by Mozilla[24] showed that it was an order of magnitude faster than some version-control systems; fetching version history from a locally stored repository can be one hundred times faster than fetching it from the remote server.[25]
Cryptographic authentication of history
The Git history is stored in such a way that the ID of a particular version (a commit in Git terms) depends upon the complete development history leading up to that commit. Once it is published, it is not possible to change the old versions without it being noticed. The structure is similar to a Merkle tree, but with added data at the nodes and leaves.[26] (Mercurial and Monotone also have this property.)
Toolkit-based design
Git was designed as a set of programs written in C and several shell scripts that provide wrappers around those programs.[27] Although most of those scripts have since been rewritten in C for speed and portability, the design remains, and it is easy to chain the components together.[28]
Pluggable merge strategies
As part of its toolkit design, Git has a well-defined model of an incomplete merge, and it has multiple algorithms for completing it, culminating in telling the user that it is unable to complete the merge automatically and that manual editing is needed.[29]
Garbage accumulates until collected
Aborting operations or backing out changes will leave useless dangling objects in the database. These are generally a small fraction of the continuously growing history of wanted objects. Git will automatically perform garbage collection when enough loose objects have been created in the repository. Garbage collection can be called explicitly using git gc.[30]
Periodic explicit object packing
Git stores each newly created object as a separate file. Although individually compressed, this takes a great deal of space and is inefficient. This is solved by the use of packs that store a large number of objects delta-compressed among themselves in one file (or network byte stream) called a packfile. Packs are compressed using the heuristic that files with the same name are probably similar, without depending on this for correctness. A corresponding index file is created for each packfile, telling the offset of each object in the packfile. Newly created objects (with newly added history) are still stored as single objects, and periodic repacking is needed to maintain space efficiency. The process of packing the repository can be very computationally costly. By allowing objects to exist in the repository in a loose but quickly generated format, Git allows the costly pack operation to be deferred until later, when time matters less, e.g., the end of a workday. Git does periodic repacking automatically, but manual repacking is also possible with the git gc command. For data integrity, both the packfile and its index have an SHA-1 checksum inside, and the file name of the packfile also contains an SHA-1 checksum. To check the integrity of a repository, run the git fsck command.[31]
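
The "cryptographic authentication of history" property described above can be illustrated with a toy hash chain. This is a minimal sketch, not Git's actual commit format: the function names commit_id and chain_ids and the way the fields are concatenated are assumptions made for illustration. It only shows that an ID derived from a snapshot plus its parents' IDs changes whenever any ancestor changes.

```python
import hashlib


def commit_id(snapshot: bytes, parent_ids: list[str]) -> str:
    """Return a toy commit ID derived from a snapshot and its parents' IDs.

    Hypothetical encoding for illustration only; real Git commit objects
    have a different layout.
    """
    h = hashlib.sha1()
    for parent in parent_ids:
        h.update(b"parent " + parent.encode("ascii") + b"\n")
    h.update(b"\n" + snapshot)
    return h.hexdigest()


def chain_ids(snapshots: list[bytes]) -> list[str]:
    """Build a linear history and return the ID of every commit in order."""
    ids: list[str] = []
    for snapshot in snapshots:
        parents = ids[-1:]  # empty for the root commit
        ids.append(commit_id(snapshot, parents))
    return ids


original = chain_ids([b"v1", b"v2", b"v3"])
tampered = chain_ids([b"v1-modified", b"v2", b"v3"])

# Changing the oldest snapshot changes every later ID as well.
print(all(a != b for a, b in zip(original, tampered)))
```

Because each ID feeds into its children, comparing the newest ID is enough to detect a rewrite anywhere earlier in the chain.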

Another property of Git is that it snapshots directory trees of files. The earliest systems for tracking versions of source code, Source Code Control System (SCCS) and Revision Control System (RCS), worked on individual files and emphasized the space savings to be gained from interleaved deltas (SCCS) or delta encoding (RCS) of the (mostly similar) versions. Later revision-control systems maintained this notion of a file having an identity across multiple revisions of a project. However, Torvalds rejected this concept.[32] Consequently, Git does not explicitly record file revision relationships at any level below the source-code tree.

These implicit revision relationships have some significant consequences:

  • It is slightly more costly to examine the change history of one file than the whole project.[33] To obtain a history of changes affecting a given file, Git must walk the global history and then determine whether each change modified that file. This method of examining history does, however, let Git produce with equal efficiency a single history showing the changes to an arbitrary set of files. For example, a subdirectory of the source tree plus an associated global header file is a very common case.
  • Renames are handled implicitly rather than explicitly. A common complaint with CVS is that it uses the name of a file to identify its revision history, so moving or renaming a file is not possible without either interrupting its history or renaming the history and thereby making the history inaccurate. Most post-CVS revision-control systems solve this by giving a file a unique long-lived name (analogous to an inode number) that survives renaming. Git does not record such an identifier, and this is claimed as an advantage.[34][35] Source code files are sometimes split or merged, or simply renamed,[36] and recording this as a simple rename would freeze an inaccurate description of what happened in the (immutable) history. Git addresses the issue by detecting renames while browsing the history of snapshots rather than recording it when making the snapshot.[37] (Briefly, given a file in revision N, a file of the same name in revision N − 1 is its default ancestor. However, when there is no like-named file in revision N − 1, Git searches for a file that existed only in revision N − 1 and is very similar to the new file.) However, it does require more CPU-intensive work every time the history is reviewed, and several options to adjust the heuristics are available. This mechanism does not always work; sometimes a file that is renamed with changes in the same commit is read as a deletion of the old file and the creation of a new file. Developers can work around this limitation by committing the rename and the changes separately.
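
The rename detection described above (look for a file that exists only in revision N − 1 and is very similar to the new file) can be sketched as follows. This is an illustration under stated assumptions, not Git's implementation: the function name detect_renames, the use of difflib's similarity ratio, and the 0.5 threshold are all stand-ins chosen for the example.

```python
from difflib import SequenceMatcher
from typing import Optional


def detect_renames(old: dict[str, str], new: dict[str, str],
                   threshold: float = 0.5) -> dict[str, str]:
    """Map each added path to the most similar deleted path, if any.

    `old` and `new` map file paths to contents for revisions N-1 and N.
    The similarity measure and threshold are assumptions for this sketch,
    not Git's actual heuristic.
    """
    deleted = {p: c for p, c in old.items() if p not in new}
    added = {p: c for p, c in new.items() if p not in old}
    renames: dict[str, str] = {}
    for new_path, new_content in added.items():
        best_path: Optional[str] = None
        best_score = threshold
        for old_path, old_content in deleted.items():
            score = SequenceMatcher(None, old_content, new_content).ratio()
            if score >= best_score:
                best_path, best_score = old_path, score
        if best_path is not None:
            renames[new_path] = best_path
    return renames


rev_old = {"util.c": "int add(int a, int b) { return a + b; }\n",
           "main.c": "int main(void) { return 0; }\n"}
rev_new = {"math.c": "int add(int a, int b) { return a + b; }\n",
           "main.c": "int main(void) { return 0; }\n"}
print(detect_renames(rev_old, rev_new))  # {'math.c': 'util.c'}
```

Files present under the same name in both revisions are never considered, which mirrors the "default ancestor" rule in the text; only unmatched paths are compared by content.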

Git implements several merging strategies; a non-default strategy can be selected at merge time:[38]

  • resolve: the traditional three-way merge algorithm.
  • recursive: This is the default when pulling or merging one branch, and is a variant of the three-way merge algorithm.
  • octopus: This is the default when merging more than two heads.
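
The idea behind a three-way merge can be sketched at whole-file granularity: each path is compared in the common ancestor and in both branches, and a change made on only one side wins. This is a simplified sketch, not any of Git's strategies (which also merge changes within a file); the function name three_way_merge and the dictionary representation of a snapshot are assumptions for illustration.

```python
def three_way_merge(base: dict[str, str], ours: dict[str, str],
                    theirs: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Merge two snapshots (path -> content) against their common ancestor.

    Returns the merged snapshot and a sorted list of conflicting paths.
    A path missing from a snapshot is treated as deleted (None).
    """
    merged: dict[str, str] = {}
    conflicts: list[str] = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (including both deleted)
            result = o
        elif o == b:      # only their side changed
            result = t
        elif t == b:      # only our side changed
            result = o
        else:             # both sides changed in different ways
            conflicts.append(path)
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts


base = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
ours = {"a.txt": "2", "b.txt": "1", "c.txt": "2"}
theirs = {"a.txt": "1", "b.txt": "3", "c.txt": "3"}
merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)     # {'a.txt': '2', 'b.txt': '3'}
print(conflicts)  # ['c.txt']
```

Paths listed in conflicts correspond to the case where the user is told that manual editing is needed.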

Data structures

Git's primitives are not inherently a source-code management system, as Torvalds has explained.[39]

From this initial design approach, Git has developed the full set of features expected of a traditional SCM,[40] with features mostly being created as needed, then refined and extended over time.


Git has two data structures: a mutable index (also called stage or cache) that caches information about the working directory and the next revision to be committed; and an immutable, append-only object database.

The index serves as a connection point between the object database and the working tree.

The object database contains five types of objects:[41][31]

  • A blob (binary large object) is the content of a file. Blobs have no proper file name, time stamps, or other metadata (a blob's name internally is a hash of its content). In Git, each blob is a version of a file and holds that file's data.
  • A tree object is the equivalent of a directory. It contains a list of file names, each with some type bits and a reference to a blob or tree object that is that file, symbolic link, or directory's contents. These objects are a snapshot of the source tree. (As a whole, this forms a Merkle tree, meaning that a single hash for the root tree is sufficient, and is what commits actually use, to pinpoint the exact state of a whole tree structure with any number of subdirectories and files.)
  • A commit object links tree objects together into history. It contains the name of a tree object (of the top-level source directory), a timestamp, a log message, and the names of zero or more parent commit objects.
  • A tag object is a container that holds a reference to another object and can hold added metadata related to that object. Most commonly, it is used to store a digital signature of a commit object corresponding to a particular release of the data being tracked by Git.
  • A packfile object is a zlib-compressed collection of various other objects, used for compactness and ease of transport over network protocols.

Each object is identified by a SHA-1 hash of its contents. Git computes the hash and uses this value for the object's name. The object is put into a directory matching the first two characters of its hash. The rest of the hash is used as the file name for that object.
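
The naming scheme above can be sketched in a few lines. One detail not spelled out in the text: Git hashes a short header ("blob", the content length, and a NUL byte) together with the content. The helper names blob_id and loose_object_path are chosen for this example, and the .git/objects prefix assumes a standard non-bare repository layout.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Split an object ID into a directory (first 2 chars) and a file name."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


oid = blob_id(b"hello world\n")
print(oid)                     # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(loose_object_path(oid))  # .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Two files with identical content therefore map to the same object, regardless of their names or locations in the working tree.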

Git stores each revision of a file as a unique blob. The relationships between the blobs can be found through examining the tree and commit objects. Newly added objects are stored in their entirety using zlib compression. This can consume a large amount of disk space quickly, so objects can be combined into packs, which use delta compression to save space, storing blobs as their changes relative to other blobs.
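
The cost of storing every revision in full can be seen in a small sketch of zlib-compressed storage. The functions encode_loose and decode_loose are illustrative names, and the header layout (type, length, NUL byte) is stated here as an assumption rather than taken from the text; the sketch does not attempt delta compression.

```python
import zlib


def encode_loose(kind: str, content: bytes) -> bytes:
    """Compress an object (header plus content) as a single zlib stream."""
    header = f"{kind} {len(content)}".encode("ascii") + b"\0"
    return zlib.compress(header + content)


def decode_loose(data: bytes) -> tuple[str, bytes]:
    """Decompress a stored object and return its type and content."""
    raw = zlib.decompress(data)
    header, _, content = raw.partition(b"\0")
    kind, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return kind, content


revision_1 = b"line\n" * 1000
revision_2 = revision_1 + b"one more line\n"

# Each revision is stored whole, even though the two differ by one line.
stored = [encode_loose("blob", r) for r in (revision_1, revision_2)]
print(decode_loose(stored[1]) == ("blob", revision_2))  # True
print(sum(len(s) for s in stored), "bytes for two nearly identical revisions")
```

A pack avoids this duplication by storing one revision as a difference relative to the other instead of compressing each one independently.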

Additionally, Git stores labels called refs (short for references) to indicate the locations of various commits. They are stored in the reference database and include:[42]

  • Heads (branches): Named references that are advanced automatically to the new commit when a commit is made on top of them.
  • HEAD: A reserved head that will be compared against the working tree to create a commit.
  • Tags: Like branch references but fixed to a particular commit. Used to label important points in history.
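
The difference between heads, HEAD, and tags can be modelled with a toy reference store. The class RefStore and its methods are hypothetical and exist only for this sketch; commit IDs are shortened to placeholder strings.

```python
from typing import Optional


class RefStore:
    """Toy reference database: branch and tag names mapped to commit IDs."""

    def __init__(self) -> None:
        self.heads: dict[str, str] = {}
        self.tags: dict[str, str] = {}
        self.head: Optional[str] = None  # name of the checked-out branch

    def commit(self, new_commit_id: str) -> None:
        """Advance the checked-out branch to a newly created commit."""
        if self.head is None:
            raise RuntimeError("no branch is checked out")
        self.heads[self.head] = new_commit_id

    def tag(self, name: str, commit_id: str) -> None:
        """Label a commit; later commits do not move the tag."""
        self.tags[name] = commit_id


refs = RefStore()
refs.head = "main"
refs.commit("c1")
refs.tag("v1.0", "c1")
refs.commit("c2")
print(refs.heads["main"], refs.tags["v1.0"])  # c2 c1
```

The branch follows each new commit while the tag stays on the commit it was given.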

References

Every object in the Git database that is not referred to may be cleaned up by using a garbage collection command or automatically. An object may be referenced by another object or an explicit reference. Git knows different types of references. The commands to create, move, and delete references vary. "git show-ref" lists all references. Some types are:

  • heads: refers to an object locally,
  • remotes: refers to an object which exists in a remote repository,
  • stash: refers to an object not yet committed,
  • meta: e.g. configuration in a bare repository or user rights; the refs/meta/config namespace was introduced retrospectively and is used by Gerrit,[43]
  • tags: see above.

Implementations

Git is primarily developed on Linux, although it also supports most major operating systems, including BSD, Solaris, macOS, and Windows.[44]

The first Windows port of Git was primarily a Linux-emulation framework that hosts the Linux version. Installing Git under Windows creates a similarly named Program Files directory containing the Mingw-w64 port of the GNU Compiler Collection, Perl 5, MSYS2 (itself a fork of Cygwin, a Unix-like emulation environment for Windows) and various other Windows ports or emulations of Linux utilities and libraries. Currently, native Windows builds of Git are distributed as 32- and 64-bit installers.[45] The official Git website currently maintains a build of Git for Windows, still using the MSYS2 environment.[46]

The JGit implementation of Git is a pure Java software library, designed to be embedded in any Java application. JGit is used in the Gerrit code-review tool, and in EGit, a Git client for the Eclipse IDE.[47]

Go-git is an open-source implementation of Git written in pure Go.[48] It currently backs projects such as a SQL interface for Git code repositories[49] and a tool providing encryption for Git.[50]

The Dulwich implementation of Git is a pure Python software component for Python 2.7, 3.4 and 3.5.[51]

The libgit2 implementation of Git is an ANSI C software library with no other dependencies, which can be built on multiple platforms, including Windows, Linux, macOS, and BSD.[52] It has bindings for many programming languages, including Ruby, Python, and Haskell.[53][54][55]

JS-Git is a JavaScript implementation of a subset of Git.[56]

Git server

As Git is a distributed version-control system, it can be used as a server out of the box. It ships with a built-in command, git daemon, which starts a simple TCP server running the Git protocol.[57] Dedicated Git HTTP servers help (amongst other features) by adding access control, displaying the contents of a Git repository via web interfaces, and managing multiple repositories. Existing Git repositories can be cloned and shared to be used by others as a centralized repo. A repository can also be accessed via remote shell just by having the Git software installed and allowing a user to log in.[58] Git servers typically listen on TCP port 9418.[59]

Open source

  • Hosting the Git server using the Git Binary.[60]
  • Gerrit, a Git server configurable to support code reviews and providing access via ssh, an integrated Apache MINA or OpenSSH, or an integrated Jetty web server. Gerrit provides integration for LDAP, Active Directory, OpenID, OAuth, Kerberos/GSSAPI, and X509 https client certificates. With Gerrit 3.0, all configuration is stored in Git repositories, with no database required to run. Gerrit has a pull-request feature implemented in its core but lacks a GUI for it.
  • Phabricator, a spin-off from Facebook. As Facebook primarily uses Mercurial, the git support is not as prominent.[61]
  • RhodeCode Community Edition (CE), supporting git, Mercurial and Subversion with an AGPLv3 license.
  • Kallithea, supporting both git and Mercurial, developed in Python with GPL license.
  • External projects like gitolite,[62] which provide scripts on top of git software to provide fine-grained access control.
  • There are several other FLOSS solutions for self-hosting, including Gogs[63] and Gitea, a fork of Gogs, both developed in Go language with MIT license.

Git server as a service

There are many offerings of Git repositories as a service. The most popular are GitHub, SourceForge, Bitbucket and GitLab.[64][65][66][67][68]

Adoption

The Eclipse Foundation reported in its annual community survey that as of May 2014, Git is now the most widely used source-code management tool, with 42.9% of professional software developers reporting that they use Git as their primary source-control system[69] compared with 36.3% in 2013, 32% in 2012; or for Git responses excluding use of GitHub: 33.3% in 2014, 30.3% in 2013, 27.6% in 2012 and 12.8% in 2011.[70] Open-source directory Black Duck Open Hub reports a similar uptake among open-source projects.[71]

The UK IT jobs website itjobswatch.co.uk reports that as of late September 2016, 29.27% of UK permanent software development job openings have cited Git,[72] ahead of 12.17% for Microsoft Team Foundation Server,[73] 10.60% for Subversion,[74] 1.30% for Mercurial,[75] and 0.48% for Visual SourceSafe.[76]

Extensions

There are many Git extensions, like Git LFS, which started as an extension to Git in the GitHub community and is now widely used by other repositories. Extensions are usually independently developed and maintained by different people, but at some point in the future a widely used extension can be merged to Git.

Microsoft developed the Virtual File System for Git (VFS for Git; formerly Git Virtual File System or GVFS) extension to handle the size of the Windows source-code tree as part of their 2017 migration from Perforce. VFS for Git allows cloned repositories to use placeholders whose contents are downloaded only once a file is accessed.[77]

Conventions

Git does not impose many restrictions on how it should be used; however, some conventions are adopted in order to organize histories, especially those which require the cooperation of many contributors.

  • The master branch is created by default with git init and is often used as the branch that other changes are merged into.[78] Correspondingly the default name of the upstream remote is origin and so the name of the default remote branch is origin/master. Many Git users prefer alternatives to master as the name of the default branch due to its negative connotations.[79] From 2020 onwards, new GitHub repositories name the default branch main.[80]
  • Pushed commits should not be overwritten, but should rather be reverted[81] (a commit is made on top which reverses the changes to an earlier commit), unless they contained sensitive information which should not remain in the history. This prevents shared new commits based on shared commits from being invalid because the commit on which they are based does not exist in the remote.
  • The git-flow[82] workflow and naming conventions are often adopted to distinguish feature specific unstable histories (feature/*), unstable shared histories (develop), production ready histories (master), and emergency patches to released products (hotfix).
  • Pull requests are not a feature of Git, but are commonly provided by Git cloud services. A pull request is a request by one user to merge a branch of their repository fork into another repository sharing the same history (called the upstream remote).[83] The underlying function of a pull request is no different from that of an administrator of a repository pulling changes from another remote (the repository that is the source of the pull request); however, the pull request itself is a ticket managed by the hosting server, which initiates scripts to perform these actions; it is not a feature of the Git SCM.
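
The git-flow naming convention mentioned above lends itself to a simple pattern check, for example in a hook or CI script. The sketch below is only an illustration: the function classify_branch is hypothetical, and the pattern "hotfix/*" is an assumed spelling, since the text names the hotfix category without giving a pattern.

```python
from fnmatch import fnmatchcase

# Patterns follow the convention described above; "hotfix/*" is an
# assumed spelling, because the text only names the category "hotfix".
BRANCH_KINDS = [
    ("feature/*", "feature-specific unstable history"),
    ("develop", "unstable shared history"),
    ("master", "production-ready history"),
    ("hotfix/*", "emergency patch"),
]


def classify_branch(name: str) -> str:
    """Return the git-flow category of a branch name, or 'unclassified'."""
    for pattern, kind in BRANCH_KINDS:
        if fnmatchcase(name, pattern):
            return kind
    return "unclassified"


for branch in ("feature/login-form", "develop", "master", "experiment"):
    print(branch, "->", classify_branch(branch))
```

Since Git itself attaches no meaning to these names, a check like this is purely a team-level convention.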

Security

Git does not provide access-control mechanisms, but was designed for operation with other tools that specialize in access control.[84]

On 17 December 2014, an exploit was found affecting the Windows and macOS versions of the Git client. An attacker could perform arbitrary code execution on a target computer with Git installed by creating a malicious Git tree (directory) named .git (a directory in Git repositories that stores all the data of the repository) in a different case (such as .GIT or .Git, needed because Git does not allow the all-lowercase version of .git to be created manually) with malicious files in the .git/hooks subdirectory (a folder with executable files that Git runs) on a repository that the attacker made or on a repository that the attacker can modify. If a Windows or Mac user pulls (downloads) a version of the repository with the malicious directory, then switches to that directory, the .git directory will be overwritten (due to the case-insensitive trait of the Windows and Mac filesystems) and the malicious executable files in .git/hooks may be run, which results in the attacker's commands being executed. An attacker could also modify the .git/config configuration file, which allows the attacker to create malicious Git aliases (aliases for Git commands or external commands) or modify extant aliases to execute malicious commands when run. The vulnerability was patched in version 2.2.1 of Git, released on 17 December 2014, and announced the next day.[85][86]
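
The core of the problem described above is that names such as .GIT or .Git differ from .git only by case, which a case-insensitive filesystem does not distinguish. A minimal sketch of the kind of check that guards against this is shown below; it is a simplified illustration, not the patch shipped in Git 2.2.1, and the function names are hypothetical.

```python
def is_unsafe_tree_entry(name: str) -> bool:
    """Return True if a tree entry name would collide with `.git`
    on a case-insensitive filesystem."""
    return name.casefold() == ".git"


def unsafe_paths(paths: list[str]) -> list[str]:
    """Return the paths that contain a `.git`-like component."""
    return [p for p in paths
            if any(is_unsafe_tree_entry(part) for part in p.split("/"))]


incoming = [".GIT/hooks/post-checkout", "src/main.c", "docs/.Git/config"]
print(unsafe_paths(incoming))
# ['.GIT/hooks/post-checkout', 'docs/.Git/config']
```

A client applying such a check before writing files to disk would refuse the malicious tree instead of overwriting its own .git directory.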

Git version 2.6.1, released on 29 September 2015, contained a patch for a security vulnerability[87] that allowed arbitrary code execution.[88] The vulnerability was exploitable if an attacker could convince a victim to clone a specific URL, as the arbitrary commands were embedded in the URL itself.[89] An attacker could use the exploit via a man-in-the-middle attack if the connection was unencrypted,[89] as they could redirect the user to a URL of their choice. Recursive clones were also vulnerable, since they allowed the controller of a repository to specify arbitrary URLs via the .gitmodules file.[89]

Git uses SHA-1 hashes internally. Linus Torvalds has said that the hash was mainly intended to guard against accidental corruption, and that the security provided by a cryptographically secure hash was just an incidental side effect, with the main security coming from signing elsewhere.[90][91]

____
  1. http://gdt.oqlf.gouv.qc.ca/ficheOqlf.aspx?Id_Fiche=8371027#eng.
  2. Template:Cite web
  3. Template:Cite book
  4. Template:Cite mailing list "So I'm writing some scripts to try to track things a whole lot faster."
  5. Template:Cite mailing list
  6. Template:Cite video
  7. Template:Cite book
  8. Template:Cite book
  9. Template:Cite news
  10. BitKeeper and Linux: The end of the road? |linux.com Template:Webarchive
  11. Template:Cite news
  12. Template:Cite mailing list
  13. Template:Cite mailing list
  14. Template:Cite mailing list
  15. Template:Cite mailing list
  16. Template:Cite mailing list
  17. Template:Cite mailing list
  18. Template:Cite mailing list
  19. https://github.com/git/git/releases
  20. Template:Cite web
  21. Template:Cite web
  22. Template:Cite web
  23. Template:Cite mailing list
  24. Jst's Blog on Mozillazine Template:Cite web
  25. Template:Cite web, observing that "git log" is 100x faster than "svn log" because the latter must contact a remote server.
  26. Template:Cite web
  27. Template:Cite mailing list, describing Git's script-oriented design
  28. Template:Cite web, praising Git's scriptability.
  29. Template:Cite web
  30. Template:Cite web
  31. Template:Cite web
  32. Template:Cite mailing list
  33. Template:Cite mailing list
  34. Template:Cite mailing list
  35. Template:Cite mailing list
  36. Template:Cite mailing list
  37. Template:Cite mailing list, on using git-blame to show code moved between source files.
  38. Template:Cite web
  39. Template:Cite mailing list
  40. Cite error: Invalid <ref> tag; no text was provided for refs named bare_url
  41. Template:Cite web
  42. Template:Cite web
  43. Template:Cite web
  44. Template:Cite web
  45. Template:Cite web
  46. Template:Cite web (source code)
  47. Template:Cite web
  48. Template:Cite web
  49. Template:Citation
  50. Template:Cite web
  51. Template:Cite web
  52. Template:Cite web
  53. Template:Cite web
  54. Template:Cite web
  55. Template:Cite web
  56. Template:Cite web
  57. Template:Cite web
  58. 4.4 Git on the Server – Setting Up the Server Template:Webarchive, Pro Git.
  59. Template:Cite web
  60. https://git-scm.com/book/en/v2/Git-on-the-Server-Setting-Up-the-Server
  61. Diffusion User Guide: Repository Hosting.
  62. https://gitolite.com/gitolite/index.html
  63. https://gogs.io/
  64. Template:Cite web
  65. Template:Cite web
  66. Template:Cite web
  67. Template:Cite web
  68. Template:Cite web
  69. Template:Cite web
  70. Template:Cite web
  71. Template:Cite web
  72. Template:Cite web
  73. Template:Cite web
  74. Template:Cite web
  75. Template:Cite web
  76. Template:Cite web
  77. Template:Cite web
  78. Template:Cite web
  79. Template:Cite web
  80. Template:Citation
  81. Template:Cite web
  82. Template:Cite web
  83. Template:Cite web
  84. Template:Cite web
  85. Template:Cite web
  86. Template:Cite newsgroup
  87. Template:Cite web
  88. Template:Cite web
  89. Template:Cite web
  90. Template:Cite web
  91. Template:Cite web


Git Hello World

Hello World

The Hello World project is a time-honored tradition in computer programming. It is a simple exercise that gets you started when learning something new. Let’s get started with GitHub!

You’ll learn how to:

  • Create and use a repository
  • Start and manage a new branch
  • Make changes to a file and push them to GitHub as commits
  • Open and merge a pull request

What is GitHub?

GitHub is a code hosting platform for version control and collaboration. It lets you and others work together on projects from anywhere.

This tutorial teaches you GitHub essentials like repositories, branches, commits, and Pull Requests. You’ll create your own Hello World repository and learn GitHub’s Pull Request workflow, a popular way to create and review code.

No coding necessary

To complete this tutorial, you need a GitHub.com account and Internet access. You don’t need to know how to code, use the command line, or install Git (the version control software GitHub is built on).

Tip: Open this guide in a separate browser window (or tab) so you can see it while you complete the steps in the tutorial.

Step 1. Create a Repository

A repository is usually used to organize a single project. Repositories can contain folders and files, images, videos, spreadsheets, and data sets – anything your project needs. We recommend including a README, or a file with information about your project. GitHub makes it easy to add one at the same time you create your new repository. It also offers other common options such as a license file.

Your hello-world repository can be a place where you store ideas, resources, or even share and discuss things with others.

To create a new repository

1. In the upper right corner, next to your avatar or identicon, click the + icon and then select New repository.

2. Name your repository hello-world.

3. Write a short description.

4. Select Initialize this repository with a README.

5. Click Create repository.

Step 2. Create a Branch

Branching is the way to work on different versions of a repository at one time.

By default your repository has one branch named main which is considered to be the definitive branch. We use branches to experiment and make edits before committing them to main.

When you create a branch off the main branch, you’re making a copy, or snapshot, of main as it was at that point in time. If someone else made changes to the main branch while you were working on your branch, you could pull in those updates.

This diagram shows:

  • The main branch
  • A new branch called feature (because we’re doing ‘feature work’ on this branch)
  • The journey that feature takes before it’s merged into main

Have you ever saved different versions of a file? Something like:

  • story.txt
  • story-joe-edit.txt
  • story-joe-edit-reviewed.txt

Branches accomplish similar goals in GitHub repositories.

Here at GitHub, our developers, writers, and designers use branches for keeping bug fixes and feature work separate from our main (production) branch. When a change is ready, they merge their branch into main.

To create a new branch

1. Go to your new repository hello-world.

2. Click the drop-down menu at the top of the file list that says branch: main.

3. Type a branch name, readme-edits, into the new branch text box.

4. Select the blue Create branch box or hit “Enter” on your keyboard.

Now you have two branches, main and readme-edits. They look exactly the same, but not for long! Next we’ll add our changes to the new branch.
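This tutorial needs no command line, but if you happen to have Git installed, the same step can be tried locally. The following is a minimal sketch, not part of the GitHub steps above: it assumes a throwaway local repository in a temporary directory, and the identity and README contents are placeholders. Nothing is pushed to GitHub.

```shell
# Scratch repository standing in for hello-world (local only, placeholder values).
repo="$(mktemp -d)/hello-world"
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/main        # make sure the first branch is named main
git config user.name "Example User"          # placeholder identity for this repo only
git config user.email "user@example.com"

# One commit on main, so there is something to branch from.
echo "# hello-world" > README.md
git add README.md
git commit -q -m "Initial commit"

# Create readme-edits from the current tip of main and switch to it.
git checkout -q -b readme-edits

git --no-pager branch                        # lists main and readme-edits
git rev-parse main readme-edits              # both names resolve to the same commit
```

Both branch names point at the same commit at this stage, which is why the two branches "look exactly the same" until a new commit is added to one of them.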

Step 3. Make and commit changes

Bravo! Now, you’re on the code view for your readme-edits branch, which is a copy of main. Let’s make some edits.

On GitHub, saved changes are called commits. Each commit has an associated commit message, which is a description explaining why a particular change was made. Commit messages capture the history of your changes, so other contributors can understand what you’ve done and why.

To make and commit changes

1. Click the README.md file.

2. Click the pencil icon in the upper right corner of the file view to edit.

3. In the editor, write a bit about yourself.

4. Write a commit message that describes your changes.

5. Click the Commit changes button.

These changes will be made to just the README file on your readme-edits branch, so now this branch contains content that’s different from main.
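For readers working with Git locally, a commit on a branch can be sketched as follows. This is an illustration under assumptions, not the web editor's actual behavior: the scratch repository, identity, file contents, and commit messages are all placeholders, and no `git push` is run because the sketch has no remote.

```shell
# Scratch repository with a main branch and a readme-edits branch (placeholder values).
repo="$(mktemp -d)/hello-world"
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "# hello-world" > README.md
git add README.md
git commit -q -m "Initial commit"
git checkout -q -b readme-edits

# Edit the README, stage it, and record the change with a message saying why.
echo "A few words about me." >> README.md
git add README.md
git commit -q -m "Add a short introduction to the README"

# The new commit exists only on readme-edits; main is unchanged.
git --no-pager log --oneline readme-edits
git --no-pager log --oneline main
```

The first log shows two commits and the second shows one, which is the command-line view of the branch now containing content that differs from main.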

Step 4. Open a Pull Request

Nice edits! Now that you have changes in a branch off of main, you can open a pull request.

Pull Requests are the heart of collaboration on GitHub. When you open a pull request, you’re proposing your changes and requesting that someone review your contribution and merge it into their branch. Pull requests show diffs, or differences, of the content from both branches. Additions are shown in green and deletions in red.
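A pull request itself is a GitHub feature, but the diff it displays can be approximated with plain Git. The sketch below assumes a scratch local repository with placeholder contents. The three-dot form `main...readme-edits` compares the branch against the point where it diverged from main, which is, to our understanding, the comparison GitHub shows by default.

```shell
# Scratch repository: one commit on main, one extra commit on readme-edits.
repo="$(mktemp -d)/hello-world"
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"
echo "# hello-world" > README.md
git add README.md
git commit -q -m "Initial commit"
git checkout -q -b readme-edits
echo "A few words about me." >> README.md
git commit -q -a -m "Add a short introduction to the README"

# Show what readme-edits would contribute to main.
# Added lines start with "+", removed lines with "-".
git --no-pager diff main...readme-edits
```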

As soon as you make a commit, you can open a pull request and start a discussion, even before the code is finished.

By using GitHub’s @mention system in your pull request message, you can ask for feedback from specific people or teams, whether they’re down the hall or 10 time zones away.

You can even open pull requests in your own repository and merge them yourself. It’s a great way to learn the GitHub flow before working on larger projects.

Open a Pull Request for changes to the README

1. Click the Pull Request tab, then on the Pull Request page click the green New pull request button.

2. In the Example Comparisons box, select the branch you made, readme-edits, to compare with main (the original).

3. Look over your changes in the diffs on the Compare page and make sure they’re what you want to submit.

4. When you’re satisfied that these are the changes you want to submit, click the big green Create Pull Request button.

5. Give your pull request a title and write a brief description of your changes.

6. When you’re done with your message, click Create pull request!

Tip: You can use emoji and drag and drop images and gifs onto comments and Pull Requests.

Step 5. Merge your Pull Request

In this final step, it’s time to bring your changes together – merging your readme-edits branch into the main branch.

1. Click the green Merge pull request button to merge the changes into main.

2. Click Confirm merge.

3. Go ahead and delete the branch, since its changes have been incorporated, with the Delete branch button in the purple box.
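The merge and cleanup have a rough local counterpart in Git. This is a sketch under assumptions: a scratch repository with placeholder contents, and `--no-ff` used to force a merge commit, which roughly mirrors what the default Merge pull request button records. It does not reproduce the pull request itself.

```shell
# Scratch repository: main plus a readme-edits branch that is one commit ahead.
repo="$(mktemp -d)/hello-world"
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"
echo "# hello-world" > README.md
git add README.md
git commit -q -m "Initial commit"
git checkout -q -b readme-edits
echo "A few words about me." >> README.md
git commit -q -a -m "Add a short introduction to the README"

# Switch back to main and merge the branch, always creating a merge commit.
git checkout -q main
git merge -q --no-ff -m "Merge branch 'readme-edits'" readme-edits

# The branch's changes are now part of main, so the branch can be deleted.
git branch -d readme-edits

git --no-pager log --oneline --graph         # history of main, including the merge commit
```

Deleting the branch removes only the name; the commits it pointed to remain reachable from main through the merge commit.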

Celebrate!

By completing this tutorial, you’ve learned to create a project and make a pull request on GitHub!

Here’s what you accomplished in this tutorial:

  • Created an open source repository
  • Started and managed a new branch
  • Changed a file and committed those changes to GitHub
  • Opened and merged a Pull Request

Take a look at your GitHub profile and you’ll see your new contribution squares!

To learn more about the power of Pull Requests, we recommend reading the GitHub flow Guide. You might also visit GitHub Explore and get involved in an Open Source project.