Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: Repository/mirrors question



On 2022-01-23 at 15:42 -0600, Jim wrote:
> Aren't forks just copies of the git repository, once forked then the fork
> does their own thing. That wouldn't change the repository that the fork
> was taken from, would it?  'man git' hasn't any references to forks. But
> it wouldn't be the first time I was wrong. What would concern me more is
> the object count on gitlab's mirror.

I'm going to answer the question and explain a few ways this can happen,
but the last section below has the actual explanation of what's
happening here: GitLab has one dev branch behind the others.  Whoever
owns that copy should take a look at why that branch wedged.

---

So a fork is not integral to git; it's a feature of the various git
forges.  There have been a number of interesting stories over the years
about how many of the forges did not isolate the content-addressed
namespaces (the parts you reach directly by object ID) between forks,
which let people make things "appear" to be in the original
repository.

Thus, if B forks from A and then adds commits and objects, following any
tree or commit reference in the hosted A won't show those items, and
cloning A _shouldn't_ normally include them either; but you have been
able to use the objects' SHA-1s directly to reach them via the hosted A.
I _think_ GitHub put in some mitigations because they got tired of the
games people were playing.

That said, a pull request and its objects often _are_ available in the
original repository, just not under the normal heads.  So a regular
`git clone` won't see them, but a `git clone --mirror` might show you
more.  E.g., given:

[remote "origin"]
   url = https://github.com/a/b
   # the normal branches
   fetch = +refs/heads/*:refs/remotes/origin/*
   # GitHub's pull-request heads, under a pseudo-remote named "pr"
   fetch = +refs/pull/*/head:refs/remotes/pr/*

then, after a `git fetch`, GitHub's extra "pull" area becomes visible
under a pseudo-remote named "pr", letting you diff pull requests locally
and do fancier stuff.

Since `--mirror` sets up `fetch = +refs/*:refs/*`, you would then see
the pull requests from all of the forks show up within the
repository.
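The refspec behavior above can be tried without GitHub. This is a minimal sketch, assuming a POSIX shell and a local `git`: a throwaway bare repository stands in for a forge and stores one "pull request" under `refs/pull/1/head`, the way GitHub does. The helper names (`count_pull_refs`, `demo_pull_refs`) and the repository layout are hypothetical.

```sh
# Count refs under refs/pull/ or refs/remotes/pr/ in a repository.
count_pull_refs() {
  git -C "$1" for-each-ref refs/pull refs/remotes/pr | wc -l | tr -d ' '
}

demo_pull_refs() {
  tmp=$(mktemp -d) || return 1
  (
    set -e
    # Hypothetical "forge": a bare repo whose default branch is main.
    git init -q --bare "$tmp/forge.git"
    git --git-dir="$tmp/forge.git" symbolic-ref HEAD refs/heads/main
    git init -q "$tmp/work"
    cd "$tmp/work"
    git config user.name demo
    git config user.email demo@example.invalid
    echo base >file; git add file; git commit -q -m base
    git push -q "$tmp/forge.git" HEAD:refs/heads/main
    # A "pull request" commit reachable only from refs/pull/1/head.
    echo change >file; git commit -q -am change
    git push -q "$tmp/forge.git" HEAD:refs/pull/1/head
  ) >/dev/null 2>&1 || return 1

  # 1. Plain clone: the default refspec only fetches refs/heads/*.
  git clone -q "$tmp/forge.git" "$tmp/plain" 2>/dev/null || return 1
  plain=$(count_pull_refs "$tmp/plain")

  # 2. Same clone plus the extra pull-request refspec, then fetch.
  git -C "$tmp/plain" config --add remote.origin.fetch '+refs/pull/*/head:refs/remotes/pr/*'
  git -C "$tmp/plain" fetch -q origin 2>/dev/null || return 1
  refspec=$(count_pull_refs "$tmp/plain")

  # 3. Mirror clone: fetch = +refs/*:refs/* copies refs/pull/* as-is.
  git clone -q --mirror "$tmp/forge.git" "$tmp/mirror.git" 2>/dev/null || return 1
  mirror=$(count_pull_refs "$tmp/mirror.git")

  rm -rf "$tmp"
  echo "plain=$plain refspec=$refspec mirror=$mirror"
}
```

Running `demo_pull_refs` should print `plain=0 refspec=1 mirror=1`: the plain clone never sees the pull ref, the extra `fetch` line maps it to `pr/1`, and `--mirror` copies it verbatim.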

---

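The next major difference is in history and what happens as branches get
abandoned.  `git prune` treats objects still referenced from the
"reflog" as reachable, and reflog entries only expire after a
configurable time (`gc.reflogExpire`, `gc.reflogExpireUnreachable`), so
that you can go back to recently deleted items.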
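A small sketch of that reflog protection, assuming a POSIX shell and a local `git`; the scratch repository and the function name are made up for illustration. It abandons a side-branch commit, shows that `git prune` keeps it while the HEAD reflog still mentions it, and shows it going away once every reflog entry is expired.

```sh
reflog_prune_demo() {
  tmp=$(mktemp -d) || return 1
  (
    set -e
    cd "$tmp"
    git init -q
    git symbolic-ref HEAD refs/heads/trunk
    git config user.name demo
    git config user.email demo@example.invalid
    echo one >file; git add file; git commit -q -m one
    # Make a commit on a side branch, then abandon the branch.
    git checkout -q -b scratch
    echo two >file; git commit -q -am two
    gone=$(git rev-parse HEAD)
    git checkout -q trunk
    git branch -q -D scratch
    # HEAD's reflog still mentions the commit, so prune keeps it.
    git prune --expire=now
    git cat-file -e "$gone" 2>/dev/null && echo kept || echo removed
    # Expire every reflog entry, then prune again.
    git reflog expire --expire=now --all
    git prune --expire=now
    git cat-file -e "$gone" 2>/dev/null && echo kept || echo removed
  )
  status=$?
  rm -rf "$tmp"
  return $status
}
```

Running `reflog_prune_demo` should print `kept` and then `removed`. `git gc` normally runs these steps for you, with expiry windows taken from the `gc.*` settings rather than `now`.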

---

And then there's the fact that git moves objects around in "packs" as
well as loose objects, and if a pack still contains items that have
since been deleted, you'll end up with a bunch of now-garbage items in
the pack.  `git help repack` might be your friend for your own repos,
but it doesn't help with controlling what the forges are doing.
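To illustrate the pack point on a repository of your own, here is a sketch (POSIX shell, local `git`, throwaway repo, hypothetical function name): a commit is packed while its branch exists, then the branch and reflog are dropped. `git prune` leaves the packed copy alone because it only removes loose objects, and `git repack -a -d` finally rewrites the pack without it.

```sh
repack_demo() {
  tmp=$(mktemp -d) || return 1
  (
    set -e
    cd "$tmp"
    git init -q
    git symbolic-ref HEAD refs/heads/trunk
    git config user.name demo
    git config user.email demo@example.invalid
    echo one >file; git add file; git commit -q -m one
    git checkout -q -b scratch
    echo two >file; git commit -q -am two
    gone=$(git rev-parse HEAD)
    git checkout -q trunk
    # Pack everything while "scratch" still exists: the pack now holds "two".
    git repack -q -a -d
    # Abandon the branch, forget the reflog, then prune loose objects.
    git branch -q -D scratch
    git reflog expire --expire=now --all
    git prune --expire=now
    # Unreachable now, but prune only removes loose objects, so it stays packed.
    git cat-file -e "$gone" 2>/dev/null && echo present || echo absent
    # Write one new pack of reachable objects and delete the old pack.
    git repack -q -a -d
    git cat-file -e "$gone" 2>/dev/null && echo present || echo absent
  )
  status=$?
  rm -rf "$tmp"
  return $status
}
```

The expected output is `present` and then `absent`. On a forge you cannot run the equivalent yourself; when the hosted side repacks is up to them.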

---

In fresh clones from each of the three hosts:

         ReachableObjects  CommitGraphCount
GitHub              98615             14158
GitLab              98501             14132
SF                  98615             14158

So GitLab sees 26 fewer commits in the entire graph than the other two,
even though it otherwise appears up to date.  Next, we check which
branches exist and what they point at, and find one branch discrepancy.
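The post doesn't say how the two columns were measured. One plausible way, assumed here, is to count everything `git rev-list --all` can reach in each clone. A sketch in POSIX shell (function names hypothetical):

```sh
# Every commit, tree, blob and tag object reachable from any ref in repo $1.
reachable_objects() {
  git -C "$1" rev-list --all --objects | wc -l | tr -d ' '
}

# Every commit reachable from any ref in repo $1.
commit_graph_count() {
  git -C "$1" rev-list --all --count
}
```

For example, `reachable_objects gitlab-clone` and `commit_graph_count gitlab-clone`, where `gitlab-clone` is a placeholder path to a fresh clone.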

The `declarednull` branch is zsh-5.8-401-gf85cb4504 on GH and SF, but
zsh-5.8-270-g6bcd04997 on GitLab.

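So that branch is 131 commits behind (401 - 270), but since the whole
graph differs by only 26 commits, I suspect there has been some merging
along the way, so that most of those 131 are also reachable from other
branches that GitLab does have.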
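One way to check the merging suspicion is to count the gap directly rather than via `git describe`. The sketch below (POSIX shell, hypothetical names) counts commits reachable from the newer tip but not from the stale one, optionally also excluding other refs. A throwaway repo in which `trunk` is merged into a branch after its stale copy was left behind shows the two counts diverging.

```sh
# commits_missing REPO STALE CURRENT [OTHER_REF...]
# Count commits reachable from CURRENT but from neither STALE nor any OTHER_REF.
commits_missing() {
  repo=$1 stale=$2 current=$3
  shift 3
  git -C "$repo" rev-list --count "$current" --not "$stale" "$@"
}

# Hypothetical scratch repo: "dn" moved on after "stale" was left behind,
# and "trunk" was merged into "dn" in the meantime.
build_gap_demo() {
  tmp=$(mktemp -d) || return 1
  (
    set -e
    cd "$tmp"
    git init -q
    git symbolic-ref HEAD refs/heads/trunk
    git config user.name demo
    git config user.email demo@example.invalid
    git commit -q --allow-empty -m c1
    git checkout -q -b dn
    git commit -q --allow-empty -m d1
    git branch stale                  # where the wedged copy stopped
    git checkout -q trunk
    git commit -q --allow-empty -m t1
    git commit -q --allow-empty -m t2
    git checkout -q dn
    git merge -q --no-edit trunk      # trunk merged into the branch
    git commit -q --allow-empty -m d2
  ) >/dev/null 2>&1 || return 1
  echo "$tmp"
}
```

With `d=$(build_gap_demo)`, `commits_missing "$d" stale dn` gives 4 while `commits_missing "$d" stale dn trunk` gives 2, because two of the four arrived through the merge. Against real remotes, a call such as `commits_missing . gitlab/declarednull github/declarednull --remotes=gitlab` (remote names assumed) would separate the two effects the same way.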

For a robust, scriptable view of where each remote's branches point, use
`git ls-remote --heads <remote>`; its output can be diffed between
remotes.  I cheated and used a porcelain command, whose output is not
guaranteed stable for scripting:

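  # For each remote-tracking branch (skipping origin/HEAD), print its
  # name and its position relative to the nearest tag.
  git branch -r | grep -v /HEAD | while read -r B; do
    printf '%30s\t' "$B"
    git describe --tags "$B"
  done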

which had the advantage of letting me see where the commits were
relative to tags, so we can see the "401 commits since 5.8" vs the
"270 commits since 5.8".
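The `git ls-remote --heads` route mentioned above can be wrapped so that two remotes are compared directly. A sketch in POSIX shell; `compare_heads` is a hypothetical name, and its arguments are any URL or path that `git ls-remote` accepts:

```sh
# compare_heads REMOTE_A REMOTE_B
# Exit status follows diff(1): 0 = same heads, 1 = differences, 2 = error.
compare_heads() {
  a=$(mktemp) || return 2
  b=$(mktemp) || { rm -f "$a"; return 2; }
  if git ls-remote --heads "$1" >"$a" && git ls-remote --heads "$2" >"$b"; then
    # Lines are "<sha><TAB>refs/heads/<name>"; sort by ref name so they line up.
    sort -k2 "$a" -o "$a"
    sort -k2 "$b" -o "$b"
    diff "$a" "$b"
    status=$?
  else
    status=2
  fi
  rm -f "$a" "$b"
  return $status
}
```

It prints nothing and returns 0 when both remotes agree; the mirror URLs are left for you to fill in.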

The last commit on branch origin/declarednull in the GitLab repo is from
2020-11-28; the last on the other two is from 2021-04-13.
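For comparison with the porcelain loop above, here is a plumbing-based sketch. `git for-each-ref` is intended for scripting, and `%(committerdate:short)` also prints the tip dates compared in the last paragraph. The function name is hypothetical, and `git describe --tags` is still used for the tag-relative position.

```sh
# List remote-tracking branches: short name, tip commit date, describe output.
list_remote_branches() {
  git for-each-ref --format='%(refname) %(refname:short) %(committerdate:short)' refs/remotes/ |
  while read -r full short date; do
    case $full in
      */HEAD) continue ;;   # skip symbolic refs such as origin/HEAD
    esac
    desc=$(git describe --tags "$full" 2>/dev/null) || desc='-'
    printf '%30s\t%s\t%s\n' "$short" "$date" "$desc"
  done
}
```

Run it inside a clone that has every mirror configured as a remote.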

-Phil



