Distributed git configs - why does it clone the repo multiple times?

tom-fw · April 15, 2020, 5:50pm

I’m (finally!) looking into distributed configs using the git method/sgtk descriptor. I’ve got it working, but I’m wondering why, when launching Shotgun Desktop into a project that uses a distributed config, it appears to clone the repo multiple times?

I’ve attached a part of the log from the console that shows the git clone command being run 4 times? (see line 8, 19, 122, 133).

We’re trying to optimise our configs to have as few calls across the network as possible, as our IT team say that the number of small files contained within our centralised configs is slowing down the entire system. While the git config will have far fewer files than a fully centralised one, if it’s making multiple calls to the network location where the remote git repo lives, then I’m worried that performance may not be that much improved anyway.

Any info you have on creating the most optimal setup to reduce network traffic would be greatly appreciated.

gitConfig_mutliCloneExample.log (14.2 KB)

Thanks

Tom F-W

philip.scadding · April 16, 2020, 8:32am

Hi Tom!

I’m not totally sure without digging in deep, but I think another engineer might know more, so I’ll try and run it past them.

That said, I have a theory (that might be wrong): I wonder if this is happening because the descriptor is not actually pointing at a specific commit, so it can’t know if the currently cached bundle is up to date?
Your descriptor is:

sgtk:descriptor:git_branch?path=//...path.../sq_shot_task_base_config.git&branch=master

It’s only pointing at master, so I would guess it would have to pull it down every time as it wouldn’t know which commit it was targeting?
I’m actually surprised it works without specifying the commit/version as mentioned in the docs.
Best
Phil

tom-fw · April 16, 2020, 8:58am

Hey Phil,

Thanks for getting back so quickly, and for your suggestions

In this section of the docs it does mention that you can just specify the latest commit of a given branch. This is how I arrived at the descriptor string that I used.

Either way, I would understand why it might need to clone down and then do a comparison with the local clone, but what I’m really asking is why this clone process has to happen 4 times in the one process? I would have thought that cloning the repo once at the start would be sufficient?

It would be great to hear back from the engineer about this

In the meantime we can continue to test things out at our end, and maybe look at other ways that we can distribute the config as well

Thanks

Tom

philip.scadding · April 16, 2020, 9:03am

Ah, I’d missed that in the docs, you are right! I’m still not sure about the expected behavior around that. If I am correct in my theory, it would seem we should clarify that in the docs.

As for why it would happen multiple times, it would happen for every time it needs to ensure that bundle is cached locally, though again without walking through code, I’m not sure why it would be doing that check twice in quick succession like that.
I’ll let you know when I know more.

philip.scadding · April 16, 2020, 11:55am

So my theory was correct: not providing a commit id means that it will always attempt to download it.

As for why it is happening twice, it’s not clear without investigating, but at a guess, we think it was probably written so that at any point where it requires the bundle, it ensures the bundle is local. Usually it would only download if the bundle wasn’t found locally, so the check would typically be very quick and not a problem.
I appreciate that doesn’t fully answer the question, and it may be possible that we could tidy this up but usually, it isn’t a problem, unless you’re not specifying a commit.
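
To illustrate the guess above, here is a toy model in Python. It is not the Toolkit implementation; `ensure_local`, `count_clones`, the cache set, the path and the commit hash are all hypothetical names made up for the sketch. It only shows why a pinned descriptor can be satisfied from the cache after the first lookup, while an unpinned one has to go back to the remote every time the bundle is requested.

```python
def ensure_local(descriptor, cache, clone):
    """Return a cache key for the descriptor, cloning only when needed.

    Toy model only: `cache` is a set of (path, version) tuples and
    `clone` is any callable that returns the version it fetched.
    """
    version = descriptor.get("version")
    key = (descriptor["path"], version)
    if version is not None and key in cache:
        # Pinned and already cached: a cheap local check, no network call.
        return key
    # Unpinned (or not cached yet): the remote has to be contacted.
    resolved = clone(descriptor)
    key = (descriptor["path"], resolved)
    cache.add(key)
    return key


def count_clones(descriptor, lookups=4):
    """Count how many clones happen when the bundle is requested `lookups` times."""
    cache = set()
    clone_calls = []

    def fake_clone(desc):
        # Stand-in for a real `git clone`; just records that it was called.
        clone_calls.append(desc)
        return desc.get("version") or "latest-commit"

    for _ in range(lookups):
        ensure_local(descriptor, cache, fake_clone)
    return len(clone_calls)


# Hypothetical path and commit hash, for illustration only.
unpinned = {"path": "/path/to/config.git", "branch": "master"}
pinned = {"path": "/path/to/config.git", "branch": "master", "version": "17fedd8"}
print(count_clones(unpinned), count_clones(pinned))
```

In this model four lookups of the unpinned descriptor trigger four clones, while the pinned one triggers a single clone.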

I would say the best thing to do is to point to specific releases or commits.
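
As a rough sketch of what pinning looks like, the snippet below builds descriptor strings in the same shape as the one quoted earlier in the thread and checks whether they are pinned. It assumes the commit or tag is passed as a `version` parameter, as the docs describe; the repo path, the commit hash and the helper names `build_descriptor_uri` and `is_pinned` are hypothetical, and the sketch does no URL escaping of values.

```python
from urllib.parse import parse_qs


def build_descriptor_uri(descriptor_type, **params):
    """Build an sgtk-style descriptor string from keyword parameters.

    Hypothetical helper for illustration; values are not URL-escaped.
    """
    query = "&".join(f"{key}={value}" for key, value in sorted(params.items()))
    return f"sgtk:descriptor:{descriptor_type}?{query}"


def is_pinned(uri):
    """Return True if the descriptor string carries a `version` parameter."""
    _, _, query = uri.partition("?")
    return "version" in parse_qs(query)


# Hypothetical repo path and commit hash.
floating = build_descriptor_uri(
    "git_branch", path="/path/to/config.git", branch="master"
)
pinned_uri = build_descriptor_uri(
    "git_branch", path="/path/to/config.git", branch="master", version="17fedd8"
)
print(floating, is_pinned(floating))
print(pinned_uri, is_pinned(pinned_uri))
```

A check like `is_pinned` could be run over the descriptors in use to spot any that still float on a branch.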

tom-fw · April 16, 2020, 1:45pm

Thanks for the extra info Phil.

I’ll try it with the version tags/specific commits and see how that goes.

Thanks for now

Tom

tom-fw · April 17, 2020, 1:29pm

Hey @philip.scadding,

I’ve been able to get things going with specifying the exact commit and the process is much cleaner it seems (ie. one clone of the repo only).

One thing I am wondering about though is: Is there any automated cleanup of older config commits on the local disk? We release updates to our config around 5-10+ times a week, so with a full copy of the config on a C: drive per commit, that could build up to be quite a lot of redundant data quickly.

We can obviously write in some of our own ways of trying to manage that, but I’m just wondering if there’s anything built into the SGTK distribution method for handling that cleanup?

Thanks

Tom

philip.scadding · April 17, 2020, 2:31pm

No there isn’t anything implemented by default, although it is something we have discussed in the past.
The trouble is that there isn’t a clear cut way of knowing if something isn’t used anymore. Since the bundle cache is global and not specific to one config/project, when in one configuration, we can’t know if there is another configuration that is using the older version.

However, you could make assumptions on your end based on your knowledge of your setup, i.e. maybe you know it’s safe to get rid of anything older than the last three releases, or perhaps you have some way of checking if a release is still in use (that doesn’t involve bootstrapping all configs just to know what is in use). For example, you could create a file in the config that gets touched every time it is used, and then check whether it has not been used in the last 30 days and remove the bundle?

As for when you could perform a cleanup, there is probably some core hook you could use that runs every time a config is bootstrapped?

Maybe the bootstrap.py core hook, but there are others there that might work as well.
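
A minimal sketch of the "touch a file, then remove anything unused for 30 days" idea could look like the following. Everything here is an assumption for illustration: the marker file name, the function names, and the layout (one sub-directory per cached config version under a single folder) are hypothetical and not part of Toolkit. Wiring `mark_used` into a core hook is left out.

```python
import shutil
import time
from pathlib import Path

MARKER_NAME = ".last_used"  # hypothetical marker file name


def mark_used(version_dir):
    """Create the marker file, or update its modification time if it exists."""
    (Path(version_dir) / MARKER_NAME).touch()


def find_stale_versions(versions_root, max_age_days=30, now=None):
    """Return version folders whose marker is older than `max_age_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for child in sorted(Path(versions_root).iterdir()):
        marker = child / MARKER_NAME
        # Skip anything without a marker: we cannot tell if it is in use.
        if not child.is_dir() or not marker.exists():
            continue
        if marker.stat().st_mtime < cutoff:
            stale.append(child)
    return stale


def remove_stale_versions(versions_root, max_age_days=30, dry_run=True):
    """Delete stale version folders; with dry_run=True only report them."""
    stale = find_stale_versions(versions_root, max_age_days)
    for path in stale:
        if not dry_run:
            shutil.rmtree(path)
    return stale
```

Defaulting to `dry_run=True` and skipping folders with no marker keeps the sketch on the cautious side, since the bundle cache is shared between configs.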

tom-fw · April 17, 2020, 3:09pm

there isn’t anything implemented by default

Ok good to know.

Maybe the bootstrap.py core hook, but there are others there that might work as well.

Cool. Thanks for the pointers. I’ll have a look and see what we can do to manage this within the config itself. Much appreciated!

Have a great weekend!
Tom

tannaz · June 23, 2020, 2:02am

Hey Tom – Just wanted to let you know that this issue cropped up again, as you can see in this post. I’ve put in an internal ticket to track the issue. I’ve linked this post to the internal ticket, so I’ll be notified when it’s resolved, and will notify you in turn. Thanks for pointing it out to us.