Sharing published files via a cloud storage solution

[This post is part of a series of articles about working from home in Shotgun.]

This topic aims to tackle how you could upload and download published files so that they can be shared with your remote-based team.
This post won’t include the actual code needed to upload and download them from a cloud storage provider, as each provider has its own APIs, but it will provide a basic framework as a starting point and show you where you would implement such changes in the Toolkit code.

The idea of this example is that when a user publishes files, it will publish them as usual, creating a PublishedFile entity in Shotgun, but also upload the files to a cloud storage provider such as Google Drive, OneDrive, or Dropbox.
Then when another user wants to use those publishes, they would use the loader app to download them.

Before we start, a couple of things to note:

  1. Often cloud storage providers have a folder sync feature where a folder on your local drive is continually monitored and synced to the remote storage. Whilst this can work for some people, it can also lead to data corruption, so I would advise against using it. Our suggested approach is more of a push/pull-as-required method, where no files would be overwritten.
  2. Your Shotgun site is not a good place to store large files such as Maya scene files; we don’t offer that service, so you should use another provider for your remote file storage.

Please note that the code provided in this example is not tested in production and is not guaranteed to work; it is intended as an example that can be built upon.

Setup Steps

  1. Add the tk-framework-remotestorageexample repo to your config. You could use a git descriptor and point directly at our repo, but I would suggest forking it and distributing it yourself, as we are not going to maintain/support it (it’s an example!), and if we do update it, we won’t necessarily try to maintain backwards compatibility.
    A GitHub release descriptor might work well for this purpose (see the config sketch after this list).

  2. Add the example post_phase.py tk-multi-publish2 hook to your config. This hook will run at the end of your publish and will use the framework to upload the files to the remote storage. Copy the hook to your config’s hooks folder and update the publish settings to use it. In my screen grab, I’ve just implemented it in Maya, but you should set it in all environments you wish to use it in (a rough sketch follows this list).

  3. Copy the example tk-maya_loader.py hook over to your config and set the Maya loader settings to use it. This hook will then download the files when the user chooses to reference or import a scene (sketched after this list). The example makes use of hook inheritance, so it only implements the required changes and leaves the rest to the base hook. With a little bit of work, the logic here can easily be applied to other software loader hooks, not just Maya.

  4. Copy the example_local_provider.py tk-framework-remotestorageexample hook over to your config, and configure the framework to use it. Note that this example hook does not upload files to any remote storage; instead, it copies them to a folder called mock_remote_storage in the $HOME directory. It is here as a proof of concept, and you would want to modify it to use your cloud storage provider’s API for uploading and downloading files (see the provider sketch after this list).
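
For step 1, the framework entry in your environment configuration could look something like this sketch; the instance name, organization, and version below are placeholders for your own fork’s details:

frameworks:
  tk-framework-remotestorageexample_v1.x.x:
    location:
      type: github_release
      organization: your-org
      repository: tk-framework-remotestorageexample
      version: v1.0.0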
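
For step 2, here is a minimal sketch of the shape of the post_phase hook. This is not the actual example code: the framework instance name and the upload_publish method are assumptions, so check the repo for the real interface.

import sgtk

HookBaseClass = sgtk.get_hook_baseclass()


class PostPhaseHook(HookBaseClass):
    def post_publish(self, publish_tree):
        # Called once after all items have been published. Load the
        # framework instance configured in the environment.
        framework = self.load_framework("tk-framework-remotestorageexample_v1.x.x")
        for item in publish_tree:
            # The basic publish plugin stores the created PublishedFile
            # entity on the item; skip items that didn't create one.
            sg_publish_data = item.properties.get("sg_publish_data")
            if sg_publish_data:
                # Hypothetical framework method for pushing the published
                # file to the remote storage.
                framework.upload_publish(sg_publish_data)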
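
For step 3, the loader hook idea sketched with hook inheritance; again, download_publish and the framework instance name are assumptions:

import os

import sgtk

HookBaseClass = sgtk.get_hook_baseclass()


class MayaActions(HookBaseClass):
    def execute_action(self, name, params, sg_publish_data):
        # Resolve the publish's local path and pull it down from remote
        # storage if it hasn't been synced to this machine yet.
        path = self.get_publish_path(sg_publish_data)
        if not os.path.exists(path):
            framework = self.load_framework("tk-framework-remotestorageexample_v1.x.x")
            framework.download_publish(sg_publish_data)
        # Hand the actual reference/import work back to the base hook.
        return super(MayaActions, self).execute_action(name, params, sg_publish_data)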
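
And for step 4, the mock provider’s behaviour could be sketched like this; the upload/download method names are assumptions about the framework’s provider interface:

import os
import shutil

import sgtk

HookBaseClass = sgtk.get_hook_baseclass()

# Flat folder standing in for real remote storage.
MOCK_REMOTE_ROOT = os.path.join(os.path.expanduser("~"), "mock_remote_storage")


class LocalProvider(HookBaseClass):
    def upload(self, publish_id, path):
        # A real provider would call its cloud API here; this copies the
        # file into a flat folder, prefixed with the PublishedFile id.
        if not os.path.exists(MOCK_REMOTE_ROOT):
            os.makedirs(MOCK_REMOTE_ROOT)
        remote_name = "{}_{}".format(publish_id, os.path.basename(path))
        shutil.copy(path, os.path.join(MOCK_REMOTE_ROOT, remote_name))

    def download(self, publish_id, path):
        # Copy the file back to the location it was originally published to.
        remote_name = "{}_{}".format(publish_id, os.path.basename(path))
        folder = os.path.dirname(path)
        if not os.path.exists(folder):
            os.makedirs(folder)
        shutil.copy(os.path.join(MOCK_REMOTE_ROOT, remote_name), path)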

Now it should be ready to use. Publishing will cause the PublishedFiles to be copied to the mock folder in a flat structure prefixed with the PublishedFile's id. Loading will copy the files back to the location they were originally published to.

Let us know if you spot any issues or have any suggestions!

Dependencies

It should also be noted that this code doesn’t take into account dependencies.
The best thing to do would be to make sure that everything is published (and therefore uploaded) and that you are tracking the dependencies/connections between the published scene files and everything they depend on in Shotgun. You could then update the framework’s hook to check in Shotgun whether the PublishedFile has any dependencies, and download those as well if they don’t already exist locally.
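
As a rough sketch of that idea, where sg is a Shotgun API handle and download_publish is the same hypothetical framework method as in the setup sketches above:

def download_with_dependencies(framework, sg, publish_id, seen=None):
    # Recursively download a publish and everything it depends on,
    # guarding against dependency cycles. A real implementation would
    # also skip files that already exist locally.
    seen = seen or set()
    if publish_id in seen:
        return
    seen.add(publish_id)
    publish = sg.find_one(
        "PublishedFile",
        [["id", "is", publish_id]],
        ["path", "upstream_published_files"],
    )
    framework.download_publish(publish)
    for dependency in publish["upstream_published_files"]:
        download_with_dependencies(framework, sg, dependency["id"], seen)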

16 Likes

Super cool!

Are there hooks in the multi-workfiles2 app that we should be hooking up with this as well?

3 Likes

The workfiles app is trickier.

Saving and uploading at save time shouldn’t be a problem; you would just implement the use of the framework in the scene_operation.py hook. However, you wouldn’t be able to prefix the files with a PublishedFile ID, so you might perhaps choose to prefix them with the Task, Step, or Asset/Shot ID.
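
As a rough sketch of that, for Maya; the upload_workfile method and the framework instance name are made up, and here I key the remote copy on the Task id:

import sgtk

import maya.cmds as cmds

HookBaseClass = sgtk.get_hook_baseclass()


class SceneOperation(HookBaseClass):
    def execute(self, operation, file_path, context, parent_action,
                file_version, read_only, **kwargs):
        # Let the standard hook perform the actual scene operation first.
        result = super(SceneOperation, self).execute(
            operation, file_path, context, parent_action,
            file_version, read_only, **kwargs)
        if operation in ("save", "save_as"):
            # A plain save doesn't pass a path, so query the scene name.
            path = file_path or cmds.file(query=True, sceneName=True)
            framework = self.load_framework("tk-framework-remotestorageexample_v1.x.x")
            framework.upload_workfile(context.task["id"], path)
        return result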

It’s the downloading that is problematic. The reason is that the loader app asks Shotgun for a list of files that could be imported, and doesn’t actually check whether they exist locally until the user chooses to load one. The workfiles app, on the other hand, uses the template to scan the disk for files that can be loaded, and that behavior is not covered by a hook. So the files would have to exist locally before opening the workfiles app.

You could, in theory, have a custom app that the user would run to sync the files, or perhaps you could choose to run a sync on engine start.
That could be very slow though, as you might not know the context yet, and so there would be no way to filter down the files you need to download. To get around this, you could check which files exist in the remote but not download them; instead, create placeholder files on disk. The workfiles app would then find them and display them in the UI, and when the user selects one, you could check in the scene_operation hook whether it is a placeholder and download it if so.
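
Something like this, as a sketch of the placeholder idea; using zero-byte files as the placeholder marker is just one possible convention:

import os


def create_placeholders(remote_paths):
    # For every file that exists remotely but not locally, write an
    # empty placeholder so the workfiles template scan picks it up.
    for path in remote_paths:
        if not os.path.exists(path):
            folder = os.path.dirname(path)
            if not os.path.exists(folder):
                os.makedirs(folder)
            open(path, "a").close()


def is_placeholder(path):
    # The scene_operation hook would call this before opening a file,
    # and trigger a download when it finds a placeholder.
    return os.path.exists(path) and os.path.getsize(path) == 0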

6 Likes

I think tk-multi-workfiles2 has these hooks that might come in handy for the work you are trying to do:

filter_work_files

filter_publishes

These are initially intended to filter out some of the publishes or local work files given certain conditions, so they are not shown in the tk-multi-workfiles2 UI. Each hook is given a list of dictionaries with information about the existing work files/publishes, but I believe nothing stops you from adding your own custom logic to append extra entries for the files you are trying to sync.

Note that downloading the files in this hook would make the tool very slow, as the artist has not yet chosen what to load in their DCC app. So, while this trick allows them to ‘see’ what is available and click ‘Open’, I would put the actual download logic in the scene_operation.py hook, for when they try to open a file that does not exist on disk but has a record in the cloud.
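
A minimal sketch of that approach; I’m assuming the shape of the work_files items (each a dictionary with a “work_file” key of file details, worth verifying against the default hook) and a hypothetical list_remote_workfiles helper:

import sgtk

HookBaseClass = sgtk.get_hook_baseclass()


class FilterWorkFiles(HookBaseClass):
    def list_remote_workfiles(self):
        # Hypothetical: query your cloud storage provider here and
        # return the local paths the remote files map to.
        return []

    def execute(self, work_files, **kwargs):
        # Paths the template scan already found on the local disk.
        local_paths = set(item["work_file"]["path"] for item in work_files)
        for remote_path in self.list_remote_workfiles():
            if remote_path not in local_paths:
                # Advertise the remote-only file so it appears in the UI;
                # the actual download happens later in scene_operation.py.
                work_files.append({"work_file": {"path": remote_path}})
        return work_files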

4 Likes

Oh, interesting idea, I hadn’t thought of doing that. I’ve not tested it, but I wonder whether that hook even gets called if no files are found? If it does get called even with an empty list, then yeah, that approach could work: you could gather a list of names from the remote, add them to the list, and then download them in the scene operation hook.

1 Like

Unrelated to the workfiles question, I thought it was worth calling out this post here, as @reikje was asking how you could distribute the 3rd party APIs with the config:

2 Likes

@philip.scadding, I think it does get called even if there are zero workfiles found.

I’ve actually used this approach on Windows to add more information to the work files. For some reason the file owner never gets populated (at least on Windows), and I find that artists get very confused about who did what, so I simply extract the owner of the file and use the hook to fill in the details.
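
Diego didn’t share his code, but a minimal sketch of pulling a file’s owner on Windows with pywin32 might look like this:

import win32security


def get_file_owner(path):
    # Read the file's security descriptor and resolve the owner SID
    # to a DOMAIN\user style name.
    sd = win32security.GetFileSecurity(
        path, win32security.OWNER_SECURITY_INFORMATION)
    owner_sid = sd.GetSecurityDescriptorOwner()
    name, domain, _account_type = win32security.LookupAccountSid(None, owner_sid)
    return "{}\\{}".format(domain, name)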

4 Likes

Hey, thanks @Diego_Garcia_Huerta!

That approach is a good one! I plan on trying to get this implemented; I’ll report back once I have something more complete!

4 Likes

@philip.scadding hope this inspires file enumeration hooks and a new “known but not local” status in the file models!

3 Likes

Hi guys –

I wanted to chime in here with some more info, especially around @Rhea_Fischer’s request for decoupling ‘known’ from ‘local’ in a workfile’s status. Right now, there’s no tracking of workfiles in Shotgun – essentially what we know about them is that their file path/name matches a template, we can deduce context from there, and that’s it. There’s currently no way to store metadata about work files.

Having said that, we did a proof of concept a while back for a client that implemented a Workfile entity in Shotgun and modified the Workfiles app to support it. It’s not on our roadmap to formally release this currently; it would require quite a bit of testing, plus the way it’s implemented is not the most efficient – it’s currently doing a Shotgun query per context/user sandbox, which could be optimized.

That said, here is the code in case it might prove helpful. Most of the implementation is in the linked workfiles_management.py hook, and of course you could extend the logic there, perhaps in conjunction with adding some custom fields to the Workfile entity. You’d also have to fork the app itself in order for it to be aware of this hook.

In addition to all this, I’ve shared this conversation with the product team as a feature request.

Hopefully that’s somewhat helpful! :blush:

3 Likes

Hey, thanks Tannaz! That’s cool! I think I may have seen that in action.

Is there a reason that a separate entity was decided on instead of a custom PublishedFileType?

3 Likes

Hi @Ahuge,
Having this data tracked as a published file type would probably have required some form of filtering in the loader and breakdown apps to avoid showing/using work files instead of just publishes. It was therefore designed as a separate entity for ease of implementation. Remember, this was a proof of concept and not necessarily meant to be the final solution.
JF

4 Likes

Yeah that makes total sense!

2 Likes

I think what I am going to try is combining @Diego_Garcia_Huerta’s filter hook with some of the example logic found in @tannaz’s workfiles_management hook!

Thanks All!

2 Likes

So after too long spent tracing the workfiles app, I found something important in relation to @Diego_Garcia_Huerta’s suggestion.

It appears that the FileModel has been hardcoded to use the AsyncFileFinder, and in the async version of the FileFinder, we first check to see if there are any work files before we call our filter hook…

It appears it was made an AsyncFileFinder when the workfiles app moved to v2 back in 2015. So this has been the way it works for quite some time.


I think I might make a PR so that `_task_filter_work_files` uses `_filter_work_files` like the non-async version does (in `find_files`), and doesn’t first check whether the work files list is empty.

Is this a change that you think wouldn’t cause any issues if it were merged in?

2 Likes

Ahh, another thing I ran into: the context that your app is in isn’t the same as whatever you’ve selected in the UI, so I might need to find a better way to get the context of whichever workspace we currently care about.

My current workflow is going to be something similar to the following at the start of my filter hook. It’s fairly hacky… but it feels like the easiest way to handle this for me currently.

import inspect

import sgtk

# Use the configured base hook so the standard filtering behaviour
# is inherited.
HookClass = sgtk.get_hook_baseclass()


def _find_var_from_all_frames_back(variable_name):
    # Walk the entire call stack looking for a local or global variable
    # with the given name in any caller's frame.
    stack = inspect.stack()
    for frame_info in stack:
        frame = frame_info[0]
        if variable_name in frame.f_locals:
            return frame.f_locals.get(variable_name)
        elif variable_name in frame.f_globals:
            return frame.f_globals.get(variable_name)
    raise RuntimeError("Could not find {} in any previous frame!".format(variable_name))


def get_var_from_x_frames_back(variable_name, stack_count=None):
    # With no stack_count, search every frame; otherwise look only in
    # the frame stack_count levels above our caller.
    if stack_count is None:
        return _find_var_from_all_frames_back(variable_name)
    stack = inspect.stack()
    frame = stack[stack_count + 1][0]
    if variable_name in frame.f_locals:
        return frame.f_locals.get(variable_name)
    elif variable_name in frame.f_globals:
        return frame.f_globals.get(variable_name)
    raise RuntimeError("Could not find {} in the frame {} back!".format(variable_name, stack_count))


class FilterWorkFiles(HookClass):
    def execute(self, work_files, **kwargs):
        # Grab the "environment" local variable from the app code that
        # called this hook, and take the context of the currently selected
        # workspace from it rather than from the engine's own context.
        environment = get_var_from_x_frames_back(variable_name="environment")
        context = environment.context
        # ... filtering/sync logic using context would go here ...
        return work_files

2 Likes

@tannaz thank you for sharing the prototype code.

There is a fine point about implementing ‘known’ vs ‘local’, in that it’s really about how Toolkit’s file enumeration methods interact with a remote storage model. That doesn’t exist now, but importantly, if it did, it would not strictly require that Toolkit track work files, only that the enumeration methods “look in the remote storage” as well as locally.

2 Likes

Hi all –

A few updates here:

@Ahuge, your suggestion makes sense and we’d appreciate a PR. Thanks! It’d be good to include an equivalent fix for filter_publishes in addition to filter_work_files, otherwise it’s just a half-fix. As for your context hack, as an alternative, you could add a context parameter to the filter_work_files hook so that it can take environment.context. Of course, you’d still need to fork tk-multi-workfiles2 – you’d need to modify the app to pass the new param to the hook – but it’d be a less “invasive” hack than what you’ve suggested.
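
On the hook side, that could look like the sketch below; the forked app would be modified to pass environment.context through when it runs the hook:

import sgtk

HookBaseClass = sgtk.get_hook_baseclass()


class FilterWorkFiles(HookBaseClass):
    def execute(self, work_files, context=None, **kwargs):
        # context would be environment.context, supplied by the modified
        # app code rather than dug out of the call stack.
        if context is None:
            # Fall back to the app's own context for older app versions.
            context = self.parent.context
        # ... filtering/sync logic using context ...
        return work_files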

And @Rhea_Fischer – you’re right that you could ostensibly check local and remote storage for a given work file, but I wonder how that would affect performance. Also, in a lot of cases, you can’t access remote storage in a standard way: transfer/download processes are outside the scope of Toolkit’s functionality, so I’d guess that that code would have to be custom. Of course, we could always offer a hook and have you fill in that logic… Is that what you had in mind?

2 Likes

@tannaz exactly, a hook for file enumeration. Another thing to point out is that Shotgun TK has a filesystem abstraction model, but it’s not applied consistently in the codebase. If it were, and we had a place to put a factory function, that’s where we could specialize new types of “storage” entities which have a relationship to cloud storage. This is how I’m organizing code on our project.

Thank you kindly for continued attention here.

2 Likes

Hi all – Just wanted to let you know that I’ve shared this conversation with the product team. There are some really great suggestions here, and I can’t promise that they’ll be implemented in the very near future, but obviously remote workflows are front-of-mind right now, so I hope this conversation will help shape future development.

3 Likes