Distributed Config and Render Farm Setup

philip.scadding · January 21, 2020, 9:06am

Ah I knew there was a thread I hadn’t updated:

That said it doesn’t cover the farm specifically, it’s more generically about bootstrapping. So you may still have questions.

Where are you up to, and what is the high level approach/order of operations you are planning to use? Maybe I can help.

jeff · January 21, 2020, 5:50pm

I’m in the process of trying to get Shotgun working on the farm w/ Deadline. Right now I’ve got a bootstrap script that I cobbled together that would put SGTK on each render node at bootup, but from what I’ve read here it seems like it makes more sense to only do that once at job creation on a single node.

I don’t really want to hijack this thread, so you let me know what the best way to continue this is.

philip.scadding · January 22, 2020, 10:02am

Yeah, I might split this out, but for now, I think it’s fine. We can always move these messages over to a new thread later.

I guess my first question should have been, what is the reason you are bootstrapping on the farm?
Is it to:

Process Nuke or Houdini renders that contain a Shotgun write node/Mantra node/Alembic node?
Publish the complete renders to Shotgun, and maybe create a playable Version?
Run some other custom scripted process using the Toolkit API?

jeff · January 24, 2020, 10:37am

The main reason is to be able to process Nuke renders that contain a shotgun write, so that folder paths are respected - and ultimately it would be awful nice to be able to publish a movie file of the rendered image sequence at the end to Shotgun as well as the sequence itself.

philip.scadding · January 27, 2020, 10:22am

OK thanks! To start I would say there is no one way of doing this.

I can provide the way I personally think you should approach this; however, I have not actually implemented this myself, so there may be things I’ve not considered. @Patrick took a different approach from the one I’m going to suggest, but he has actually implemented it, and it works for him. I think there are pros and cons to both of our approaches, though.

Rendering

So, focusing first on rendering with the Shotgun write node.

Our recommendation is to convert the Shotgun write node into a standard Nuke write node. The reason we recommend converting is mostly that if you bootstrap the engine on the farm, for each render node that picks up a job, then you run the risk of degrading your Shotgun Site performance due to the sheer number of simultaneous connections being made. On top of that, the bootstrap process will slow the render down.

You could bootstrap and counteract the issue somewhat by having each Nuke job dished out to a single node where it doesn’t restart the process between frames, but it would require diligence to avoid submitting jobs in a way that could cause an overload on your Shotgun site.

If you go down the converting route, there are two ways that I can see of doing it.

Convert the node in your Nuke script before submitting the job. This is by far the easiest thing to do, but the downside here is that your user will be left with a Nuke script with a converted node. You can potentially convert them back and/or reload the script to return to a Shotgun write node.
Submit the scene to the farm as is, with the Shotgun write node. Have a pre-job that runs before the main render job, starts Nuke, bootstraps Toolkit, converts the nodes, and then saves the script in a central location and modifies the render job to use this as the source script. The render job would then launch as normal without Shotgun and render using the standard Nuke write node.
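
As a rough sketch of the second option, the conversion step of such a pre-job might look like the following. It assumes Toolkit has already been bootstrapped inside the Nuke session, so an engine with the tk-nuke-writenode app is available. The function name, the `farm_dir` argument and the `_farm` file suffix are made up for illustration, and the `nuke` module is passed in as an argument so the function can be read (and exercised) outside of Nuke.

```python
import os


def convert_for_farm(engine, nuke_module, source_script, farm_dir):
    """Open a work file, swap Shotgun write nodes for plain Write nodes,
    and save the result as a separate script for the render job.

    engine      -- a bootstrapped tk-nuke engine (assumed to exist already)
    nuke_module -- the `nuke` module from the running Nuke session
    farm_dir    -- hypothetical central folder for converted scripts
    """
    nuke_module.scriptOpen(source_script)

    # convert_to_write_nodes() is the tk-nuke-writenode app method that is
    # also used later in this thread.
    engine.apps["tk-nuke-writenode"].convert_to_write_nodes()

    base, ext = os.path.splitext(os.path.basename(source_script))
    target = os.path.join(farm_dir, base + "_farm" + ext)

    # overwrite=1 replaces an earlier converted copy without prompting.
    nuke_module.scriptSaveAs(target, overwrite=1)
    return target
```

The returned path is what the render job would need to be pointed at; how that is done depends on the farm manager and is not shown here.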

Additional notes:
- When you submit the job from Nuke you would need to provide a few details to the job as environment variables, which would then be used in the bootstrap process. You would need to provide the Task entity id, which you would then provide to the bootstrap API.
- You may need to have a separate Toolkit environment for running on the farm, where you cut down the available Toolkit apps to just the non GUI ones. Try without doing this first though.
- You may need to handle the path cache syncing, when bootstrapping into a Task. Read just above this linked line.
- Consider setting the SHOTGUN_BUNDLE_CACHE_PATH environment variable on your farm machines to point to a folder on a central location, so that the farm nodes don’t need to each download all the dependencies.
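
To make the first note more concrete, here is a minimal sketch of reading the entity from the job environment and handing it to the bootstrap API. All of the `SGTK_FARM_*` variable names and the `basic.farm` plugin id are hypothetical placeholders for this example. The script-user authentication is one possible choice, not something prescribed above.

```python
import os
import sys


def entity_from_env(environ):
    """Read the entity the submitter stored on the job.

    SGTK_FARM_ENTITY_TYPE / SGTK_FARM_ENTITY_ID are hypothetical variable
    names; use whatever your submitter actually sets.
    """
    return {
        "type": environ.get("SGTK_FARM_ENTITY_TYPE", "Task"),
        "id": int(environ["SGTK_FARM_ENTITY_ID"]),
    }


def bootstrap_from_env(core_python_dir, engine_name, environ=os.environ):
    """Bootstrap a Toolkit engine for the entity described in the environment."""
    # sgtk is imported from the "python" folder inside the core download.
    if core_python_dir not in sys.path:
        sys.path.insert(0, core_python_dir)
    import sgtk

    # Script credentials; these variable names are assumptions for this sketch.
    user = sgtk.authentication.ShotgunAuthenticator().create_script_user(
        api_script=environ["SGTK_FARM_SCRIPT_NAME"],
        api_key=environ["SGTK_FARM_SCRIPT_KEY"],
        host=environ["SGTK_FARM_SITE"],
    )

    manager = sgtk.bootstrap.ToolkitManager(sg_user=user)
    manager.plugin_id = "basic.farm"  # hypothetical plugin id
    return manager.bootstrap_engine(engine_name, entity=entity_from_env(environ))
```

Keeping the environment parsing in its own small function makes it easy to check on a workstation before anything runs on the farm.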

Publishing

I have less experience here. The publishing should ideally run as a post-job after the main rendering, and if you can get away without needing to be in Nuke to publish then that would be easiest. Either way, you will need to bootstrap Toolkit.

You could use the publish API:

Or you could write your own publish script:
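
As an illustration of the "own publish script" route, a post-job could register the rendered sequence with `sgtk.util.register_publish` once Toolkit has been bootstrapped. This is only a sketch: the function name, the comment text and the "Rendered Image" published file type are assumptions, and it does not create a movie or a Version.

```python
def publish_rendered_sequence(tk, context, sequence_path, name, version_number,
                              register=None):
    """Register a rendered image sequence as a PublishedFile.

    tk and context come from the bootstrapped engine (engine.sgtk and
    engine.context). "Rendered Image" is an assumed published file type;
    use whichever type your site is configured with.
    """
    if register is None:
        import sgtk  # only importable once the core is on sys.path
        register = sgtk.util.register_publish

    return register(
        tk,
        context,
        sequence_path,
        name,
        version_number,
        published_file_type="Rendered Image",
        comment="Published from the farm post-job",
    )
```

The optional `register` argument exists only so the call can be dry-run with a stand-in function; in a real post-job it would be left at its default.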

jeff · January 27, 2020, 11:46pm

Whoah, lots to unpack here!

Thanks for the very detailed info dump. I just finished this last show, so I’m on vacation for a week. I’ll dig into this when I get back and see what I can figure out!

Appreciate the tips @philip.scadding. If I end up getting this to work, I’m thinking I’ll do a write-up on it with code samples so that others can learn from what I figure out.

Also, if anyone else is working on this exact setup in a small vfx house - I’d be happy to collaborate.

matt_ce · February 4, 2020, 5:10am

@jeff we are in a similar situation and have a working setup here which we might be able to share eventually. The problem is there are so many little details to untangle.

As a general note, I can see I’m not the only person who’s had a problem with this side of SGTK. For a software package that’s intended for use by VFX/animation studios, support for anything headless or farm related is not great, and is complicated to set up.

It would be lovely if there was a cleaner and more generalised way of starting/bootstrapping software in SGTK that didn’t rely so much on copying environments around. All our environment/licensing setup is done in the shotgun pipeline configuration (before_app_launch.py) but it’s so heavily tied to the launcher app. Ideally it should be generalised so you could launch an app from the command line and have it run all your environment setup as part of the process.

A concrete example of this is that we have a script that uses Nuke to generate a Quicktime (with slate and burnin) on the farm, and publish a shotgun Version at the end. We spawn this farm job from an interactive Nuke session (by copying the environment etc from the local submitting Nuke session), but we would also like to be able to run this Quicktime generation process from other apps, like Houdini for example. This is made much more difficult since we are not already in a Nuke environment and can’t simply copy the existing environment to the farm job.

In my perfect world there would be an API where all you need to do is something like:

import sgtk
entity = {"type": "Project", "id": 123, "name": "My Project"}
ctx = sgtk.context_from_entity_dictionary(entity)
sgtk.launch_software('Nuke 11.2 Batch', args=['args', 'go', 'here'], context=ctx)

which could spawn off a subprocess, initialise the environment, and run the application that you already have set up in your SG ‘Software’ page, exactly as it would be run if you’d started it from Shotgun Desktop, and bootstrap it into the nominated context.

Then it might also be trivial to make little farm wrapper scripts where you could start software on the farm like:

/pipeline/sgtk_start_nuke.py /path/to/myrenderscript.nk
etc etc

Does that sound like it could be achievable at all? Or is it just a wistful fantasy?

cheers

philip.scadding · February 7, 2020, 10:03am

Hey @matt_ce thanks for your feedback, I’ve submitted an “idea” on your behalf on our roadmap page.

jeff · March 3, 2020, 8:36pm

Finally getting around to trying this setup and I’ve got my pre and post job tasks for Nuke somewhat figured out.

However, I’m running into an issue bootstrapping sgtk. I’ve downloaded the Git release like your docs show, then put it on a shared network drive, and put that into the bootstrap example code you’ve provided. But for some reason, I’m still not able to do a sys.path.insert(0, 'path') followed by an import sgtk.

import sys
import os

sys.path.append(os.path.abspath(r"T:\Python\tk-core-0.19.3"))

print(sys.path)

import sgtk

For example, the above should work based on the docs here and it doesn’t.

Is it possible that this is because I’m in Python 3?

Edit: tried in Python 2.7 as well and still can’t import that module as written. I think it’s because of the way that it’s being packaged.

Patrick · March 3, 2020, 10:58pm

you need to add the \python folder to your path :

eg

sys.path.append(os.path.abspath(r"T:\Python\tk-core-0.19.3\python"))
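
A slightly more defensive version of the same idea is sketched below. The helper name is made up for this example. It appends the `python` subfolder to the core root and fails early with a clear message when the `sgtk` package is not where it is expected.

```python
import os
import sys


def add_core_to_path(core_root):
    """Put the importable folder of a standalone tk-core download on sys.path.

    The sgtk package lives in <core_root>/python, not in <core_root> itself.
    """
    python_dir = os.path.join(os.path.abspath(core_root), "python")
    if not os.path.isdir(os.path.join(python_dir, "sgtk")):
        raise ImportError("No sgtk package found under %s" % python_dir)
    if python_dir not in sys.path:
        sys.path.insert(0, python_dir)
    return python_dir
```

After calling it with the core folder, for example `add_core_to_path(r"T:\Python\tk-core-0.19.3")`, a plain `import sgtk` should resolve.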

jeff · March 3, 2020, 11:24pm

Yep, that did it!

So it does come back to the fact that the sgtk python package is built in a non-standard way, so it doesn’t respond to normal python package import conventions.

@philip.scadding, you probably want to clarify this in the docs on this page - https://developer.shotgunsoftware.com/3d8cc69a/?title=Bootstrapping+and+running+an+app#downloading-a-standalone-toolkit-core-api

jeff · March 4, 2020, 3:25am

Got pretty far today, but I’ve got one outstanding issue that is oddly enough related to import nuke. Log from Deadline available here.

Here’s a semi-obfuscated version of my code, in case it’s immediately apparent where I went wrong.

@philip.scadding if we can get this figured out, I’m happy to let you put this code up on the new Shotgun docs site to help future users with a bit of a detailed walk-through.

I’m following your approach of bootstrapping on the farm as part of the job startup for Nuke jobs, converting nodes, and then ultimately saving.

Patrick · March 4, 2020, 11:55am

Correct me if I’m wrong, but the jobpreload script is called prior to Nuke launching, which would explain why you can’t import it at that time. If you want to bootstrap within the Nuke session, you need to add the bootstrap code to your init.py and add the folder containing that init.py to your NUKE_PATH.

jeff · March 4, 2020, 8:23pm

You know, that would make a lot of sense, Patrick.

Looks like I will have to switch to doing it in a slightly different way.

In the meantime, I just enabled that switch show_convert_actions: True and we’re just doing it manually and submitting that way.

Patrick · March 4, 2020, 9:00pm

Can I ask what you’re attempting to use the nuke module for in the prejobload?
I see from your script that you’re trying to do stuff with the tk-nuke engine and apps. You’ll need to do that from your Nuke session. What you can do with your bootstrap in the prejobload is cache the pipeline config apps/engines locally (there are a few hoops to jump through to do this, but it’s all there in the API docs; perhaps I need to do a blog post if there’s interest), and in the case of Nuke or Maya, you can add the path to those engines to the DCC’s NUKE_PATH or equivalent. THEN when Nuke launches, Shotgun’s standard bootstrap process will kick in, and if you’ve set the necessary environment variables, you’ll have a valid SG session when Nuke launches, which you can use with your loaded scene file. The env vars, off the top of my head, are TANK_CONTEXT and TANK_ENGINE.
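
As a sketch of what the prejobload could prepare, the helper below builds the environment for the Nuke process. The function name and the `startup_dir` argument are hypothetical. The serialized context is assumed to come from something like `context.serialize()` on a Toolkit context, and the two variable names are the ones recalled above from memory, so they are worth verifying against the engine’s startup code.

```python
import os


def farm_nuke_environment(base_env, serialized_context, engine_name, startup_dir):
    """Return a copy of base_env prepared for a Toolkit-aware Nuke launch.

    serialized_context -- string form of the context, e.g. context.serialize()
    startup_dir        -- hypothetical path to the cached engine's startup folder
    """
    env = dict(base_env)
    env["TANK_CONTEXT"] = serialized_context
    env["TANK_ENGINE"] = engine_name

    # Prepend so the engine's startup scripts are found before other entries.
    existing = env.get("NUKE_PATH", "")
    env["NUKE_PATH"] = startup_dir + (os.pathsep + existing if existing else "")
    return env
```

Working on a copy leaves the render node’s own environment untouched, so the variables only apply to the process that is launched with the returned dictionary.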

All of this comes with the BIG health-warning from SG that this approach can constitute somewhat of a denial-of-service attack on the SG servers… if you imagine these calls being triggered at the same time on a 1000 node renderfarm… nothing that can’t be resolved with a staggered jobstart, which you’d probably want to do in any case not to saturate the filesystem IO bus.
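
If the farm manager has no built-in staggering, one simple way to spread the load is for each node to wait a different amount of time before bootstrapping. The sketch below is only an illustration of that idea; the function name and the 60 second default window are arbitrary choices.

```python
import random


def start_delay(node_name, max_delay_seconds=60.0):
    """Return a repeatable pseudo-random delay between 0 and max_delay_seconds.

    Seeding from the node name spreads nodes across the window while giving
    each node the same delay every time.
    """
    rng = random.Random(node_name)
    return rng.random() * max_delay_seconds
```

A node would then call something like `time.sleep(start_delay(socket.gethostname()))` before connecting to the Shotgun site.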

jeff · March 4, 2020, 11:04pm

I’m trying to follow Philip’s recommended workflow for converting the Shotgun Write node as part of the job preload, but obviously running into some issues.

If you did a blog post, I’m sure it would end up helping a lot of people like myself.

jeff · March 6, 2020, 1:05am

I’ve switched to just converting the write nodes from Shotgun Write Nodes to Normal nuke write nodes, for now, as part of the submission to Deadline process and this is at least going to get us through this project (albeit with not publishing to Shotgun).

However, I’ve now noticed that when the artists version up a shot, if they forget to convert the write node back and forth to pick up the latest context, they end up writing out a new shot with the same version number as before, because the write node is no longer dynamic.

What I’m trying to do is write a function called back_and_forth() that just calls the methods to go back and forth between the two node types to get the latest information and update the write node.

However, I’ve been getting this error which is rather strange:

    Traceback (most recent call last):
      File "<string>", line 2, in <module>
      File "T:\Assets\Applications\Plugins\Nuke\Nuke Settings/Pipeline_Tools\sg_tools.py", line 20, in back_and_forth
        app.convert_from_write_nodes()
      File "C:\Users\Frame 48\AppData\Roaming\Shotgun\bundle_cache\app_store\tk-nuke-writenode\v1.4.2\app.py", line 266, in convert_from_write_nodes
        self.__write_node_handler.convert_nuke_to_sg_write_nodes()
      File "C:\Users\Frame 48\AppData\Roaming\Shotgun\bundle_cache\app_store\tk-nuke-writenode\v1.4.2\python\tk_nuke_writenode\handler.py", line 522, in convert_nuke_to_sg_write_nodes
        new_sg_wn = nuke.createNode(TankWriteNodeHandler.SG_WRITE_NODE_CLASS)
    RuntimeError: C:/Users/Frame 48/AppData/Roaming/Shotgun/bundle_cache/app_store/tk-nuke-writenode/v1.4.2/gizmos/WriteTank.gizmo:
    Missing end_group command(s)

I went ahead and checked the WriteTank.gizmo and sure enough it has the end_group in it that it is supposed to.

This is what that function looks like:

def back_and_forth():
    import time

    import sgtk

    eng = sgtk.platform.current_engine()
    app = eng.apps["tk-nuke-writenode"]

    print("Converting back to SG write nodes to get latest shotgun info")
    app.convert_from_write_nodes()

    # Short pause between the two conversions.
    time.sleep(3)

    print("Converting back to farm write node")
    app.convert_to_write_nodes()

Any thoughts?

Patrick · March 6, 2020, 9:33am

Why not convert back immediately after submitting the deadline job? You’ll need to make a copy of the converted nukescript, and submit that to deadline rather than the actual workfile.
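
A minimal sketch of that convert, save a copy, convert back sequence is shown here. The function name and arguments are made up for the example. The conversion calls are the tk-nuke-writenode app methods already used in this thread, and the `nuke` module is passed in as an argument.

```python
def save_farm_copy(app, nuke_module, farm_script_path):
    """Save a converted copy of the open script, then restore the work file's
    Shotgun write nodes.

    app         -- the tk-nuke-writenode app, e.g. engine.apps["tk-nuke-writenode"]
    nuke_module -- the `nuke` module from the artist's session
    """
    app.convert_to_write_nodes()
    try:
        # scriptSave with a filename writes a copy and leaves the open
        # script's own name unchanged.
        nuke_module.scriptSave(farm_script_path)
    finally:
        # Restore the dynamic Shotgun write nodes even if the save fails.
        app.convert_from_write_nodes()
    return farm_script_path
```

The copy at `farm_script_path` is what gets submitted to Deadline, while the artist’s work file keeps its Shotgun write nodes.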
Alternatively you need to version up after every render which isn’t ideal.

philip.scadding · March 6, 2020, 10:12am

@jeff good point, I think the docs could be clearer here, I just submitted an update for approval, it should be out soon, but @Patrick is quite right, you need to point to the python folder inside the core. I apologise this could have been clearer!

As for the preload, I was actually suggesting a pre job rather than a pre load, i.e. a job that would launch Nuke, use the bootstrap API, convert the nodes and save the scene; the actual render job would then be dependent on this pre job.