Happy New Year! This is Ryan with our first development update for 2020. It’s great to get back to work on the FLIP Fluids project.
The main theme for this week was catching up on the support requests that have piled up over the past two weeks. Thank you all for your patience! We’ve managed to get back to all support requests by the end of the week.
Not much happened in terms of code development last week. Just customer support, testing, and an in depth look at how animated obstacles are computed within the simulator.
Ring Toss!
An experiment while testing the FLIP Fluids viscosity solver. In this animation, a rotating ring object emits a high viscosity fluid towards a pole.
[gfycat data_id="decisivefastcolt"]
The ring tossing object is a FLIP Fluid Inflow object, which normally continuously emits fluid. To get this object to emit single rings, I used a sine wave modifier in the graph editor to turn the inflow Enabled option on/off every few frames.
How are solid obstacles computed within the simulator?
We received an interesting scene troubleshooting question in this Blender Artists forum post by lumpyoatmeal. The question involves a scene that contains 60 individually animated cylinders.
A question about many obstacles. In the attached image you can see that I’m making a big cylinder from many smaller cylinders. 60 to be exact. The big cylinder is rotating, by having the small cylinders using a Follow Path constraint on a circular path. This is taking a very long time to simulate/bake for each frame. Is the slowness entirely due to the large number of obstacles or is it also due to having Export Animated Mesh turned on for each small cylinder?
Blender Artists user: lumpyoatmeal
Why is this taking so long to compute? The timing stats in the FLIP Fluids Stats panel showed that the Simulation Objects update category was taking a disproportionately long time to compute. Here is an in depth explanation of how solid obstacles are treated inside the simulator to troubleshoot the issue.
TL;DR: The simulator chooses between two different ways to multithread the calculations for handling obstacle objects. For this scene, the simulator chose a less than ideal scheme.
The slowdown will be caused by the large number of objects rather than if the objects have the Export Animated Mesh option enabled. The animated mesh option will only affect the time it takes to export the mesh from Blender for use in the fluid engine. From the point of view of the simulator, it only sees a sequence of meshes and does not know whether the object was keyframe exported or animated exported.
How are animated/dynamic obstacles handled in the simulator?
For obstacles to interact with the fluid and to be used in the physics calculations, their geometry needs to be converted to another form. The objects are converted to volumetric data called a Signed Distance Field (SDF for short). A SDF is basically a grid of data that tells us whether a point is inside or outside of the object volume, and also stores the distance to the nearest point on the volume surface. You may have heard about SDFs as one of the data structures in OpenVDB.
Updating the SDF can be quite an expensive operation within the engine. Every time an object moves in the simulator, the SDF must be updated. Even during substeps, we need to update the SDF. The meshes are interpolated within the frame between substeps. So if you have 38 substeps in a frame, the SDF will be updated 38 times if there are moving objects and all that time really adds up.
Note: this scene involved an extreme number of substeps, requiring 38 simulation steps per frame. Something not very typical for a FLIP simulation.
Since the SDF update is so expensive, we have some optimizations to reduce the amount of data that needs to be updated. If objects are not moving, they only need to be computed once. The engine uses separate grids to store the static obstacles and dynamic obstacles, and also a master grid that merges and stores all of the SDF information. So if you have a few static obstacles and a few dynamic obstacles, the static and dynamic data will be stored on the separate static and dynamic grids. This avoids needing to re-compute the static data.
Animated obstacles can also start/stop their motion. Another optimization is that the engine automatically detects if a dynamic object is actually moving/changing during the frame. If a dynamic object is static during part of the animation, it will be moved and stored on the static grid until it starts moving and needs to be updated again. This is another way that we avoid re-computing SDF.
Additional optimizations for dynamic objects
And then there are more optimizations within the simulator specifically for handling dynamic objects. Due to the previous optimizations of separating static/dynamic objects, the majority of the time spent updating the SDF concerns only the dynamic objects in the simulation. There is some further optimization for how we multithread dynamic SDF calculations.
Inside the engine, the meshes may not be represented exactly as they are in the Blender scene. The engine actually separates all of the mesh geometry into separate pieces if it can (we’ll call each these mesh islands). If the geometry of a Blender object can be separated into individual pieces, the engine will do this and treat each piece/island as a separate object. For example, if you have a single Blender object that is made up of 60 cylinders, the engine will separate this object into 60 mesh islands that will be each handled separately in the engine. This optimization is to create smaller object volumes when possible and smaller volumes are quicker to compute than a single large volume containing many pieces.
How are the SDF calculations for dynamic mesh islands multithreaded?
There are two different multithreading schemes in the engine for how all of the mesh islands are converted into an SDF. For the rest of the explanations, I’ll just assume there are 8 threads available.
Scheme 1: Calculate each mesh island one by one and assign 8 threads to work on each island
Here is some general pseudocode for how this works:
for each mesh island:
1. Create a tight fitting grid around the island.
2. Divide the grid into subgrids (typically smaller chunks
of 8 x 8 x 8 voxels).
3. Put each subgrid into a queue for processing
4. Launch 8 threads. each thread will:
4.1. pull a subgrid off the queue
4.2. calculate the SDF for this small region
4.3. merge the subgrid SDF into the mesh island grid
4.4. repeat until processing queue is empty
5. Finally, merge the mesh island grid with the dynamic SDF grid
Pros:
Memory efficient. Memory only needs to be reserved to compute a single mesh island at a time.
Fast for large obstacles that cover a lot of grid space
Cons:
Can be slow for small obstacles that do not cover much grid space. For example, for thin narrow cylinders, having many threads work on a single object can be overkill and also add a lot overhead
This multithreading scheme is what is being used in your scene and is the reason for why the SDF update is running slowly.
Scheme 2: Calculate 8 mesh islands at a time and assign 1 thread to work on each island
Here is some general pseudocode for how this works:
1. Put each mesh island into a queue for processing
2. Launch 8 threads. Each thread will:
2.1. pull a mesh island off the queue
2.2. create a tight fitting grid around the island
2.3. calculate the SDF for the entire island grid in the single thread
2.4. merge the mesh island grid with the dynamic SDF grid
2.5. repeat until the processing queue is empty
Pros:
Very fast if the mesh islands are small and do not cover much grid space
Cons:
Uses more memory than scheme 1 since now 8 mesh island grids need to be reserved at once. If all mesh islands are large and cover a lot of grid space, this scheme has the potential to use a very large amount of memory. If all mesh islands are small however, the increase in memory can be negligible.
If there are not a large number of objects and the objects vary in size, there may be wasted computing power. For example if there are 7 small objects and 1 large object, the 7 threads will complete quickly and the simulator will be waiting on a single thread to complete the single large object.
When is each multithreading scheme used in the simulator?
In general, scheme 1 is used most of the time due to the assumption that most use cases for the simulator would involve a small number of obstacles that are on the larger side.
Scheme 2 is mostly used for simulations that involve the Blender Fracture Modifier branch. Early on in development we decided that we wanted to support the Fracture Modifier within the FLIP Fluids simulator. Fracture modifier simulations could contain hundreds or thousands of small obstacle pieces, which would be too slow to process under scheme 1. Scheme 2 was designed around adding fracture modifier support.
There is one thing to know about the fracture modifier that determines how the scheme 2 optimization is triggered: Fracture modifier simulations are entirely contained in a single Blender object. Hundreds of small individually moving pieces are contained in the object. This fact is what determines whether to use Scheme 1 or 2.
Whether to use scheme 1 or 2 depends on the number of mesh islands contained in a single Blender object. The engine is not limited to use a single scheme in an SDF update and will choose which to use on a per-Blender-object basis.
Specifics:
Scheme 1 – Will be used if the number of mesh islands in a single Blender object is less than or equal to 25.
Scheme 2 – Will be used if the number of mesh islands in a single Blender object is greater than 25.
Since each of the 60 cylinders are separate Blender objects in your scene, scheme 1 will be chosen by the engine. Scheme 2 would actually be optimal for this scene.
Testing and timing results
I have recreated a scene with a similar setup that you had described. Since this optimization testing mainly concerns the SDF update, I removed surface tension and viscosity features to leave out irrelevant computations. The timing to update the SDF was measured for single steps over a few frames on an Intel i7-7700 @3.60 GHz CPU.
Scheme 1: 48 seconds per substep Scheme 2: 8 seconds per substep
This shows that scheme 2 would be a much better choice for multithreading this type of scene.
Forcing your scene to use scheme 2
A way to force the simulator to choose scheme 2 would be to merge all of the cylinder objects into a single Blender object. As far as I know there is not a simple way to do this in Blender when each mesh island needs to move independently of each other. Scripting is the only way I could think of doing this (See the original post for the script).
How can we optimize the SDF update in the future to better handle this type of scene?
Smarter triggers for choosing between scheme 1 and 2. Maybe base this choice on grid coverage of mesh islands?
Maybe we could use a mixure of scheme 1 and 2. For example, compute 4 mesh islands at a time and assign 2 threads to each mesh island.
Does the SDF really need to be updated on every substep? Maybe we can get away with updating the SDF less often when there are many substeps during a frame?
Do we need to calculate the entire SDF grid? Maybe we could only compute parts of the SDF that are in close proximity of the fluid?
Add options to select an object primitive rather than a general mesh. The SDF is expensive to compute because it handles general meshes of any shape. Adding primitives with known shapes are very quick to compute (Cubes, Spheres, Cylinders, Cones, etc.).
There are a lot of ways we can make the simulator faster. Optimization is one of my favourite parts of development. It just takes time to develop! Some optimizations can take a lot longer to develop than others.
Weekly Development Notes #3 – How are solid obstacles computed within the simulator?
Covering the week of January 6th – 10th, 2020.
Happy New Year! This is Ryan with our first development update for 2020. It’s great to get back to work on the FLIP Fluids project.
The main theme for this week was catching up on the support requests that have piled up over the past two weeks. Thank you all for your patience! We’ve managed to get back to all support requests by the end of the week.
Not much happened in terms of code development last week. Just customer support, testing, and an in depth look at how animated obstacles are computed within the simulator.
Ring Toss!
An experiment while testing the FLIP Fluids viscosity solver. In this animation, a rotating ring object emits a high viscosity fluid towards a pole.
[gfycat data_id="decisivefastcolt"]The ring tossing object is a FLIP Fluid Inflow object, which normally continuously emits fluid. To get this object to emit single rings, I used a sine wave modifier in the graph editor to turn the inflow Enabled option on/off every few frames.
How are solid obstacles computed within the simulator?
We received an interesting scene troubleshooting question in this Blender Artists forum post by lumpyoatmeal. The question involves a scene that contains 60 individually animated cylinders.
Why is this taking so long to compute? The timing stats in the FLIP Fluids Stats panel showed that the Simulation Objects update category was taking a disproportionately long time to compute. Here is an in depth explanation of how solid obstacles are treated inside the simulator to troubleshoot the issue.
TL;DR: The simulator chooses between two different ways to multithread the calculations for handling obstacle objects. For this scene, the simulator chose a less than ideal scheme.
The slowdown will be caused by the large number of objects rather than if the objects have the Export Animated Mesh option enabled. The animated mesh option will only affect the time it takes to export the mesh from Blender for use in the fluid engine. From the point of view of the simulator, it only sees a sequence of meshes and does not know whether the object was keyframe exported or animated exported.
How are animated/dynamic obstacles handled in the simulator?
For obstacles to interact with the fluid and to be used in the physics calculations, their geometry needs to be converted to another form. The objects are converted to volumetric data called a Signed Distance Field (SDF for short). A SDF is basically a grid of data that tells us whether a point is inside or outside of the object volume, and also stores the distance to the nearest point on the volume surface. You may have heard about SDFs as one of the data structures in OpenVDB.
Updating the SDF can be quite an expensive operation within the engine. Every time an object moves in the simulator, the SDF must be updated. Even during substeps, we need to update the SDF. The meshes are interpolated within the frame between substeps. So if you have 38 substeps in a frame, the SDF will be updated 38 times if there are moving objects and all that time really adds up.
Note: this scene involved an extreme number of substeps, requiring 38 simulation steps per frame. Something not very typical for a FLIP simulation.
Since the SDF update is so expensive, we have some optimizations to reduce the amount of data that needs to be updated. If objects are not moving, they only need to be computed once. The engine uses separate grids to store the static obstacles and dynamic obstacles, and also a master grid that merges and stores all of the SDF information. So if you have a few static obstacles and a few dynamic obstacles, the static and dynamic data will be stored on the separate static and dynamic grids. This avoids needing to re-compute the static data.
Animated obstacles can also start/stop their motion. Another optimization is that the engine automatically detects if a dynamic object is actually moving/changing during the frame. If a dynamic object is static during part of the animation, it will be moved and stored on the static grid until it starts moving and needs to be updated again. This is another way that we avoid re-computing SDF.
Additional optimizations for dynamic objects
And then there are more optimizations within the simulator specifically for handling dynamic objects. Due to the previous optimizations of separating static/dynamic objects, the majority of the time spent updating the SDF concerns only the dynamic objects in the simulation. There is some further optimization for how we multithread dynamic SDF calculations.
Inside the engine, the meshes may not be represented exactly as they are in the Blender scene. The engine actually separates all of the mesh geometry into separate pieces if it can (we’ll call each these mesh islands). If the geometry of a Blender object can be separated into individual pieces, the engine will do this and treat each piece/island as a separate object. For example, if you have a single Blender object that is made up of 60 cylinders, the engine will separate this object into 60 mesh islands that will be each handled separately in the engine. This optimization is to create smaller object volumes when possible and smaller volumes are quicker to compute than a single large volume containing many pieces.
How are the SDF calculations for dynamic mesh islands multithreaded?
There are two different multithreading schemes in the engine for how all of the mesh islands are converted into an SDF. For the rest of the explanations, I’ll just assume there are 8 threads available.
Scheme 1: Calculate each mesh island one by one and assign 8 threads to work on each island
Here is some general pseudocode for how this works:
Pros:
Cons:
This multithreading scheme is what is being used in your scene and is the reason for why the SDF update is running slowly.
Scheme 2: Calculate 8 mesh islands at a time and assign 1 thread to work on each island
Here is some general pseudocode for how this works:
Pros:
Cons:
When is each multithreading scheme used in the simulator?
In general, scheme 1 is used most of the time due to the assumption that most use cases for the simulator would involve a small number of obstacles that are on the larger side.
Scheme 2 is mostly used for simulations that involve the Blender Fracture Modifier branch. Early on in development we decided that we wanted to support the Fracture Modifier within the FLIP Fluids simulator. Fracture modifier simulations could contain hundreds or thousands of small obstacle pieces, which would be too slow to process under scheme 1. Scheme 2 was designed around adding fracture modifier support.
There is one thing to know about the fracture modifier that determines how the scheme 2 optimization is triggered: Fracture modifier simulations are entirely contained in a single Blender object. Hundreds of small individually moving pieces are contained in the object. This fact is what determines whether to use Scheme 1 or 2.
Whether to use scheme 1 or 2 depends on the number of mesh islands contained in a single Blender object. The engine is not limited to use a single scheme in an SDF update and will choose which to use on a per-Blender-object basis.
Specifics:
Since each of the 60 cylinders are separate Blender objects in your scene, scheme 1 will be chosen by the engine. Scheme 2 would actually be optimal for this scene.
Testing and timing results
I have recreated a scene with a similar setup that you had described. Since this optimization testing mainly concerns the SDF update, I removed surface tension and viscosity features to leave out irrelevant computations. The timing to update the SDF was measured for single steps over a few frames on an Intel i7-7700 @3.60 GHz CPU.
Scheme 1: 48 seconds per substep
Scheme 2: 8 seconds per substep
This shows that scheme 2 would be a much better choice for multithreading this type of scene.
Forcing your scene to use scheme 2
A way to force the simulator to choose scheme 2 would be to merge all of the cylinder objects into a single Blender object. As far as I know there is not a simple way to do this in Blender when each mesh island needs to move independently of each other. Scripting is the only way I could think of doing this (See the original post for the script).
How can we optimize the SDF update in the future to better handle this type of scene?
There are a lot of ways we can make the simulator faster. Optimization is one of my favourite parts of development. It just takes time to develop! Some optimizations can take a lot longer to develop than others.