Panning and tilting work just like Street View in Google Maps with the accelerometer turned on. That feature sold me on my first smartphone! It was like standing in the middle of the street, turning around, and seeing everything live!
As for the number of pictures used for the stitching: beyond the minimum needed to cover the checkerboard grid exactly, additional photos only help the stitching process (they don't improve image quality), and an overlap of 10-25% is all that's necessary, not 50-100%. So 24 photos are all that are ever needed, whether shot as JPEGs or extracted as frames from the 4K video, pausing at the right intervals to add up to a grid 360° across and 120° tall. The gimbal elevation scale in the app has 10° hatch marks, which break the 120° into three layers at 0°, 60°, and 120°, with the red hatch at 90°. The 360° rotation has to be done manually, using the direction of the red arrow over the map in the map view, but since the lens has a 94° FOV, anything from a 45° to a 75° interval should work. The easiest
to keep track of would be a 45° rotation each time, eight shots total: 0°, 45°, 90°, 135°, 180°, 225°, 270°, and 315°, giving you front, right, back, left, and halfway between each. That's 3 x 8 = 24 frames to stitch together, with considerable overlap. Pause 1-2 seconds at each position to get sharp video frames, and the whole photographic/video process should take two minutes tops. The rest is extracting the 24 frames and stitching. Ready, set, go!
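The 3 x 8 grid above is easy to sketch in a few lines of Python. This is just an illustration, not any DJI SDK call: the pitch values follow the app's 0°/60°/120° elevation marks, the yaw values step by 45°, and the overlap figure simply compares the 94° horizontal FOV against the yaw step:

```python
# Enumerate the 24 (elevation, yaw) stops for a 3-layer, 8-direction panorama.
PITCHES = [0, 60, 120]                 # gimbal elevation layers (red hatch = 90°)
YAW_STEP = 45                          # manual rotation interval, in degrees
YAWS = list(range(0, 360, YAW_STEP))   # 0, 45, 90, ..., 315
HFOV = 94                              # lens horizontal field of view, degrees

grid = [(pitch, yaw) for pitch in PITCHES for yaw in YAWS]

# Horizontal overlap between neighboring shots in the same layer:
overlap_deg = HFOV - YAW_STEP              # degrees of shared view per pair
overlap_pct = 100 * overlap_deg / HFOV     # as a fraction of each frame

print(len(grid))             # number of stops
print(grid[:3])              # first few (pitch, yaw) pairs
print(round(overlap_pct))    # percent overlap at a 45° step
```

At a 45° step the neighboring frames share about half their width, which is generous; widening the step toward 75° drops the overlap to roughly 20%, which is why anything from 45° to 75° works with a 94° lens.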
This is the manual process that your SDK would automate, but it's pretty simple to do manually!
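If the grid is shot as 4K video with a 1-2 second pause at each stop, the frame-grab timestamps can be worked out up front instead of scrubbing by eye. A minimal sketch, assuming (hypothetically) about 2 seconds of travel between stops and grabbing a frame from the middle of each hold; the names and timings are illustrative:

```python
# Compute one video timestamp (seconds) per stop, centered in that stop's pause,
# so a sharp frame can be pulled for each of the 24 positions.
N_STOPS = 24      # 3 layers x 8 yaw positions
PAUSE = 1.5       # seconds held still at each stop (1-2 s in practice)
MOVE = 2.0        # assumed travel time between stops (illustrative)

def grab_times(n_stops=N_STOPS, pause=PAUSE, move=MOVE, start=0.0):
    """Return one timestamp per stop, taken from the middle of its hold."""
    times = []
    t = start
    for _ in range(n_stops):
        times.append(t + pause / 2)   # middle of the hold, where motion blur is least
        t += pause + move             # finish the hold, then move to the next stop
    return times

times = grab_times()
print(len(times))
print(times[0], times[-1])
```

At these defaults the whole pass runs about 82 seconds, comfortably inside the two-minute estimate, and each timestamp can then be fed to any frame grabber (for example ffmpeg's `-ss` seek) to extract the 24 stills for stitching.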