AR gestures with ARKit & RealityKit: AR with iOS (Part-IV)

Shiru99
5 min read · Apr 29, 2023

AR gestures

AR gestures are interactions and movements made by users in an Augmented Reality (AR) environment to manipulate AR objects or interact with AR content.

In the context of AR on iOS, these gestures let users interact with virtual objects overlaid on the real world through intuitive touch input.

Some common AR gestures

  1. Tap Gesture: Users can tap on virtual objects to select or interact with them. For example, tapping on a virtual button to trigger an action, or tapping on a virtual object to rotate or scale it.
  2. Pan Gesture: Users can use a pan gesture to move virtual objects in the AR environment. For example, dragging a virtual object to reposition it in the real world.
  3. Pinch Gesture: Users can use a pinch gesture to scale virtual objects. For example, pinching on a virtual object to resize it or zoom in/out.
  4. Rotation Gesture: Users can use a rotation gesture to rotate virtual objects. For example, rotating a virtual object with two fingers to change its orientation or alignment.
  5. Long-Press Gesture: Users can use a long-press gesture to trigger actions or interactions with virtual objects. For example, long-pressing on a virtual button to reveal additional options or information.
  6. Swipe Gesture: Users can use swipe gestures to trigger specific actions or interactions with virtual objects. For example, swiping left or right on a virtual object to switch to a different colour variant of it (see the sketch after the code explanation below).

Implementing AR gestures may involve handling touch events, gesture recognizers, or other APIs provided by the ARKit and RealityKit frameworks.

AR gestures on ARViewContainer

import SwiftUI
import RealityKit

struct ContentView : View {

    var body: some View {
        ARViewContainer()
            .gesture(TapGesture().onEnded {
                // Handle tap gesture here
                print("Tap gesture detected")
            })
            .gesture(DragGesture().onChanged { value in
                // Handle drag gesture here
                print("Drag gesture detected: \(value.translation)")
            })
            .gesture(MagnificationGesture().onChanged { value in
                // Handle pinch gesture here
                print("Pinch gesture detected: \(value.magnitude)")
            })
            .gesture(RotationGesture().onChanged { value in
                // Handle rotation gesture here
                print("Rotation gesture detected: \(value.degrees)")
            })
    }
}

struct ARViewContainer: UIViewRepresentable {

    func makeUIView(context: Context) -> ARView {
        let arView = ARView(frame: .zero)

        // Load the model from the app bundle
        let modelEntity = try! ModelEntity.load(named: "toy_car.usdz")

        // Create an anchor entity and add the model to it
        let anchorEntity = AnchorEntity()
        anchorEntity.addChild(modelEntity)

        // Set the position of the anchor entity to 1 meter in front of the camera
        anchorEntity.position = [0, 0, -1]

        // Add the anchor entity to the scene
        arView.scene.addAnchor(anchorEntity)

        return arView
    }

    func updateUIView(_ uiView: ARView, context: Context) { }
}

Code Explanation

The code provided demonstrates how to implement AR gestures (tap, drag, pinch, and rotation) on an AR view in SwiftUI using RealityKit.

  1. ARViewContainer is a UIViewRepresentable that creates an ARView and loads a 3D model (in this case, "toy_car.usdz") from the app bundle. It creates an anchor entity, adds the model as its child, sets the position of the anchor entity to 1 meter in front of the camera, and adds the anchor entity to the AR scene.
  2. In ContentView, an instance of ARViewContainer is used as the view's content. SwiftUI gestures (TapGesture, DragGesture, MagnificationGesture, and RotationGesture) are then attached to it with the .gesture() modifier.
  3. The .onEnded and .onChanged closures within each gesture handle the corresponding interaction. When a tap is detected, it prints "Tap gesture detected"; a drag prints its translation, a pinch prints its magnitude, and a rotation prints its degrees.
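
One gesture from the earlier list that the code above does not cover is swipe. SwiftUI has no dedicated swipe gesture, so one common approach is to treat a DragGesture whose ending translation exceeds a threshold as a left or right swipe. The following is a minimal sketch of that idea, reusing the ARViewContainer defined above; the SwipeableContentView name and the distance thresholds are illustrative, not part of the original article.

import SwiftUI
import RealityKit

// Hypothetical view name; reuses the ARViewContainer defined above
struct SwipeableContentView: View {

    var body: some View {
        ARViewContainer()
            .gesture(DragGesture(minimumDistance: 30).onEnded { value in
                // Treat a mostly horizontal drag as a left/right swipe
                if value.translation.width > 50 {
                    print("Swipe right detected")  // e.g. show the next colour variant
                } else if value.translation.width < -50 {
                    print("Swipe left detected")   // e.g. show the previous colour variant
                }
            })
    }
}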

Issue with gestures on ARViewContainer

.gesture(DragGesture().onChanged { value in
    // Update the position of the modelEntity based on the translation value
    // (translation is in screen points, so in practice it is usually scaled down)
    modelEntity.position += SIMD3<Float>(Float(value.translation.width), Float(value.translation.height), 0)
})

This code attaches a DragGesture to the ARViewContainer in a SwiftUI app. On each change, the horizontal and vertical translation of the drag is added to modelEntity.position with the += operator, so the model's position in the scene follows the user's pan gesture.
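
Two details are glossed over here: the translation is measured in screen points while entity positions are in metres, and it is cumulative from the start of the drag, so repeatedly adding it with += makes the model accelerate. As a hedged sketch of a more careful version, the same idea can be implemented with a UIPanGestureRecognizer and a Coordinator that keeps a reference to the entity; the DraggableARViewContainer name and the 0.001 points-to-metres factor are assumptions for illustration only.

import SwiftUI
import RealityKit
import UIKit

struct DraggableARViewContainer: UIViewRepresentable {

    func makeCoordinator() -> Coordinator { Coordinator() }

    func makeUIView(context: Context) -> ARView {
        let arView = ARView(frame: .zero)

        let modelEntity = try! ModelEntity.loadModel(named: "toy_car.usdz")
        let anchorEntity = AnchorEntity()
        anchorEntity.addChild(modelEntity)
        anchorEntity.position = [0, 0, -1]
        arView.scene.addAnchor(anchorEntity)

        // Keep a reference to the entity so the gesture handler can move it
        context.coordinator.modelEntity = modelEntity

        let pan = UIPanGestureRecognizer(target: context.coordinator,
                                         action: #selector(Coordinator.handlePan(_:)))
        arView.addGestureRecognizer(pan)

        return arView
    }

    func updateUIView(_ uiView: ARView, context: Context) { }

    class Coordinator: NSObject {
        var modelEntity: ModelEntity?
        private var startPosition: SIMD3<Float>?

        @objc func handlePan(_ recognizer: UIPanGestureRecognizer) {
            guard let entity = modelEntity, let view = recognizer.view else { return }

            switch recognizer.state {
            case .began:
                // Remember where the entity was when the pan started
                startPosition = entity.position
            case .changed:
                guard let start = startPosition else { return }
                // translation(in:) is in screen points and cumulative from the
                // start of the pan, so apply it once relative to the start position
                let translation = recognizer.translation(in: view)
                let pointsToMetres: Float = 0.001
                entity.position = start + SIMD3<Float>(Float(translation.x) * pointsToMetres,
                                                       -Float(translation.y) * pointsToMetres,
                                                       0)
            default:
                startPosition = nil
            }
        }
    }
}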

The more fundamental issue is that the gesture is attached to the entire ARViewContainer: dragging anywhere on the screen, even where the modelEntity is not visible, still fires the gesture and changes the modelEntity's position.

And what happens if multiple modelEntities are placed in the ARView? This is where gestures installed on the modelEntity itself come in.

Gestures on modelEntity

Installing gestures directly on the modelEntity allows the user to interact with that specific model by performing rotation, translation (pan), and scaling (pinch) gestures using touch or other input methods.

import SwiftUI
import RealityKit

struct ContentView : View {

    var body: some View {
        ARViewContainer()
    }
}

struct ARViewContainer: UIViewRepresentable {

    func makeUIView(context: Context) -> ARView {
        let arView = ARView(frame: .zero)

        // Load the model from the app bundle as a ModelEntity
        let modelEntity = try! ModelEntity.loadModel(named: "toy_car.usdz")

        // Create an anchor entity and add the model to it
        let anchorEntity = AnchorEntity()
        anchorEntity.addChild(modelEntity)

        // Set the position of the anchor entity to 1 meter in front of the camera
        anchorEntity.position = [0, 0, -1]

        // Add the anchor entity to the scene
        arView.scene.addAnchor(anchorEntity)

        // Generate collision shapes so the entity can receive gestures
        modelEntity.generateCollisionShapes(recursive: true)

        // Install gestures on the modelEntity
        arView.installGestures([.rotation, .translation, .scale], for: modelEntity)

        return arView
    }

    func updateUIView(_ uiView: ARView, context: Context) { }
}

Overall, this code sets up an AR scene with a 3D model of a toy car and enables user interaction with the model through gestures in the ARView.

With gestures installed on the modelEntity, even if you place multiple 3D models, each gesture only affects the entity it was installed on, so interaction with the models stays smooth and predictable.
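
As a minimal sketch of that claim, each model gets its own collision shapes and its own installGestures call, so dragging one model does not move the other. The MultiModelARViewContainer name and the second asset, "toy_robot.usdz", are assumptions for illustration; only "toy_car.usdz" appears in the article.

import SwiftUI
import RealityKit

struct MultiModelARViewContainer: UIViewRepresentable {

    func makeUIView(context: Context) -> ARView {
        let arView = ARView(frame: .zero)
        let anchorEntity = AnchorEntity()

        // "toy_car.usdz" is from the article; "toy_robot.usdz" is an assumed second asset
        for (index, assetName) in ["toy_car.usdz", "toy_robot.usdz"].enumerated() {
            let modelEntity = try! ModelEntity.loadModel(named: assetName)

            // Space the models out so both are visible in front of the camera
            modelEntity.position = [Float(index) * 0.3 - 0.15, 0, 0]

            // Each entity needs its own collision shapes and its own gesture installation
            modelEntity.generateCollisionShapes(recursive: true)
            arView.installGestures([.rotation, .translation, .scale], for: modelEntity)

            anchorEntity.addChild(modelEntity)
        }

        anchorEntity.position = [0, 0, -1]
        arView.scene.addAnchor(anchorEntity)
        return arView
    }

    func updateUIView(_ uiView: ARView, context: Context) { }
}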

Custom Gestures within CustomARView

[Note: the same can also be achieved with AR gestures on ARViewContainer]

import SwiftUI
import RealityKit
import UIKit

struct ContentView : View {

    var body: some View {
        CustomARViewContainer()
    }
}

struct CustomARViewContainer: UIViewRepresentable {

    func makeUIView(context: Context) -> CustomARView {
        return CustomARView()
    }

    func updateUIView(_ uiView: CustomARView, context: Context) { }
}

class CustomARView: ARView {

    init() {
        super.init(frame: .zero)

        let modelEntity = try! ModelEntity.loadModel(named: "toy_car.usdz")

        // Name the entity so it can be identified in the long-press handler
        modelEntity.name = "toy_car"

        let anchorEntity = AnchorEntity()
        anchorEntity.addChild(modelEntity)

        anchorEntity.position = [0, 0, -1]

        self.scene.addAnchor(anchorEntity)

        // Collision shapes are required for installGestures and for entity(at:) hit testing
        modelEntity.generateCollisionShapes(recursive: true)
        self.installGestures([.rotation, .translation, .scale], for: modelEntity)

        let longPress = UILongPressGestureRecognizer(target: self, action: #selector(self.handleLongPress(_:)))
        self.addGestureRecognizer(longPress)
    }

    @MainActor required dynamic init?(coder decoder: NSCoder) {
        fatalError("init(coder:) has not been implemented")
    }

    @MainActor required dynamic init(frame frameRect: CGRect) {
        super.init(frame: frameRect)
    }

    @objc func handleLongPress(_ recognizer: UILongPressGestureRecognizer) {

        // A long press fires repeatedly; react only when it first begins
        guard recognizer.state == .began else { return }

        let touchInView = recognizer.location(in: self)

        // Look up the entity under the touch; this requires collision shapes
        guard let modelEntity = self.entity(at: touchInView) as? ModelEntity else {
            print("modelEntity not found")
            return
        }

        print("Long press detected on - \(modelEntity.name)")
    }
}

Explanation: In this code, a long-press gesture recognizer is set up to detect when the user long-presses on a 3D model loaded into the ARView. When a long press is detected, the code uses the entity(at:) method to find the entity (ModelEntity) at the touch location on the screen. If an entity is found, the code prints the name of the 3D model that was long-pressed.

This code is useful when you want to perform an action when the user interacts with a 3D object in your AR app. For example, you could use this code to detect when the user selects a 3D object and display additional information or trigger an animation.
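
As one hedged sketch of that idea, the handleLongPress handler above could be modified so that, once the entity is found, the model briefly enlarges as visual feedback using RealityKit's move(to:relativeTo:duration:) animation. The 1.2 scale factor and 0.3-second duration are arbitrary choices, not values from the article.

@objc func handleLongPress(_ recognizer: UILongPressGestureRecognizer) {

    guard recognizer.state == .began else { return }

    let touchInView = recognizer.location(in: self)

    guard let modelEntity = self.entity(at: touchInView) as? ModelEntity else { return }

    // Enlarge the selected model slightly as visual feedback
    // (scale factor and duration are illustrative values)
    var transform = modelEntity.transform
    transform.scale *= 1.2
    modelEntity.move(to: transform, relativeTo: modelEntity.parent, duration: 0.3)
}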
