Eye Tracking with ARKit (iOS) - Part I

3 min read · Jan 8, 2024

Exploring ARKit, ARFaceAnchor, BlendShapeLocations, and LookAtPoint

Eye tracking technology can be a game-changer, unlocking new dimensions of user interaction and immersion. Imagine an AR experience that responds to the movement of your eyes, offering a level of engagement and personalization previously unattainable — or scrolling Instagram or X with nothing but an eyebrow movement.

In this article, we’ll work on eye tracking with a specific focus on implementing it using ARKit, Apple’s powerful augmented reality framework. The code and resources you need are conveniently hosted in the GitHub repository.

Demo: Eye Tracking with ARKit (iOS)

Eye tracking utilizes the front camera to monitor and track the gaze and movements of a user’s eyes. This technology enables AR applications to precisely discern the user’s point of focus, facilitating dynamic adjustments in the presentation of digital content based on their gaze direction.

ARKit & ARFaceAnchor

ARKit is Apple’s augmented reality (AR) framework that empowers developers to integrate AR experiences seamlessly into iOS applications. One crucial aspect of ARKit is its ability to track and analyze facial expressions using the ARFaceAnchor class.


ARFaceAnchor is a class in ARKit that represents the face geometry tracked in an AR session. It provides information about the user’s facial features, such as the position and orientation of the face, as well as details about facial expressions through the use of blend shapes.
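To make this concrete, here is a minimal sketch of starting a face-tracking session and receiving `ARFaceAnchor` updates. The class name `FaceTrackingViewController` and the status-message helper are illustrative, not from the article.

```swift
import ARKit
import UIKit

// Minimal sketch: start a face-tracking session and observe face anchors.
class FaceTrackingViewController: UIViewController, ARSessionDelegate {
    let session = ARSession()

    override func viewDidLoad() {
        super.viewDidLoad()
        session.delegate = self
        // ARFaceTrackingConfiguration requires a TrueDepth front camera.
        guard ARFaceTrackingConfiguration.isSupported else {
            print(faceTrackingStatusMessage(isSupported: false))
            return
        }
        session.run(ARFaceTrackingConfiguration())
    }

    // ARSessionDelegate callback: fires whenever tracked anchors update.
    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        for case let faceAnchor as ARFaceAnchor in anchors {
            // faceAnchor.transform holds the face's position and orientation;
            // its blendShapes and lookAtPoint properties are discussed later.
            _ = faceAnchor.transform
        }
    }
}

// Small pure helper, kept separate so it can be checked off-device.
func faceTrackingStatusMessage(isSupported: Bool) -> String {
    isSupported ? "Face tracking available" : "Face tracking unsupported on this device"
}
```

Because face tracking depends on the TrueDepth camera, the `isSupported` guard is essential before running the configuration on arbitrary devices.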


Blend Shape Locations:

Blend shapes, in the context of ARFaceAnchor, refer to specific facial muscle movements or expressions that ARKit can identify and track. BlendShapeLocation is an enumeration in ARKit that defines various regions of the face, each corresponding to a specific facial feature or expression.

Some common BlendShapeLocations include:

  • .mouthSmileLeft and .mouthSmileRight: Indicate a smile on the left or right side of the mouth.
  • .eyeBlinkLeft and .eyeBlinkRight: Represent a blink in the left or right eye.
  • .browDownLeft and .browDownRight: Signify a downward movement of the left or right eyebrow.
  • .mouthPucker: Indicates a puckering or tightening of the mouth.
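In code, these locations are keys into the anchor's `blendShapes` dictionary, where each value is an `NSNumber` between 0.0 (neutral) and 1.0 (fully expressed). A sketch of reading them — the 0.8 threshold is an assumption you would tune per app:

```swift
import ARKit

// Sketch: reading blend shape coefficients from an ARFaceAnchor.
func handleBlendShapes(of faceAnchor: ARFaceAnchor) {
    let blendShapes = faceAnchor.blendShapes
    let leftBlink  = blendShapes[.eyeBlinkLeft]?.floatValue ?? 0
    let rightBlink = blendShapes[.eyeBlinkRight]?.floatValue ?? 0

    // 0.8 is an assumed threshold, not an ARKit constant.
    if isExpressed(leftBlink) && isExpressed(rightBlink) {
        print("Both eyes closed")
    }
}

// Pure helper, separated so the threshold logic is testable off-device.
func isExpressed(_ coefficient: Float, threshold: Float = 0.8) -> Bool {
    coefficient > threshold
}
```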

Use Case: Interactive Gaming

In gaming applications, facial expressions can be harnessed as input for character control or to trigger in-game events. Imagine building a game similar to Flappy Bird, but with a unique twist — this time, the bird takes flight with a simple blink of the eye. By utilizing blend shapes and ARFaceAnchor in ARKit, developers can introduce an entirely new level of interactivity, making gaming experiences more immersive and responsive to the player’s facial expressions.
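A blink-to-flap mechanic like the one described above needs edge detection — the action should fire once when the eye closes, not on every frame it stays closed. A sketch, assuming a hypothetical `BlinkController` fed from the session delegate each frame:

```swift
import ARKit

// Sketch of blink-triggered input for a Flappy Bird-style game.
final class BlinkController {
    private var eyeWasClosed = false
    var onBlink: () -> Void = {}   // e.g. hook up your flap() action here

    // Call from the ARSessionDelegate on each frame update.
    func update(with faceAnchor: ARFaceAnchor) {
        let blink = faceAnchor.blendShapes[.eyeBlinkLeft]?.floatValue ?? 0
        update(blinkValue: blink)
    }

    // Pure edge detection, separated out so it is testable without ARKit.
    // The 0.8 threshold is an assumption; tune it for your game.
    func update(blinkValue: Float, threshold: Float = 0.8) {
        let closed = blinkValue > threshold
        if closed && !eyeWasClosed { onBlink() }   // fire once per blink
        eyeWasClosed = closed
    }
}
```

Tracking the previous state (`eyeWasClosed`) is what keeps a long blink from triggering repeated flaps.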

Extending this concept to other applications, similar functionality can be implemented in social media apps, allowing users to scroll through content without physically touching the screen. This innovative use of facial expressions can redefine how users interact with digital environments, making the experience more intuitive and engaging.


LookAtPoint:

The lookAtPoint property in ARFaceAnchor plays a crucial role in eye tracking by estimating the point in 3D space where the user is looking. Note that the point is expressed in the face anchor's own coordinate space; multiplying it by the anchor's transform yields the corresponding point in the AR world.

How it works:

  1. Gaze Tracking: ARKit meticulously tracks the position of a user’s eyes by utilizing information from leftEyeTransform and rightEyeTransform.
  2. 3D Coordinate Calculation: Leveraging the detected eye transforms, ARKit estimates the point the user's gaze converges on in 3D space. This calculated point is exposed through the lookAtPoint property.
  3. Real-time Updates: The lookAtPoint property stays dynamically updated in real-time, continuously reflecting changes in the user’s gaze. This provides developers with up-to-the-moment information about the focal point in the AR environment.
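The steps above can be sketched in code. Since `lookAtPoint` is expressed in the face's coordinate space (in meters), converting it to world space is a single matrix multiplication; the function names here are illustrative:

```swift
import ARKit
import simd

// Sketch: ARFaceAnchor.lookAtPoint is in the face's local coordinate
// space; multiplying by the anchor's transform gives a world-space point.
func worldGazePoint(for faceAnchor: ARFaceAnchor) -> simd_float3 {
    transformPoint(faceAnchor.lookAtPoint, by: faceAnchor.transform)
}

// Pure matrix math, separated so it can be tested without an AR session.
func transformPoint(_ point: simd_float3, by transform: simd_float4x4) -> simd_float3 {
    let homogeneous = transform * simd_float4(point, 1)   // w = 1 for a position
    return simd_float3(homogeneous.x, homogeneous.y, homogeneous.z)
}
```

Projecting that world point onto the screen (e.g. to move a cursor with your gaze) would additionally require the current frame's camera, which is left for a later part.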

The lookAtPoint property is a valuable tool for developers implementing eye tracking features. It provides a dynamic and accurate representation of a user's gaze point in 3D space, opening up possibilities for creating immersive and responsive AR applications.