2D-Geo-Clustering

Unsupervised 2D Geometric Clustering of Facial Features and Head Pose

This repository contains a complete, end-to-end implementation of a fully unsupervised pipeline that:

  1. Nose-aligns facial images to a dataset-wide average nose-tip coordinate.
  2. Extracts simple 2D geometric descriptors for four facial regions:
    • Lips (width-to-height ratio)
    • Eyes (aspect ratio)
    • Pupils (normalized 2D coordinates)
    • Head pose (eye-line displacement & orientation difference)
  3. Clusters each descriptor set using K-Means, selecting the number of clusters via the “elbow” method.
  4. Evaluates cluster compactness and separation with Silhouette, Calinski–Harabasz, and Davies–Bouldin indices.

The pipeline was developed and validated on a subset of AffectNet-YOLO (25 266 faces at 96×96 px) and cross-validated on FER (48×48 px). It yields semantically meaningful clusters (e.g. closed vs. open mouth, eye openness levels, five pupil‐gaze directions, three head orientations) without any labels or pretrained embeddings.


Table of Contents

  1. Repository Structure
  2. Prerequisites & Setup
  3. Data Preparation
  4. Pipeline Overview
    1. Stage 1: Nose-Tip Alignment
    2. Stage 2: Descriptor Extraction
    3. Stage 3: Unsupervised Clustering
  5. Scripts & Usage
  6. Output & Folder Structure
  7. Evaluation Metrics
  8. Future Work & Extensions
  9. References

Repository Structure

├── README.md
├── shape_predictor_68_face_landmarks.dat            # dlib pretrained model (not included here; download separately)
├── data/
│   ├── AFFECTNET-YOLO/                              # Unlabeled images used for training (96×96 px)
│   └── FER/                                         # (Optional) 48×48 px images for cross-dataset validation
├── outputs/                                         # All intermediate and final results will be saved here
│   ├── Alignment/                                   # Nose-aligned images
│   ├── HeadAngleDisp_Images/                        # Annotated images for angle & displacement
│   ├── Head_Clusters/                               # Head-pose clustering results (plots, CSVs, cluster folders)
│   ├── LipClustering1/                              # Lip ratio clustering (annotated images & plots)
│   ├── Eye_Emo/                                     # Eye ratio extraction & clustering
│   ├── Pupil_Annotated/                             # Pupil detection & heuristic labeling
│   └── Pupil_Clusters/                              # Pupil clustering results (plots, CSVs)
├── LipClusteringForSubfolderGreyImageSSHyper.py     # Lip ratio extraction + clustering script
├── EyeClusteringForSubfolderGreyImageSSHyper.py     # Eye ratio extraction + clustering script
├── PupilMoveClassification.py                       # Pupil localization & K-Means clustering
├── HeadNoseAlignment.py                             # Compute average nose tip → align images
├── HeadAngleDisp.py                                 # Compute eye-line displacement & angle for each image
├── HeadClusteringAngleDisp.py                       # Perform K-Means on head pose features (angle, displacement)
└── requirements.txt                                 # Python dependencies

Prerequisites & Setup

  1. Python 3.8+
  2. Install required libraries
    pip install -r requirements.txt
    

    The main dependencies are:

    • dlib (v19.22 or compatible) – for face detection & 68-point landmarks
    • opencv-python (v4.5) – for image I/O & processing
    • numpy (v1.21) – numerical computations
    • pandas (v1.3) – CSV handling
    • scikit-learn (v1.0) – K-Means, StandardScaler, clustering metrics
    • matplotlib (v3.4) – plotting
    • seaborn (v0.11) – (optional, used in head clustering plots)
  3. Download dlib’s 68-point landmark model
    The repository expects the file
    shape_predictor_68_face_landmarks.dat
    

    in this top‐level directory. You can download it from:
    http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
    Unzip it so that shape_predictor_68_face_landmarks.dat is available.


Data Preparation

  1. AffectNet-YOLO Images
    • Place your “AFFECTNET-YOLO” images (96×96 px) in data/AFFECTNET-YOLO/, preserving any subfolder structure.
    • This pipeline ignores annotation files; it only needs face images.
  2. Optional: FER Dataset
    • For cross-dataset validation (lip & eye), put the FER 48×48 px face crops in data/FER/.
    • The lip/eye scripts will run “as-is” on these smaller crops (the same ratio calculations still apply).
  3. Output Directory
    • All generated images, CSVs, plots, and cluster folders will be saved under outputs/.
    • Each script will create its own subdirectory inside outputs/—see Output & Folder Structure for details.

Pipeline Overview

At a high level, our pipeline consists of three stages:

  1. Stage 1: Nose-Tip Alignment
  2. Stage 2: Descriptor Extraction
    • Lips → rlip
    • Eyes → reye
    • Pupils → (nx, ny)
    • Head → (d, ∆θ)
  3. Stage 3: Unsupervised Clustering
    • K-Means on each descriptor set; k chosen via elbow method
    • Compute Silhouette, Calinski–Harabasz, and Davies–Bouldin indices
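The Stage-3 loop (K-Means per k, elbow curve, validity indices) can be sketched with scikit-learn as follows. This is a minimal illustration, not a transcript of the repo's scripts; the function names and the k range are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (silhouette_score,
                             calinski_harabasz_score,
                             davies_bouldin_score)

def cluster_with_elbow(features, k_range=range(2, 9), random_state=42):
    """Fit K-Means for each k; record inertia for the elbow plot."""
    X = StandardScaler().fit_transform(features)
    inertias, models = {}, {}
    for k in k_range:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        inertias[k] = km.inertia_
        models[k] = km
    return X, inertias, models

def evaluate(X, labels):
    """The three validity indices reported in this README."""
    return {
        "silhouette": silhouette_score(X, labels),
        "calinski_harabasz": calinski_harabasz_score(X, labels),
        "davies_bouldin": davies_bouldin_score(X, labels),
    }
```

After inspecting the elbow in `inertias`, pick k and call `evaluate(X, models[k].labels_)`.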

Below is a detailed breakdown of each stage.

Stage 1: Nose-Tip Alignment

Every face is translated so that its detected nose tip coincides with the dataset-wide average nose-tip coordinate, removing gross positional variation before any ratios are measured.
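A minimal sketch of this step, assuming dlib's landmark #34 (0-based index 33) is the nose tip and borders are zero-filled; this is not a transcript of HeadNoseAlignment.py:

```python
import numpy as np

NOSE_TIP_IDX = 33  # 0-based index of dlib's landmark #34 (assumed nose tip)

def mean_nose_tip(all_landmarks):
    """Average nose-tip (x, y) over every face in the dataset."""
    tips = np.array([lm[NOSE_TIP_IDX] for lm in all_landmarks], dtype=float)
    return tips.mean(axis=0)

def align_to_nose(image, landmarks, target_xy):
    """Translate `image` so its nose tip sits on `target_xy` (zero-fill borders)."""
    lm = np.asarray(landmarks, dtype=float)
    dx, dy = np.round(np.asarray(target_xy) - lm[NOSE_TIP_IDX]).astype(int)
    out = np.zeros_like(image)
    h, w = image.shape[:2]
    # copy the overlapping region after shifting by (dx, dy)
    ys, yd = max(0, -dy), max(0, dy)
    xs, xd = max(0, -dx), max(0, dx)
    hh, ww = h - abs(dy), w - abs(dx)
    if hh > 0 and ww > 0:
        out[yd:yd + hh, xd:xd + ww] = image[ys:ys + hh, xs:xs + ww]
    return out
```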

Stage 2: Descriptor Extraction

After nose alignment, we extract four low-dimensional geometric descriptors from each aligned face:

Lip Ratio Extraction

The lip descriptor is a single scalar: the width-to-height ratio of the outer lip, rlip.
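A hedged sketch of the ratio, assuming dlib's 68-point scheme with mouth corners at landmarks #49/#55 (indices 48/54) and top/bottom mid-lip at #52/#58 (indices 51/57); the exact landmark choice in the repo's script may differ:

```python
import numpy as np

def lip_ratio(landmarks):
    """Width-to-height ratio of the outer lip (rlip)."""
    lm = np.asarray(landmarks, dtype=float)
    width = np.linalg.norm(lm[54] - lm[48])   # mouth corner to mouth corner
    height = np.linalg.norm(lm[57] - lm[51])  # top mid-lip to bottom mid-lip
    return width / height if height > 0 else np.inf
```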

Eye Aspect Ratio Extraction

Each eye is described by its aspect ratio, reye, computed from the eyelid and eye-corner landmarks.
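A sketch of one common formulation, assuming dlib indices 36–41 for the left eye and 42–47 for the right (corners at 36/39 resp. 42/45, lids in between); the repo's script may use a different variant:

```python
import numpy as np

def eye_aspect_ratio(landmarks, left=True):
    """Height-to-width ratio of one eye (reye)."""
    lm = np.asarray(landmarks, dtype=float)
    i = 36 if left else 42
    width = np.linalg.norm(lm[i + 3] - lm[i])           # corner to corner
    height = (np.linalg.norm(lm[i + 5] - lm[i + 1]) +   # two lid-to-lid
              np.linalg.norm(lm[i + 4] - lm[i + 2])) / 2.0
    return height / width if width > 0 else 0.0
```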

Pupil Position Classification

Each pupil is localized inside its eye region and described by its normalized 2D coordinates (nx, ny); both a heuristic gaze label and a K-Means label are recorded.
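One simple way to obtain (nx, ny) is to take the centroid of the darkest pixels in a grayscale eye crop. The fixed intensity quantile below is an assumption for illustration, not a parameter taken from PupilMoveClassification.py:

```python
import numpy as np

def pupil_position(eye_gray):
    """Centroid of the darkest pixels in an eye crop, normalized to [0, 1]."""
    eye = np.asarray(eye_gray, dtype=float)
    thresh = np.quantile(eye, 0.1)      # keep roughly the darkest 10% of pixels
    ys, xs = np.nonzero(eye <= thresh)
    h, w = eye.shape
    nx = xs.mean() / (w - 1)            # 0 = left edge, 1 = right edge
    ny = ys.mean() / (h - 1)            # 0 = top edge,  1 = bottom edge
    return nx, ny
```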

Head Pose (Angle + Displacement)

We encode head pose by measuring:

  1. Displacement d = Euclidean distance between the midpoints of:
    • Actual eye-line: midpoint of landmarks #37 & #46
    • Reference (mean eye-line): midpoint of the dataset’s average #37 & #46
  2. Angular deviation ∆θ (degrees) = difference in arctangent angles (line slopes) of:
    • Actual eye-line (angle_i = atan2(y46−y37, x46−x37))
    • Reference eye-line (angle_ref = atan2(ȳ46−ȳ37, x̄46−x̄37))
      ∆θ = (θ_i − θ_ref) × (180 / π)

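The two measurements above can be sketched as follows, using dlib landmarks #37 and #46 (0-based indices 36 and 45) and a second landmark array holding the dataset means; the function name is illustrative:

```python
import numpy as np

def head_pose_features(lm, lm_ref):
    """Displacement d and angular deviation Δθ (degrees) of the eye line."""
    lm = np.asarray(lm, dtype=float)
    lm_ref = np.asarray(lm_ref, dtype=float)
    # 1. Euclidean distance between actual and reference eye-line midpoints
    mid = (lm[36] + lm[45]) / 2.0
    mid_ref = (lm_ref[36] + lm_ref[45]) / 2.0
    d = float(np.linalg.norm(mid - mid_ref))
    # 2. difference of the two eye-line angles, converted to degrees
    ang = np.arctan2(lm[45, 1] - lm[36, 1], lm[45, 0] - lm[36, 0])
    ang_ref = np.arctan2(lm_ref[45, 1] - lm_ref[36, 1],
                         lm_ref[45, 0] - lm_ref[36, 0])
    dtheta = float(np.degrees(ang - ang_ref))
    return d, dtheta
```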
Output & Folder Structure

After you run all scripts in sequence (alignment → descriptor extraction → clustering), you’ll see the following top-level outputs/ structure:

outputs/
├── Alignment/                                   # Nose-aligned images
│   └── *.jpg, *.png, …  
├── HeadAngleDisp_Images/                        # Head pose annotation & data
│   ├── <annotated images>.jpg
│   └── colds_AngleDisp_Data.csv
├── Head_Clusters/
│   ├── Angle/
│   │   ├── Cluster_0/, Cluster_1/, Cluster_2/
│   │   └── KMeans_Clustering_Angle.png
│   ├── Displacement/
│   │   ├── Cluster_0/, Cluster_1/, Cluster_2/
│   │   └── KMeans_Clustering_Displacement.png
│   ├── Angle_Displacement/
│   │   ├── Cluster_0/…/Cluster_4/
│   │   └── KMeans_Clustering_Angle_Displacement.png
│   ├── KMeans_Clustering_Results.csv
│   └── Clustering_Evaluation_Metrics.csv
├── LipClustering1/
│   ├── output/                              # first 50 lip‐annotated images
│   ├── plots/
│   │   ├── cluster_visualization.png
│   │   └── ratios_histogram.png
│   └── results.txt
├── Eye_Emo/
│   ├── Eye_ExpFull_landmark_ratios.csv     # raw height,width,ratio
│   ├── elbow_method_plot.png
│   ├── clustered_data_ratio.csv            # ratio + cluster
│   ├── Cluster_0/…/Cluster_3/              # images with landmarks overlaid
│   ├── clusters_plot_ratio.png
│   └── clusters_ratio_vs_mean_plot.png
└── Pupil_Annotated/
    ├── <annotated images>.jpg
    ├── clustering_results.csv          # “heuristic + kmeans” labels
    └── Pupil_Clusters/
        ├── cluster_distribution.png
        ├── pupil_scatter.png
        └── kmeans_pupil_scatter.png

Every script creates and populates its respective subfolder under outputs/. You can safely delete any of these subfolders and re-run the corresponding script, but a recommended execution sequence is:

  1. HeadNoseAlignment.py → outputs/Alignment/
  2. HeadAngleDisp.py → outputs/HeadAngleDisp_Images/
  3. HeadClusteringAngleDisp.py → outputs/Head_Clusters/
  4. LipClusteringForSubfolderGreyImageSSHyper.py → outputs/LipClustering1/
  5. EyeClusteringForSubfolderGreyImageSSHyper.py → outputs/Eye_Emo/
  6. PupilMoveClassification.py → outputs/Pupil_Annotated/ and outputs/Pupil_Clusters/

Evaluation Metrics

For every clustering module (lips, eyes, pupils, head), the following metrics are computed and/or available:

Scores on AffectNet-YOLO

Module   k clusters   Silhouette   Calinski–Harabasz   Davies–Bouldin
Lips     4            0.5781       76 789.41           0.5781
Eyes     4            0.6143       34 515.84           0.4879
Pupils   5            ≈0.6500      ≈18 000             ≈0.5200
Head     3 (angle)    ≈0.5600      30 658              ≈0.5600
         3 (disp)     ≈0.5500      37 520              ≈0.5600
         5 (comb)     ≈0.3400      10 288              ≈0.8700

Cross-Dataset Results (FER, Lips/Eyes only)

Module   Dataset          Silhouette   Calinski–Harabasz   Davies–Bouldin
Lips     AffectNet-YOLO   0.5781       76 789.41           0.5781
         FER              0.7067       20 988.36           0.4588
Eyes     AffectNet-YOLO   0.6143       34 515.84           0.4879
         FER              0.6020       5 587.74            0.5289

Future Work & Extensions


References

  1. Cootes, T. F., Taylor, C. J., Cooper, D. H., & Graham, J. (1995). Active shape models—their training and application. Computer Vision and Image Understanding, 61(1), 38–59.
  2. Cootes, T. F., Edwards, G. J., & Taylor, C. J. (2001). Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell., 23(6), 681–685.
  3. Sun, Y., Wang, X., & Tang, X. (2013). Deep convolutional network cascade for facial point detection. In Proc. IEEE CVPR (pp. 3476–3483).
  4. Kazemi, V., & Sullivan, J. (2014). One millisecond face alignment with an ensemble of regression trees. In Proc. IEEE CVPR (pp. 1867–1874).
  5. King, D. E. (2009). Dlib-ml: A Machine Learning Toolkit. Journal of Machine Learning Research, 10, 1755–1758.
  6. Lee, S., & Kim, H. (2022). Contrastive learning for facial landmark detection under occlusions. IEEE Access.
  7. Patel, R., & Singh, A. (2023). Self-supervised facial keypoint detection in the wild. In ICCV.
  8. MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proc. Fifth Berkeley Symp. on Math. Stat. and Probab., Vol. 1, 281–297.
  9. Murtagh, F., & Contreras, P. (2012). Algorithms for hierarchical clustering: An overview. WIREs Data Mining and Knowledge Discovery, 2(1), 86–97.
  10. Ng, A. Y., Jordan, M. I., & Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems (pp. 849–856).
  11. Wang, X., & Zhao, Y. (2024). Contrastive clustering for facial features. IEEE Trans. Pattern Anal. Mach. Intell.
  12. Zhang, L., & Chen, B. (2024). Geometric clustering of facial landmarks. In ECCV.
  13. Murphy-Chutorian, E., & Trivedi, M. M. (2018). Head pose estimation in computer vision: A survey. IEEE Trans. Pattern Anal. Mach. Intell.
  14. Kim, J., & Park, S. (2023). Unsupervised clustering of head-pose embeddings. In ICCV.
  15. Tulyakov, S., Liu, M. Y., Yang, X., & Kautz, J. (2018). MoCoGAN: Decomposing motion and content for video generation. In CVPR.
  16. Zafeiriou, S., Zhang, C., & Pantic, M. (2013). Facial landmark detection in the wild: A survey. Image and Vision Computing, 31(3), 408–420.
  17. Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Comput. and Appl. Math., 20, 53–65.
  18. Calinski, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics, 3(1), 1–27.
  19. Davies, D. L., & Bouldin, D. W. (1979). A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell., PAMI-1(2), 224–227.
  20. Otsu, N. (1979). A threshold selection method from gray-level histograms. IEEE Trans. Syst., Man, and Cybernetics, 9(1), 62–66.