VedicTHG is a lightweight, bring-your-own-assets toolkit for audio-driven talking-head generation.
It converts an input audio clip into a sequence of visemes (mouth shapes) and renders a lip-synced video by compositing mouth sprites onto a reference face image (optionally with simple motion/rigging).
This repository is GitHub-ready and intentionally ships no datasets and no personal face media.
You bring your own assets (with consent) — we bring the pipeline (and zero awkward copyright problems).
Project page: https://vineetkumarrakesh.github.io/vedicthg/
Preprint: https://doi.org/10.48550/arXiv.2602.08775
Note: This repo provides a reproducible engineering pipeline. It does not include or redistribute any copyrighted datasets (GRID/LRS/VoxCeleb/etc.).
In other words: the code is free, the datasets are not, and your future self will thank you.
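At its core the pipeline is sprite compositing: for each frame, pick the viseme predicted for that timestamp and paste the matching mouth sprite onto the reference face. Below is a minimal sketch of that idea using Pillow; it is not the project's renderer, and the paste position is a placeholder assumption you would tune for your own assets.

```python
from pathlib import Path
from PIL import Image

# Illustrative only: the real renderer lives in the vedicthg package.
FACE = Path("data/raw/avatar.png")
VISEME_DIR = Path("data/raw/visemes")
MOUTH_XY = (96, 160)  # hypothetical mouth position; adjust for your face image

def composite_frame(viseme_id: int) -> Image.Image:
    """Paste one mouth sprite onto the reference face."""
    face = Image.open(FACE).convert("RGBA")
    mouth = Image.open(VISEME_DIR / f"{viseme_id}.png").convert("RGBA")
    face.paste(mouth, MOUTH_XY, mouth)  # use the sprite's alpha as the mask
    return face

if __name__ == "__main__":
    # Render one still per viseme ID as a quick visual check.
    out_dir = Path("results/frames")
    out_dir.mkdir(parents=True, exist_ok=True)
    for vid in range(15):
        composite_frame(vid).save(out_dir / f"viseme_{vid:02d}.png")
```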
Recommended: Python 3.10 or 3.11 (Windows/Linux).
(Some dependencies, such as mediapipe, may not ship wheels for newer Python versions.)
```bash
# from the repo root
python -m pip install --upgrade pip
pip install -e .
```
If you prefer a pinned environment:
```bash
pip install -r requirements.txt
pip install -e .
```
pip install -e ".[whisper]"
pip install -e ".[analysis]"
1) Generate dummy assets (safe, synthetic, and guaranteed to not sue you):
```bash
python scripts/prepare_dummy_assets.py
```
This creates:
- data/raw/avatar.png (a simple synthetic face)
- data/raw/visemes/0.png ... 14.png (sprite mouth shapes)
- data/raw/benchmark/clip_01.wav ... (synthetic audio clips)

2) Run the simple demo renderer:
```bash
python scripts/run_demo.py
```
Output:
- results/demo.mp4

If you want the full CLI demo (phoneme recognition + profiling):
```bash
python -m vedicthg.demo \
  --audio data/raw/benchmark/clip_01.wav \
  --face data/raw/avatar.png \
  --mouth-dir data/raw/visemes \
  --output results/demo_cli.mp4 \
  --fps 30
```
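To render several clips in one go, a small wrapper around the CLI above works; the glob pattern and output naming here are just examples, while the flags are the ones documented above.

```python
import subprocess
import sys
from pathlib import Path

AUDIO_DIR = Path("data/raw/benchmark")  # clips created by prepare_dummy_assets.py
OUT_DIR = Path("results")
OUT_DIR.mkdir(exist_ok=True)

for wav in sorted(AUDIO_DIR.glob("clip_*.wav")):
    out = OUT_DIR / f"{wav.stem}.mp4"
    cmd = [
        sys.executable, "-m", "vedicthg.demo",
        "--audio", str(wav),
        "--face", "data/raw/avatar.png",
        "--mouth-dir", "data/raw/visemes",
        "--output", str(out),
        "--fps", "30",
    ]
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```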
You must have permission/consent for any face media you use.
If it’s not yours (or you don’t have explicit permission), don’t use it.
VedicTHG is powerful — but not powerful enough to defeat ethics (or the law).
Minimum inputs:
- data/raw/avatar.png
- data/raw/visemes/0.png ... 14.png corresponding to the viseme IDs used by the mapper.
- data/raw/example.wav (16 kHz mono recommended)

Recommended workflow:
Put your assets in data/raw/ (this folder is gitignored), then run:

```bash
python -m vedicthg.demo \
  --audio data/raw/example.wav \
  --face data/raw/avatar.png \
  --mouth-dir data/raw/visemes \
  --output results/my_run.mp4
```
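Before a run, it can help to sanity-check that the minimum inputs listed above are actually in place; a small, assumption-light check:

```python
from pathlib import Path

RAW = Path("data/raw")

# Minimum inputs listed above: a face image, 15 viseme sprites, and a WAV clip.
required = [RAW / "avatar.png", RAW / "example.wav"]
required += [RAW / "visemes" / f"{i}.png" for i in range(15)]

missing = [p for p in required if not p.exists()]
if missing:
    print("Missing assets:")
    for p in missing:
        print(" -", p)
else:
    print("All minimum inputs present.")
```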
The phoneme-to-viseme mapping (and the set of viseme IDs it uses) is defined in src/vedicthg/viseme_mapper.py.
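The snippet below is only a generic illustration of what such a lookup can look like; the phoneme symbols and IDs are hypothetical, not the mapping this project ships.

```python
# Illustrative only -- see src/vedicthg/viseme_mapper.py for the real mapping.
PHONEME_TO_VISEME = {
    "SIL": 0,  # silence / closed mouth
    "AA": 1,   # open vowel
    "IY": 2,   # spread vowel
    "UW": 3,   # rounded vowel
    "M": 4,    # bilabial closure (also B, P)
    "F": 5,    # labiodental (also V)
}

def to_viseme_ids(phonemes: list[str]) -> list[int]:
    """Map a phoneme sequence to sprite IDs, falling back to silence."""
    return [PHONEME_TO_VISEME.get(p, 0) for p in phonemes]

print(to_viseme_ids(["SIL", "M", "AA", "IY", "SIL"]))  # -> [0, 4, 1, 2, 0]
```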
Benchmarks live in experiments/benchmarks/. After generating or adding benchmark clips in data/raw/benchmark/:
```bash
python experiments/benchmarks/run_benchmark.py --root . --num-clips 10 --fps 30 --render
```
Typical outputs (gitignored):

- experiments/benchmarks/results/*.csv
- results/*.mp4 (if rendering is enabled)
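The benchmark CSV schema isn't documented here, so the quickest way to see what was measured is to peek at the files; a dependency-free sketch that makes no assumptions about column names:

```python
import csv
from pathlib import Path

# Prints whatever columns the benchmark script wrote, plus the row count.
for csv_path in sorted(Path("experiments/benchmarks/results").glob("*.csv")):
    with csv_path.open(newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
    print(f"{csv_path.name}: {len(rows)} rows, columns = {reader.fieldnames}")
```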
```
VedicTHG/
  src/vedicthg/   # installable Python package
  scripts/        # runnable scripts (no PYTHONPATH hacks needed)
  experiments/    # benchmarks, ablations, analysis
  docs/           # extra documentation
  data/           # empty placeholders (gitignored; BYO assets)
  results/        # empty placeholders (gitignored)
  assets/         # optional project media (kept empty here)
```
mediapipe install issues (Windows): use Python 3.10/3.11 and upgrade pip.

This project can create realistic-looking talking-head videos when paired with suitable assets. Please use it responsibly.
If you use VedicTHG in academic work, please cite the preprint:
```bibtex
@misc{rakesh2026vedicthgsymbolicvediccomputation,
  title={VedicTHG: Symbolic Vedic Computation for Low-Resource Talking-Head Generation in Educational Avatars},
  author={Vineet Kumar Rakesh and Ahana Bhattacharjee and Soumya Mazumdar and Tapas Samanta and Hemendra Kumar Pandey and Amitabha Das and Sarbajit Pal},
  year={2026},
  eprint={2602.08775},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.08775},
}
```
See also: CITATION.cff.
No datasets included. Bring your own assets.
(We provide the code; you provide the consent. Everyone wins.)