VedicTHG is a lightweight, bring-your-own-assets toolkit for audio-driven talking-head generation.
It converts an input audio clip into a sequence of visemes (mouth shapes) and renders a lip-synced video by compositing mouth sprites onto a reference face image (optionally with simple motion/rigging).
This repository is GitHub-ready and intentionally ships no datasets and no personal face media.
You bring your own assets (with consent) — we bring the pipeline (and zero awkward copyright problems).
Project page: https://vineetkumarrakesh.github.io/vedicthg/
Preprint: https://doi.org/10.48550/arXiv.2602.08775
Note: This repo provides a reproducible engineering pipeline. It does not include or redistribute any copyrighted datasets (GRID/LRS/VoxCeleb/etc.).
In other words: the code is free, the datasets are not, and your future self will thank you.
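At its core the pipeline is sprite compositing: for each frame, pick the viseme predicted for that timestamp and paste the matching mouth sprite onto the reference face. Below is a minimal sketch of that idea using Pillow; it is not the project's renderer, and the paste position is a placeholder assumption you would tune for your own assets.

```python
from pathlib import Path
from PIL import Image

# Illustrative only: the real renderer lives in the vedicthg package.
FACE = Path("data/raw/avatar.png")
VISEME_DIR = Path("data/raw/visemes")
MOUTH_XY = (96, 160)  # hypothetical mouth position; adjust for your face image

def composite_frame(viseme_id: int) -> Image.Image:
    """Paste one mouth sprite onto the reference face."""
    face = Image.open(FACE).convert("RGBA")
    mouth = Image.open(VISEME_DIR / f"{viseme_id}.png").convert("RGBA")
    face.paste(mouth, MOUTH_XY, mouth)  # use the sprite's alpha as the mask
    return face

if __name__ == "__main__":
    # Render one still per viseme ID as a quick visual check.
    out_dir = Path("results/frames")
    out_dir.mkdir(parents=True, exist_ok=True)
    for vid in range(15):
        composite_frame(vid).save(out_dir / f"viseme_{vid:02d}.png")
```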
Recommended: Python 3.10 or 3.11 (Windows/Linux).
(Some dependencies, such as mediapipe, may not ship wheels for newer Python versions.)
```bash
# from the repo root
python -m pip install --upgrade pip
pip install -e .
```
If you prefer a pinned environment:
```bash
pip install -r requirements.txt
pip install -e .
```
pip install -e ".[whisper]"
pip install -e ".[analysis]"
1) Generate dummy assets (safe, synthetic, and guaranteed to not sue you):
```bash
python scripts/prepare_dummy_assets.py
```
This creates:
- data/raw/avatar.png (a simple synthetic face)
- data/raw/visemes/0.png ... 14.png (sprite mouth shapes)
- data/raw/benchmark/clip_01.wav ... (synthetic audio clips)

2) Run the simple demo renderer:
```bash
python scripts/run_demo.py
```
Output:
- results/demo.mp4

If you want the full CLI demo (phoneme recognition + profiling):
```bash
python -m vedicthg.demo \
  --audio data/raw/benchmark/clip_01.wav \
  --face data/raw/avatar.png \
  --mouth-dir data/raw/visemes \
  --output results/demo_cli.mp4 \
  --fps 30
```
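To render several clips in one go, a small wrapper around the CLI above works; the glob pattern and output naming here are just examples, while the flags are the ones documented above.

```python
import subprocess
import sys
from pathlib import Path

AUDIO_DIR = Path("data/raw/benchmark")  # clips created by prepare_dummy_assets.py
OUT_DIR = Path("results")
OUT_DIR.mkdir(exist_ok=True)

for wav in sorted(AUDIO_DIR.glob("clip_*.wav")):
    out = OUT_DIR / f"{wav.stem}.mp4"
    cmd = [
        sys.executable, "-m", "vedicthg.demo",
        "--audio", str(wav),
        "--face", "data/raw/avatar.png",
        "--mouth-dir", "data/raw/visemes",
        "--output", str(out),
        "--fps", "30",
    ]
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```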
You must have permission/consent for any face media you use.
If it’s not yours (or you don’t have explicit permission), don’t use it.
VedicTHG is powerful — but not powerful enough to defeat ethics (or the law).
Minimum inputs:
- data/raw/avatar.png
- data/raw/visemes/0.png ... 14.png corresponding to the viseme IDs used by the mapper.
- data/raw/example.wav (16 kHz mono recommended)

Recommended workflow:
Put your assets in data/raw/ (this folder is gitignored), then run:

```bash
python -m vedicthg.demo \
  --audio data/raw/example.wav \
  --face data/raw/avatar.png \
  --mouth-dir data/raw/visemes \
  --output results/my_run.mp4
```
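Before a run, it can help to sanity-check that the minimum inputs listed above are actually in place; a small, assumption-light check:

```python
from pathlib import Path

RAW = Path("data/raw")

# Minimum inputs listed above: a face image, 15 viseme sprites, and a WAV clip.
required = [RAW / "avatar.png", RAW / "example.wav"]
required += [RAW / "visemes" / f"{i}.png" for i in range(15)]

missing = [p for p in required if not p.exists()]
if missing:
    print("Missing assets:")
    for p in missing:
        print(" -", p)
else:
    print("All minimum inputs present.")
```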
The phoneme-to-viseme mapping (and the set of viseme IDs it uses) is defined in src/vedicthg/viseme_mapper.py.
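The snippet below is only a generic illustration of what such a lookup can look like; the phoneme symbols and IDs are hypothetical, not the mapping this project ships.

```python
# Illustrative only -- see src/vedicthg/viseme_mapper.py for the real mapping.
PHONEME_TO_VISEME = {
    "SIL": 0,  # silence / closed mouth
    "AA": 1,   # open vowel
    "IY": 2,   # spread vowel
    "UW": 3,   # rounded vowel
    "M": 4,    # bilabial closure (also B, P)
    "F": 5,    # labiodental (also V)
}

def to_viseme_ids(phonemes: list[str]) -> list[int]:
    """Map a phoneme sequence to sprite IDs, falling back to silence."""
    return [PHONEME_TO_VISEME.get(p, 0) for p in phonemes]

print(to_viseme_ids(["SIL", "M", "AA", "IY", "SIL"]))  # -> [0, 4, 1, 2, 0]
```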
Benchmarks live in experiments/benchmarks/. After generating or adding benchmark clips in data/raw/benchmark/:
```bash
python experiments/benchmarks/run_benchmark.py --root . --num-clips 10 --fps 30 --render
```
Typical outputs (gitignored):

- experiments/benchmarks/results/*.csv
- results/*.mp4 (if rendering is enabled)
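The benchmark CSV schema isn't documented here, so the quickest way to see what was measured is to peek at the files; a dependency-free sketch that makes no assumptions about column names:

```python
import csv
from pathlib import Path

# Prints whatever columns the benchmark script wrote, plus the row count.
for csv_path in sorted(Path("experiments/benchmarks/results").glob("*.csv")):
    with csv_path.open(newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
    print(f"{csv_path.name}: {len(rows)} rows, columns = {reader.fieldnames}")
```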
```
VedicTHG/
  src/vedicthg/   # installable Python package
  scripts/        # runnable scripts (no PYTHONPATH hacks needed)
  experiments/    # benchmarks, ablations, analysis
  docs/           # extra documentation
  data/           # empty placeholders (gitignored; BYO assets)
  results/        # empty placeholders (gitignored)
  assets/         # optional project media (kept empty here)
```
mediapipe install issues (Windows): use Python 3.10/3.11 and upgrade pip.

This project can create realistic-looking talking-head videos when paired with suitable assets. Please use it responsibly.
If you use VedicTHG in academic work, please cite the preprint:
```bibtex
@misc{rakesh2026vedicthgsymbolicvediccomputation,
  title={VedicTHG: Symbolic Vedic Computation for Low-Resource Talking-Head Generation in Educational Avatars},
  author={Vineet Kumar Rakesh and Ahana Bhattacharjee and Soumya Mazumdar and Tapas Samanta and Hemendra Kumar Pandey and Amitabha Das and Sarbajit Pal},
  year={2026},
  eprint={2602.08775},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.08775},
}
```
See also: CITATION.cff.
No datasets included. Bring your own assets.
(We provide the code; you provide the consent. Everyone wins.)