
VedicTHG: Symbolic Vedic Computation for Low-Resource Talking-Head Generation in Educational Avatars

VedicTHG is a lightweight, bring-your-own-assets toolkit for audio-driven talking-head generation.
It converts an input audio clip into a sequence of visemes (mouth shapes) and renders a lip-synced video by compositing mouth sprites onto a reference face image (optionally with simple motion/rigging).

This repository is GitHub-ready and intentionally ships no datasets and no personal face media.
You bring your own assets (with consent) — we bring the pipeline (and zero awkward copyright problems).

Project page: https://vineetkumarrakesh.github.io/vedicthg/
Preprint: https://doi.org/10.48550/arXiv.2602.08775

What’s included

Method summary (high level)

  1. Audio processing: load audio, optional MFCC extraction.
  2. Phoneme alignment: decode an approximate phoneme sequence and timings.
  3. Viseme synthesis: map phonemes to discrete viseme IDs and blend with overlap for smoothness.
  4. Rendering: for each frame, select/blend a mouth sprite, warp/compose it into the mouth region, and write the video with the original audio.
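The viseme step is the symbolic heart of the pipeline, so here is a minimal, self-contained sketch of steps 3–4. It is not the package's actual API: the phoneme→viseme table, the function names, and the simple crossfade/paste logic are all illustrative assumptions.

# Illustrative sketch only -- not the vedicthg API. Assumes:
#   * a tiny hand-written phoneme -> viseme-ID table,
#   * timed phoneme segments from the alignment step,
#   * one mouth sprite image (H x W x 3 uint8 array) per viseme ID.
import numpy as np

# Hypothetical phoneme -> viseme mapping (real tables cover the full phone set).
PHONEME_TO_VISEME = {
    "sil": 0,                    # closed / neutral mouth
    "p": 1, "b": 1, "m": 1,
    "f": 2, "v": 2,
    "aa": 3, "ae": 3,
    "iy": 4, "ih": 4,
    "uw": 5, "ow": 5,
}

def viseme_track(segments, fps=30, blend=0.04):
    """Turn timed phoneme segments [(phoneme, start_s, end_s), ...] into
    per-frame (viseme_id, next_viseme_id, mix) tuples, crossfading for
    `blend` seconds before each segment boundary."""
    if not segments:
        return []
    duration = segments[-1][2]
    n_frames = int(round(duration * fps))
    track = []
    for i in range(n_frames):
        t = i / fps
        # Find the segment active at this frame time (fall back to the last one).
        idx = next(k for k, (_, s, e) in enumerate(segments)
                   if s <= t < e or k == len(segments) - 1)
        phon, start, end = segments[idx]
        cur = PHONEME_TO_VISEME.get(phon, 0)
        nxt, mix = cur, 0.0
        # Near the end of a segment, start blending toward the next viseme.
        if idx + 1 < len(segments) and end - t < blend:
            nxt = PHONEME_TO_VISEME.get(segments[idx + 1][0], 0)
            mix = 1.0 - (end - t) / blend
        track.append((cur, nxt, mix))
    return track

def composite_frame(face, sprites, cur, nxt, mix, mouth_box):
    """Alpha-blend two mouth sprites and paste the result into mouth_box
    (x, y, w, h) of a copy of the face image."""
    x, y, w, h = mouth_box
    a = sprites[cur].astype(np.float32)
    b = sprites[nxt].astype(np.float32)
    mouth = ((1.0 - mix) * a + mix * b).astype(np.uint8)
    out = face.copy()
    out[y:y + h, x:x + w] = mouth   # a real renderer would resize/warp and feather edges
    return out

For example, segments = [("sil", 0.0, 0.2), ("p", 0.2, 0.3), ("aa", 0.3, 0.7)] produces a 30 fps track whose frames just before 0.3 s mix visemes 1 and 3, which is what gives the rendered mouth its smooth transitions.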

Note: This repo provides a reproducible engineering pipeline. It does not include or redistribute any copyrighted datasets (GRID/LRS/VoxCeleb/etc.).
In other words: the code is free, the datasets are not, and your future self will thank you.

Install

Recommended: Python 3.10 or 3.11 (Windows/Linux).
(Some dependencies, such as mediapipe, may not yet ship wheels for newer Python versions.)

# from the repo root
python -m pip install --upgrade pip
pip install -e .

If you prefer a pinned environment:

pip install -r requirements.txt
pip install -e .

Optional extras

Quickstart demo (no datasets)

1) Generate dummy assets (safe, synthetic, and guaranteed not to sue you):

python scripts/prepare_dummy_assets.py

This creates:

2) Run the simple demo renderer:

python scripts/run_demo.py

Output:

If you want the full CLI demo (phoneme recognition + profiling):

python -m vedicthg.demo --audio data/raw/benchmark/clip_01.wav --face data/raw/avatar.png --mouth-dir data/raw/visemes --output results/demo_cli.mp4 --fps 30

Bring your own assets

You must have permission/consent for any face media you use.
If it’s not yours (or you don’t have explicit permission), don’t use it.
VedicTHG is powerful — but not powerful enough to defeat ethics (or the law).

Minimum inputs:

Recommended workflow:

  1. Put your assets under data/raw/ (this folder is gitignored).
  2. Run:
    python -m vedicthg.demo --audio data/raw/example.wav --face data/raw/avatar.png --mouth-dir data/raw/visemes --output results/my_run.mp4
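
Optional: before a long render, a quick pre-flight check of your assets can save a failed run. The snippet below is a stand-alone sketch (not part of the package); the paths mirror the example command above and it assumes a PCM WAV input.

# Stand-alone pre-flight check for BYO assets (not part of vedicthg).
# Paths mirror the example command above; adjust to your own files.
import wave
from pathlib import Path

audio = Path("data/raw/example.wav")
face = Path("data/raw/avatar.png")
visemes = Path("data/raw/visemes")

assert audio.is_file(), f"missing audio: {audio}"
assert face.is_file(), f"missing face image: {face}"
sprites = sorted(visemes.glob("*.png"))
assert sprites, f"no mouth sprites found in {visemes}"

with wave.open(str(audio)) as w:          # assumes a PCM WAV file
    seconds = w.getnframes() / w.getframerate()
print(f"audio: {seconds:.2f}s, sprites: {len(sprites)}, face: {face.name}")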
    

Mouth sprite conventions
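
The exact convention depends on your asset set. Purely as an illustration, if each viseme is a single PNG in the --mouth-dir folder named by its integer ID (an assumption, e.g. 0.png, 1.png, ...), a loader could be as simple as the sketch below.

# Purely illustrative: loads mouth sprites from a --mouth-dir style folder.
# The repo's actual file-naming convention may differ; here we assume one PNG
# per viseme, named by its integer ID (0.png, 1.png, ...).
from pathlib import Path

from PIL import Image   # pillow; any image library works

def load_sprites(mouth_dir="data/raw/visemes"):
    sprites = {}
    for path in sorted(Path(mouth_dir).glob("*.png")):
        try:
            viseme_id = int(path.stem)          # "3.png" -> 3
        except ValueError:
            continue                            # skip files outside the assumed scheme
        sprites[viseme_id] = Image.open(path).convert("RGBA")
    return sprites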

Benchmarking

Benchmarks live in experiments/benchmarks/.

After generating or adding benchmark clips in data/raw/benchmark/:

python experiments/benchmarks/run_benchmark.py --root . --num-clips 10 --fps 30 --render
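
If data/raw/benchmark/ is still empty, a throwaway snippet like the one below writes short synthetic WAV clips to benchmark against (the dummy-asset script may already create similar files). The tone parameters are arbitrary and the helper is not part of the repo.

# Throwaway helper (not part of the repo): writes N short sine-tone WAVs
# into data/raw/benchmark/ so the benchmark script has something to chew on.
import math
import struct
import wave
from pathlib import Path

out_dir = Path("data/raw/benchmark")
out_dir.mkdir(parents=True, exist_ok=True)

sr, seconds, num_clips = 16000, 3.0, 10
for i in range(1, num_clips + 1):
    freq = 110.0 * (i + 1)                      # vary pitch per clip
    frames = b"".join(
        struct.pack("<h", int(0.3 * 32767 * math.sin(2 * math.pi * freq * n / sr)))
        for n in range(int(sr * seconds))
    )
    with wave.open(str(out_dir / f"clip_{i:02d}.wav"), "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)                       # 16-bit PCM
        w.setframerate(sr)
        w.writeframes(frames)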

Typical outputs (gitignored):

Repository layout

VedicTHG/
  src/vedicthg/        # installable python package
  scripts/             # runnable scripts (no PYTHONPATH hacks needed)
  experiments/         # benchmarks, ablations, analysis
  docs/                # extra documentation
  data/                # empty placeholders (gitignored; BYO assets)
  results/             # empty placeholders (gitignored)
  assets/              # optional project media (kept empty here)

Troubleshooting

Responsible use

This project can create realistic-looking talking-head videos when paired with suitable assets. Please use it responsibly:

Citation

If you use VedicTHG in academic work, please cite the preprint:

@misc{rakesh2026vedicthgsymbolicvediccomputation,
      title={VedicTHG: Symbolic Vedic Computation for Low-Resource Talking-Head Generation in Educational Avatars}, 
      author={Vineet Kumar Rakesh and Ahana Bhattacharjee and Soumya Mazumdar and Tapas Samanta and Hemendra Kumar Pandey and Amitabha Das and Sarbajit Pal},
      year={2026},
      eprint={2602.08775},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.08775}, 
}

See also: CITATION.cff.


No datasets included. Bring your own assets.
(We provide the code; you provide the consent. Everyone wins.)