Jagadish Krishnamoorthy

I'm an AI frameworks (PyTorch) and AI models engineer at AMD in the San Francisco Bay Area. I contribute to open source projects daily, in both a personal and a professional capacity.
I work with Python, C/C++, CMake, Jenkins, and Docker/container technology to develop AI workloads on GPUs.

At AMD I've worked on PyTorch, ONNX Runtime, DeepSpeed, Apex, Hugging Face Transformers, and AI model training and inference.
Open source contributor to PyTorch, DeepSpeed, ONNX Runtime, and other AI frameworks.

  • Integrating AI frameworks such as PyTorch with the AMD ROCm GPU AI software stack.

  • Integrating libraries such as BLAS, BLASLt, and MAGMA with PyTorch.

  • PyTorch datatypes such as fp32, tf32, OCP fp8, MX fp8, and MX fp4; scaling, scaled_mm GEMM, and matmul GEMM kernels.

  • Maintenance of upstream PyTorch, PyTorch release branches, and ROCm fork PyTorch branches: https://github.com/ROCm/pytorch

  • torch.cuda, distributed, and c10d modules; single- and multi-GPU training and inference using PyTorch DDP/DP; RCCL/NCCL communications; training and inference of AI models such as GPT, Llama, BERT, CIFAR, and transformer-based LLMs.

  • GPU architecture, memory, and streams; working knowledge of CUDA/HIP; the AI software stack from the application layer down to GPU hardware.

  • GPU cloud: multi-node, multi-GPU distributed training of AI/ML workloads using PyTorch, Composer, and Hugging Face; Slurm/k8s usage for HPC jobs; Python Django project.


My social links


Hobby projects (AI and Blockchain)


Blogs


Misc

I have an active interest in AI agents, LLMs, transformer encoder and decoder architectures, and training and inference on single- and multi-node GPU systems. Email me with ideas and for discussions!