Welcome to Kunvar's corner of cyberspace

Hello, I'm Kunvar!

My main focus these days is taking a trained neural network and trying to understand the structures it has learnt and how it uses those structures to implement algorithms.

This typically involves investigating the learned parameters of individual model components, understanding how those components interact with each other, working out how the model combines seemingly unrelated learned features to implement algorithms, and finding ways to intervene and change the model's behavior in a desirable way by editing its internal parameters.

If you have thoughts on anything I've written, want to discuss a project, want to give or get feedback, or otherwise want to get in touch, my email is: 1stuserhere@gmail.com

Here are some projects that I'm working on these days:

Working on

Mechanistic analysis of meta out-of-context learning in LLMs


Project logs

Thinking about

Notebook
Topic
Date written
The malware inside your AI
Interpretability

Notebook logs

Reading

Title
Author
Date Read
Thomas Harris
Thomas Harris
Thomas Harris

Reading logs

Watching

Title
Released in
Watched on
The Shawshank Redemption
1994
The Silence of the Lambs
1991

Movie logs