- First two layers become edge detectors —> combinations ofedges become corners —> if you combine corners with edges, then you have shapes —> shapes become objects
- backpropagation is necessary “minimizing the loss function” b/c you can’t interdict half way through
Don’t beehive information
Token = 3/4 of a word
computational time: number of layers is exponential by layer
32 bits vs. 8 bits and the implications, it’s a 1/3 right? Look up CPU, GPU, and TPU
- What is your plan to upgrade new LLMs?
- Can you ask what data was this trained on?