Thanks! Here’s the high level description from there:
“Gemini models build on top of Transformer decoders (Vaswani et al., 2017) that are enhanced with improvements in architecture and model optimization to enable stable training at scale and optimized inference on Google’s Tensor Processing Units. They are trained to support 32k context length”
Here is their technical report. I’m yet to read it, though.