Open-Source Large Model Grok-1

Basic Parameters:

  • Parameters: 314B
  • Architecture: Mixture of 8 Experts (MoE)
  • Expert Utilization: 2 experts used per token
  • Layers: 64
  • Attention Heads: 48 for queries, 8 for keys/values
  • Embedding Size: 6,144
  • Tokenization: SentencePiece tokenizer with 131,072 tokens
  • Additional Features:
    • Rotary embeddings (RoPE)
    • Supports activation sharding and 8-bit quantization
  • Maximum Sequence Length (context): 8,192 tokens
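
Taken together, these hyperparameters map onto the model configuration that run.py constructs. Below is a minimal sketch assuming the field names of the public xai-org/grok-1 repository's LanguageModelConfig / TransformerConfig; several fields (pad/eos tokens, initialization and output scales, mesh axes) are omitted for brevity, so treat it as illustrative rather than a verbatim copy.

```python
# Sketch of the Grok-1 configuration; field names assume the public
# xai-org/grok-1 repo, values mirror the parameter list above.
from model import LanguageModelConfig, TransformerConfig

grok_1_model = LanguageModelConfig(
    vocab_size=128 * 1024,        # SentencePiece vocabulary: 131,072 tokens
    sequence_len=8192,            # maximum context length
    model=TransformerConfig(
        emb_size=48 * 128,        # embedding size: 6,144
        key_size=128,             # per-head key/value dimension
        num_q_heads=48,           # 48 attention heads for queries
        num_kv_heads=8,           # 8 heads for keys/values
        num_layers=64,            # 64 Transformer layers
        num_experts=8,            # MoE: 8 experts per layer
        num_selected_experts=2,   # top-2 routing: 2 experts per token
        shard_activations=True,   # activation sharding support
    ),
)
```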

Project Architecture:

  • run.py -> runner.py -> checkpoint.py -> model.py
  • run.py (parameter configuration and input/output): params -> TransformerConfig() -> LanguageModelConfig() -> ModelRunner() -> InferenceRunner() -> initialize() -> run() -> sample_from_model() -> output (see the sketch after this list)
  • runner.py (model loading and inference): ModelRunner, InferenceRunner
  • checkpoint.py: weight loading
  • model.py (the core Transformer model): InOutEmbed + Transformer
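
The call chain above condenses to roughly the following. This is a trimmed sketch of run.py's main flow based on the public repo; the checkpoint path is hypothetical, and several required arguments (batch size, pad sizes, mesh configurations) are left out for brevity.

```python
# Condensed sketch of run.py's inference flow (arguments illustrative).
from runner import InferenceRunner, ModelRunner, sample_from_model

CKPT_PATH = "./checkpoints"  # hypothetical local checkpoint directory

inference_runner = InferenceRunner(
    runner=ModelRunner(
        model=grok_1_model,          # the LanguageModelConfig sketched above
        checkpoint_path=CKPT_PATH,   # weights restored via checkpoint.py
    ),
    name="local",
    load=CKPT_PATH,
    tokenizer_path="./tokenizer.model",
)
inference_runner.initialize()        # build the model and load the checkpoint
gen = inference_runner.run()         # generator that serves sampling requests

prompt = "The answer to life the universe and everything is"
print(sample_from_model(gen, prompt, max_len=100, temperature=0.01))
```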

Model Architecture:

  • So far this covers the top-level architecture; the multi-head attention and mixture-of-experts internals will be covered in a future update.
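
Until that follow-up, the sketch below shows the top-level forward pass implied by the InOutEmbed + Transformer split: a shared embedding table maps token ids in and projects hidden states back out to logits, with the decoder layers in between. The structure is an assumption from standard decoder-only designs (with toy sizes so it runs), not a copy of model.py; the layer internals are stubbed out, matching the note above.

```python
# Rough top-level forward-pass sketch (assumed structure, toy sizes;
# Grok-1's real sizes are vocab 131,072, embedding 6,144, 64 layers).
import numpy as np

VOCAB, EMB, LAYERS = 1_000, 64, 4
rng = np.random.default_rng(0)
embed = rng.normal(scale=0.02, size=(VOCAB, EMB)).astype(np.float32)

def decoder_layer(h: np.ndarray) -> np.ndarray:
    # Placeholder for one Transformer layer: attention with 48 query /
    # 8 key-value heads plus a top-2 MoE feed-forward; details deferred.
    return h

def forward(token_ids: np.ndarray) -> np.ndarray:
    h = embed[token_ids]        # InOutEmbed: token ids -> embeddings
    for _ in range(LAYERS):     # stack of decoder layers
        h = decoder_layer(h)
    return h @ embed.T          # InOutEmbed reused to produce output logits

logits = forward(np.array([1, 2, 3]))
print(logits.shape)             # (3, 1000)
```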