# Godala-moe: A Mixture of Experts LLM Model

## Overview
Godala-moe is an edge-ready Large Language Model (LLM) designed to run efficiently on various platforms. It is based on the HuggingFaceTB/SmolLM2-1.7B base model from Hugging Face and has undergone continued pretraining, multiple finetuning steps, and model merging to create a mixture of experts.
## Features
- Edge-Ready: Designed to run on edge devices with minimal resource requirements.
- Mixture of Experts: Combines multiple expert models for enhanced performance (see the routing sketch after this list).
- Continued Pretraining: Underwent continued pretraining stages to improve model quality.
- Finetuning: Multiple finetuning steps to specialize the model for specific tasks.
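
The repository does not document how expert routing works internally, but a mixture-of-experts layer typically scores each token with a small router and forwards it to the top-k expert feed-forward networks. The snippet below is a minimal, generic sketch of top-2 routing in PyTorch; the class name, sizes, and structure are illustrative assumptions, not Godala-moe internals.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Generic top-k expert routing for illustration; not Godala-moe's actual architecture."""

    def __init__(self, hidden_size: int = 2048, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts)  # scores each token against each expert
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.GELU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_size)
        scores, picked = self.router(x).topk(self.top_k, dim=-1)  # best k experts per token
        weights = F.softmax(scores, dim=-1)                       # normalize the selected scores
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for idx, expert in enumerate(self.experts):
                mask = picked[:, slot] == idx                     # tokens routed to this expert
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Toy usage: route 8 token embeddings through the layer.
layer = TopKMoELayer(hidden_size=64, num_experts=4, top_k=2)
print(layer(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```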
## Current Status
The model has so far been trained on approximately 400 million tokens. While it shows promise, quality is not yet optimal due to the limited training data. Additionally, the code generation feature does not currently produce output in Markdown format.
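
Until Markdown output is implemented, generated code can be wrapped in a fenced block before it is displayed. The helper below is a hypothetical post-processing step, not part of the released code:

```python
def wrap_as_markdown(code: str, language: str = "gdscript") -> str:
    """Hypothetical helper: wrap raw model output in a fenced Markdown code block."""
    fence = "`" * 3  # built at runtime to avoid a literal fence inside this example
    return f"{fence}{language}\n{code.strip()}\n{fence}"

print(wrap_as_markdown('func _ready():\n\tprint("Hello from Godot")'))
```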
## Future Improvements
- Increase Training Tokens: Plan to increase the number of training tokens to improve model performance.
- Markdown Highlighting: Implement correct markdown highlighting for better readability.
- Increased Parameters: Aim to increase the number of model parameters for enhanced capabilities.
## Usage
To use Godala-moe, follow these steps:
- Install Dependencies: Ensure you have the necessary dependencies installed. You can install them using pip:

  ```bash
  pip install transformers torch
  ```
- Run the Model: Use the provided script to run the model. Here is an example script:

  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

  location = "Godala-moe"
  device = "cuda"  # Use "cpu" when not using GPU

  tokenizer = AutoTokenizer.from_pretrained(location)
  model = AutoModelForCausalLM.from_pretrained(location).to(device)

  messages = [{"role": "user", "content": "What can you tell me about Godot?"}]
  input_text = tokenizer.apply_chat_template(messages, tokenize=False)
  inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)

  outputs = model.generate(inputs, max_new_tokens=510, temperature=0.2, top_p=0.9, do_sample=True)
  print(tokenizer.decode(outputs[0]))
  ```
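
The script above prints the full token sequence, prompt included. A common variation is to let the chat template append the assistant turn and then decode only the newly generated tokens. The sketch below assumes the same `Godala-moe` path and the standard `transformers` API; it is not part of the repository's example code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

location = "Godala-moe"
device = "cpu"  # use "cuda" if a GPU is available

tokenizer = AutoTokenizer.from_pretrained(location)
model = AutoModelForCausalLM.from_pretrained(location).to(device)

messages = [{"role": "user", "content": "What can you tell me about Godot?"}]
# add_generation_prompt=True appends the assistant header so the model starts a fresh reply
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

outputs = model.generate(inputs, max_new_tokens=510, temperature=0.2, top_p=0.9, do_sample=True)
# slice off the prompt tokens so only the model's reply is decoded
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```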
## Download
You can download the Godala-moe model from this Link.
## License
This project is licensed under the Creative Commons Attribution 4.0 International License. The original model was licensed under the Apache 2.0 License.