The rapid advancement of open-source models like Llama-3.1 demonstrates the growing potential of smaller, fine-tuned models that can rival even the top closed-source alternatives. Lumino allows you to fine-tune Llama-3.1 8B on your proprietary datasets, creating powerful, accurate, and fully owned AI models.

In this guide, we'll walk through how to fine-tune a Llama-3.1 8B model using the Lumino SDK on a sample dataset, transforming a base model into a fine-tuned, high-performance model for your specific needs. Our example covers everything from dataset preparation to running inference with the fine-tuned model, showing how a small base model can be elevated to compete with much larger ones on your task.
To begin, let’s prepare a dataset for fine-tuning. In this tutorial, we’ll use a simple trivia-based dataset. If you’re following along, feel free to create or use any dataset that fits your use case.
[ { "instruction": "What is the capital of France?", "output": "The capital of France is Paris." }, { "instruction": "Who wrote the novel '1984'?", "output": "The novel '1984' was written by George Orwell." }, { "instruction": "What is the speed of light?", "output": "The speed of light is approximately 299,792 kilometers per second." }]
To use this dataset with the Lumino SDK, we need to convert it to a .jsonl format, where each line represents a structured conversation between a system, user, and assistant. Here’s a Python script that formats the dataset into the correct structure for fine-tuning:
import json

dataset = [
    {
        "instruction": "What is the capital of France?",
        "output": "The capital of France is Paris."
    },
    {
        "instruction": "Who wrote the novel '1984'?",
        "output": "The novel '1984' was written by George Orwell."
    },
    {
        "instruction": "What is the speed of light?",
        "output": "The speed of light is approximately 299,792 kilometers per second."
    }
]

# Format each example as a system/user/assistant conversation
formatted_data = []
for example in dataset:
    formatted_data.append({
        "messages": [
            {"role": "system", "content": "You are a helpful assistant that provides accurate answers to general knowledge questions."},
            {"role": "user", "content": example["instruction"]},
            {"role": "assistant", "content": example["output"]}
        ]
    })

# Write one JSON object per line to a .jsonl file
with open("formatted_trivia.jsonl", "w", encoding="utf-8") as f:
    for item in formatted_data:
        f.write(json.dumps(item) + "\n")

print("Dataset has been formatted and saved to 'formatted_trivia.jsonl'.")
Result: formatted_trivia.jsonl
{"messages": [{"role": "system", "content": "You are a helpful assistant that provides accurate answers to general knowledge questions."}, {"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "The capital of France is Paris."}]}{"messages": [{"role": "system", "content": "You are a helpful assistant that provides accurate answers to general knowledge questions."}, {"role": "user", "content": "Who wrote the novel '1984'?"}, {"role": "assistant", "content": "The novel '1984' was written by George Orwell."}]}{"messages": [{"role": "system", "content": "You are a helpful assistant that provides accurate answers to general knowledge questions."}, {"role": "user", "content": "What is the speed of light?"}, {"role": "assistant", "content": "The speed of light is approximately 299,792⬤
The next step is to upload the formatted dataset with the Lumino SDK and confirm the upload succeeded, as in the sketch below. Make sure to replace "formatted_trivia.jsonl" with the path to your own dataset file.
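Here is a minimal upload sketch. Note that the method and argument names used here (client.dataset.upload_dataset, the name and description fields) are assumptions based on the SDK's overall structure, not confirmed signatures; check the Lumino documentation for the current dataset-upload API.

import os
import asyncio

from lumino.api_sdk.sdk import LuminoSDK

async def main():
    async with LuminoSDK(os.environ.get("LUMSDK_API_KEY")) as client:
        # NOTE: the method name and arguments below are assumptions;
        # consult the Lumino docs for the exact dataset-upload call.
        dataset = await client.dataset.upload_dataset(
            "formatted_trivia.jsonl",
            name="trivia-dataset",
            description="Trivia Q&A pairs for fine-tuning",
        )
        print("Dataset uploaded:", dataset)

asyncio.run(main())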
The fine-tuning parameters are flexible, allowing you to adjust aspects like batch size, learning rate, and the number of epochs. In this example, we'll use 5 epochs and a small batch size to keep things lightweight, as in the sketch below.
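A minimal job-creation sketch follows. The call name (client.fine_tuning.create_fine_tuning_job), the base-model identifier, and the parameter keys are assumptions about the SDK's interface; refer to the Lumino docs for the exact schema, and treat the hyperparameter values as illustrative only.

import os
import asyncio

from lumino.api_sdk.sdk import LuminoSDK

async def main():
    async with LuminoSDK(os.environ.get("LUMSDK_API_KEY")) as client:
        # NOTE: the method name, base-model identifier, and parameter keys
        # below are assumptions; check the Lumino docs for the exact schema.
        job = await client.fine_tuning.create_fine_tuning_job(
            base_model_name="llama-3.1-8b",  # assumed model identifier
            dataset_name="trivia-dataset",   # the name used at upload time
            parameters={
                "batch_size": 2,        # small batch to keep things lightweight
                "learning_rate": 2e-4,  # illustrative value only
                "num_epochs": 5,
            },
        )
        print("Started fine-tuning job:", job)

asyncio.run(main())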
While the job is running, you can monitor its status or adjust certain parameters by interacting with the Lumino API. Below is an example of retrieving the fine-tuning job’s status:
import os
import asyncio

from lumino.api_sdk.sdk import LuminoSDK

async def main():
    async with LuminoSDK(os.environ.get("LUMSDK_API_KEY")) as client:
        # List all fine-tuning jobs on the account, including their status
        jobs = await client.fine_tuning.list_fine_tuning_jobs()
        print("Fine-tuning jobs:", jobs)

asyncio.run(main())
Since the Lumino SDK currently does not support model evaluation, you can manually download the fine-tuned model and perform inference using libraries like Hugging Face's transformers. This lets you load the model locally and generate responses to input prompts.
Once your model has been fine-tuned using the Lumino SDK, download the model files (how depends on your hosting method) or save them locally if you're running the fine-tuning job on your own infrastructure.
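For example, if your fine-tuned weights end up in a Hugging Face Hub repository (one common hosting route, and an assumption here rather than something Lumino prescribes), downloading them might look like the sketch below; the repo ID is hypothetical.

from huggingface_hub import snapshot_download

# Hypothetical repo ID -- replace with wherever your fine-tuned weights are hosted
local_dir = snapshot_download(repo_id="your-org/llama-3.1-8b-trivia")
print("Model files downloaded to:", local_dir)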
Step 3: Load the Model and Perform Inference Using Hugging Face Transformers
Here’s a Python script to load the fine-tuned model and perform inference:
import sys

from transformers import AutoModelForCausalLM, AutoTokenizer

# Usage:
#   python predict.py "What is the capital of Japan?"

# Ask the model something
query = sys.argv[1]

# Load your fine-tuned model and tokenizer (replace with your model path or Hugging Face hub path)
model_path = "path_to_your_finetuned_model"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Generate predictions from the model
def generate_answer(prompt):
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=50,  # cap the new tokens generated beyond the prompt
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Print model output
print(generate_answer(query))
Running Inference
python predict.py "What is the capital of Japan?"
The model will return the fine-tuned response based on the prompt provided.
By using the Lumino SDK, you can quickly fine-tune open-source models like Llama-3.1 8B for specific tasks, creating faster, more accurate models at a fraction of the cost of closed-source alternatives. The flexibility of Lumino's fine-tuning API allows you to customize the entire process, from dataset management to model deployment.

For more details, check out our full documentation or explore the PyPI repo to get started.