The rapid advancements in open-source models like Llama-3.1 demonstrate the growing potential of smaller, fine-tuned models that can rival even the top closed-source alternatives. Lumino allows you to fine-tune Llama-3.1 8B on your proprietary datasets, creating powerful, accurate, and fully owned AI models.

In this guide, we’ll walk through how to fine-tune a Llama-3.1 8B model using the Lumino SDK on a sample dataset, turning a base model into a high-performance model tailored to your specific needs. Our example covers everything from dataset preparation to running inference on the result, showing how a fine-tuned base model can punch above its weight against larger models.

What You Will Learn

  • How to prepare and format your dataset for fine-tuning
  • Uploading datasets to Lumino using the SDK
  • Initiating fine-tuning jobs with customizable parameters
  • Running inference on the fine-tuned model to assess its performance

Prerequisites

  • Lumino SDK: Install using pip install lumino
  • API Key: Obtain it from the Lumino dashboard, and set it up as an environment variable:
export LUMINO_API_KEY=your_api_key
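
To confirm that the SDK is installed and the key is picked up correctly, here is a minimal sanity-check sketch that simply connects and lists your datasets (it uses the same calls shown later in this guide):

import os
import asyncio
from lumino.sdk import LuminoSDK

async def main():
    # Connect with the API key from the environment and list any existing datasets
    async with LuminoSDK(os.environ.get("LUMINO_API_KEY")) as client:
        datasets = await client.dataset.list_datasets()
        print(datasets)

asyncio.run(main())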

Dataset Preparation

To begin, let’s prepare a dataset for fine-tuning. In this tutorial, we’ll use a simple trivia-based dataset. If you’re following along, feel free to create or use any dataset that fits your use case.

Example Dataset - Original Format

[
  {
    "instruction": "What is the capital of France?",
    "output": "The capital of France is Paris."
  },
  {
    "instruction": "Who wrote the novel '1984'?",
    "output": "The novel '1984' was written by George Orwell."
  },
  {
    "instruction": "What is the speed of light?",
    "output": "The speed of light is approximately 299,792 kilometers per second."
  }
]

Formatting the Dataset to JSONL

To use this dataset with the Lumino SDK, we need to convert it to a .jsonl format, where each line represents a structured conversation between a system, user, and assistant. Here’s a Python script that formats the dataset into the correct structure for fine-tuning:

import json

dataset = [
    {
        "instruction": "What is the capital of France?",
        "output": "The capital of France is Paris."
    },
    {
        "instruction": "Who wrote the novel '1984'?",
        "output": "The novel '1984' was written by George Orwell."
    },
    {
        "instruction": "What is the speed of light?",
        "output": "The speed of light is approximately 299,792 kilometers per second."
    }
]

# Format the dataset into the required structure
formatted_data = []
for example in dataset:
    formatted_data.append({
        "messages": [
            {"role": "system", "content": "You are a helpful assistant that provides accurate answers to general knowledge questions."},
            {"role": "user", "content": example["instruction"]},
            {"role": "assistant", "content": example["output"]}
        ]
    })

# Write to a .jsonl file
with open("formatted-encyclopedia.jsonl", "w", encoding="utf-8") as f:
    for item in formatted_data:
        f.write(json.dumps(item) + "\n")

print("Dataset has been formatted and saved to 'formatted-encyclopedia.jsonl'.")

Result: formatted-encyclopedia.jsonl

{"messages": [{"role": "system", "content": "You are a helpful assistant that provides accurate answers to general knowledge questions."}, {"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "The capital of France is Paris."}]}
{"messages": [{"role": "system", "content": "You are a helpful assistant that provides accurate answers to general knowledge questions."}, {"role": "user", "content": "Who wrote the novel '1984'?"}, {"role": "assistant", "content": "The novel '1984' was written by George Orwell."}]}
{"messages": [{"role": "system", "content": "You are a helpful assistant that provides accurate answers to general knowledge questions."}, {"role": "user", "content": "What is the speed of light?"}, {"role": "assistant", "content": "The speed of light is approximately 299,792​⬤

Uploading the Dataset to Lumino

Method 1: Uploading via the Lumino SDK

Once the dataset is prepared, it’s time to upload it to Lumino using the SDK. Here’s how to do that:

import os
import asyncio
from lumino.sdk import LuminoSDK
from lumino.models import (
    DatasetCreate,
)


async def main():
    async with LuminoSDK(os.environ.get("LUMINO_API_KEY")) as client:
        # Upload the formatted dataset to Lumino
        await client.dataset.upload_dataset(
            "formatted-encyclopedia.jsonl",
            DatasetCreate(name="trivia-dataset", description="trivia dataset")
        )

        # List datasets to confirm the upload succeeded
        files = await client.dataset.list_datasets()
        print(files)

asyncio.run(main())

This script uses the Lumino SDK to upload the dataset and then lists your datasets to confirm the upload succeeded. Make sure to replace “formatted-encyclopedia.jsonl” with the path to your own dataset file if it differs.

Method 2: Uploading via the Lumino Web App UI

Alternatively, you can upload your dataset through the Lumino Web App, which provides an intuitive UI for managing datasets.

  1. Log in to the Lumino Web App: Go to the Lumino Web App and log in with your credentials.
  2. Navigate to the Datasets Page: Once logged in, head to the “Datasets” section on the dashboard.
  3. Upload the Dataset:
  • Click the Upload Dataset button.
  • Select your formatted .jsonl file (e.g., formatted-encyclopedia.jsonl) from your local machine.
  4. Confirm the Upload:
  • After uploading, you’ll see the dataset listed in the Datasets section. You can now use this dataset for fine-tuning jobs.

Fine-tuning

After uploading the dataset, we can start a fine-tuning job.

Create a Fine-Tuning Job

The fine-tuning parameters are flexible, allowing you to adjust aspects like batch size, learning rate, and the number of epochs. In this example, we’ll use a single epoch and a small batch size to keep things lightweight.

import os
import asyncio
from lumino.sdk import LuminoSDK
from lumino.models import (
    FineTuningJobCreate, FineTuningJobParameters
)

async def main():
    async with LuminoSDK(os.environ.get("LUMINO_API_KEY")) as client:
        # Start a LoRA fine-tuning job on the dataset uploaded earlier
        job = await client.fine_tuning.create_fine_tuning_job(FineTuningJobCreate(
            base_model_name="llm_llama3_1_8b",
            dataset_name="trivia-dataset",
            name="industrial-finetune",
            type="LORA",
            parameters=FineTuningJobParameters(
                batch_size=2,
                shuffle=True,
                num_epochs=1,
                lr=3e-4,  # a typical LoRA learning rate; adjust for your dataset
                seed=42
            )
        ))
        print(f"Fine-tuning job started: {job}")

asyncio.run(main())

Monitoring the Job

While the job is running, you can monitor its status by interacting with the Lumino API. Below is an example of listing your fine-tuning jobs and their current statuses:

import os
import asyncio
from lumino.sdk import LuminoSDK

async def main():
    async with LuminoSDK(os.environ.get("LUMINO_API_KEY")) as client:
        jobs = await client.fine_tuning.list_fine_tuning_jobs()
        print("Fine-tuning jobs:", jobs)

asyncio.run(main())
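
If you want to wait for the job to finish programmatically, here is a minimal polling sketch built on the same list_fine_tuning_jobs call. Note that the fields and status values it checks (name, status, COMPLETED, FAILED) are assumptions about the shape of the API response, so adjust them to what the returned job objects actually contain:

import os
import asyncio
from lumino.sdk import LuminoSDK

async def wait_for_job(name: str, poll_seconds: int = 60):
    async with LuminoSDK(os.environ.get("LUMINO_API_KEY")) as client:
        while True:
            jobs = await client.fine_tuning.list_fine_tuning_jobs()
            # Find our job by name; "name" and "status" are assumed attribute names
            job = next((j for j in jobs if getattr(j, "name", None) == name), None)
            status = getattr(job, "status", "unknown")
            print("Current status:", status)
            if status in ("COMPLETED", "FAILED"):  # assumed terminal states
                return job
            await asyncio.sleep(poll_seconds)

asyncio.run(wait_for_job("industrial-finetune"))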

Running Inference on the Fine-tuned Model

Since the Lumino SDK does not currently support model evaluation, you can download the fine-tuned model and run inference manually using libraries like Hugging Face’s transformers. This lets you load the model and generate responses from input prompts.

Step 1: Export Your Fine-tuned Model

Once your model has been fine-tuned using the Lumino SDK, download the model files (the exact steps depend on your hosting method) or save them locally if you’re running the fine-tuning job on your own infrastructure.
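
Note that the job above uses type="LORA", so depending on how the artifacts are exported you may receive LoRA adapter weights rather than a fully merged checkpoint. If so, here is a sketch (using Hugging Face peft, with hypothetical local paths) for merging the adapter into the base model before inference; adjust the base model repo to the exact Llama-3.1 8B checkpoint you fine-tuned from:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Hypothetical paths: the base model and the downloaded LoRA adapter directory
base_model_path = "meta-llama/Llama-3.1-8B"
adapter_path = "path_to_your_downloaded_adapter"

# Load the base model, apply the adapter, and merge the weights into a standalone model
base_model = AutoModelForCausalLM.from_pretrained(base_model_path)
model = PeftModel.from_pretrained(base_model, adapter_path)
merged_model = model.merge_and_unload()

# Save the merged model and tokenizer so they can be loaded directly for inference
merged_model.save_pretrained("path_to_your_finetuned_model")
AutoTokenizer.from_pretrained(base_model_path).save_pretrained("path_to_your_finetuned_model")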

Step 2: Prepare a Sample Prompt

Let’s say we want to ask the model a simple trivia question. For example, here is a prompt along with the answer we’d expect:

{"instruction": "What is the capital of Japan?", "output": "The capital of Japan is Tokyo."}

You can change this prompt based on your use case.

Step 3: Load the Model and Perform Inference using Hugging Face Transformers

Here’s a Python script to load the fine-tuned model and perform inference:

from transformers import AutoModelForCausalLM, AutoTokenizer
import sys

# Usage:
# python predict.py "What is the capital of Japan?"

# Ask the model something
query = sys.argv[1]

# Load your fine-tuned model and tokenizer (replace with your model path or Hugging Face hub path)
model_path = "path_to_your_finetuned_model"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Generate a prediction from the model
def generate_answer(prompt):
    inputs = tokenizer(prompt, return_tensors="pt")
    # Pass the attention mask along with input_ids and cap the number of new tokens generated
    outputs = model.generate(**inputs, max_new_tokens=50)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Print model output
print(generate_answer(query))

Running Inference

python predict.py "What is the capital of Japan?"

The model will return the fine-tuned response based on the prompt provided.
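
Because the training data wrapped each example in a system/user/assistant conversation, inference often works better when the prompt is formatted the same way. If the exported tokenizer ships with a chat template (apply_chat_template is the standard transformers API; whether your export includes a template is an assumption), here is a sketch of the same prediction using it:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path_to_your_finetuned_model"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Rebuild the same message structure used during fine-tuning
messages = [
    {"role": "system", "content": "You are a helpful assistant that provides accurate answers to general knowledge questions."},
    {"role": "user", "content": "What is the capital of Japan?"},
]

# Format the conversation the way the model saw it during training, then generate
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))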

Conclusion

By using the Lumino SDK, you can quickly fine-tune open-source models like Llama-3.1 8B for specific tasks, creating faster, more accurate models at a fraction of the cost of closed-source alternatives. The flexibility of Lumino’s fine-tuning API allows you to customize the entire process, from dataset management to model deployment.

For more details, check out our full documentation or explore the package on PyPI to get started.