Chat models based on LLMs or SLMs

Large language models show strong instruction-following abilities. One modality that has become very popular is chat, in which the instructions are structured as a sequence of turns of the form:

<USER>
<ASSISTANT>
<USER>
<ASSISTANT>
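In Python, such a sequence is usually represented as a list of dictionaries with role and content keys, the same format this notebook passes to the pipeline later. The minimal sketch below renders that list as text. The render_chat function and the <USER>/<ASSISTANT> tags are hypothetical illustrations: each real model defines its own chat template (in transformers, applied through tokenizer.apply_chat_template).

```python
def render_chat(messages: list[dict[str, str]]) -> str:
    """Render a user/assistant conversation with hypothetical <USER>/<ASSISTANT> tags."""
    tags = {"user": "<USER>", "assistant": "<ASSISTANT>"}
    lines = []
    for i, message in enumerate(messages):
        # Turns must alternate, starting with the user.
        expected = "user" if i % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError(f"turn {i} should be '{expected}', got '{message['role']}'")
        lines.append(f"{tags[expected]} {message['content']}")
    # Leave an open assistant tag so the model continues with its answer.
    if messages and messages[-1]["role"] == "user":
        lines.append(tags["assistant"])
    return "\n".join(lines)


conversation = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help you?"},
    {"role": "user", "content": "What is 2 + 2?"},
]
print(render_chat(conversation))
```

The trailing open <ASSISTANT> tag is what turns the conversation into a generation prompt: the model is asked to write the next assistant turn.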

In this example we will explore a text-generation model trained on datasets in this format, which gives it the ability to follow conversations.

Introduction

Large language models can solve classification problems when the task is phrased in natural language, for example by asking which category a text belongs to. Some language models are specifically trained to follow instructions, which makes them very useful for implementing zero-shot or few-shot learning.

In this example, we will see how to use a machine learning model trained this way to solve classification problems.
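As a minimal sketch of the idea, assuming a sentiment task with a hypothetical label set (positive, negative, neutral), a zero-shot prompt contains only the instruction and the text, while a few-shot prompt first shows a few solved examples as earlier user/assistant turns. The helpers build_messages and parse_label, the labels and the example tweets are all hypothetical; the resulting list has the same shape as the messages passed to the text-generation pipeline later in this notebook.

```python
LABELS = ["positive", "negative", "neutral"]  # hypothetical label set


def build_messages(text, examples=()):
    """Build a chat prompt for classification.

    With no examples this is zero-shot; each (text, label) pair in
    `examples` is added as a prior user/assistant turn (few-shot).
    """
    instruction = (
        "Classify the sentiment of the following tweet as one of: "
        + ", ".join(LABELS)
        + ". Answer with the label only.\n\nTweet: "
    )
    messages = []
    for example_text, label in examples:
        messages.append({"role": "user", "content": instruction + example_text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": instruction + text})
    return messages


def parse_label(generated):
    """Map the model's free-text answer to a known label, or None if no label matches."""
    answer = generated.strip().lower()
    for label in LABELS:
        if answer.startswith(label):
            return label
    return None


zero_shot = build_messages("I love this new phone!")
few_shot = build_messages(
    "The delivery took forever.",
    examples=[("Great service, thanks!", "positive"), ("Worst purchase ever.", "negative")],
)
print(len(zero_shot), len(few_shot))  # 1 5
print(parse_label(" Positive."))      # positive
```

Parsing the answer defensively matters because a generative model may add punctuation, capitalization or extra words around the label.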

Running this notebook

To run this notebook, download the dataset and install the following libraries:

[1]:

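# Download the dataset used in this notebook (skipped if the file already exists)
!wget https://raw.githubusercontent.com/santiagxf/M72109/master/NLP/Datasets/mascorpus/tweets_marketing.csv \
    --quiet --no-clobber --directory-prefix ./Datasets/mascorpus/

!pip -q install "transformers[torch]" accelerate datasets evaluate
# FlashAttention is published on PyPI as flash-attn (imported as flash_attn); it requires a recent NVIDIA GPU
!pip -q install flash-attn --no-build-isolation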

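Before loading the model, it can help to confirm which packages are actually importable; the warning shown later when loading the model (No module named 'flash_attn') comes from exactly this kind of missing module. The sketch below uses only the standard library; the list of module names is an assumption about what this notebook needs. Note that import names can differ from pip package names.

```python
import importlib.util


def available_modules(names):
    """Return a dict mapping each top-level module name to whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Import names, which may differ from pip names (pip flash-attn -> import flash_attn).
status = available_modules(
    ["torch", "transformers", "accelerate", "datasets", "evaluate", "flash_attn"]
)
for name, ok in status.items():
    print(f"{name:12s} {'OK' if ok else 'missing'}")
```

find_spec only locates the module without importing it, so the check is cheap and does not trigger any GPU initialization.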

Checking the available hardware

[2]:

import torch

# Use the GPU if one is available; otherwise fall back to the CPU
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

print("This notebook is running on", device)

This notebook is running on cuda

Working with an SLM trained for chat

In this example, we will use Microsoft's Phi-3-mini model (microsoft/Phi-3-mini-128k-instruct), a small language model with about 3.8 billion parameters trained to follow instructions in chat format. To run this model in Google Colab we need a few optimizations, among them lowering the numerical precision to 16 bits (bfloat16).
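A back-of-the-envelope calculation shows why the precision matters: the memory needed for the weights is roughly the number of parameters times the bytes per parameter. This minimal sketch assumes about 3.8 billion parameters for the model loaded in the next cell (microsoft/Phi-3-mini-128k-instruct) and counts only the weights; activations, the KV cache and CUDA overhead come on top.

```python
# Bytes used to store one parameter in each numeric format.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params, dtype):
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


PHI3_MINI_PARAMS = 3.8e9  # approximate parameter count (assumption)
for dtype in ("float32", "bfloat16"):
    print(f"{dtype:9s} ~{weight_memory_gib(PHI3_MINI_PARAMS, dtype):.1f} GiB")
```

This prints about 14.2 GiB for float32 and 7.1 GiB for bfloat16: halving the bytes per parameter halves the weight footprint, which leaves room for activations on a typical Colab GPU.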

[6]:

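from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    device_map="cuda",           # load the weights directly onto the GPU
    torch_dtype=torch.bfloat16,  # 16-bit precision: half the memory of float32
    trust_remote_code=True,      # the model repository ships its own modeling code
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

# Wrap model and tokenizer in a text-generation pipeline that accepts chat messages
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)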

WARNING:transformers_modules.microsoft.Phi-3-mini-128k-instruct.8a362e755d2faf8cec2bf98850ce2216023d178a.modeling_phi3:`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
WARNING:transformers_modules.microsoft.Phi-3-mini-128k-instruct.8a362e755d2faf8cec2bf98850ce2216023d178a.modeling_phi3:Current `flash-attenton` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

We can check how this model works:

[8]:

# Conversation history: alternating user/assistant turns, ending with the new user question
messages = [
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

generation_args = {
    "max_new_tokens": 500,      # upper bound on the length of the answer
    "return_full_text": False,  # return only the generated answer, not the prompt
    "temperature": 0.0,
    "do_sample": False,         # greedy decoding: deterministic output
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])

 To solve the equation 2x + 3 = 7, you need to isolate the variable x. Here are the steps:

1. Subtract 3 from both sides of the equation to get rid of the +3 on the left side. This gives you: 2x = 7 - 3, which simplifies to 2x = 4.

2. Now, divide both sides of the equation by 2 to solve for x. This gives you: x = 4 / 2, which simplifies to x = 2.

So, the solution to the equation 2x + 3 = 7 is x = 2.
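With temperature at 0 and sampling disabled, generation is greedy: at each step the most probable token is chosen, so the answer is deterministic for a given prompt. The library-free sketch below illustrates that choice, and the effect of the temperature when sampling, on a toy vocabulary. The tokens and logits are made up, and next_token is a hypothetical helper, not how transformers implements decoding internally.

```python
import math
import random


def softmax(logits, temperature):
    """Temperature-scaled softmax (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def next_token(vocab, logits, temperature, rng=random):
    """Pick the next token: greedy (argmax) when temperature == 0, otherwise sample."""
    if temperature == 0:
        return vocab[max(range(len(logits)), key=logits.__getitem__)]
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


vocab = ["x", "=", "2", "4"]
logits = [0.5, 1.0, 3.0, 2.0]  # toy scores for illustration
print(next_token(vocab, logits, temperature=0))  # always "2"
print(softmax(logits, 1.0))
print(softmax(logits, 0.5))  # lower temperature concentrates probability on "2"
```

Lower temperatures sharpen the distribution toward the greedy choice, while higher ones flatten it and make the output more varied; for tasks with a single correct answer, such as the equation above, greedy decoding is the usual choice.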