[GH-ISSUE #10102] llama3.2 struggles to parse email into EmailModel #68680

Closed
opened 2026-05-04 14:49:28 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @khteh on GitHub (Apr 3, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10102

I have the following prompts:

    _email_parser_prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                """
                You are an expert email parser.
                Parse the date of email, sender's name, sender's phone, sender's email, project id, site location, violation type, required changes, 
                compliance deadline, and maximum potential fine from the email. If any of the fields aren't present, don't populate them. 
                Try to cast dates into the dd-mm-YYYY format. Don't populate fields if they're not present in the email.

                Here's the email:
                {email}
                """,
            ),
            ("human", "{email}"),
            #("placeholder", "{email}"), #should be a list of base messages
        ]
    )

and the following EmailModel:

class EmailModel(BaseModel):
    date_str: str | None = Field(
        default=None,
        exclude=True,
        repr=False,
        description="The date of the email reformatted to match mm-dd-YYYY. This is usually found in the Date: field in the email.",
    )
    name: str | None = Field(
        default=None,
        description="The name of the email sender. This is usually found in the From: field in the email formatted as name <email>",
    )
    phone: str | None = Field(
        default=None,
        description="The phone number of the email sender (if present in the message). This is usually found in the signature at the end of the email body.",
    )
    email: str | None = Field(
        default=None,
        description="The email addreess of the email sender (if present in the message). This is usually found in the From: field in the email formatted as name <email>",
    )
    project_id: int | None = Field(
        default=None,
        description="The project ID (if present in the message) - must be an integer",
    )
    site_location: str | None = Field(
        default=None,
        description="The site location of the project (if present in the message). Use the full address if possible.",
    )
    violation_type: str | None = Field(
        default=None,
        description="The type of violation (if present in the message)",
    )
    required_changes: str | None = Field(
        default=None,
        description="The required changes specified by the email (if present in the message)",
    )
    compliance_deadline_str: str | None = Field(
        default=None,
        exclude=True,
        repr=False,
        description="The date that the company must comply (if any) reformatted to match YYYY-mm-dd",
    )
    max_potential_fine: float | None = Field(
        default=None,
        description="The maximum potential fine (if any)",
    )

    @staticmethod
    def _convert_string_to_date(date_str: str | None) -> date | None:
        try:
            return datetime.strptime(date_str, '%a, %d %b %Y %H:%M:%S %z') if date_str else None
        except Exception as e:
            print(e)
            return None

    @computed_field
    @property
    def date_of_email(self) -> date | None:
        return self._convert_string_to_date(self.date_str) if self.date_str else None

    @computed_field
    @property
    def compliance_deadline(self) -> date | None:
        return self._convert_string_to_date(self.compliance_deadline_str) if self.compliance_deadline_str else None

And the following chain:

self._chainLLM = init_chat_model("llama3.2", model_provider="ollama", temperature=0)
        self._email_parser_chain = (
            self._email_parser_prompt
            | self._chainLLM.with_structured_output(EmailModel)
        )

It fails to parse the email into the required attributes of EmailModel:

[llama3.2](extract: name=None phone=None email='admin@osha.com' project_id=None site_location='123 Main Street, Dallas, TX' violation_type=None required_changes=None max_potential_fine=25000.0 date_of_email=datetime.datetime(2025, 4, 2, 15, 39, 59, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=61200))) compliance_deadline=None)
@pytest.mark.asyncio(loop_scope="function")
async def test_email_parser_chain(EmailRAGFixture):
    state = {
         "email": EMAILS[0],
         "extract": None,
         "escalation_text_criteria": "\"There's a risk of fire or water damage at the site\"",
         "escalation_dollar_criteria": 100_000,
         "escalate": False,
         "escalation_emails": ["brog@abc.com", "bigceo@company.com"],
    }
    config = RunnableConfig(run_name="Email RAG Test", thread_id=datetime.now())
    result: EmailRAGState = await EmailRAGFixture.ParseEmail(state, config)
    print(f"extract: {result['extract']}")
    assert result
    assert result["extract"]
    assert result["extract"].date_str
    assert result["extract"].name == 'Occupational Safety and Health Administration (OSHA)'
    assert result["extract"].phone
    assert result["extract"].phone == "(555) 123-4567"
    assert result["extract"].email
    assert result["extract"].email == "compliance.osha@osha.gov"
    assert result["extract"].project_id
    assert result["extract"].project_id == 111232345
    assert result["extract"].site_location
    assert result["extract"].site_location == "123 Main Street, Dallas, TX"
    assert result["extract"].escalate
    assert result["extract"].violation_type
    assert result["extract"].required_changes
    assert result["extract"].compliance_deadline
    assert result["extract"].max_potential_fine
    assert result["extract"].max_potential_fine == 25000.0

Sample email:

    """
    Date: Wed, 02 Apr 2025 15:39:59 -0700
    From: Occupational Safety and Health Administration (OSHA) <admin@osha.com>
    Reply-To: "Occupational Safety and Health Administration (OSHA)" <admin@reply.osha.com>
    To: "Blue Ridge Construction" <admin@blueridge.com>
    Cc: Donald Duck <donald@duck.com>, Comment <comment@noreply.osha.com>
    Message-ID: <blue-ridge-construction/fb4a803e-1035-11f0-90fb-93770151fc6c@osha.com>
    In-Reply-To: <blue-ridge-construction/fb4a803e-1035-11f0-90fb-93770151fc6c@osha.com>
    References: <blue-ridge-construction/fb4a803e-1035-11f0-90fb-93770151fc6c@osha.com>
    Subject: Re: Project 111232345 - Downtown Office Complex Location: Dallas, TX

    During a recent inspection of your construction site at 123 Main
    Street, the following safety violations were identified:

    Lack of fall protection: Workers on scaffolding above 10 feet
    were without required harnesses or other fall protection
    equipment. 
    
    Unsafe scaffolding setup: Several scaffolding structures were noted as
    lacking secure base plates and bracing, creating potential
    collapse risks.
    
    Inadequate personal protective equipment (PPE): Multiple workers were
    found without proper PPE, including hard hats and safety glasses.

    Required Corrective Actions:
    Install guardrails and fall arrest systems on all scaffolding
    over 10 feet. Conduct an inspection of all scaffolding structures and reinforce unstable sections. 
    Ensure all workers on-site are provided with necessary PPE and conduct safety training on proper
    usage.

    Deadline for Compliance: All violations must be rectified by November 10, 2025. Failure to comply may result in fines
    of up to $25,000 per violation.

    Contact: For questions or to confirm compliance, please reach out to the OSHA regional office at (555) 123-4567 or email compliance.osha@osha.gov.
    """,

I have tried many other prompts including a single-shot prompt but to no avail:

"""
You are an expert email extractor. Extract the following email headers from the text below:
Date: 
From: 
Reply-To: 
To: 
Cc: 
Message-ID: 
In-Reply-To: 
References: 
Subject: 

Extract date from the Date field, name and email from the From field, project id from the Subject or email body text, 
phone, site location, violation type, required changes, compliance deadline, and maximum potential fine from the email body text.
If any of the fields aren't present, don't populate them. Try to cast dates into the dd-mm-YYYY format. 
Don't populate fields if they're not present in the email.

Here's the email:
{email}
""",
"""
You are an expert email extractor.
Extract date from the Date: field, name and email from the From: field, project id from the Subject: or email body text, 
phone, site location, violation type, required changes, compliance deadline, and maximum potential fine from the email body text.
If any of the fields aren't present, don't populate them. Try to cast dates into the dd-mm-YYYY format. 
Don't populate fields if they're not present in the email.

Here's the email:
{email}
""",
"""
You are an expert email parser.
Parse the date of notice, sending entity name, sending entity
phone, sending entity email, project id, site location,
violation type, required changes, compliance deadline, and
maximum potential fine from the message. If any of the fields
aren't present, don't populate them. Try to cast dates into
the YYYY-mm-dd format. Don't populate fields if they're not
present in the message.

Here's the notice message:

{message}
""",                

"""
You are an expert email parser.
Parse date from the Date: field, name and email from the From: field, project id from the Subject: or email body text,
phone, site location, violation type, required changes, compliance deadline, and maximum potential fine from the email body text.
If any of the fields aren't present, don't populate them. Try to cast dates into the dd-mm-YYYY format. 
Don't populate fields if they're not present in the email.

email:
Date: Wed, 02 Apr 2025 15:39:59 -0700
From: Occupational Safety and Health Administration (OSHA) <admin@osha.com>
Reply-To: "Occupational Safety and Health Administration (OSHA)" <admin@reply.osha.com>
To: "Blue Ridge Construction" <admin@blueridge.com>
Cc: Donald Duck <donald@duck.com>, Comment <comment@noreply.osha.com>
Message-ID: <blue-ridge-construction/fb4a803e-1035-11f0-90fb-93770151fc6c@osha.com>
In-Reply-To: <blue-ridge-construction/fb4a803e-1035-11f0-90fb-93770151fc6c@osha.com>
References: <blue-ridge-construction/fb4a803e-1035-11f0-90fb-93770151fc6c@osha.com>
Subject: Re: Project 111232345 - Downtown Office Complex Location: Dallas, TX

During a recent inspection of your construction site at 123 Main
Street, the following safety violations were identified:

Lack of fall protection: Workers on scaffolding above 10 feet
were without required harnesses or other fall protection
equipment. 

Unsafe scaffolding setup: Several scaffolding structures were noted as
lacking secure base plates and bracing, creating potential
collapse risks.

Inadequate personal protective equipment (PPE): Multiple workers were
found without proper PPE, including hard hats and safety glasses.

Required Corrective Actions:
Install guardrails and fall arrest systems on all scaffolding
over 10 feet. Conduct an inspection of all scaffolding structures and reinforce unstable sections. 
Ensure all workers on-site are provided with necessary PPE and conduct safety training on proper
usage.

Deadline for Compliance: All violations must be rectified by November 10, 2025. Failure to comply may result in fines
of up to $25,000 per violation.

Contact: For questions or to confirm compliance, please reach out to the OSHA regional office at (555) 123-4567 or email compliance.osha@osha.gov.

date: 02-04-2025
name: Occupational Safety and Health Administration (OSHA)
email: admin@osha.com
phone: (555) 123-4567
project_id: 111232345
site_location: 123 Main Street, Dallas, TX
violation_type: Lack of fall protection, Unsafe scaffolding setup, Inadequate personal protective equipment (PPE)
required_changes: 'Install guardrails and fall arrest systems on all
                    scaffolding over 10 feet. Conduct an inspection of all scaffolding
                    structures and reinforce unstable sections. Ensure all workers
                    on-site are provided with necessary PPE and conduct safety training
                    on proper usage.'
max_potential_fine: 25000.0,
compliance_deadline: 10-11-2025

Here's the email:
{email}
""",
Originally created by @khteh on GitHub (Apr 3, 2025). Original GitHub issue: https://github.com/ollama/ollama/issues/10102 I have the following prompts: ``` _email_parser_prompt = ChatPromptTemplate.from_messages( [ ( "system", """ You are an expert email parser. Parse the date of email, sender's name, sender's phone, sender's email, project id, site location, violation type, required changes, compliance deadline, and maximum potential fine from the email. If any of the fields aren't present, don't populate them. Try to cast dates into the dd-mm-YYYY format. Don't populate fields if they're not present in the email. Here's the email: {email} """, ), ("human", "{email}"), #("placeholder", "{email}"), #should be a list of base messages ] ) ``` and the following `EmailModel`: ``` class EmailModel(BaseModel): date_str: str | None = Field( default=None, exclude=True, repr=False, description="The date of the email reformatted to match mm-dd-YYYY. This is usually found in the Date: field in the email.", ) name: str | None = Field( default=None, description="The name of the email sender. This is usually found in the From: field in the email formatted as name <email>", ) phone: str | None = Field( default=None, description="The phone number of the email sender (if present in the message). This is usually found in the signature at the end of the email body.", ) email: str | None = Field( default=None, description="The email addreess of the email sender (if present in the message). This is usually found in the From: field in the email formatted as name <email>", ) project_id: int | None = Field( default=None, description="The project ID (if present in the message) - must be an integer", ) site_location: str | None = Field( default=None, description="The site location of the project (if present in the message). Use the full address if possible.", ) violation_type: str | None = Field( default=None, description="The type of violation (if present in the message)", ) required_changes: str | None = Field( default=None, description="The required changes specified by the email (if present in the message)", ) compliance_deadline_str: str | None = Field( default=None, exclude=True, repr=False, description="The date that the company must comply (if any) reformatted to match YYYY-mm-dd", ) max_potential_fine: float | None = Field( default=None, description="The maximum potential fine (if any)", ) @staticmethod def _convert_string_to_date(date_str: str | None) -> date | None: try: return datetime.strptime(date_str, '%a, %d %b %Y %H:%M:%S %z') if date_str else None except Exception as e: print(e) return None @computed_field @property def date_of_email(self) -> date | None: return self._convert_string_to_date(self.date_str) if self.date_str else None @computed_field @property def compliance_deadline(self) -> date | None: return self._convert_string_to_date(self.compliance_deadline_str) if self.compliance_deadline_str else None ``` And the following chain: ``` self._chainLLM = init_chat_model("llama3.2", model_provider="ollama", temperature=0) self._email_parser_chain = ( self._email_parser_prompt | self._chainLLM.with_structured_output(EmailModel) ) ``` It fails to parse the email into the required attributes of `EmailModel`: ``` [llama3.2](extract: name=None phone=None email='admin@osha.com' project_id=None site_location='123 Main Street, Dallas, TX' violation_type=None required_changes=None max_potential_fine=25000.0 date_of_email=datetime.datetime(2025, 4, 2, 15, 39, 59, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=61200))) compliance_deadline=None) ``` ``` @pytest.mark.asyncio(loop_scope="function") async def test_email_parser_chain(EmailRAGFixture): state = { "email": EMAILS[0], "extract": None, "escalation_text_criteria": "\"There's a risk of fire or water damage at the site\"", "escalation_dollar_criteria": 100_000, "escalate": False, "escalation_emails": ["brog@abc.com", "bigceo@company.com"], } config = RunnableConfig(run_name="Email RAG Test", thread_id=datetime.now()) result: EmailRAGState = await EmailRAGFixture.ParseEmail(state, config) print(f"extract: {result['extract']}") assert result assert result["extract"] assert result["extract"].date_str assert result["extract"].name == 'Occupational Safety and Health Administration (OSHA)' assert result["extract"].phone assert result["extract"].phone == "(555) 123-4567" assert result["extract"].email assert result["extract"].email == "compliance.osha@osha.gov" assert result["extract"].project_id assert result["extract"].project_id == 111232345 assert result["extract"].site_location assert result["extract"].site_location == "123 Main Street, Dallas, TX" assert result["extract"].escalate assert result["extract"].violation_type assert result["extract"].required_changes assert result["extract"].compliance_deadline assert result["extract"].max_potential_fine assert result["extract"].max_potential_fine == 25000.0 ``` Sample email: ``` """ Date: Wed, 02 Apr 2025 15:39:59 -0700 From: Occupational Safety and Health Administration (OSHA) <admin@osha.com> Reply-To: "Occupational Safety and Health Administration (OSHA)" <admin@reply.osha.com> To: "Blue Ridge Construction" <admin@blueridge.com> Cc: Donald Duck <donald@duck.com>, Comment <comment@noreply.osha.com> Message-ID: <blue-ridge-construction/fb4a803e-1035-11f0-90fb-93770151fc6c@osha.com> In-Reply-To: <blue-ridge-construction/fb4a803e-1035-11f0-90fb-93770151fc6c@osha.com> References: <blue-ridge-construction/fb4a803e-1035-11f0-90fb-93770151fc6c@osha.com> Subject: Re: Project 111232345 - Downtown Office Complex Location: Dallas, TX During a recent inspection of your construction site at 123 Main Street, the following safety violations were identified: Lack of fall protection: Workers on scaffolding above 10 feet were without required harnesses or other fall protection equipment. Unsafe scaffolding setup: Several scaffolding structures were noted as lacking secure base plates and bracing, creating potential collapse risks. Inadequate personal protective equipment (PPE): Multiple workers were found without proper PPE, including hard hats and safety glasses. Required Corrective Actions: Install guardrails and fall arrest systems on all scaffolding over 10 feet. Conduct an inspection of all scaffolding structures and reinforce unstable sections. Ensure all workers on-site are provided with necessary PPE and conduct safety training on proper usage. Deadline for Compliance: All violations must be rectified by November 10, 2025. Failure to comply may result in fines of up to $25,000 per violation. Contact: For questions or to confirm compliance, please reach out to the OSHA regional office at (555) 123-4567 or email compliance.osha@osha.gov. """, ``` I have tried many other prompts including a single-shot prompt but to no avail: ``` """ You are an expert email extractor. Extract the following email headers from the text below: Date: From: Reply-To: To: Cc: Message-ID: In-Reply-To: References: Subject: Extract date from the Date field, name and email from the From field, project id from the Subject or email body text, phone, site location, violation type, required changes, compliance deadline, and maximum potential fine from the email body text. If any of the fields aren't present, don't populate them. Try to cast dates into the dd-mm-YYYY format. Don't populate fields if they're not present in the email. Here's the email: {email} """, """ You are an expert email extractor. Extract date from the Date: field, name and email from the From: field, project id from the Subject: or email body text, phone, site location, violation type, required changes, compliance deadline, and maximum potential fine from the email body text. If any of the fields aren't present, don't populate them. Try to cast dates into the dd-mm-YYYY format. Don't populate fields if they're not present in the email. Here's the email: {email} """, """ You are an expert email parser. Parse the date of notice, sending entity name, sending entity phone, sending entity email, project id, site location, violation type, required changes, compliance deadline, and maximum potential fine from the message. If any of the fields aren't present, don't populate them. Try to cast dates into the YYYY-mm-dd format. Don't populate fields if they're not present in the message. Here's the notice message: {message} """, """ You are an expert email parser. Parse date from the Date: field, name and email from the From: field, project id from the Subject: or email body text, phone, site location, violation type, required changes, compliance deadline, and maximum potential fine from the email body text. If any of the fields aren't present, don't populate them. Try to cast dates into the dd-mm-YYYY format. Don't populate fields if they're not present in the email. email: Date: Wed, 02 Apr 2025 15:39:59 -0700 From: Occupational Safety and Health Administration (OSHA) <admin@osha.com> Reply-To: "Occupational Safety and Health Administration (OSHA)" <admin@reply.osha.com> To: "Blue Ridge Construction" <admin@blueridge.com> Cc: Donald Duck <donald@duck.com>, Comment <comment@noreply.osha.com> Message-ID: <blue-ridge-construction/fb4a803e-1035-11f0-90fb-93770151fc6c@osha.com> In-Reply-To: <blue-ridge-construction/fb4a803e-1035-11f0-90fb-93770151fc6c@osha.com> References: <blue-ridge-construction/fb4a803e-1035-11f0-90fb-93770151fc6c@osha.com> Subject: Re: Project 111232345 - Downtown Office Complex Location: Dallas, TX During a recent inspection of your construction site at 123 Main Street, the following safety violations were identified: Lack of fall protection: Workers on scaffolding above 10 feet were without required harnesses or other fall protection equipment. Unsafe scaffolding setup: Several scaffolding structures were noted as lacking secure base plates and bracing, creating potential collapse risks. Inadequate personal protective equipment (PPE): Multiple workers were found without proper PPE, including hard hats and safety glasses. Required Corrective Actions: Install guardrails and fall arrest systems on all scaffolding over 10 feet. Conduct an inspection of all scaffolding structures and reinforce unstable sections. Ensure all workers on-site are provided with necessary PPE and conduct safety training on proper usage. Deadline for Compliance: All violations must be rectified by November 10, 2025. Failure to comply may result in fines of up to $25,000 per violation. Contact: For questions or to confirm compliance, please reach out to the OSHA regional office at (555) 123-4567 or email compliance.osha@osha.gov. date: 02-04-2025 name: Occupational Safety and Health Administration (OSHA) email: admin@osha.com phone: (555) 123-4567 project_id: 111232345 site_location: 123 Main Street, Dallas, TX violation_type: Lack of fall protection, Unsafe scaffolding setup, Inadequate personal protective equipment (PPE) required_changes: 'Install guardrails and fall arrest systems on all scaffolding over 10 feet. Conduct an inspection of all scaffolding structures and reinforce unstable sections. Ensure all workers on-site are provided with necessary PPE and conduct safety training on proper usage.' max_potential_fine: 25000.0, compliance_deadline: 10-11-2025 Here's the email: {email} """, ```
Author
Owner

@ParthSareen commented on GitHub (Apr 3, 2025):

Llama3.2:3b is a pretty small model. I'd try bumping up the context size first, but honestly a bigger model is probably your best bet when passing in quite a bit of data.

<!-- gh-comment-id:2777167059 --> @ParthSareen commented on GitHub (Apr 3, 2025): Llama3.2:3b is a pretty small model. I'd try bumping up the context size first, but honestly a bigger model is probably your best bet when passing in quite a bit of data.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/ollama#68680