This project is a proof-of-concept Legal Assistant chatbot specializing in Canadian Music Law, developed during my internship at AdventSys Technologies. The assistant was designed to provide concise, domain-specific legal reasoning and Q&A capabilities for issues involving SOCAN (Society of Composers, Authors and Music Publishers of Canada) and the Copyright Act of Canada.
The system uses Mistral-7B Instruct v0.3 as the base model, fine-tuned with a LoRA (Low-Rank Adaptation) adapter so that only a small set of added weights is trained. A curated dataset was constructed from SOCAN documentation and relevant sections of the Canadian Copyright Act and structured according to the CUAD (Contract Understanding Atticus Dataset) schema, so that its format matches an established legal NLP benchmark.
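To give a rough sense of why a LoRA adapter is the efficient choice, the sketch below compares the number of weights in one full projection matrix with the number a LoRA adapter would add to it. The 4096 x 4096 shape corresponds to the hidden size of Mistral-7B; the rank of 8 and the function name are assumptions made for illustration, since the adapter rank and target layers used in the project are not stated here.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full_matrix_params, lora_adapter_params) for one weight matrix.

    LoRA keeps the original d_out x d_in matrix frozen and learns two small
    matrices, B (d_out x rank) and A (rank x d_in), whose product is added
    to the frozen weights.
    """
    full = d_out * d_in
    adapter = rank * (d_out + d_in)
    return full, adapter


# Assumed values: a 4096 x 4096 projection and an illustrative rank of 8.
full, adapter = lora_param_counts(d_out=4096, d_in=4096, rank=8)
print(f"full matrix:  {full:,} weights")
print(f"LoRA adapter: {adapter:,} weights ({adapter / full:.2%} of the full matrix)")
```

Under these assumed values, the adapter trains well under one percent of the weights of the matrix it is attached to, which is what makes fine-tuning a 7B model practical on modest hardware.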
Key contributions include:
Designing and implementing a fine-tuning pipeline for domain adaptation.
Annotating and structuring legal datasets to support accurate contract-style reasoning.
Deploying a Gradio-based demo interface, allowing users to interact with the assistant in a Q&A format.
Establishing the foundation for future scaling into broader applications in Canadian IP and copyright law.
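As an illustration of the dataset structuring step, here is a minimal sketch of building one record in a CUAD-style layout. It assumes the SQuAD-like JSON structure in which CUAD is distributed (a context passage, a question, and answer spans located by character offset). The helper name, the identifier, and the passage are hypothetical placeholders, not text from SOCAN or the Copyright Act and not the project's actual data.

```python
import json


def make_record(title: str, context: str, question: str,
                answer_text: str, qa_id: str) -> dict:
    """Build one CUAD-style record with a single answer span."""
    # The answer must be a verbatim span of the context so that the
    # character offset points at real text.
    start = context.find(answer_text)
    if start == -1:
        raise ValueError("answer text must appear verbatim in the context")
    return {
        "title": title,
        "paragraphs": [{
            "context": context,
            "qas": [{
                "id": qa_id,
                "question": question,
                "answers": [{"text": answer_text, "answer_start": start}],
                "is_impossible": False,
            }],
        }],
    }


# Hypothetical placeholder passage, written only for this example.
context = "PLACEHOLDER CLAUSE: The licensee shall report performances every quarter."
record = make_record(
    title="placeholder_document",
    context=context,
    question="How often must the licensee report performances?",
    answer_text="every quarter",
    qa_id="placeholder_document__reporting_frequency",
)
print(json.dumps(record, indent=2))
```

Checking that each answer is a verbatim span at annotation time catches offset errors early, before they reach the fine-tuning pipeline.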
This project demonstrates applied skills in LLM development, dataset engineering, fine-tuning, and legal domain adaptation, bridging technical expertise with practical applications in the music industry.