While many countries are still debating how to regulate artificial intelligence, the European Union has taken a leading role by implementing a risk-based framework for AI oversight. The EU AI Act entered into force in August 2024, though many details, such as Codes of Practice, are still being worked out. Its tiered provisions will apply to AI developers over the coming months and years, making compliance an already pressing concern.
Assessing how AI models meet these legal requirements is the next major challenge, particularly for large language models (LLMs) and other general-purpose AI models, which are likely to form the core of most AI applications. Focusing on this layer of the AI ecosystem is essential for making meaningful progress in compliance evaluation.
LatticeFlow AI, a spin-off from ETH Zurich, is stepping into this space with a practical solution. On Wednesday, the company introduced what it describes as the first technical interpretation of the EU AI Act: it has translated the regulation's requirements into technical benchmarks and embedded them in an open-source evaluation framework called Compl-AI. The framework is one of the first attempts to link legal mandates to measurable technical outcomes.
LatticeFlow’s effort is the result of an ongoing collaboration between ETH Zurich and INSAIT, Bulgaria’s Institute for Computer Science, Artificial Intelligence and Technology. AI developers can use the Compl-AI website to request compliance evaluations of their models against the EU AI Act. LatticeFlow has also published assessments of several major language models, including Meta’s Llama and OpenAI’s GPT, on a compliance leaderboard comparing models from Google, OpenAI, Meta, Anthropic, and Mistral. Models are scored on a scale from 0 (non-compliant) to 1 (fully compliant), with some entries marked “N/A” where there is insufficient data to evaluate.
The framework evaluates models across 27 benchmarks, covering areas such as toxic content generation, prejudiced answers, harmful instructions, truthfulness, and common-sense reasoning. Models receive scores for each category, providing a detailed look at how they perform across a range of ethical and legal criteria.
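To make the scoring scheme concrete, here is a minimal Python sketch of how per-benchmark results could be rolled up into category scores and an overall 0-to-1 compliance score, with “N/A” for categories that lack sufficient data. This is an illustration only, not Compl-AI’s actual implementation: the category names, equal weighting, and averaging are all assumptions.

```python
# Illustrative sketch of aggregating per-benchmark results into a 0-to-1
# compliance score, with "N/A" where a category cannot be evaluated.
# NOT Compl-AI's actual code; benchmark names and equal weights are assumed.
from statistics import mean
from typing import Optional

def category_score(benchmark_scores: list[Optional[float]]) -> Optional[float]:
    """Average the available benchmark scores in a category; None means N/A."""
    available = [s for s in benchmark_scores if s is not None]
    return round(mean(available), 2) if available else None

def overall_score(categories: dict[str, Optional[float]]) -> Optional[float]:
    """Unweighted mean over the categories that could be evaluated."""
    scored = [s for s in categories.values() if s is not None]
    return round(mean(scored), 2) if scored else None

# Hypothetical results for one model; each category aggregates its own benchmarks.
results = {
    "harmful_instructions": category_score([0.97, 0.99]),
    "bias_and_fairness":    category_score([0.71, 0.68, 0.75]),
    "truthfulness":         category_score([0.62, 0.66]),
    "copyright":            category_score([None]),  # insufficient data -> N/A
}

for name, score in results.items():
    print(f"{name}: {score if score is not None else 'N/A'}")
print("overall:", overall_score(results))
```

In this toy version the overall number simply averages whatever categories could be scored; a real framework would need to decide how missing categories and per-category weights affect the headline figure.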
AI compliance results vary
So far, the results reveal significant variation in performance. All of the models tested do well at refusing harmful instructions and show relatively strong results in limiting prejudice. Performance on reasoning and general knowledge is less consistent, however, with some models underperforming in key areas. Recommendation consistency, used as a fairness metric, stands out as a weak point: no model scores particularly high in this category.
Certain aspects of compliance, such as copyright and privacy, are also difficult to evaluate, with many models showing gaps in these areas. For example, LatticeFlow notes that most benchmarks for copyright compliance only evaluate books, leaving out other materials and making it challenging to detect potential violations. Privacy issues are similarly tricky, as existing tests mostly assess whether models have memorized personal information rather than capturing the full range of privacy risks.
In their paper on the framework, the researchers highlighted that smaller models (with 13 billion parameters or fewer) tend to score poorly in areas of technical robustness and safety. They also found that regardless of size, most models face significant challenges in meeting standards for fairness, diversity, and non-discrimination—areas heavily emphasized in the EU AI Act. The researchers suggest that these weaknesses arise from AI developers focusing primarily on improving model capabilities, often overlooking the equally important aspects of ethical performance and compliance.
LatticeFlow anticipates that as the deadlines for compliance with the EU AI Act draw closer, developers will need to shift their focus to address these concerns. This may lead to a more balanced approach to AI development, where compliance with regulations becomes as important as optimizing performance.
The framework is still a work in progress, and LatticeFlow acknowledges that it represents just one interpretation of how the EU AI Act’s requirements can be translated into actionable technical outputs. However, it offers a valuable starting point for the industry to begin examining large AI models through the lens of legal and ethical standards.
LatticeFlow CEO Petar Tsankov emphasized that their framework is an early step toward a comprehensive compliance evaluation system. It’s designed to evolve as both the EU AI Act and the technology itself develop. Tsankov pointed out that most AI models have been optimized for capabilities rather than regulatory compliance, which has led to significant performance gaps in areas like fairness and cybersecurity resilience. For instance, while companies like Anthropic and OpenAI have made progress in securing their models against attacks such as jailbreaks and prompt injections, open-source vendors like Mistral have focused less on these aspects.
Regarding specific challenges, Tsankov mentioned that copyright compliance is currently limited by the benchmarks’ narrow scope, which mainly checks for violations involving copyrighted books. Similarly, privacy evaluations mainly focus on whether models have memorized personal data, without capturing the broader range of privacy concerns.
LatticeFlow hopes that its open-source framework will be adopted and further developed by the broader AI community. Professor Martin Vechev, from ETH Zurich and INSAIT, who is also involved in the project, encouraged AI researchers, developers, and policymakers to collaborate in expanding and improving the framework. The goal is to not only ensure compliance with the EU AI Act but also to prepare for future regulations that could impact AI development across different regions.
In summary, LatticeFlow’s framework marks an important step in the ongoing effort to align AI development with regulatory standards. While it’s still evolving, it offers a promising tool for AI developers and regulators alike to better understand how AI models perform in terms of legal and ethical requirements. As compliance deadlines approach, frameworks like these will likely become crucial for guiding the future of AI innovation.