Traceability in AI-Enhanced Code: A Developer’s Guide
The swift and widespread adoption of Generative AI has permeated business sectors across the globe. With transcription and content creation tools readily available, AI’s potential to reshape the future looks vast. From software tools that AI will render obsolete to new ways of coding, AI poses profound challenges for software development and the industry.
Today, the industry faces the challenge of solving a riddle: If a developer has taken a piece of code and modified it with AI, is it still the same code? One of the significant challenges facing software developers is how to use AI in their work without hampering creativity or overstepping the line regarding copyright or licensing laws.
To date, officials are stymied. The regulatory environment for AI is still evolving as policymakers and regulators work to address the potential ethical, security, and legal challenges posed by AI technologies. The U.S. Copyright Office has explored how AI-generated works intersect with copyright law. Still, there is not yet an established, comprehensive code or legal framework specifically governing how AI developers use copyrighted materials. In the UK, the Intellectual Property Office (IPO) confirmed recently that it has been unable to facilitate an agreement on a voluntary code of practice that would govern the use of copyright works by AI developers.
Balancing intellectual property rights with technological advancement as AI evolves remains a significant issue.
AI and OSS: A Perfect Match
Open-source software (OSS) provides fertile ground for training AI models because it lacks the restrictions associated with proprietary software. It gives AI access to many of the standard code bases that run infrastructure worldwide. At the same time, open source benefits from the acceleration and improvements AI generates, which further enhance open-source development capabilities.
Developers, too, massively benefit from AI because they can ask questions, get answers, and, right or wrong, use AI as a basis to create something to work with. This significant productivity gain is rapidly accelerating and refining coding. Developers can leverage AI to solve mundane tasks quickly, get inspiration, or source alternative examples of something they thought was a perfect solution.
Total Certainty and Transparency
However, it’s not all upside. The integration of AI into OSS has complicated licensing implications. The General Public License (GPL) is a series of widely used free software, or copyleft, licenses (there are others, too) that guarantee end users four freedoms: to run, study, share, and modify the software. Under these licenses, any modified version of the software needs to be released under the same license. If code is licensed under the GPL, any modification must also be GPL-licensed.
Therein lies the issue. Unless there is total transparency about the code an AI model has been trained on, it is impossible to be sure of the appropriate licensing requirements for its output, or how to license that output in the first place. Traceability is paramount if copyright infringement and other legal complications are to be avoided. Additionally, there is the ethical question: if a developer has modified a piece of code, is it still the same code?
Traceability
So the pressing issue is this: What practical steps can developers take to protect themselves against legal risk in the code they produce, and what role can the rest of the software community (OSS platforms, regulators, enterprises, and AI companies) play in helping them do that? OSS offers transparency to support integrity and confidence in traceability because everything is exposed and can be observed. A mistake or oversight can happen in proprietary software too, but because it is a closed system, the chances of anyone outside seeing, understanding, and repairing the error are practically zero. Developers working in OSS operate in full view of a community of millions. That community requires certainty about where third-party source code originated: was it written by a human or generated by AI?
Foundations
The Apache Software Foundation has a directive saying maintainers of its projects shouldn’t accept source code written by AI. AI can assist them, but the code they contribute is the developer’s responsibility. If it turns out that there is a problem, then it’s the developer’s issue to resolve. Many companies, including Aiven, have a similar protocol. Our guidelines state that developers can use only pre-approved, constrained Generative AI tools. Even then, developers are responsible for the outputs, which need to be scrutinized and analyzed, not simply taken as they are. This way, we can ensure that we comply with the highest standards. What guidelines and standards has your company established? If none exist, how can you help establish them?
Beyond this, organizations using OSS can also play a role by taking steps to manage their own risks. This includes establishing an internal AI Tactical Discovery team, created explicitly to focus on the challenges and opportunities presented by AI. In one case, our team led a project to critique OSS code bases, using tools like Software Composition Analysis to analyze an AI-generated codebase and compare it against known open-source repositories and vulnerability databases.
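One way such a guideline could be checked mechanically is to have developers declare AI assistance in the commit message and compare the declared tool against an approved list. The sketch below assumes a "Generated-by:" trailer line and an allowlist named APPROVED_TOOLS; both names, and the tool names used, are hypothetical and not taken from any specific policy described here.

```python
# Hypothetical allowlist of pre-approved Generative AI tools.
APPROVED_TOOLS = {"approved-assistant"}


def generated_by(commit_message: str) -> list[str]:
    """Return the tool names declared in 'Generated-by:' lines of a commit message."""
    tools = []
    for line in commit_message.splitlines():
        key, sep, value = line.partition(":")
        # Only lines of the form "Generated-by: <tool>" count as declarations.
        if sep and key.strip().lower() == "generated-by":
            tools.append(value.strip())
    return tools


def unapproved_tools(commit_message: str) -> list[str]:
    """Return declared tools that are not on the allowlist."""
    return [tool for tool in generated_by(commit_message) if tool not in APPROVED_TOOLS]
```

A check like this only records what the developer declares; it does not replace the review of the output itself.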
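To make the idea of comparing generated code against known repositories concrete, here is a minimal sketch of one possible technique: hashing short runs of normalized lines and measuring how many of those fingerprints also occur in known files. It is not how any particular Software Composition Analysis product works; the function names, the three-line window, and the example file path are assumptions made for illustration.

```python
import hashlib
import re


def normalize(line: str) -> str:
    """Drop a trailing '#' comment and all whitespace so formatting changes do not matter.

    Naive on purpose: a '#' inside a string literal is also treated as a comment.
    """
    line = line.split("#", 1)[0]
    return re.sub(r"\s+", "", line)


def fingerprints(source: str, window: int = 3) -> set[str]:
    """Hash every run of `window` consecutive non-empty normalized lines."""
    lines = [n for n in (normalize(raw) for raw in source.splitlines()) if n]
    return {
        hashlib.sha256("\n".join(lines[i:i + window]).encode()).hexdigest()
        for i in range(len(lines) - window + 1)
    }


def overlap(candidate: str, known: dict[str, str], window: int = 3) -> dict[str, float]:
    """For each known file, return the fraction of candidate fingerprints it shares."""
    cand = fingerprints(candidate, window)
    if not cand:
        return {}
    scores = {}
    for name, text in known.items():
        shared = cand & fingerprints(text, window)
        if shared:
            scores[name] = len(shared) / len(cand)
    return scores


# Hypothetical known file and an AI-produced snippet that differs only in formatting.
known_files = {"gpl_lib/util.py": "def add(a, b):\n    # sum\n    total = a + b\n    return total\n"}
generated = "def add(a,b):\n    total = a+b\n    return total   # result\n"
print(overlap(generated, known_files))
```

A high score is a prompt for human review of the matched file and its license, not proof of copying.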
Creating a Root of Trust in AI
Despite efforts today, creating new licensing and laws around AI’s role in software development will take time. Consensus is required regarding the specifics of AI’s role and the terminology used to describe it, and that consensus will only be reached through investigation, review, and discussion. The challenge is magnified by the speed of AI development and its application in code bases: AI is moving much more quickly than the efforts to put parameters in place to control it.
When assessing whether AI has provided copied OSS code as part of its output, factors such as proper attribution, license compatibility, and the availability of the corresponding open-source code and modifications must all be considered. It would also help if AI companies started adding traceability to the source code their tools generate. This would create a root of trust that has the potential to unlock significant benefits in software development.
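The three factors above can be sketched as a simple review over a provenance record. Everything here is an assumption made for illustration: the Provenance fields, the review function, and the tiny CAN_FLOW_INTO table, which covers only three licenses by their SPDX identifiers and is no substitute for legal review.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative, deliberately tiny table: license of the matched upstream code
# mapped to the project licenses it can be combined into.
CAN_FLOW_INTO = {
    "MIT": {"MIT", "Apache-2.0", "GPL-3.0-only"},
    "Apache-2.0": {"Apache-2.0", "GPL-3.0-only"},
    "GPL-3.0-only": {"GPL-3.0-only"},
}


@dataclass
class Provenance:
    """Hypothetical record attached to one AI-assisted snippet."""
    snippet_id: str
    generated_by: str                # name of the AI tool used
    matched_source: Optional[str]    # upstream repository the output resembles, if any
    matched_license: Optional[str]   # SPDX identifier of that upstream
    attribution_present: bool        # upstream authors and license are credited
    source_available: bool           # corresponding source and modifications are published


def review(record: Provenance, project_license: str) -> list[str]:
    """Return the list of problems found for one snippet; empty means none found."""
    if record.matched_source is None:
        return []  # no known upstream match, nothing to check in this sketch
    problems = []
    if not record.attribution_present:
        problems.append("missing attribution")
    allowed = CAN_FLOW_INTO.get(record.matched_license or "", set())
    if project_license not in allowed:
        problems.append("license incompatible or unknown")
    is_copyleft = (record.matched_license or "").startswith("GPL")
    if is_copyleft and not record.source_available:
        problems.append("corresponding source not available")
    return problems
```

If AI tools emitted a record like this alongside their output, the checks could run automatically at contribution time instead of relying on after-the-fact scanning.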