Go Big or Go Home: What GitHub Learned Building Copilot
Launched in June 2021, GitHub Copilot now has 1.8 million paying users, who have roughly 50% of their code written by Copilot. For better or worse, the AI-driven code completion tool is popular with its users.
But how did GitHub develop this cutting-edge AI-based app?
Earlier this week, Eddie Aftandilian, a principal researcher for GitHub’s R&D arm, GitHub Next, summarized the lessons he and his team learned building out the new product and its offshoots in a virtual talk hosted by the Association for Computing Machinery (you can enjoy the full talk here).
GitHub Next is an 18-person advanced research team within GitHub. The team created the original GitHub Copilot in conjunction with OpenAI and then went on to create Copilot Workspace and Copilot X, along with a lot of other projects that “didn’t succeed” along the way, Aftandilian said.
“So we’ve learned a lot from working on AI developer tools for the past four years,” Aftandilian said.
These days, every business is grappling with how to use generative AI, suspecting that if they don’t get on top of this new trend, their competitors will pass them by. Here are a few of the ideas Aftandilian advised keeping in mind:
Think Big
“Being a research team within a big company is a precarious job. You’re different from all the other teams,” Aftandilian said.
The goal is not to deliver incremental value to an existing product. Rather, it is to create something entirely new. This, of course, carries higher risk, and a lot of what a research team does will end in failure.
GitHub Next assumes research projects will have an 80% failure rate. If they are too successful, it means they aren’t being ambitious enough.
Oddly enough, building something new for an organization can actually be easier than adding an incremental feature to an existing product, he said. With an existing product, there is already a product team to negotiate with, so there are more politics involved. Plus, the chances are that the existing product engineering team would have eventually built an incremental feature anyway.
So you have to bet on something that will lead to “asymmetric value”: something that could return a large amount of value if it works, Aftandilian advised.
At the time, GitHub had no plans for any sort of code-completion tool anyway, so it was a green field for the research team.
The original team that built Copilot had only 10 people. “One of the great things about software is that a small team of great people can build something with huge impact,” Aftandilian said.
Deliver Results Early
When taking on a new project, it is tempting to set up a multi-year plan to get the full functionality needed. But if you have nothing to show management, they will start wondering what your team is doing.
“It’s okay to have a long-term vision and an overarching plan, but you need to drop off useful things regularly, or it’s going to look like you’re doing nothing,” Aftandilian said.
Rapidly delivering results can also mean pivoting quickly.
In 2022, the team built a vector database system aimed at providing natural language “neural retrieval” code search for machine learning apps. Although the results weren’t good enough for a stand-alone product, the team applied them to building a chatbot that would help Copilot mine private codebases for company-specific content; Copilot itself used only standard models built from public codebases. This feature later became the basis for GitHub Copilot for Docs.
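The talk doesn’t go into implementation details, but the general pattern behind this kind of “neural retrieval” is straightforward: embed each code snippet as a vector, embed the user’s natural-language query the same way, and return the nearest neighbors to ground the chatbot’s answer in private code. Here is a minimal, hypothetical sketch of that pattern; the toy `embed()` function is a stand-in for whatever neural embedding model a real system would use, and none of the names below come from GitHub’s actual implementation.

```python
# Minimal sketch of embedding-based ("neural") retrieval over a codebase.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy stand-in for a neural embedding model (e.g. a code-tuned encoder).
    A real system would call such a model here; this hash-based version
    just keeps the example self-contained and runnable."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def build_index(snippets: list[str]) -> np.ndarray:
    """Embed every snippet once; this matrix plays the role of the vector database."""
    return np.stack([embed(s) for s in snippets])

def retrieve(query: str, snippets: list[str], index: np.ndarray, k: int = 3) -> list[str]:
    """Return the k snippets whose embeddings are closest to the query's."""
    scores = index @ embed(query)          # cosine similarity (vectors are unit-norm)
    top = np.argsort(scores)[::-1][:k]
    return [snippets[i] for i in top]

# Example: fetch company-specific code to include in a chatbot prompt.
snippets = [
    "def rotate_api_key(user): ...",
    "class BillingClient: ...",
    "def parse_deploy_config(path): ...",
]
index = build_index(snippets)
print(retrieve("how do we rotate API keys?", snippets, index, k=1))
```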
The idea is to do the least amount of work needed early on to have something to show decision-makers. “A usable prototype is much more useful than a series of charts and graphs,” Aftandilian said. It also allows you to test decisions early on.
“As researchers, one of the mistakes we make is valuing difficulty for novelty, when really we should celebrate simple solutions to hard problems,” Aftandilian said.
This is doubly true in the era of the Large Language Model (LLM), he added.
“If you can solve a problem purely through prompting an off-the-shelf model, you should do that.”
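As a concrete, hypothetical illustration of that advice: a task like summarizing a diff for release notes needs no custom model or training pipeline at all, just a single prompt to a hosted model. The sketch below assumes the OpenAI Python client (v1+) with an API key in the environment; the model name and prompt are illustrative, not anything described in the talk.

```python
# Solving a small problem purely by prompting an off-the-shelf model.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize_diff(diff: str) -> str:
    """Ask a general-purpose model to summarize a code change in plain English."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": "You summarize code diffs for release notes."},
            {"role": "user", "content": f"Summarize this diff in two sentences:\n{diff}"},
        ],
    )
    return response.choices[0].message.content

print(summarize_diff("- retries = 3\n+ retries = 5  # tolerate flaky network"))
```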
Set Deadlines, but Pick Your Battles
“Setting deadlines focuses the team and forces you to make hard decisions, and great teams will rise to the challenge,” Aftandilian said. The deadlines must be tied to an external commitment. “People are smart and will see through a fake commitment,” he added.
Balance is the key. You don’t want to burn the team out with constant deadlines, so work in some slower periods, “where the team has time to explore new ideas.”
The original pilot of Copilot for Docs (a component of Copilot X) was a success, and GitHub wanted a working model ASAP. So, the research team agreed to an eight-week sprint to polish the software for production.
“It was fun to rally towards something big. And even though it was intense, the people who worked on Copilot for Docs will tell you that it’s the most fun that they had at Next,” Aftandilian boasted.
That said, you must have some discipline in picking projects and, equally important, in knowing when to drop them.
“It’s tempting to let stagnating projects continue in the hope that they’ll unstick themselves or to avoid conflict,” he said.
In fact, many projects aren’t truly stagnating, just moving forward very slowly.
Nonetheless, they incur what Aftandilian called an “opportunity cost,” where the team spends time maintaining projects while missing out on greater advances that come with working on other things.
“Time you spend maintaining things you’ve already built is time you’re not spending building new things,” Aftandilian said.
With the original Copilot, a few of the research engineers went “on loan” for about a year to help the new Copilot product team get started, but after things were running smoothly, they returned to the research arm.
Get the User Interface Right
Research labs at most companies have produced many projects that delivered on their promise but are still little used. Take code refactoring and static analysis tools, for instance. There are many of these for every IDE, but they tend to be ignored by coders.
The problem may very well be that they don’t yet have good interfaces to make them easily usable, Aftandilian argued.
“Usually, the people building the tools don’t have the UI expertise to get that right, but getting the UX right is the difference between success and failure,” he said.
This was a lesson Aftandilian learned with Copilot Workspaces, which, unlike the standard AI-driven autocomplete, helps the developer solve a single development task in depth. A user could extract a specification of some code from a repository, edit that specification, then apply the edit to the code.
While the workflow for this process remained the same, the user interface was totally revamped by a frontend developer, who made it consistent with GitHub’s design language and made it “look like a real product,” Aftandilian said.
Further revisions of the UX came about to match it more closely to how developers were actually using the tool. The latest, still experimental, version of the UI focuses on how developers can use the tool to brainstorm solutions with AI.
“We spend a lot of time and effort on the UX of Copilot Workspace, and we believe that having a well-designed and polished UX is critical to making it something our users love to use,” he said.