GitHub Copilot security concerns
GitHub Copilot and other AI pair-programming tools can generate code in a variety of programming languages. These ML- and NLP-based solutions became hugely popular in less than a year. Nowadays it’s not just an assistant; Copilot-like tools are used in many scenarios:
- Code autocompletion: AI-powered code editors can suggest code snippets and complete partially typed code
- Code generation: Tools like OpenAI’s GPT-3 can generate code based on natural language descriptions of the desired functionality
- Code optimization: AI-powered tools can suggest optimizations to improve the efficiency of the code
- Testing and debugging: AI-powered tools can assist with testing and debugging by automatically detecting and identifying errors in the code
This meme became the logo of AI-powered tools:
What kind of responsibility are we talking about? Let’s look at this topic from a security perspective.
I see the following risks and challenges associated with using AI-based tools:
1. Code exposure
When you start using GitHub Copilot, it gets access to your repositories. It works by analyzing the code being written and using that context to generate suggestions for completing it. These connected repositories give Copilot a large dataset and improve its accuracy. Moreover, there are not many options for fine-tuning: no data restrictions, no firewall for what gets sent from your repos. Even the .gitignore file won’t protect you. The Privacy Policy doesn’t provide any details, just general words like “we respect the privacy of user data… etc.”. There is no guarantee that your code will never be exposed in a GitHub / Microsoft data breach. As soon as you enable Copilot, your code is shared with Microsoft by default.
2. Secrets leakage
This point follows from the first one. If a developer prefers hardcoding secrets locally and uses Copilot at the same time, it may turn out like this:
Use environment variables instead. Another way to keep your secrets safe is a password manager; many of them offer an API these days. I wrote a dedicated note about it - Sensitive variables in code for local environment
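Here is a minimal sketch of the environment-variable approach in Python (PAYMENT_API_KEY is just a placeholder name for illustration):

```python
import os

# Read the secret from the environment instead of hardcoding it in the source.
api_key = os.environ.get("PAYMENT_API_KEY")
if api_key is None:
    raise RuntimeError(
        "PAYMENT_API_KEY is not set; export it in your shell "
        "or load it from your password manager before running the app"
    )

# The value never appears in the repository, so Copilot has nothing sensitive to pick up.
```

Locally you can export the variable in your shell profile, or fetch it from your password manager’s CLI/API when the session starts.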
3. Insecure code suggestions
There are concerns about the quality of Copilot’s code. “Exploitable, buggy code” is the phrase you will find in security communities engaged in this area. I found a surprisingly accessible piece of academic research on vulnerabilities and weaknesses in code suggestions: Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions
I highly recommend reading it if you plan to leverage AI-powered tools in your development and are worried about product quality. For the lazy, there is a presentation of this research from the IEEE symposium.
In a nutshell, the researchers conducted a security assessment of Copilot’s code suggestions based on the MITRE CWE Top 25. To automate the process they used a popular SAST tool, CodeQL. My only doubts about this research are:
- the scope: they took just three languages (Python, C, Verilog)
- the automated approach to the assessment: they didn’t sort out false-positive / false-negative triggers
Even so, I find this research genuinely telling about the security issues of automated code writing.
“Overall, Copilot’s response to our scenarios is mixed from a security standpoint, given the large number of generated vulnerabilities (across all axes and languages, 39.33% of the top and 40.73% of the total options were vulnerable)”
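To make the numbers concrete, here is the kind of weakness the CWE scenarios cover, shown as a hypothetical Python example (it is not taken from the paper): a string-formatted SQL query of the sort an assistant can easily suggest, next to the parameterized query you should write instead.

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # Assistant-style completion: the query is built by string formatting,
    # which is vulnerable to SQL injection (CWE-89 from the MITRE Top 25).
    return conn.execute(
        f"SELECT * FROM users WHERE name = '{username}'"
    ).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # Parameterized query: the driver treats crafted input such as
    # "alice' OR '1'='1" as plain data, not as SQL.
    return conn.execute(
        "SELECT * FROM users WHERE name = ?", (username,)
    ).fetchall()
```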
AI pair programmers can be useful tools for generating code in a variety of languages, but they should not be relied on as the sole source of code development. Human programmers still play a critical role in ensuring that the code is correct, efficient, and meets the specific needs of the project.
But we know how it happens in real life:
OK, so how do we mitigate these risks?
1. Don’t work with highly sensitive data in your IDE while Copilot is enabled. Even if you disable the sending of snippets and telemetry, it can still send your code to GitHub in the background; there is no “firewall” for such traffic. Disable the Copilot IDE extension as a precaution.
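As a quick check before opening a sensitive project, you can verify whether the extension is even installed. A small sketch using the VS Code `code` CLI (this assumes you use VS Code and the CLI is on your PATH; adapt the check for your editor):

```python
import subprocess

# List installed VS Code extensions and warn if Copilot is among them.
result = subprocess.run(
    ["code", "--list-extensions"],
    capture_output=True, text=True, check=True,
)
if "github.copilot" in result.stdout.lower():
    print("Copilot extension detected: disable or uninstall it before touching sensitive code")
```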
2. Never put any secrets in code when you have Copilot enabled. No one is 100% protected from data leakage: if Copilot ingests your secrets and then suffers a breach, they can be revealed to others.
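As an extra safety net, scan what you are about to commit for secret-looking strings. Dedicated tools such as gitleaks or detect-secrets do this properly; the sketch below is only a toy illustration of the idea, with deliberately simplified patterns:

```python
import re
import subprocess
import sys

# Deliberately rough patterns for illustration; real scanners ship far better rules.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID format
    re.compile(r"(?i)(api_key|password|secret)\s*=\s*['\"][^'\"]{8,}['\"]"),
]

def staged_files():
    # Ask git for the files staged for the next commit.
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

def main():
    findings = []
    for path in staged_files():
        try:
            with open(path, encoding="utf-8", errors="ignore") as f:
                text = f.read()
        except OSError:
            continue
        for pattern in SECRET_PATTERNS:
            if pattern.search(text):
                findings.append(f"{path}: matches {pattern.pattern}")
    if findings:
        print("Possible secrets found, commit blocked:")
        print("\n".join(findings))
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Wire it in as a pre-commit hook (call it from .git/hooks/pre-commit) so a commit containing a match is rejected before it ever reaches a remote.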
3. Copilot’s suggestions are commonly insecure. When you accept a Copilot suggestion, you must have a clear understanding of what exactly the code does; there should be no blind spots in your code. To minimize the risk of introducing security vulnerabilities, it is recommended to use security-aware tooling in conjunction with Copilot during both training and generation.
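For a Python codebase, the simplest version of “security-aware tooling” is to run a SAST scanner such as Bandit over every change before it is merged. A minimal sketch for a CI step (assuming `pip install bandit`, and that your sources live in src/, which is a placeholder here):

```python
import subprocess
import sys

# Scan the source tree with Bandit and fail the pipeline if it reports
# findings of medium severity or higher (-ll).
result = subprocess.run(["bandit", "-r", "src", "-ll"], check=False)
sys.exit(result.returncode)
```

The research above used CodeQL for the same purpose; any scanner that runs automatically over suggested code, rather than relying on the reviewer’s attention alone, serves the goal.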
Don’t forget to disable the “Allow GitHub to use my code snippets for product improvements” option in your GitHub Copilot settings. If you use Copilot for Business, GitHub promises not to use your code for AI training. So, in the end, it’s a matter of trust.