A bit more than a year ago, this (https://www.understandingai.org/p/metas-llama-31-can-recall-42-percent) was quietly discussed around one question: "If an LLM was trained on stolen source code from, say, Microsoft, and later regurgitates an exact part of that code, what is the legal status of that output?" Where exactly is the threshold at which it becomes theft by proxy? Is it a large C struct with matching names and typed fields? Or does it need to be an entire header/implementation file pair, like a C++ .h and .cpp?
On a more humorous note, a lot of people are half-seriously talking about decompiling NVIDIA's drivers, running the output through more than a couple of LLMs to clean it up, and then, once it compiles and works, publishing it as open source. Most of them just want to see what the green machine and the leather jacket man would do.