Meta Employees Torrented Terabytes of Pirated Books to Train AI Models, Court Documents Reveal
February 11, 2025
Recently unsealed court documents have revealed that Meta, the parent company of Facebook, Instagram, and WhatsApp, allegedly downloaded terabytes of pirated books to train its artificial intelligence models. The documents suggest that employees not only torrented copyrighted material on a large scale but also openly discussed, in internal emails, their concerns about the legality and ethics of doing so.
Massive Copyright Violations Alleged
The lawsuit, filed by a group of authors and publishers, claims that Meta systematically acquired vast amounts of copyrighted content through torrenting, a peer-to-peer file-sharing method often associated with digital piracy. According to the court documents, Meta employees obtained entire datasets of books, which were then used to train the company’s AI models, including its generative AI systems.
Among the pirated materials were works by bestselling authors, technical manuals, and academic texts: content that typically requires licensing agreements for legal use in AI training. Instead of securing those licenses, Meta employees allegedly turned to torrenting sites to amass a large corpus of books.
Employees Raised Concerns
Among the most striking revelations in the court filings are internal email exchanges in which some Meta employees appeared to be aware that their methods could raise legal and ethical issues.
“Torrenting from a corporate laptop doesn’t feel right,” one employee wrote in an email exchange, according to the filings. Another employee reportedly questioned whether the company had secured the proper rights to the books being downloaded. Despite these concerns, the torrenting and use of pirated content allegedly continued.
The emails paint a picture of a company whose employees were aware of potential copyright violations but lacked clear guidelines or oversight to prevent them.
Meta’s Response
Meta has not directly addressed the specific torrenting allegations but has broadly denied any wrongdoing. The company argues that its use of data for AI training is protected by the fair use doctrine and that it sources publicly available content in ways consistent with industry standards.
A Meta spokesperson said in a statement: “We are committed to developing AI responsibly and in compliance with copyright law. We do not comment on ongoing litigation, but we believe these claims misrepresent our practices.”
Legal and Industry Implications
The case against Meta adds to the growing controversy surrounding the use of copyrighted material in AI training. In recent months, several major AI companies, including OpenAI and Google, have faced lawsuits from authors, artists, and publishers who claim their work was used without permission.
Legal experts suggest that if Meta is found liable for willfully using pirated materials, the company could face significant financial penalties and restrictions on how it trains its AI models. The lawsuit also raises broader questions about how tech giants handle intellectual property rights in the age of generative AI.
The case is still unfolding, but the revelations from the court documents could have major consequences for Meta and the AI industry at large. If proven, the allegations could fuel calls for stricter regulations on how companies acquire and use copyrighted content in AI development.