Technologies that fundamentally change society only come around once every decade or so. The Internet was one. Artificial Intelligence (A.I.) is the next. A.I. has the potential to improve lives and reshape industries from healthcare to finance, but A.I. can only be as good as the quality of the data it's trained on.
The extensive growth of text, images, videos, and audio available on the public web has fueled the rise of A.I. models by providing a constantly expanding source of information. This is why researchers predict that A.I., already a $137 billion industry, will grow more than 37% each year this decade.
For instance, Meta recently released LLaMA, "a collection of foundation language models" that aim at democratizing access to A.I. research. "We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively," the Facebook parent said.
However, even as it touts the importance of publicly available data to A.I., Meta is simultaneously pursuing litigation to close access to public web data that it acknowledges it does not own.
If Big Tech is allowed to build a walled garden around data that's present in the public domain (meaning data that isn't behind a login), it will prevent A.I. from reaching its full potential.
Looking ahead, the volume of data and information created, captured, copied, and consumed worldwide is expected to reach 120 zettabytes this year, nearly triple what it was in 2019.
If publicly available web data is stripped from the public and held only by the most powerful companies, A.I.'s ability to advance in ways that benefit society would be severely limited. If only a few companies were developing cutting-edge A.I., its development would not be aligned with humanity's best interests.
Publicly available data is not only the lifeblood of emerging artificial intelligence tools; it's also essential for current business operations. Companies and nonprofits alike rely on publicly available web data to carry out their missions efficiently and effectively, with 94% using it on a daily basis, according to a survey of 150 IT, technology, and data analytics experts from U.S. retail, technology, and nonprofit organizations. In the same survey, nearly four out of five respondents said they would be unable to operate effectively without access to public web data.
The potential for A.I. to be used for social good is equally exciting. For example, through our pro bono program, The Bright Initiative, we assist nonprofit, academic, and charitable organizations, helping them tackle serious social problems such as antisemitism, hate speech, and human trafficking.
More broadly, developers must have access to the datasets they need to train A.I. ethically. Because it offers a vast amount of diverse and up-to-date information, public web data can be used to train machine learning models, improve their accuracy, and help ensure A.I. is aligned with humanity's goals.
Or Lenchner is the CEO of Bright Data, a web data platform dedicated to maintaining transparent access to public web data for all.
The opinions expressed in Fortune.com commentary pieces are solely the views of their authors and do not necessarily reflect the opinions and beliefs of Fortune.