Previously, that paragraph stated only that publicly available data would be used to train “language models,” with Google Translate as the sole example. The section has now been expanded to make the scope of that data use clearer.
Analysis: So, what about the concerns around privacy, plagiarism, and the rest?
We already knew that Google’s Bard, like Microsoft’s Bing AI, is essentially a massive data hoover, extracting and crunching content from across the web to refine its answers on whatever topic it might be asked about.
After all, Bard has been out for a while, so Google has presumably been operating this way all along, and has only now decided to update its policy. That alone seems rather devious.
What if you don’t want anything you’ve posted to be used in this way?
Of course, there are broader issues concerning accuracy and misinformation when data is scraped from the web on a large scale.
On top of that, platforms like Reddit and Twitter have recently pushed back, with Elon Musk citing opposition to “scraping people’s public Twitter data to build AI models” as justification for the frustrating rate limits recently imposed on the site (which could ultimately be a big win for Zuckerberg and Threads).
All of this is a minefield, to be sure. But the big tech companies making strides with their data-scraping LLMs (large language models) are simply forging ahead, eyes fixed on their rivals and the race to the forefront of AI, seemingly with barely a thought for the concerns raised above.