Artificial Intelligence

Here is a noteworthy product announcement from Cloudflare. To counter the rise in AI-related web crawlers, the company is introducing tools that help companies manage and monetize their website data. According to their blog post, this also tackles the issue of AI services reducing website traffic, as users opt for LLM services instead of visiting websites.

The rise of AI Large Language Models (LLMs) and other generative tools created a murkier third category. Unlike malicious bots, the crawlers associated with these platforms are not actively trying to knock your site offline or to get in the way of your customers. They are not trying to steal sensitive data; they just want to scan what is already public on your site.

However, unlike helpful bots, these AI-related crawlers do not necessarily drive traffic to your site. AI Data Scraper bots scan the content on your site to train new LLMs. Your material is then put into a kind of blender, mixed up with other content, and used to answer questions from users without attribution or the need for users to visit your site. Another type of crawler, AI Search Crawler bots, scan your content and attempt to cite it when responding to a user’s search. The downside is that those users might just stay inside of that interface, rather than visit your site, because an answer is assembled on the page in front of them.

See also this news article on The Register.

Tim Harford released another outstanding episode in his “Cautionary Tales” podcast series. The episode is called “Flying Too High: AI and Air France Flight 447” and tells a frightening tale of a fatal plane crash caused by pilot errors when the fly-by-wire system temporarily malfunctioned.

The episode presents interesting concepts, one of which is the de-skilling of pilots: they make incorrect assessments and take the wrong actions because they lack experience flying without the automated systems. This is the automation paradox: as automation becomes more sophisticated and reduces the need for human intervention, human skills become even more critical for handling exceptional and often dangerous situations.

The questions raised by Sisi Wei, editor-in-chief at The Markup, in a recent article shed light on the dilemmas journalists face when covering AI-generated pictures. She asks whether news articles should include the generated images and, if so, how to label them or what kinds of disclaimers to add. As she notes, this is difficult because readers may not pay attention to a caption. The following is a quote from the article.

There’s no question to me that anyone who comes into contact with the internet these days will need to start questioning if the images they’re seeing are real. But what’s our job as journalists in this situation? When we republish viral or newsworthy images that have been altered or were generated by AI, what should we do to make sure we’re giving readers the information they need? Doing it in the caption or the headline isn’t good enough—we can’t assume that readers will read them.

Article on ChatGPT that is worth reading, but is likely flawed. The fact that Noam Chomsky contributed to it and included his own theory of language is noteworthy.

In short, ChatGPT and its brethren are constitutionally unable to balance creativity with constraint. They either overgenerate (producing both truths and falsehoods, endorsing ethical and unethical decisions alike) or undergenerate (exhibiting noncommitment to any decisions and indifference to consequences). Given the amorality, faux science and linguistic incompetence of these systems, we can only laugh or cry at their popularity.

Clive Thompson makes some interesting remarks about the story of the Google engineer Blake Lemoine, who became convinced that Google’s conversational technology LaMDA was sentient. In a blog post he references the work of Sherry Turkle, who showed that humans perceive robots as more real when the robots seem needy:

This is something I’ve learned from the work of Sherry Turkle, the famous MIT scientist who studies the relationship between humans and machines. Turkle has studied a ton of robot-human interactions, and talked to a lot of users (and designers) of robots that are designed for human companionship— i.e. toy-robot babies, or toy-robot animals.

One thing she noticed? The more that a robot seems needy, the more real it seems to us.

In the following scientific article on the use of datasets in AI research, the authors find an “increasing concentration on fewer and fewer datasets” introduced by a few elite institutions:

We find increasing concentration on fewer and fewer datasets within most task communities. Consistent with this finding, the majority of papers within most tasks use datasets that were originally created for other tasks, instead of ones explicitly created for their own task—even though most tasks have created more datasets than they have imported. Lastly, we find that these dominant datasets have been introduced by researchers at just a handful of elite institutions.

Found via this article shared on HackerNews.

Facebook/Meta is shutting down its facial recognition system. They explain their choice in this blog post.

But the many specific instances where facial recognition can be helpful need to be weighed against growing concerns about the use of this technology as a whole. There are many concerns about the place of facial recognition technology in society, and regulators are still in the process of providing a clear set of rules governing its use. Amid this ongoing uncertainty, we believe that limiting the use of facial recognition to a narrow set of use cases is appropriate.

Surprising conclusions from Twitter on a recent controversy about their image-cropping algorithm’s bias towards white people and women.

We considered the tradeoffs between the speed and consistency of automated cropping with the potential risks we saw in this research. One of our conclusions is that not everything on Twitter is a good candidate for an algorithm, and in this case, how to crop an image is a decision best made by people.

Via The Register.

The Register reports on a paper that aims to show how Big Tech has adopted strategies similar to Big Tobacco’s to influence AI ethics research and policy, and generally to spread doubt about the harms of AI:

The analogy “is not perfect,” the two brothers acknowledge, but is intended to provide a historical touchstone and “to leverage the negative gut reaction to Big Tobacco’s funding of academia to enable a more critical examination of Big Tech.” The comparison is also not an assertion that Big Tech is deliberately buying off researchers; rather, the researchers argue that “industry funding warps academia regardless of intentionality due to perverse incentive.”

The Register reports on a controversy surrounding the automatic image-cropping functionality of Twitter:

When previewing pictures on the social media platform, Twitter automatically crops and resizes the image to match your screen size, be it a smartphone display, PC monitor, etc. Twitter uses computer-vision software to decide which part of the pic to focus on, and it tends to home in on women’s chests or those with lighter skin. There are times where it will pick someone with darker skin over a lighter-skinned person, though generally, it seems to prefer women’s chests and lighter skin.

It seems Twitter has not come up with a technical fix, but is instead leaving the decision of how to crop an image to people. Read the full article here.