Tech News : OpenAI’s Video Gamechanger

OpenAI’s new ‘Sora’ AI-powered text-to-video tool is so good that its outputs could easily be mistaken for real videos, prompting deepfake fears in a year of important global elections.

Sora 

OpenAI says that its new Sora text-to-video model can generate realistic videos up to a minute long while maintaining visual quality and adherence to the user’s prompt. Sora can either generate entire videos “all at once” or extend previously generated videos to make them longer.

According to OpenAI, Sora can “generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background”. 

How? 

Although Sora builds on OpenAI’s existing technologies, such as its DALL·E image generator and its GPT large language models (LLMs), what makes its outputs so realistic is the combination of Sora being a diffusion model and using “transformer architecture”. As a diffusion model, Sora’s video-making process starts with something resembling “static noise,” which is then transformed gradually by removing that noise over many steps.
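For illustration only (OpenAI has not published Sora’s code, so nothing below reflects its actual implementation), here is a minimal Python sketch of that “start from static noise, remove a little noise at each step” loop, with a hypothetical denoise_step function standing in for the trained network:

```python
import numpy as np

def denoise_step(x: np.ndarray, t: int) -> np.ndarray:
    # Hypothetical stand-in for a trained denoising network: a real model
    # would predict the noise present at step t and subtract it.
    return x * (1.0 - 1.0 / t)

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64, 3))   # start from pure "static noise"
for t in range(50, 0, -1):             # gradually remove noise over many steps
    x = denoise_step(x, t)

print(f"Residual noise level after denoising: {np.abs(x).mean():.6f}")
```

In a real diffusion model the denoiser is a large trained neural network, and (per OpenAI’s description) Sora’s operates on patches of video rather than a single image array.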

The transformer architecture, meanwhile, means the “model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world”, i.e. it contextualises and pieces together sequential data.

Other aspects that make Sora so special are its “deep understanding of language”, which enables it to accurately interpret prompts and “generate compelling characters that express vibrant emotions,” and the fact that Sora can “create multiple shots within a single generated video that accurately persist characters and visual style”. 

Weaknesses 

OpenAI admits, however, that Sora has its weaknesses, including:

– Not always accurately simulating the “physics of a complex scene” or understanding cause and effect. OpenAI gives the example of a person taking a bite out of a cookie, after which the cookie may not show a bite mark.

– Confusing spatial details of a prompt, e.g. mixing up left and right.

– Struggling with precise descriptions of events that take place over time, e.g. following a specific camera trajectory.

Testing & Safety 

The potential and the power of Sora (for both good and bad) mean that OpenAI appears to be making sure it’s thoroughly tested before releasing it to the public. For example, it’s currently only available to ‘red teamers’, who are assessing potential critical areas for harms or risks, and to a number of visual artists, designers, and filmmakers, whose feedback will be used to advance the model to be most helpful for creative professionals.

Other measures that OpenAI says it’s taking to make sure Sora is safe include:

– Building tools to help detect misleading content, including a detection classifier that can tell when a video was generated by Sora, and plans to include C2PA metadata (data that verifies a video’s origin and related information). Both of these could help combat Sora being used for malicious or misleading deepfakes.

– Leveraging the existing safety methods used for DALL·E, such as a text classifier that checks and rejects input prompts that violate OpenAI’s usage policies, e.g. requests for extreme violence, sexual content, hateful imagery, celebrity likeness, or the intellectual property of others (a conceptual sketch of this kind of prompt check follows this list).

– Using image classifiers that review each video frame to ensure adherence to OpenAI’s usage policies before a video is shown to the user.
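Sora’s own input classifier isn’t publicly available, but the general idea of screening prompts before they reach a generative model can be illustrated with OpenAI’s publicly documented Moderation endpoint. A minimal sketch, assuming the openai Python SDK (v1+) and an OPENAI_API_KEY environment variable:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def prompt_allowed(prompt: str) -> bool:
    """Return False if the moderation endpoint flags the prompt
    (e.g. for violence, sexual content, or hateful material)."""
    result = client.moderations.create(input=prompt)
    return not result.results[0].flagged

prompt = "A golden retriever puppy playing in fresh snow"
if prompt_allowed(prompt):
    print("Prompt passed moderation; safe to send on to the video model.")
else:
    print("Prompt rejected under usage policies.")
```

Note that this uses the general-purpose moderation endpoint as a stand-in; the classifier OpenAI describes for Sora also covers areas (such as celebrity likeness and intellectual property) that this endpoint does not check.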

Concerns 

Following the announcement of how realistic Sora’s videos can be, concerns have been expressed online about its potential to be used by bad actors to spread misinformation and disinformation through convincing Sora-produced deepfake videos (assuming Sora is publicly released in time for them to do so). The ability of convincing deepfake videos to influence opinion is of particular concern with major elections coming up this year (e.g. in the US, Russia, Taiwan, the UK, and many more countries) and with major high-profile conflicts still ongoing (e.g. in Ukraine and Gaza).

More than 50 countries, which collectively account for half the planet’s population, will hold national elections during 2024, and if Sora’s videos are as convincing as has been reported, and/or security measures and tools prove less effective than hoped, the consequences for countries, economies, and world peace could be dire.

What Does This Mean For Your Business? 

For businesses, the ability to create amazingly professional and imaginative videos from simple text prompts whenever they want and as often as they want could significantly strengthen their marketing. For example, it could enable them to add value, reduce cost and complications in video making, improve and bolster their image and the quality of their communications, and develop an extra competitive advantage without needing any special video training, skills, or hires.

Sora could, however, also be a negative, disruptive threat to video-producing businesses and those whose value is their video-making skills. Also, as mentioned above, there is the very real threat of political damage or criminal damage (fraud) being caused by the convincing quality of Sora’s videos being used as deepfakes, and the difficulty of trying to control such a powerful tool. Some tech commentators have suggested that AI companies may need to collaborate with social media networks and governments to help tackle the potential risks, e.g. the spreading of misinformation and disinformation once Sora is released for public use.

That said, it will be interesting to see just how good the finished product’s outputs will be. Competitors of OpenAI (and its partner Microsoft) are also working on getting their own AI video-generator products out there, including Google’s Lumiere model, so it’s also exciting to see how these may compare, and the level of choice that businesses will have.

Tech News : Google Waits On Hold For You Until A Human Answers

Google’s new “Talk to a Live Representative” feature will call a business for you, wait on hold for you, and call you when a human representative is available.

Being Tested 

Similar to Google Pixel’s existing “Hold for Me” feature, “Talk to a Live Representative” is currently being tested in the US by Google’s Search Labs members (those who experiment with early-stage Google Search products) on the Google app (Android and iOS) and desktop Chrome. Following (successful) testing, it’s reported that the feature will be made available to all search users, i.e. on all devices, not just Google Pixel phones.

Improved Version of Hold For Me 

Although “Talk to a Live Representative” is similar to “Hold for Me,” where the Google Assistant waits on hold for you and notifies you when the support representative is ready to speak with you, it’s slightly improved in that it handles the entire process and shortens it. For example, “Talk to a Live Representative” proactively calls the business on your behalf in the first place and asks you the reason for the call so the customer service representative will already know why you’re calling.

In short, the one major time and hassle-saving point of the “Talk to a Live Representative” feature is that you only need to actually pick up your phone when a human at the company is available to talk to you.

‘Request A Call’ Button 

Users will know whether they can use the feature to call a business’s customer service number because Google will display a “Request a call” button next to the business’s search listing if it’s supported. The button can then be used to select the reason for the call, after which Google texts you with updates on its progress and calls you when a human customer service representative becomes available.

Some Companies Already Supported 

Although the customer service numbers of some companies are already supported by the new feature, it’s perhaps not surprising that these few are large, mostly US-based companies, such as airlines (Delta and Alaska Airlines), retail giants (Costco and Walmart), and others including Samsung, ADT, UPS, FedEx, and DHL.

What Does This Mean For Your Business?

Although this is an updated/improved version of an existing product being rolled out to a much wider market beyond Pixel phone users, any businessperson will easily see its potential value. Most of us will have experienced the frustration and inconvenience of being made to wait a long time on hold on customer service numbers (often being cut off) whilst also being distracted, having our attention divided, and feeling stressed by having to monitor the phone so as not to miss the moment a human answers.

Provided it gets successfully through the testing phase and does what it says on the tin, “Talk to a Live Representative” sounds like a feature that could be of real, practical use to UK businesses large and small. It sounds as though it could save businesses time, hassle, and stress and help them to concentrate more on their work, and help minimise disruption. Unfortunately, there’s no clear date for its general roll-out … if only Google could call us when the feature’s ready to use.

An Apple Byte : Instagram and Facebook Ads ‘Apple Tax’

Meta has announced that it will be passing on Apple’s 30 per cent service charge (often referred to as the “Apple tax”) to advertisers who pay to boost posts on Facebook and Instagram through the iOS app.

This move is a response to Apple’s in-app purchase fees, which apply to digital transactions within apps available on the iOS platform (announced in the updated App Store guidelines back in 2022). Advertisers wanting to avoid the additional 30 per cent fee can do so by opting to boost their posts from the web, using either Facebook.com or Instagram.com via desktop and mobile browsers.

Meta says it is “required to either comply with Apple’s guidelines, or remove boosted posts from our apps” and that, “we do not want to remove the ability to boost posts, as this would hurt small businesses by making the feature less discoverable and potentially deprive them of a valuable way to promote their business.” 

Apple has reportedly responded (in a statement to MacRumors), saying that it has “always required that purchases of digital goods and services within apps must use In-App Purchase,” and that boosting a post “is a digital service — so of course In-App Purchase is required”.

Meta’s introduction of the Apple tax for advertisers on its iOS apps highlights its conflict with Apple over the control and monetisation of digital ad space. This move, aimed at challenging Apple’s App Store policies, could make advertising more costly and complicated for small businesses.

Sustainability-in-Tech : Dirt-Powered ‘Forever’ Fuel Cell

Researchers at Northwestern University in the US have created a fuel cell that harvests energy from microbes living in soil so that it can potentially last forever (or as long as there are soil microbes).

Why? 

As Bill Yen (who led the research) suggests, the value may lie in its ability to supply power to IoT devices and other devices in wild areas where solar panels may not work well and where having to replace batteries may be challenging.

For example, talking about the IoT (on the Northwestern University website) Mr Yen says of the growing number of devices: “If we imagine a future with trillions of these devices, we cannot build every one of them out of lithium, heavy metals and toxins that are dangerous to the environment. We need to find alternatives that can provide low amounts of energy to power a decentralised network of devices.” 

Mr Yen also highlights how putting a sensor out in the wild (e.g. on a farm or in a wetland) can mean being “constrained to putting a battery in it or harvesting solar energy”, and points out that “Solar panels don’t work well in dirty environments because they get covered with dirt, do not work when the sun isn’t out and take up a lot of space.” 

Makes Sense To Use Energy From The Existing Environment

In tests, the revolutionary new fuel cell was used to power sensors measuring soil moisture and detecting touch, a capability that the researchers say could be valuable for situations like tracking passing animals.

To tackle the limitations of relying on conventional batteries or solar panels in unsuitable areas, the researchers concluded that harvesting energy from the existing environment (e.g. energy from the soil that farmers are monitoring anyway) is a practical and sensible option.

How Does The Cell Work? 

After two years of research and four different versions, the fuel cell is essentially an updated and improved version of a Microbial Fuel Cell (MFC), an idea that’s been around since 1911! In essence, an MFC generates electricity using bacteria in the soil in the following way (a toy numerical sketch follows the list):

– Bacteria in the soil break down organic matter, releasing electrons in the process.

– These electrons travel through a wire from the anode (where bacteria are) to the cathode (another chamber), generating electricity.

– In the cathode, a reaction uses these electrons (plus oxygen and protons) to form water, keeping electrons flowing as long as there’s “food” for bacteria.
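As a back-of-envelope illustration of the physics involved (the numbers below are hypothetical, chosen only to show the relationships, and are not the Northwestern cell’s measured figures):

```python
# Toy model: an MFC produces power (P = V * I) for as long as the soil
# bacteria have organic matter ("food") to break down.
VOLTAGE = 0.5        # volts; MFCs typically produce well under a volt (assumed value)
LOAD_OHMS = 1000.0   # hypothetical external load
FOOD = 100.0         # arbitrary units of organic matter in the soil

current_a = VOLTAGE / LOAD_OHMS            # Ohm's law: I = V / R
power_mw = VOLTAGE * current_a * 1000.0    # P = V * I, in milliwatts

hours = 0
while FOOD > 0:          # the cell keeps running while the bacteria are fed
    FOOD -= 0.1          # bacteria slowly consume the organic matter
    hours += 1

print(f"~{power_mw:.2f} mW sustained for {hours} hours in this toy model")
```

The point the toy model makes is the one the researchers emphasise: unlike a battery, the cell’s “fuel” is the organic matter in the soil itself, so electrons keep flowing for as long as the bacteria are fed.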

The Combination of Ubiquitous Microbes and A Simply Engineered System

Northwestern’s George Wells, a senior author on the study, says the key drivers of the fuel cell design’s success are that it uses microbes that are “ubiquitous; they already live in soil everywhere” and that it uses “very simple engineered systems to capture their electricity”. 

Special Features 

The features that make the MFC made by the researchers at Northwestern University so successful are:

– Its geometry. Rather than using a traditional design where the anode and cathode are parallel to one another, this version leverages a perpendicular design.

– Its anode, the conductor that captures the microbes’ electrons, is made of inexpensive and abundant carbon felt and lies horizontal to the ground’s surface, while the cathode (made of an inert, conductive metal) sits vertically atop the anode.

– Although the entire device is buried, the vertical design ensures that the top end is flush with the ground’s surface.

– A 3D-printed top prevents debris from falling inside.

– A hole on top and an empty air chamber running alongside the cathode allow consistent airflow.

– Because the lower end of the cathode sits deep beneath the surface, it stays hydrated by the moist surrounding soil (even if the surface soil dries out in the sunlight).

– Part of the cathode is coated with waterproofing material to allow it to breathe during a flood and, after a potential flood, the vertical design helps the cathode to dry out gradually rather than all at once.

More Power 

The Northwestern researchers claim that their fuel cell can outlast similar technologies by 120 per cent.

What Does This Mean For Your Organisation? 

This is an example not just of how an old technology has been re-vamped and supercharged, but also how a relatively simple solution fuelled by nature can be the answer to modern world challenges.

This simple, cheap device, which uses a potentially endless supply of free, natural energy as its power source, could be of huge value in areas like precision agriculture to help feed the world. For example, farmers wanting to improve crop yields can now have a long-lasting, no-maintenance, natural way to power the sensors/devices needed to measure things like levels of moisture, nutrients, and contaminants in soil. The cell will also free farmers from having to travel around a 100+ acre farm cleaning solar panels or changing batteries. Another major advantage of the design is that some of it can be 3D-printed and all the components can be purchased in a hardware shop.

All this means it has a wide potential geographic reach. The fact that there’s already a plan to make the next version from fully biodegradable materials, avoiding the use of any conflict minerals in its manufacture, is also a big environmental plus. In short, this simple, cheap, and highly effective cell could offer opportunities and fuel results that are dramatically greater than the sum of its parts.

Tech Tip – Developing A Consistent Brand Voice Using ChatGPT

If you want to develop a comprehensive guide on your brand’s voice and writing style to ensure consistency across all company communications, you can use ChatGPT to help you. Here’s how:

– Open ChatGPT and input a description of the attributes of your brand’s voice (e.g. professional / friendly / authoritative) and any specific do’s and don’ts in your communication (e.g. usage of jargon, or tone adjustments for different audiences).

– Ask ChatGPT to compile a set of guidelines that detail how to communicate in your brand’s voice, including examples of appropriate and inappropriate phrases.

– For example, to draft an email in the brand’s voice, state the purpose of the email and any key information that needs to be included, and ask ChatGPT to draft the email based on the provided brand voice summary and content specifics.

– Review the draft, and revise if necessary.

– The brand voice guidelines can be applied in a similar way to all other types of communications you write using ChatGPT (and, for teams that want to automate this, a sketch using the OpenAI API follows).
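Here is a minimal, hedged sketch of that automation, assuming the openai Python SDK (v1+), an OPENAI_API_KEY environment variable, and a hypothetical brand-voice summary (the model name is an assumption; substitute whichever chat model you use):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical brand-voice summary; replace with the guidelines ChatGPT compiled for you.
BRAND_VOICE = (
    "Voice: professional but friendly. Avoid jargon. Use short sentences. "
    "Adjust tone to the audience; never overpromise."
)

def draft_email(purpose: str, key_info: str) -> str:
    """Draft an email in the brand voice from a stated purpose and key information."""
    response = client.chat.completions.create(
        model="gpt-4",  # assumed model name
        messages=[
            {"role": "system", "content": f"Write in this brand voice:\n{BRAND_VOICE}"},
            {"role": "user", "content": f"Draft an email. Purpose: {purpose}. Key info: {key_info}"},
        ],
    )
    return response.choices[0].message.content

print(draft_email("announce new opening hours", "open Saturdays from 9am, starting in March"))
```

As with the manual workflow above, review each draft before sending and revise if necessary.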

Featured Article : Google’s AI Saves Your Conversations For 3 Years

If you’ve ever been concerned about the privacy aspects of AI, you may be very surprised to learn that conversations you have with Google’s new Gemini AI apps are “retained for up to 3 years” by default.

Up To Three Years 

With Google now launching its Gemini Advanced chatbot as part of its ‘Google One AI Premium plan’ subscription, and with its Ultra, Pro, and Nano LLMs now forming the backbone of its AI services, Google’s Gemini Apps Privacy Hub was updated last week. The main support document on the Hub, which explains how Google collects data from users of its Gemini chatbot apps for the web, Android, and iOS, makes for interesting reading.

One particular section that has been causing concern and attracting some unwelcome publicity is the “How long is reviewed data retained?” section. This states that “Gemini Apps conversations that have been reviewed by human reviewers … are not deleted when you delete your Gemini Apps activity because they are kept separately and are not connected to your Google Account. Instead, they are retained for up to 3 years”. Google clarifies this in the feedback section at the foot of the support page, saying, “Reviewed feedback, associated conversations, and related data are retained for up to 3 years, disconnected from your Google Account”. It may be of some comfort to know, therefore, that the conversations aren’t linked to an identifiable Google Account.

Why Human Reviewers? 

Google says its “trained” human reviewers check conversations to see if Gemini Apps’ responses are “low-quality, inaccurate, or harmful” and that “trained evaluators” can “suggest higher-quality responses”. This oversight can then be used to “create a better dataset” for Google’s generative machine-learning models to learn from, so its “models can produce improved responses in the future.” Google’s point is that human reviewers provide a kind of quality control, both over responses and over how and what the models learn, in order to make Google’s Gemini-based apps “safer, more helpful, and work better for all users.” Google also makes the point that human reviewers may be required by law (in some cases).

That said, some users may be alarmed that their private conversations are being looked at by unknown humans. Google’s answer to that is the advice: “Don’t enter anything you wouldn’t want a human reviewer to see or Google to use” and “don’t enter info you consider confidential or data you don’t want to be used to improve Google products, services, and machine-learning technologies.” 

Why Retain Conversations For 3 Years? 

Apart from improving performance and quality, other reasons why Google may retain data for years could include:

– The retained conversations act as a valuable dataset for machine-learning models, helping with continuous improvement of the AI’s understanding, language-processing abilities, and response generation, and ensuring that the chatbot becomes more efficient and effective at handling a wide range of queries over time.

– For services using AI chatbots as part of their customer support, retained conversations could allow for the review of customer interactions, which could help in assessing the quality of support provided, understanding customer needs and trends, and identifying areas for service improvement.

– Depending on the jurisdiction and the industry, there may be legal requirements to retain communication records for a certain period, i.e. compliance and being able to settle disputes.

– To help monitor for (and prevent) abusive behaviour, and to detect potential security threats.

– Research and development to help advance the field of AI, natural language processing, and machine learning, which could contribute to innovations, more sophisticated AI models, and better overall technology offerings.

Switching off Gemini Apps Activity 

Google does say, however, that users can control what’s shared with reviewers by turning off Gemini Apps Activity. This will mean that any future conversations won’t be sent for human review or used to improve its generative machine-learning models, although conversations will be saved with the account for up to 72 hours (to allow Google to provide the service and process any feedback).

Also, even if you turn off the setting or delete your Gemini Apps activity, other settings including Web & App Activity or Location History “may continue to save location and other data as part of your use of other Google services.”

There’s also the complication that Gemini Apps are integrated and used with other Google services (Gemini Advanced, formerly Bard, has been designed for such integration), and “they will save and use your data” (as outlined by their policies and Google’s overall Privacy Policy).

In other words, there is a way to turn it off, but just how fully turned off it really is remains unclear, due to the links and integration with Google’s other services.

What About Competitors? 

When looking at Gemini’s competitors, retention of conversations for a period of time by default (in non-enterprise accounts) is not unusual. For example:

– OpenAI saves all ChatGPT content for 30 days whether its conversation history feature is switched off or not (unless the subscription is an enterprise-level plan, which has a custom data retention policy).

– For Microsoft’s Copilot, the details are more difficult to find, but documentation about using Copilot in Teams suggests that the furthest back Copilot can process is 30 days, indicating a retention period possibly similar to ChatGPT’s.

How Models Are Trained

How AI models are trained, what they are trained on, and whether there has been consent and/or payment for the use of that data is still an ongoing argument, with major AI providers facing multiple legal challenges. This indicates how there is still a lack of understanding, clarity, and transparency around how generative AI models learn.

What About Your Smart Speaker? 

Although we may have private conversations with a generative AI chatbot, many of us forget that we may have many more private conversations with a smart speaker listening in the room, which also retains conversations. Amazon’s Alexa, for example, retains recorded conversations for an indefinite period, although it does give users control over their voice recordings: users can review, listen to, and delete them, either individually or all at once, through the Alexa app or Amazon’s website, and can set up automatic deletion of recordings after a certain period, such as 3 or 18 months. Even so, 18 months may still sound like an alarming amount of time to have a private conversation stored in distant cloud data centres.

What Does This Mean For Your Business? 

Retaining private conversations for what sounds like a long period of time (3 years) and having unknown human reviewers look at those private conversations are likely to be the alarming parts of Google’s privacy information about how its Gemini chatbot is trained and maintained.

The fact that it’s a default (i.e. it’s up to the user to find out about it and turn the feature off), with a 72-hour retention period afterwards and no guarantee that conversations still won’t be shared due to Google’s interrelated and integrated products, may also not feel right to many. The fact, too, that our only real defence is not to share anything even faintly personal or private with a chatbot (not easy, given that many users need to provide information to get a good-quality response) may also be jarring.

For enterprise users, more control over conversations appears to be available, but businesses need to ensure clear guidelines are in place for staff about exactly what kind of information they can share with chatbots in the course of their work. Overall, this story is another indicator of a general lack of clarity and transparency about how chatbots are trained in this new field, with the balance of power still appearing to rest with the tech companies providing the AI. With many legal cases on the horizon about how chatbots are trained, we may expect to see more updates to AI privacy policies soon. In the meantime, we can only hope that AI companies are true to their guidelines and anonymise and aggregate data to protect user privacy and comply with existing data-protection laws such as GDPR in Europe or the CCPA in California.

Each week we bring you the latest tech news and tips that may relate to your business, re-written in a techy-free style. 
