Quality Over Quantity: the Counter-Intuitive GenAI Key

28 June 2024 at 17:51

It’s been almost two years since OpenAI launched ChatGPT, driving increased mainstream awareness of and access to Generative AI tools. In that time, new tools and solutions have seemed to launch daily. There is also a growing trend of building bigger models that consume larger quantities of training data, often with mixed results: hallucinations, categorically incorrect facts, and the regurgitation of opinions as universal truth. These results bear out the old adage that sometimes “less is more”.

Quality over Quantity

So, if using more data doesn’t translate into better results… what does? It comes down to another tried and true saying – “quality over quantity.”

At McAfee, we maniacally focus on data quality. A well-developed Generative AI model is nothing without high-quality, curated datasets to fuel it. When the quantity of data is prioritized over quality, the results are often disappointing.

How do we produce quality data? Using millions of worldwide sensors, our AI engineers and AI data specialists focus on clues that point to threats. But that’s just the first step. Our teams then curate the data to improve the quality and maximize data diversity, reducing sources of bias, cross-pollinating data sources, and enriching and standardizing samples, just to name a few of the dozens of operations conducted to ensure we’re building datasets of the highest and purest quality.
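As an illustration of what curation operations such as standardizing samples, removing duplicates, and checking label diversity can look like, here is a minimal Python sketch. It is not McAfee's pipeline: the function names (standardize, curate, label_balance), the tuple layout (text, label, source), and the sample messages are all hypothetical, chosen only to make the idea concrete.

```python
import re
import unicodedata
from collections import Counter


def standardize(text: str) -> str:
    """Normalize Unicode, collapse runs of whitespace, and lowercase."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def curate(samples):
    """Standardize each sample and drop empties and exact duplicates.

    `samples` is an iterable of (text, label, source) tuples (a hypothetical
    layout). When two samples standardize to the same text, the first wins.
    """
    seen = set()
    curated = []
    for text, label, source in samples:
        clean = standardize(text)
        if not clean or clean in seen:
            continue
        seen.add(clean)
        curated.append((clean, label, source))
    return curated


def label_balance(curated):
    """Count samples per label so skew toward one class is easy to spot."""
    return Counter(label for _, label, _ in curated)


# Hypothetical raw samples gathered from two sources.
raw = [
    ("Your parcel is waiting,  click here", "scam", "sms"),
    ("your parcel is waiting, click HERE", "scam", "email"),
    ("Lunch at noon?", "benign", "sms"),
    ("   ", "benign", "email"),
]

curated = curate(raw)
print(len(curated), dict(label_balance(curated)))
```

In this toy run the two parcel messages collapse into one sample and the blank message is dropped, so a smaller but cleaner dataset remains. Real curation would add many more operations (near-duplicate detection, bias checks, enrichment) that this sketch does not attempt.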

All of this translates into the most comprehensive and robust AI-based protection for our customers: more than 1.5M threat detections per week across malware, scams, phishing, and smishing, plus more than half a billion web categorizations to help ensure a safe digital journey while browsing the Internet.

Human/AI Partnership

As the capabilities of AI tools increase, so does the conversation around how technology removes humans from the equation. The reality is that humans are still an integral part of the process and key to any successful Generative AI strategy. AI is only as good as the data it’s trained on and, in McAfee’s case, the guidance provided by cybersecurity experts. Data curation by cybersecurity AI specialists is therefore crucial to the development of all of our AI systems: it mitigates potential sources of error, results in accurate and trusted AI solutions, and allows us to scale and share human expertise to better protect millions of customers worldwide.

Tackling cyber threats is a tall order that comes with intrinsic challenges. For example, modern scams are subtle and hard to spot even for experts, and quite often it is only the implicit intent that sets them apart from genuine (non-scam) content. Being context-aware can help navigate this landscape to more effectively detect and stop threats before they reach customers. What is more, we believe transparency and education are paramount for building a safer digital world. This is why we also invest in building explainable AI that helps users understand why a threat has been flagged and provides clues they can use to identify future threats.

Only the Beginning

The GenAI journey has only just begun. There is still a lot of work to do and a lot to look forward to as this technology continues to evolve. While it’s easy, as developers, to get caught up in the excitement, it’s also important to identify and focus on an ultimate goal and the responsible and safe steps to get there. At McAfee, we pledge to protect our customers, and we believe in the synergistic interaction between AI and Human Threat Intelligence. Together, we can deliver a trusted, world-class AI protection experience.

The post Quality Over Quantity: the Counter-Intuitive GenAI Key appeared first on McAfee Blog.

Generative AI: Cross the Stream Where it is Shallowest

McAfee Blogs

By: German Lancioni

7 February 2024 at 18:04

The explosive growth of Generative AI has sparked many questions and considerations, not just within tech circles but in mainstream society in general. Both the advancement of the technology and the easy access to it mean that virtually anyone can leverage these tools, and much of 2023 was spent discovering new ways that Generative AI could be used to solve problems or better our lives.

However, in the rush to apply this transformative technology, we should also keep in mind “Maslow’s Hammer.” Attributed to Abraham Maslow, best known for outlining a hierarchy of needs, Maslow’s Hammer highlights an over-reliance on a single tool, a concept popularly summarized as “If all you have is a hammer, everything looks like a nail.” As corporations navigate the continuing evolution of AI, we need to be certain that we’re applying it where it makes the most sense, and not just because we can. This will ultimately save time, money, and energy that can be applied to building robust tools and solutions for viable use cases.

Recognizing when to use GenAI and when not to use it is a necessary skill set for full-stack domain-specific data scientists, engineers, and executives.

Running GenAI is expensive and not without tradeoffs. As of today, careless planning of a GenAI application can lead to a negative return on investment (due to the excessive operational cost), scalability and downtime issues (due to limited computing resources), and serious damage to the customer experience and brand reputation (due to the potential generation of improper content, hallucinations, mis/disinformation, misleading advice, etc.). Organizations struggle to control these variables in general, and the negative impacts and limitations must be offset by a huge value proposition.
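The return-on-investment concern can be made concrete with some simple arithmetic. The Python sketch below is only illustrative: the cost model (a fixed monthly cost plus a per-request cost) and every number in it are assumptions made up for the example, not figures from any real deployment.

```python
def monthly_roi(requests, value_per_request, cost_per_request, fixed_cost):
    """Return (net_value, roi) for one month.

    roi is net value divided by total cost, so a negative number means
    the application costs more to run than the value it delivers.
    """
    total_cost = requests * cost_per_request + fixed_cost
    total_value = requests * value_per_request
    net = total_value - total_cost
    return net, net / total_cost


def break_even_requests(value_per_request, cost_per_request, fixed_cost):
    """Requests needed to recover the fixed cost.

    Returns None when each request costs at least as much as it earns,
    because no volume of traffic can then reach break-even.
    """
    margin = value_per_request - cost_per_request
    if margin <= 0:
        return None
    return fixed_cost / margin


# Hypothetical numbers: each request costs more to serve than it is worth.
net, roi = monthly_roi(
    requests=100_000,
    value_per_request=0.02,
    cost_per_request=0.03,
    fixed_cost=1_000.0,
)
print(f"net value: {net:.2f}, ROI: {roi:.0%}")
print("break-even:", break_even_requests(0.02, 0.03, 1_000.0))
```

With these made-up inputs the ROI is negative and there is no break-even volume at all, which is the situation the paragraph above warns about: more traffic only deepens the loss unless the per-request cost drops or the value rises.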

One interesting pattern across industries is that going through the GenAI voyage has unexpected (but welcome) side effects, a kind of eye-opening epiphany. How do we balance this risk/reward? What should we be looking at, and what questions should we be asking to ensure that we’re successfully applying (or not applying) AI?

Breaking free from the complexity bias: as humans, we tend to favor and give credit to complex solutions (a tendency known as ‘complexity bias’). Unfortunately, this applies particularly to GenAI applications nowadays, as we are influenced and “self-forced” to use GenAI to solve all problems. Just because “it seems to work” doesn’t mean it’s the optimal solution. Teams that follow this logic have a significant chance of discovering that there are simpler (probably non-GenAI) means of solving some of these real-world problems (or parts of them!). Achieving this revelation requires a humble mind that is open to the possibility that we don’t always need the most complex or expensive solution, even if it’s fancy and we can afford it.

It’s not always all or nothing: running GenAI all the time makes sense for a few companies, but not for most. If your business case is not around selling or supporting GenAI infrastructure, then you are likely using GenAI as a tool to accomplish domain-specific goals. If so, what you want, like every player in the industry, is to maximize value while minimizing operational costs. At the current cost of running GenAI, the most obvious way to achieve that is to avoid running it as much as possible while still delivering most of the desired value. This delicate trade-off is a smart and elegant way of tackling the problem: neither dismissing the value provided by GenAI nor obsessively using it to the point that it yields negative ROI. How do you achieve this? That’s likely the secret sauce of your domain-specific application area.

Ethical downsizing: GenAI models can be (and usually are) quite big. While this might be required for a few scenarios, it’s not necessary for most real-world domain-specific applications, as several GenAI model developers across the industry are finding out (e.g., with Phi-2). As such, it’s important not only for your business but also for humanity that we learn to downsize and optimize GenAI models as much as possible. Doing so not only brings efficiency to your use case (cost savings, inference speed, lighter footprint, reduced risk, etc.) but also makes for a responsible use of the technology that is respectful of humanity’s resources. Each time you save a kilowatt or a few seconds of inference per user, you are explicitly contributing to a sustainable future where GenAI is leveraged to maximize value while minimizing environmental impact, and that’s something to be proud of.

Cross the stream where it is shallowest…

The key is to be humble enough to seek the optimal path: keep an open mind to consider non-GenAI solutions to your problems first. If GenAI is truly the best way to go, then find out if you really need to run it all the time or just sometimes. And finally, downsize as much as possible, not just because of cost and speed, but because of social responsibility.
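One way to picture the "non-GenAI first, GenAI only sometimes" path is a tiered classifier: a cheap deterministic check answers the easy cases, and the expensive model is called only when that check cannot decide. The Python sketch below is a toy under stated assumptions. The rules, the labels, and the names rule_tier, classify, and fake_genai are hypothetical, and fake_genai is a stand-in for whatever (ideally downsized) model an application would actually call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    label: str
    source: str  # which tier produced the answer: "rules" or "genai"


def rule_tier(message: str) -> Optional[str]:
    """Cheap deterministic check; returns a label only when it is confident."""
    text = message.lower()
    if "gift card" in text and "urgent" in text:
        return "scam"
    if len(text.split()) <= 3:
        return "benign"
    return None  # undecided: defer to the more expensive tier


def classify(message: str, genai_tier: Callable[[str], str]) -> Verdict:
    """Try the cheap tier first and fall back to GenAI only when needed."""
    label = rule_tier(message)
    if label is not None:
        return Verdict(label, "rules")
    return Verdict(genai_tier(message), "genai")


# Stand-in for a real model call; it records how often it is invoked.
genai_calls: List[str] = []


def fake_genai(message: str) -> str:
    genai_calls.append(message)
    return "needs_review"


messages = [
    "URGENT: buy a gift card now",
    "See you soon",
    "Your account statement for March is attached below",
]
verdicts = [classify(m, fake_genai) for m in messages]
for message, verdict in zip(messages, verdicts):
    print(f"{verdict.source:>5} -> {verdict.label}: {message}")
print("GenAI calls:", len(genai_calls), "of", len(messages))
```

In this toy run only the third message reaches the model. How much traffic a cheap tier can safely absorb in practice depends entirely on the domain, which is the "secret sauce" the article refers to.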

GenAI is clearly having a moment, with demonstrated potential. At the same time, being able to recognize the technical and financial downsides of GenAI is just as important for the healthy development of the industry. In the same way we don’t use the hammer for every task at home, we should continuously ask: Is this problem worth GenAI? And is the value provided by this technology (when applied to my domain-specific use case) going to exceed the operational shortcomings? It is with this mindset that the industry will make significant and responsible progress in solving problems with a diverse but efficient set of tools. Let’s continue exploring and building the fascinating world of GenAI, without forgetting what our ultimate goals are.

The post Generative AI: Cross the Stream Where it is Shallowest appeared first on McAfee Blog.