Is your data strategy ready for generative AI?

3 min read

For enterprises embarking on the use of generative AI, it can be challenging to identify the best path forward.

Many will know there’s much to gain from the use of AI, with companies providing better customer service, parsing complex information through natural language inputs, and generally making workflows faster. But with this technology come risks, which include, but are not limited to, hallucination, the loss of personal data, weaknesses in model architecture, and bias.

As the world becomes more cognisant of both the opportunities and risks associated with AI – and particularly in the wake of the approval of the EU’s landmark AI regulation – it’s more important than ever that enterprises understand how to leverage these technologies safely and positively. While AI presents a real opportunity to address even global and societal challenges such as climate change and disease, this can only be achieved through a considered planning and development process.

One thing we know is that without the right safeguards in place, things can go awry very quickly. A Hitachi Vantara survey found that 65% of UK businesses are concerned they cannot detect a data breach in time to protect themselves. Because AI systems are trained on data, it’s imperative that this data is secure and compliant with all regulatory requirements, or else enterprises may face penalties for non-compliance.

That’s why AI strategy must start with data and have responsibility baked in from the outset. Leadership must be involved in this process too – it’s not an area that can be outsourced. While enterprises may bring in third parties to help with implementation and advise on the technology, regulatory compliance, ethics, and legal scrutiny must be integrated within the business itself.

Getting the data under control

Because data is so important, we have to ask the question: where will that data live and how will it be used? We must also appreciate the sheer amount of data we’re talking about.

On average, large companies store 35PB of data, according to the same Hitachi Vantara study. What’s more, 60% of large companies are overwhelmed by the amount of data they store, while 75% are concerned their infrastructure will be unable to scale to meet their organisation’s demands. As data continues to grow, it may not be feasible to keep all the necessary data in fast-access storage. But if businesses rely solely on public data to train and run their generative AI tools, there’s an increased risk of hallucinations, because the company doesn’t have direct control over the biases inherent in the data sets.

The solution, then, is to find an approach that keeps the data accessible; can scale to deal with the growing amount of information businesses produce; and provides the reliability, traceability and explainability required for robust AI services. Generative AI also needs speed. For on-premise storage, flash SSDs can deliver much higher IOPS (input/output operations per second) than ‘spinning disc’ HDDs and as such are more efficient. This helps reduce the training time of AI models, while also reducing power consumption.

Get the right fit

That said, there’s no one way to implement AI solutions within a business, because every business has its own goals. However, there are three things that every enterprise should consider if it wants to introduce AI technology.

First, consider what you want to achieve. What functions are you looking to introduce? This will dictate the data sets you use, and this will differ for everyone. Next, understanding what data you have available will inform the AI system you’re able to develop. Then, with a growing range of choices available, think about the model you want to use.

The hardware is the final consideration. This should come last because you want to make sure it is the right size for the task at hand, to avoid over- or underspend. There’s also the question of whether to rent it from a hyperscaler or to use hardware that lives on-premise. At Hitachi Vantara, we have found that, as things stand, the on-premise option is the most cost effective. However, this can vary according to use case.

What we know today is that many companies want to leverage the potential of generative AI but feel nervous about the risks associated with the data required. According to Hitachi Vantara data, 44% of leaders admit that nobody in their business has a handle on all the data being collected and stored. Implementing AI without a robust strategy could prove costly. Now more than ever, businesses want flexibility and, in many cases, are finding the right fit in hybrid solutions.

What matters most is that the business is equipped to manage the data it needs, has access to it whenever required, and can use it to run applications that perform flawlessly. Generative AI should be built on this approach – and when it is, it can return many benefits for the people it serves.

Author details: Jason Beckett, Head of Technical Sales, Hitachi Vantara