Many companies lack the expertise to integrate AI effectively into their operations: they get bogged down in data complications and struggle to develop custom solutions. For Francis Delafaille, successfully preparing data for AI requires strategies tailored to the business’s needs, a thorough audit and understanding of the data landscape, and continual analysis of whether the technology in use is still the right choice.

DEVELOP BESPOKE STRATEGIES

Addressing data issues requires a deep understanding of the problem at hand and of the business’s capabilities, beyond just the context of AI. “With all data issues the first action that you’re going to take is to learn about the business strategy and develop the context for using data,” says Delafaille.

Initially, clarify the business strategy and establish a context for data and AI utilisation, engaging with key stakeholders to understand their vision and expectations. With that done, a more in-depth exploration of the problem domain can follow. “If a CEO is looking to enhance their customer experience, the AI solution should be tailored to this end and it’s got to leverage data that provides insights into customers’ behaviour and preferences.” One factor Delafaille says is vital across the entire data lifecycle is collaboration through cross-functional teams, though he clarifies that data preparation in particular “requires diverse experts from data engineers who handle the technical aspects to the domain experts who bring that in-depth knowledge of their specific area”.

DataOps, the approach he favours, draws its principles from DevOps, emphasising agility, collaboration and control to ensure the appropriate individuals in an organisation receive the correct data when they need it.
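To make that idea concrete, the sketch below shows what an automated quality gate in a DataOps-style pipeline might look like in Python: data is only published to downstream consumers once basic checks pass. The column names, thresholds and file path are hypothetical illustrations, not a description of Delafaille’s own tooling.

```python
# A minimal sketch of a DataOps-style quality gate: the pipeline only
# publishes a batch to downstream consumers if automated checks pass.
# Column names, thresholds and the input path are hypothetical.
from datetime import datetime, timedelta, timezone

import pandas as pd


def quality_gate(df: pd.DataFrame, max_age_hours: int = 24) -> list[str]:
    """Return a list of failed checks; an empty list means the batch may ship."""
    failures = []

    # Completeness: key fields should not be missing.
    for col in ("customer_id", "last_purchase_at"):
        if df[col].isna().any():
            failures.append(f"missing values in {col}")

    # Uniqueness: one row per customer.
    if df["customer_id"].duplicated().any():
        failures.append("duplicate customer_id values")

    # Freshness: the newest record should be recent enough for its consumers.
    newest = pd.to_datetime(df["last_purchase_at"], utc=True).max()
    if newest < datetime.now(timezone.utc) - timedelta(hours=max_age_hours):
        failures.append("data is stale")

    return failures


if __name__ == "__main__":
    batch = pd.read_parquet("customer_batch.parquet")  # hypothetical input
    problems = quality_gate(batch)
    if problems:
        raise SystemExit("blocked by quality gate: " + "; ".join(problems))
    print("checks passed - publishing to downstream consumers")
```

In a DataOps setup, a check of this kind would typically run on every pipeline execution, so problems are caught before the data reaches the teams that depend on it.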

UNDERSTAND THE DATA

With organisational context in mind, dive deep into the business’s data landscape, focusing on the availability, quality and relevance of data. This includes evaluating data collection, storage and governance to identify gaps that would hold back an AI solution. It may involve seeking new data sources, refining collection methods or enhancing data quality. “We’d want to work with the client at this point to establish some key measures and key KPIs, if you like, to show the data strategy is in place and what it’s aiming to support,” says Delafaille.

To ground those strategies in a real understanding of a client’s data portfolio, a data audit is essential. This involves cataloguing the client’s current data assets, evaluating their quality and identifying any gaps. In moving towards an AI-ready foundation, the focus shifts from simply having large volumes of data to ensuring the data is of high quality, scrutinising key aspects such as its volume, velocity, variety, validity and value.

“In effect, we are looking to understand the relevance of the data types, their sources and their capability to deliver to the AI models that should be considered.”
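As a rough illustration of what the first pass of such an audit could look like in code, the sketch below profiles a catalogue of tabular files, using row counts, null rates and duplicate rates as simple stand-ins for the volume and validity checks described above. The file locations and metrics are illustrative assumptions, not a prescribed toolset; a real audit would also record ownership, lineage and governance details for each asset.

```python
# A minimal sketch of a data-audit pass over a catalogue of tabular assets.
# Paths and metrics are illustrative; real audits capture far more context.
from pathlib import Path

import pandas as pd


def profile_asset(path: Path) -> dict:
    """Collect basic volume and validity measures for one dataset."""
    df = pd.read_csv(path)
    return {
        "asset": path.name,
        "rows": len(df),                                        # volume
        "columns": len(df.columns),                             # variety (breadth)
        "null_rate": round(float(df.isna().mean().mean()), 3),  # validity
        "duplicate_rate": round(float(df.duplicated().mean()), 3),
    }


if __name__ == "__main__":
    catalogue = [profile_asset(p) for p in Path("data").glob("*.csv")]
    report = pd.DataFrame(catalogue).sort_values("null_rate", ascending=False)
    print(report.to_string(index=False))
```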

Where internal data needs enhancing with an additional field, value or subset drawn from external sources, further complications can arise and it’s important to exercise caution. There are often commercial constructs and contractual obligations governing how external data is stored and accessed, and those requirements must be adhered to.

ANALYSE AVAILABLE TECHNOLOGIES

Delafaille thinks the choices of infrastructure and tools can be vital to a project’s success. “My preference is scalable cloud-based solutions that will offer you flexibility, high performance and very robust security features,” he says, pointing to the well-proven hyperscalers such as Microsoft and AWS as steady choices.

The landscape in this space is evolving so rapidly that Delafaille thinks it is too difficult to retrain engineers on every single technology that appears. But the Microsoft, AWS, Databricks and Snowflake stacks are “relatively consistent and are good places to start,” he says.

Future-proofing data infrastructure means building on scalable, flexible ecosystems, such as cloud-based solutions from AWS or Microsoft Azure, and adopting a modular architecture so new technologies can be integrated easily. This approach allows for swift adaptation to evolving data types and volumes. Emphasising continuous learning and skill development within data teams is also important to stay abreast of rapid change in AI and data management, so Delafaille encourages his staff to constantly upskill.
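One common way to realise that modularity in code is to put a thin interface between the pipeline and any specific vendor service, so a backend can be swapped without rewriting the rest of the system. The sketch below is a generic illustration of that pattern, assuming a Python codebase; the class names and storage keys are hypothetical and do not describe a particular client architecture.

```python
# A minimal sketch of a modular storage layer: pipeline code depends only on
# the ObjectStore protocol, so one backend can be swapped for another
# (or for a local stub in tests) without touching the pipeline itself.
from typing import Protocol


class ObjectStore(Protocol):
    def write(self, key: str, data: bytes) -> None: ...
    def read(self, key: str) -> bytes: ...


class LocalStore:
    """In-memory stand-in, useful for tests and proofs of concept."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def read(self, key: str) -> bytes:
        return self._blobs[key]


def publish_features(store: ObjectStore, run_id: str, payload: bytes) -> None:
    """Pipeline code is written against the interface, not a vendor SDK."""
    store.write(f"features/{run_id}.parquet", payload)


if __name__ == "__main__":
    store = LocalStore()  # swap for an S3- or Azure-backed implementation later
    publish_features(store, "2024-01-01", b"example bytes")
    print(len(store.read("features/2024-01-01.parquet")), "bytes stored")
```

The design choice here is simply that new cloud services or tools plug in behind the same interface, which is one practical reading of the modular architecture Delafaille describes.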

Experimentation, through developing proofs of concept or minimum viable products, is critical to evaluating new technologies. This process helps determine whether it’s worthwhile to invest time and effort in retraining or upskilling. It’s a continuous feedback loop of evaluating what works and what doesn’t, and being open to trying different technologies. Remaining static in technology choices is not an option, given the rapid pace of change in the field.