The following talk was recorded at the Dubai International Food Safety Conference. Sharing public-private data for food safety.
Can we use data and AI to reduce the global food safety health burden?
In this talk, I’m going to say yes, we can, and I’m going to show you how.
We’re a scientific company, and we use data and a secure data platform combined with the best food safety science to help organizations on their food safety journey.
Embracing Uncertainty
We have already heard a lot about why we need better solutions, given all of the uncertainty in the world today.
Let’s think about a future data point or future trend. It could be anything. It could be the ocean temperature, it could be the price of a commodity, it could be the US dollar price for a barrel of oil.
Can we see this variable into the future?
No, we can only see this trend looking backwards. We can see the historic trend of that data point looking backwards, but the future is not as easy to understand. And we’re always warned that past performance doesn’t guarantee future returns or success.
So what does the future really look like?
The future looks more like this. If we look forward from today into the future, there’s uncertainty, and this comes from many things.
From a food safety perspective, many different things can change and bring risks and opportunities into the food supply.
So the future is uncertain moving forward from today. Trends are what we call probabilistic. They’re not deterministic, they’re probabilistic.
We can use scientific methods and data modeling to understand the future better. And when you do that, you can understand the trend and risk profile of any variable in the future.
Using data, you can model any variable you’re interested in, be it food safety, other risk, or different aspects of your food supply. And when you do that, you get an expected value and the risk profile of your variable: high and low. We can look at the distribution of the variable value in the future and use percentiles and statistics to represent and understand the ranges.
Whether we are interested in the number of food safety outbreaks in a country, the number of positive samples in a lab test, etc., we expect a certain number, but we know that could be higher or lower.
The way to understand this better is to get as much data as we can to understand the future scenario so that we can understand that risk and that range that may occur.
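The idea of an expected value with a high and low range can be sketched in a few lines of Python. This is a minimal illustration, not the modeling approach used in the projects described in this talk: it assumes the variable follows a simple random walk, and the starting value, drift, volatility and function names are all made up for the example.

```python
import random
import statistics

def simulate_paths(start, drift, volatility, steps, n_paths, seed=0):
    """Simulate many possible futures of one variable as a random walk.

    All parameters are illustrative assumptions, not values from the talk.
    Returns the final value reached on each simulated path.
    """
    rng = random.Random(seed)
    finals = []
    for _ in range(n_paths):
        value = start
        for _ in range(steps):
            # Each step adds a fixed trend plus a random shock.
            value += drift + rng.gauss(0.0, volatility)
        finals.append(value)
    return finals

def risk_profile(values):
    """Summarize a distribution as an expected value plus a low/high range."""
    # n=20 gives 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(values, n=20)
    return {
        "low_p5": cuts[0],
        "expected": statistics.fmean(values),
        "high_p95": cuts[-1],
    }

finals = simulate_paths(start=100.0, drift=1.0, volatility=5.0,
                        steps=12, n_paths=5000)
profile = risk_profile(finals)
print(profile)
```

The more historical data there is to estimate the trend and the size of the shocks, the more trustworthy the resulting range becomes.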
Lots of processes in nature follow trends. Data can characterize them, so we need lots of data to help. Since there is so much uncertainty in food safety, there are lots of variables we need to understand. Data sharing unlocks the potential to access this data and the power of artificial intelligence and machine learning so that we can actually de-risk the future.
Data Sharing
A mechanism to gather as much data as possible on events of today
So think about the data that you have available in your organization or in your country. You have your private data that you have access to.
You may have put initiatives in place to organize this data. So this data is useful. Okay, that’s a good start. Then you can supplement that with public data that’s out there on the web.
You can access public data sources, for example, from the CDC in the U.S. and RASFF (the Rapid Alert System for Food and Feed) in the EU. When you supplement your private data with public data, you get a better picture of the ecosystem and of the data that you need to train and create models.
But this isn’t all the data that is out there. When we start sharing data through what we define as a data trust (a system to share data), you get a fuller picture. We need to start sharing data to get access to the full range of information we need.
When you have all of this data, you can get an understanding of the true nature of the risk and the true opportunities within your industry.
Food supply is global
As we have already heard today, the food supply is global. So, we need to think big when we’re sharing data.
Food safety data is complex
Furthermore, food safety data is complex. There are many aspects to it, starting right back at the farm: ranch and farm inspections, crop and water testing, market information, food consumption data, and monitoring data for pesticides. There are many sorts of data to take into account.
Data Trusts for sharing food safety data
So, we have built what we call a data trust. A data trust is two things. It’s a platform technology that facilitates secure and confidential data sharing, and it’s also a set of legal agreements to share and use data.
So a data trust combines the legal agreement between organizations who agree to share data and the technology platform to enable that.
Who shares the data? We have numerous data trust projects going on. In some, industry members share data with their membership associations. In others, government organizations share data. And in some cases, both industry and government collaborate to share data with each other. This is when the true power of data can be used to reduce food safety risks.
The benefits of sharing data.
We have seen multiple benefits arise from our data-sharing projects. These include safer food, reduced cost, and identifying opportunities for revenue maximization by understanding the safety and benefits of products better.
Data ownership and protection are vital in these projects. It’s important that controls are in place to govern the data and enforce privacy and protection. Your data is important to you. You still own the data that you have contributed, and you retain control of it. You can remove it if you wish.
The data you contribute is aggregated into the master data set and creates a bigger picture for use by all of the participants.
Here’s a schematic of what the collaboration looks like in one of our projects.
A system for sharing data safely
Here we can see many users in one organization are inputting data, and then other organizations are also sharing data through this system. The data is securely aggregated and then anonymized into a central database system. Public data supplements the shared data, and all of that data is organized into a data mart and warehouse. Predictive analytics can now be used because there is enough data to provide the statistical power to develop models and visualize results.
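The aggregate-then-anonymize flow in that schematic can be sketched as follows. This is only an illustration under stated assumptions: the talk does not say how anonymization is implemented, so the keyed hash used here (which is strictly pseudonymization), the secret key, the record fields and the function names are all hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret held only by the platform; not from the talk.
SECRET_KEY = b"example-key-held-by-the-platform"

def pseudonym(org_name):
    """Replace an organization name with a keyed hash (an assumed technique)."""
    digest = hmac.new(SECRET_KEY, org_name.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]

def anonymize(record):
    """Drop the identifying field and keep only an opaque source id."""
    cleaned = {k: v for k, v in record.items() if k != "organization"}
    cleaned["source_id"] = pseudonym(record["organization"])
    return cleaned

def aggregate_positive_rate(records):
    """Combine all contributors' test results into one rate per region."""
    totals = defaultdict(lambda: [0, 0])  # region -> [positives, samples]
    for r in records:
        totals[r["region"]][0] += r["positives"]
        totals[r["region"]][1] += r["samples"]
    return {region: pos / n for region, (pos, n) in totals.items()}

# Made-up submissions from three contributors.
submissions = [
    {"organization": "Grower A", "region": "North", "samples": 200, "positives": 4},
    {"organization": "Grower B", "region": "North", "samples": 300, "positives": 1},
    {"organization": "Grower C", "region": "South", "samples": 100, "positives": 5},
]

central = [anonymize(r) for r in submissions]
rates = aggregate_positive_rate(central)
print(rates)
```

No single contributor could estimate the regional rate as reliably from its own samples alone, which is the statistical-power argument for pooling.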
Those dashboards and results are available to the organizations and users who submit the data. This is a description of a data trust in action. We’ve built a number of data trusts for different organizations around the world (I will show you a few examples in a moment), and I believe the solution to the global food safety problems is to combine these data trusts to gain global insight from all of the data.
The Global Food Safety Data Trust
Now that data has been structured, anonymized and aggregated from multiple data trusts, there is an opportunity to share data between these countries and regions. This can provide an improved understanding of many aspects of food safety risks, and new predictive capabilities can be built.
We are hearing a lot about AI (artificial intelligence) and machine learning these days. It is becoming truly pervasive and available through various new code bases, systems and products.
Artificial Intelligence and Machine Learning for food safety
The large language models, such as ChatGPT, are getting cheaper and more powerful by the month. The main thing you need to take advantage of these powerful models, which are quite commoditized now, is good data. And this is why we need to share data to have the ability to leverage these powerful AI models. They need lots of data. They need big data. Your best chance of getting the volume and quality of data needed is to share and aggregate data.
This is a technical architecture. The key point is you can ingest all of this different data on the left.
The data transfers through various stages and ends up in a data mart where it can be used for predictive modeling (see right-hand side) to produce better results. The key thing about these platforms is you’re moving away from Excel and into larger database systems where you can run machine learning code (e.g. Python code) on the data.
Data and Analytics
There are many different types of machine learning models, which I won’t go into here. But you can see three main types shown, each achieving a different outcome: you can predict relationships using regression-type models, you can classify your data, and you can cluster your data into groups. All these are possible once you have gathered and structured your data.
Case study: Western Growers GreenLink Platform
I’d like to run quickly through a couple of working examples of data trusts now.
Here’s one that’s working in the western region of the United States. This platform was commissioned by Western Growers and is called Greenlink. Greenlink involves all of the major leafy green (lettuce, cabbage etc.) growers and ranches in the western part of the United States, and the goal is to try to understand how food safety risks arise and can be mitigated.
So we configured a data trust where the growers can upload their data. They’re uploading data on testing of their water supply, data on produce tests and information on the inspections and mitigation measures carried out on their ranches.
We then combine the data from the growers. It’s all anonymized and used to produce insights via dashboards.
They can all see the big picture now of their industry. They can see their own private data and compare that to the overall aggregate anonymized data. The data is supplemented by other information, for example, weather patterns, seasonal information, U.S. government data and all of that data comes together to create predictive models that the whole community can then benefit from.
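Comparing your own private data with the anonymized aggregate can be as simple as the benchmark below. This is a hypothetical sketch: the metric (positive test rate), the numbers and the function name are invented for illustration and do not describe the actual Greenlink dashboards.

```python
import statistics

def benchmark(own_rate, peer_rates):
    """Place one contributor's rate against the anonymized peer rates."""
    below = sum(1 for r in peer_rates if r < own_rate)
    return {
        "own_rate": own_rate,
        "industry_median": statistics.median(peer_rates),
        # Fraction of peers whose rate is strictly lower than yours.
        "share_of_peers_with_lower_rate": below / len(peer_rates),
    }

# Made-up positive test rates from five anonymized contributors.
peer_rates = [0.01, 0.02, 0.015, 0.03, 0.005]
result = benchmark(0.02, peer_rates)
print(result)
```

A contributor sees only its own figure and the pooled statistics, never which peer produced which rate.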
They can upload their data using simple forms or more sophisticated approaches. They can upload Excel files through a web-based system or use APIs to connect their system to Greenlink.
The Greenlink contributors get access to the results of these analytics and models. They get very detailed and rich information on the region and on the different types of issues that are arising.
So there are many benefits accruing to the industry. They’re understanding the issues that can arise and how they can prevent them. They are able to test the effectiveness of their mitigation strategies and understand if issues that arise are sporadic or systematic. Western Growers and their members are learning from the data as they go along. So we’re making great progress there.
Case study: Fiin Food Industry Intelligence Network Data Collection Platform
The second Data Trust case study I want to show is from the United Kingdom. This is run by an organization called Fiin, which Campden BRI established.
Fiin is a consortium involving many retailers and food producers in Europe (some of which are shown here). The data trust was established to understand food fraud better. Food fraud is a very sensitive issue and is tricky to detect. A member company may detect an issue within its supply chain, but it doesn’t really want to disclose that to the whole industry because it’s a very sensitive topic.
Fiin came up with a data-sharing plan, and we implemented it in our data trust solution to help them confidentially share data to combat food fraud. Each organization can submit data totally anonymously, and the platform avails of legal privilege for the dataset as a whole.
The submissions are totally blind, even to us. The end users can upload information on any food fraud detection within their supply chain without identifying themselves or their organization.
Data Trust – Legal privilege and data anonymity
And all this data goes into a centralized system, where they can then share and analyze the information. Nobody knows who submitted what, but they can all learn from the aggregated data. Every quarter, there’s a thorough review and presentation, within the group, on the food fraud issues and risks that have emerged in the data during that quarter. These presentations are always extremely interesting and well-attended by the members.
And we’re looking at plugging AI into this data. So far, it has been humans analyzing and visualizing the data and communicating the learnings from the data trends. We’re now plugging in AI to try to detect hidden trends and to move to a more predictive risk approach, whether those risks are coming from certain regions, categories of food, times of the year, etc.
A very positive collaboration has been established within the members of the project, and they are constantly learning and gaining benefits from their participation.
Case study: FDA Seafood Data Sharing Platform
My final case study is a large data trust in the United States organized by the FDA. We started this project focussed on seafood. 95 percent of seafood in the United States is imported, as Frank spoke about earlier.
We have worked with the U.S. FDA to bring together all of the data they can access on seafood imports and testing.
The project went very well, and this year, the FDA has expanded it from seafood to all foods.
As shown in this diagram, the U.S. FDA is the main organization contributing data, but we are also facilitating the upload of data from many other organizations and users. For example, the U.S. EPA (Environmental Protection Agency), NOAA (the National Oceanic and Atmospheric Administration in the U.S.), and many of the state departments in the United States are contributing data.
These organizations can all contribute data into the data trust, and it all gets combined and visualized. We are also pulling in some public data from agencies such as the U.S. CDC (Centers for Disease Control and Prevention) and other public sources to help the FDA understand what’s going on with their food imports.
There’s been a huge growth in the data available for analysis for the FDA through this project, and this has brought many benefits.
It has validated some of their previous thinking where they suspected risks from certain regions. They now have the data to back up that knowledge. They’re also gaining new insights, and they’re able to target their resources better in the right areas at the right time. For example, they can be more strategic and targeted with their inspections.
Summary: How Data Sharing and AI Can Reduce the Global Food Safety Burden
So, in summary, data sharing and AI are helping to reduce the food safety burden.
The food supply is global, so you need to share data in order to see the big picture.
When you do that and put these connected data trusts in place, you can use the power of machine learning and AI to understand and predict food safety issues and risks.
And doing that will help us all to reduce the global food safety burden.