Data Trust Frameworks for Secure and Anonymous Data Gathering and Food Safety Use Cases

At IAFP, Brendan Ring, Commercial Director at Creme Global, was invited to talk about Data Trusts and the work we have done with various clients.

Presentation description

Data has been referred to as the new oil of the 21st Century, given its resource value and potential to drive the economy. Like oil, refinement of and access to data are critical to realizing its full potential. The FDA’s New Era of Smarter Food Safety, an initiative to leverage technology and other tools and approaches to create a safer, more digital, traceable food system, exemplifies the power of data sharing in the food community. Food companies and regulatory bodies around the globe share a common goal: to deliver safe food to consumers. Both are also becoming increasingly aware of the value of sharing data, yet few examples exist of data being shared across the food industry. Data sharing between companies (private), between companies and government regulators (public-private), and between regulating agencies (public) shows great promise for improving traceability, adopting and scaling artificial intelligence (AI) applications, and delivering safer food for all. However, there are significant obstacles to sharing data, including privacy, trade secrets, and the potential for regulatory action. Data trusts, in which an independent organization serves as the fiduciary for the data and governs its proper use, are a potential solution to these problems. In this symposium, speakers will describe what data trusts are, discuss data trust frameworks for secure and anonymous data gathering, provide examples of existing data trusts for food safety and protection, and illustrate the use of big data to predict critical supply chain infrastructure threats.

Automated Transcript

IAFP Brendan – Data Trusts

The objective of this session is to introduce the audience to the opportunities to leverage the power of data for food safety applications through data sharing platforms.

In this symposium, speakers will describe what Data Trusts are, discuss Data Trust frameworks for secure anonymous data gathering, provide examples of existing Data Trusts for food safety and protection, and illustrate the use of big data to predict critical supply chain and infrastructure threats.

This session was organized by Joseph Scimeca, my colleague Stacey Wiggins, and myself. I’m Nate Anderson. Joe and I are going to be co-moderating the session.

Our first speaker is Brendan Ring. Brendan is the Commercial Director at Creme Global, a data science company with specific expertise in the food industry. Brendan has worked in a wide range of companies in the food and ICT industries over the last two decades.

He’s very experienced in helping industry scope requirements and develop solutions that provide high-value business insights. He completed a degree in Engineering at Dublin City University in Ireland and an executive MBA at the University of Cambridge in the UK. Brendan will be presenting on Data Trust frameworks for secure anonymous data gathering and food safety use cases.

Brendan.

Thanks very much, Nate. And thank you for the opportunity to be here, as well.

Data Trusts. There is so much data out there. So many different people have data in all different places. Getting access to it, mining it, getting some insight from it: that’s where it’s all at, really. That’s what this talk is about: how can you get more people access to the data?

So, what’s a Data Trust? There are various descriptions of it. I was glad to discover that I wasn’t the only person who wasn’t too sure what it was before I had to come along and talk about it.

I guess this is our understanding of a Data Trust and what we would see it as. You might hear me talking about a data sharing platform; that’s our normal language for it. But actually, I think “Data Trust” captures the whole essence of it quite nicely.

Is it an agreement? Is it where a number of parties come together and decide, we need to share some data, so what do we sign? What do we do? How do we put this agreement or mechanism to cooperate in place?

Or is it actually a platform? Is it, here’s where you come and share the data, load your data here?

No question, it’s both. One can’t exist without the other. We have worked with an entity, where we’re currently scoping a project, where there’s an agreement in place between a number of entities and they’re like, right, okay, let’s do it. And it’s like, we have nowhere to put the data. Somebody find a spreadsheet somewhere. Immediately, you’re in trouble, if that’s the route.

Then you run into version control issues and data validation issues, et cetera. An agreement on its own isn’t enough. It’s similar on the platform side: across the various case studies, you see the different things that come up that you have to figure out from a platform perspective.

It’s a combination of both: having somewhere to put the data and an agreement in place. Essentially, at its core, in all its different guises, that’s what a Data Trust is.

Who shares the data? Is it industry? Is it private industry?

Is it the regulators? When you talk about a Data Trust, who actually is putting that data in there? Again, it’s both. You can have a Data Trust associated with private data or with regulatory data, and you’ll see that as I talk through.

Well then, who can see the data? That’s an important question; sometimes quite sensitive data is going in there. That is entirely user-defined and case-dependent. It’s all about who gets to see what and where, and agreeing that up front. When you talk about an agreement to share data, the most important part of it is exactly that granularity of who gets to see what, where.

So when you talk about an industry-only scenario, you can start from the very simple and it can get quite complex. From a simple perspective, even within one department of a single company, being able to pull the data together can be a challenge, because it might be in multiple different formats: you might have PDFs and spreadsheets, you might have historical data, or you might have something coming in from a sensor. You can use a Data Trust for that, even though it’s quite a narrow field. You get all that data structured, get it into your platform, and actually get some insight from it.

So when we build out our Data Trust, it’s not just a big void where you dump everything in. It’s very much about bringing the data along through that process. There were earlier talks about the data engineering part of it. Typically, that’s not the big challenge; it’s not where you need to sink a load of time. There are a lot of very powerful tools out there. Yesterday I was talking about 15 different ways to spell Walmart: three lines of a script and you’ve solved that, when you have the right platform and tools in place. The issue is all around coming up with the business case. That’s the barrier. I’ll be talking a bit more about that.
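As an aside on that “fifteen spellings of Walmart” point, the cleanup really is only a few lines once the data is in one place. Here is a minimal sketch using only the Python standard library (the variant spellings and canonical list are invented for illustration, not taken from any client data):

```python
from difflib import get_close_matches

# Canonical company names we want every spelling mapped onto (illustrative).
CANONICAL = ["Walmart", "Nestle", "Mars"]

def normalize_company(raw: str) -> str:
    """Map a free-text company name onto its canonical spelling."""
    cleaned = raw.strip().rstrip(".").replace("-", " ")
    # Fuzzy-match against the canonical list; fall back to the cleaned input.
    match = get_close_matches(cleaned.title(), CANONICAL, n=1, cutoff=0.6)
    return match[0] if match else cleaned

# Several "ways to spell Walmart" collapse to one canonical name.
variants = ["WALMART", "Wal-Mart", " walmart. ", "Wallmart"]
normalized = {v: normalize_company(v) for v in variants}
```

The point stands: this kind of data engineering is cheap once a shared platform exists; the hard part is the business case, not the script.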

As you expand that out, some of the big companies we work with have branches all over the globe, with so many employees that the security around access has to be the very same as if it were a public database. They need to go to exactly the same lengths, assigning user names, as if it were literally the public coming in. When you have more than a couple of hundred thousand employees, you essentially require the same level of security. We’ve had to put that kind of multifactor authentication in place even within a single company, when you’re getting data from all across the globe.

And then as you expand that out again, with multiple companies getting involved, there’s a lot more sensitivity around anonymizing the data: who gets to see what kind of aggregated data. Even within one company, only certain people get to see certain levels of data, or can access or download it.

In the world of food safety, you’re in the world of people misinterpreting data, and of litigation, so you have to be really careful who gets access to what and who can see what, even internally in the company.

I have a couple of examples to talk about. Mostly I will be talking about multi-company data sharing. That tends to be where the interesting stuff comes along, I guess.

It’s very similar from a regulatory perspective: even within one regulator, within multiple regulators in one country, or across multiple countries and regulatory agencies. You can imagine how that expands out. We tend not to be so concerned about anonymizing the data when it’s inter-regulatory; you’ve self-selected into a much smaller group in that scenario. But again, a Data Trust there uses pretty much exactly the same technologies; it’s just about defining how you want to build it up and present it out.

There are some nice projects coming together in that space as well. Obviously, this fits nicely into the New Era of Smarter Food Safety. Frank Yiannas has been talking about it for quite a while, and a lot of effort and time has gone into this over the last couple of years. Data is how this is going to happen into the future. This is the path forward.

The ideal scenario, the holy grail: we are certainly a long way from this, but there are case studies, examples of where it has happened, where industry has its collection of data and makes some element of it available to the regulator, and likewise, from the regulatory side, tons of data can be made available to industry. So again, the Data Trust is the thing that pulls all this together. It’s just a matter of coming up with the agreements and coming up with the case.

Why do you want to do this? There’s a talk at IAFP on data standardization and interoperability, and there are elements of that required here: people need to be talking the same language. But that doesn’t mean defining exactly the right template so that everyone fills in exactly the same thing. It’s more about understanding the terminology. That’s where the effort goes, rather than into the engineering side.

Why share data? This is really the crux of it for us. In my role, talking to companies, if you can get over this hurdle, all the remaining technical hurdles can be solved quite easily.

Safer food: to discover anomalies. We’ll be talking about the Fiin case study later. It’s quite a nice example: if you’re looking at your own data on its own, it’s one tiny slice of what’s happening. When you look at aggregated data, you get to stand back and see the full picture. You get to see what’s going on everywhere. You can produce safer food by sharing data, by using a Data Trust.

Reduced costs: the Western Growers project is a nice example of that. Like a lot of food industries, it’s a tight-margin industry, and in order to reduce cost you’re trying to figure out the most cost-effective mechanism for implementing food safety and food protection measures. Sometimes that’s a fairly substantial capital investment. If you’re relying on your own data alone as to what may work or not, it’s much more challenging to discover what’s going to give the best return on investment, the best bang for buck. There’s an opportunity for companies to save significant quantities of money by making the right decision as to what works and what doesn’t.

Maximize revenue: this is the case study we’re talking about here. If regulators are going to put very restrictive controls in place, because in the absence of data they need to take very conservative estimates as to what’s actually happening in reality, that’s going to be quite punitive on industry, and can simply be unworkable. So this is where industry came together, shared actual data, and said: okay, now we know what the exposures are. Now we can actually maximize the quantities in the appropriate locations. Then the opportunity arises to maximize your revenue.

A quick comment on big data, AI and ML. There tend to be far more food scientists here than data scientists, so I’ll demystify some of these terms. When you take data on its own and just put it into a chart, you can get so much insight from it. You don’t have to do anything fancy with it at all. Looking at a table, the human eye can’t pick up anything. Put that into a chart, or into a plot, and all of a sudden you can tap into the power of the human eye, which is really, really good at seeing a trend or observing an anomaly, something that is really difficult to train a computer to do. Through the combination of very simple data and a bit of visualization, the human eye is built to spot something quickly.

AI, then, is when you start to automate that process, when you start to bring in data in an automatic manner and configure all of that to happen on a continuous basis. It tends not to be fancy robots doing cool things, walking around and running; that’s one tiny aspect of it. In our industry, when you talk about AI, that’s what’s important, that’s what’s relevant.

Machine learning is the next step, where all that data flowing in feeds into a model that gives an insight. As new data comes in, the model continues to learn from it. That’s simply all it is: it automates the rerunning of a parameter optimization. When you run your model the first time, you’re trying to optimize, to get the best fit, the best score. Machine learning just automates that process, so it continues to learn. It’s as simple as that. There are a ton of different ways it can be done, different models for different things, but at the end of the day, that’s really as simple as it is. It’s not black magic.
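To make that concrete, here is a toy sketch of “learning as automated refitting” (invented numbers, not any particular food safety model): each time a batch of observations arrives, the single parameter of a y ≈ w·x model is simply refit by least squares on all the data seen so far.

```python
# Toy illustration: "machine learning" as automatically rerun optimization.
# Model: y ≈ w * x, with the slope w refit by least squares on every update.

xs, ys = [], []

def ingest_and_refit(new_points):
    """Append new (x, y) observations, then re-run the best-fit calculation."""
    for x, y in new_points:
        xs.append(x)
        ys.append(y)
    # Closed-form least-squares slope for a no-intercept linear model.
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

w1 = ingest_and_refit([(1, 2.1), (2, 3.9)])   # first fit
w2 = ingest_and_refit([(3, 6.2), (4, 7.8)])   # new data arrives: same fit, rerun
```

As new points stream in, nothing changes except that the same optimization is rerun on a larger data set; that is the whole “continuous learning” loop in miniature.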

So, a couple of case studies.

The Fiin data collection platform is really quite a unique one. It came about following a food safety scandal in the UK and Europe about 8 or 10 years ago.

There was a report published by Professor Chris Elliott from Queen’s University saying industry needs to get together and share data. Individually, companies probably had some awareness of what was happening, but as an industry, nobody knew across the board.

So essentially what the platform is doing is gathering incoming inspection data from this range of companies. There are probably even a few new ones since this slide: Walmart is in there now, and Mars and Nestle, so it’s already out of date.

They were originally set up using spreadsheets and email. We were pushed by [inaudible] to figure out a way of doing this on a more robust basis.

One of the novel things about this is that we have a double-blind anonymization step. Because industry is sharing incoming inspection data, there are still some concerns from a regulatory perspective in relation to that data. So in order to protect everybody, in this case, and it’s actually the only case I have of this, even we as the data aggregators don’t know who submits the data. It’s anonymized from us. There’s a mechanism that facilitates that: mock user names are set up and shared out.
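As a sketch of the general pattern behind those mock user names (an illustration of double-blind submission in miniature, not the actual Fiin mechanism; all names here are invented): an intermediary holds the only table linking real companies to opaque submitter IDs, so the aggregator only ever sees the pseudonyms.

```python
import secrets

class PseudonymRegistry:
    """Held by the intermediary (e.g. a law firm), never by the data aggregator."""

    def __init__(self):
        self._real_to_mock = {}

    def mock_id(self, company: str) -> str:
        # Issue a stable random pseudonym per company, reused on later submissions.
        if company not in self._real_to_mock:
            self._real_to_mock[company] = "submitter-" + secrets.token_hex(4)
        return self._real_to_mock[company]

registry = PseudonymRegistry()

# What the aggregator receives: a pseudonym plus the record, no company name.
record = {"submitter": registry.mock_id("Acme Foods Ltd"),
          "analyte": "aflatoxin", "result_ppb": 1.2}
```

Because the pseudonym is stable, results from the same company can still be grouped and queried over time without anyone downstream knowing who that company is.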

Previously, before we had this platform in place, if there was a mistake in the data, which there often was when you have a spreadsheet, it would come back through the law firm: the law firm would go back to the company asking, what does this value here mean? Then back again to the law firm, and back to the company. It was really inefficient; it could take a month to get a response. Now, through the platform, you can do all that communication online.

For one, because we’re using data validation techniques, there are very few errors. But even when there are, we can communicate anonymously on the platform.

These are the kinds of usability features that make implementing these kinds of Data Trusts work. They’re the things that go wrong when you’re in the trenches doing this, and these are the solutions that really help to fix that.
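Those validation techniques are mostly mundane checks applied at the point of submission, before anything reaches the shared database. A minimal sketch, with the field names and rules invented for illustration:

```python
# Check a submitted test record before it enters the shared database.
ALLOWED_UNITS = {"ppb", "ppm", "cfu/g"}
REQUIRED_FIELDS = ("sample_id", "analyte", "result", "unit")

def validate(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means it passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
    if "unit" in record and record["unit"] not in ALLOWED_UNITS:
        problems.append(f"unknown unit: {record['unit']}")
    if "result" in record and not isinstance(record["result"], (int, float)):
        problems.append("result must be numeric")
    return problems

good = validate({"sample_id": "S1", "analyte": "lead", "result": 0.4, "unit": "ppm"})
bad = validate({"sample_id": "S2", "analyte": "lead", "result": "0.4", "unit": "mg"})
```

Catching an unknown unit or a number submitted as text at entry time is exactly what removes the month-long back-and-forth through the law firm.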

As I mentioned, there are 55 companies, and behind them a lot of third-party labs doing the testing that produces this data. We’ve set up different mechanisms. Originally it was very much “come and fill in a template”; now, more and more, we’re setting up APIs to automatically ingest this data.

There’s actually a second step that I didn’t show here: a data submitter and a data approver. Usually there are a lot of people involved in submitting the data, especially the third-party labs. But if you’re responsible for what goes into the database from your own company’s perspective, you want a final look at it before it gets submitted into the master database. Usually there are many submitters and one approver per company, and the approver is the one who goes down through it and signs off before it gets submitted.

Those are the kinds of mechanics that matter when somebody is sitting there thinking, “you mean you’re sucking all my data into this platform? It’s out of my control.” They’re the important things that actually make this happen, that give people the comfort to facilitate it.
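The many-submitters, one-approver arrangement is essentially a small state machine: records wait in a per-company staging area until the designated approver signs off, and only then reach the master database. A minimal sketch of the idea (illustrative only, not the platform’s actual code):

```python
class CompanyQueue:
    """Staging area for one company: many submitters, a single approver."""

    def __init__(self, approver: str):
        self.approver = approver
        self.pending = []   # records awaiting sign-off
        self.master = []    # records released to the shared master database

    def submit(self, submitter: str, record: dict):
        self.pending.append({"by": submitter, **record})

    def approve_all(self, who: str):
        if who != self.approver:
            raise PermissionError("only the designated approver can sign off")
        self.master.extend(self.pending)
        self.pending.clear()

q = CompanyQueue(approver="qa_manager")
q.submit("lab_a", {"analyte": "salmonella", "result": "negative"})
q.submit("lab_b", {"analyte": "lead", "result_ppm": 0.02})
q.approve_all("qa_manager")   # nothing is shared until this sign-off
```

The design point is that control stays with the company: nothing leaves its staging area without an explicit, named sign-off.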

As I’ve mentioned, the data comes from all those companies and sits with a law firm. The reason for having the law firm in there is to give the data legal privilege. When the companies got together and asked, “how are we going to share this data?”, there were a couple of factors to put in place. One was: we’ll stick with food integrity, we won’t talk about food safety. That way you step back a little from regulators being compelled to take a “give me what you have” approach. And having the law firm in there gives the data some legal privilege.

And then, obviously, the double-blind anonymization means it works as well. When they came up with this concept, there was a key figure, Ron McNaughton. He’s now with Food Standards Scotland, but he’s a former detective chief inspector, and food crime was his previous beat. As he moved into the regulatory office at Food Standards Scotland, Scotland’s equivalent of the FDA, he was approached because he was very forward-thinking in this space and he saw the value of information: a good policeman, or attorney, is only as good as their informants, as such. He saw that facilitating industry to collect this data, giving them a little bit of breathing space, was worth more than taking a very punitive regulatory approach, because there is way more value in industry knowing this data right across the board. He was really at the crux of it.

Food Standards Scotland was the first to sign up to facilitate this, the Food Safety Authority of Ireland was next, and then the UK. There are discussions ongoing here locally as well, because people see the value in this. Again, we didn’t come up with this concept; we just came along with a platform afterwards. So full credit to team [inaudible] and Queen’s University, and the original founders of this, for having cracked open the mechanism of being able to do it.

Again, it came down to the agreement and the business reason. The technology wasn’t the problem.

We’ve talked through these already. They’re now getting insights that they couldn’t possibly get on their own, the wisdom of the crowd, as such. And now that we have the data and a platform, we’re starting to pull in other data sets, and we’re moving on to prediction and prevention. Traditionally, Fiin has taken a historic look at the data, but now you can build out on this, ingest lots of other data, and move towards prediction and prevention. It’s quite an exciting project, it definitely has a lot of legs, and hopefully it’s going to expand [inaudible] and globally.

So, Western Growers. This is close to people’s hearts here: romaine lettuce and the Salinas Valley and all the rest of it. You don’t have to explain that anywhere in the US. This is a really challenging space. Essentially, there was a mandate from the FDA: you need to get your act together; we really need to understand how to implement best practice. Western Growers decided to set up a data sharing platform to understand how to implement best practice, to figure out what people are doing, and where the risk is coming from.

There’s a risk of E. coli contamination, but is it coming from nearby ranches? Or is there something else going on? Is it weather-dependent or independent of weather? Does it depend on which facilities the product goes through? Does it depend on wild animals coming through?

It’s like the W. Edwards Deming quote: “In God we trust; all others bring data.”

That’s where we are here. You need to actually get the data; you need to understand what’s going on. Otherwise, you’re just another person with an opinion. That’s really what this platform is about: trying to get away from the opinions. Come with data, come with facts. See what’s going on, and really try to get to the bottom of it.

Again, it’s a very similar structure: pulling data in from a lot of different sources, very sensitive data. In this case, we do quite a bit of work to figure out who gets to see what, and at what granularity. Western Growers is our customer here, but the actual users are the growers, and various other people along the supply chain as well. Some of them are customers of each other, so you have to be really careful about how far you step back from the data and how anonymized you make it. That can be challenging.

I’ve talked about this already. The big thing for them is: how do you identify the most cost-effective preventative measures? That’s challenging. Until you know what’s actually causing the contamination, you don’t really know what the most effective measures to implement are.

Last case study. This one is not in the food safety industry, but it’s still consumer safety, and it’s actually where our company originally started about 15 years ago, doing exposure assessments. We were approached by two different entities: the Research Institute for Fragrance Materials (RIFM), based in New York, and Cosmetics Europe.

Essentially the challenge was this: it’s quite a complex process, how fragrances are made and how they end up in products, so how are we going to figure out how to set limits, daily intake limits, on all of these different parameters? Because we were doing the exposure science, we’re essentially the sigma symbol in this diagram. We do the exposure assessment, but in order to get the most accurate data, we set up our data sharing platform to facilitate the fragrance manufacturers sharing their formulations.

You can imagine this is extraordinarily business-sensitive data. Not regulatory-sensitive; it’s literally how they make up the different fragrances, the different scents that they have. There’s a lot in those fragrances, and a lot of very similar chemicals are used across them; usually it’s just different concentrations and some small other additions.

So now we have what goes into the fragrances. We have another data set from the product manufacturers: we use these fragrances at these concentrations in all of these different cosmetics products. So we can build that data set up. The third data set that feeds in is use habits and practices from consumers: what quantities of the different products are people using all across the US? With that, we can work out exactly what concentration of a chemical is being taken in from all of those different sources. Then there’s scientific literature data that comes in, in terms of absorption into the body and retention. So, thinking of it as a Data Trust, it spans everything from publicly available scientific literature, through paid-for use habits and practices data from Kantar, to very private, confidential formulation data. The Data Trust pulls all of that together, and at the end of it you get your exposure model.
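Conceptually, the “sigma” step is a sum of exposure contributions across every product containing the chemical: its concentration in the fragrance, times the fragrance’s level in the product, times how much of the product consumers use, times how much is retained on the body. A toy illustration with made-up numbers (the real model is probabilistic and far more detailed):

```python
# Aggregate consumer exposure to one chemical across products (toy numbers).
# exposure = sum over products of:
#   conc_in_fragrance * fragrance_in_product * daily_use_g * retention

chem_in_fragrance = {"fragrance_x": 0.02, "fragrance_y": 0.05}  # mass fraction

products = [
    # (fragrance, fragrance fraction in product, grams used per day, retention)
    ("fragrance_x", 0.01, 10.0, 1.0),    # leave-on product, e.g. body lotion
    ("fragrance_y", 0.005, 5.0, 0.1),    # rinse-off product, mostly washed away
]

exposure_g_per_day = sum(
    chem_in_fragrance[frag] * frac * grams * retention
    for frag, frac, grams, retention in products
)
```

Running the same arithmetic in reverse, solving for the largest concentration that keeps the total under a safety limit, is the “headroom” idea of running the model backwards.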

You get safer products onto the market, and you also get to maximize revenues, because now you know where your maximum concentrations can be. You can even run the model in reverse and see what headroom is available for different chemicals in different products. Industry is finding quite a lot of use for that.

Just to end with a couple of quick takeaways.

Doing these types of data projects and implementing them is essentially a balance. We’ve talked about data standardization and interoperability. You could have the most flexible system, where everybody can use whatever they like, all of their own systems. That’s really easy to roll out and it lets people get involved.

Or you can go to the opposite extreme: here’s the template, this is exactly what you have to do, there are only dropdowns everywhere. It’s absolutely formulaic and rigid, completely inflexible, and really difficult, then, for industry to implement on site.

So it’s always a balance. You’re always trying to find where you are on that spectrum, and how you can facilitate some level of flexibility. You’re never starting on a green field; it’s always “we have a ton of historic data”. That’s where you’re starting, so you’re already starting with those templates, and you’re trying to pull that data in and get it built in.

When you talk about getting the agreement in place, that’s where a lot of the discussion actually happens: how do you make it as easy as possible to implement, without having the complete wild west of everybody using a different format and drowning in data engineering?

What we’ve found is that at its core, it’s about aligning on terminology. That’s where the time and effort need to be spent. Everybody needs to understand: what does this term mean in this scenario, in this circumstance? Sometimes language itself adds to the complexity; multinationals deal with this all the time. Especially if you’re going to get an insight that leads to capital investment, or a recall, or whatever it might be, you need to be damn sure everybody understands exactly what you’re talking about, and that there is a document recording it.

People naturally recollect things differently six months on. These projects run over the long haul, and your frame of reference changes by the time you look at this again in six months, a year, or 18 months. So have those documents in place and be rigorous about them, and try to eliminate as much confusion as possible. It’s always going to be there, but you’re trying to minimize it.

Find a business benefit. That’s it. This is what it comes down to: what’s the business benefit? All the other stuff can be fixed; the mechanisms and the platform are not the limitation. That’s not the challenge in building this. The challenge is identifying the benefits so that people, or the company, will get over the hurdle of putting a new business process in place to facilitate this mechanism of sharing data. It’s not something that’s typically part of someone’s day job. Once it’s in place and up and running, it’s actually really easy, because there’s a mechanism, there’s an owner, there’s somebody to do it. There’s a process, and we know how it works. It’s always that initial hurdle.

And actually, using a Data Trust can facilitate better communication and interaction up and down the supply chain, as well.

That’s it. Thank you very much.
