Outliers – Dealing with Extremes in Cosmetic Exposure

In an exposure assessment, a small number of consumers can have a disproportionate influence on the calculated average. For example, in cosmetics exposure we find that people's habits and usage vary greatly; some individuals apply a very large amount of product, while others use very little. High-use consumers pull the population average up, while low-use consumers pull it down. The problem is that the high-end and low-end users don’t always balance each other out. Using an average alone to describe the typical cosmetic exposure in a demographic can therefore be deceptive, and may not tell us whether part of that user population is at risk from overexposure to chemicals in cosmetics. In a cosmetic risk assessment, both the average and the high-end users are of interest.

High-end (extreme) users of cosmetics can be called ‘outliers’, that is, they lie outside the typical range of product use. Extreme consumers skew the data and pull the average up. Even so, the raised average might still appear typical or ‘safe’, which can be misleading. Chemical exposure from cosmetics should be represented in a way that acknowledges high-end and low-end users, but also describes the range of exposure in which most of the population lies.

We will now look at how data can be represented so that any imbalance (skewness) and any outliers are accounted for. Let’s take a sample of consumer exposures to Product X. The list of numbers below shows the product use (measured in mg), sorted from low to high, for eleven subjects on one particular day. The first ten consumers use a relatively low amount (5 – 45 mg) in comparison to the last consumer in the list, whose product use jumps to 95 mg. Clearly, this data set has more low-end consumers than high-end consumers – a very skewed distribution.

         5   7   10   12   15   16   20   25   35   45   95

The average of these values is about 26 mg (285 mg in total across 11 subjects). That might be a safe or typical amount of Product X, but it is far below the 95 mg used by the high-end consumer, so the average alone could mislead analysts.
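To make the effect of the single high-end consumer concrete, here is a minimal Python sketch using only the standard library. The variable names are illustrative and not taken from any Creme Global tool; the numbers are the eleven values listed above.

```python
from statistics import mean, median

# Product X use in mg for the eleven subjects listed above
use_mg = [5, 7, 10, 12, 15, 16, 20, 25, 35, 45, 95]

average = mean(use_mg)   # pulled upwards by the 95 mg consumer
middle = median(use_mg)  # the middle (6th) value, not affected by how extreme the top value is

# Same calculation with the highest user left out, to show their influence
average_without_top = mean(use_mg[:-1])

print(f"mean: {average:.1f} mg")
print(f"median: {middle} mg")
print(f"mean without the 95 mg user: {average_without_top:.1f} mg")
```

One consumer out of eleven moves the average from 19 mg to roughly 26 mg, while the median stays at 16 mg.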

So how do we present this data in a way that demonstrates the range where the majority of product use amounts fell, without ignoring the high-end consumers? In this case, we can use what is known as a ‘Box and Whisker’ plot (or box plot for short), popularised by John Tukey in his 1977 book Exploratory Data Analysis. A box plot is a good way of visually representing where most of the data lie in a range. If we put the above data into a box and whisker plot we would get the following figure.

Firstly, the consumer who ranked very high (95 mg) can be described as an outlier. That is, they are so different from the rest of the data that they fall beyond the reach of the whiskers in our box and whisker plot. The outlier is shown as a single point that lies outside the range where the majority of the data lie.

We see that the plot also contains a ‘box’ and two ‘whiskers’, one protruding from each end of the box. Together, the box and whiskers split the data into four ranges, each holding roughly a quarter of the observations, which show where most of the data lie (excluding the outliers). Note that there are different formulae which can be used to decide which data point(s) are considered outliers, and the whiskers can also be extended to include the outliers if so desired.
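As one illustration of such a formula, the sketch below applies a widely used convention, often attributed to Tukey: a point is flagged as an outlier if it lies more than 1.5 times the interquartile range beyond the box. This is an assumption for illustration only; the figure described here may have been produced with a different rule, and the quartile method chosen ("inclusive") is also just one of several conventions.

```python
from statistics import quantiles

# Product X use in mg for the eleven subjects
use_mg = [5, 7, 10, 12, 15, 16, 20, 25, 35, 45, 95]

# Quartiles mark the edges of the box (q1, q3) and the line inside it (q2).
# method="inclusive" is an assumed convention; other methods give slightly different edges.
q1, q2, q3 = quantiles(use_mg, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: the length of the box

# Assumed outlier rule: beyond 1.5 * IQR from either edge of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in use_mg if x < lower_fence or x > upper_fence]
inside = [x for x in use_mg if lower_fence <= x <= upper_fence]

# Whiskers stop at the most extreme values that are not outliers
whisker_low, whisker_high = min(inside), max(inside)

print(f"box: {q1} to {q3} mg, median {q2} mg")
print(f"whiskers: {whisker_low} to {whisker_high} mg")
print(f"outliers: {outliers}")
```

Under these assumptions the 95 mg consumer is the only point flagged, and the whiskers run from 5 mg to 45 mg.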

From our example, we see that one half of the product use data lie within a narrow range from 5 mg to 16 mg, where 16 mg is the median. The rest of the data, between 16 mg and 45 mg (excluding the outlier), are much more spread out.
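The difference in spread can be checked with a few lines of Python. This is only a sketch: the 45 mg upper limit is taken from the text above as the highest non-outlier value, and the variable names are illustrative.

```python
from statistics import median

# Product X use in mg for the eleven subjects
use_mg = [5, 7, 10, 12, 15, 16, 20, 25, 35, 45, 95]

mid = median(use_mg)        # 16 mg, the line inside the box
lowest = min(use_mg)        # 5 mg, end of the lower whisker
highest_non_outlier = 45    # end of the upper whisker, as stated in the text

# Width of the range covered by each half of the (non-outlier) data
lower_width = mid - lowest
upper_width = highest_non_outlier - mid

print(f"lower half spans {lower_width} mg, upper half spans {upper_width} mg")
```

The upper half covers a range more than twice as wide as the lower half, which is what the longer upper section of the box plot conveys visually.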

In the above example, we were dealing with only 11 subjects from a sample. At Creme Global, we can perform analyses on more than 150 million lines of data from real consumer usages and habits, whether it be exposure to substances/chemicals in food, packaging, cosmetics, pesticides or cleaning products. We inevitably find that our data can become naturally skewed, depending on consumer habits. Hence, the box plot is just one of the tools we use to visually represent data, and how that data is spread.

In the figure below we see a sample output from a Creme Global analysis of the exposure of various body parts to a chemical used in a range of cosmetic and personal care products. Several box plots placed side by side show which body parts are exposed to a wider range of the chemical, and how the data are spread.

This example illustrates that the underarms have a higher dermal exposure per unit surface area (micrograms per square centimetre) than all other body parts. Depending on the chemical and the conditions of use, such a result may be consistent with safe use, or it may warrant further investigation.

It is this type of extensive data analysis and output reporting of pertinent information which ultimately allows people to make better decisions in assessing consumer exposures and their associated benefits or risks.

Check out www.cremeglobal.com for more information on our capabilities and expert models.

Written by Mark Lambe on April 5 2013
