## Chemistry Statistics Lesson 1

This video covers the first lesson in statistics from the viewpoint of a chemist. As chemists, we deal with data every day. We make measurements on chemical systems and interpret them to make inferences about what is going on chemically. In fact, the beginning of the science of modern chemistry is often said to coincide with the use of quantitative measurements by the French scientist Antoine Lavoisier, who worked during the period of the 1760s through 1794. In order to make valid inferences about numerical data, we need to understand how reliable our measurements are. No doubt you already know about random processes that can lead to errors in our measurements, so here we are going to discuss how these errors lead to a distribution of the data and how that impinges on the conclusions that we can draw from our data.

Consider for a moment the measurement of glucose in blood. One's blood sugar level is very important to a person who has diabetes, so diabetic individuals usually have their own devices for monitoring it.

Here is an electrochemical device that measures glucose directly from a drop of blood. This device uses disposable test strips that are inserted into the meter. The user applies a sterile lancet to draw some blood. The tip of the test strip draws up some sample by capillary action.

Suppose you apply this device to a drop of your blood and find that it's a bit on the high side, say 215 milligrams of glucose per deciliter of blood. Is some course of action appropriate? Before doing anything drastic, you might be interested in knowing how good a number that is. That is, how close can you expect that value to be to the true concentration of glucose in your blood?

Maybe you should take another measurement and see what the reproducibility is. So you do so. Ah, this is a little bit lower, but which one do you trust? Now, both measurements were performed according to the prescribed procedure. How about another? Perhaps the average is a good number. In general, we would expect that the more measurements we average, the more reliably the average represents the true value. Let's collect some more values. In fact, if we take many more measurements, say a hundred, we might see them pile up in a classical bell-shaped curve. Now we have a bit more information about a measurement. We can calculate an average, and if only random errors are operating, then we can expect that the average will be a good estimate of the true value.
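The claim that averaging more measurements gives a more reliable result can be checked with a small simulation. The sketch below is only an illustration: the "true" value, the noise level, and the function names are assumptions made up for this example, and the errors are assumed to be purely random and Gaussian.

```python
import random
import statistics

# Hypothetical values chosen only for illustration (not from the lesson).
TRUE_VALUE = 200.0  # assumed "true" glucose concentration, mg/dL
NOISE_SD = 10.0     # assumed spread of the purely random error, mg/dL


def simulate_readings(n, rng):
    """Return n simulated readings that contain Gaussian random error only."""
    return [rng.gauss(TRUE_VALUE, NOISE_SD) for _ in range(n)]


def spread_of_averages(n, trials, rng):
    """Standard deviation of the averages from `trials` experiments of n readings each."""
    averages = [statistics.mean(simulate_readings(n, rng)) for _ in range(trials)]
    return statistics.stdev(averages)


rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (1, 4, 25, 100):
    # The printed spread should shrink as more readings are averaged.
    print(n, round(spread_of_averages(n, trials=2000, rng=rng), 2))
```

Under these assumptions, the averages of 100 readings scatter far less around the true value than single readings do, which is the sense in which the average becomes "more reliable."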

Furthermore, we see that the curve is symmetric about the average, and the width of the curve indicates how reproducible the measurements are. We should keep in mind that the variability in our numbers can be due to some random changes in the measurement process, such as noise in the device, but it also is possible that there are real fluctuations in the glucose level from sample to sample, perhaps from moment to moment. The more reproducible the data, the skinnier the curve. The wider the curve, the less certainty we have that a given measurement is close to the true value.

The good news is that random processes follow the mathematics of probability theory, as described in the 19th century by the "prince of mathematicians," Carl Friedrich Gauss. We can make some generalizations about the distribution of our data, assuming that it follows a normal curve, or Gaussian distribution. In general, we graph our measurements with the measured value on the horizontal axis and the frequency with which a given value is observed on the vertical axis. If we record a very large number of measurements, perhaps hundreds, then the data will be symmetrically distributed about the average. For such a large data set, we call the average the population average and represent it with the Greek letter mu (μ). We will convey to others the width of our distribution curve by stating the distance between the average value and the measured value at the inflection point on the curve.

This distance is called the standard deviation for the data set. Note that it has the same units as the measured value.
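To see why the inflection point is a natural marker for the width, it may help to write the normal curve out explicitly. The sketch below assumes the standard textbook form of the Gaussian, with $\mu$ for the average and $\sigma$ for the width parameter; the lesson itself does not write this formula out. Setting the second derivative to zero locates the inflection points:

$$
\begin{aligned}
f(x) &= \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \\
f''(x) &= \frac{f(x)}{\sigma^2}\left(\frac{(x-\mu)^2}{\sigma^2} - 1\right) = 0
\quad\Longrightarrow\quad x = \mu \pm \sigma
\end{aligned}
$$

So, for this assumed form, the curve changes from curving downward to curving upward exactly one $\sigma$ on either side of the average.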

We use the Greek letter sigma (σ) to represent the standard deviation of a very large population. So, for a quick review: mu equals the population average, the average of a very large number of measurements, and sigma equals the population standard deviation. Gauss showed that the population standard deviation can be calculated from this equation:

$$ \sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}} $$

where n is the number of measurements, or data points, in the set.

Now, in chemical analysis settings, we rarely have the luxury of taking hundreds of measurements to characterize a single system. It is usually not a practical use of time and resources. Think about that for a second: would you want to prick your finger 200 times to determine your current glucose level? So usually we're dealing with a much smaller data set. We can calculate an average and a standard deviation for a small set. It turns out that the following equation yields a much better estimate of the larger population standard deviation:

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$

We use an s to represent the sample standard deviation, and x-bar is the average for the sample set.

Before closing this lesson, let's look at some other useful terms that we should be familiar with. The square of the standard deviation shows up in lots of equations; it is often called the variance for the data set. We also note that the standard deviation is often what people are referring to when they talk about the random error or uncertainty in the data.
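As a worked illustration, the population and sample forms of the standard deviation differ only in the divisor (n versus n - 1, following the usual textbook convention). The sketch below uses made-up glucose readings and hypothetical function names; it is not data from the lesson.

```python
import math
import statistics

# Hypothetical glucose readings in mg/dL, made up for illustration.
readings = [215.0, 208.0, 211.0, 219.0, 207.0]


def population_sd(values):
    """Divide by n: appropriate when `values` is the whole (very large) population."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def sample_sd(values):
    """Divide by n - 1: the small-sample estimate of the population value."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


sigma = population_sd(readings)
s = sample_sd(readings)
variance = s ** 2  # the variance is the square of the standard deviation

print(f"mean            = {statistics.mean(readings):.1f} mg/dL")
print(f"divide by n     = {sigma:.2f} mg/dL")
print(f"divide by n - 1 = {s:.2f} mg/dL")
print(f"variance (s^2)  = {variance:.1f} (mg/dL)^2")
```

For these made-up readings the average is 212 mg/dL and s works out to 5 mg/dL, slightly larger than the divide-by-n value. Python's standard library offers the same two calculations as `statistics.pstdev` and `statistics.stdev`.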

Finally, it is also useful to think about how big the error is with respect to the measured quantity, so we define the relative error as the ratio of the standard deviation to the average.
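A minimal sketch of that ratio, again using made-up readings and a hypothetical helper name, and assuming the sample standard deviation is the one intended for a small data set:

```python
import statistics

# Hypothetical glucose readings in mg/dL, made up for illustration.
readings = [215.0, 208.0, 211.0, 219.0, 207.0]


def relative_error(values):
    """Ratio of the sample standard deviation to the average (unitless)."""
    return statistics.stdev(values) / statistics.mean(values)


rel = relative_error(readings)
print(f"relative error = {rel:.4f} ({100 * rel:.1f} %)")
```

Because the standard deviation and the average carry the same units, the ratio is unitless and is often quoted as a percentage.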