LIMIT AND DERIVATIVE
MATHEMATICS EDUCATION DEPARTMENT
FACULTY OF MATHEMATICS AND NATURAL SCIENCE
GANESHA UNIVERSITY OF EDUCATION
SINGARAJA
2011
DEFINITION OF LIMIT
The
concept of a "limit" is used to describe the value that
a function or sequence "approaches" as the input or index
approaches some value. The concept of limit allows one to, in a complete space, define a new point from a Cauchy sequence of previously defined points. Limits are
essential to calculus (and mathematical
analysis in
general) and are used to define continuity, derivatives and integrals.
The concept of a limit of a sequence is further generalized to the
concept of a limit of a topological net, and is closely related to limit and direct limit in category theory.
In formulas, a limit is usually abbreviated as lim, as in lim(a_n) = a, or represented by the right arrow (→), as in a_n → a.
Limit of a function
[Figure: whenever a point x is within δ units of c, f(x) is within ε units of L.]
[Figure: for all x > S, f(x) is within ε of L.]
Suppose f(x) is a real-valued function and c is a real number. The expression

    lim_{x→c} f(x) = L

means that f(x) can be made to be as close to L as desired by making x sufficiently close to c. In that case, it can be stated that "the limit of f of x, as x approaches c, is L". Note that this statement can be true even if f(c) ≠ L. Indeed, the function f(x) need not even be defined at c.
For example, if

    f(x) = (x² − 1)/(x − 1),

then f(1) is not defined, yet as x approaches 1, f(x) approaches 2:
| f(0.9) | f(0.99) | f(0.999) | f(1.0)     | f(1.001) | f(1.01) | f(1.1) |
| 1.900  | 1.990   | 1.999    | ⇒ undef ⇐ | 2.001    | 2.010   | 2.100  |
Thus, f(x) can be made arbitrarily close
to the limit of 2 just by making x sufficiently close to 1.
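The table values can be reproduced with a short computation; a minimal Python sketch, using the function from the example above:

    # f(x) = (x**2 - 1)/(x - 1) is undefined at x = 1 but approaches 2 there.
    def f(x):
        return (x**2 - 1) / (x - 1)

    for x in [0.9, 0.99, 0.999, 1.001, 1.01, 1.1]:
        print(x, f(x))   # the outputs approach 2 as x approaches 1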
Karl Weierstrass formalized the definition of the limit of a function
into what became known as the (ε, δ)-definition of
limit in
the 19th century.
In addition to limits at finite values, functions can also have limits at infinity. For example, consider the function f(x) = (2x − 1)/x:
§ f(100) = 1.9900
§ f(1000) = 1.9990
§ f(10000) = 1.9999
As x becomes extremely large, the value of f(x) approaches 2, and the value of f(x) can be made as close to 2 as one could wish just by picking x sufficiently large. In this case, the limit of f(x) as x approaches infinity is 2. In mathematical notation,

    lim_{x→∞} f(x) = 2.
Limit of a sequence
Consider the following sequence: 1.79, 1.799, 1.7999,... It
can be observed that the numbers are "approaching" 1.8, the limit of
the sequence.
Formally, suppose x_1, x_2, ... is a sequence of real numbers. It can be stated that the real number L is the limit of this sequence, namely:

    L = lim_{n→∞} x_n,

to mean: for every real number ε > 0, there exists a natural number n_0 such that for all n > n_0, |x_n − L| < ε.
Intuitively, this means that eventually all elements of the sequence get as close as needed to the limit, since the absolute value |x_n − L| is the distance between x_n and L. Not every sequence has a limit; if it does, it is called convergent, and if it does not, it is divergent. One can show that a convergent sequence has only one limit.
The limit of a sequence and the limit of a function are closely related. On one hand, the limit of a sequence is simply the limit at infinity of a function defined on the natural numbers. On the other hand, the limit of a function f at x, if it exists, is the same as the limit of the sequence f(a_n), where a_n is any sequence whose limit is x and whose terms are never equal to x. Note that one such sequence is x + 1/n.
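The ε–n_0 definition can be illustrated numerically; a minimal Python sketch, assuming the closed form x_n = 1.8 − 10^(−(n+1)), which reproduces the terms 1.79, 1.799, 1.7999, ... above:

    # For a given eps, find an index n0 with |x_n - 1.8| < eps.
    def x(n):
        return 1.8 - 10**(-(n + 1))   # assumed closed form for the sequence

    eps = 5e-5
    n0 = next(n for n in range(1, 100) if abs(x(n) - 1.8) < eps)
    # all later terms also stay within eps, since |x_n - 1.8| = 10**(-(n+1)) shrinks
    print(n0, abs(x(n0) - 1.8))   # prints 4 and roughly 1e-05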
Convergence and fixed point
A formal definition of convergence can be stated as follows.
Suppose {p_n}, n = 0, 1, 2, ..., is a sequence that converges to p, with p_n ≠ p for all n. If positive constants λ and α exist with

    lim_{n→∞} |p_{n+1} − p| / |p_n − p|^α = λ,

then {p_n} converges to p of order α, with asymptotic error constant λ.
Given a function f with a fixed point p (that is, f(p) = p) and the iteration p_{n+1} = f(p_n), there is a useful checklist for checking the order of convergence to p.
1) First check that p is indeed a fixed point: f(p) = p.
2) Check for linear convergence. Start by finding |f′(p)|. If 0 < |f′(p)| < 1, the iteration converges linearly; if f′(p) = 0, the convergence is better than linear.
3) If it is found that there is something better than linear convergence, the expression should be checked for quadratic convergence. Start by finding f′′(p). Then:

| f′′(p) ≠ 0            | there is quadratic convergence, provided that f′′ is continuous         |
| f′′(p) = 0            | there is something even better than quadratic convergence               |
| f′′(p) does not exist | there is convergence that is better than linear but still not quadratic |
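This checklist can be tried numerically. A minimal Python sketch, using Newton's iteration for √2 as an illustrative fixed-point function (f(x) = (x + 2/x)/2 is an assumption chosen for the demo; it satisfies f(p) = p and f′(p) = 0 at p = √2, so quadratic convergence is expected):

    import math

    def f(x):
        return (x + 2 / x) / 2   # demo choice: fixed point p = sqrt(2), f'(p) = 0

    p = math.sqrt(2)
    p_n = 1.0
    errors = []
    for _ in range(4):
        p_n = f(p_n)
        errors.append(abs(p_n - p))

    # For order alpha = 2, the ratio e_{n+1}/e_n**2 approaches a constant lambda.
    for e_prev, e_next in zip(errors, errors[1:]):
        if e_prev > 0 and e_next > 0:
            print(e_next / e_prev**2)   # settles near 1/(2*sqrt(2)) ~ 0.354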
Derivative
The derivative is a
measure of how a function changes as its input changes.
Loosely speaking, a derivative can be thought of as how much one quantity is
changing in response to changes in some other quantity; for example, the
derivative of the position of a moving object with respect to time is the
object's instantaneous velocity (conversely, integrating a car's velocity over time
yields the distance traveled).
The derivative of a function at
a chosen input value describes the best linear approximation of the function near that
input value. For a real-valued function of a single real
variable, the derivative at a point equals the slope of the tangent
line to the graph of the function at that point. In higher
dimensions, the derivative of a function at a point is a linear transformation called the linearization.[1]
A closely related notion is the differential of a function.
The process of finding a
derivative is called differentiation. The reverse process is called antidifferentiation.
The fundamental theorem of calculus
states that antidifferentiation is the same as integration.
Differentiation and integration constitute the two fundamental operations in
single-variable calculus.
Differentiation and the derivative
Differentiation is a
method to compute the rate at which a dependent output y changes with
respect to the change in the independent input x. This rate of change is
called the derivative of y with respect to x. In more
precise language, the dependence of y upon x means that y
is a function of x. This functional
relationship is often denoted y = ƒ(x), where ƒ
denotes the function. If x and y are real
numbers, and if the graph of y is plotted against x,
the derivative measures the slope of this graph at each point.
The simplest case is when y
is a linear function of x, meaning that the graph
of y against x is a straight line. In this case, y = ƒ(x)
= m x + b, for real numbers m and b, and the
slope m is given by

    m = Δy / Δx,

where the symbol Δ (the uppercase form of the Greek letter delta) is an abbreviation for "change in." This formula is true because

    y + Δy = f(x + Δx) = m(x + Δx) + b = mx + b + mΔx = y + mΔx.

It follows that Δy = mΔx.
This gives an exact value for
the slope of a straight line. If the function ƒ is not linear (i.e. its
graph is not a straight line), however, then the change in y divided by
the change in x varies: differentiation is a method to find an exact
value for this rate of change at any given value of x.
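A short Python sketch of this idea: for a nonlinear function the quotient Δy/Δx depends on Δx, but it settles toward a single number as Δx shrinks (f(x) = x² at x = 3 is an illustrative choice, anticipating the worked example later in this section):

    # Difference quotient (f(x + dx) - f(x)) / dx for f(x) = x**2 at x = 3.
    def f(x):
        return x**2

    x = 3.0
    for dx in [1.0, 0.1, 0.01, 0.001]:
        print(dx, (f(x + dx) - f(x)) / dx)   # 7.0, 6.1, 6.01, 6.001 -> tends to 6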
Rate of change as a limiting value
Figure 1. The tangent line at (x, f(x)).
Figure 2. The secant to the curve y = f(x) determined by the points (x, f(x)) and (x + h, f(x + h)).
Figure 3. The tangent line as the limit of secants.
The idea, illustrated by Figures
1-3, is to compute the rate of change as the limiting value of the ratio of the differences Δy / Δx
as Δx becomes infinitely small.
In Leibniz's notation, such an infinitesimal change in x is denoted by dx, and the derivative of y with respect to x is written

    dy/dx,

suggesting the ratio of two infinitesimal quantities. (The above expression is read as "the derivative of y with respect to x", "dy by dx", or "dy over dx". The oral form "dy dx" is often used conversationally, although it may lead to confusion.)
The most common approach
to turn this intuitive idea into a precise definition uses limits, but there
are other methods, such as non-standard analysis.
Definition via difference quotients
Let ƒ be a real-valued
function. In classical geometry, the tangent line to the graph of the function ƒ
at a real number a was the unique line through the point (a, ƒ(a))
that did not meet the graph of ƒ transversally, meaning that the line
did not pass straight through the graph. The derivative of y with
respect to x at a is, geometrically, the slope of the tangent
line to the graph of ƒ at a. The slope of the tangent line is
very close to the slope of the line through (a, ƒ(a)) and
a nearby point on the graph, for example (a + h, ƒ(a
+ h)). These lines are called secant
lines. A value of h close to zero gives a good approximation to the
slope of the tangent line, and smaller values (in absolute
value) of h will, in general, give better approximations.
The slope m of the secant line is the difference between the y values of these points divided by the difference between the x values, that is,

    m = (f(a + h) − f(a)) / ((a + h) − a) = (f(a + h) − f(a)) / h.
This expression is Newton's
difference quotient. The derivative is the
value of the difference quotient as the secant lines approach the tangent line.
Formally, the derivative of the function f at a is the limit

    f′(a) = lim_{h→0} (f(a + h) − f(a)) / h

of the difference quotient as h approaches zero, if this limit exists. If the limit exists, then f is differentiable at a. Here f′(a) is one of several common notations for the derivative (see below).
Equivalently, the derivative satisfies the property that

    lim_{h→0} (f(a + h) − f(a) − f′(a)·h) / h = 0,

which has the intuitive interpretation (see Figure 1) that the tangent line to f at a gives the best linear approximation

    f(a + h) ≈ f(a) + f′(a)·h

to f near a (i.e., for small h). This interpretation is the easiest to generalize to other settings.
Substituting 0 for h in
the difference quotient causes division
by zero, so the slope of the tangent line cannot be found directly using
this method. Instead, define Q(h) to be the difference quotient
as a function of h:

    Q(h) = (f(a + h) − f(a)) / h.
Q(h) is the slope
of the secant line between (a, ƒ(a)) and (a + h,
ƒ(a + h)). If ƒ is a continuous function, meaning that its graph is
an unbroken curve with no gaps, then Q is a continuous function away
from the point h = 0. If the limit lim_{h→0} Q(h) exists,
meaning that there is a way of choosing a value for Q(0) that makes the
graph of Q a continuous function, then the function ƒ is
differentiable at the point a, and its derivative at a equals Q(0).
In practice, the existence of a
continuous extension of the difference quotient Q(h) to h =
0 is shown by modifying the numerator to cancel h in the denominator.
This process can be long and tedious for complicated functions, and many
shortcuts are commonly used to simplify the process.
Example
The squaring function ƒ(x)
= x² is differentiable at x = 3, and its derivative there is 6.
This result is established by calculating the limit, as h approaches zero, of the difference quotient of f at 3:

    f′(3) = lim_{h→0} ((3 + h)² − 9) / h = lim_{h→0} (6h + h²) / h = lim_{h→0} (6 + h) = 6.
The last expression shows that
the difference quotient equals 6 + h when h ≠ 0 and is
undefined when h = 0, because of the definition of the difference
quotient. However, the definition of the limit says the difference quotient
does not need to be defined when h = 0. Hence the slope of the graph of
the squaring function at the point (3, 9) is 6, and so its derivative at x = 3 is f′(3) = 6.
More generally, a similar computation shows that the derivative of the squaring function at x = a is f′(a) = 2a.
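The same limit can be checked symbolically; a minimal sketch with the sympy library:

    # Verify that the difference quotient of the squaring function at 3 tends to 6.
    import sympy as sp

    x, h = sp.symbols('x h')
    Q = ((3 + h)**2 - 3**2) / h        # equals 6 + h for h != 0
    print(sp.limit(Q, h, 0))           # 6
    print(sp.diff(x**2, x))            # 2*x, matching f'(a) = 2a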
Continuity and differentiability
[Figure: this function does not have a derivative at the marked point, as the function is not continuous there.]
If y = ƒ(x)
is differentiable at a, then ƒ
must also be continuous at a. As an example, choose a
point a and let ƒ be the step
function that returns a value, say 1, for all x less than a,
and returns a different value, say 10, for all x greater than or equal
to a. ƒ cannot have a derivative at a. If h is
negative, then a + h is on the low part of the step, so the
secant line from a to a + h is very steep, and as h
tends to zero the slope tends to infinity. If h is positive, then a
+ h is on the high part of the step, so the secant line from a to
a + h has slope zero. Consequently the secant lines do not
approach any single slope, so the limit of the difference quotient does not
exist.
[Figure: the absolute value function is continuous, but fails to be differentiable at x = 0, since the tangent slopes do not approach the same value from the left as they do from the right.]
However, even if a function is
continuous at a point, it may not be differentiable there. For example, the absolute
value function y = |x| is continuous at x = 0, but it
is not differentiable there. If h is positive, then the slope of the
secant line from 0 to h is one, whereas if h is negative,
then the slope of the secant line from 0 to h is negative one.
This can be seen graphically as a "kink" or a "cusp" in the
graph at x = 0. Even a function with a smooth graph is not
differentiable at a point where its tangent
is vertical: for instance, the function y = x^{1/3} (the cube root of x) is not differentiable at x = 0.
In summary: for a function ƒ
to have a derivative it is necessary for the
function ƒ to be continuous, but continuity alone is not sufficient.
Most functions that occur in
practice have derivatives at all points or at almost
every point. Early in the history of calculus, many mathematicians assumed
that a continuous function was differentiable at most points. Under mild
conditions, for example if the function is a monotone
function or a Lipschitz function, this is true. However, in
1872 Weierstrass found the first example of a function that is continuous
everywhere but differentiable nowhere. This example is now known as the Weierstrass function. In 1931, Stefan
Banach proved that the set of functions that have a derivative at some
point is a meager
set in the space of all continuous functions.[5]
Informally, this means that hardly any continuous functions have a derivative
at even one point.
The derivative as a function
Let ƒ be a function that
has a derivative at every point a in the domain of ƒ. Because every point a
has a derivative, there is a function that sends the point a to the
derivative of ƒ at a. This function is written f′(x) and
is called the derivative function or the derivative of ƒ.
The derivative of ƒ collects all the derivatives of ƒ at all the
points in the domain of ƒ.
Sometimes ƒ has a
derivative at most, but not all, points of its domain. The function whose value
at a equals f′(a) whenever f′(a) is defined and elsewhere
is undefined is also called the derivative of ƒ. It is still a function,
but its domain is strictly smaller than the domain of ƒ.
Using this idea, differentiation
becomes a function of functions: The derivative is an operator whose domain is the set of all
functions that have derivatives at every point of their domain and whose range
is a set of functions. If we denote this operator by D, then D(ƒ)
is the function f′(x). Since D(ƒ) is a function, it
can be evaluated at a point a. By the definition of the derivative
function, D(ƒ)(a) = f′(a).
For comparison, consider the doubling function f(x) = 2x; f is a real-valued function of a real number, meaning that it takes numbers as inputs and has numbers as outputs:

    1 ↦ 2,   2 ↦ 4,   3 ↦ 6.

The operator D, however, is not defined on individual numbers. It is only defined on functions:

    D(x ↦ 1) = (x ↦ 0),   D(x ↦ x) = (x ↦ 1),   D(x ↦ x²) = (x ↦ 2x).

Because the output of D is a function, the output of D can be evaluated at a point. For instance, when D is applied to the squaring function x ↦ x², D outputs the doubling function x ↦ 2x, which we named f(x). This output function can then be evaluated to get f(1) = 2, f(2) = 4, and so on.
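The operator point of view can be sketched in Python, where D is an ordinary higher-order function (the step size 1e-6 is an arbitrary illustrative choice, giving a numerical approximation rather than the exact derivative):

    # D maps a function to an approximation of its derivative function.
    def D(f, h=1e-6):
        return lambda x: (f(x + h) - f(x)) / h

    square = lambda x: x**2
    double = D(square)            # D applied to squaring gives (roughly) doubling
    print(double(1), double(2))   # close to 2 and 4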
Higher derivatives
Let ƒ be a differentiable
function, and let f′(x) be its derivative. The derivative of f′(x)
(if it has one) is written f′′(x) and is called the second
derivative of ƒ. Similarly, the derivative of a second
derivative, if it exists, is written f′′′(x) and is called the third
derivative of ƒ. These repeated derivatives are called higher-order
derivatives.
If x(t) represents
the position of an object at time t, then the higher-order derivatives
of x have physical interpretations. The second derivative of x is
the derivative of x′(t), the velocity, and by definition this is
the object's acceleration. The third derivative of x is
defined to be the jerk, and the fourth derivative is defined to be the
jounce.
A function ƒ need not
have a derivative, for example, if it is not continuous. Similarly, even if ƒ
does have a derivative, it may not have a second derivative. For example, let

    f(x) = x·|x|, that is, f(x) = x² for x ≥ 0 and f(x) = −x² for x < 0.

Calculation shows that f is a differentiable function whose derivative is

    f′(x) = 2|x|.

f′(x) is twice the absolute value function, and it does not have a derivative at zero. Similar
examples show that a function can have k derivatives for any
non-negative integer k but no (k + 1)-order derivative. A
function that has k successive derivatives is called k times
differentiable. If in addition the kth derivative is continuous,
then the function is said to be of differentiability class Ck.
(This is a stronger condition than having k derivatives. For an example,
see differentiability class.) A function that
has infinitely many derivatives is called infinitely differentiable or smooth.
On the real line, every polynomial function is infinitely
differentiable. By standard differentiation rules, if a polynomial of
degree n is differentiated n times, then it becomes a constant
function. All of its subsequent derivatives are identically zero. In
particular, they exist, so polynomials are smooth functions.
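This behavior of polynomials is easy to confirm symbolically; a minimal sympy sketch:

    # Differentiating a degree-3 polynomial three times gives a constant;
    # every further derivative is identically zero.
    import sympy as sp

    x = sp.symbols('x')
    p = 5*x**3 - 2*x**2 + 7
    print(sp.diff(p, x, 3))   # 30
    print(sp.diff(p, x, 4))   # 0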
The derivatives of a function ƒ
at a point x provide polynomial approximations to that function near x.
For example, if f is twice differentiable, then

    f(x + h) ≈ f(x) + f′(x)h + ½f′′(x)h²,

in the sense that

    lim_{h→0} (f(x + h) − f(x) − f′(x)h − ½f′′(x)h²) / h² = 0.
If ƒ is infinitely
differentiable, then this is the beginning of the Taylor
series for ƒ.
Inflection point
A point where the second
derivative of a function changes sign is called an inflection point. At
an inflection point, the second derivative may be zero, as in the case of the
inflection point x=0 of the function y=x3, or
it may fail to exist, as in the case of the inflection point x=0 of the
function y=x1/3. At an inflection point, a function
switches from being a convex function to being a concave
function or vice versa.
Notations for differentiation
Leibniz's notation
The notation for derivatives
introduced by Gottfried Leibniz is one of the earliest. It is
still commonly used when the equation y = ƒ(x) is
viewed as a functional relationship between dependent and independent variables.
Then the first derivative is denoted by

    dy/dx

and was once thought of as an infinitesimal quotient. Higher derivatives are expressed using the notation

    dⁿy/dxⁿ

for the nth derivative of y = f(x) (with respect to x). These are abbreviations for multiple applications of the derivative operator. For example,

    d²y/dx² = d/dx (dy/dx).

With Leibniz's notation, we can write the derivative of y at the point x = a in two different ways:

    dy/dx |_{x=a}   or   (dy/dx)(a).
Leibniz's notation allows one to specify the variable for differentiation (in the denominator). This is especially relevant for partial differentiation. It also makes the chain rule easy to remember:

    dy/dx = (dy/du) · (du/dx).
Lagrange's notation
Sometimes referred to as prime
notation, one of the most common modern notations for differentiation is
due to Joseph-Louis Lagrange and uses the prime
mark, so that the derivative of a function ƒ(x) is denoted ƒ′(x)
or simply f′. Similarly, the second and third derivatives are denoted

    (f′)′ = f′′   and   (f′′)′ = f′′′.

To denote the number of derivatives beyond this point, some authors use Roman numerals in superscript, whereas others place the number in parentheses:

    f^{iv}(x)   or   f^{(4)}(x).
The latter notation generalizes
to yield the notation f^{(n)} for the nth derivative of
ƒ — this notation is most useful when we wish to talk about the derivative as
being a function itself, as in this case the Leibniz notation can become
cumbersome.
Newton's notation
Newton's notation for differentiation, also
called the dot notation, places a dot over the function name to represent a
time derivative. If y = f(t), then

    ẏ   and   ÿ
denote, respectively, the first
and second derivatives of y with respect to t. This notation is
used exclusively for time derivatives, meaning that the independent
variable of the function represents time. It is very common in physics and in
mathematical disciplines connected with physics such as differential equations. While the notation
becomes unmanageable for high-order derivatives, in practice only very few
derivatives are needed.
Euler's notation
Euler's
notation uses a differential operator D, which is
applied to a function ƒ to give the first derivative Df. The
second derivative is denoted D²f, and the nth derivative is denoted Dⁿf.
If y = ƒ(x)
is a dependent variable, then often the subscript x is attached to the D
to clarify the independent variable x. Euler's notation is then written

    D_x y   or   D_x f(x),
although this subscript is often
omitted when the variable x is understood, for instance when this is the
only variable present in the expression.
Euler's notation is useful for
stating and solving linear differential equations.
Computing the derivative
The derivative of a function
can, in principle, be computed from the definition by considering the
difference quotient, and computing its limit. In practice, once the derivatives
of a few simple functions are known, the derivatives of other functions are more
easily computed using rules for obtaining derivatives of more
complicated functions from simpler ones.
Derivatives of elementary functions
Most derivative computations
eventually require taking the derivative of some common functions. The
following incomplete list gives some of the most frequently used functions of a
single real variable and their derivatives.
- Derivatives of powers: if

    f(x) = x^r,

where r is any real number, then

    f′(x) = r·x^{r−1},

wherever this function is defined. For example, if f(x) = x^{1/4}, then

    f′(x) = (1/4)·x^{−3/4},

and the derivative function is defined only for positive x, not for x = 0. When r = 0, this rule implies that f′(x) is zero for x ≠ 0, which is almost the constant rule (stated below).
- Exponential and logarithmic functions:

    (e^x)′ = e^x,   (a^x)′ = a^x·ln(a)   (a > 0),
    (ln x)′ = 1/x   (x > 0),   (log_a x)′ = 1/(x·ln a)   (a > 0, a ≠ 1, x > 0).

- Trigonometric functions:

    (sin x)′ = cos x,   (cos x)′ = −sin x,   (tan x)′ = sec² x.

- Inverse trigonometric functions:

    (arcsin x)′ = 1/√(1 − x²),   (arccos x)′ = −1/√(1 − x²),   (arctan x)′ = 1/(1 + x²).
Rules for finding the derivative
In many cases, complicated limit
calculations by direct application of Newton's difference quotient can be
avoided using differentiation rules. Some of the most basic rules are the
following.
- Constant rule: if f(x) is constant, then

    f′(x) = 0.

- Sum rule:

    (a·f + b·g)′ = a·f′ + b·g′

for all functions f and g and all real numbers a and b.
- Product rule:

    (f·g)′ = f′·g + f·g′

for all functions f and g.
- Quotient rule:

    (f/g)′ = (f′·g − f·g′)/g²

for all functions f and g where g ≠ 0.
- Chain rule: if f(x) = h(g(x)), then

    f′(x) = h′(g(x))·g′(x).
Example computation
The derivative of

    f(x) = x⁴ + sin(x²) − ln(x)·e^x + 7

is

    f′(x) = 4x³ + 2x·cos(x²) − (1/x)·e^x − ln(x)·e^x.

Here the second term was computed using the chain rule and the third using the product rule. The known derivatives of the elementary functions x², x⁴, sin(x), ln(x) and exp(x) = e^x, as well as the constant 7, were also used.
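The computation can be confirmed with symbolic differentiation; a minimal sympy sketch:

    # Check the example derivative symbolically.
    import sympy as sp

    x = sp.symbols('x', positive=True)
    f = x**4 + sp.sin(x**2) - sp.ln(x)*sp.exp(x) + 7
    print(sp.diff(f, x))
    # 4*x**3 + 2*x*cos(x**2) - exp(x)*log(x) - exp(x)/x  (up to term order)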
Derivatives in higher dimensions
Derivatives of vector-valued functions
A vector-valued function y(t) of
a real variable sends real numbers to vectors in some vector
space Rn. A vector-valued function can be split up
into its coordinate functions y1(t), y2(t),
…, yn(t), meaning that y(t) = (y1(t),
..., yn(t)). This includes, for example, parametric
curves in R2 or R3. The coordinate
functions are real valued functions, so the above definition of derivative
applies to them. The derivative of y(t) is defined to be the vector, called the tangent vector, whose coordinates
are the derivatives of the coordinate functions. That is,

    y′(t) = (y_1′(t), ..., y_n′(t)).

Equivalently,

    y′(t) = lim_{h→0} (y(t + h) − y(t)) / h,

if the limit exists. The
subtraction in the numerator is subtraction of vectors, not scalars. If the
derivative of y exists for every value of t, then y′ is
another vector valued function.
If e1, …, en
is the standard basis for Rn, then y(t)
can also be written as y1(t)e1 + … +
yn(t)en. If we assume that
the derivative of a vector-valued function retains the linearity property, then the
derivative of y(t) must be

    y′(t) = y_1′(t)e_1 + ... + y_n′(t)e_n,
because each of the basis
vectors is a constant.
This generalization is useful,
for example, if y(t) is the position vector of a particle at time
t; then the derivative y′(t) is the velocity vector
of the particle at time t.
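A numeric sketch of a tangent vector, using the unit circle as an illustrative curve (not from the text):

    # y(t) = (cos t, sin t); the exact tangent vector is y'(t) = (-sin t, cos t).
    import math

    def y(t):
        return (math.cos(t), math.sin(t))

    def tangent(t, h=1e-6):
        # componentwise difference quotient; the subtraction is vector subtraction
        return tuple((a - b) / h for a, b in zip(y(t + h), y(t)))

    print(tangent(1.0))                    # close to (-0.8415, 0.5403)
    print(-math.sin(1.0), math.cos(1.0))   # exact components for comparison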
Partial derivatives
Suppose that f is a function that depends on more than one variable. For instance,

    f(x, y) = x² + xy + y².

f can be reinterpreted as a family of functions of one variable indexed by the other variables:

    f(x, y) = f_x(y) = x² + xy + y².

In other words, every value of x chooses a function, denoted f_x, which is a function of one real number.[9] That is,

    x ↦ f_x,   where f_x(y) = x² + xy + y².
Once a value of x is chosen, say a, then f(x, y) determines a function f_a that sends y to a² + ay + y²:

    f_a(y) = a² + ay + y².

In this expression, a is a constant, not a variable, so f_a is a function of only one real variable. Consequently the definition of the derivative for a function of one variable applies:

    f_a′(y) = a + 2y.
The above procedure can be
performed for any choice of a. Assembling the derivatives together into
a function gives a function that describes the variation of ƒ in the y
direction:

    ∂f/∂y (x, y) = x + 2y.
This is the partial derivative
of ƒ with respect to y. Here ∂ is a rounded d called the partial
derivative symbol. To distinguish it from the letter d, ∂ is
sometimes pronounced "der", "del", or "partial"
instead of "dee".
In general, the partial derivative of a function f(x_1, ..., x_n) in the direction x_i at the point (a_1, ..., a_n) is defined to be:

    ∂f/∂x_i (a_1, ..., a_n) = lim_{h→0} (f(a_1, ..., a_i + h, ..., a_n) − f(a_1, ..., a_n)) / h.

In the above difference quotient, all the variables except x_i are held fixed. That choice of fixed values determines a function of one variable,

    f_{a_1, ..., a_{i−1}, a_{i+1}, ..., a_n}(x_i) = f(a_1, ..., a_{i−1}, x_i, a_{i+1}, ..., a_n),

and, by definition,

    d f_{a_1, ..., a_{i−1}, a_{i+1}, ..., a_n} / d x_i (a_i) = ∂f/∂x_i (a_1, ..., a_n).
In other words, the different
choices of a index a family of one-variable functions just as in the
example above. This expression also shows that the computation of partial
derivatives reduces to the computation of one-variable derivatives.
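A numeric check of the partial derivative just computed, reusing f(x, y) = x² + xy + y²:

    # Partial derivative in y: vary y, hold x fixed.
    def f(x, y):
        return x**2 + x*y + y**2

    def df_dy(x, y, h=1e-6):
        return (f(x, y + h) - f(x, y)) / h

    print(df_dy(3.0, 2.0))   # close to x + 2y = 7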
An important example of a
function of several variables is the case of a scalar-valued function ƒ(x1,...xn)
on a domain in Euclidean space Rn (e.g., on R²
or R³). In this case ƒ has a partial derivative ∂ƒ/∂xj
with respect to each variable x_j. At the point a, these partial derivatives define the vector

    ∇f(a) = (∂f/∂x_1(a), ..., ∂f/∂x_n(a)).
This vector is called the gradient
of ƒ at a. If ƒ is differentiable at every point in some
domain, then the gradient is a vector-valued function ∇ƒ
that takes the point a to the vector ∇f(a). Consequently the
gradient determines a vector field.
Directional derivatives
If ƒ is a real-valued
function on Rn, then the partial derivatives of ƒ
measure its variation in the direction of the coordinate axes. For example, if ƒ
is a function of x and y, then its partial derivatives measure
the variation in ƒ in the x direction and the y direction.
They do not, however, directly measure the variation of ƒ in any other
direction, such as along the diagonal line y = x. These are
measured using directional derivatives. Choose a vector

    v = (v_1, ..., v_n).

The directional derivative of f in the direction of v at the point x is the limit

    D_v f(x) = lim_{h→0} (f(x + hv) − f(x)) / h.
In some cases it may be easier
to compute or estimate the directional derivative after changing the length of
the vector. Often this is done to turn the problem into the computation of a
directional derivative in the direction of a unit vector. To see how this
works, suppose that v = λu. Substitute h = k/λ into the difference quotient. The difference quotient becomes:

    (f(x + (k/λ)·(λu)) − f(x)) / (k/λ) = λ · (f(x + ku) − f(x)) / k.
This is λ times the difference
quotient for the directional derivative of f with respect to u.
Furthermore, taking the limit as h tends to zero is the same as taking
the limit as k tends to zero because h and k are multiples
of each other. Therefore D_v(f) = λ·D_u(f).
Because of this rescaling property, directional derivatives are frequently
considered only for unit vectors.
If all the partial derivatives
of ƒ exist and are continuous at x, then they determine the
directional derivative of f in the direction v by the formula:

    D_v f(x) = Σ_{j=1}^{n} v_j · (∂f/∂x_j)(x).
This is a consequence of the
definition of the total derivative. It follows that the directional
derivative is linear in v, meaning that D_{v+w}(f) = D_v(f) + D_w(f).
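The gradient formula can be checked numerically against the limit definition, again with f(x, y) = x² + xy + y²:

    # Directional derivative two ways: limit definition vs. gradient formula.
    def f(x, y):
        return x**2 + x*y + y**2

    def grad(x, y, h=1e-6):
        return ((f(x + h, y) - f(x, y)) / h,
                (f(x, y + h) - f(x, y)) / h)

    x0, y0, v = 3.0, 2.0, (1.0, 2.0)
    h = 1e-6
    direct = (f(x0 + h*v[0], y0 + h*v[1]) - f(x0, y0)) / h
    gx, gy = grad(x0, y0)
    print(direct, gx*v[0] + gy*v[1])   # both close to 8*1 + 7*2 = 22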
The same definition also works
when ƒ is a function with values in Rm. The above
definition is applied to each component of the vectors. In this case, the
directional derivative is a vector in Rm.
The total derivative, the total differential and the Jacobian
When ƒ is a function from
an open subset of Rn to Rm,
then the directional derivative of ƒ in a chosen direction is the best
linear approximation to ƒ at that point and in that direction. But when n
> 1, no single directional derivative can give a complete picture of the
behavior of ƒ. The total derivative, also called the (total) differential, gives a complete
picture by considering all directions at once. That is, for any vector v starting at a, the linear approximation formula holds:

    f(a + v) ≈ f(a) + f′(a)v.
Just like the single-variable
derivative, ƒ ′(a) is chosen so that the error in this
approximation is as small as possible.
If n and m are
both one, then the derivative ƒ ′(a) is a number and the
expression ƒ ′(a)v is the product of two numbers. But
in higher dimensions, it is impossible for ƒ ′(a)
to be a number. If it were a number, then ƒ ′(a)v
would be a vector in Rn while the other terms would be
vectors in Rm, and therefore the formula would not
make sense. For the linear approximation formula to make sense, ƒ ′(a)
must be a function that sends vectors in Rn to vectors
in Rm, and ƒ ′(a)v must denote
this function evaluated at v.
To determine what kind of
function it is, notice that the linear approximation formula can be rewritten as

    f(a + v) − f(a) ≈ f′(a)v.
Notice that if we choose another
vector w, then this approximate equation determines another approximate
equation by substituting w for v. It determines a third
approximate equation by substituting both w for v and a + v
for a. By subtracting these two new equations, we get

    f(a + v + w) − f(a + v) − f(a + w) + f(a) ≈ f′(a + v)w − f′(a)w.
If we assume that v is
small and that the derivative varies continuously in a, then ƒ ′(a
+ v) is approximately equal to ƒ ′(a), and therefore the
right-hand side is approximately zero. The left-hand side can be rewritten in a
different way using the linear approximation formula with v + w
substituted for v. The linear approximation formula implies:

    f′(a)(v + w) ≈ f(a + v + w) − f(a)
                 ≈ (f(a + v) − f(a)) + (f(a + w) − f(a))
                 ≈ f′(a)v + f′(a)w.
This suggests that ƒ ′(a)
is a linear transformation from the vector space Rn
to the vector space Rm. In fact, it is possible to
make this a precise derivation by measuring the error in the approximations.
Assume that the error in these linear approximation formulas is bounded by a
constant times ||v||, where the constant is independent of v but
depends continuously on a. Then, after adding an appropriate error term,
all of the above approximate equalities can be rephrased as inequalities. In
particular, ƒ ′(a) is a linear transformation up to a small
error term. In the limit as v and w tend to zero, it must
therefore be a linear transformation. Since we define the total derivative by
taking a limit as v goes to zero, ƒ ′(a) must be a linear
transformation.
In one variable, the fact that
the derivative is the best linear approximation is expressed by the fact that
it is the limit of difference quotients. However, the usual difference quotient
does not make sense in higher dimensions because it is not usually possible to
divide vectors. In particular, the numerator and denominator of the difference
quotient are not even in the same vector space: The numerator lies in the
codomain Rm while the denominator lies in the domain Rn.
Furthermore, the derivative is a linear transformation, a different type of
object from both the numerator and denominator. To make precise the idea that ƒ ′ (a)
is the best linear approximation, it is necessary to adapt a different formula
for the one-variable derivative in which these problems disappear. If ƒ :
R → R, then the usual definition of the derivative may be
manipulated to show that the derivative of ƒ at a is the unique
number f′(a) such that

    lim_{h→0} (f(a + h) − f(a) − f′(a)h) / h = 0.

This is equivalent to

    lim_{h→0} |f(a + h) − f(a) − f′(a)h| / |h| = 0,
because the limit of a function
tends to zero if and only if the limit of the absolute value of the function
tends to zero. This last formula can be adapted to the many-variable situation
by replacing the absolute values with norms.
The definition of the total
derivative of ƒ at a, therefore, is that it is the unique
linear transformation ƒ ′(a) : Rn
→ Rm such that

    lim_{h→0} ‖f(a + h) − f(a) − f′(a)h‖ / ‖h‖ = 0.
Here h is a vector in Rn,
so the norm in the denominator is the standard length on Rn.
However, ƒ′(a)h is a vector in Rm,
and the norm in the numerator is the standard length on Rm.
If v is a vector starting at a, then ƒ ′(a)v
is called the pushforward of v by ƒ and
is sometimes written ƒ*v.
If the total derivative exists
at a, then all the partial derivatives and directional derivatives of ƒ
exist at a, and for all v, ƒ ′(a)v is the
directional derivative of ƒ in the direction v. If we write ƒ
using coordinate functions, so that ƒ = (ƒ1, ƒ2,
..., ƒm), then the total derivative can be expressed using
the partial derivatives as a matrix. This matrix is called the Jacobian matrix of f at a:

    f′(a) = ( ∂f_i/∂x_j (a) ),   an m×n matrix with i = 1, ..., m and j = 1, ..., n.
The existence of the total
derivative ƒ′(a) is strictly stronger than the existence of all
the partial derivatives, but if the partial derivatives exist and are
continuous, then the total derivative exists, is given by the Jacobian, and
depends continuously on a.
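A numeric sketch of the Jacobian and the linear approximation it gives; the map f(x, y) = (x²y, x + y³) is an illustrative assumption:

    # Finite-difference Jacobian of f: R^2 -> R^2, then compare
    # f(a + h) with the linear approximation f(a) + J h.
    def f(x, y):
        return (x**2 * y, x + y**3)

    def jacobian(x, y, eps=1e-6):
        dx = [(a - b) / eps for a, b in zip(f(x + eps, y), f(x, y))]
        dy = [(a - b) / eps for a, b in zip(f(x, y + eps), f(x, y))]
        return [[dx[0], dy[0]],
                [dx[1], dy[1]]]          # row i holds the partials of f_i

    a, h = (1.0, 2.0), (0.01, -0.02)
    J = jacobian(*a)                     # exact value: [[4, 1], [1, 12]]
    fa = f(*a)
    approx = [fa[i] + J[i][0]*h[0] + J[i][1]*h[1] for i in range(2)]
    print(approx)                        # close to the true value below
    print(f(a[0] + h[0], a[1] + h[1]))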
The definition of the total
derivative subsumes the definition of the derivative in one variable. That is,
if ƒ is a real-valued function of a real variable, then the total
derivative exists if and only if the usual derivative exists. The Jacobian matrix
reduces to a 1×1 matrix whose only entry is the derivative ƒ′(x).
This 1×1 matrix satisfies the property that ƒ(a + h) − ƒ(a)
− ƒ ′(a)h is approximately zero, in other
words that

    f(a + h) ≈ f(a) + f′(a)h.

Up to changing variables, this is the statement that the function

    x ↦ f(a) + f′(a)(x − a)

is the best linear approximation to f at a.
The total derivative of a
function does not give another function in the same way as the one-variable
case. This is because the total derivative of a multivariable function has to
record much more information than the derivative of a single-variable function.
Instead, the total derivative gives a function from the tangent
bundle of the source to the tangent bundle of the target.
The natural analog of second,
third, and higher-order total derivatives is not a linear transformation, is
not a function on the tangent bundle, and is not built by repeatedly taking the
total derivative. The analog of a higher-order derivative, called a jet, cannot be a linear transformation because
higher-order derivatives reflect subtle geometric information, such as
concavity, which cannot be described in terms of linear data such as vectors.
It cannot be a function on the tangent bundle because the tangent bundle only
has room for the base space and the directional derivatives. Because jets
capture higher-order information, they take as arguments additional coordinates
representing higher-order changes in direction. The space determined by these
additional coordinates is called the jet bundle.
The relation between the total derivative and the partial derivatives of a
function is paralleled in the relation between the kth order jet of a function
and its partial derivatives of order less than or equal to k.