Scores
Each homework involves writing R code, which should be demonstrated in class.
Homework 1 (up to 10 points)
Take from the internet (say, from the
Czech Hydrometeorological Institute) data on
air pollution and temperature at one of the Ostrava stations for a 24-hour
period. Draw a histogram of the air pollution data in two ways: using the built-in R
function, and using your own function. Plot air pollution against temperature.
Try to draw conclusions.
Homework 2 (up to 10 points)
Take some dataset of considerable volume -- either from the internet, or from an R
package. This should be data with a "bell-shaped" form, for
example biometric data (height, weight), data from financial markets,
temperature, etc.
1. Try to fit the normal distribution to this dataset.
2. Try other distributions (look, for example, at Wikipedia, or at the
list of distributions available in R), try to play with parameters, and
compare how well the various distributions approximate this dataset.
Homework 3 (up to 5 points)
Write an R function accepting a vector (discrete statistical distribution) as
an argument and returning the mode of this statistical distribution.
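As a possible starting point, here is a minimal sketch. It assumes that ties should all be reported and that missing values are dropped; the name `stat_mode` is hypothetical, and other conventions (for example, returning only the first mode) are equally defensible.

```r
# Hypothetical helper: returns every value that occurs most often in x.
stat_mode <- function(x) {
  x <- x[!is.na(x)]                        # assumption: ignore missing values
  if (length(x) == 0) return(x)            # empty input has no mode
  values <- unique(x)                      # distinct values, in order of appearance
  counts <- tabulate(match(x, values))     # how often each distinct value occurs
  values[counts == max(counts)]            # all values with the highest count
}

stat_mode(c(1, 2, 2, 3, 3))    # two modes: 2 and 3
stat_mode(c("a", "b", "a"))    # works for character vectors too
```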
Homework 4 (up to 10 points)
Demonstrate in R the validity of the Central Limit Theorem when taking means of a
large number of independent draws from the same distribution, for different
distributions. Also demonstrate that the condition that the distributions be
identical is necessary.
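One way such a demonstration might begin is sketched below for a single distribution. The choice of the exponential distribution, the sample size `n`, and the number of repetitions `reps` are assumptions made for illustration only.

```r
set.seed(1)
n    <- 50       # size of each sample (assumed)
reps <- 10000    # number of sample means to simulate (assumed)

# Means of n independent Exp(1) draws; Exp(1) has mean 1 and sd 1.
means <- replicate(reps, mean(rexp(n, rate = 1)))

# Standardize the means: subtract the true mean, divide by sd / sqrt(n).
z <- (means - 1) / (1 / sqrt(n))

# Compare the histogram of standardized means with the standard normal density.
hist(z, breaks = 50, freq = FALSE, main = "Standardized means of Exp(1) samples")
curve(dnorm(x), add = TRUE, lwd = 2)
```

Repeating this with other distributions in place of `rexp` only requires changing the sampling call and the constants used in the standardization.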
Homework 5 (up to 10 points)
Write an R function calculating the standard deviation of a vector
\({\bf x} = (x_1, \dots, x_n)\)
(discrete statistical distribution) as discussed earlier in class, i.e.
\[
\sigma({\bf x}) =
\sqrt{\frac{\big(x_1 - \bar{\bf x}\big)^2 + \dots +
\big(x_n - \bar{\bf x}\big)^2}{n}} .
\]
Compare the result with the built-in R function. Comment on the result.
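A direct transcription of the formula above might look as follows. The function name `sd_pop` and the sample vector are hypothetical; the built-in `sd` divides by \(n - 1\) rather than \(n\), which is the source of the discrepancy to be commented on.

```r
# Standard deviation with denominator n, as in the displayed formula.
sd_pop <- function(x) {
  n <- length(x)
  sqrt(sum((x - mean(x))^2) / n)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # illustrative data, mean 5

sd_pop(x)                  # uses denominator n
sd(x)                      # built-in, uses denominator n - 1
sd_pop(x) / sd(x)          # ratio of the two results
sqrt((length(x) - 1) / length(x))   # the same ratio, predicted from the denominators
```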
Homework 6 (up to 10 points)
Write R functions computing skewness and kurtosis of a given vector of numerical
data (discrete statistical distribution). Demonstrate how the functions work.
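The sketch below assumes the moment-based definitions with denominator \(n\): skewness as the third central moment divided by \(\sigma^3\), and kurtosis as the fourth central moment divided by \(\sigma^4\) (not the "excess" version). If the definitions used in class differ, the functions need to be adjusted accordingly; all function names are hypothetical.

```r
# k-th central moment of a numeric vector (denominator n).
central_moment <- function(x, k) mean((x - mean(x))^k)

# Assumed definitions: m3 / m2^(3/2) and m4 / m2^2.
skew_moment <- function(x) central_moment(x, 3) / central_moment(x, 2)^(3 / 2)
kurt_moment <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

skew_moment(c(-1, 0, 1))       # symmetric data: skewness 0
kurt_moment(c(-1, 1))          # two-point symmetric data: kurtosis 1
skew_moment(c(1, 1, 1, 10))    # long right tail: positive skewness
skew_moment(rep(5, 4))         # constant data: 0 / 0, so NaN
```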
The next two homeworks are, strictly speaking, not about computations in R, so
they can be submitted as a clearly (!) written piece of (mathematical) text,
either on paper or by email. However, you can involve computer calculations as
you see fit.
Homework 7 (up to 10 points)
Suppose some physical measurement always takes values in the interval
\([0,2a]\), where \(a\) is some positive number. The constant dataset of
\(2n\) such measurements, each of the same value \(a\), has, obviously,
mean \(a\), standard deviation \(0\), and its skewness and kurtosis are not
defined (why?). Now let us "deform" this constant dataset by replacing \(k\)
(where \(0 \le k \le n\)) values by
\(a - \varepsilon\), and \(k\) values by \(a + \varepsilon\) (where
\(0 \le \varepsilon \le a\)). The "deformed" dataset
\[
(\underbrace{a, \dots, a}_{2n-2k},
\underbrace{a - \varepsilon, \dots, a - \varepsilon}_k,
\underbrace{a + \varepsilon, \dots, a + \varepsilon}_k)
\]
has, obviously, the same mean \(a\), but the standard deviation is no longer
zero, and skewness and kurtosis are defined.
1. Assuming \(a\) and \(n\) are fixed, express the standard deviation, skewness, and
kurtosis as functions of \(\varepsilon\) and \(k\).
2. What are the maximum and minimum possible values of the standard deviation,
skewness, and kurtosis? For which values of \(\varepsilon\) and \(k\) are these
extreme values attained?
Homework 8 (up to 10 points)
Compute the skewness and kurtosis of the uniform distribution taking (with equal
frequency)
\(n+1\) values \(0,\frac 1n, \frac 2n, \dots, \frac{n-1}n, 1\).
Homework 9 (up to 10 points)
Provide in R numerical evidence for the statement considered in class:
if \(X_1, \dots, X_n\) are independent, identically distributed random variables
with mean \(m\) and standard deviation \(\sigma\), then their mean,
\(\frac{X_1 + \dots + X_n}{n}\), is distributed with mean \(m\) and
standard deviation \(\frac{\sigma}{\sqrt{n}}\).
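A simulation along the following lines could serve as such evidence. The uniform distribution on \([0,1]\), the values of \(n\), and the number of repetitions are assumptions chosen for illustration.

```r
set.seed(42)
reps  <- 20000           # simulated means per value of n (assumed)
m     <- 0.5             # mean of Uniform(0, 1)
sigma <- 1 / sqrt(12)    # standard deviation of Uniform(0, 1)

# For each n, simulate many sample means and compare with the prediction.
result <- t(sapply(c(4, 16, 64), function(n) {
  means <- replicate(reps, mean(runif(n)))
  c(n             = n,
    observed_mean = mean(means),
    observed_sd   = sd(means),
    predicted_sd  = sigma / sqrt(n))
}))

result   # one row per n; observed_sd should be close to predicted_sd
```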
Homework 10 (up to 10 points)
Take a text of considerable length and count the number of occurrences of each
letter of the alphabet. Perform as many statistical analyses as possible on the
resulting distribution of letters.
Homework 11 (up to 10 points)
Count the number of your "friends" on Facebook, or in any similar environment,
and for each friend count the number of their friends. Plot the histogram of the resulting distribution. Where
are you located in this distribution -- above or below the mean?
Invent a notion of weight appropriate in this case, and compute the
weighted mean of the number of friends among all your friends. Compare it with
the unweighted mean. Comment on the results.
Homework 12 (up to 10 points)
Write R code generating numerical examples of Simpson's paradox discussed in
Class 3 (rate of admission to two or more departments among men and women, see,
for example, an old synopsis).
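To fix ideas, the sketch below checks one hand-made example; all counts are invented for illustration and are not taken from any real admissions data. A full solution would presumably generate such tables automatically rather than hard-code them.

```r
# Invented counts: two departments, applicants and admissions by sex.
adm <- data.frame(
  dept     = c("A", "A", "B", "B"),
  sex      = c("men", "women", "men", "women"),
  applied  = c(100, 20, 20, 100),
  admitted = c(80, 18, 4, 30)
)

# Admission rates within each department.
adm$rate <- adm$admitted / adm$applied
adm

# Pooled admission rates over both departments.
overall <- aggregate(cbind(applied, admitted) ~ sex, data = adm, FUN = sum)
overall$rate <- overall$admitted / overall$applied
overall
```

In this table women have the higher admission rate in each department, yet the lower rate overall, because most women applied to the department that admits few applicants.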
Homework 13 (up to 5 points)
Write an R function computing the confidence interval for a sample from a
normal distribution. (Figure out yourself which arguments this function should
accept and what it should return.)
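One possible interpretation is sketched below: a two-sided confidence interval for the mean when the variance is unknown, based on the Student t distribution. The function name `ci_normal`, its arguments, and the choice of what to return are assumptions; an interval for the variance, or one assuming a known \(\sigma\), would be different functions.

```r
# Assumed design: x is the sample, level is the confidence level.
ci_normal <- function(x, level = 0.95) {
  n  <- length(x)
  se <- sd(x) / sqrt(n)                        # estimated standard error of the mean
  q  <- qt(1 - (1 - level) / 2, df = n - 1)    # t quantile for a two-sided interval
  c(lower = mean(x) - q * se, upper = mean(x) + q * se)
}

set.seed(3)
x <- rnorm(30, mean = 10, sd = 2)   # simulated sample for illustration

ci_normal(x)
t.test(x)$conf.int                  # built-in interval for comparison
```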
Homework 14 (up to 10 points)
Test for normality (using the methods discussed in Class 4, i.e., normal scores
and Q-Q plots) some (relatively big) dataset you have encountered before in your
homeworks.
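As a rough illustration of the mechanics, the sketch below uses simulated data in place of a real dataset. It assumes that "normal scores" means the normal quantiles at the probability points returned by `ppoints`, which is what `qqnorm` uses; the definition given in Class 4 may differ.

```r
set.seed(7)
x <- rnorm(200, mean = 10, sd = 2)   # stand-in for a real dataset

# Built-in Q-Q plot with a reference line through the quartiles.
qqnorm(x)
qqline(x)

# The same plot assembled by hand from normal scores.
n      <- length(x)
scores <- qnorm(ppoints(n))          # assumed definition of normal scores
plot(scores, sort(x),
     xlab = "Normal scores", ylab = "Ordered data")
```

Points lying close to a straight line are consistent with normality; systematic curvature at the ends suggests heavier or lighter tails.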
Homework 15 (up to 5 points)
Is each of the following matrices a correlation matrix?
(Justify your answer.)
\[
\left(\begin{matrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{matrix}\right)
\quad
\left(\begin{matrix}
1 & 0.1 & 0.5 \\
0.1 & 1 & -1.1 \\
0.5 & -1.1 & 1
\end{matrix}\right)
\quad
\left(\begin{matrix}
0.9 & 0.1 & 0.5 \\
0.1 & 1 & 0.1 \\
0.5 & 0.1 & 1
\end{matrix}\right)
\quad
\left(\begin{matrix}
1 & -0.1 & 0.5 \\
0.1 & 1 & 0.1 \\
0.5 & 0.1 & 1
\end{matrix}\right)
\quad
\left(\begin{matrix}
1 & 1 & 1 \\
1 & 1 & 1 \\
1 & 1 & 1
\end{matrix}\right)
\]
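A numerical check can support (but not replace) the written justification. The sketch below relies on the standard characterization: a matrix is a correlation matrix exactly when it is symmetric, has ones on the diagonal, and is positive semidefinite. The function name `is_corr_matrix`, the tolerance, and the test matrix `m` are hypothetical.

```r
is_corr_matrix <- function(m, tol = 1e-8) {
  isSymmetric(m) &&                             # symmetric
    all(abs(diag(m) - 1) < tol) &&              # unit diagonal
    all(abs(m) <= 1 + tol) &&                   # entries in [-1, 1]
    all(eigen(m, symmetric = TRUE,
              only.values = TRUE)$values >= -tol)   # positive semidefinite
}

# Illustrative matrix (not one of those above): symmetric, unit diagonal,
# entries in [-1, 1], yet its determinant is negative.
m <- matrix(c( 1.0, 0.9, -0.9,
               0.9, 1.0,  0.9,
              -0.9, 0.9,  1.0), nrow = 3, byrow = TRUE)

is_corr_matrix(m)
eigen(m, symmetric = TRUE, only.values = TRUE)$values
```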
Homework 16 (up to 20 points)
Explore the convergence pattern of iterated correlation matrices.
Homework 17 (up to 5 points)
Build a linear regression between two datasets of your choice. Test how good
the regression is. Comment on the results.
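The basic workflow might look as follows. The built-in `cars` dataset (stopping distance against speed) is used here only as a placeholder for data of your own choice.

```r
# Fit stopping distance as a linear function of speed.
fit <- lm(dist ~ speed, data = cars)

summary(fit)               # coefficients, standard errors, R-squared
summary(fit)$r.squared     # proportion of variance explained

# Data with the fitted line.
plot(cars$speed, cars$dist, xlab = "Speed", ylab = "Stopping distance")
abline(fit)

# Residuals against fitted values: look for patterns or changing spread.
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```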
Homework 18 (up to 10 points)
(Pruim, Exercise 6.24)
The object returned by lm() includes a vector named effects.
(If you call the result model, you can access this vector with
model$effects). What are the values in this vector?
(Hint: Think geometrically, make a reasonable guess, and then do some
calculations to confirm your guess. You may use one of the data sets used
earlier, or design your own data set if that helps you figure out what is going
on.)
Homework 19 (up to 5 points)
What is the problem with the function icorr demonstrated in class for the
computation of iterated correlation matrices? Fix it.
(Hint: try to apply it iteratively to the matrix
\(
\left(\begin{matrix}
1 & 2 \\
3 & 4
\end{matrix}\right)
\)
).
Homework 20 (up to 5 points)
When discussing generalized additive models in class, we looked at an
example from a book demonstrating the superiority/flexibility of generalized
additive models over linear models. Devise further example(s) of this sort.
Homeworks due April 29
Homework 21 (up to 10 points)
Demonstrate the clustering capabilities of R on data where the notion of distance is
not so obvious. This can be Facebook "friends" from Homework 11, or any other
"interesting" data. Try different functions (kmeans,
hclust, etc.) and compare them.
Homework 22 (up to 5 points)
(Pruim, Exercise 2.51)
A child's game includes a spinner with four colors on it. Each color is one
quarter of the circle. You want to test the spinner to see if it is fair, so you
decide to spin the spinner 50 times and count the number of blues. You do this
and record 8 blues. Carry out a hypothesis test, carefully showing the four
steps. Do it "by hand" (using R but not binom.test). Then check your
work using binom.test.
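The "by hand" computation of an exact two-sided p-value can be sketched as below, on a different, invented example (40 heads in 100 tosses of a coin assumed fair under the null hypothesis). The helper name `binom_p_two_sided` is hypothetical. It sums the probabilities of all outcomes that are no more likely than the observed one, which is the convention `binom.test` follows; the small factor guards against rounding in the comparison.

```r
# Exact two-sided p-value: total probability of outcomes at most as
# likely as the observed count x under Binomial(n, p).
binom_p_two_sided <- function(x, n, p) {
  probs    <- dbinom(0:n, size = n, prob = p)
  observed <- dbinom(x, size = n, prob = p)
  sum(probs[probs <= observed * (1 + 1e-7)])
}

binom_p_two_sided(40, 100, 0.5)        # computed by hand
binom.test(40, 100, p = 0.5)$p.value   # built-in check
```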
Homework 23 (up to 15 points)
Try to extend the "null-alternative hypotheses" paradigm to a choice among several
(more than two) hypotheses. Can existing R functions be used for this?
Created: Wed Feb 11 2026
Last modified: Wed Apr 15 2026 17:58:27 CEST