Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data. It helps us make sense of information and draw meaningful conclusions.
1. Introduction to Data
Data: A collection of facts, figures, or observations.
Raw Data: Data collected in its original form, before any organization or analysis.
Example: The marks of 10 students in a math test are: 78, 92, 65, 88, 70, 95, 82, 75, 68, 90.
Types of Data:
Qualitative Data (Categorical): Describes qualities or characteristics that cannot be measured numerically.
Examples: Hair color (black, brown, blonde), favorite fruit (apple, banana, orange), gender (male, female).
Quantitative Data (Numerical): Represents quantities that can be measured or counted.
Examples: Height (160 cm, 175 cm), number of siblings (2, 3), temperature (25°C, 30°C).
Discrete Data: Can only take specific, distinct values (often whole numbers).
Example: Number of cars in a parking lot, number of students in a class.
Continuous Data: Can take any value within a given range.
Example: Height, weight, time, temperature.
2. Organization of Data
Organizing data makes it easier to understand and analyze.
Array: Arranging data in ascending or descending order.
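As a small illustration, the raw marks from the earlier example can be arranged into an array with Python's built-in `sorted`. This is only a sketch; the variable names are chosen here for illustration.

```python
# Raw marks of 10 students, taken from the raw-data example.
marks = [78, 92, 65, 88, 70, 95, 82, 75, 68, 90]

# An array is the same data arranged in ascending or descending order.
ascending = sorted(marks)
descending = sorted(marks, reverse=True)

print(ascending)
print(descending)
```

Once the data is in an array, the smallest value, the largest value, and the range can be read off directly from the two ends.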
Inclusive: Both limits are included (e.g., 30-39, 40-49).
Exclusive: The upper limit is excluded (e.g., 30-40 means values from 30 up to, but not including, 40). This is usually preferred for continuous data. In our example above, 40 is included in the 40-50 interval, not 30-40.
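The exclusive convention can be sketched in Python as follows. The helper `class_interval` and its defaults (`start=30`, `width=10`) are assumptions made for this illustration, chosen to match the 30-40, 40-50 intervals mentioned above; the marks are the raw data from the first example.

```python
from collections import Counter

def class_interval(value, start=30, width=10):
    """Return the exclusive class interval (lower, upper) that contains value.

    The lower limit is included and the upper limit is excluded, so with the
    assumed defaults 40 falls in (40, 50), not (30, 40).
    """
    index = (value - start) // width   # whole class widths above the start
    lower = start + index * width
    return (lower, lower + width)

marks = [78, 92, 65, 88, 70, 95, 82, 75, 68, 90]

# Tally how many marks fall in each class interval.
table = Counter(class_interval(m) for m in marks)
for (lower, upper), freq in sorted(table.items()):
    print(f"{lower}-{upper}: {freq}")
```

Because the upper limit is excluded, every value belongs to exactly one interval, which is why this convention suits continuous data.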
3. Graphical Representation of Data
Visualizing data helps in understanding patterns and trends.
Bar Graph: Used to represent categorical data. Bars are of uniform width, and the height of each bar is proportional to its frequency. There are gaps between bars.
Example: Favorite colors of 20 students.
| Color | Frequency |
| :--- | :--- |
| Red | 7 |
| Blue | 5 |
| Green | 3 |
| Yellow | 5 |

(Imagine a bar graph here with colors on the x-axis and frequency on the y-axis, with separate bars for each color.)
Histogram: Used to represent continuous quantitative data. Bars are adjacent to each other, indicating the continuous nature of the data. The width of the bars represents the class interval, and the height represents the frequency.
Example: From the grouped frequency distribution of student marks (30-40, 40-50, etc.).
(Imagine a histogram here with class intervals on the x-axis and frequency on the y-axis, with adjacent bars.)
Frequency Polygon: Can be drawn with or without a histogram. It connects the midpoints of the tops of the bars of a histogram. For ungrouped data, it connects the points (x, f).
Example (Using class marks from the grouped frequency table): Plot points: (35, 3), (45, 3), (55, 5), (65, 5), (75, 5), (85, 5), (95, 4). Connect these points. To make it a polygon, you typically add midpoints of the imaginary preceding and succeeding class intervals with zero frequency (e.g., 25 with 0 frequency and 105 with 0 frequency).
(Imagine a frequency polygon overlaid on a histogram, or as a standalone line graph connecting the midpoints.)
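The points of the frequency polygon can be computed as in the sketch below. The class intervals are inferred from the class marks 35 to 95 used in the example (width 10, starting at 30), and the variable names are illustrative only.

```python
# Class intervals inferred from the class marks in the example (assumption).
class_intervals = [(30, 40), (40, 50), (50, 60), (60, 70),
                   (70, 80), (80, 90), (90, 100)]
frequencies = [3, 3, 5, 5, 5, 5, 4]

# Class mark = midpoint of the class interval.
class_marks = [(lower + upper) / 2 for lower, upper in class_intervals]
points = list(zip(class_marks, frequencies))

# Close the polygon with zero-frequency points one class width
# before the first class mark and one after the last.
width = class_intervals[0][1] - class_intervals[0][0]
closed = [(class_marks[0] - width, 0)] + points + [(class_marks[-1] + width, 0)]

print(closed)
```

Joining the points in `closed` with straight line segments gives the polygon, which starts at 25 and ends at 105 on the x-axis.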
Pie Chart (Circle Graph): Used to represent parts of a whole. The circle is divided into sectors, where the area of each sector is proportional to the frequency of the category it represents.
Central Angle for a category = (Frequency of Category / Total Frequency) * 360°
Example: Transport modes for 50 students going to school.
| Mode | Frequency | Calculation | Central Angle |
| :--- | :--- | :--- | :--- |
| Bus | 20 | (20/50) * 360° | 144° |
| Car | 10 | (10/50) * 360° | 72° |
| Bicycle | 15 | (15/50) * 360° | 108° |
| Walk | 5 | (5/50) * 360° | 36° |
| Total | 50 | | 360° |

(Imagine a pie chart divided into sectors with the calculated angles.)
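The central-angle calculation can be checked with a short Python sketch. The dictionary simply holds the transport data from the example; the names are illustrative.

```python
# Transport modes for 50 students, from the example.
frequencies = {"Bus": 20, "Car": 10, "Bicycle": 15, "Walk": 5}
total = sum(frequencies.values())

# Central angle = (frequency of category / total frequency) * 360 degrees.
angles = {mode: freq * 360 / total for mode, freq in frequencies.items()}

for mode, angle in angles.items():
    print(f"{mode}: {angle} degrees")
```

The angles should always add up to 360°, which is a quick check that no category has been missed.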
4. Measures of Central Tendency
These are single values around which the observations in a dataset tend to cluster, providing a typical or representative value.
Mean (Arithmetic Mean): The sum of all observations divided by the total number of observations.
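A minimal sketch of this definition in Python, using the raw marks of the 10 students from the first example (variable names are illustrative):

```python
# Raw marks of 10 students, from the raw-data example.
marks = [78, 92, 65, 88, 70, 95, 82, 75, 68, 90]

# Mean = sum of all observations / number of observations.
mean = sum(marks) / len(marks)

print(mean)
```

Here the sum of the marks is 803 and there are 10 observations, so the mean is 80.3 marks.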
Step Deviation Method (for very large data or when the deviations $d_i$ are multiples of a common factor): $\bar{x} = A + h \cdot \dfrac{\sum f_i u_i}{\sum f_i}$, where $h$ is the class size and $u_i = \dfrac{x_i - A}{h}$.
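The step deviation method can be sketched in Python as below. The class marks and frequencies are those of the marks data used in the frequency polygon example; the choice of A = 65 as the assumed mean is an assumption made here for illustration, and any class mark would give the same result.

```python
class_marks = [35, 45, 55, 65, 75, 85, 95]   # x_i
frequencies = [3, 3, 5, 5, 5, 5, 4]          # f_i

A = 65   # assumed mean (illustrative choice)
h = 10   # class size

# u_i = (x_i - A) / h
u = [(x - A) / h for x in class_marks]

sum_fu = sum(f * ui for f, ui in zip(frequencies, u))
sum_f = sum(frequencies)

# Mean = A + h * (sum of f_i * u_i) / (sum of f_i)
mean = A + h * sum_fu / sum_f

print(round(mean, 2))
```

With these values the sum of f_i u_i is 7 and the sum of f_i is 30, so the mean is 65 + 10 × 7/30, about 67.33 marks, the same as the direct calculation 2020/30.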
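h = 10
Mode $= 60 + \left(\dfrac{8 - 5}{2(8) - 5 - 5}\right) \times 10 = 60 + \left(\dfrac{3}{16 - 10}\right) \times 10 = 60 + \left(\dfrac{3}{6}\right) \times 10 = 60 + 0.5 \times 10 = 60 + 5 = 65$ marks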
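The calculation above follows the pattern lower limit + ((f1 − f0) / (2·f1 − f0 − f2)) × h, read off from the numbers substituted (lower limit 60, modal class frequency 8, neighbouring frequencies 5 and 5, class size 10). A hedged Python sketch, with the function name and parameter names chosen here for illustration:

```python
def grouped_mode(lower_limit, f1, f0, f2, h):
    """Mode of grouped data.

    lower_limit: lower limit of the modal class
    f1: frequency of the modal class
    f0: frequency of the class before the modal class
    f2: frequency of the class after the modal class
    h:  class size
    """
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        # All three frequencies are equal, so the formula is undefined.
        raise ValueError("modal class is not distinct from its neighbours")
    return lower_limit + (f1 - f0) / denominator * h

mode = grouped_mode(lower_limit=60, f1=8, f0=5, f2=5, h=10)
print(mode)
```

Because the two neighbouring frequencies are equal here, the mode lands exactly at the middle of the modal class, 65 marks.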
5. Relationship Between Mean, Median, and Mode (Empirical Formula)
For a moderately skewed distribution, there’s an approximate relationship: Mode ≈ 3 Median – 2 Mean
Example: If Mean = 67.33 and Median = 68, Mode ≈ 3(68) − 2(67.33) = 204 − 134.66 = 69.34 (Note: This is an approximation and might not exactly match the mode calculated from the grouped-data formula).
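The empirical relationship is a one-line calculation. A small sketch, with an illustrative function name:

```python
def empirical_mode(mean, median):
    """Approximate mode for a moderately skewed distribution:
    Mode ≈ 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean

approx_mode = empirical_mode(mean=67.33, median=68)
print(round(approx_mode, 2))
```

The same relationship can be rearranged to estimate the mean or the median when the other two measures are known.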
6. Cumulative Frequency Curve (Ogive)
An ogive is a graph showing cumulative frequencies.
Less Than Ogive: Plotted by taking upper class limits on the x-axis and corresponding ‘less than’ cumulative frequencies on the y-axis.
Example (from marks data):
| Marks (Upper Limit) | Less Than Cumulative Frequency |
| :--- | :--- |
| < 40 | 3 |
| < 50 | 6 |
| < 60 | 11 |
| < 70 | 16 |
| < 80 | 21 |
| < 90 | 26 |
| < 100 | 30 |

(Imagine a smooth curve rising from left to right, connecting these points. The x-axis would be Marks and y-axis Cumulative Frequency.)
More Than Ogive: Plotted by taking lower class limits on the x-axis and corresponding ‘more than’ cumulative frequencies on the y-axis.
Example (from marks data):
| Marks (Lower Limit) | More Than Cumulative Frequency |
| :--- | :--- |
| ≥ 30 | 30 |
| ≥ 40 | 27 |
| ≥ 50 | 24 |
| ≥ 60 | 19 |
| ≥ 70 | 14 |
| ≥ 80 | 9 |
| ≥ 90 | 4 |

(Imagine a smooth curve falling from left to right, connecting these points.)
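Both cumulative frequency columns can be derived from the class frequencies of the marks data. The sketch below uses `itertools.accumulate` from the standard library; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies for 30-40, 40-50, ..., 90-100 (marks data).
frequencies = [3, 3, 5, 5, 5, 5, 4]
lower_limits = [30, 40, 50, 60, 70, 80, 90]
upper_limits = [40, 50, 60, 70, 80, 90, 100]

# 'Less than' cumulative frequency: running total up to each upper limit.
less_than = list(accumulate(frequencies))

# 'More than' cumulative frequency: total minus everything below each lower limit.
total = sum(frequencies)
below_lower = [0] + less_than[:-1]
more_than = [total - c for c in below_lower]

print(list(zip(upper_limits, less_than)))
print(list(zip(lower_limits, more_than)))
```

The pairs printed first are the points of the 'less than' ogive, and the pairs printed second are the points of the 'more than' ogive.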
Finding Median from Ogive: The median is the x-coordinate of the point where the ‘less than’ ogive intersects the line y=n/2 (or where the ‘less than’ and ‘more than’ ogives intersect).
Example: For n = 30, n/2 = 15. Draw a horizontal line at y = 15 until it meets the ‘less than’ ogive, then drop a perpendicular from that point to the x-axis. The x-coordinate of that point is the median (around 68 in our example).
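Reading the median off the graph can be imitated numerically. The sketch below assumes the 'less than' ogive is drawn with straight line segments between the plotted points (a hand-drawn smooth curve may give a slightly different reading) and that it starts at the point (30, 0); the function name is illustrative.

```python
def median_from_ogive(upper_limits, less_than_cf, first_lower_limit):
    """Find where a piecewise-linear 'less than' ogive crosses y = n/2."""
    n = less_than_cf[-1]
    target = n / 2

    # The ogive starts at (first lower limit, 0).
    xs = [first_lower_limit] + list(upper_limits)
    ys = [0] + list(less_than_cf)
    pairs = list(zip(xs, ys))

    # Walk along consecutive points and interpolate in the crossing segment.
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        if y0 <= target <= y1 and y1 > y0:
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("the ogive never reaches n/2")

upper_limits = [40, 50, 60, 70, 80, 90, 100]
less_than_cf = [3, 6, 11, 16, 21, 26, 30]

median = median_from_ogive(upper_limits, less_than_cf, first_lower_limit=30)
print(median)
```

The line y = 15 crosses the segment between (60, 11) and (70, 16), giving 60 + (4/5) × 10 = 68, which agrees with the value read from the graph.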