Statistics - VCA - Vijayapur

Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data. It helps us make sense of information and draw meaningful conclusions.

1. Introduction to Data

Data: A collection of facts, figures, or observations.

Raw Data: Data collected in its original form, before any organization or analysis.
- Example: The marks of 10 students in a math test are: 78, 92, 65, 88, 70, 95, 82, 75, 68, 90.
Types of Data:
- Qualitative Data (Categorical): Describes qualities or characteristics that cannot be measured numerically.
  - Examples: Hair color (black, brown, blonde), favorite fruit (apple, banana, orange), gender (male, female).
- Quantitative Data (Numerical): Represents quantities that can be measured or counted.
  - Examples: Height (160 cm, 175 cm), number of siblings (2, 3), temperature (25°C, 30°C).
  - Discrete Data: Can only take specific, distinct values (often whole numbers).
    - Example: Number of cars in a parking lot, number of students in a class.
  - Continuous Data: Can take any value within a given range.
    - Example: Height, weight, time, temperature.

2. Organization of Data

Organizing data makes it easier to understand and analyze.

Array: Arranging data in ascending or descending order.
- Example (Raw Data): 78, 92, 65, 88, 70, 95, 82, 75, 68, 90
- Example (Ascending Order): 65, 68, 70, 75, 78, 82, 88, 90, 92, 95
Frequency: The number of times a particular observation occurs in a dataset.
Frequency Distribution Table: A table that shows the frequency of each observation or class interval.
- Ungrouped Frequency Distribution: Used when the number of distinct observations is small.
  - Example: Number of siblings for 15 students: 2, 1, 0, 3, 2, 1, 1, 0, 2, 3, 1, 0, 2, 1, 3.
  | Number of Siblings (x) | Frequency (f) |
  | :--------------------- | :------------ |
  | 0 | 3 |
  | 1 | 5 |
  | 2 | 4 |
  | 3 | 3 |
  | Total | 15 |
- Grouped Frequency Distribution: Used when the range of observations is large, and data is grouped into class intervals.
  - Class Interval: A range of values.
  - Lower Limit: The smallest value in a class interval.
  - Upper Limit: The largest value in a class interval.
  - Class Size (or Width): The difference between the upper and lower limits of an exclusive class interval, or equivalently the difference between the lower (or upper) limits of two consecutive classes.
  - Class Mark (or Mid-point): (Lower Limit + Upper Limit) / 2
  - Example: Marks of 30 students (out of 100): 35, 42, 58, 61, 70, 72, 85, 90, 95, 30, 48, 55, 63, 75, 80, 88, 92, 40, 50, 60, 65, 78, 83, 87, 91, 52, 59, 66, 73, 38. Let’s create class intervals of size 10.

    | Class Interval | Frequency (f) | Class Mark |
    | :------------- | :------------ | :--------- |
    | 30-40 | 3 | 35 |
    | 40-50 | 3 | 45 |
    | 50-60 | 5 | 55 |
    | 60-70 | 5 | 65 |
    | 70-80 | 5 | 75 |
    | 80-90 | 5 | 85 |
    | 90-100 | 4 | 95 |
    | Total | 30 | |
    - Inclusive vs. Exclusive Class Intervals:
      - Inclusive: Both limits are included (e.g., 30-39, 40-49).
      - Exclusive: The upper limit is excluded (e.g., 30-40 means values from 30 up to, but not including, 40). This is usually preferred for continuous data. In our example above, 40 is included in the 40-50 interval, not 30-40.
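
The tallying described in this section can be sketched in Python. This is only an illustration, not part of the original notes: `grouped_frequency` and its parameters are hypothetical names, and the grouped example reuses the ten raw marks from Section 1 with exclusive classes of size 10 starting at 60.

```python
from collections import Counter

# Number of siblings for 15 students (ungrouped example above).
siblings = [2, 1, 0, 3, 2, 1, 1, 0, 2, 3, 1, 0, 2, 1, 3]

# Ungrouped distribution: count how often each distinct value occurs.
ungrouped = dict(sorted(Counter(siblings).items()))
print(ungrouped)  # {0: 3, 1: 5, 2: 4, 3: 3}


def grouped_frequency(values, start, width, n_classes):
    """Tally values into exclusive classes [lower, upper)."""
    table = {}
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width
        # The upper limit is excluded, so a value equal to it
        # is counted in the next class.
        table[(lower, upper)] = sum(1 for v in values if lower <= v < upper)
    return table


# Raw marks of 10 students from Section 1, grouped with class size 10.
marks = [78, 92, 65, 88, 70, 95, 82, 75, 68, 90]
grouped = grouped_frequency(marks, start=60, width=10, n_classes=4)
for (lower, upper), f in grouped.items():
    print(f"{lower}-{upper}: frequency = {f}, class mark = {(lower + upper) / 2}")
```

With exclusive classes, a mark of 70 is counted in 70-80, not in 60-70. A mark of exactly 100 would fall outside 90-100 under this rule, so it would need an extra class or an inclusive final interval.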

3. Graphical Representation of Data

Visualizing data helps in understanding patterns and trends.

Bar Graph: Used to represent categorical data. Bars are of uniform width, and the height of each bar is proportional to its frequency. There are gaps between bars.
- Example: Favorite colors of 20 students.
| Color | Frequency |
| :---- | :-------- |
| Red | 7 |
| Blue | 5 |
| Green | 3 |
| Yellow | 5 |

(Imagine a bar graph here with colors on the x-axis and frequency on the y-axis, with separate bars for each color.)
Histogram: Used to represent continuous quantitative data. Bars are adjacent to each other, indicating the continuous nature of the data. The width of the bars represents the class interval, and the height represents the frequency.
- Example: From the grouped frequency distribution of student marks (30-40, 40-50, etc.).
(Imagine a histogram here with class intervals on the x-axis and frequency on the y-axis, with adjacent bars.)
Frequency Polygon: Can be drawn with or without a histogram. It connects the midpoints of the tops of the bars of a histogram. For ungrouped data, it connects the points (x, f).
- Example (Using class marks from the grouped frequency table): Plot points: (35, 3), (45, 3), (55, 5), (65, 5), (75, 5), (85, 5), (95, 4). Connect these points. To make it a polygon, you typically add midpoints of the imaginary preceding and succeeding class intervals with zero frequency (e.g., 25 with 0 frequency and 105 with 0 frequency).
(Imagine a frequency polygon overlaid on a histogram, or as a standalone line graph connecting the midpoints.)
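
The points of the frequency polygon can be generated from the class intervals, as in the sketch below. It assumes equal class widths and uses the frequencies from the grouped marks table; the variable names are illustrative.

```python
# Class intervals and frequencies from the grouped marks table.
classes = [(30, 40), (40, 50), (50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
frequencies = [3, 3, 5, 5, 5, 5, 4]

# Class mark = (lower limit + upper limit) / 2.
class_marks = [(lower + upper) / 2 for lower, upper in classes]
width = classes[0][1] - classes[0][0]  # assumes every class has the same width

# Close the polygon with zero-frequency points one class before and after.
points = (
    [(class_marks[0] - width, 0)]
    + list(zip(class_marks, frequencies))
    + [(class_marks[-1] + width, 0)]
)
for x, f in points:
    print(x, f)
```

The first and last points, (25, 0) and (105, 0), are the class marks of the imaginary classes 20-30 and 100-110.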
Pie Chart (Circle Graph): Used to represent parts of a whole. The circle is divided into sectors, where the area of each sector is proportional to the frequency of the category it represents.
- Central Angle for a category = (Frequency of Category / Total Frequency) * 360°
- Example: Transport modes for 50 students going to school.
| Mode | Frequency | Calculation | Central Angle |
| :--- | :-------- | :---------- | :------------ |
| Bus | 20 | (20/50) * 360° | 144° |
| Car | 10 | (10/50) * 360° | 72° |
| Bicycle | 15 | (15/50) * 360° | 108° |
| Walk | 5 | (5/50) * 360° | 36° |
| Total | 50 | | 360° |

(Imagine a pie chart divided into sectors with the calculated angles.)
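
The central-angle formula is easy to check with a few lines of Python. This is a small sketch using the transport data above; the dictionary and variable names are illustrative.

```python
# Transport modes for 50 students (from the table above).
transport = {"Bus": 20, "Car": 10, "Bicycle": 15, "Walk": 5}
total = sum(transport.values())

# Central angle = (frequency of category / total frequency) * 360 degrees.
angles = {mode: f * 360 / total for mode, f in transport.items()}

for mode, angle in angles.items():
    print(f"{mode}: {angle} degrees")
print("Sum of angles:", sum(angles.values()))
```

The angles always add up to 360°, which is a quick way to catch an arithmetic slip before drawing the chart.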

4. Measures of Central Tendency

A measure of central tendency is a single value that describes the center of a dataset, giving a typical or representative value for it.

Mean (Arithmetic Mean): The sum of all observations divided by the total number of observations.
- For ungrouped data: $\bar{x} = \frac{\sum x_i}{n}$
  - Example: Marks: 65, 70, 75, 80, 85. $\bar{x} = \frac{65 + 70 + 75 + 80 + 85}{5} = \frac{375}{5} = 75$
- For grouped data (using frequencies): $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$ (where $x_i$ are individual observations or class marks)
  - Example: Number of siblings:
  | $x_i$ | $f_i$ | $f_i x_i$ |
  | :---- | :---- | :-------- |
  | 0 | 3 | 0 |
  | 1 | 5 | 5 |
  | 2 | 4 | 8 |
  | 3 | 3 | 9 |
  | Sum | 15 | 22 |

  $\bar{x} = \frac{22}{15} \approx 1.47$ siblings
  - Example (using class marks for grouped data):
  | Class Interval | Class Mark ($x_i$) | Frequency ($f_i$) | $f_i x_i$ |
  | :------------- | :----------------- | :---------------- | :-------- |
  | 30-40 | 35 | 3 | 105 |
  | 40-50 | 45 | 3 | 135 |
  | 50-60 | 55 | 5 | 275 |
  | 60-70 | 65 | 5 | 325 |
  | 70-80 | 75 | 5 | 375 |
  | 80-90 | 85 | 5 | 425 |
  | 90-100 | 95 | 4 | 380 |
  | Sum | | 30 | 2020 |

  $\bar{x} = \frac{2020}{30} \approx 67.33$ marks
- Assumed Mean Method (for large data): $\bar{x} = A + \frac{\sum f_i d_i}{\sum f_i}$, where $A$ is the assumed mean and $d_i = x_i - A$.
  - Example (using class marks from previous example, let A=65):
  | Class Interval | Class Mark ($x_i$) | $f_i$ | $d_i = x_i - 65$ | $f_i d_i$ |
  | :------------- | :----------------- | :---- | :--------------- | :-------- |
  | 30-40 | 35 | 3 | -30 | -90 |
  | 40-50 | 45 | 3 | -20 | -60 |
  | 50-60 | 55 | 5 | -10 | -50 |
  | 60-70 | 65 | 5 | 0 | 0 |
  | 70-80 | 75 | 5 | 10 | 50 |
  | 80-90 | 85 | 5 | 20 | 100 |
  | 90-100 | 95 | 4 | 30 | 120 |
  | Sum | | 30 | | 70 |

  $\bar{x} = 65 + \frac{70}{30} \approx 65 + 2.33 = 67.33$
- Step Deviation Method (for very large data or when the $d_i$ are multiples of a common factor): $\bar{x} = A + h \cdot \frac{\sum f_i u_i}{\sum f_i}$, where $h$ is the class size and $u_i = \frac{x_i - A}{h}$.
  - Example (using previous example, h=10):
  | Class Interval | Class Mark ($x_i$) | $f_i$ | $u_i = (x_i - 65)/10$ | $f_i u_i$ |
  | :------------- | :----------------- | :---- | :-------------------- | :-------- |
  | 30-40 | 35 | 3 | -3 | -9 |
  | 40-50 | 45 | 3 | -2 | -6 |
  | 50-60 | 55 | 5 | -1 | -5 |
  | 60-70 | 65 | 5 | 0 | 0 |
  | 70-80 | 75 | 5 | 1 | 5 |
  | 80-90 | 85 | 5 | 2 | 10 |
  | 90-100 | 95 | 4 | 3 | 12 |
  | Sum | | 30 | | 7 |

  $\bar{x} = 65 + 10 \times \frac{7}{30} = 65 + \frac{7}{3} \approx 65 + 2.33 = 67.33$
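
The three methods for the grouped mean are algebraically equivalent, which the following Python sketch checks on the marks data. The function names are illustrative and not part of the original notes; `A` and `h` follow the symbols used above.

```python
# Class marks and frequencies from the grouped marks table.
class_marks = [35, 45, 55, 65, 75, 85, 95]
frequencies = [3, 3, 5, 5, 5, 5, 4]


def mean_direct(x, f):
    """Direct method: sum(f_i * x_i) / sum(f_i)."""
    return sum(fi * xi for xi, fi in zip(x, f)) / sum(f)


def mean_assumed(x, f, A):
    """Assumed mean method: A + sum(f_i * d_i) / sum(f_i), with d_i = x_i - A."""
    return A + sum(fi * (xi - A) for xi, fi in zip(x, f)) / sum(f)


def mean_step_deviation(x, f, A, h):
    """Step deviation method: A + h * sum(f_i * u_i) / sum(f_i), with u_i = (x_i - A) / h."""
    return A + h * sum(fi * (xi - A) / h for xi, fi in zip(x, f)) / sum(f)


print(round(mean_direct(class_marks, frequencies), 2))
print(round(mean_assumed(class_marks, frequencies, A=65), 2))
print(round(mean_step_deviation(class_marks, frequencies, A=65, h=10), 2))

# The direct method also works for ungrouped frequency tables (siblings example).
print(round(mean_direct([0, 1, 2, 3], [3, 5, 4, 3]), 2))
```

Any value can be chosen for `A`; picking a class mark near the middle simply keeps the deviations small for hand calculation.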
Median: The middle value of a dataset when arranged in ascending or descending order.
- For ungrouped data:
  - If ‘n’ (number of observations) is odd: Median = ((n+1)/2)th observation.
    - Example: 65, 68, 70, 75, 78, 82, 88 (n=7, odd) Median = ((7+1)/2)th=4th observation = 75
  - If ‘n’ is even: Median = (Average of (n/2)th and (n/2+1)th observations).
    - Example: 65, 68, 70, 75, 78, 82, 88, 90 (n=8, even) Median = (Average of (8/2)th and (8/2+1)th observations) Median = (Average of 4th (75) and 5th (78) observations) = (75+78)/2=76.5
- For grouped data: Median = $L + \left(\frac{\frac{n}{2} - cf}{f}\right) \times h$, where:
  - L = Lower limit of the median class.
  - n = Total number of observations (sum of frequencies).
  - cf = Cumulative frequency of the class preceding the median class.
  - f = Frequency of the median class.
  - h = Class size of the median class.
  Steps:
  1. Create a cumulative frequency table.
  2. Find n/2.
  3. Identify the median class: The class interval whose cumulative frequency is just greater than or equal to n/2.
  - Example (using marks data):
  | Class Interval | Frequency ($f_i$) | Cumulative Frequency (cf) |
  | :------------- | :---------------- | :------------------------ |
  | 30-40 | 3 | 3 |
  | 40-50 | 3 | 6 |
  | 50-60 | 5 | 11 |
  | 60-70 | 5 | 16 (median class) |
  | 70-80 | 5 | 21 |
  | 80-90 | 5 | 26 |
  | 90-100 | 4 | 30 |

  Median class: n = 30, so n/2 = 15; the first cumulative frequency greater than or equal to 15 is 16, which belongs to the class 60-70.
  - n = 30, n/2 = 15
  - Median Class = 60-70
  - L = 60
  - cf (preceding median class) = 11
  - f (of median class) = 5
  - h = 10

  Median = $60 + \left(\frac{15 - 11}{5}\right) \times 10 = 60 + \frac{4}{5} \times 10 = 60 + 8 = 68$ marks
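
The median rules above can be sketched in Python. The ungrouped cases use the standard library's `statistics.median`; `grouped_median` is a hypothetical helper written for this illustration, and it assumes equal class sizes and classes listed in ascending order.

```python
import statistics

# Ungrouped data: odd and even number of observations.
print(statistics.median([65, 68, 70, 75, 78, 82, 88]))      # 75
print(statistics.median([65, 68, 70, 75, 78, 82, 88, 90]))  # 76.5


def grouped_median(lower_limits, frequencies, h):
    """Median = L + ((n/2 - cf) / f) * h for equal-width classes."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("at least one observation is required")
    cf = 0  # cumulative frequency of the classes before the current one
    for L, f in zip(lower_limits, frequencies):
        # Median class: first class whose cumulative frequency reaches n/2.
        if cf + f >= n / 2:
            return L + (n / 2 - cf) * h / f
        cf += f


lower_limits = [30, 40, 50, 60, 70, 80, 90]
frequencies = [3, 3, 5, 5, 5, 5, 4]
print(grouped_median(lower_limits, frequencies, h=10))  # 68.0
```

The loop keeps a running cumulative frequency, so no separate cumulative frequency table is needed.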
Mode: The observation that occurs most frequently in a dataset.
- For ungrouped data:
  - Example: 2, 3, 5, 2, 7, 2, 8, 9, 2. The mode is 2 (occurs 4 times).
  - A dataset can have one mode (unimodal), multiple modes (multimodal), or no mode if all observations have the same frequency.
    - Example (Bimodal): 1, 2, 2, 3, 4, 4, 5. Modes are 2 and 4.
- For grouped data: Mode = $L + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h$, where:
  - L = Lower limit of the modal class.
  - f1 = Frequency of the modal class.
  - f0 = Frequency of the class preceding the modal class.
  - f2 = Frequency of the class succeeding the modal class.
  - h = Class size of the modal class.
  Steps:
  1. Identify the modal class: The class interval with the highest frequency.
  - Example (let’s use a modified marks data for a clear modal class):
  | Class Interval | Frequency ($f_i$) |
  | :------------- | :---------------- |
  | 30-40 | 3 |
  | 40-50 | 3 |
  | 50-60 | 5 |
  | 60-70 | 8 (modal class, highest frequency) |
  | 70-80 | 5 |
  | 80-90 | 3 |
  | 90-100 | 2 |
  - Modal Class = 60-70
  - L = 60
  - f1 = 8
  - f0 = 5
  - f2 = 5
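  - h = 10

  Mode = $60 + \left(\frac{8 - 5}{2(8) - 5 - 5}\right) \times 10 = 60 + \left(\frac{3}{16 - 10}\right) \times 10 = 60 + \frac{3}{6} \times 10 = 60 + 0.5 \times 10 = 60 + 5 = 65$ marks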
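
A short Python sketch of both cases follows. `statistics.mode` and `statistics.multimode` are standard-library functions (Python 3.8+ for `multimode`); `grouped_mode` is a hypothetical helper for this illustration. It makes two assumptions that the notes do not state: when several classes tie for the highest frequency it uses the first one, and a missing neighbouring class is treated as having frequency 0.

```python
import statistics

# Ungrouped data.
print(statistics.mode([2, 3, 5, 2, 7, 2, 8, 9, 2]))    # 2
print(statistics.multimode([1, 2, 2, 3, 4, 4, 5]))     # [2, 4]


def grouped_mode(lower_limits, frequencies, h):
    """Mode = L + ((f1 - f0) / (2*f1 - f0 - f2)) * h for equal-width classes."""
    # Index of the modal class (first class with the highest frequency).
    i = max(range(len(frequencies)), key=frequencies.__getitem__)
    f1 = frequencies[i]
    f0 = frequencies[i - 1] if i > 0 else 0                     # preceding class
    f2 = frequencies[i + 1] if i + 1 < len(frequencies) else 0  # succeeding class
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("modal class is not distinct from its neighbours")
    return lower_limits[i] + (f1 - f0) * h / denominator


lower_limits = [30, 40, 50, 60, 70, 80, 90]
modified_frequencies = [3, 3, 5, 8, 5, 3, 2]
print(grouped_mode(lower_limits, modified_frequencies, h=10))  # 65.0
```

The formula is only meaningful when one class clearly has the highest frequency, which is why the example uses the modified table rather than the original marks data, where four classes share the frequency 5.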

5. Relationship Between Mean, Median, and Mode (Empirical Formula)

For a moderately skewed distribution, there’s an approximate relationship: Mode ≈ 3 × Median − 2 × Mean

Example: If Mean = 67.33 and Median = 68, then Mode ≈ 3(68) − 2(67.33) = 204 − 134.66 = 69.34. (Note: This is an approximation and may not exactly match the mode calculated from the grouped-data formula.)
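
The empirical relationship is a one-line calculation, sketched here in Python. The function name `empirical_mode` is illustrative.

```python
def empirical_mode(mean, median):
    """Empirical estimate: Mode ~ 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean


# With the mean rounded to two decimals, as in the example above.
print(round(empirical_mode(mean=67.33, median=68), 2))      # 69.34

# With the unrounded mean 2020/30 from the grouped marks table.
print(round(empirical_mode(mean=2020 / 30, median=68), 2))  # 69.33
```

The small difference between the two results comes only from rounding the mean before substituting it.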

6. Cumulative Frequency Curve (Ogive)

An ogive is a graph showing cumulative frequencies.

Less Than Ogive: Plotted by taking upper class limits on the x-axis and corresponding ‘less than’ cumulative frequencies on the y-axis.
- Example (from marks data):
| Marks (Upper Limit) | Less Than Cumulative Frequency |
| :------------------ | :----------------------------- |
| < 40 | 3 |
| < 50 | 6 |
| < 60 | 11 |
| < 70 | 16 |
| < 80 | 21 |
| < 90 | 26 |
| < 100 | 30 |

(Imagine a smooth curve rising from left to right, connecting these points. The x-axis would be Marks and the y-axis Cumulative Frequency.)
More Than Ogive: Plotted by taking lower class limits on the x-axis and corresponding ‘more than’ cumulative frequencies on the y-axis.
- Example (from marks data):
| Marks (Lower Limit) | More Than Cumulative Frequency |
| :------------------ | :----------------------------- |
| ≥ 30 | 30 |
| ≥ 40 | 27 |
| ≥ 50 | 24 |
| ≥ 60 | 19 |
| ≥ 70 | 14 |
| ≥ 80 | 9 |
| ≥ 90 | 4 |

(Imagine a smooth curve falling from left to right, connecting these points.)
Finding Median from Ogive: The median is the x-coordinate of the point where the ‘less than’ ogive intersects the line y=n/2 (or where the ‘less than’ and ‘more than’ ogives intersect).
- Example: For n = 30, n/2 = 15. Draw a horizontal line at y = 15 until it meets the ‘less than’ ogive, then drop a perpendicular from that point to the x-axis. The x-coordinate of that point is the median (around 68 in our example).
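
The ogive tables and the graphical median can be reproduced numerically, as in the Python sketch below. It assumes the plotted points are joined by straight line segments, so the reading at y = n/2 is found by linear interpolation; `median_from_ogive` is a hypothetical helper, not something from the original notes.

```python
from itertools import accumulate

lower_limits = [30, 40, 50, 60, 70, 80, 90]
upper_limits = [40, 50, 60, 70, 80, 90, 100]
frequencies = [3, 3, 5, 5, 5, 5, 4]
n = sum(frequencies)

# 'Less than' cumulative frequencies, plotted against upper limits.
less_than_cf = list(accumulate(frequencies))
# 'More than' cumulative frequencies, plotted against lower limits.
more_than_cf = [n - cf + f for cf, f in zip(less_than_cf, frequencies)]
print(less_than_cf)  # [3, 6, 11, 16, 21, 26, 30]
print(more_than_cf)  # [30, 27, 24, 19, 14, 9, 4]


def median_from_ogive(xs, cfs, target):
    """Find x where a piecewise-linear 'less than' ogive reaches the target height."""
    points = list(zip(xs, cfs))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (target - c0) * (x1 - x0) / (c1 - c0)
    raise ValueError("target is outside the range of the ogive")


# The ogive starts at the first lower limit with cumulative frequency 0.
xs = [lower_limits[0]] + upper_limits
cfs = [0] + less_than_cf
print(median_from_ogive(xs, cfs, n / 2))  # 68.0
```

Interpolating along the segment from (60, 11) to (70, 16) is the same calculation as the grouped median formula, which is why both give 68.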
