Different Types of Data Plots
1. Line Plot
- Purpose: A line plot shows the trend of a variable over time or across ordered categories.
- When to use: Ideal for showing time-series data or any continuous variable where the trend is important.
- Description: The x-axis represents the time or sequential order, and the y-axis represents the value of the variable. Data points are connected by lines to show how the variable changes.
Example: A plot showing the stock price of a company over several months.
import matplotlib.pyplot as plt
import numpy as np
# Example Data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
stock_price = [100, 120, 130, 125, 140]
plt.plot(months, stock_price, marker='o')
plt.title('Stock Price Over Time')
plt.xlabel('Month')
plt.ylabel('Stock Price')
plt.show()
Visualization: A line graph with months on the x-axis and stock price on the y-axis, showing the trend of stock prices.
2. Scatter Plot
- Purpose: A scatter plot displays the relationship between two variables.
- When to use: Ideal for visualizing correlations or relationships between two continuous variables.
- Description: Each point represents an observation with its x and y values. Useful for spotting trends, clusters, or outliers.
Example: A plot showing the relationship between the number of hours studied and exam scores.
import seaborn as sns
# Example Data
hours_studied = [1, 2, 3, 4, 5, 6, 7, 8, 9]
exam_scores = [50, 55, 60, 65, 70, 75, 80, 85, 90]
sns.scatterplot(x=hours_studied, y=exam_scores)
plt.title('Relationship between Hours Studied and Exam Scores')
plt.xlabel('Hours Studied')
plt.ylabel('Exam Score')
plt.show()
Visualization: A scatter plot with hours studied on the x-axis and exam scores on the y-axis. Each point represents a student’s data.
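To put a number on the relationship a scatter plot suggests, one option is to compute the Pearson correlation coefficient and a straight-line fit. The sketch below reuses the example data above; using np.corrcoef and np.polyfit here is an illustrative choice, not part of the original example.

```python
import numpy as np

# Same example data as the scatter plot above
hours_studied = [1, 2, 3, 4, 5, 6, 7, 8, 9]
exam_scores = [50, 55, 60, 65, 70, 75, 80, 85, 90]

# Pearson correlation: off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(hours_studied, exam_scores)[0, 1]

# Least-squares straight line: score is roughly slope * hours + intercept
slope, intercept = np.polyfit(hours_studied, exam_scores, 1)

print(f"correlation: {r:.2f}")
print(f"fit: score = {slope:.1f} * hours + {intercept:.1f}")
```

Because this example data lies exactly on a straight line, the correlation comes out at 1. Real data would normally give a value somewhere between -1 and 1.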
3. Bar Plot
- Purpose: A bar plot compares categories or groups using rectangular bars.
- When to use: Ideal for comparing discrete categories, such as sales figures, counts, or any group-based data.
- Description: The x-axis represents the categories, and the y-axis represents the numerical value for each category. The bars’ height corresponds to the value.
Example: A plot comparing the sales of different products.
products = ['Product A', 'Product B', 'Product C', 'Product D']
sales = [500, 600, 400, 700]
plt.bar(products, sales)
plt.title('Sales Comparison of Products')
plt.xlabel('Product')
plt.ylabel('Sales')
plt.show()
Visualization: A bar chart with products on the x-axis and sales on the y-axis, showing the comparison of sales across four products.
4. Histogram
- Purpose: A histogram visualizes the distribution of a continuous variable.
- When to use: Ideal for seeing how the values of a single variable are spread out: where they cluster, how wide their range is, and whether the distribution is skewed.
- Description: The x-axis represents the range of values of a continuous variable, and the y-axis shows the frequency or count of values within each range (bin).
Example: A plot showing the distribution of exam scores in a class.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 65, 72, 88, 92, 94]
plt.hist(scores, bins=5, edgecolor='black')
plt.title('Distribution of Exam Scores')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()
Visualization: A histogram with scores on the x-axis and frequency on the y-axis. The bins show how the scores are distributed across different ranges.
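To see what the bins actually contain, the counts can be computed directly. This is a small sketch using np.histogram on the same scores; with bins=5 it splits the range from the lowest to the highest score into five equal-width intervals.

```python
import numpy as np

# Same example data as the histogram above
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 65, 72, 88, 92, 94]

# counts[i] is the number of scores between edges[i] and edges[i + 1]
counts, edges = np.histogram(scores, bins=5)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f} to {right:.0f}: {count}")
```

Each bin here is 9 points wide (55 to 64, 64 to 73, and so on up to 100), and the five counts add up to the 15 scores in the list.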
5. Box Plot
- Purpose: A box plot represents the distribution of a variable and displays outliers.
- When to use: Ideal for showing the spread of data, identifying outliers, and comparing distributions across categories.
- Description: The box represents the interquartile range (IQR), the line inside the box is the median, and the "whiskers" represent the range of data (excluding outliers). Outliers are shown as individual points.
Example: A plot showing the distribution of salaries across different departments.
departments = ['HR', 'IT', 'Sales', 'Finance']
salaries = [[40000, 45000, 60000, 55000, 52000],  # HR salaries
            [70000, 80000, 75000, 82000, 78000],  # IT salaries
            [40000, 50000, 48000, 45000, 47000],  # Sales salaries
            [60000, 65000, 62000, 64000, 66000]]  # Finance salaries
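plt.boxplot(salaries)
# Label the boxes via xticks; boxes sit at positions 1..N (the 'labels' argument is deprecated in Matplotlib 3.9+)
plt.xticks(range(1, len(departments) + 1), departments)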
plt.title('Salary Distribution by Department')
plt.ylabel('Salary')
plt.show()
Visualization: A box plot comparing the distribution of salaries in four departments, showing the range, median, and potential outliers.
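The pieces of a box plot can also be worked out by hand. The sketch below does this for the HR salaries, assuming the common 1.5 × IQR rule for whiskers (Matplotlib's default); other tools may place the whiskers differently.

```python
import numpy as np

# HR salaries from the example above
hr_salaries = np.array([40000, 45000, 60000, 55000, 52000])

# The box: first quartile, median, third quartile
q1, median, q3 = np.percentile(hr_salaries, [25, 50, 75])
iqr = q3 - q1

# Points beyond these fences are treated as outliers (1.5 * IQR rule, an assumption)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers extend to the most extreme data points still inside the fences
inside = hr_salaries[(hr_salaries >= lower_fence) & (hr_salaries <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = hr_salaries[(hr_salaries < lower_fence) | (hr_salaries > upper_fence)]

print(f"Q1={q1:.0f}, median={median:.0f}, Q3={q3:.0f}, IQR={iqr:.0f}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers.tolist()}")
```

For this data the box runs from 45000 to 55000 with the median at 52000, the whiskers reach 40000 and 60000, and no salary falls outside the fences, so no outlier points are drawn.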
6. Heatmap
- Purpose: A heatmap displays the values of a matrix as colors, which makes it a common way to show pairwise correlations between variables.
- When to use: Ideal for visualizing correlation matrices or any relationship between multiple variables.
- Description: The x and y axes represent variables, and the color intensity represents the strength or magnitude of the relationship between them.
Example: A plot showing the correlation between different factors in a dataset (e.g., age, height, weight).
import seaborn as sns
import numpy as np
# Example Data: correlation matrix of 5 random variables (one per row, 5 observations each)
data = np.random.rand(5, 5)
correlation_matrix = np.corrcoef(data)
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Heatmap of Variable Correlations')
plt.show()
Visualization: A heatmap displaying how strongly different variables in a dataset are correlated with each other, with colors representing correlation values.
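With named variables, the correlation matrix is easier to read. The following is a sketch only: the age, height, and weight values are synthetic numbers generated for illustration (weight is deliberately tied to height so that one pair is clearly correlated), not real measurements.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Synthetic, made-up data for illustration only
age = rng.uniform(20, 60, n)
height = rng.normal(170, 10, n)
weight = 0.9 * (height - 100) + rng.normal(0, 8, n)  # loosely follows height

names = ['age', 'height', 'weight']
data = np.column_stack([age, height, weight])  # one column per variable

# rowvar=False tells NumPy that variables are in columns, observations in rows
correlation_matrix = np.corrcoef(data, rowvar=False)

for name, row in zip(names, correlation_matrix):
    print(name.ljust(8), np.round(row, 2))
```

The resulting 3 × 3 matrix can be passed to sns.heatmap together with xticklabels=names and yticklabels=names so that each cell is labeled with the pair of variables it describes.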
7. Violin Plot
- Purpose: A violin plot combines a box plot and a kernel density plot to represent the distribution of a variable.
- When to use: Ideal for visualizing the distribution of data, especially when comparing multiple categories.
- Description: The width of the "violin" at different values represents the distribution's density. It combines aspects of a box plot and a probability density plot to show both summary statistics and the shape of the distribution.
Example: A plot comparing the distribution of exam scores across different student groups.
groups = ['Group A', 'Group B', 'Group C']
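scores = [np.random.normal(70, 10, 100),  # Group A
          np.random.normal(80, 12, 100),  # Group B
          np.random.normal(85, 8, 100)]   # Group C
sns.violinplot(data=scores)
# Label each violin with its group name (violins sit at positions 0..N-1)
plt.xticks(range(len(groups)), groups)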
plt.title('Violin Plot of Exam Scores by Group')
plt.xlabel('Group')
plt.ylabel('Score')
plt.show()
Visualization: A violin plot comparing the distribution of scores in three groups, showing both the density of the distribution and the summary statistics.
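The outline of each violin comes from a kernel density estimate. As a rough sketch of the idea, the code below builds a Gaussian KDE by hand for one group. The helper gaussian_kde_1d is a hypothetical name written for this illustration, and the bandwidth uses Scott's rule of thumb as an assumption; a plotting library may choose its bandwidth differently.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    samples = np.asarray(samples, dtype=float)
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale so the density integrates to 1
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
group_a = rng.normal(70, 10, 100)  # same shape of data as Group A above

# Scott's rule of thumb for one-dimensional data (an assumption of this sketch)
bandwidth = group_a.std(ddof=1) * len(group_a) ** (-1 / 5)

grid = np.linspace(group_a.min() - 4 * bandwidth,
                   group_a.max() + 4 * bandwidth, 400)
density = gaussian_kde_1d(group_a, grid, bandwidth)

# Sanity check: the area under a density curve should be close to 1
area = density.sum() * (grid[1] - grid[0])
print(f"bandwidth: {bandwidth:.2f}, area under curve: {area:.3f}")
```

Plotting density against grid gives one half of a violin; the plot mirrors that curve around a central axis to produce the full shape.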