--- Original DataFrame Info --- Original Data Types: Sell_Through_Rate object Damage_Rate object Snapshot_Date object Manufacturing_Date object Selling_Date object Product_Type object Edition_Type object Target_Gender object Price_Bucket object Sneaker_Grade object Price int64 dtype: object First few rows of original DataFrame: Sell_Through_Rate Damage_Rate Snapshot_Date Manufacturing_Date Selling_Date Product_Type Edition_Type Target_Gender Price_Bucket Sneaker_Grade Price 0 85% 5% 2023-01-01 2022-10-01 2023-01-10 Running Limited Male High A 150 1 92% 2% 2023-01-02 2022-10-15 2023-01-20 Basketball Standard Female Medium B 120 2 78% 10% 2023-01-03 2022-11-01 2023-01-15 Casual Limited Unisex Low C 80 3 65% 1% 2023-01-04 2022-09-01 2023-01-25 Running Standard Male Medium A 110 4 99% 0% 2023-01-05 2022-12-01 2023-01-08 Basketball Limited Female High B 200 --- Starting Data Cleaning Process --- --- Cleaning Percentage Columns --- Converted 'Sell_Through_Rate' to decimal (e.g., 0.85) Converted 'Damage_Rate' to decimal (e.g., 0.05) --- Parsing Date Columns --- Parsed 'Snapshot_Date' to datetime format. Parsed 'Manufacturing_Date' to datetime format. Parsed 'Selling_Date' to datetime format. --- Missing Values Analysis --- No missing values found in the DataFrame. --- Data Types After Conversions --- Sell_Through_Rate float64 Damage_Rate float64 Snapshot_Date datetime64[ns] Manufacturing_Date datetime64[ns] Selling_Date datetime64[ns] Product_Type object Edition_Type object Target_Gender object Price_Bucket object Sneaker_Grade object Price int64 dtype: object --- Basic Statistics for Numerical Columns --- count mean std min 25% 50% 75% max Sell_Through_Rate 5.00 0.84 0.13 0.65 0.78 0.85 0.92 0.99 Damage_Rate 5.00 0.04 0.04 0.00 0.01 0.02 0.05 0.10 Price 5.00 132.00 45.50 80.00 110.00 120.00 150.00 200.00 --- Unique Values in Categorical Columns --- 'Product_Type': Running 2 Basketball 2 Casual 1 Name: Product_Type, dtype: int64 'Edition_Type': Limited 3 Standard 2 Name: Edition_Type, dtype: int64 'Target_Gender': Male 2 Female 2 Unisex 1 Name: Target_Gender, dtype: int64 'Price_Bucket': High 2 Medium 2 Low 1 Name: Price_Bucket, dtype: int64 'Sneaker_Grade': A 2 B 2 C 1 Name: Sneaker_Grade, dtype: int64 --- Data cleaning complete. Ready for feature engineering! --- First few rows of cleaned DataFrame: Sell_Through_Rate Damage_Rate Snapshot_Date Manufacturing_Date Selling_Date Product_Type Edition_Type Target_Gender Price_Bucket Sneaker_Grade Price 0 0.85 0.05 2023-01-01 2022-10-01 2023-01-10 Running Limited Male High A 150 1 0.92 0.02 2023-01-02 2022-10-15 2023-01-20 Basketball Standard Female Medium B 120 2 0.78 0.10 2023-01-03 2022-11-01 2023-01-15 Casual Limited Unisex Low C 80 3 0.65 0.01 2023-01-04 2022-09-01 2023-01-25 Running Standard Male Medium A 110 4 0.99 0.00 2023-01-05 2022-12-01 2023-01-08 Basketball Limited Female High B 200
--- Initial Data Inspection --- Sneakers DataFrame (sneakers.csv): Shape: (500, 22) First 5 rows:
Product_Name | Product_Type | Total_Produced | Total_Sold | Damaged_Units | Month | Year | Edition_Type | Unit_Price | Target_Gender | Sell_Through_Rate | Damage_Rate | Unsold_Inventory | Estimated_Revenue | Quarter | Snapshot_Date | Price_Bucket | Manufacturing_Date | Selling_Date | Performance_Score | Sneaker_Grade | Units_Received | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Nike Air Force 1 '07 Sneakers | Creamy | 24592 | 5819 | 9012 | September | 2017 | Limited | 115 | Men | 23.66% | 36.65% | 9761 | 669185 | Q3 | 9/1/2017 | High | 9/1/2017 | 11/6/2017 | -12.98 | E | 24592 |
1 | Adidas Originals Samba OG Shoes | Creamy | 81482 | 16395 | 38698 | September | 2019 | Standard | 100 | Men | 20.12% | 47.49% | 26389 | 1639500 | Q3 | 9/1/2019 | Mid | 9/1/2019 | 9/30/2019 | -27.37 | E | 81482 |
2 | Air Jordan 1 Mid Shoes | Peanut Butter | 76237 | 8478 | 26062 | January | 2023 | Special Release | 110 | Women | 11.12% | 34.19% | 41697 | 932580 | Q1 | 1/1/2023 | High | 1/1/2023 | 3/28/2023 | -23.06 | E | 76237 |
3 | Red Tape Casual Sneakers | Brownie | 46463 | 5425 | 10463 | October | 2022 | Anniversary | 35 | Men | 11.68% | 22.52% | 30575 | 189875 | Q4 | 10/1/2022 | Low | 10/1/2022 | 12/15/2022 | -10.84 | E | 46463 |
4 | Nike Court Vision Low Shoes | Peanut Butter | 54118 | 11698 | 6078 | April | 2019 | Standard | 80 | Women | 21.62% | 11.23% | 36342 | 935840 | Q2 | 4/1/2019 | Mid | 4/1/2019 | 5/6/2019 | 10.38 | D | 54118 |
Info: <class 'pandas.core.frame.DataFrame'> RangeIndex: 500 entries, 0 to 499 Data columns (total 22 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Product_Name 500 non-null object 1 Product_Type 500 non-null object 2 Total_Produced 500 non-null int64 3 Total_Sold 500 non-null int64 4 Damaged_Units 500 non-null int64 5 Month 500 non-null object 6 Year 500 non-null int64 7 Edition_Type 500 non-null object 8 Unit_Price 500 non-null int64 9 Target_Gender 500 non-null object 10 Sell_Through_Rate 500 non-null object 11 Damage_Rate 500 non-null object 12 Unsold_Inventory 500 non-null int64 13 Estimated_Revenue 500 non-null int64 14 Quarter 500 non-null object 15 Snapshot_Date 500 non-null object 16 Price_Bucket 500 non-null object 17 Manufacturing_Date 500 non-null object 18 Selling_Date 500 non-null object 19 Performance_Score 500 non-null float64 20 Sneaker_Grade 500 non-null object 21 Units_Received 500 non-null int64 dtypes: float64(1), int64(8), object(13) memory usage: 86.1+ KB ================================================================================ Sneakers with Dates DataFrame (snicker_dataset_with_dates.csv): Shape: (3, 5) First 5 rows:
Product_ID | Snapshot_Date | Manufacturing_Date | Selling_Date | Stock_Quantity | |
---|---|---|---|---|---|
0 | 1 | 2023-01-01 | 2022-10-01 | 2023-01-10 | 100 |
1 | 2 | 2023-01-02 | 2022-10-15 | 2023-01-20 | 50 |
2 | 3 | 2023-01-03 | 2022-11-01 | 2023-01-15 | 75 |
Info: <class 'pandas.core.frame.DataFrame'> RangeIndex: 3 entries, 0 to 2 Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Product_ID 3 non-null int64 1 Snapshot_Date 3 non-null object 2 Manufacturing_Date 3 non-null object 3 Selling_Date 3 non-null object 4 Stock_Quantity 3 non-null int64 dtypes: int64(2), object(3) memory usage: 252.0+ bytes
--- Column Name Mapping --- Original column names in sneakers_df: ['Product_Name', 'Product_Type', 'Total_Produced', 'Total_Sold', 'Damaged_Units', 'Month', 'Year', 'Edition_Type', 'Unit_Price', 'Target_Gender', 'Sell_Through_Rate', 'Damage_Rate', 'Unsold_Inventory', 'Estimated_Revenue', 'Quarter', 'Snapshot_Date', 'Price_Bucket', 'Manufacturing_Date', 'Selling_Date', 'Performance_Score', 'Sneaker_Grade', 'Units_Received'] Original column names in sneakers_dates_df: ['Product_ID', 'Snapshot_Date', 'Manufacturing_Date', 'Selling_Date', 'Stock_Quantity'] Common columns to use for merging: {'Manufacturing_Date', 'Snapshot_Date', 'Selling_Date'} --- Data Cleaning Summary --- Before cleaning: ------------------------------ sneakers_df shape: (500, 22) sneakers_dates_df shape: (3, 5) Missing values in sneakers_df:
missing_count | missing_percentage |
---|
Missing values in sneakers_dates_df:
missing_count | missing_percentage |
---|
Task 2.0 - Initial setup complete. Next steps will depend on the output above.
Shape of cleaned data: (500, 22) First few rows of cleaned data:
Product_Name | Product_Type | Total_Produced | Total_Sold | Damaged_Units | Month | Year | Edition_Type | Unit_Price | Target_Gender | Sell_Through_Rate | Damage_Rate | Unsold_Inventory | Estimated_Revenue | Quarter | Snapshot_Date | Price_Bucket | Manufacturing_Date | Selling_Date | Performance_Score | Sneaker_Grade | Units_Received | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Nike Air Force 1 '07 Sneakers | Creamy | 24592 | 5819 | 9012 | September | 2017 | Limited | 115 | Men | 23.66% | 36.65% | 9761 | 669185 | Q3 | 9/1/2017 | High | 9/1/2017 | 11/6/2017 | -12.98 | E | 24592 |
1 | Adidas Originals Samba OG Shoes | Creamy | 81482 | 16395 | 38698 | September | 2019 | Standard | 100 | Men | 20.12% | 47.49% | 26389 | 1639500 | Q3 | 9/1/2019 | Mid | 9/1/2019 | 9/30/2019 | -27.37 | E | 81482 |
Remaining missing values:
Series([], dtype: int64)
Date range in data: Earliest Manufacturing Date: 2015-01-01 00:00:00 Latest Selling Date: 2025-02-04 00:00:00 Preview of Days_to_Sell calculation: Average days to sell: 51.76 Days to sell distribution: count 500.00 mean 51.76 std 22.03 min 15.00 25% 33.00 50% 50.50 75% 72.00 max 90.00 dtype: float64
Original shape before one-hot encoding: (500, 24) Shape after one-hot encoding: (500, 33) Features selected for clustering: ['Total_Produced', 'Total_Sold', 'Unit_Price', 'Sell_Through_Rate', 'Damage_Rate', 'Estimated_Revenue', 'Days_to_Sell', 'Is_Limited_Edition'] Shape of scaled features: (500, 8) First few rows of scaled features:
Total_Produced | Total_Sold | Unit_Price | Sell_Through_Rate | Damage_Rate | Estimated_Revenue | Days_to_Sell | Is_Limited_Edition | |
---|---|---|---|---|---|---|---|---|
0 | -1.12 | -1.09 | 1.26 | -1.25 | 0.73 | -0.93 | 0.65 | 2.13 |
1 | 1.06 | -0.60 | 0.61 | -1.39 | 1.29 | -0.43 | -1.03 | -0.47 |
2 | 0.86 | -0.97 | 1.04 | -1.74 | 0.61 | -0.79 | 1.56 | -0.47 |
3 | -0.28 | -1.11 | -2.22 | -1.72 | 0.01 | -1.17 | 1.06 | -0.47 |
4 | 0.01 | -0.82 | -0.26 | -1.33 | -0.58 | -0.79 | -0.76 | -0.47 |
['X_scaled_df', 'cleaned_sneakers_data', 'sneakers_clean', 'sneakers_dates_clean', 'sneakers_dates_df', 'sneakers_df', 'sneakers_encoded']
Columns in sneakers_encoded: ['Product_Name', 'Total_Produced', 'Total_Sold', 'Damaged_Units', 'Month', 'Year', 'Edition_Type', 'Unit_Price', 'Sell_Through_Rate', 'Damage_Rate', 'Unsold_Inventory', 'Estimated_Revenue', 'Quarter', 'Snapshot_Date', 'Manufacturing_Date', 'Selling_Date', 'Performance_Score', 'Units_Received', 'Days_to_Sell', 'Is_Limited_Edition', 'Product_Type_Brownie', 'Product_Type_Classic', 'Product_Type_Creamy', 'Product_Type_Crisper', 'Product_Type_Peanut Butter', 'Product_Type_White Chocolate', 'Target_Gender_Women', 'Price_Bucket_Low', 'Price_Bucket_Mid', 'Sneaker_Grade_B', 'Sneaker_Grade_C', 'Sneaker_Grade_D', 'Sneaker_Grade_E'] First few rows:
Product_Name | Total_Produced | Total_Sold | Damaged_Units | Month | Year | Edition_Type | Unit_Price | Sell_Through_Rate | Damage_Rate | Unsold_Inventory | Estimated_Revenue | Quarter | Snapshot_Date | Manufacturing_Date | Selling_Date | Performance_Score | Units_Received | Days_to_Sell | Is_Limited_Edition | Product_Type_Brownie | Product_Type_Classic | Product_Type_Creamy | Product_Type_Crisper | Product_Type_Peanut Butter | Product_Type_White Chocolate | Target_Gender_Women | Price_Bucket_Low | Price_Bucket_Mid | Sneaker_Grade_B | Sneaker_Grade_C | Sneaker_Grade_D | Sneaker_Grade_E | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Nike Air Force 1 '07 Sneakers | 24592 | 5819 | 9012 | September | 2017 | Limited | 115 | 0.24 | 0.37 | 9761 | 669185 | Q3 | 9/1/2017 | 2017-09-01 | 2017-11-06 | -12.98 | 24592 | 66 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
1 | Adidas Originals Samba OG Shoes | 81482 | 16395 | 38698 | September | 2019 | Standard | 100 | 0.20 | 0.47 | 26389 | 1639500 | Q3 | 9/1/2019 | 2019-09-01 | 2019-09-30 | -27.37 | 81482 | 29 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
--- sneakers_encoded --- Columns: ['Product_Name', 'Total_Produced', 'Total_Sold', 'Damaged_Units', 'Month', 'Year', 'Edition_Type', 'Unit_Price', 'Sell_Through_Rate', 'Damage_Rate', 'Unsold_Inventory', 'Estimated_Revenue', 'Quarter', 'Snapshot_Date', 'Manufacturing_Date', 'Selling_Date', 'Performance_Score', 'Units_Received', 'Days_to_Sell', 'Is_Limited_Edition', 'Product_Type_Brownie', 'Product_Type_Classic', 'Product_Type_Creamy', 'Product_Type_Crisper', 'Product_Type_Peanut Butter', 'Product_Type_White Chocolate', 'Target_Gender_Women', 'Price_Bucket_Low', 'Price_Bucket_Mid', 'Sneaker_Grade_B', 'Sneaker_Grade_C', 'Sneaker_Grade_D', 'Sneaker_Grade_E'] Shape: (500, 33) --- sneakers_df --- Columns: ['Product_Name', 'Product_Type', 'Total_Produced', 'Total_Sold', 'Damaged_Units', 'Month', 'Year', 'Edition_Type', 'Unit_Price', 'Target_Gender', 'Sell_Through_Rate', 'Damage_Rate', 'Unsold_Inventory', 'Estimated_Revenue', 'Quarter', 'Snapshot_Date', 'Price_Bucket', 'Manufacturing_Date', 'Selling_Date', 'Performance_Score', 'Sneaker_Grade', 'Units_Received'] Shape: (500, 22) --- sneakers_clean --- Columns: ['Product_Name', 'Product_Type', 'Total_Produced', 'Total_Sold', 'Damaged_Units', 'Month', 'Year', 'Edition_Type', 'Unit_Price', 'Target_Gender', 'Sell_Through_Rate', 'Damage_Rate', 'Unsold_Inventory', 'Estimated_Revenue', 'Quarter', 'Snapshot_Date', 'Price_Bucket', 'Manufacturing_Date', 'Selling_Date', 'Performance_Score', 'Sneaker_Grade', 'Units_Received', 'Days_to_Sell', 'Is_Limited_Edition'] Shape: (500, 24) --- sneakers_dates_clean --- Columns: ['Product_ID', 'Snapshot_Date', 'Manufacturing_Date', 'Selling_Date', 'Stock_Quantity'] Shape: (3, 5)
Selecting features for clustering... Selected features: Sell_Through_Rate Unit_Price Days_to_Sell Is_Limited_Edition Damage_Rate Total_Sold 0 0.24 115 66 1 0.37 5819 1 0.20 100 29 0 0.47 16395 Scaling features... Success! Scaled features (first 2 rows):
Sell_Through_Rate | Unit_Price | Days_to_Sell | Is_Limited_Edition | Damage_Rate | Total_Sold | |
---|---|---|---|---|---|---|
0 | -1.25 | 1.26 | 0.65 | 2.13 | 0.73 | -1.09 |
1 | -1.39 | 0.61 | -1.03 | -0.47 | 1.29 | -0.60 |
Feature scaling complete! Ready for clustering. Shape of scaled features: (500, 6)
Calculating inertia and silhouette scores for different K values...
K=2: Inertia = 2284.61, Silhouette Score = 0.2124
K=3: Inertia = 1904.79, Silhouette Score = 0.2398
K=4: Inertia = 1673.78, Silhouette Score = 0.2008
K=5: Inertia = 1486.38, Silhouette Score = 0.2106
K=6: Inertia = 1371.55, Silhouette Score = 0.2040
K=7: Inertia = 1257.42, Silhouette Score = 0.2140
K=8: Inertia = 1179.24, Silhouette Score = 0.2087
K=9: Inertia = 1106.97, Silhouette Score = 0.2153
K=10: Inertia = 1046.70, Silhouette Score = 0.2023
Analysis complete! Review the plots to determine the optimal K. Look for the 'elbow' in the left plot and the highest silhouette score in the right plot.
Training final K-Means model with K=4...
Cluster Distribution: 0 138 1 129 2 146 3 87 Name: Cluster, dtype: int64 Analyzing cluster profiles... Cluster Profiles (mean values):
Sell_Through_Rate | Unit_Price | Days_to_Sell | Is_Limited_Edition | Total_Sold | Count | |
---|---|---|---|---|---|---|
Cluster | ||||||
0 | 0.56 | 88.22 | 32.93 | 0.00 | 23856.01 | 138 |
1 | 0.83 | 83.29 | 58.97 | 0.02 | 52741.26 | 129 |
2 | 0.32 | 88.01 | 62.67 | 0.00 | 15478.42 | 146 |
3 | 0.55 | 83.10 | 52.62 | 1.00 | 26306.30 | 87 |
Identifying 'Resell Goldmine' clusters... Potential 'Resell Goldmine' clusters: [] No clusters met the 'Resell Goldmine' criteria. Consider adjusting the criteria.
No clusters meet both high sell-through and limited edition criteria. Finding best cluster based on highest sell-through rate... Best cluster based on sell-through rate: 1 ================================================================================ RESELL GOLDMINE ANALYSIS ================================================================================ 🎯 Best Performing Cluster: 1 📊 Cluster Size: 129 sneakers 🔥 Average Sell-Through Rate: 83.0% 💰 Average Price: $83.29 ⏱️ Average Days to Sell: 59.0 days 🏆 Limited Edition Ratio: 2% ======================================== TOP 10 SNEAKERS IN BEST CLUSTER ======================================== 1. Lacoste Carnaby Evo 🔸 Price: $95 🔸 Sell-Through: 99.8% 🔸 Days to Sell: 68 days 🔸 Limited Edition: No 2. Puma Smash V2 Sneakers 🔸 Price: $65 🔸 Sell-Through: 99.7% 🔸 Days to Sell: 33 days 🔸 Limited Edition: No 3. Air Jordan 1 Mid Shoes 🔸 Price: $110 🔸 Sell-Through: 99.7% 🔸 Days to Sell: 89 days 🔸 Limited Edition: Yes 4. Columbia Trailstorm Waterproof 🔸 Price: $110 🔸 Sell-Through: 99.6% 🔸 Days to Sell: 81 days 🔸 Limited Edition: No 5. Saucony Jazz Original 🔸 Price: $80 🔸 Sell-Through: 99.5% 🔸 Days to Sell: 60 days 🔸 Limited Edition: No 6. Red Tape Casual Sneakers 🔸 Price: $35 🔸 Sell-Through: 99.2% 🔸 Days to Sell: 34 days 🔸 Limited Edition: No 7. Air Jordan 1 Mid Shoes 🔸 Price: $110 🔸 Sell-Through: 99.1% 🔸 Days to Sell: 59 days 🔸 Limited Edition: Yes 8. Skechers D'Lites 🔸 Price: $75 🔸 Sell-Through: 99.1% 🔸 Days to Sell: 73 days 🔸 Limited Edition: No 9. Skechers D'Lites 🔸 Price: $75 🔸 Sell-Through: 98.8% 🔸 Days to Sell: 16 days 🔸 Limited Edition: No 10. Red Tape Casual Sneakers 🔸 Price: $35 🔸 Sell-Through: 98.6% 🔸 Days to Sell: 56 days 🔸 Limited Edition: No ======================================== RECOMMENDATIONS ======================================== 1. Focus on acquiring sneakers from Cluster 1 2. Consider both limited and standard editions based on performance 3. Monitor the top performers for restocks or resale opportunities 4. Track how these products perform over time to refine your strategy ✅ Analysis complete! Results saved to 'sneakers_with_clusters.csv'
================================================================================ ANALYSIS COMPLETE ================================================================================ 📊 CLUSTER SUMMARY: Cluster Size Avg_Sell_Through_Rate Avg_Unit_Price 1 129 0.83 83.29 0 138 0.56 88.22 3 87 0.55 83.10 2 146 0.32 88.01 💾 FILES SAVED: 1. sneaker_cluster_summary_20250618_072913.csv - Cluster summary statistics 2. sneakers_with_clusters_20250618_072913.csv - Full dataset with cluster assignments 🎯 NEXT STEPS: 1. Review the cluster profiles in the saved CSV files 2. Focus on clusters with high sell-through rates 3. Consider price points and days to sell when making purchasing decisions 4. Monitor the performance of top products over time ✅ Analysis complete! You can now explore the saved CSV files for detailed insights.