--- Original DataFrame Info ---
Original Data Types:
 Sell_Through_Rate     object
Damage_Rate           object
Snapshot_Date         object
Manufacturing_Date    object
Selling_Date          object
Product_Type          object
Edition_Type          object
Target_Gender         object
Price_Bucket          object
Sneaker_Grade         object
Price                  int64
dtype: object

First few rows of original DataFrame:
   Sell_Through_Rate Damage_Rate Snapshot_Date Manufacturing_Date Selling_Date Product_Type Edition_Type Target_Gender Price_Bucket Sneaker_Grade  Price
0               85%          5%    2023-01-01         2022-10-01   2023-01-10      Running      Limited          Male         High             A    150
1               92%          2%    2023-01-02         2022-10-15   2023-01-20   Basketball     Standard        Female       Medium             B    120
2               78%         10%    2023-01-03         2022-11-01   2023-01-15       Casual      Limited        Unisex          Low             C     80
3               65%          1%    2023-01-04         2022-09-01   2023-01-25      Running     Standard          Male       Medium             A    110
4               99%          0%    2023-01-05         2022-12-01   2023-01-08   Basketball      Limited        Female         High             B    200

--- Starting Data Cleaning Process ---

--- Cleaning Percentage Columns ---
Converted 'Sell_Through_Rate' to decimal (e.g., 0.85)
Converted 'Damage_Rate' to decimal (e.g., 0.05)

--- Parsing Date Columns ---
Parsed 'Snapshot_Date' to datetime format.
Parsed 'Manufacturing_Date' to datetime format.
Parsed 'Selling_Date' to datetime format.

--- Missing Values Analysis ---
No missing values found in the DataFrame.

--- Data Types After Conversions ---
Sell_Through_Rate            float64
Damage_Rate                  float64
Snapshot_Date         datetime64[ns]
Manufacturing_Date    datetime64[ns]
Selling_Date          datetime64[ns]
Product_Type                  object
Edition_Type                  object
Target_Gender                 object
Price_Bucket                  object
Sneaker_Grade                 object
Price                          int64
dtype: object

--- Basic Statistics for Numerical Columns ---
                   count   mean   std   min    25%    50%    75%    max
Sell_Through_Rate   5.00   0.84  0.13  0.65   0.78   0.85   0.92   0.99
Damage_Rate         5.00   0.04  0.04  0.00   0.01   0.02   0.05   0.10
Price               5.00 132.00 45.50 80.00 110.00 120.00 150.00 200.00

--- Unique Values in Categorical Columns ---

'Product_Type':
Running       2
Basketball    2
Casual        1
Name: Product_Type, dtype: int64

'Edition_Type':
Limited     3
Standard    2
Name: Edition_Type, dtype: int64

'Target_Gender':
Male      2
Female    2
Unisex    1
Name: Target_Gender, dtype: int64

'Price_Bucket':
High      2
Medium    2
Low       1
Name: Price_Bucket, dtype: int64

'Sneaker_Grade':
A    2
B    2
C    1
Name: Sneaker_Grade, dtype: int64

--- Data cleaning complete. Ready for feature engineering! ---

First few rows of cleaned DataFrame:
    Sell_Through_Rate  Damage_Rate Snapshot_Date Manufacturing_Date Selling_Date Product_Type Edition_Type Target_Gender Price_Bucket Sneaker_Grade  Price
0               0.85         0.05    2023-01-01         2022-10-01   2023-01-10      Running      Limited          Male         High             A    150
1               0.92         0.02    2023-01-02         2022-10-15   2023-01-20   Basketball     Standard        Female       Medium             B    120
2               0.78         0.10    2023-01-03         2022-11-01   2023-01-15       Casual      Limited        Unisex          Low             C     80
3               0.65         0.01    2023-01-04         2022-09-01   2023-01-25      Running     Standard          Male       Medium             A    110
4               0.99         0.00    2023-01-05         2022-12-01   2023-01-08   Basketball      Limited        Female         High             B    200
--- Initial Data Inspection ---

Sneakers DataFrame (sneakers.csv):
Shape: (500, 22)

First 5 rows:
Product_Name Product_Type Total_Produced Total_Sold Damaged_Units Month Year Edition_Type Unit_Price Target_Gender Sell_Through_Rate Damage_Rate Unsold_Inventory Estimated_Revenue Quarter Snapshot_Date Price_Bucket Manufacturing_Date Selling_Date Performance_Score Sneaker_Grade Units_Received
0 Nike Air Force 1 '07 Sneakers Creamy 24592 5819 9012 September 2017 Limited 115 Men 23.66% 36.65% 9761 669185 Q3 9/1/2017 High 9/1/2017 11/6/2017 -12.98 E 24592
1 Adidas Originals Samba OG Shoes Creamy 81482 16395 38698 September 2019 Standard 100 Men 20.12% 47.49% 26389 1639500 Q3 9/1/2019 Mid 9/1/2019 9/30/2019 -27.37 E 81482
2 Air Jordan 1 Mid Shoes Peanut Butter 76237 8478 26062 January 2023 Special Release 110 Women 11.12% 34.19% 41697 932580 Q1 1/1/2023 High 1/1/2023 3/28/2023 -23.06 E 76237
3 Red Tape Casual Sneakers Brownie 46463 5425 10463 October 2022 Anniversary 35 Men 11.68% 22.52% 30575 189875 Q4 10/1/2022 Low 10/1/2022 12/15/2022 -10.84 E 46463
4 Nike Court Vision Low Shoes Peanut Butter 54118 11698 6078 April 2019 Standard 80 Women 21.62% 11.23% 36342 935840 Q2 4/1/2019 Mid 4/1/2019 5/6/2019 10.38 D 54118
Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 22 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Product_Name        500 non-null    object 
 1   Product_Type        500 non-null    object 
 2   Total_Produced      500 non-null    int64  
 3   Total_Sold          500 non-null    int64  
 4   Damaged_Units       500 non-null    int64  
 5   Month               500 non-null    object 
 6   Year                500 non-null    int64  
 7   Edition_Type        500 non-null    object 
 8   Unit_Price          500 non-null    int64  
 9   Target_Gender       500 non-null    object 
 10  Sell_Through_Rate   500 non-null    object 
 11  Damage_Rate         500 non-null    object 
 12  Unsold_Inventory    500 non-null    int64  
 13  Estimated_Revenue   500 non-null    int64  
 14  Quarter             500 non-null    object 
 15  Snapshot_Date       500 non-null    object 
 16  Price_Bucket        500 non-null    object 
 17  Manufacturing_Date  500 non-null    object 
 18  Selling_Date        500 non-null    object 
 19  Performance_Score   500 non-null    float64
 20  Sneaker_Grade       500 non-null    object 
 21  Units_Received      500 non-null    int64  
dtypes: float64(1), int64(8), object(13)
memory usage: 86.1+ KB

================================================================================

Sneakers with Dates DataFrame (snicker_dataset_with_dates.csv):
Shape: (3, 5)

First 5 rows:
Product_ID Snapshot_Date Manufacturing_Date Selling_Date Stock_Quantity
0 1 2023-01-01 2022-10-01 2023-01-10 100
1 2 2023-01-02 2022-10-15 2023-01-20 50
2 3 2023-01-03 2022-11-01 2023-01-15 75
Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Product_ID          3 non-null      int64 
 1   Snapshot_Date       3 non-null      object
 2   Manufacturing_Date  3 non-null      object
 3   Selling_Date        3 non-null      object
 4   Stock_Quantity      3 non-null      int64 
dtypes: int64(2), object(3)
memory usage: 252.0+ bytes
--- Column Name Mapping ---
Original column names in sneakers_df: ['Product_Name', 'Product_Type', 'Total_Produced', 'Total_Sold', 'Damaged_Units', 'Month', 'Year', 'Edition_Type', 'Unit_Price', 'Target_Gender', 'Sell_Through_Rate', 'Damage_Rate', 'Unsold_Inventory', 'Estimated_Revenue', 'Quarter', 'Snapshot_Date', 'Price_Bucket', 'Manufacturing_Date', 'Selling_Date', 'Performance_Score', 'Sneaker_Grade', 'Units_Received']

Original column names in sneakers_dates_df: ['Product_ID', 'Snapshot_Date', 'Manufacturing_Date', 'Selling_Date', 'Stock_Quantity']

Common columns to use for merging: {'Manufacturing_Date', 'Snapshot_Date', 'Selling_Date'}

--- Data Cleaning Summary ---
Before cleaning:
------------------------------
sneakers_df shape: (500, 22)
sneakers_dates_df shape: (3, 5)

Missing values in sneakers_df:
missing_count missing_percentage
Missing values in sneakers_dates_df:
missing_count missing_percentage
Task 2.0 - Initial setup complete. Next steps will depend on the output above.
Shape of cleaned data: (500, 22)

First few rows of cleaned data:
Product_Name Product_Type Total_Produced Total_Sold Damaged_Units Month Year Edition_Type Unit_Price Target_Gender Sell_Through_Rate Damage_Rate Unsold_Inventory Estimated_Revenue Quarter Snapshot_Date Price_Bucket Manufacturing_Date Selling_Date Performance_Score Sneaker_Grade Units_Received
0 Nike Air Force 1 '07 Sneakers Creamy 24592 5819 9012 September 2017 Limited 115 Men 23.66% 36.65% 9761 669185 Q3 9/1/2017 High 9/1/2017 11/6/2017 -12.98 E 24592
1 Adidas Originals Samba OG Shoes Creamy 81482 16395 38698 September 2019 Standard 100 Men 20.12% 47.49% 26389 1639500 Q3 9/1/2019 Mid 9/1/2019 9/30/2019 -27.37 E 81482
Remaining missing values:
Series([], dtype: int64)
Date range in data:
Earliest Manufacturing Date: 2015-01-01 00:00:00
Latest Selling Date: 2025-02-04 00:00:00

Preview of Days_to_Sell calculation:
Average days to sell: 51.76
Days to sell distribution:
count   500.00
mean     51.76
std      22.03
min      15.00
25%      33.00
50%      50.50
75%      72.00
max      90.00
dtype: float64
Original shape before one-hot encoding: (500, 24)
Shape after one-hot encoding: (500, 33)

Features selected for clustering: ['Total_Produced', 'Total_Sold', 'Unit_Price', 'Sell_Through_Rate', 'Damage_Rate', 'Estimated_Revenue', 'Days_to_Sell', 'Is_Limited_Edition']
Shape of scaled features: (500, 8)

First few rows of scaled features:
Total_Produced Total_Sold Unit_Price Sell_Through_Rate Damage_Rate Estimated_Revenue Days_to_Sell Is_Limited_Edition
0 -1.12 -1.09 1.26 -1.25 0.73 -0.93 0.65 2.13
1 1.06 -0.60 0.61 -1.39 1.29 -0.43 -1.03 -0.47
2 0.86 -0.97 1.04 -1.74 0.61 -0.79 1.56 -0.47
3 -0.28 -1.11 -2.22 -1.72 0.01 -1.17 1.06 -0.47
4 0.01 -0.82 -0.26 -1.33 -0.58 -0.79 -0.76 -0.47
['X_scaled_df',
 'cleaned_sneakers_data',
 'sneakers_clean',
 'sneakers_dates_clean',
 'sneakers_dates_df',
 'sneakers_df',
 'sneakers_encoded']
Columns in sneakers_encoded:
['Product_Name', 'Total_Produced', 'Total_Sold', 'Damaged_Units', 'Month', 'Year', 'Edition_Type', 'Unit_Price', 'Sell_Through_Rate', 'Damage_Rate', 'Unsold_Inventory', 'Estimated_Revenue', 'Quarter', 'Snapshot_Date', 'Manufacturing_Date', 'Selling_Date', 'Performance_Score', 'Units_Received', 'Days_to_Sell', 'Is_Limited_Edition', 'Product_Type_Brownie', 'Product_Type_Classic', 'Product_Type_Creamy', 'Product_Type_Crisper', 'Product_Type_Peanut Butter', 'Product_Type_White Chocolate', 'Target_Gender_Women', 'Price_Bucket_Low', 'Price_Bucket_Mid', 'Sneaker_Grade_B', 'Sneaker_Grade_C', 'Sneaker_Grade_D', 'Sneaker_Grade_E']

First few rows:
Product_Name Total_Produced Total_Sold Damaged_Units Month Year Edition_Type Unit_Price Sell_Through_Rate Damage_Rate Unsold_Inventory Estimated_Revenue Quarter Snapshot_Date Manufacturing_Date Selling_Date Performance_Score Units_Received Days_to_Sell Is_Limited_Edition Product_Type_Brownie Product_Type_Classic Product_Type_Creamy Product_Type_Crisper Product_Type_Peanut Butter Product_Type_White Chocolate Target_Gender_Women Price_Bucket_Low Price_Bucket_Mid Sneaker_Grade_B Sneaker_Grade_C Sneaker_Grade_D Sneaker_Grade_E
0 Nike Air Force 1 '07 Sneakers 24592 5819 9012 September 2017 Limited 115 0.24 0.37 9761 669185 Q3 9/1/2017 2017-09-01 2017-11-06 -12.98 24592 66 1 0 0 1 0 0 0 0 0 0 0 0 0 1
1 Adidas Originals Samba OG Shoes 81482 16395 38698 September 2019 Standard 100 0.20 0.47 26389 1639500 Q3 9/1/2019 2019-09-01 2019-09-30 -27.37 81482 29 0 0 0 1 0 0 0 0 0 1 0 0 0 1
--- sneakers_encoded ---
Columns: ['Product_Name', 'Total_Produced', 'Total_Sold', 'Damaged_Units', 'Month', 'Year', 'Edition_Type', 'Unit_Price', 'Sell_Through_Rate', 'Damage_Rate', 'Unsold_Inventory', 'Estimated_Revenue', 'Quarter', 'Snapshot_Date', 'Manufacturing_Date', 'Selling_Date', 'Performance_Score', 'Units_Received', 'Days_to_Sell', 'Is_Limited_Edition', 'Product_Type_Brownie', 'Product_Type_Classic', 'Product_Type_Creamy', 'Product_Type_Crisper', 'Product_Type_Peanut Butter', 'Product_Type_White Chocolate', 'Target_Gender_Women', 'Price_Bucket_Low', 'Price_Bucket_Mid', 'Sneaker_Grade_B', 'Sneaker_Grade_C', 'Sneaker_Grade_D', 'Sneaker_Grade_E']
Shape: (500, 33)

--- sneakers_df ---
Columns: ['Product_Name', 'Product_Type', 'Total_Produced', 'Total_Sold', 'Damaged_Units', 'Month', 'Year', 'Edition_Type', 'Unit_Price', 'Target_Gender', 'Sell_Through_Rate', 'Damage_Rate', 'Unsold_Inventory', 'Estimated_Revenue', 'Quarter', 'Snapshot_Date', 'Price_Bucket', 'Manufacturing_Date', 'Selling_Date', 'Performance_Score', 'Sneaker_Grade', 'Units_Received']
Shape: (500, 22)

--- sneakers_clean ---
Columns: ['Product_Name', 'Product_Type', 'Total_Produced', 'Total_Sold', 'Damaged_Units', 'Month', 'Year', 'Edition_Type', 'Unit_Price', 'Target_Gender', 'Sell_Through_Rate', 'Damage_Rate', 'Unsold_Inventory', 'Estimated_Revenue', 'Quarter', 'Snapshot_Date', 'Price_Bucket', 'Manufacturing_Date', 'Selling_Date', 'Performance_Score', 'Sneaker_Grade', 'Units_Received', 'Days_to_Sell', 'Is_Limited_Edition']
Shape: (500, 24)

--- sneakers_dates_clean ---
Columns: ['Product_ID', 'Snapshot_Date', 'Manufacturing_Date', 'Selling_Date', 'Stock_Quantity']
Shape: (3, 5)
Selecting features for clustering...

Selected features:
   Sell_Through_Rate  Unit_Price  Days_to_Sell  Is_Limited_Edition  Damage_Rate  Total_Sold
0               0.24         115            66                   1         0.37        5819
1               0.20         100            29                   0         0.47       16395

Scaling features...

Success! Scaled features (first 2 rows):
Sell_Through_Rate Unit_Price Days_to_Sell Is_Limited_Edition Damage_Rate Total_Sold
0 -1.25 1.26 0.65 2.13 0.73 -1.09
1 -1.39 0.61 -1.03 -0.47 1.29 -0.60
Feature scaling complete! Ready for clustering.
Shape of scaled features: (500, 6)
Calculating inertia and silhouette scores for different K values...
K=2: Inertia = 2284.61, Silhouette Score = 0.2124
K=3: Inertia = 1904.79, Silhouette Score = 0.2398
K=4: Inertia = 1673.78, Silhouette Score = 0.2008
K=5: Inertia = 1486.38, Silhouette Score = 0.2106
K=6: Inertia = 1371.55, Silhouette Score = 0.2040
K=7: Inertia = 1257.42, Silhouette Score = 0.2140
K=8: Inertia = 1179.24, Silhouette Score = 0.2087
K=9: Inertia = 1106.97, Silhouette Score = 0.2153
K=10: Inertia = 1046.70, Silhouette Score = 0.2023
No description has been provided for this image
Analysis complete! Review the plots to determine the optimal K.
Look for the 'elbow' in the left plot and the highest silhouette score in the right plot.
Training final K-Means model with K=4...
Cluster Distribution:
0    138
1    129
2    146
3     87
Name: Cluster, dtype: int64

Analyzing cluster profiles...

Cluster Profiles (mean values):
Sell_Through_Rate Unit_Price Days_to_Sell Is_Limited_Edition Total_Sold Count
Cluster
0 0.56 88.22 32.93 0.00 23856.01 138
1 0.83 83.29 58.97 0.02 52741.26 129
2 0.32 88.01 62.67 0.00 15478.42 146
3 0.55 83.10 52.62 1.00 26306.30 87
Identifying 'Resell Goldmine' clusters...

Potential 'Resell Goldmine' clusters: []

No clusters met the 'Resell Goldmine' criteria. Consider adjusting the criteria.
No clusters meet both high sell-through and limited edition criteria.
Finding best cluster based on highest sell-through rate...
Best cluster based on sell-through rate: 1
================================================================================
                         RESELL GOLDMINE ANALYSIS
================================================================================

🎯 Best Performing Cluster: 1
📊 Cluster Size: 129 sneakers
🔥 Average Sell-Through Rate: 83.0%
💰 Average Price: $83.29
⏱️  Average Days to Sell: 59.0 days
🏆 Limited Edition Ratio: 2%

========================================
TOP 10 SNEAKERS IN BEST CLUSTER
========================================

1. Lacoste Carnaby Evo
   🔸 Price: $95
   🔸 Sell-Through: 99.8%
   🔸 Days to Sell: 68 days
   🔸 Limited Edition: No

2. Puma Smash V2 Sneakers
   🔸 Price: $65
   🔸 Sell-Through: 99.7%
   🔸 Days to Sell: 33 days
   🔸 Limited Edition: No

3. Air Jordan 1 Mid Shoes
   🔸 Price: $110
   🔸 Sell-Through: 99.7%
   🔸 Days to Sell: 89 days
   🔸 Limited Edition: Yes

4. Columbia Trailstorm Waterproof
   🔸 Price: $110
   🔸 Sell-Through: 99.6%
   🔸 Days to Sell: 81 days
   🔸 Limited Edition: No

5. Saucony Jazz Original
   🔸 Price: $80
   🔸 Sell-Through: 99.5%
   🔸 Days to Sell: 60 days
   🔸 Limited Edition: No

6. Red Tape Casual Sneakers
   🔸 Price: $35
   🔸 Sell-Through: 99.2%
   🔸 Days to Sell: 34 days
   🔸 Limited Edition: No

7. Air Jordan 1 Mid Shoes
   🔸 Price: $110
   🔸 Sell-Through: 99.1%
   🔸 Days to Sell: 59 days
   🔸 Limited Edition: Yes

8. Skechers D'Lites
   🔸 Price: $75
   🔸 Sell-Through: 99.1%
   🔸 Days to Sell: 73 days
   🔸 Limited Edition: No

9. Skechers D'Lites
   🔸 Price: $75
   🔸 Sell-Through: 98.8%
   🔸 Days to Sell: 16 days
   🔸 Limited Edition: No

10. Red Tape Casual Sneakers
   🔸 Price: $35
   🔸 Sell-Through: 98.6%
   🔸 Days to Sell: 56 days
   🔸 Limited Edition: No

========================================
RECOMMENDATIONS
========================================
1. Focus on acquiring sneakers from Cluster 1
2. Consider both limited and standard editions based on performance
3. Monitor the top performers for restocks or resale opportunities
4. Track how these products perform over time to refine your strategy

✅ Analysis complete! Results saved to 'sneakers_with_clusters.csv'
================================================================================
                              ANALYSIS COMPLETE
================================================================================

📊 CLUSTER SUMMARY:
 Cluster  Size  Avg_Sell_Through_Rate  Avg_Unit_Price
       1   129                   0.83           83.29
       0   138                   0.56           88.22
       3    87                   0.55           83.10
       2   146                   0.32           88.01

💾 FILES SAVED:
1. sneaker_cluster_summary_20250618_072913.csv - Cluster summary statistics
2. sneakers_with_clusters_20250618_072913.csv - Full dataset with cluster assignments

🎯 NEXT STEPS:
1. Review the cluster profiles in the saved CSV files
2. Focus on clusters with high sell-through rates
3. Consider price points and days to sell when making purchasing decisions
4. Monitor the performance of top products over time

✅ Analysis complete! You can now explore the saved CSV files for detailed insights.