
Linear Regression — Predicting Rating Of Laptop Manufacturers | by Anaswar Jayakumar | Mar, 2024


EDA was the next step of this project. Its goal was to get a better understanding of the data at large. The EDA consisted of three parts: descriptive statistics, histograms, and correlation analysis. For the purposes of this article, I'll focus the EDA on the histograms and the correlation analysis, since both were instrumental in the regression analysis portion of this project.

Histograms were generated to better understand the underlying distribution of the independent variables. Correlation analysis was instrumental in determining the predictor variables that will ultimately be used to predict the rating assigned to a given laptop based on its specifications. Specifically, the EDA focused on the following aspects of the laptop manufacturers dataset:

  • Laptop Hardware (Processor, GPU)
  • Laptop Memory, Storage
  • Laptop Features
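The article does not say which tools produced the histograms. The snippet below is a minimal sketch of what a histogram computes: it counts values into equal-width bins using only the Python standard library. The function name `histogram_counts` and the core counts are hypothetical illustrations and are not taken from the dataset.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Every value is identical, so everything lands in the first bin.
        return [len(values)] + [0] * (num_bins - 1)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum would map to index num_bins, so clamp it into the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical core counts for ten laptops (not taken from the dataset).
cores = [4, 4, 6, 4, 6, 8, 4, 6, 12, 16]
print(histogram_counts(cores, 4))  # [7, 1, 1, 1]
```

Most of the mass sits in the first bin and a thin tail runs to the right. That is the "positively skewed" shape described in the sections below. Plotting libraries compute similar bin counts before drawing bars, although they may choose bin edges differently.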

Histograms — Laptop Hardware

Processor

Distributions

  • Processor Brand: The histogram for processor brand shows that most laptops have processors from one or two brands, with a smaller number from less common brands. This suggests that a few major processor brands dominate the Indian laptop market.
  • Processor Tier: The histogram for processor tier is positively skewed, with more laptops in the lower tiers and fewer in the higher tiers. This could be because lower-tier processors are more affordable, or because many consumers in India don't need the high performance of top-tier processors.
  • Number of Cores: The histogram for number of cores is also positively skewed, with most laptops having 4 or 6 cores and fewer laptops having more cores. This is likely because more cores tend to be more expensive, and many consumers don't need the extra processing power that more cores provide.
  • Number of Threads: The histogram for number of threads resembles the one for number of cores: it is positively skewed, and most laptops have 8 or 12 threads. Again, this likely reflects the higher cost of more threads and the fact that many consumers don't need the extra processing power.
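To put a number on "positively skewed" rather than judging a histogram by eye, one option is the moment coefficient of skewness, g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments. A positive value means a longer right tail. The article does not report skewness values, so this sketch uses hypothetical thread counts. The function name `moment_skewness` is also illustrative.

```python
import statistics


def moment_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population central moments)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical thread counts: most laptops at 8 or 12, a few with many more.
threads = [8, 8, 12, 8, 12, 16, 8, 12, 20, 24]
print(round(moment_skewness(threads), 2))
```

This is the same biased Fisher-Pearson estimator that `scipy.stats.skew` returns with its default `bias=True`. A symmetric sample gives 0, and mirroring the data flips the sign.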

Additional Insights

  • Because the data is skewed, the mean may not be the most representative measure of central tendency, and the median may be a better choice. The median splits the data in half, so it is less sensitive to outliers than the mean. For example, if a few laptops had a very high number of cores, the mean number of cores could be noticeably higher than the number of cores that most laptops have.
  • The standard deviation is sensitive to outliers, so the interquartile range (IQR) may be a better measure of spread. The IQR is the range between the 25th and 75th percentiles of the data, and it is less sensitive to outliers than the standard deviation.
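The claim that the median and IQR resist outliers better than the mean and standard deviation can be checked directly. The sketch below adds one hypothetical high-core-count laptop to an otherwise typical hypothetical sample and compares both pairs of statistics. It uses Python's `statistics.quantiles` with `method="inclusive"`. Quartile conventions differ slightly between tools, so other software may report a slightly different IQR.

```python
import statistics


def summarize(values):
    """Mean/stdev versus median/IQR for the same sample."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "iqr": q3 - q1,
    }


typical = [4, 4, 6, 6, 6, 8, 8, 8]   # hypothetical core counts
with_outlier = typical + [24]        # one hypothetical high-end laptop added

before, after = summarize(typical), summarize(with_outlier)
for key in before:
    print(f"{key:>6}: {before[key]:6.2f} -> {after[key]:6.2f}")
```

After the outlier is added, the median stays at 6 and the IQR moves by only half a core. Over the same change, the mean rises by about 2 cores and the standard deviation more than triples.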

Important Considerations

  • The data may not be representative of all laptops sold in India. For example, it may only include laptops from a particular retailer or price range. If the data comes from a specific retailer, it may be skewed towards laptops that are popular with that retailer's customers. If it comes from a particular price range, it may be skewed towards laptops in that range.
  • The histograms only show the distribution of a few features. Other features, such as screen size, RAM, and storage capacity, may also be important to consider. For example, if you are interested in gaming laptops, you would also want to consider the graphics card.

GPU

Positive Skew in GPU Type

  • In the histogram for GPU type, most laptops appear to have a GPU type value close to 1, with a smaller number of laptops having a higher GPU type value. If higher values encode higher-end GPUs, this suggests that most laptops sold in the Indian market have lower-end graphics cards, with a smaller number having higher-end graphics cards.

Multimodal Distribution in GPU Brand

  • The histogram for GPU brand appears to have multiple peaks, which indicates a multimodal distribution. This suggests that a few GPU brands may dominate the Indian laptop market, with a smaller number of laptops coming from other brands.
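GPU brand is a nominal variable. When it is encoded as integers, the position of peaks in its histogram depends on the arbitrary order of the codes. A frequency table is therefore a more direct way to check whether "a few dominant brands" account for most laptops. The sketch below assumes string brand labels. The labels and counts are hypothetical.

```python
from collections import Counter


def top_share(labels, k):
    """Return the k most common categories and the fraction of rows they cover."""
    counts = Counter(labels)
    top = counts.most_common(k)
    share = sum(count for _, count in top) / len(labels)
    return top, share


# Hypothetical GPU brand labels for twelve laptops (not from the dataset).
gpu_brands = ["nvidia"] * 5 + ["intel"] * 4 + ["amd"] * 2 + ["arm"]
top, share = top_share(gpu_brands, 2)
print(top, round(share, 2))  # [('nvidia', 5), ('intel', 4)] 0.75
```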

Additional Insights

  • The mean (average) may not be the most representative measure of central tendency for skewed distributions such as the one for GPU type. The median might be a better indicator of a typical GPU type value.
  • The standard deviation is a measure of spread, but it can be sensitive to outliers. Because the distribution of GPU type is skewed, the interquartile range (IQR) might be a better way to gauge how spread out the data is.

Important Considerations

  • The data used to create these histograms may not be representative of all laptops sold in India. For instance, the data could come from a particular retailer or focus on a specific price range.
  • The histograms only show the distribution of two features (GPU type and brand). Other factors, such as the amount of GPU memory, may also be important to consider, especially when looking at gaming laptops.

Histograms — Laptop Memory, Storage

Positive Skew

  • All three histograms (RAM memory, primary storage type, and primary storage capacity) appear to be positively skewed. This means that a larger portion of laptops have lower values on these metrics, with a smaller tail extending towards higher values.
  • For RAM memory, for instance, this suggests that most laptops sold in India have lower RAM capacities, with a smaller number having higher RAM capacities.

Key Insights

  • RAM Memory: The average RAM capacity (13.05 GB) might be higher than what most laptops typically have, because the positive skew pulls the mean towards the higher values. The median would be a more representative measure of central tendency in this case.
  • Primary Storage Type: The histogram for primary storage type shows a single peak at 1. Assuming 1 encodes HDD (Hard Disk Drive), this likely indicates that most laptops use an HDD as the primary storage type. SSDs (Solid State Drives) are becoming increasingly popular, but under that encoding the data suggests they are not yet as common in the Indian laptop market.
  • Primary Storage Capacity: As with RAM memory, the average primary storage capacity (around 606 GB) might be higher than what most laptops typically have because of the positive skew. The median would give a better idea of the typical storage capacity.
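The reasoning above rests on the direction of skew determining whether the mean sits above or below the median. Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation, expresses that relationship as a single number. The sketch below applies it to two hypothetical samples. The RAM sample has a long right tail, and the display-size sample has a long left tail. Neither sample comes from the dataset.

```python
import statistics


def median_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sample stdev."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    return 3 * (mean - median) / statistics.stdev(values)


ram_gb = [8, 8, 8, 8, 16, 16, 16, 32, 64]             # hypothetical, long right tail
display_in = [13.3, 14.0, 15.6, 15.6, 15.6, 16.0]      # hypothetical, long left tail
print(median_skewness(ram_gb) > 0, median_skewness(display_in) < 0)  # True True
```

A positive value means the mean exceeds the median, as described for RAM and storage capacity. A negative value means the mean falls below the median.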

Important Considerations

  • The data might not be representative of the entire Indian laptop market. It could be skewed towards laptops sold by a specific retailer or within a particular price range.
  • These histograms only represent RAM memory, primary storage type, and primary storage capacity. Other factors, such as screen size, processor brand, and graphics card, are also important to consider when choosing a laptop.

Additional Notes

  • The standard deviations for RAM memory (5.61 GB) and primary storage capacity (264.23 GB) are large relative to their means (13.05 GB and around 606 GB), which indicates that these values are widely spread out. A standard deviation is expressed in the variable's own units, so comparing it with 1 says little on its own. The standard deviation for primary storage type (0.11) is very low, which reinforces the idea that a single storage type dominates.
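A standard deviation is measured in the variable's own units (GB here), so its raw size cannot be compared across variables or against 1. The coefficient of variation (standard deviation divided by mean) is unitless and makes that comparison possible. The sketch below applies it to the summary statistics reported above. The storage mean is approximate, and the function name is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation divided by the mean: a unitless measure of relative spread."""
    return std_dev / mean


# Summary statistics as reported in this section (storage mean is approximate).
ram_cv = coefficient_of_variation(5.61, 13.05)
storage_cv = coefficient_of_variation(264.23, 606)
print(f"RAM CV: {ram_cv:.2f}, storage CV: {storage_cv:.2f}")  # RAM CV: 0.43, storage CV: 0.44
```

Both variables vary by roughly 43 to 44 percent of their mean, so their relative spreads are about the same. The coefficient of variation is only meaningful for positive, ratio-scale quantities such as capacities. It does not apply to coded categories such as storage type.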

Histograms — Laptop Features

Is Touch Screen, Display Size, Resolution Width and Height

Positive Skew in Touchscreen, Resolution Width, and Resolution Height

  • The histograms for touchscreen, display resolution (width), and display resolution (height) all appear to be positively skewed. This means that a larger portion of laptops lack a touchscreen and have lower resolutions, with a smaller number having a touchscreen or higher display resolutions.
  • For display resolution (width), for instance, this suggests that most laptops sold in India have lower display resolutions, with a smaller number having higher resolutions.

Negative Skew in Display Size

  • The histogram for display size appears to be negatively skewed. This means that most laptops sold in India have larger displays, with a tail of fewer laptops having smaller displays.

Key Insights

  • Touchscreen: The average value (1.09) for the touchscreen variable is close to 1. Assuming 1 encodes no touchscreen and 2 encodes a touchscreen, this means that only about 9% of laptops have a touchscreen, which is consistent with the positive skew noted above.
  • Display Size: The average display size (15.17 inches) might be lower than what most laptops typically have, because the negative skew pulls the mean towards the lower values. The median would be a more representative measure of central tendency for display size in this case.
  • Display Resolution: The average display resolution (width and height of around 2000 and 1183 pixels, respectively) might be higher than what most laptops typically have because of the positive skew. The median would give a better idea of the typical display resolution.
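For a variable that takes only two coded values, the mean directly gives the share of rows with the higher code. If the codes are `low` and `high` and a fraction p of rows have `high`, then mean = low·(1 − p) + high·p, so p = (mean − low) / (high − low). The sketch below assumes that the touchscreen variable uses only the codes 1 (no touchscreen) and 2 (touchscreen). The dataset's actual encoding is not stated.

```python
def share_with_higher_code(mean, low_code=1, high_code=2):
    """For a two-valued coded variable, recover the fraction of rows with high_code from its mean."""
    return (mean - low_code) / (high_code - low_code)


# Assumes touchscreen is coded 1 = no, 2 = yes; 1.09 is the reported mean.
print(round(share_with_higher_code(1.09), 2))  # 0.09
```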

Important Considerations

  • The data might not be representative of the entire Indian laptop market. It could be skewed towards laptops sold by a specific retailer or within a particular price range.
  • These histograms only represent touchscreen, display size, and resolution (width and height). Other factors, such as RAM memory, storage capacity, processor brand, and graphics card, are also important to consider when choosing a laptop.

Additional Notes

  • The standard deviations for display resolution (width: 363.65 and height: 265.39 pixels) show that resolutions vary considerably across laptops. Because a standard deviation is expressed in the variable's own units, its size should be judged relative to the mean (around 2000 and 1183 pixels) rather than against 1.
  • The histograms for display resolution width and height look very similar, which suggests that laptops tend to have proportional resolutions (e.g., a 16:9 aspect ratio).
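An aspect ratio is a width-to-height resolution reduced by the two numbers' greatest common divisor. The sketch below uses two common laptop resolutions as illustrations rather than values from the dataset. The last line divides the two reported average resolutions.

```python
from math import gcd


def aspect_ratio(width, height):
    """Reduce a pixel resolution to its simplest whole-number ratio."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


print(aspect_ratio(1920, 1080))  # (16, 9)
print(aspect_ratio(1920, 1200))  # (8, 5), i.e. 16:10
print(round(2000 / 1183, 2))     # 1.69
```

The ratio of the average width to the average height (about 1.69) falls between 16:10 (1.6) and 16:9 (about 1.78). This fits a mix of those two common shapes. It is only a rough check, because the ratio of two averages is not the average of the individual ratios.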

Operating System, Year of Warranty, Price

Positive Skew

  • All three histograms (OS, warranty year, and price) appear to be positively skewed. This means that a larger portion of laptops fall into the categories with lower values, with a smaller tail extending towards higher values.
  • For price, for instance, this suggests that most laptops sold in India are priced lower, with a smaller number priced higher.

Key Insights

  • Operating System: The average value (1.16) for the OS variable is close to 1. Assuming 1 represents Windows in this dataset, this indicates that Windows is the dominant operating system.
  • Warranty Year: The average warranty period (around 1.08 years) is close to 1 year, which might be the standard warranty offered by most manufacturers.
  • Price: The average price (around ₹77,385) might be higher than the price of most laptops sold in India, because the positive skew pulls the mean towards the higher values. The median would be a more representative measure of central tendency for price in this case.

Important Considerations

  • The data might not be representative of the entire Indian laptop market. It could be skewed towards laptops sold by a specific retailer or within a particular price range.
  • These histograms only represent OS, warranty year, and price. Other factors, such as RAM memory, storage capacity, processor brand, and graphics card, are also important to consider when choosing a laptop.

Additional Notes

  • The standard deviation for price (₹57,775.31) is large relative to the average price (around ₹77,385), which indicates that prices are widely spread out. The standard deviation for warranty year (0.31) is small, which is consistent with most laptops carrying a warranty of about one year.
  • The standard deviation for OS (0.69) is less than 1. However, the OS variable most likely represents a categorical variable (e.g., 1=Windows, 2=Mac), so the standard deviation is not a meaningful measure of spread in this case.
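The reason a standard deviation is not meaningful for a categorical variable is that its value depends on which integer each category happens to receive. The sketch below uses hypothetical OS codes. Swapping two codes produces an equally valid encoding of the same laptops, yet the standard deviation changes. The category shares do not change.

```python
import statistics
from collections import Counter

os_codes = [1, 1, 1, 2, 3]             # hypothetical OS codes for five laptops
swap = {1: 2, 2: 1, 3: 3}              # relabel: exchange the codes for two categories
relabeled = [swap[c] for c in os_codes]

print(round(statistics.stdev(os_codes), 3))   # 0.894
print(round(statistics.stdev(relabeled), 3))  # 0.707


def category_shares(codes):
    """Fraction of rows in each category; unaffected by how categories are numbered."""
    counts = Counter(codes)
    return {code: n / len(codes) for code, n in counts.items()}


print(sorted(category_shares(os_codes).values()) == sorted(category_shares(relabeled).values()))  # True
```

For a variable like OS, the mode and the category shares summarize the data without depending on the encoding.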

Correlation matrices were generated to better understand the relationship between the independent variables and the dependent variable Rating, which represents a rating assigned to a given laptop based on its specifications. The correlation matrices are also important for identifying which variables of interest best predict a given laptop's rating. In other words, they will be used to determine which variables of interest become the independent variables in the regression model.

It's also worth noting that variables with a correlation greater than 0.3 or less than -0.3 are treated as suitable predictors of a given laptop's rating. A correlation of 0.3 indicates a moderate positive relationship, and a correlation of -0.3 indicates a moderate negative relationship. Correlation values are not a hard and fast rule for choosing the independent variables that best predict a given laptop's rating. They do, however, serve as a guideline for choosing suitable predictor variables.

Laptop Hardware (Processor, GPU)

For laptop hardware such as the processor and GPU, correlation values were determined between the dependent variable Rating and the following independent variables: processorbrand, processortier, numcores, numthreads, gpubrand, and gputype.

Based on the correlation values, the variables numcores, gpubrand, and gputype all have a strong positive relationship with the dependent variable, while numthreads has a very strong positive relationship with it. On the other hand, processorbrand and processortier both have a negligible relationship with the dependent variable Rating.

Therefore, the variables numcores, gpubrand, numthreads, and gputype are good predictors of the dependent variable Rating, while processorbrand and processortier are not.
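The labels used in this section (negligible, weak, moderate, strong, very strong) correspond to bands of |r|. The article only pins down one edge, with 0.3 counting as moderate. The cut-offs in the sketch below (0.1, 0.3, 0.5, 0.7) are therefore an assumed convention, and other sources draw these lines differently.

```python
# Assumed band edges for |r|; only the 0.3 "moderate" edge comes from the article.
BANDS = [(0.1, "negligible"), (0.3, "weak"), (0.5, "moderate"), (0.7, "strong")]


def describe_correlation(r, bands=BANDS):
    """Turn a correlation coefficient into a verbal strength-and-direction label."""
    size = abs(r)
    for upper, label in bands:
        if size < upper:
            break
    else:
        # No band matched, so |r| is at least the last upper edge.
        label = "very strong"
    if label == "negligible":
        return label
    return f"{label} {'positive' if r > 0 else 'negative'}"


for r in (0.05, -0.2, 0.35, 0.6, 0.85):
    print(r, describe_correlation(r))
```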

Laptop Memory, Storage

For laptop memory and storage, correlation values were determined between the dependent variable Rating and the following independent variables: rammemory, primarystoragetype, and primarystoragecapacity.

Based on the correlation values, the variables rammemory and primarystoragecapacity both have a strong positive relationship with the dependent variable, while primarystoragetype has a weak negative relationship with it. Therefore, rammemory and primarystoragecapacity are good predictors of the dependent variable Rating, while primarystoragetype is not.

Laptop Features

For laptop features, correlation values were determined between the dependent variable Rating and the following independent variables: istouchscreen, displaysize, resolutionwidth, resolutionheight, OS, yearofwarranty, and Price. Based on the correlation values, the following conclusions can be drawn:

  • The variable displaysize has a moderate positive relationship with the dependent variable Rating.
  • The variables resolutionwidth, resolutionheight, and Price all have a strong positive relationship with the dependent variable Rating, while OS has a strong negative relationship with it.
  • The variables istouchscreen and yearofwarranty both have a negligible relationship with the dependent variable Rating.

Therefore, the variables displaysize, resolutionwidth, resolutionheight, OS, and Price are good predictors of the dependent variable Rating, while istouchscreen and yearofwarranty are not.


