Analyzing and visualizing multidimensional data is essential for making informed decisions. However, dealing with high-dimensional financial data can be challenging. In this tutorial, we will explore two powerful dimensionality reduction techniques, t-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP), to visualize complex financial data in a lower-dimensional space. We will use real financial data obtained from Yahoo Finance and demonstrate how to apply these techniques to gain insights into market trends and patterns.
Dimensionality reduction techniques like t-SNE and UMAP project high-dimensional data into a lower-dimensional space (typically two or three dimensions) while preserving as much of the underlying structure and relationships as possible. These techniques are widely used in various domains, including finance, to gain a deeper understanding of complex datasets.
Let’s begin by importing the required libraries and downloading real financial data using the yfinance library.
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.manifold import TSNE
import umap
Now, let’s download the financial data for a diverse set of securities from Yahoo Finance. We will retrieve the historical prices for Tesla (TSLA), Bitcoin (BTC-USD) and the S&P 500 index (^GSPC) until the end of January 2024.
# Downloading financial data
tickers = ['TSLA', 'BTC-USD', '^GSPC']
start_date = '2020-01-01'
end_date = '2024-01-31'  # yfinance treats the end date as exclusive
# auto_adjust=False keeps the 'Adj Close' column, which recent yfinance releases drop by default
data = yf.download(tickers, start=start_date, end=end_date, auto_adjust=False)['Adj Close']
We have now obtained the historical adjusted closing prices for the selected securities. Next, we will preprocess the data and apply t-SNE and UMAP for dimensionality reduction and visualization.
Preprocessing the Data
Before applying dimensionality reduction techniques, it’s essential to preprocess the data. We will normalize the data and handle any missing values to ensure that the input is suitable for t-SNE and UMAP. Both methods work on distances between observations, so a series with much larger magnitudes (such as Bitcoin prices next to Tesla prices) would otherwise dominate the result.
# Data preprocessing
data = data.dropna()  # Drop any missing values…
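To make the two preprocessing steps concrete, here is a minimal sketch on synthetic data rather than the downloaded prices. It assumes z-score standardization as the normalization method, which is only one common choice and may differ from the scaler used later in this tutorial. The names prices, clean and normalized, and all the numbers, are hypothetical stand-ins. The sketch also shows one likely source of missing values: Bitcoin trades every day, while Tesla and the S&P 500 have no weekend quotes.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded prices (made-up values, not market data).
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=14, freq="D")  # 2024-01-01 is a Monday
prices = pd.DataFrame(
    {
        "BTC-USD": 40000 + rng.normal(0, 500, len(dates)).cumsum(),
        "TSLA": 240 + rng.normal(0, 5, len(dates)).cumsum(),
        "^GSPC": 4700 + rng.normal(0, 20, len(dates)).cumsum(),
    },
    index=dates,
)

# Mimic the calendar mismatch: no weekend quotes for the stock and the index.
weekend = prices.index.dayofweek >= 5  # Saturday = 5, Sunday = 6
prices.loc[weekend, ["TSLA", "^GSPC"]] = np.nan

# Step 1: keep only the dates on which every series has a value.
clean = prices.dropna()

# Step 2 (assumed method): z-score each column so the series share a comparable scale.
normalized = (clean - clean.mean()) / clean.std(ddof=0)

print(len(prices), "rows before dropna,", len(clean), "rows after")
print(normalized.head())
```

After this step every column has zero mean and unit variance, so no single security dominates the distance calculations purely because of its price level.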