F. Murtagh and P. Contreras
Constant Time Search and Retrieval in Big Data, with Linear Time and Space Preprocessing, through Randomly Projected Piling and Sparse Ultrametric Coding
Having reviewed the use of the Baire, or longest common prefix, metric for directly inducing a hierarchical encoding, and therefore an ultrametric embedding, of our data, we consider the following. Since our approach is based on random projection, we compare it with the now conventional use of random projection for dimensionality reduction. We also consider the relationship with Power Iteration Clustering. Further development of this approach has included a new visualization of the ultrametric mapping of data, and the following objective: to approximate an $m$-adic mapping of the data by a $p$-adic mapping, with $p < m$. A primary aim in this work is to achieve the effective scaling of a multidimensional data cloud, in order to induce a hierarchy on it. Alternatively expressed, we map our metric-endowed data into an ultrametric topology. Finally, we review how data piling, or concentration, in very high dimensions can be used in this context as a basis for analytics that employ Correspondence Analysis. Practical and operational deployment is a primary goal, and areas of application are also discussed.
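To make the longest common prefix idea concrete, the following Python fragment is a minimal illustrative sketch, not the authors' implementation. It projects points onto one random axis, rescales the projections to $[0, 1)$, truncates each value to a fixed number of decimal digits, and measures dissimilarity by the length of the shared digit prefix. The function names, the use of decimal digits, the base 2 in the distance, the convention that identical digit strings are at distance 0, and the min-max rescaling are all assumptions made for illustration.

```python
import random
from decimal import Decimal, ROUND_DOWN


def to_digits(x, precision=4):
    """Truncate (not round) x in [0, 1) to a string of `precision` decimal digits.

    Assumes precision >= 1.
    """
    step = Decimal(10) ** -precision
    q = Decimal(repr(float(x))).quantize(step, rounding=ROUND_DOWN)
    return format(q, "f").split(".")[1]


def common_prefix_length(a, b):
    """Number of leading characters shared by the digit strings a and b."""
    k = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        k += 1
    return k


def baire_distance(a, b, base=2.0):
    """Return base ** (-k), with k the longest common prefix length.

    Identical strings are given distance 0 (an assumed convention).
    """
    if a == b:
        return 0.0
    return base ** -common_prefix_length(a, b)


def random_projection(points, seed=0):
    """Project points onto one random Gaussian axis and rescale to [0, 1)."""
    rng = random.Random(seed)
    axis = [rng.gauss(0.0, 1.0) for _ in range(len(points[0]))]
    proj = [sum(p_i * a_i for p_i, a_i in zip(p, axis)) for p in points]
    lo, hi = min(proj), max(proj)
    span = (hi - lo) or 1.0
    # Scale by slightly less than 1 so the maximum stays strictly below 1.
    return [(v - lo) / span * 0.999999 for v in proj]


if __name__ == "__main__":
    data = [[0.10, 0.20, 0.30], [0.10, 0.20, 0.31], [0.90, 0.80, 0.70]]
    codes = [to_digits(v) for v in random_projection(data)]
    for i in range(len(codes)):
        for j in range(i + 1, len(codes)):
            print(codes[i], codes[j], baire_distance(codes[i], codes[j]))
```

Grouping points by the first $k$ digits of their code gives the clusters at level $k$ of the induced hierarchy, so a single pass over the projected values suffices to read off the tree. Any distance defined in this way satisfies the strong triangle inequality $d(x, z) \le \max\{d(x, y), d(y, z)\}$, which is what makes the embedding ultrametric.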