At Abacus.AI, we believe that the next frontier for language models is AI-assisted data science. We are developing generative models that can interpret and then execute commands given in natural language (for example, “Train a 10-layer MLP and plot the validation accuracy over time”). To accomplish this, we have curated the first “data science” dataset for fine-tuning a language model, consisting of over 1 billion lines of Python code. Fine-tuned on our own Abacus.AI APIs, the model lets anyone easily use the Abacus.AI platform to perform exploratory data analysis, visualize data, and train powerful machine learning models. Visit our homepage to request a demo.
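To make the idea concrete, a fine-tuning corpus of this kind can be viewed as instruction–code pairs. The sketch below is purely illustrative: the field names, and the generated code inside the `completion` string, are assumptions for exposition, not the actual dataset schema or model output.

```python
# Hypothetical example of a single fine-tuning record: a natural-language
# command paired with the Python code the model should learn to emit.
# Field names and the completion body are illustrative assumptions.
training_example = {
    "instruction": "Train a 10-layer MLP and plot the validation accuracy over time",
    "completion": (
        "from sklearn.neural_network import MLPClassifier\n"
        "import matplotlib.pyplot as plt\n"
        "model = MLPClassifier(hidden_layer_sizes=(64,) * 10, early_stopping=True)\n"
        "model.fit(X_train, y_train)\n"
        "plt.plot(model.validation_scores_)\n"
        "plt.show()\n"
    ),
}

print(training_example["instruction"])
```

A model fine-tuned on many such pairs learns to map free-form analysis requests onto executable code against a fixed platform API.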
Inspired by recent innovations in meta-learning and Bayesian inference on tabular data, Abacus.AI is developing a radically new approach to time-series forecasting: the world’s first foundation model for the task. We pretrain a transformer on a mix of real-world and synthetic datasets spanning a variety of forecasting tasks, yielding a state-of-the-art forecasting model that can run inference on a new dataset in less than a second.
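Synthetic pretraining data for such a model is typically built by sampling series from simple generative priors. The sketch below, using only the standard library, is a minimal assumption of what one such prior might look like (linear trend plus seasonality plus Gaussian noise); the actual pretraining recipe is not described here.

```python
import math
import random

def synthetic_series(n=256, trend=0.05, period=24, noise=0.3, seed=0):
    """Generate one synthetic series: linear trend + seasonal sinusoid + noise.
    Components and parameter ranges are illustrative choices, not the
    actual pretraining prior."""
    rng = random.Random(seed)
    return [
        trend * t
        + math.sin(2 * math.pi * t / period)
        + rng.gauss(0.0, noise)
        for t in range(n)
    ]

# A pretraining corpus mixes many such series with randomized parameters,
# so the transformer sees a wide variety of trends and seasonal patterns.
corpus = []
for i in range(100):
    rng = random.Random(i)
    corpus.append(
        synthetic_series(
            n=256,
            trend=rng.uniform(-0.1, 0.1),
            period=rng.choice([12, 24, 168]),
            seed=i,
        )
    )

print(len(corpus), len(corpus[0]))  # 100 256
```

Because the generator is cheap, the pretraining set can be made arbitrarily large and diverse, which is what lets a single pretrained model transfer to unseen datasets without per-dataset training.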
While organizations today might have large amounts of data, their datasets tend to be noisy, incomplete, and imbalanced. As a result, data scientists and engineers spend most of their precious time pre-processing, cleaning, and featurizing the data. Even these efforts are often insufficient: deep learning techniques routinely fail on sparse datasets, forcing organizations to fall back on classical machine learning techniques that require enormous amounts of manual feature engineering. At Abacus.AI, we are actively pursuing the following research areas to enable training on less data.
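As one small example of the manual cleaning work described above, consider imputing missing values before training. The snippet below is a minimal sketch of a single cleaning step using only the standard library; real pipelines chain many such steps, which is precisely the labor these research directions aim to reduce.

```python
from statistics import median

def impute_missing(rows, col):
    """Fill None entries in rows[i][col] with the column median.
    A minimal sketch of one data-cleaning step; real pipelines
    combine many such transformations."""
    observed = [r[col] for r in rows if r[col] is not None]
    fill = median(observed)
    for r in rows:
        if r[col] is None:
            r[col] = fill
    return rows

data = [{"age": 34}, {"age": None}, {"age": 28}, {"age": 40}]
impute_missing(data, "age")
# the missing value becomes the median of 28, 34, 40, i.e. 34
```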