RicercaMente

THE HISTORY

We are reconstructing the history of AI through the most important scientific papers published over the years.

Below you can see all the papers that our contributors have put together!

Want to help us? Add the papers that are missing

Year	Name	Authors	Topic	Description	Impact	Media	Link
1928	On the Theory of Games of Strategy	John von Neumann	MATH	Introduced the concept of extensive-form games and the minimax theorem, establishing foundational principles for game theory.	Pioneering work that laid the groundwork for game theory, influencing diverse fields from economics to AI decision-making algorithms.	Contributions to the Theory of Games	Link to Paper
1948	A Mathematical Theory of Communication	Claude Shannon	PROGRAMMING	Proposed the fundamental concepts of information theory, including entropy, channel capacity, and the source coding theorem. Revolutionized the understanding of communication and laid the groundwork for data compression and error correction.	Pioneering work that significantly influenced information theory, data science, and communication systems.	The Bell System Technical Journal	Link to Paper
1950	Computing Machinery and Intelligence	Alan Turing	AI	Introduced the concept of the Turing Test, a benchmark for determining a machine’s ability to exhibit intelligent behavior indistinguishable from that of a human.	Pioneering work that laid the foundation for discussions on machine intelligence and artificial general intelligence (AGI).	Mind	Link to Paper
1956	The logic theory machine. A complex information processing system.	Allen Newell and Herbert Simon	MATH	This system is capable of discovering proofs for theorems in symbolic logic.	Unlike usual algorithms, it relies heavily on heuristic methods similar to those observed in human problem-solving tasks.	IRE Transactions on Information Theory	Link to Paper
1984	Classification and Regression Trees	Leo Breiman, Jerome Friedman, Richard Olshen, Charles Stone	AI	Introduced the fundamental concepts of decision trees, a method widely used in data science.	Fundamental contribution that has shaped the approach to data analysis through decision trees.	Journal of the American Statistical Association	Link to Paper
1989	A tutorial on hidden Markov models and selected applications in speech recognition	Lawrence Rabiner	MATH	Introduces the basic theory of hidden Markov models and shows how to apply them to selected problems in speech recognition.	Crucial breakthrough in the application of Markov methods, cited in more than 14,000 papers.	Proceedings of the IEEE	Link to Paper
1989	Multilayer feedforward networks are universal approximators	Kurt Hornik, Maxwell Stinchcombe, Halbert White	AI	This study demonstrates that standard multilayer feedforward networks with as few as one hidden layer and arbitrary squashing activation functions can approximate any Borel measurable function from one finite-dimensional space to another to any desired degree of accuracy, provided sufficiently many hidden units are available.	Established that multilayer feedforward networks are a class of universal approximators.	Neural Networks	Link to Paper
1995	Support-vector networks	Corinna Cortes, Vladimir Vapnik	AI	The support-vector network is a new learning machine for two-group classification problems.	The idea had previously been implemented only for the restricted case where the training data can be separated without errors; this paper extends it to non-separable training data.	Machine Learning	Link to Paper
1997	Long Short-Term Memory	Sepp Hochreiter and Jürgen Schmidhuber	AI	Introduces Long Short-Term Memory (LSTM), a type of Recurrent Neural Network (RNN) architecture designed to overcome the vanishing gradient problem in traditional RNNs.	LSTMs address the challenge of learning dependencies over extended time periods, leading to more effective and efficient training on sequential tasks.	Neural Computation	Link to Paper
1998	Gradient-based learning applied to document recognition (CNN/GTN)	Yann LeCun, Léon Bottou, Yoshua Bengio and Patrick Haffner	AI	This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task.	The bank-cheque reading system described in the paper was deployed commercially and read several million cheques per day.	Proceedings of the IEEE	Link to Paper
2009	ImageNet Large Scale Visual Recognition Challenge	Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, Li Fei-Fei	Visual Recognition, Challenge, Datasets	Provides a comprehensive overview of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in the field of object category classification and detection using large-scale datasets. The challenge has been conducted annually since 2010 and has garnered participation from over fifty institutions.	Advancements in Object Recognition, Large scale standardization for computer vision research datasets, Development of Deep Learning (Mostly CNNs)	Springer	Link to Paper
2012	ImageNet Classification with Deep Convolutional Neural Networks	Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton	AI	Proposed the use of a deep CNN architecture (also known as AlexNet) for image classification. Introduced important techniques such as the ReLU activation function, data augmentation and dropout to prevent overfitting, and training on multiple GPUs. The proposed architecture won the ILSVRC-2012 competition with a top-5 test error rate of 15.3%.	The introduction of deeper CNNs marked an important turning point in computer vision. AlexNet outperformed existing models at the time and established new standards in the evolution of CNN architectures.	Advances in neural information processing systems	Link to Paper
2013	Efficient Estimation of Word Representations in Vector Space	Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean	Natural Language Processing	Proposed two novel model architectures (CBOW and Skip-gram) for computing dense vector representations of words (word embeddings) from a large corpus based on the surrounding words. These vectors provide state-of-the-art performance for measuring syntactic and semantic word similarities.	The introduction of Word2Vec demonstrated the effectiveness of pre-trained word embeddings as input features for downstream NLP tasks such as sentiment analysis and semantic similarity.	ArXiv	Link to Paper
2014	Very Deep Convolutional Networks For Large-Scale Image Recognition	Karen Simonyan, Andrew Zisserman	AI	This paper analysed the impact of increased CNN depth on image classification. VGGNet improved on the state of the art using architectures with up to 19 layers and 3x3 convolutional filters, which are smaller than the filters used in previous models.	This paper confirmed the importance of depth in visual representations and inspired the design of subsequent models.	International Conference on Learning Representations (ICLR)	Link to Paper
2014	Neural Machine Translation by Jointly Learning to Align and Translate	Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio	AI	The paper investigates neural machine translation. The authors conjectured that the use of a fixed-length vector was a bottleneck in improving the performance of the classic RNN-based encoder-decoder architecture, and proposed to extend it by allowing the model to automatically (soft-)search for the parts of a source sentence that are relevant to predicting a target word, without having to form these parts explicitly as a hard segment.	Proposed the attention mechanism for the first time, paving the way for many later architectures.	International Conference on Learning Representations (ICLR)	Link to Paper
2015	Deep Residual Learning for Image Recognition	Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun	AI, Deep Learning, Computer Vision	This paper introduced a new architecture for deep neural networks (ResNet). The key concepts were residual learning, which addressed the degradation problem in deep networks, and the use of skip connections. These innovations make the layers easier to optimize and improve their ability to find the most appropriate mappings between data.	This research enhanced the potential of deep neural networks by allowing the training of architectures with many more layers (up to 152). This led to much higher accuracy in tasks such as image recognition (3.57% error on the ImageNet test set).	ArXiv	Link to Paper
2016	Mastering the Game of Go with Deep Neural Networks and Tree Search	David Silver et al.	AI	Presented AlphaGo, a model that defeated the European Go champion by 5 games to 0. It combines deep neural networks with Monte Carlo Tree Search (MCTS), which selects the next move by running simulations of possible outcomes.	Demonstrated that AI can tackle complex challenges and achieve excellence in strategic games.	Nature	Link to Paper
2017	Attention is All You Need	Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin	AI	Introduced the Transformer, a neural network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions. It is a significant milestone in modern deep learning and shaped the way we think about and approach NLP problems.	Has had a profound impact on NLP research and applications.	NIPS	Link to Paper
2021	An Image is worth 16x16 words: Transformers for image recognition at scale	Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby	AI, Vision Transformers, Visual Recognition	This paper introduces ViT, a vision transformer that applies the Transformer architecture directly to sequences of image patches for image classification tasks. ViT achieves comparable or even superior performance to state-of-the-art convolutional networks while requiring fewer computational resources for training. This suggests that Transformers have the potential to revolutionize computer vision in the same way that they have revolutionized natural language processing.	ViTs are now used in applications ranging from image classification to text-to-image generators (DALL-E 2, Stable Diffusion).	ICLR	Link to Paper
2022	Real-Time Big Data Processing and Analytics: Concepts, Technologies, and Domains	Uğur Kekevi, Ahmet Arif Aydin	DATA	The purpose of this paper is to provide researchers of real-time analysis and developers of data-intensive systems with a comparative perspective on real-time data processing by highlighting the key characteristics of real-time data processing technologies, NoSQL storage technologies, their application domains, and selected examples from previous studies.	Has had a profound impact on Data Science research and applications.	Computer Science	Link to Paper
2023	Muse: Text-To-Image Generation via Masked Generative Transformers	Huiwen Chang, Han Zhang, Jarred Barber, AJ Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T. Freeman, Michael Rubinstein, Yuanzhen Li, Dilip Krishnan	COMPUTER VISION	Presents a text-to-image Transformer model trained on a masked modeling task in discrete token space.	Significantly more efficient than diffusion or autoregressive models.	Proceedings of Machine Learning Research	Link to Paper
