A statistical language model is a probability distribution over sequences of words.

$P(w_1, w_2, \ldots, w_m) = \prod^{m}_{i = 1}{P(w_i \mid w_1, w_2, \ldots, w_{i-1})}$
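This chain-rule factorization can be made concrete with a small sketch. The snippet below estimates the conditional probabilities from a tiny hypothetical corpus using bigram counts (a first-order Markov approximation of the full conditioning context) and multiplies them out; the corpus and the `<s>` start token are assumptions for illustration.

```python
from collections import defaultdict

# Toy corpus (hypothetical) used to estimate the conditionals.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]

# Count bigrams, padding with a <s> start token so that the first
# word is also conditioned on something.
bigram = defaultdict(int)
context = defaultdict(int)
for sent in corpus:
    prev = "<s>"
    for w in sent:
        bigram[(prev, w)] += 1
        context[prev] += 1
        prev = w

def p(word, prev):
    # Maximum-likelihood estimate of P(word | prev).
    return bigram[(prev, word)] / context[prev]

def sentence_prob(words):
    # Chain rule, truncated to one word of context:
    # P(w_1, ..., w_m) ~= prod_i P(w_i | w_{i-1}).
    prob, prev = 1.0, "<s>"
    for w in words:
        prob *= p(w, prev)
        prev = w
    return prob

# P(the|<s>) * P(cat|the) * P(sat|cat) = 1 * 2/3 * 1/2 = 1/3
print(sentence_prob(["the", "cat", "sat"]))
```

Modern neural language models replace the count-based estimate of each conditional with a learned function, but the factorization itself is unchanged.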

Neural networks are universal function approximators: a sufficiently large network can approximate any continuous function to arbitrary precision.

In the mathematical theory of artificial neural networks, universal approximation theorems are results that establish the density of an algorithmically generated class of functions within a given function space of interest. Typically, these results concern the approximation capabilities of the feedforward architecture on the space of continuous functions between two Euclidean spaces, and the approximation is with respect to the compact convergence topology.
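As a minimal illustration of the feedforward architecture these theorems concern, the sketch below builds a one-hidden-layer ReLU network with hand-picked (not learned) weights. With just two hidden units it represents $|x|$ exactly, since $\mathrm{relu}(x) + \mathrm{relu}(-x) = |x|$; the weight values and the function choice are assumptions made for the example.

```python
import numpy as np

def relu(z):
    # Elementwise rectified linear unit.
    return np.maximum(z, 0.0)

def net(x, w1, b1, w2, b2):
    # One hidden layer: output = w2 . relu(w1 * x + b1) + b2
    # (scalar input x, vector of hidden units).
    return float(w2 @ relu(w1 * x + b1) + b2)

# Hand-picked weights: two hidden units computing relu(x) and relu(-x),
# summed by the output layer, giving |x| exactly.
w1 = np.array([1.0, -1.0])
b1 = np.zeros(2)
w2 = np.array([1.0, 1.0])
b2 = 0.0

for x in [-3.0, -0.5, 0.0, 2.0]:
    print(x, net(x, w1, b1, w2, b2))
```

For a general continuous target, no finite network is exact, but adding hidden units lets such piecewise-linear combinations track the target as closely as desired on a compact set, which is the content of the universal approximation theorems.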