There is a broad consensus that the representations used by a cognitive agent must be grounded in external reality through a sensori-motor apparatus. These representations must also be sufficiently similar to those used by other agents in the group to enable coordinated action and communication, and they must be acquired autonomously by each agent. This paper first tries to clarify the terminology and issues involved. It then argues that language plays a crucial role in the learning of grounded representations, because it is a source of feedback and constrains the degrees of freedom of the representations used in the group. The idea of a language game is introduced as a framework for concretising the structural coupling between concept formation and symbol acquisition, and some experiments are briefly discussed.