Embodied Conversational Agents (ECAs) used in social training systems must be able to simulate the different social situations in which a learner needs to practice. Depending on the application, the ECAs must therefore be able to express a variety of emotions or interpersonal attitudes. Non-verbal signals, such as smiles or gestures, contribute to the expression of attitudes. However, recent findings have shown that non-verbal signals are not interpreted in isolation but in conjunction with other signals: for instance, a smile followed by a gaze aversion and a head aversion does not signal amusement but embarrassment. Non-verbal behavior planning models for ECAs should therefore consider complete sequences of non-verbal signals rather than each signal independently of the others. However, existing models either do not take this into account or do so only in a limited manner. The main contribution of this thesis is a methodology for automatically extracting, from a multimodal corpus, sequences of non-verbal signals characteristic of attitude variations, together with a non-verbal behavior planning model that handles sequences of non-verbal signals rather than signals in isolation.

Another consideration in the design of social training systems is verifying that users actually improve their social skills while using them. We investigated the use of ECAs to build a virtual audience aimed at improving users' public speaking skills. A further contribution of this thesis is an architecture for interactive virtual audiences that provide real-time feedback to learners according to their public speaking performance, along with an evaluation of three different feedback strategies.