Handling, processing and delivering data from millions of devices around the world is a complex and remarkable feat that hinges on edge computing systems. While edge computing brings computation and data storage closer to where the data is produced, fog computing brings analytic services to the edge of the network, offering an alternative to relying on cloud computing alone. The EU-funded MARVEL project will develop an Edge-to-Fog-to-Cloud (E2F2C) ubiquitous computing framework to enable multimodal perception and intelligence for audio-visual scene recognition, event detection and situational awareness in a smart-city environment. It will collect, analyse and data-mine multimodal audio-visual streaming data to improve the quality of life and of services to citizens within the smart-city paradigm, without violating ethical and privacy limits, in an AI-responsible manner. This is achieved by: (i) fusing large-scale distributed multimodal audio-visual data in real time; (ii) achieving fast time-to-insights; (iii) supporting automated decision-making at all levels of the E2F2C stack; and (iv) delivering a personalized federated learning approach, in which joint multimodal representations and models are co-designed and continuously improved through privacy-aware sharing of personalized fog and edge models among all interested parties.
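To illustrate point (iv), the core idea of federated learning is that edge and fog nodes share model parameters rather than raw audio-visual data. The sketch below shows the classic federated averaging (FedAvg) pattern with a toy one-parameter linear model; the function names, the model, and the simulated node datasets are illustrative assumptions, not the MARVEL implementation.

```python
# Sketch of federated averaging: each node trains locally on private data,
# and only model weights (never raw data) are sent back for aggregation.
# The 1-D linear model y = w * x and all names here are hypothetical.

def local_update(w, data, lr=0.1, epochs=20):
    """One node's local training: fit y = w*x by gradient descent on its own data."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def fed_avg(global_w, node_datasets):
    """Server step: average the locally updated weights, weighted by dataset size."""
    updates = [(local_update(global_w, d), len(d)) for d in node_datasets]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

# Three simulated edge/fog nodes, each holding a private slice of data on y = 2x.
nodes = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
]
w = 0.0
for _ in range(10):        # a few federated rounds
    w = fed_avg(w, nodes)
print(round(w, 3))         # the global model converges toward the shared slope 2.0
```

In a personalized variant, each node would additionally keep a locally fine-tuned copy of the global model, so the shared parameters capture common structure while per-node models adapt to local conditions.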