Data-driven models are known to be a valid alternative to first-principles approaches to modeling. However, for complex and largely unknown systems, such as the chemical reactions leading to engine emissions, experience shows that the results of data-driven models depend significantly on the actual data set used for identification and are prone to excessive complexity. This paper shows how an incremental design of experiments based on polynomial models can be used to determine the appropriate complexity of the data set as well as a suitable measurement profile that yields adequate excitation for model parameter estimation. As the paper shows experimentally, this result is not specific to the particular identification approach used: the same data set can also be used, for example, by genetic programming (GP) algorithms, which additionally extract the model structure from the data. Results are shown using emission measurements from a modern turbocharged diesel engine on an emission test bench.
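As an illustration only, the general idea of an incremental design of experiments with polynomial models can be sketched as follows. This is a minimal sketch under strong assumptions, not the procedure used in the paper: it uses a single scalar input, draws new design points at random, and stops once the best validation error no longer improves by a chosen relative tolerance. All names (`incremental_doe`, `toy_emission`, `batch`, `tol`) and the synthetic "measurement" function are hypothetical.

```python
import numpy as np


def poly_features(x, degree):
    """Regressor matrix [1, x, x^2, ..., x^degree] for a 1-D input."""
    return np.vander(x, degree + 1, increasing=True)


def validation_rmse(x_train, y_train, x_val, y_val, degree):
    """Least-squares polynomial fit on the training data, RMSE on validation data."""
    coeffs, *_ = np.linalg.lstsq(poly_features(x_train, degree), y_train, rcond=None)
    residual = poly_features(x_val, degree) @ coeffs - y_val
    return float(np.sqrt(np.mean(residual ** 2)))


def incremental_doe(measure, x_val, y_val, batch=5, max_points=60,
                    degrees=(1, 2, 3, 4, 5), tol=0.05, seed=0):
    """Grow the data set batch by batch (assumed stopping rule: relative
    improvement of the best validation RMSE falls below `tol`)."""
    rng = np.random.default_rng(seed)
    x = np.empty(0)
    y = np.empty(0)
    best_prev = np.inf
    history = []  # entries: (number of points, selected degree, validation RMSE)
    while x.size < max_points:
        x_new = rng.uniform(-1.0, 1.0, batch)   # new design points (random here)
        x = np.concatenate([x, x_new])
        y = np.concatenate([y, measure(x_new)])
        # Only consider polynomial orders that the current data can identify.
        errors = {d: validation_rmse(x, y, x_val, y_val, d)
                  for d in degrees if d + 1 <= x.size}
        if not errors:
            continue
        best_degree = min(errors, key=errors.get)
        best = errors[best_degree]
        history.append((x.size, best_degree, best))
        if best_prev - best < tol * best_prev:
            break  # additional measurements no longer pay off
        best_prev = best
    return history


def toy_emission(x):
    """Synthetic, noise-free stand-in for a test-bench measurement (assumption)."""
    return 1.0 + 2.0 * x - 3.0 * x ** 2 + 0.5 * x ** 3


x_val = np.linspace(-1.0, 1.0, 50)
history = incremental_doe(toy_emission, x_val, toy_emission(x_val))
for n_points, degree, rmse in history:
    print(n_points, degree, rmse)
```

In this sketch the polynomial order and the number of measurements are chosen jointly from the validation error; a real application would have several inputs, measurement noise, and a deliberate rather than random choice of new operating points.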