Brysen verifies that annotated driving datasets are useful for transfer learning of AI models



Related terms from Wikipedia


Data set

A data set (or dataset) is a collection of data. For tabular data, a data set corresponds to one or more database tables, in which each column represents a particular variable and each row corresponds to a particular record. The data set lists the values of each variable, such as height and weight, for every object it describes. A data set can also consist of a collection of documents or files[1].
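
As a rough illustration of the tabular case described above, the sketch below stores a tiny dataset as a list of records. The records, the variable names (`height_cm`, `weight_kg`) and the helper `column` are hypothetical and chosen only for this example.

```python
# Hypothetical records: each dict is one row, i.e. one object's measurements.
rows = [
    {"name": "A", "height_cm": 170.0, "weight_kg": 65.0},
    {"name": "B", "height_cm": 158.5, "weight_kg": 52.0},
    {"name": "C", "height_cm": 181.2, "weight_kg": 80.3},
]


def column(rows, variable):
    """Return every value of one variable, i.e. one column of the table."""
    return [row[variable] for row in rows]


variables = list(rows[0].keys())    # the columns (variables)
heights = column(rows, "height_cm")  # one column across all records

print(variables)
print(heights)
```

Each dict plays the role of a row (a record), and each key plays the role of a column (a variable).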

In the field of open data, a dataset is also the unit used to measure the amount of information released in a public open data repository. For example, the European open data portal aggregates more than 500,000 datasets[2]. Although several definitions of a dataset have been proposed[3], as of 2020 there is no official definition. Some datasets, such as real-time data sources[4], do not fit the usual picture, and their existence makes it harder to agree on a definition.


Several characteristics define the structure and properties of a dataset. These include the number and types of its attributes or variables, as well as various statistical measures that apply to them, such as standard deviation and kurtosis[5].
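
A minimal sketch of how such characteristics might be computed for one variable, assuming population (not sample) moments and the "excess" convention for kurtosis (a normal distribution scores 0). The function name `describe` and the input values are illustrative assumptions.

```python
import statistics


def describe(values):
    """Summarize one numeric variable of a dataset."""
    n = len(values)
    mean = statistics.fmean(values)
    # Second and fourth central moments (population versions, divide by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "count": n,
        "type": type(values[0]).__name__,
        "std_dev": m2 ** 0.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,
    }


summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
print(summary)
```

Other conventions exist (sample standard deviation, non-excess kurtosis), so results should be compared only against tools that use the same definitions.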

The values in a dataset may be numbers, such as real numbers or integers (for example, a person's height in centimeters), or they may be labels expressed as character strings, such as categories (for example, the ethnic group to which a person belongs, which cannot be expressed numerically[Annotation 1]). More generally, a value can belong to any level of measurement (scale)[6]. The values of a given variable are normally all of the same type across records. However, some values may be missing, and this also has to be indicated in some way[7].
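
The following sketch shows one possible way to check that a variable's values share a type and to count missing values. Representing a missing value as `None` is an assumption made for this example, as are the records and the helper names.

```python
# Hypothetical records; None marks a missing value (a convention assumed here).
rows = [
    {"height_cm": 170.0, "group": "a"},
    {"height_cm": None, "group": "b"},
    {"height_cm": 181.2, "group": "a"},
]


def column_types(rows, variable):
    """Names of the Python types found among the non-missing values."""
    return {type(r[variable]).__name__ for r in rows if r[variable] is not None}


def missing_count(rows, variable):
    """Number of records in which the variable has no value."""
    return sum(1 for r in rows if r[variable] is None)


print(column_types(rows, "height_cm"), missing_count(rows, "height_cm"))
print(column_types(rows, "group"), missing_count(rows, "group"))
```

A set with more than one type name would signal that a variable mixes kinds of values.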

In statistics, a dataset usually comes from actual observations obtained by sampling a population, and each row corresponds to the observations made on one element of that population. Datasets may also be generated by algorithms in order to test certain kinds of software. When data are missing or a value is suspected to be incorrect, an imputation method may be used to complete the dataset[8].
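
As a small sketch of the two ideas above, the code below draws a random sample from a made-up population and fills missing values with the mean of the observed ones. Mean imputation is only one simple imputation method, picked here for brevity; the population, the seed and the function names are assumptions.

```python
import random
import statistics


def draw_sample(population, k, seed=0):
    """Draw k distinct elements from the population (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def impute_mean(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


population = list(range(150, 200))  # hypothetical heights in cm
sample = draw_sample(population, 5)
completed = impute_mean([170.0, None, 180.0])

print(sample)
print(completed)
```

Mean imputation keeps the column mean unchanged but reduces its variance, which is one reason other methods are often preferred in practice.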

Classic datasets

Several classic datasets are widely used in the statistical literature.

  • Iris flower data set - A multivariate dataset introduced by Ronald Fisher in 1936[9].
  • MNIST database - A dataset of images of handwritten digits, commonly used to evaluate classification, clustering, and image processing algorithms.
  • Categorical data analysis - The datasets used in the book An Introduction to Categorical Data Analysis (Alan Agresti, 2019).
  • Robust statistics - The datasets used in Robust Regression and Outlier Detection (Rousseeuw and Leroy, 1986).
  • Time series - The data used in Chatfield's book The Analysis of Time Series.
  • Extreme values - The data used in An Introduction to the Statistical Modeling of Extreme Values, a snapshot of the data provided by the book's author, Stuart Coles.
  • Bayesian Data Analysis - The data used in the book of the same name (A. Gelman, J. B. Carlin, H. S. Stern, D. B. Rubin, 1995), provided online by one of the authors, Andrew Gelman.
  • Anscombe's quartet - A small dataset intended to show the importance of graphing data to avoid statistical fallacies.
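
To illustrate the point made by the last item, the sketch below compares summary statistics for the first two y-series of Anscombe's quartet. The values are transcribed from commonly reproduced copies of the data, so the transcription should be treated as an assumption and checked against a published source before reuse.

```python
import statistics

# First two y-series of Anscombe's quartet (they share the same x values).
# Transcribed from commonly reproduced copies; verify before relying on them.
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

mean1, mean2 = statistics.fmean(y1), statistics.fmean(y2)
var1, var2 = statistics.variance(y1), statistics.variance(y2)

# The summaries nearly coincide even though the individual values differ.
print(round(mean1, 2), round(mean2, 2))
print(round(var1, 2), round(var2, 2))
```

The two series have almost the same mean and sample variance, yet a plot shows one scattered around a line and the other following a smooth curve, which summary statistics alone do not reveal.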

  1. ^ Snijders, C .; Matzat, U .; Reips, U.-D. (2012). “'Big Data': Big gaps of knowledge in the field of Internet”. International Journal of Internet Science 7: 1–5. http://www.ijis.net/ijis7_1/ijis7_1_editorial.html. 
  2. ^ "European open data portal". European open data portal. European Commission. Retrieved 2016-09-23.
  3. ^ "Dataset definition – MELODA". www.meloda.org. Retrieved 2016-08-17. [dead link]
  4. ^ Atz, U (2014). “The tau of data: A new metric to assess the timeliness of data in catalogs”. CEDEM 2014 Proceedings. https://project.opendatamonitor.eu/wp-content/uploads/dissemination/OpenDataMonitor_Publication_The-Tau-of-Data.pdf Retrieved 2021-02-24.
  5. ^ Jan M. Żytkow, Jan Rauch (1999). Principles of data mining and knowledge discovery. ISBN 978-3-540-66490-1. https://books.google.com/books?id=uTzeRZFmaBgC&pg=PA100
  6. ^ Junichi Hirayama (2016). “Scale level judgment method for improving data analysis efficiency”. Proceedings of the Japanese Society for Artificial Intelligence National Convention JSAI2016: 2P114in1-2P114in1. doi:10.11517/pjsai.JSAI2016.0_2P114in1.
  7. ^ Nomura Research Institute, Ltd. 2013, p. 23.
  8. ^ United Nations Statistical Commission; United Nations Economic Commission for Europe (2007) (pdf). Statistical Data Editing: Impact on Data Quality: Volume 3 of Statistical Data Editing, Conference of European Statisticians Statistical standards and studies. United Nations Publications. p. 20. ISBN 978-9211169522. https://unece.org/fileadmin/DAM/stats/publications/editing/SDE3.pdf Retrieved 2015-07-19.
  9. ^ Fisher, RA (1936). “The Use of Multiple Measurements in Taxonomic Problems”. Annals of Eugenics 7 (2): 179-188. doi:10.1111/j.1469-1809.1936.tb02137.x.

Annotations

  1. ^ Of course, numbers can be assigned for convenience. For example, Germanic peoples could be coded as 1 and Han Chinese as 2, but even then, unlike height, the magnitudes and ratios of the numbers are meaningless.


