Machine learning
Machine learning (Japanese: 機械学習, kikai gakushū; English: machine learning) is a computer algorithm that improves automatically through experience, or the research area studying such algorithms^{[1]}^{[2]}, and is considered a type of artificial intelligence. Learning is performed using data called "training data" (or "learning data"), and some task is performed using the learning result. For example, past spam mail can be used as training data to learn, and the task of spam filtering can then be performed.
Machine learning is closely related to the following areas:
 : an area focused on computer-based prediction
 Mathematical optimization: an area focused on searching for optimal solutions under given conditions
 Data mining: an area focused on unsupervised learning (described later)^{[Note 1]}^{[4]}
The term "machine learning" was coined in 1959 by Arthur Samuel^{[5]}.
Overview
Definition
The following concise definition by Tom M. Mitchell is widely cited, although the definition varies from one author to another:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E^{[6]}.
That is: a computer program is said to learn from experience E, with respect to task class T and performance measure P, when its performance on tasks in T, as measured by P, improves with experience E.
Here a task is a problem to be solved by the program. In the case of a sales-forecasting task, for example, the task is "predict tomorrow's sales".
Experience is given to the program as some data, called training data (or learning data). In a sales-forecasting task, for example, the sales up to today (the "past experience") are given as training data. The process of improving the performance of a program using training data is called "training the program" or "letting the program learn". The set of all data used to train a program is called a data set.
Finally, performance is an index measuring how well the program achieves the task; in the sales-forecasting task above, for example, the error between the prediction and the actual sales can be used as the performance measure.
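As a toy illustration of the (T, E, P) framing, the sketch below uses the sales-forecasting example from the text; the mean predictor and the concrete numbers are illustrative assumptions, not from the article.

```python
import statistics

# Hypothetical sketch of Mitchell's definition: task T = "predict tomorrow's
# sales", experience E = past sales figures, performance P = absolute error.
# The "program" here is a deliberately simple mean predictor.

def predict_sales(experience):
    """Predict tomorrow's sales as the mean of past sales (the experience E)."""
    return statistics.mean(experience)

past_sales = [100, 120, 110, 130, 115]   # experience E (training data)
actual_tomorrow = 118                     # ground truth, revealed later

prediction = predict_sales(past_sales)
performance = abs(prediction - actual_tomorrow)  # performance measure P
print(prediction, performance)
```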
Variable type
In machine learning, when data x is a continuous quantity, x is called a quantitative variable; a variable that represents a kind of thing, such as a classification category like "dog" or "cat", is called a qualitative variable^{[7]}^{[8]}. Qualitative variables are also called categorical variables or factors^{[8]}.
In addition to quantitative and qualitative variables, there are ordered categorical variables, which take ordered discrete values such as "large", "medium", and "small"^{[8]}. Machine learning also handles data such as natural language, which, unlike quantitative variables, is not a continuous quantity and, unlike categorical variables, does not take values in a finite number of categories.
Types of machine learning tasks
Machine learning tasks can be divided into the following three typical categories. These three do not cover all tasks handled by machine learning, however; some tasks belong to multiple categories, and for some tasks it is ambiguous which category they belong to.
 Supervised learning
 Generate a function that maps inputs to corresponding outputs^{[Note 2]}. For example, in a classification problem, examples are given as pairs of an input vector and the class corresponding to that input, and a function mapping one to the other is approximated.
 Unsupervised learning
 Build a model from inputs only (unlabeled examples). See also data mining.
 Reinforcement learning
 Learn how to act by observing the surrounding environment. Every action affects the environment, and feedback from the environment in the form of rewards guides the learning algorithm. Q-learning is one example.
Supervised learning
Overview
Supervised learning targets an unknown probability distribution p(x, y). In practical applications x is in some sense the input and y the output; in many cases, for example, y is the value F(x) of an unknown function F of x with small noise added. The algorithm is given pairs (x_1, y_1), ..., (x_n, y_n) following p(x, y) as training data. The task the algorithm should solve is, for an x that does not (necessarily) belong to the training data, to approximate the conditional probability distribution p(y | x) or a value determined from it (for example its expected value E[y | x])^{[9]}. The accuracy of the approximation is evaluated using a predetermined loss function, so the goal of supervised machine learning can be said to be reducing the expected value of the loss function.
According to the definition of machine learning given earlier, supervised machine learning can be described as the following machine learning:
 Task: approximate the conditional distribution p(y | x) (or a value determined from it) well
 Experience: the training data
 Performance: the expected value of the loss function
In supervised learning the prior knowledge is the training data, from which the distribution p(y | x) of the y corresponding to an unknown x must be guessed. The operation by which the algorithm finds p(y | x) (or a value determined from it) from an unknown x is therefore called generalization or inference. Depending on the task, it may also be called "prediction", "judgment", "recognition", and so on.
The algorithm must infer, from an unknown data point x, information about the distribution of the corresponding y; in the training data given as prior knowledge for this inference, each x_i comes with its y_i attached as the "answer". The name "supervised learning" comes from this setting: a "teacher" teaches known "problems" x_i together with their "answers" y_i to the algorithm, the "student", which then infers the "answer" y corresponding to an unknown x. For the same reason, in supervised learning the training data is also called teacher data.
Training phase and generalization phase
In many supervised machine learning models, a step called training or learning is performed before the actual generalization; the model can be regarded as a pair of a training algorithm and a generalization algorithm. The training algorithm takes the training data as input and outputs a value θ called the parameter. Intuitively, the parameter is the "learning result" obtained by extracting useful information from the training data, and at generalization time this "learning result" θ is used. In other words, the generalization algorithm receives not only the input x but also the parameter θ, and finds p(y | x) (or a value determined from it).
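The training/generalization pair described above can be sketched as follows; the 1-D linear model, the least-squares fit, and all names are illustrative assumptions, not from the article.

```python
# Minimal sketch of the training/generalization pair: `train` is the training
# algorithm (training data in, parameter θ out); `generalize` is the
# generalization algorithm (a new input x plus θ in, prediction out).
# The 1-D linear model y ≈ a*x + b is an illustrative assumption.

def train(xs, ys):
    """Training algorithm: least-squares fit, returns the parameter θ = (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return (a, b)

def generalize(x, theta):
    """Generalization algorithm: uses the learned θ to predict y for a new x."""
    a, b = theta
    return a * x + b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * x + 1.0 for x in xs]   # noise-free data from y = 2x + 1
theta = train(xs, ys)              # the "learning result" θ
print(generalize(10.0, theta))     # ≈ 21.0
```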
Variable name
Among the variables of supervised machine learning, x is called the explanatory variable and y the objective variable, target variable, or target^{[7]}. They are often referred to by other names as well: x the predictor (variable) and y the response variable^{[8]}, or x the independent variable and y the dependent variable^{[8]}. Depending on the task, names other than these may also be used.
Regression and classification
Regression and classification are typical tasks belonging to supervised learning. When the objective variable y of supervised learning is a quantitative variable, the task is called regression; when it is a categorical variable taking values in a finite set, it is called classification (or discrimination)^{[8]}^{[10]}.
Regression
The goal of regression is, given an input x, to predict information about the corresponding y. Typically
    y = F(x) + ε,
that is, y is the value F(x) of an unknown function F with random noise ε added, and for a data input x the algorithm is required to output a prediction ŷ of y that is as accurate as possible. The objective variable y handled in regression is a continuous quantity, typically a numerical vector in which several real numbers are arranged.

Like other supervised machine learning algorithms, a regression algorithm receives a set of training data (x_1, y_1), ..., (x_n, y_n) selected according to p(x, y), and, using these training data as hints, outputs an estimate f(x) of the expected value E[y | x] of the y corresponding to the input x. Prediction accuracy is measured by a loss function ℓ(y, ŷ). As the loss function in regression, the squared error loss
    ℓ(y, ŷ) = ||y − ŷ||²
is often used.

The goal of regression is to keep the generalization error (also called prediction error or prediction loss)
    E[ℓ(y, f(x))]
small. Here f(x) is the output of the generalization algorithm, and E[·] denotes the expected value.
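A minimal sketch of estimating the generalization error E[ℓ(y, f(x))] by Monte Carlo sampling, assuming the toy model y = F(x) + ε with F(x) = 2x, Gaussian noise of standard deviation 0.1, and a predictor f that has already recovered F; under these assumptions the best achievable squared-error loss is the noise variance 0.01.

```python
import random

# Hedged sketch: estimate the generalization error E[(y - f(x))^2] by Monte
# Carlo sampling. Model, noise level, and predictor are illustrative
# assumptions, not from the article.

random.seed(0)

def F(x):
    return 2.0 * x          # the unknown "true" function (assumed here)

def f(x):
    return 2.0 * x          # the learned predictor (assumed perfect here)

n = 100_000
total = 0.0
for _ in range(n):
    x = random.uniform(0.0, 1.0)
    y = F(x) + random.gauss(0.0, 0.1)   # y = F(x) + ε, ε ~ N(0, 0.1²)
    total += (y - f(x)) ** 2            # squared error loss
print(total / n)  # ≈ 0.01: with a perfect predictor, only the noise remains
```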
Classification
In a classification task, a finite number of classes are determined in advance, and each class is assigned a class name, called a class label (or simply label), such as "cat" or "dog". The purpose of the classification task is to guess which class a given input x belongs to.
There are roughly two types of algorithms for solving classification tasks: the "deterministic approach" and the "probabilistic approach"^{[11]}. The former, given an input x of the classification task, outputs the class label to which x appears to belong, and typically uses the 0-1 loss
    ℓ(y, ŷ) = 0 if y = ŷ, and 1 otherwise
as the loss function^{[12]}.
The latter, on the other hand, does not output a class label directly, but outputs a confidence (confidence score) p̂ = (p̂_1, ..., p̂_K). Here p̂_j is a measure of how confident the algorithm is that x belongs to the j-th class, and the p̂_j satisfy p̂_j ≥ 0 and Σ_j p̂_j = 1.
For classification tasks that output a confidence, the y_i of the training data are also encoded so as to be consistent with the confidence. That is, if x_i belongs to the j-th class, then y_i = e_j. Here e_j is the vector whose j-th component is 1 and whose other components are 0 (a vector in which exactly one component is 1 and the rest are 0 is called a one-hot vector, and representing data by one-hot vectors is called one-hot representation^{[13]}). As the loss function, the cross entropy
    ℓ(y, p̂) = − Σ_j y_j log p̂_j
is typically used^{[12]}.
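The one-hot encoding and cross-entropy loss above can be sketched as follows; the class names and confidence values are made up for illustration.

```python
import math

# Sketch of the encoding and loss described above: class labels are encoded
# as one-hot vectors e_j, and the cross entropy between the one-hot target y
# and the confidence vector p̂ is used as the loss.

def one_hot(j, num_classes):
    """Return e_j: 1 at position j, 0 elsewhere."""
    return [1.0 if k == j else 0.0 for k in range(num_classes)]

def cross_entropy(y, p_hat):
    """ℓ(y, p̂) = -Σ_j y_j log p̂_j, for one-hot y and a confidence vector p̂."""
    return -sum(yj * math.log(pj) for yj, pj in zip(y, p_hat) if yj > 0)

y = one_hot(1, 3)                 # the example belongs to class 1 ("dog", say)
p_confident = [0.1, 0.8, 0.1]     # confident, correct prediction
p_unsure = [0.4, 0.3, 0.3]        # spread-out, less correct prediction
print(cross_entropy(y, p_confident))  # small loss
print(cross_entropy(y, p_unsure))     # larger loss
```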
Relationship between regression and classification
A typical way to design an algorithm for a classification task using confidences is to divert an algorithm for a regression task. That is, the regression algorithm is trained using training data in which the class is encoded as a one-hot vector, and the trained algorithm is then used for the classification task. However, the output ŷ = (ŷ_1, ..., ŷ_K) of the regression task differs from a confidence, the output of the classification task, in that the conditions ŷ_j ≥ 0 and Σ_j ŷ_j = 1 need not hold. This problem is solved by applying the softmax transformation
    p̂_j = exp(ŷ_j) / Σ_k exp(ŷ_k)
once.

Conversely, a classification task using confidences can be diverted to a regression task; in this case, for the same reason, the inverse of the softmax transformation must be applied.
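A minimal sketch of the softmax transformation described above; the raw regression output ŷ below is an arbitrary made-up vector, and the max-shift is a standard numerical-stability trick rather than part of the definition.

```python
import math

# The softmax transformation turns an arbitrary regression output ŷ into a
# confidence vector: all entries become non-negative and sum to 1.

def softmax(y_hat):
    """p̂_j = exp(ŷ_j) / Σ_k exp(ŷ_k), with a max-shift for numerical stability."""
    m = max(y_hat)
    exps = [math.exp(v - m) for v in y_hat]
    s = sum(exps)
    return [e / s for e in exps]

y_hat = [2.0, -1.0, 0.5]        # raw regression output: not a valid confidence
p_hat = softmax(y_hat)          # valid confidence vector
print(p_hat, sum(p_hat))        # components ≥ 0, sum = 1
```

Note that softmax preserves the ordering of the components, so the most likely class is unchanged by the transformation.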
Bias and variance tradeoff
In regression, the algorithm is required to output, for an input x, a prediction f(x) of the corresponding y. It is desirable that f(x) be close to the expected value of y, and it is also desirable that the variation of f(x) across training sets be small. However, as shown below, these two requirements are in a trade-off relationship^{[14]}:
theorem (bias-variance trade-off): Let p(x, y) be a probability distribution, let D be a set of training data selected according to some probability distribution^{[Note 3]}, let f_D be the function obtained by training a regression algorithm on D, and let the error function be the squared error ℓ(y, ŷ) = (y − ŷ)². Choose (x, y) according to p(x, y), independently of D, and define
    f̄(x) = E_D[f_D(x)],  ȳ(x) = E[y | x].
At this time, the expected prediction error (the expectation over the training data set D of the prediction error^{[15]})
    E_D E_{(x,y)}[(y − f_D(x))²]
satisfies:
    E_D E_{(x,y)}[(y − f_D(x))²] = Variance + Bias² + Noise,
where
    Variance = E_x[ E_D[(f_D(x) − f̄(x))²] ],
    Bias² = E_x[(f̄(x) − ȳ(x))²],
    Noise = E_{(x,y)}[(y − ȳ(x))²].
The case of regression was described above, but the same applies to classification that outputs confidences.
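The decomposition in the theorem can be checked numerically. The sketch below is a hypothetical setup, not from the article: x uniform on [0, 1], y = x + ε with noise σ = 0.3, and a deliberately biased algorithm that predicts the constant mean of its training targets; the bias and noise terms are then known analytically (1/12 and σ²), and the directly measured expected prediction error should approximately equal variance + bias² + noise.

```python
import random

# Monte Carlo check of: expected prediction error = Variance + Bias² + Noise.
# Setup (illustrative assumptions): x ~ U(0,1), y = x + ε, ε ~ N(0, σ²);
# the algorithm predicts the constant mean of its training targets.

random.seed(1)
sigma = 0.3        # noise standard deviation
n_train = 10       # size of each training set D
n_sets = 20_000    # number of training sets sampled

def sample_xy():
    x = random.random()
    return x, x + random.gauss(0.0, sigma)   # y = F(x) + ε with F(x) = x

def train(D):
    """Deliberately biased algorithm: predict the mean target, ignoring x."""
    ys = [y for _, y in D]
    return sum(ys) / len(ys)

# Variance: spread of the (constant) predictor f_D across training sets D.
preds = [train([sample_xy() for _ in range(n_train)]) for _ in range(n_sets)]
f_bar = sum(preds) / len(preds)                       # E_D[f_D]
variance = sum((p - f_bar) ** 2 for p in preds) / len(preds)

# Analytic pieces for this toy setup (ȳ(x) = x, f̄ ≈ E[y] = 0.5):
bias2 = 1.0 / 12.0        # E_x[(0.5 - x)²] for x ~ U(0,1)
noise = sigma ** 2        # E[(y - ȳ(x))²] = σ²

# Expected prediction error measured directly:
err = 0.0
trials = 20_000
for _ in range(trials):
    fD = train([sample_xy() for _ in range(n_train)])
    x, y = sample_xy()
    err += (y - fD) ** 2
err /= trials

print(err, variance + bias2 + noise)  # the two sides should be close
```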
Bayes rule
Let L and p(x, y) be, respectively, the loss function and the data distribution of a supervised learning task such as regression or classification, and write R(F) = E[L(y, F(x))] for the prediction loss of a prediction function F. The infimum of the prediction loss
    inf_F R(F)
is called the Bayes error under the loss function L, and an F that achieves this infimum is called a Bayes rule^{[16]}. Here the infimum is taken over the set of all measurable functions F.

The Bayes rule is in theory the best prediction function, but in reality the probability distribution p(x, y) is unknown, so the prediction loss under p(x, y) cannot be computed and the Bayes rule cannot be obtained. In supervised learning it is therefore necessary to search for an algorithm that, from the known data (x_1, y_1), ..., (x_n, y_n), outputs a value as close to the Bayes rule as possible.
Regression
When the squared loss is selected as the loss function, the following theorem holds^{[17]}:

theorem (Bayes rule of regression under squared loss): Let p(x, y) be a probability distribution. The function f* that minimizes the generalization error E[(y − f(x))²] is
    f*(x) = E[y | x].
Here E[y | x] is the expected value of y selected at random from the conditional probability distribution p(y | x) determined by p(x, y).

To minimize E[(y − f(x))²], it suffices to minimize, for each x, the quantity
    S(c) = E[(y − c)² | x]
with respect to c. Since
    S(c) = E[y² | x] − 2c·E[y | x] + c²,
S is minimized in the case c = E[y | x].

The function f*(x) = E[y | x] is sometimes called the regression function^{[17]}.
Classification
In a classification task (of the type that outputs the class directly rather than a confidence), the Bayes rule for the 0-1 loss is the rule that outputs the class with the highest conditional probability:
    F*(x) = argmax_y p(y | x).
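A minimal sketch of this rule, assuming the conditional distribution p(y | x) is known and given as a table; the class probabilities below are made up for illustration.

```python
# Sketch of the Bayes rule for the 0-1 loss: output the class y with the
# highest conditional probability p(y|x). Here p(y|x) is assumed known and
# supplied directly as a dict (a hypothetical example).

def bayes_rule(p_y_given_x):
    """Return argmax_y p(y|x) for a dict mapping class label -> probability."""
    return max(p_y_given_x, key=p_y_given_x.get)

p = {"cat": 0.2, "dog": 0.7, "bird": 0.1}   # hypothetical p(y|x) for some x
print(bayes_rule(p))  # → dog
```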
Unsupervised learning
In unsupervised learning, unlike supervised learning, there is nothing corresponding to the objective variable y (or it cannot even be known whether such a thing exists in the first place).

In unsupervised machine learning, variables x_1, ..., x_n following an unknown probability distribution p(x) are given to the algorithm as training data. The task the algorithm should solve is to learn important properties of the probability distribution p(x) in some way, or to estimate characteristics of p(x) directly^{[9]}^{[18]}. Because, unlike supervised learning, there is no clear "correct answer" y, unsupervised learning has no evaluation scale that directly evaluates the validity of the output^{[18]}; judgments of validity become subjective^{[18]}, and heuristic discussion is required^{[18]}.

One of the interests of unsupervised learning is density estimation, the task of estimating the probability density function p(x) itself; various nonparametric density estimation methods, such as kernel density estimation, are known in statistics^{[18]}. However, when the dimension of x is large, this estimation does not work because of the curse of dimensionality^{[18]}. Therefore, in much of unsupervised learning, an approach is taken such as approximating p(x) with some parametric model, or extracting some important property of p(x) from the training data.
Specific examples include clustering and dimensionality reduction.
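As one concrete unsupervised method, the kernel density estimation mentioned above can be sketched with a Gaussian kernel; the sample points and bandwidth are illustrative assumptions.

```python
import math

# Hedged sketch of kernel density estimation:
#   p̂(x) = (1/(n h)) Σ_i K((x - x_i)/h)
# with a Gaussian kernel K and bandwidth h. Data and h are made up.

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, samples, bandwidth):
    n = len(samples)
    return sum(gaussian_kernel((x - xi) / bandwidth) for xi in samples) / (n * bandwidth)

samples = [0.9, 1.0, 1.1, 2.9, 3.0, 3.1]   # two clusters, around 1 and 3
density_near_mode = kde(1.0, samples, bandwidth=0.3)
density_in_gap = kde(2.0, samples, bandwidth=0.3)
print(density_near_mode, density_in_gap)   # the estimate concentrates mass near the data
```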
Reinforcement learning
Reinforcement learning (Japanese: 強化学習, kyōka gakushū; English: reinforcement learning) is a type of machine learning dealing with the problem of an agent in an environment observing the current state and deciding what action to take. By choosing actions, the agent obtains rewards from the environment. Reinforcement learning learns, through a series of actions, a policy that yields the most reward. The environment is formulated as a Markov decision process. TD learning and Q-learning are known as typical methods.
 Reinforcement learning is a method of learning, through trial and error, "the behavior that maximizes value".
 Learning is possible even when the correct answer is not known in advance (that is, no teacher data exists).
 There are many application examples in competitive games and robotics.
 Reinforcement learning that uses deep learning is called deep reinforcement learning.
 The name "reinforcement learning" comes from operant learning, a learning mechanism of the brain advocated by Dr. Skinner.
 Through a rat experiment using a device called the Skinner box, Dr. Skinner discovered that "when a specific movement is rewarded, that movement is reinforced", and called this operant learning (around 1940).
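A minimal tabular Q-learning sketch on a toy "corridor" environment; the environment, reward, and hyperparameters are illustrative assumptions, not from the article. The agent learns, by trial and error, the policy of heading toward the rewarded state.

```python
import random

# Tabular Q-learning on a toy corridor: states 0..3, actions "left"/"right",
# reward 1 only for entering the goal state 3. Everything here (environment,
# α, γ, ε, episode count) is an illustrative assumption.

random.seed(0)
N_STATES, GOAL = 4, 3
ACTIONS = ["left", "right"]
alpha, gamma, epsilon = 0.5, 0.9, 0.1

def step(s, a):
    s2 = max(0, s - 1) if a == "left" else min(N_STATES - 1, s + 1)
    reward = 1.0 if s2 == GOAL else 0.0
    return s2, reward

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(500):                      # episodes
    s = 0
    while s != GOAL:
        # ε-greedy action choice: mostly exploit, sometimes explore
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # Q-learning update: move Q(s,a) toward r + γ max_a' Q(s',a')
        target = r + gamma * max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(policy)   # the learned policy heads right, toward the reward
```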
Other machine learning
For example:
 Semi-supervised learning
 Handles both labeled and unlabeled examples, thereby generating an approximate function or classifier.
 Transduction (transductive inference)
 Attempts to predict new outputs for specific, fixed (test) cases from the observed specific (training) cases.
 Multi-task learning
 Learns multiple related problems at the same time to improve the prediction accuracy on the main problem.
In active learning, the algorithm accesses the desired outputs (training labels) for a limited set of inputs within a budget, and optimizes the choice of inputs for which to obtain training labels. When used interactively, these can be presented to a human user for labeling. Reinforcement learning algorithms receive feedback in the form of positive or negative reinforcement in a dynamic environment, and are used, for example, in self-driving cars and in learning to play games against human opponents^{[19]}. Other specialized algorithms in machine learning include topic modeling, in which a computer program is given a set of natural language documents and finds other documents that cover similar topics. In the density estimation problem, machine learning algorithms can be used to determine an unobservable probability density function. Meta-learning algorithms learn their own inductive bias based on past experience. In developmental robotics, robot learning algorithms generate their own sequences of learning experiences, also known as a curriculum, and accumulate new skills through self-guided exploration and social interaction with humans. These robots use guidance mechanisms such as active learning, maturation, motor synergies, and imitation.
Interaction with humans
Some machine learning systems attempt to eliminate the need for human intuition in data analysis, while others incorporate cooperative interaction between humans and machines. Human intuition cannot be completely excluded, however, since the system's representation of the data and the mechanisms for exploring the characteristics of the data are designed by humans.
Relationship with data mining
Machine learning and data mining are often confused because they have a large intersection and use the same techniques, but they can be distinguished as follows.
 The purpose of machine learning is to make predictions based on "known" features learned from the training data.
 The purpose of data mining is to discover previously "unknown" characteristics of the data.
The two overlap in many ways. Data mining uses machine learning techniques, but its purpose is often slightly different. Machine learning, in turn, uses data mining techniques as "unsupervised learning" or as a preprocessing step to improve learner accuracy. The two research areas basically have separate conferences and journals, ECML PKDD being an exception. The biggest cause of confusion between them comes from their basic premises: machine learning evaluates performance by the ability to reproduce known knowledge, while data mining emphasizes the discovery of previously "unknown" knowledge. Evaluated against known knowledge, a "supervised" technique can therefore easily show results superior to an "unsupervised" one; but in typical data mining, training data cannot be prepared, so "supervised" techniques cannot be adopted.
Theory
The analysis of machine learning algorithms and their performance is a branch of theoretical computer science called computational learning theory. Because the training examples are finite while the future is uncertain, learning theory generally cannot guarantee the performance of algorithms; instead, it gives probabilistic bounds on performance. The term statistical learning theory is also used^{[20]}.
In addition, the time complexity and feasibility of learning are studied. In computational learning theory, computations that finish in polynomial time are considered feasible.
Machine learning and statistics are similar in many respects, but they use different terms.
Statistical machine learning
Statistical machine learning refers to machine learning that learns the probabilistic generation rule behind the data^{[21]}.
Statistics is a methodology focusing on the population and the sample, and on the probability distribution that lies behind them. In statistical machine learning, the data are considered to be obtained stochastically from a population; the data-generating process is modeled using a probability distribution, and the model is trained (or the model selection itself is learned) based on the actual data. Because this can be interpreted as data being generated by sampling from the population, the model of statistical machine learning is also called a generative model or statistical model^{[22]}.
Estimation and selection of a population (its parameters) based on a sample has long been studied in statistics, and there are many theories. Since learning in statistical machine learning is exactly population estimation and selection, the theory of statistics can be applied to machine learning. Various machine learning issues, such as convergence of learning and generalization performance, are studied using the knowledge system of statistics.
Examples of statistical machine learning include generative models in neural networks, e.g. autoregressive generative networks, the variational autoencoder (VAE), and the generative adversarial network (GAN). Since data such as images and sounds can actually be generated by sampling from these models (= populations), they were studied intensively in the latter half of the 2010s, especially in the field of neural networks, and have achieved great results (WaveNet, VQ-VAE-2, BigGAN, etc.).
Mathematical optimization
Many machine learning methods define the error of the model output with respect to the data and update (learn) the parameters so as to minimize that error. The academic discipline of minimizing the function that calculates the error, that is, the loss function, is the field of applied mathematics called mathematical optimization (the problem being solved is an optimization problem).
For example, neural networks are often trained by differentiating the loss function and applying a gradient method (such as stochastic gradient descent). Whether optimization by a gradient method converges to the optimal solution is studied in the theory of mathematical optimization. The constraints imposed on a neural network also differ depending on the optimization method used: to use the gradient method (backpropagation), all the successively applied functions must be differentiable (which strongly constrains sampling in generative models).
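Learning by a gradient method, as described above, can be sketched for a one-parameter linear model with squared-error loss; the data and learning rate are illustrative assumptions.

```python
# Sketch of learning by gradient descent: minimize the squared-error loss
# L(w) = Σ_i (w * x_i - y_i)² for a one-parameter linear model by repeatedly
# stepping against the gradient dL/dw. Data and learning rate are made up.

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]        # generated by y = 2x, so the optimum is w = 2

w = 0.0                     # initial parameter
lr = 0.01                   # learning rate

for _ in range(1000):
    grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys))  # dL/dw
    w -= lr * grad          # gradient descent step

print(w)  # ≈ 2.0
```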
Techniques
 Decision tree learning
 A decision tree is used as a predictive model; it maps observations about an item to conclusions about the item's target value. Concrete examples are ID3 and random forest.
 Association rule learning
 A technique for discovering interesting relationships between variables in large databases.
 neural network
 A network of hierarchical, nonlinear transformations, generally trained by the backpropagation method. Its nonlinearity gives it high expressive power, and it is used for various tasks such as classification, regression, and generation.
 Genetic programming (GP)
 A technique based on evolutionary algorithms, which imitate the evolution of living things, that searches for programs performing a user-defined task. It is an extension and specialization of genetic algorithms. It is a machine learning technique that determines a fitness landscape by the ability to perform a given task and thereby optimizes computer programs.
 Inductive logic programming (ILP)
 A technique that regularizes learning using logic programming, with examples, background knowledge, and hypotheses as a uniform representation. A set of known background knowledge and examples is encoded as a logical database of facts, and a hypothesis logic program is generated that entails all of the positive examples and none of the negative examples.
 Support vector machine (SVM)
 A supervised learning technique used for classification and regression. Given training examples whose labels are binary (classified into two classes), the training algorithm builds a model that predicts which class a new example falls into.
 Clustering
 Clustering distributes the observed examples into subsets called clusters, according to a criterion specified in advance. The result of clustering differs depending on the hypothesis (criterion) made about the structure of the data. Hypotheses are defined by a "similarity measure" and evaluated by "internal compactness" (similarity between members of the same cluster) and by the distance between different clusters. There are also techniques based on "estimated density" and "graph connectivity". Clustering is an unsupervised learning technique, often used in statistics and data analysis.
 Bayesian network
 A probabilistic graphical model that represents a set of random variables and their conditional dependencies with a directed acyclic graph (DAG). For example, the probabilistic relationship between diseases and symptoms can be represented; if symptoms are entered into the network, it outputs a list of possible diseases with their probabilities. Efficient algorithms exist for performing inference and learning with it.
 Representation learning
 Some unsupervised learning algorithms try to find a better representation of the inputs provided during training. Classic examples are principal component analysis and cluster analysis. There are also algorithms that transform the input into a more convenient representation before classification or prediction, while retaining the information the input carries. In doing so, the input can be reconstructed from the unknown probability distribution that the input data follow, although examples that are improbable under that distribution need not be reproduced faithfully. For example, manifold learning algorithms produce a low-dimensional representation of the input under some constraints, and sparse coding algorithms do the same under the constraint that the representation be sparse (contain many zeros). Deep learning in neural networks discovers multiple levels of representation, or a hierarchy of features, from low-level extracted features to high-level abstracted features. It is also argued that an intelligent machine learns representations that disentangle the underlying factors of variation that explain the observed data^{[23]}.
 Extreme learning machine (ELM)
 It is a feedforward neural network with one or more hidden layers, and can be applied to classification, regression, and clustering.
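As a concrete instance of the clustering technique listed above, here is a minimal k-means sketch in one dimension; the data, the choice k = 2, and the iteration count are illustrative assumptions (k-means is one clustering method among many).

```python
import random

# Hedged sketch of clustering with k-means: alternate between assigning each
# point to its nearest center and moving each center to the mean of its
# cluster. Points are 1-D for brevity; the data are made up.

random.seed(0)

def kmeans_1d(points, k, iters=20):
    centers = random.sample(points, k)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)

data = [1.0, 1.2, 0.8, 9.8, 10.0, 10.2]   # two obvious groups
centers = kmeans_1d(data, k=2)
print(centers)                            # centers near 1.0 and 10.0
```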
Application areas
Machine learning has the following application fields.


In 2006, the online DVD rental company Netflix held a competition, the Netflix Prize, seeking a program that would beat the company's recommender system by more than 10% (that is, predict user preferences more accurately). The competition took several years; in 2009, the team "Pragmatic Chaos"^{[24]}, including members from AT&T Labs, won with its machine learning program and received the $1 million prize^{[25]}.
Actual application
Actual applications include the following:

| Category | Subcategory | Concrete examples |
|---|---|---|
| Recognition^{[26]} | Image recognition | Face recognition^{[27]}, monitoring work^{[27]}, inspection^{[27]}, organizing images^{[27]}, medical diagnosis^{[27]} |
| | Voice recognition | Voice input^{[28]}, automatic creation of minutes^{[28]}, call center assistance or replacement^{[28]} |
| | Sentence analysis / sentence recognition | Detection of illegal sentences^{[29]}, understanding needs^{[29]}, searching for similar past cases^{[29]} |
| | Anomaly detection | Failure detection^{[30]}, suspicious behavior detection^{[30]}, default detection^{[30]} |
| Analysis^{[26]} (mostly prediction^{[31]}) | Numerical forecasting | Demand forecasting such as sales^{[32]}, forecasting stock prices and economic indicators^{[32]}, predicting time required^{[32]}, predicting deterioration^{[32]}, quality prediction^{[32]} |
| | Prediction of event occurrence | Forecasting purchases and cancellations^{[33]}, failure prediction^{[33]}, disease prediction^{[33]}, predicting compatibility^{[33]} |
| Coping^{[26]} | Behavior optimization | Inventory optimization^{[34]}, advertising optimization^{[34]}, campaign optimization^{[34]}, optimizing store openings^{[34]}, delivery optimization^{[34]} |
| | Work optimization | Self-driving^{[35]}, robot control^{[35]}, Q&A automation^{[35]} |
| | Expression generation | Translation^{[36]}, summarization^{[36]}, image generation^{[36]} |
Software
Software suites equipped with various machine learning algorithms include SAS, RapidMiner, LIONsolver, KNIME, ODM, the Shogun toolbox, Orange, Apache Mahout, scikit-learn, mlpy, MCMLL, OpenCV, XGBoost, and so on.
DataRobot^{[37]} offers a way to compare multiple methods by parallel computation^{[38]}.
Academic journals and international conferences
 Machine Learning(Academic journal)
 Journal of Machine Learning Research(Academic journal)
 Neural Computation(Academic journal)
 International Conference on Machine Learning (ICML) (International Conference)
 Neural Information Processing Systems (NeurIPS formerly known as NIPS) (International Conference)
Footnotes
Notes
 ^ Machine learning and pattern recognition "can be viewed as two facets of the same field."^{[3]}^{: vii}
 ^ Because the output is often provided by human experts labeling the training examples, it is also called a label.
 ^ Typically each data point of D is selected independently according to p(x, y), but the theorem can be proved regardless of the probability distribution from which D is selected.
Sources
 ^ "Machine Learning textbook". www.cs.cmu.edu. 2020/5/28Browse.
 ^ (2008) “The Annotation Game: On Turing (1950) on Computing, Machinery, and Intelligence”, in Epstein, Robert; Peters, Grace, The Turing Test Sourcebook: Philosophical and Methodological Issues in the Quest for the Thinking Computer, Kluwer, pp. 23–66, ISBN 9781402067082
 ^ # bishop2006
 ^ (1998). “Data Mining and Statistics: What's the connection?”. Computing Science and Statistics 29 (1): 3–9.
 ^ Samuel, Arthur (1959). "Some Studies in Machine Learning Using the Game of Checkers". IBM Journal of Research and Development 3 (3): 210-229. doi:10.1147/rd.33.0210.
 ^ Mitchell, T. (1997). Machine Learning. McGraw Hill. Pp. 2. ISBN 9780070428072
 ^ ^{a} ^{b} #waterfall p.20.
 ^ ^{a} ^{b} ^{c} ^{d} ^{e} ^{f} #ESL p1112
 ^ ^{a} ^{b} #GBC Verse 5.1.3
 ^ #Kanamori p.3.
 ^ #waterfall p.8.
 ^ ^{a} ^{b} #waterfall p.36.
 ^ #waterfall p.30.
 ^ "Lecture 12: BiasVariance Tradeoff". CS4780 / CS5780: Machine Learning for Intelligent Systems [FALL 2018]. Cornell University. 2020/11/10Browse.
 ^ #Kanamori p.13.
 ^ #Kanamori p.9.
 ^ ^{a} ^{b} #ESL p2223
 ^ ^{a} ^{b} ^{c} ^{d} ^{e} ^{f} #ESL p559561
 ^ (2006) Pattern Recognition and Machine Learning, Springer, ISBN 9780387310732
 ^ Statistical Learning Theory, Takafumi Kanamori, Machine Learning Professional Series, Kodansha, 2015, ISBN 9784061529052
 ^ "Statistical Machine Learning Theory and Boltzmann Machine Learning" Muneki Yasuda. Yamagata University
 ^ Ueda. "Introduction to Statistical Machine Learning" NII. https://www.youtube.com/watch?v=wqb3k22toFY&t=478
 ^ Yoshua Bengio (2009). Learning Deep Architectures for AI. Now Publishers Inc .. p. 1–3. ISBN 9781601982940
 ^ English: Pragmatic Chaos
 ^ "BelKor Home Page" research.att.com
 ^ ^{a} ^{b} ^{c} #Motohashi 2018, near the beginning of Chapter 1.3, "Usage of Artificial Intelligence" and "Three Roles of Artificial Intelligence".
 ^ ^{a} ^{b} ^{c} ^{d} ^{e} #Motohashi 2018, Chapter 1.4 "Specific Examples of Recognition", Figure 1-4 "Specific Examples of Image Recognition"
 ^ ^{a} ^{b} ^{c} #Motohashi 2018, Chapter 1.4 "Specific Examples of Recognition", Figure 1-5 "Specific Examples of Voice Input"
 ^ ^{a} ^{b} ^{c} #Motohashi 2018, Chapter 1.4 "Specific Examples of Recognition", Figure 1-6 "Specific Examples of Sentence Analysis / Sentence Recognition"
 ^ ^{a} ^{b} ^{c} #Motohashi 2018, Chapter 1.4 "Specific Examples of Recognition", Figure 1-7 "Specific Examples of Anomaly Detection"
 ^ #Motohashi 2018, Chapter 1.5 "What Is Analysis?"
 ^ ^{a} ^{b} ^{c} ^{d} ^{e} #Motohashi 2018, Chapter 1.5 "Specific Examples of Analysis", Figure 1-8 "Specific Examples of Numerical Prediction"
 ^ ^{a} ^{b} ^{c} ^{d} #Motohashi 2018, Chapter 1.5 "Specific Examples of Analysis", Figure 1-9 "Specific Examples of Prediction of Event Occurrence"
 ^ ^{a} ^{b} ^{c} ^{d} ^{e} #Motohashi 2018, Chapter 1.6 "Specific Examples of Countermeasures", Figure 1-10 "Specific Examples of Behavior Optimization"
 ^ ^{a} ^{b} ^{c} #Motohashi 2018, Chapter 1.6 "Specific Examples of Countermeasures", Figure 1-12 "Specific Examples of Specific Work"
 ^ ^{a} ^{b} ^{c} #Motohashi 2018, Chapter 1.6 "Specific Examples of Countermeasures", Figure 1-13 "Specific Examples of Expression Generation"
 ^ English: DataRobot
 ^ DataRobot: https://www.datarobot.com
References
 Christopher M. Bishop (2006). Pattern Recognition and Machine Learning. Springer-Verlag. ISBN 9780387310732
(Intermediate to advanced textbook) → Support page (Chapter 8, "Graphical Models", is available there in PDF format)
 Japanese edition: "Pattern Recognition and Machine Learning: Statistical Prediction by Bayesian Theory", Springer Japan (2007–2008). Volume 1: ISBN 978-4431100133; Volume 2: ISBN 978-4431100317 → Japanese edition support page
 Motohashi, Yosuke (2018/2/15). A Book for Understanding Artificial Intelligence System Projects: From Planning and Development to Operation and Maintenance (AI & TECHNOLOGY). Shoeisha. ASIN B078JMLVR2. ISBN 9784798154053
 Ian Goodfellow, Yoshua Bengio, Aaron Courville. Translation: Hiroo Kurotaki, Shin Kono, Masashi Misono, Jun Hozumi, Naoki Nonaka, Shoji Tomiyama, Takahiro Tsunoda. Supervision: Yusuke Iwasawa, Masahiro Suzuki, Kotaro Nakayama, Yutaka Matsuo (2018/8). Deep Learning (Kindle edition). Dwango. ASIN B07GQV1X76
 "Deep Learning: An MIT Press book". Retrieved 2020-10-30. Web version of the original book.
 Authors: Trevor Hastie, Robert Tibshirani, Jerome Friedman. Translation: Masaru Sugiyama, Tsuyoshi Ide, Toshihiro Kamishima, Takio Kurita, Eisaku Maeda, Yoshihisa Ijiri, Toshiharu Iwata, Takafumi Kanamori, Atsushi Kanemura, Masayuki Karasuyama, Yoshinobu Kawahara, Shogo Kimura, Yoshinori Konishi, Tomoya Sakai, Daiji Suzuki, Ichiro Takeuchi, Toru Tamaki, Daisuke Deguchi, Ryota Tomioka, Hitoshi Habe, Shinichi Maeda, Daichi Mochihashi, Makoto Yamada (2014/6/25). Basics of Statistical Learning: Data Mining, Inference, Prediction. Kyoritsu Shuppan. ISBN 9784320123625
 "The Elements of Statistical Learning: Data Mining, Inference, and Prediction". Stanford University. Retrieved 2020-11-10. Official English site of the above book; a free PDF is available.
 Masato Taki (2017/10/21). This Is an Introduction to Deep Learning. KS Information Science Specialized Book, Machine Learning Startup Series. Kodansha. ISBN 9784061538283
 Takafumi Kanamori (2015/8/8). Statistical Learning Theory. KS Information Science Specialized Book, Machine Learning Startup Series. Kodansha. ISBN 9784061529052
 Yasuaki Ariga, Shinta Nakayama, Takashi Nishibayashi, "Machine Learning Beginning at Work", January 2018. ISBN 9784873118253
Further reading
 Thomas Mitchell, "Machine Learning", McGraw-Hill (1997). ISBN 978-0071154673 (Introductory textbook) → Support page
 Trevor Hastie, Robert Tibshirani, and Jerome H. Friedman, "The Elements of Statistical Learning: Data Mining, Inference, and Prediction", Springer-Verlag (2001). ISBN 978-0387952840 (Includes advanced content, mainly mathematical and statistical methods) → Support page (all chapters are available there in PDF format)
 David MacKay, "Information Theory, Inference, and Learning Algorithms" (2003) (A textbook that comprehensively covers information theory and machine learning, with a focus on Bayesian inference) → Author page (the full text is available there in PDF format)
 Sergios Theodoridis, Konstantinos Koutroumbas (2009) "Pattern Recognition", 4th Edition, Academic Press, ISBN 9781597492720.
 Ethem Alpaydın (2004), Introduction to Machine Learning (Adaptive Computation and Machine Learning), MIT Press, ISBN 0262012111
 Bing Liu (2007), Web Data Mining: Exploring Hyperlinks, Contents and Usage Data, Springer, ISBN 3540378812
 Toby Segaran (2007), Programming Collective Intelligence, O'Reilly, ISBN 0596529325
 Ray Solomonoff, "An Inductive Inference Machine". A privately circulated report from the 1956 Dartmouth Summer Research Conference on AI.
 Ray Solomonoff, An Inductive Inference Machine, IRE Convention Record, Section on Information Theory, Part 2, pp. 56–62, 1957.
 Ryszard S. Michalski, Jaime G. Carbonell, Tom M. Mitchell (1983), Machine Learning: An Artificial Intelligence Approach, Tioga Publishing Company, ISBN 0935382054.
 Ryszard S. Michalski, Jaime G. Carbonell, Tom M. Mitchell (1986), Machine Learning: An Artificial Intelligence Approach, Volume II, Morgan Kaufmann, ISBN 0934613001.
 Yves Kodratoff, Ryszard S. Michalski (1990), Machine Learning: An Artificial Intelligence Approach, Volume III, Morgan Kaufmann, ISBN 1558601198.
 Ryszard S. Michalski, George Tecuci (1994), Machine Learning: A Multistrategy Approach, Volume IV, Morgan Kaufmann, ISBN 1558602518.
 Bishop, C. M. (1995). Neural Networks for Pattern Recognition, Oxford University Press. ISBN 0198538642.
 Richard O. Duda, Peter E. Hart, David G. Stork (2001), Pattern Classification (2nd edition), Wiley, New York, ISBN 0471056693.
 Huang T. M., Kecman V., Kopriva I. (2006), Kernel Based Algorithms for Mining Huge Data Sets, Supervised, Semi-supervised, and Unsupervised Learning, Springer-Verlag, Berlin, Heidelberg, 260 pp., 96 illus., hardcover, ISBN 3540316817.
 Kecman, Vojislav (2001), Learning and Soft Computing: Support Vector Machines, Neural Networks and Fuzzy Logic Models, The MIT Press, Cambridge, MA, 608 pp., 268 illus., ISBN 0262112558.
 Ian H. Witten and Eibe Frank (2011), Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 664 pp., ISBN 978-0123748560.
 Sholom Weiss and Casimir Kulikowski (1991), Computer Systems That Learn, Morgan Kaufmann. ISBN 1558600655.
 Mierswa, Ingo and Wurst, Michael and Klinkenberg, Ralf and Scholz, Martin and Euler, Timm: YALE: Rapid Prototyping for Complex Data Mining Tasks, in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD06), 2006.
 Vladimir Vapnik (1998). Statistical Learning Theory. Wiley-Interscience, ISBN 0471030031.
 Peter Flach, Akimichi Takemura (translation), "Machine Learning: Algorithm Techniques for Reading Data", Asakura Shoten, ISBN 978-4254122183 (2017).
 Nils J. Nilsson, Introduction to Machine Learning.
 Stuart Russell & Peter Norvig (2009). Artificial Intelligence: A Modern Approach. Pearson, ISBN 9789332543515.
Related items
 Automatic reasoning
 Computational intelligence
 Computational neuroscience
 Cognitive science
 Cognitive model
 Data Mining
 Pattern recognition
 Kernel method
 Enterprise search
 Curse of dimensionality
 Language acquisition
 Watson (computer)
 Statistics
 Artificial intelligence
 Big data
 Machine learning evaluation index
External links
 Institute of Electronics, Information and Communication Engineers (IEICE), Technical Committee on Information-Based Induction Sciences and Machine Learning (IBISML)
 Toki no Mori Wiki — a wiki about machine learning and data mining
 International Machine Learning Society
 mloss — an academic database of open-source machine learning software
 Machine Learning Crash Course by Google — a free machine learning course using TensorFlow
 "Machine learning』 Koto bank