Machine learning (kikai gakushū; English: machine learning) refers to computer algorithms that improve automatically through experience, or to the research field that studies such algorithms, and is regarded as a branch of artificial intelligence. Learning is carried out using data called "training data" (or "learning data"), and the learning result is used to perform some task. For example, past spam mail can be used as training data to learn to perform the task of spam filtering.
Machine learning is closely related to the following areas:
- : a field focused on making predictions by computer
- Mathematical optimization: a field focused on searching for optimal solutions under given conditions
- Data mining: a field whose focus corresponds to unsupervised learning (described later)[Note 1]
The following concise definition by Tom M. Mitchell is widely cited, although the definition varies from one author to another:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Here, a task is a problem that the program should solve. In a sales-forecasting task, for example, the task is "predict tomorrow's sales".
Experience is given to the program as data. This data is called training data (or learning data). In the sales-forecasting task, for example, sales up to today, the "past experience", are given as training data. Improving the program's performance using training data is called "training the program" or "making the program learn". The set of all data used to train the program is called a dataset (data set).
Finally, performance is an index measuring how well the program accomplishes the task; in the sales-forecasting task above, for example, the error between the forecast and the actual sales can be used as the performance index.
In machine learning, when a data item x is a continuous quantity, x is called a quantitative variable; a variable that represents a kind of thing, such as the classification categories "dog" or "cat", is called a qualitative variable. A qualitative variable is also called a categorical variable or a factor.
Besides quantitative and qualitative variables, there are ordered categorical variables, which take ordered discrete values such as "large", "medium", "small". Machine learning also handles data such as natural language, which, unlike quantitative variables, is not a continuous quantity and, unlike categorical variables, does not take values in a finite set of categories.
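The three variable types above can be illustrated with a small sketch; the category sets, orderings, and numeric codes below are arbitrary examples, not any fixed convention.

```python
# Illustrative encodings of the variable types described above.
quantitative = 23.5                       # continuous quantity, used as-is
categories = ["dog", "cat", "bird"]       # qualitative / categorical variable
qualitative = categories.index("cat")     # unordered category id (order is meaningless)
ordered_levels = {"small": 0, "medium": 1, "large": 2}
ordered = ordered_levels["large"]         # ordered categorical: order is meaningful
```

In practice the numeric code of a qualitative variable carries no magnitude information, while for an ordered categorical variable comparisons such as "large" > "small" are meaningful.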
Types of machine learning tasks
Machine learning tasks can be divided into the following three typical categories. These three do not cover every task handled in machine learning, however; some tasks belong to several categories, and for some it is ambiguous which category they belong to.
- Supervised learning
- Given inputs and the corresponding outputs[Note 2], generate a function that maps one to the other. For example, in a classification problem, examples of input vectors together with the classes they belong to are given, and a function mapping inputs to classes is approximated.
- Unsupervised learning
- Build a model from inputs alone (unlabeled examples). See also data mining.
- Reinforcement learning
- Learn how to act by observing the surrounding environment. Every action affects the environment, and feedback from the environment in the form of rewards guides the learning algorithm. Q-learning is one example.
In supervised learning, data follow an unknown probability distribution p(x, y). In practical applications x is in some sense the input and y the output; in many cases, for example, y is the value F(x) of an unknown function F with a small amount of noise added. The algorithm is given pairs (x1, y1), ..., (xn, yn) drawn according to p(x, y) as training data. The task the algorithm should solve is, for an x that does not (necessarily) belong to the training data, to approximate the conditional probability distribution p(y | x) or a value determined from it (for example its expected value E[y | x]). The accuracy of the approximation is evaluated using a predetermined loss function. The goal of supervised machine learning can therefore be stated as making the expected value of the loss function small.
According to the definition of machine learning given earlier, supervised machine learning can be described as the following kind of machine learning:
|Task T|Experience E|Performance P|
|Approximate p(y | x) (or a value determined from it) well|Training data|Expected value of the loss function|
In supervised learning, the algorithm is required to guess, from prior knowledge, the distribution p(y | x) of the y corresponding to an unknown x. The operation by which the algorithm finds p(y | x) (or a value determined from it) for an unknown x is therefore called generalization or inference; depending on the task it may also be called "prediction", "judgment", "recognition", and so on.
The algorithm must infer information about the distribution of the y corresponding to an unknown x, and in the training data given as prior knowledge for this inference, each xi comes with the "answer" yi attached. The name "supervised learning" comes from this setting: a "teacher" shows the "student" algorithm known "problems" xi together with their "answers" yi, and the algorithm then infers the "answer" y corresponding to an unknown x. For the same reason, training data in supervised learning is also called teacher data.
Training phase and generalization phase
Many supervised machine learning models perform, before the actual generalization, a step called training or learning, so a machine learning model can be regarded as a pair of a training algorithm and a generalization algorithm. The training algorithm takes the training data as input and outputs a value θ called the parameter. Intuitively, the parameter is the "learning result" obtained by extracting useful information from the training data, and generalization is carried out using this θ: the generalization algorithm receives the parameter θ in addition to the input x, and finds p(y | x) (or a value determined from it).
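The two-phase structure described above can be sketched minimally. The model below, a least-squares line with θ = (intercept, slope), and the names `train` and `generalize` are illustrative choices, not from any library.

```python
def train(xs, ys):
    """Training algorithm: extract the parameter theta from training data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # closed-form least-squares fit of y ≈ intercept + slope * x
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
            / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return (intercept, slope)

def generalize(x_new, theta):
    """Generalization algorithm: predict y from an input x and the parameter theta."""
    intercept, slope = theta
    return intercept + slope * x_new

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]          # noiseless toy data: y = 2x + 1
theta = train(xs, ys)              # training phase
prediction = generalize(4.0, theta)  # generalization phase, for unseen x = 4
```

Note that only θ, not the training data itself, is carried over from the training phase to the generalization phase.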
In supervised machine learning, the variable x is called the explanatory variable and y the objective variable or target (target variable). Various other names are used as well: x may be called the predictor and y the response variable, or x the independent variable and y the dependent variable. Depending on the task, still other names appear.
Regression and classification
Regression and classification are typical tasks belonging to supervised learning. When the objective variable y in supervised learning is a quantitative variable, the task is called regression; when y is a categorical variable taking values in a finite set, it is called classification or discrimination.
The goal of regression is, given an input x, to predict information about p(y | x). Typically the data satisfy
 y = F(x) + ε
for an unknown function F and random noise ε, and the algorithm is required to output, from an input x, as accurate a prediction of y as possible. The objective variable y handled in regression is a continuous quantity, typically a numerical vector in which several real numbers are arranged.
Like other supervised machine learning algorithms, a regression algorithm receives a set of training data (x1, y1), ..., (xn, yn) selected according to p(x, y) and, using these training data as hints, outputs a prediction ŷ of the expected value E[y | x] corresponding to the input x. The accuracy of the prediction is measured by a loss function L(y, ŷ); in regression the squared error loss
 L(y, ŷ) = ||y − ŷ||²
is often used. The goal of regression is to keep the generalization error (also called prediction error or prediction loss)
 E[L(y, f(x))]
small, where f(x) is the output of the generalization algorithm and E[·] denotes the expected value.
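The squared error loss and the usual empirical estimate of the generalization error can be sketched as follows; since the distribution p(x, y) is unknown, the expectation is approximated by averaging over held-out test pairs (the data values here are made up for illustration).

```python
def squared_error_loss(y, y_hat):
    """L(y, ŷ) = (y − ŷ)² for scalar outputs (sum over components for vectors)."""
    return (y - y_hat) ** 2

# The generalization error E[L(y, f(x))] is an expectation over the unknown
# distribution; in practice it is estimated by averaging the loss over
# held-out test pairs that the model was not trained on.
y_test = [1.0, 2.0, 3.0]   # true outputs on held-out data
y_pred = [1.5, 2.0, 2.0]   # the generalization algorithm's predictions
estimated_generalization_error = sum(
    squared_error_loss(y, y_hat) for y, y_hat in zip(y_test, y_pred)
) / len(y_test)
```
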
In a classification task, a finite number of classes are fixed in advance, and each class is assigned a class name, such as "cat" or "dog", called a class label (or simply label). The purpose of a classification task is to guess, given an input x, which class it belongs to.
Algorithms for solving classification tasks come in roughly two types, a "deterministic approach" and a "probabilistic approach". In the former, given an input x, the algorithm outputs the class label to which x appears to belong; the 0-1 loss, which is 0 when the predicted label matches the true one and 1 otherwise, is typically used as the loss function.
The latter, by contrast, does not output a class label directly but outputs confidence scores c1(x), ..., cK(x). Here cj(x) is a measure of how confident the algorithm is that x belongs to the j-th class, and the scores satisfy cj(x) ≥ 0 and Σj cj(x) = 1.
For classification tasks that output confidence scores, the yi in the training data are also encoded consistently with the confidences: if xi belongs to the j-th class, then yi = ej, where ej is the vector whose j-th component is 1 and whose other components are 0 (thus a vector in which exactly one component is 1 and the rest are 0, called a one-hot vector; representing data by one-hot vectors is called a one-hot representation). The cross entropy is typically used as the loss function.
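The one-hot encoding and the cross-entropy loss just described can be sketched as follows; the three-class example and its confidence values are made up for illustration.

```python
import math

def one_hot(j, num_classes):
    """e_j: the vector whose j-th component (0-indexed) is 1 and the rest 0."""
    return [1.0 if k == j else 0.0 for k in range(num_classes)]

def cross_entropy(target, confidence):
    """Cross entropy −Σ_j t_j log c_j, for confidences c_j > 0 with Σ_j c_j = 1."""
    return -sum(t * math.log(c) for t, c in zip(target, confidence))

t = one_hot(1, 3)             # the example belongs to class 1 of 3 classes
c = [0.2, 0.7, 0.1]           # the model's confidence scores
loss = cross_entropy(t, c)    # = −log(0.7): only the true class's term survives
```

Because the target is one-hot, the loss reduces to the negative log-confidence assigned to the true class, so higher confidence in the correct class means lower loss.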
Relationship between regression and classification
A typical way to design an algorithm for a classification task with confidence scores is to reuse an algorithm for a regression task: the regression algorithm is trained on training data in which the class is encoded as a one-hot vector, and the trained algorithm is used for the classification task. However, the regression output f(x) differs from a confidence score in that the conditions fj(x) ≥ 0 and Σj fj(x) = 1 need not hold. This problem is solved by applying the softmax transformation to the regression output.
Conversely, a classification algorithm that outputs confidence scores can be diverted to a regression task; in that case, for the same reason as above, the inverse of the softmax transformation must be applied.
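The softmax transformation mentioned above can be sketched in a few lines; the raw output values below are arbitrary examples.

```python
import math

def softmax(z):
    """Map an unconstrained regression output z to confidence scores:
    each component becomes positive and the components sum to 1."""
    m = max(z)                              # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

raw = [2.0, 1.0, -1.0]                      # raw regression outputs, not confidences
conf = softmax(raw)                         # valid confidence scores
```

Softmax preserves the ordering of the components, so the class with the largest raw output also receives the largest confidence.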
Bias and variance trade-off
In regression, the algorithm must output a predicted value f(x) corresponding to the input x. It is desirable both that f(x) be close to the expected value E[y | x] (small bias) and that the variation of f(x) be small (small variance). As the following decomposition shows, however, these two requirements are in a trade-off:
 E[(f(x) − E[y | x])²] = (E[f(x)] − E[y | x])² + Var[f(x)],
that is, (bias)² plus variance.
The case of regression was described above, but the same holds for classification tasks that output confidence scores.
Let L and p(x, y) be, respectively, the loss function and the data distribution for a supervised learning task such as regression or classification, and write the prediction loss of a prediction function f as
 R(f) = E[L(y, f(x))].
The lower bound of the prediction loss is attained by a function called the Bayes rule.
The Bayes rule is the theoretically best prediction function, but in reality the probability distribution p(x, y) is unknown, so the prediction loss with respect to p(x, y) cannot be computed and the Bayes rule cannot be obtained. Supervised learning must therefore search for an algorithm that, from the known data (x1, y1), ..., (xn, yn), outputs a function as close to the Bayes rule as possible.
When the squared loss is chosen as the loss function, the following holds: the Bayes rule is f*(x) = E[y | x].
This function is sometimes called the regression function.
In a classification task (of the type that outputs a class directly rather than confidence scores), the Bayes rule for the 0-1 loss is f*(x) = argmaxj p(y = j | x), the class whose posterior probability is largest.
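The 0-1-loss Bayes rule amounts to one line of code; the posterior probabilities below are made up for illustration (in practice they are unknown and must be estimated).

```python
# Hypothetical class posteriors p(y = j | x) for a single input x.
posterior = [0.2, 0.5, 0.3]

# The Bayes rule under 0-1 loss predicts the class with the largest posterior.
bayes_prediction = max(range(len(posterior)), key=lambda j: posterior[j])
```
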
Unsupervised learning differs from supervised learning in that it cannot even be known whether anything corresponding to the objective variable y exists in the first place.
In unsupervised machine learning, variables x1, ..., xn following an unknown probability distribution p(x) are given to the algorithm as training data. The task the algorithm should solve is to learn important properties of the probability distribution p(x) in some way, or to estimate characteristics of p(x) directly. Because there is no clear "correct answer" y as in supervised learning, unsupervised learning has no evaluation measure that directly assesses the validity of the output; judgments of validity become subjective, and heuristic arguments are required.
One interest of unsupervised learning is density estimation, the task of estimating the probability density function p(x) itself. Various nonparametric density estimation methods, such as kernel density estimation, are known in statistics. However, when the dimension of x is high, such estimation fails because of the curse of dimensionality. Many unsupervised learning methods therefore take approaches such as approximating p(x) with some parametric model, or extracting some important property of p(x) from the training data.
Specific examples are as follows.
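For instance, the kernel density estimation mentioned above can be sketched with a Gaussian kernel; the data points and the bandwidth value are arbitrary illustrative choices.

```python
import math

def gaussian_kde(x_query, data, bandwidth=0.5):
    """Kernel density estimate p̂(x) = (1/(n·h)) Σ_i K((x − x_i)/h)
    with the Gaussian kernel K(u) = exp(−u²/2)/√(2π)."""
    n = len(data)
    total = sum(
        math.exp(-0.5 * ((x_query - xi) / bandwidth) ** 2) / math.sqrt(2 * math.pi)
        for xi in data
    )
    return total / (n * bandwidth)

data = [-1.0, 0.0, 1.0]           # unlabeled training data from an unknown p(x)
density = gaussian_kde(0.0, data) # estimated density at x = 0
```

The bandwidth h controls the smoothness of the estimate; choosing it well is the central practical difficulty of kernel density estimation.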
Reinforcement learning (kyōka gakushū; English: reinforcement learning) is a kind of machine learning dealing with the problem of an agent in an environment observing the current state and deciding what action to take. By choosing actions, the agent receives rewards from the environment. Reinforcement learning learns a policy that yields the most reward through a series of actions. The environment is formulated as a Markov decision process. Typical methods include TD learning and Q-learning.
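The Q-learning method named above can be sketched on a tiny made-up environment; the two-state chain, the reward values, and the hyperparameters below are all illustrative assumptions.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical environment: from state 0, action 1 moves to the terminal
# state 1 with reward 1; action 0 stays in state 0 with reward 0.
def step(state, action):
    if state == 0 and action == 1:
        return 1, 1.0, True          # (next state, reward, episode done)
    return 0, 0.0, False

n_states, n_actions = 2, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration rate

for episode in range(200):
    state, done = 0, False
    while not done:
        if random.random() < epsilon:
            action = random.randrange(n_actions)                       # explore
        else:
            action = max(range(n_actions), key=lambda a: Q[state][a])  # exploit
        next_state, reward, done = step(state, action)
        # Q-learning update: Q(s,a) += α·(r + γ·max_a' Q(s',a') − Q(s,a))
        target = reward + gamma * max(Q[next_state])
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state
```

After training, the learned values Q[0] rank action 1 (which earns the reward) above action 0, so the greedy policy derived from Q is the optimal one for this toy environment.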
Other machine learning
- Allows both labeled and unlabeled examples to be handled, thereby generating an approximate function or classifier.
- (Transduction)
- Attempts to predict new outputs for specific, fixed (test) cases from the observed specific (training) cases.
- Learns multiple related problems at the same time to improve prediction accuracy on the main problem.
In active learning, the algorithm has access, within a budget, to the desired outputs (training labels) for a limited set of inputs, and optimizes its choice of inputs for which to obtain labels. When used interactively, the chosen inputs can be presented to a human user for labeling. Reinforcement learning algorithms receive feedback in the form of positive or negative reinforcement in a dynamic environment and are used, for example, in self-driving cars and in learning to play games against human opponents. Other specialized machine learning algorithms include topic modeling, in which a computer program is given a set of natural-language documents and finds other documents covering similar topics. Machine learning algorithms can be used in density estimation problems to determine unobservable probability density functions. Meta-learning algorithms learn their own inductive bias based on past experience. In developmental robotics, robot learning algorithms generate their own sequences of learning experiences, also known as a curriculum, and accumulate new skills through self-guided exploration and social interaction with humans. These robots use guidance mechanisms such as active learning, maturation, motor synergies, and imitation.
Interaction with humans
Some machine learning systems try to eliminate the need for human intuition in data analysis, while others adopt a collaborative interaction between human and machine. However, the system's representation of the data and the mechanisms for exploring its characteristics are designed by humans, so human intuition cannot be completely eliminated.
Relationship with data mining
Machine learning and data mining are often confused because they overlap greatly and use the same techniques, but they can be distinguished as follows:
- The purpose of machine learning is prediction based on "known" properties learned from the training data.
- The purpose of data mining is the discovery of previously "unknown" properties of the data.
The two overlap in many ways. Data mining uses machine learning techniques, but its purpose is often slightly different; machine learning, in turn, uses data mining techniques, as "unsupervised learning" or as a preprocessing step to improve learner accuracy. The two research communities are basically separate, with distinct societies and journals, ECML PKDD being an exception. The biggest cause of confusion between them comes from their basic premises: machine learning evaluates performance by the ability to reproduce known knowledge, while data mining emphasizes discovering previously "unknown" knowledge. Evaluated against known knowledge, a "supervised" technique can therefore easily show better results than an "unsupervised" one; in typical data mining, however, training data cannot be prepared, so supervised techniques cannot be adopted.
The analysis of machine learning algorithms and their performance is a field of theoretical computer science called computational learning theory. Because training examples are finite while the future is uncertain, learning theory generally cannot guarantee the performance of an algorithm; instead, it gives probabilistic bounds on performance. The term statistical learning theory is also used.
Machine learning and statistics are similar in many respects but use different terminology.
Statistical machine learning
Statistical machine learning refers to machine learning that learns the probabilistic generation rule behind the data.
Statistics is a methodology that focuses on a population and its samples, via the probability distribution underlying them. In statistical machine learning, data are regarded as obtained stochastically from a population: the data-generating process is modeled with a probability distribution, and the model is trained (or the model selection itself is learned) on the actual data. Because it can be interpreted as data being generated by sampling from the population, a statistical machine learning model is also called a generative model or statistical model.
Estimating and selecting a population (its parameters) from samples has long been studied in statistics, and there is a large body of theory. Since learning in statistical machine learning is exactly population estimation and selection, statistical theory can be applied to machine learning; issues such as convergence of learning and generalization performance are studied using the framework of statistics.
Examples of statistical machine learning include generative models in neural networks, such as autoregressive generative networks, the variational autoencoder (VAE), and the generative adversarial network (GAN). Because data such as images and sounds can actually be generated by sampling from these models (i.e., from the modeled population), they were studied intensively in the late 2010s, especially in the field of neural networks, and have achieved major results (WaveNet, VQ-VAE-2, BigGAN, and so on).
Many machine learning methods define an error between the model's output and the data and update (learn) the parameters so as to minimize that error. The field of applied mathematics that studies minimizing the function that computes the error, that is, the loss function, is called mathematical optimization (the problems it solves are optimization problems).
For example, neural networks are usually trained by differentiating the loss function and applying a gradient method (such as stochastic gradient descent). Whether optimization by a gradient method converges to an optimal solution is studied in the theory of mathematical optimization. The optimization method used also imposes constraints on the neural network: to use a gradient method (backpropagation), every function applied must be differentiable, which strongly constrains, for example, sampling inside a generative model.
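The gradient-method idea above can be sketched on a toy one-parameter loss; plain gradient descent is used rather than the stochastic variant, and the loss function is an illustrative stand-in for a network's loss.

```python
# Toy differentiable loss with its minimum at w = 3.
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)        # derivative of the loss above

w = 0.0                           # initial parameter value
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * grad(w)  # gradient step: w ← w − η·∇L(w)
```

Each step moves the parameter against the gradient; for this convex loss the iterates converge geometrically to the minimizer w = 3.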
- Decision treeLearning
- Learning that uses a decision tree as a predictive model, mapping observations about an item to conclusions about the item's target value. Concrete examples include ID3 and random forests.
- A technique for discovering interesting relationships between variables in large databases.
- neural network (NN)
- Also known as an artificial neural network (ANN), a learning algorithm born from the idea of imitating the structure and function of networks of nerve cells. Computation is structured with interconnected artificial neurons, and information is processed with a connectionist computational technique. Modern neural networks are nonlinear statistical data-modeling tools. They are used to model complex relationships between inputs and outputs, for pattern recognition in data, and to capture the statistical structure of an unknown joint distribution over observed variables.
- Genetic programming (GP)
- A technique based on evolutionary algorithms, which imitate biological evolution, that searches for programs performing a user-defined task. It is an extension and specialization of genetic algorithms. It is a machine learning technique that determines a fitness landscape by the ability to perform a given task and thereby optimizes computer programs.
- A technique that formulates learning using logic programming, with examples, background knowledge, and hypotheses in a uniform representation. The known background knowledge and a set of examples are encoded as a logical database of facts, and a hypothesized logic program is generated that entails all the positive examples and none of the negative examples.
- Support vector machine (SVM)
- A family of supervised learning techniques used for classification and regression. Given training examples whose labels are binary (divided into two classes), the training algorithm builds a model that predicts into which class a new example falls.
- Clustering assigns the observed examples to subsets called clusters, according to a criterion specified beforehand. The result of clustering depends on the hypothesis (criterion) adopted about the structure of the data. Hypotheses are defined by a "similarity measure" and evaluated by "internal compactness" (similarity between members of the same cluster) and the distance between different clusters. There are also techniques based on "estimated density" and "graph connectivity". Clustering is an unsupervised learning technique often used in statistical data analysis.
- Bayesian network
- A probabilistic graphical model that represents a set of random variables and their conditional dependencies with a directed acyclic graph (DAG). For example, the relationships between diseases and symptoms can be represented probabilistically; entering symptoms into the network yields a list of possible diseases with their probabilities. Efficient algorithms exist for inference and learning with such networks.
- Some unsupervised learning algorithms try to find a better representation of the inputs provided during training. Classic examples include principal component analysis and cluster analysis. There are also algorithms that transform the input into a more convenient representation before classification or prediction while retaining the information the input carries; the input should be reconstructible from the representation under the unknown distribution the input data follow, although examples that are improbable under that distribution need not be reproduced faithfully. For example, dimensionality-reduction algorithms represent the input in a lower dimension under some constraints, and sparse coding algorithms do so under the constraint that the representation be sparse (have many zeros). Deep learning in neural networks discovers multiple levels of representation, a hierarchy of features, from low-level extracted features to high-level abstract features. It has also been argued that an intelligent machine learns representations that disentangle the underlying factors of variation explaining the observed data.
- Extreme learning machine (ELM)
- A feedforward neural network with one or more hidden layers, applicable to classification, regression, and clustering.
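As a concrete instance of the clustering criterion described above (assigning points to minimize distance to cluster centers), here is a minimal k-means sketch; the data points and the fixed initial centers are illustrative.

```python
def kmeans(points, centers, n_iter=10):
    """Minimal k-means: alternate assignment and center-update steps."""
    centers = [list(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # assignment step: attach each point to its nearest center
        for i, p in enumerate(points):
            labels[i] = min(
                range(len(centers)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])),
            )
        # update step: move each center to the mean of its members
        for k in range(len(centers)):
            members = [p for i, p in enumerate(points) if labels[i] == k]
            if members:
                centers[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centers

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
labels, centers = kmeans(points, centers=[(0.0, 0.0), (5.0, 5.0)])
```

With no labels given, the algorithm still separates the two groups of nearby points, illustrating the unsupervised character of clustering.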
Machine learning has the following application fields.
In 2006, the online DVD rental company Netflix held the Netflix Prize, a competition seeking a program that would outperform the company's recommender system by more than 10% (predicting user preferences more accurately). The competition ran for several years; in 2009, "BellKor's Pragmatic Chaos", a team built around AT&T Labs researchers, won with its machine learning program and received the $1 million prize.
|recognition||Image recognition||Face recognition|
|Inspection / testing|
|voice recognition||Voice input|
|Automatic creation of minutes|
|Call center assistance or alternative|
|Sentence analysis / sentence recognition||Illegal sentence detection|
|Search for similar cases in the past|
|Anomaly detection||Failure detection|
|Suspicious behavior detection|
|Analysis (mainly forecasting)||Numerical forecasting||Demand forecasting, such as sales|
|Forecast of stock prices and economic indicators|
|Prediction of time required|
|Prediction of deterioration|
|Prediction of event occurrence||Forecast of purchases and cancellations|
|Prediction of compatibility|
|Coping||Behavioral optimization||Inventory optimization|
|Optimizing store openings|
|Q & A automation|
|Expression generation||Translation|
Academic journals and international conferences
- Machine Learning(Academic journal)
- Journal of Machine Learning Research(Academic journal)
- Neural Computation(Academic journal)
- International Conference on Machine Learning (ICML) (International Conference)
- Neural Information Processing Systems (NeurIPS formerly known as NIPS) (International Conference)
- ^ Machine learning and pattern recognition "can be viewed as two facets of the same field.": vii
- ^ Because the outputs are often provided by human experts labeling the training examples, they are also called labels.
- ^ Typically each data point of D is selected independently according to p(x, y), but the theorem can be proved regardless of the probability distribution from which D is selected.
- ^ "Machine Learning textbook". www.cs.cmu.edu. Retrieved 2020-05-28.
- ^ (2008) “The Annotation Game: On Turing (1950) on Computing, Machinery, and Intelligence”, in Epstein, Robert; Peters, Grace, The Turing Test Sourcebook: Philosophical and Methodological Issues in the Quest for the Thinking Computer, Kluwer, pp. 23–66, ISBN 9781402067082
- ^ # bishop2006
- ^ (1998). “Data Mining and Statistics: What's the connection?”. Computing Science and Statistics 29 (1): 3–9.
- ^ Samuel, Arthur (1959). "Some Studies in Machine Learning Using the Game of Checkers". IBM Journal of Research and Development 3 (3): 210–229. doi:10.1147/rd.33.0210.
- ^ Mitchell, T. (1997). Machine Learning. McGraw Hill. Pp. 2. ISBN 978-0-07-042807-2
- ^ a b #waterfall p.20.
- ^ a b c d e f #ESL p11-12
- ^ #GBC Verse 5.1.3
- ^ #Kanamori p.3.
- ^ #waterfall p.8.
- ^ a b #waterfall p.36.
- ^ #waterfall p.30.
- ^ "Lecture 12: Bias-Variance Tradeoff". CS4780/CS5780: Machine Learning for Intelligent Systems [FALL 2018]. Cornell University. Retrieved 2020-11-10.
- ^ #Kanamori p.13.
- ^ #Kanamori p.9.
- ^ a b #ESL p22-23
- ^ #GBC Verse 5.1.3
- ^ a b c d e f #ESL p559-561
- ^ (2006) Pattern Recognition and Machine Learning, Springer, ISBN 978-0-387-31073-2
- ^ Statistical Learning Theory, Takafumi Kanamori, Machine Learning Professional Series, Kodansha, 2015, ISBN 9784061529052
- ^ "Statistical Machine Learning Theory and Boltzmann Machine Learning" Muneki Yasuda. Yamagata University
- ^ Ueda. "Introduction to Statistical Machine Learning" NII. https://www.youtube.com/watch?v=wqb3k22toFY&t=478
- ^ Yoshua Bengio (2009). Learning Deep Architectures for AI. Now Publishers Inc .. p. 1–3. ISBN 978-1-60198-294-0
- ^ English: Pragmatic Chaos
- ^ "BelKor Home Page" research.att.com
- ^ a b c # Motohashi 2018 Near the beginning of Chapter 1.3 "Usage of Artificial Intelligence" and "Three Roles of Artificial Intelligence".
- ^ a b c d e # Motohashi 2018 Chapter 1.4 "Specific Examples of Recognition" Figure 1-4 "Specific Examples of Image Recognition"
- ^ a b c # Motohashi 2018 Chapter 1.4 "Specific Examples of Recognition" Figure 1-5 "Specific Examples of Voice Input"
- ^ a b c # Motohashi 2018 Chapter 1.4 "Specific Examples of Recognition" Figure 1-6 "Specific Examples of Sentence Analysis / Sentence Recognition"
- ^ a b c # Motohashi 2018 Chapter 1.4 “Specific Examples of Recognition” Figure 1-7 “Specific Examples of Anomaly Detection”
- ^ # Motohashi 2018 Chapter 1.5 "What is Analysis?"
- ^ a b c d e # Motohashi 2018 Chapter 1.5 "Specific Examples of Analysis" Figure 1-8 "Specific Examples of Numerical Prediction"
- ^ a b c d # Motohashi 2018 Chapter 1.5 "Specific Examples of Analysis" Figure 1-9 "Specific Examples of Prediction of Event Occurrence"
- ^ a b c d e # Motohashi 2018 Chapter 1.6 “Specific Examples of Coping” Figure 1-10 “Specific Examples of Behavior Optimization”
- ^ a b c # Motohashi 2018 Chapter 1.6 "Specific Examples of Countermeasures" Figure 1-12 "Specific Examples of Specific Work"
- ^ a b c # Motohashi 2018 Chapter 1.6 “Specific Examples of Countermeasures” Figure 1-13 “Specific Examples of Expression Generation”
- ^ English: DataRobot
- ^ DataRobot: https://www.datarobot.com
- Christopher M. Bishop (2006). Pattern Recognition And Machine Learning.Springer-Verlag. ISBN 978-0387310732 (Intermediate and advanced textbooks) →Support page(From here, Chapter 8 "Graphical Models" is available in pdf format)
- Motohashi, Yosuke (2018/2/15). This book that understands artificial intelligence system projects From planning / development to operation / maintenance (AI & TECHNOLOGY)Shoeisha. ASIN B078JMLVR2. ISBN 978-4798154053
- Ian Goodfellow, Yoshua Bengio, Aaron Courville Translation: Hiroo Kurotaki, Shin Kono, Masashi Misono, Jun Hozumi, Naoki Nonaka, Shoji Tomiyama, Takahiro Tsunoda, Supervision: Yusuke Iwasawa, Masahiro Suzuki, Kotaro Nakayama, Yutaka Matsuo / 2018/8). Deep learning (kindle version).Dwango. ASIN B07GQV1X76
- "Deep Learning: An MIT Press book". Retrieved 2020-10-30. Web version of the original book.
- Author: Trevor Hastie, Robert Tibshirani, Jerome Friedman, Translation: Masaru Sugiyama, Tsuyoshi Ide, Toshihiro Kamishima, Takio Kurita, Eisaku Maeda, Yoshihisa Ijiri, Toshiharu Iwata, Takafumi Kanamori, Atsushi Kanemura, Masayuki Karasuyama, Yoshinobu Kawahara, Shogo Kimura, Yoshinori Konishi, Tomoya Sakai, Daiji Suzuki, Ichiro Takeuchi, Toru Tamaki, Daisuke Deguchi, Ryota Tomioka, Hitoshi Habe, Shinichi Maeda, Daichi Mochihashi, Makoto Yamada (2014/6/25). Basics of Statistical Learning-Data Mining, Inference, PredictionKyoritsu Shuppan. ISBN 978-4320123625
- "The Elements of Statistical Learning: Data Mining, Inference, and Prediction". Stanford University. Retrieved 2020-11-10. English-language official website of the above book; free PDF available.
- Masato Taki (2017/10/21). This is an introduction to deep learning.KS Information Science Specialized Book Machine Learning Startup Series. Kodansha. ISBN 978-4061538283
- Takafumi Kanamori (2015/8/8). Statistical learning theory.KS Information Science Specialized Book Machine Learning Startup Series. Kodansha. ISBN 978-4061529052
- Yasuaki Ariga, Shinta Nakayama, Takashi Nishibayashi, "Machine Learning Beginning at Work", January 2018, 1.ISBN 978-4-87311-825-3.
- Thomas Mitchell "Machine Learning" McGraw-Hill (1997) ISBN-978 0071154673 (Introductory textbook) →Support page
- Trevor Hastie, Robert Tibshirani, and Jerome H. Friedman "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" Springer-Verlag (2001) ISBN-978 0387952840 (Including advanced content. Mainly mathematical and statistical methods) →Support page(From here, all chapters are available in pdf format)
- David MacKay, Information Theory, Inference, and Learning Algorithms (2003) (a textbook that comprehensively covers information theory and machine learning, with a focus on Bayesian inference) → Author page (full text available there in PDF format)
- Sergios Theodoridis, Konstantinos Koutroumbas (2009) "Pattern Recognition", 4th Edition, Academic Press, ISBN 978-1-59749-272-0.
- Ethem Alpaydın (2004). Introduction to Machine Learning (Adaptive Computation and Machine Learning), MIT Press, ISBN 0-262-01211-1
- Bing Liu (2007), Web Data Mining: Exploring Hyperlinks, Contents and Usage Data, Springer, ISBN 3-540-37881-2
- Toby Segaran (2007), Programming Collective Intelligence, O'Reilly, ISBN 0-596-52932-5
- Ray Solomonoff, "An Inductive Inference Machine"A privately circulated report from the 1956 Dartmouth Summer Research Conference on AI.
- Ray Solomonoff, An Inductive Inference Machine, IRE Convention Record, Section on Information Theory, Part 2, pp. 56–62, 1957.
- Ryszard S. Michalski, Jaime G. Carbonell, Tom M. Mitchell (1983), Machine Learning: An Artificial Intelligence Approach, Tioga Publishing Company, ISBN 0-935382-05-4.
- Ryszard S. Michalski, Jaime G. Carbonell, Tom M. Mitchell (1986), Machine Learning: An Artificial Intelligence Approach, Volume II, Morgan Kaufmann, ISBN 0-934613-00-1.
- Yves Kodratoff, Ryszard S. Michalski (1990), Machine Learning: An Artificial Intelligence Approach, Volume III, Morgan Kaufmann, ISBN 1-55860-119-8.
- Ryszard S. Michalski, George Tecuci (1994), Machine Learning: A Multistrategy Approach, Volume IV, Morgan Kaufmann, ISBN 1-55860-251-8.
- Bishop, C. M. (1995). Neural Networks for Pattern Recognition, Oxford University Press. ISBN 0-19-853864-2.
- Richard O. Duda, Peter E. Hart, David G. Stork (2001). Pattern Classification (2nd edition), Wiley, New York, ISBN 0-471-05669-3.
- Huang T.-M., Kecman V., Kopriva I. (2006), Kernel Based Algorithms for Mining Huge Data Sets, Supervised, Semi-supervised, and Unsupervised Learning, Springer-Verlag, Berlin, Heidelberg, 260 pp., 96 illus., hardcover, ISBN 3-540-31681-7.
- Kecman, Vojislav (2001), Learning and Soft Computing, Support Vector Machines, Neural Networks and Fuzzy Logic Models, The MIT Press, Cambridge, MA, 608 pp., 268 illus., ISBN 0-262-11255-8.
- Ian H. Witten and Eibe Frank (2011). Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 664 pp., ISBN 978-0123748560.
- Sholom Weiss and Casimir Kulikowski (1991). Computer Systems That Learn, Morgan Kaufmann. ISBN 1-55860-065-5.
- Mierswa, Ingo and Wurst, Michael and Klinkenberg, Ralf and Scholz, Martin and Euler, Timm: YALE: Rapid Prototyping for Complex Data Mining Tasks, in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-06), 2006.
- Vladimir Vapnik (1998). Statistical Learning Theory. Wiley-Interscience, ISBN 0-471-03003-1.
- Peter Flach; translated by Akimichi Takemura, Machine Learning: Algorithm Techniques for Reading Data, Asakura Shoten, ISBN 978-4254122183 (2017).
- Nils J. Nilsson, Introduction to Machine Learning.
- (September 2015), Basic Books, ISBN 978-0-465-06570-7
- Stuart Russell & Peter Norvig (2009). Artificial Intelligence – A Modern Approach. Pearson, ISBN 978-9332543515.
- Automated reasoning
- Computational intelligence
- Computational neuroscience
- Cognitive science
- Cognitive model
- Data Mining
- Pattern recognition
- Kernel method
- Enterprise search
- Curse of dimensionality
- Language acquisition
- Watson (computer)
- Artificial intelligence
- Big data
- Machine learning evaluation index
- Institute of Electronics, Information and Communication Engineers (IEICE) Technical Committee on Information-Based Induction Sciences and Machine Learning (IBISML)
- Toki no Mori Wiki: a wiki about machine learning and data mining
- International Machine Learning Society
- mloss: an academic database of open-source machine learning software
- Machine Learning Crash Course by Google: a free machine learning course using TensorFlow