# Chapter 1

Hello, my name is Kenneth Lee. I run a software startup named Deduction Theory, LLC in the United States. I work with my younger brother Chase Lee.

My wife is American, and I live in both America and Korea. I have three children.

I recently organized what I learned while studying deep learning. I asked myself, "How does deep learning work?" I did not major in machine learning, so I came at it from a different perspective. The approach I used was logic. Logic gives us a way to tell true from false: the concept of an argument. So what kind of sentence and syntactic structure does deep learning have, and what does it look like in terms of metaphysical logic? I prepared this lecture for these reasons.

At the beginning, we wanted to know how deep learning software works, so we read the papers that had already been published in the field. We downloaded open source code, ran it, changed the option values, and started experimenting. Other people studying deep learning probably started the same way.

But we did not understand the whole information processing behind it. So we took an open source deep learning perceptron implementation and ported it to another language: I ported the C++ source to Julia. In the process, we came to call the entity named the perceptron an "information structure". I tried to turn the whole process into one story: how the perceptron receives information and how it processes that information into an output.

# Chapter 2

The left side is a single perceptron. To the right is a multilayer perceptron. The difference between the two is that a multilayer perceptron stacks perceptrons in multiple layers. Deep learning is also built from multilayer perceptrons.

The perceptron deliberately distorts information as it comes in.

Look. There is a movie, and some people have labeled it. One label is "Fun Movie", which means the viewer found the movie fun. Input 2 says "Boring Movie". Some people gave it that label.

However, the perceptron begins to distort this information as it comes in. I call each information pathway a "Pathway Perspective".

This is a name I created to describe the whole story. It is not a recognized term in academia. I did not coin it to make myself look smart. I had to make up a name because no one else was explaining it this way. It is a word I made for the explanation, so please bear with me.

So "Pathway Perspective 1" relatively affirms the incoming information. Affirming means giving it a relatively high weight value. "Pathway Perspective 2" relatively negates it, which means the weight value is relatively low. What does this mean? It means the incoming information is being distorted. So distortion is the essence of the perceptron. That is how the perceptron's basic information processing is done.

So the perceptron certainly distorts information. This has puzzled researchers. Those who studied machine learning before deep learning had done a lot of work to reduce distortion, so they were puzzled when they saw a perceptron that distorts so much. That is why the theoretical background of the perceptron is called the "Black Box Hypothesis", meaning that it is an unknown area that has not been properly explained.

The black box hypothesis means that we do not know what is happening inside. But now I am showing you what goes on inside the black box model. It is the world's first solution that explains the black box model of deep learning.

In this picture, the perceptron has two input pathway perspectives. It can have many input pathways, but I deliberately gave this model only two because that makes it easy to explain.

So information comes into the perceptron object. The "Fun Movie" information comes in through "Pathway Perspective 1". Then what happens? Because the perceptron relatively affirms this opinion of the movie, it strengthens the "Fun Movie" opinion more and more. In other words, the "Fun Movie" opinion gets more recognition.

But look at "Pathway Perspective 2", where the "Boring Movie" opinion comes in. This information comes in through a negating pathway, which gives it relatively less weight. It therefore becomes relatively less trusted and less recognized. So the conclusion is that the output probability of "Fun Movie" gets higher.
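The two pathways described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the lecture: the encoding of opinions as +1 ("fun") and -1 ("boring"), the weight values 2.0 and 0.5, and the sigmoid activation are all assumptions chosen for the example.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def perceptron(inputs, weights, bias=0.0):
    """Weight each incoming opinion, add them up, and return a probability."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


# Assumed encoding: +1.0 is a "Fun Movie" label, -1.0 is a "Boring Movie" label.
fun_opinion = 1.0
boring_opinion = -1.0

# Pathway Perspective 1 affirms (high weight); Pathway Perspective 2 negates (low weight).
weights = [2.0, 0.5]

p_fun = perceptron([fun_opinion, boring_opinion], weights)
p_neutral = perceptron([fun_opinion, boring_opinion], [1.0, 1.0])

print(f"unequal weights: P(fun) = {p_fun:.3f}")
print(f"equal weights:   P(fun) = {p_neutral:.3f}")
```

With equal weights the two opinions cancel and the output sits at 0.5. The unequal weights are what tilt the result toward "Fun Movie".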

Here is another feature of deep learning. The incoming information was result information. "Result information" is a term I created in my research, Deduction Theory. It is what computer science calls data: the result of an information processing process. In deep learning, however, the input contains result data but the output does not. The output is a probability value, not data. So I named it "Probabilistic Result Information". Deep learning does not output "absolute data A"; it outputs "the probability of A" as some percentage. Logically, this is a hugely different feature.

# Chapter 3

In the past, in what we call statistics, big data, and machine learning, the researcher gave the program the way to solve the problem. In a nutshell, the researcher is an absolute authority to the program. The researcher tells the program, "Do the work this way." An information structure called a program takes absolute rules from the researcher and produces the output. The information structure here is a computer program.

This was the traditional way, and people have used it for a long time. However, this method has a disadvantage: its ability to solve real-world problems has not exceeded a certain level. Technically, it has the limits of a deterministic worldview. Determinism means that the shape and principles of this world have a definite form, and the scientist has to pin down that exact form as a concrete function. Determinism has solved many problems throughout history, but there were also many things it could not solve. Most modern scientists admit this.

In the deep learning on the right side of the picture, however, the researcher does not give absolute rules to the program. The researcher labels and guides. This is significantly different from past methods: deep learning runs a process in which the information structure, the program itself, looks for the rules. Now let me explain how that works.

This is the well-known diagram and formula of the perceptron.

I will explain how this mathematical expression develops into the story I have been telling.
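The figure with the formula is not reproduced in this text. As a sketch, assuming the lecture refers to the standard textbook form of the perceptron, the expression can be written as follows. The symbols $x_i$ (inputs), $w_i$ (weights, which play the role of the pathway perspectives in this lecture's vocabulary), $b$ (bias), and $f$ (activation function) are conventional names and are not taken from the original figure.

$$
y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)
$$

With a sigmoid activation, $f(z) = \dfrac{1}{1 + e^{-z}}$, the output $y$ lies between 0 and 1 and can be read as a probability.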

From now on, I will explain the multilayer perceptron. The input is the same as before. There is one movie, and there are "Fun Movie" and "Boring Movie" opinions about it. These opinions are labeling information. In fact, on services such as Netflix, people rate movies; the opinions and preferences they express are results of the labeling I am talking about here. Now these labeling results begin to get distorted by the weights given to the pathway perspectives, depending on the path through which the information comes in. That is the very basic way in which a perceptron processes information.

So let us ask a question. If the perceptron distorts information, how can we get a reliable output when it repeats the distortion over multiple layers? Shouldn't the chaos, the "information entropy", increase? This was the question asked by people who studied the perceptron before. This is what I am going to explain in this lecture: how the problem is solved this way, and how the interior of the black box model works.

The input comes in. When it enters the perceptrons, there is a relatively positive pathway and a relatively negative pathway. In a multilayer structure, the output of each perceptron is an intermediate result: a relatively fun movie, or a relatively boring movie. Then the second-stage output perceptron judges these results again, positively or negatively. Now "Boring Movie" is negated and "Fun Movie" is affirmed further, which eventually gives "Fun Movie" a high chance. It has gathered a lot of weight. So the final probability comes out.

We can see that a probability finally came out. However, people remained doubtful about the accuracy of this information processing. This problem remained for decades after Rosenblatt made the first perceptron. He observed the human synapse and implemented it as a simulation, but even he did not know how it would develop in the future. Then, when another extraordinary method was added to this ambiguous, distorted, superficial program, it suddenly began to get more accurate.
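The two-stage judgment described above can be sketched as a forward pass through a small network. This is an illustration under assumptions, not the lecture's code: the +1/-1 opinion encoding, the 2-2-1 layout, every weight and bias value, and the helper name `layer` are made up for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weight_rows, biases):
    """Run one layer: each row of weights belongs to one perceptron."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weight_rows, biases)
    ]


# Assumed encoding: +1.0 is a "Fun Movie" label, -1.0 is a "Boring Movie" label.
inputs = [1.0, -1.0]

# First stage: two perceptrons, each with an affirming and a negating pathway.
hidden = layer(inputs, [[2.0, 0.5], [1.5, 0.2]], [0.0, 0.0])

# Second stage: the output perceptron judges the intermediate results again.
output = layer(hidden, [[2.0, 2.0]], [-2.0])
p_fun = output[0]

print("intermediate results:", [round(h, 3) for h in hidden])
print("final P(fun):", round(p_fun, 3))
```

Each intermediate result is itself a probability-like value between 0 and 1, and the output perceptron weighs those values the same way the first stage weighed the raw opinions.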

# Chapter 4

The researcher made data from reality and entered it into the perceptron program. The program distorted the data, stripped away its form through vector calculations, and replicated the "relationships" in the data. I call this "Similar Replication of Information". An exact duplication would give you the same thing back, but the perceptron creates "similar information" because the data is distorted during processing. A similar duplicate is made, so I say "Similar Replication".

The perceptron program finds similarities and differences in this replicated information. This is also a concept I created. The perceptron works by finding and inferring similarities and differences in information. I call this "inferring information coupling and asymmetry". The information processing characteristic of the perceptron is that it makes its next judgment from the information coupling and asymmetry it has inferred. So the final outcome is a stochastic result.

Let's compare this problem-solving method with Plato's theory of Ideas. Plato used the analogy that we are tied up in a cave, looking at shadows. This means we do not know reality. The truth is veiled, because we can see only shadows. The shadows I mean are the result information. We do not know the cause of what happened, but we can see the data that results from reality.

This was the perspective of traditional statistics, big data, and machine learning: they accepted result information, the shadows and the data, as absolute causes and used it to solve problems. But because the shadow was mistaken for reality, errors occurred. If we admit that "shadows are just shadows", we can gain special insights. That is the Deduction Theory I propose.

This is the structure of a digital camera. We collect light information from reality, process it, and turn it into data called a digital photo file. Looking at this, I can tell that we make data through an information processing process, and that the data is not reality itself. However, from a deterministic point of view, traditional academia has tried to make this data the basis of absolute judgment. That is why errors happened.

Likewise, humans experience reality and use their sensory organs and brains to turn it into result information. The results of information processing in our brains are cognition, memory, and imagination. These are also result information. Cognition, memory, and imagination are all made by a process. They are not absolute reality.

The difference is that our brains take cognition and memory to be things that happened in reality, while imagination is considered not real. This is the same activity as labeling. Depending on how we label, we treat some things as memories and others as imagination. That is why people are easily deceived.

Humans are fooled by magic tricks because of the distortion of their senses, and they are persuaded by advertising and agitation because of the distortion of memory and imagination. Stereotypes and cognitive biases are the words for these distortions. Yet at the same time, humans make creative thoughts and judgments. That is a special human ability. Humans are not perfect. They do not fully remember what they experience; distortions and alterations always occur. And yet humans make judgments with insight. Now let me explain how this judgment and insight are made in deep learning.

# Chapter 5

I explained that the perceptron gives different weights to the pathway perspectives, distorts information through relative affirmation and negation, and ultimately produces probabilistic result information. But how can it make an accurate judgment through this whole process? From now on, I will explain this magical part. Deep learning has a procedure called "backpropagation".

The researcher makes this suggestion: "I think this movie is boring, and I want all of my perceptrons to support my opinion." In detail, this instructs every perceptron to adjust its weights, changing its pathway perspectives, so that "Boring Movie probability 60%" comes out.

Before that, "Fun Movie probability 80%" came out. But the researcher said, "No, I do not think it is fun. I think it is boring, so you change." This is how backpropagation works. Can you see the changes marked in orange in the picture? All of these were made under the direction of the researcher. How did they change? Pathway Perspective 1 was positive at first, but its weight value was adjusted from "very positive" to "slightly positive". Why was it adjusted? Because the researcher instructed the network to change the output to the "Boring Movie" opinion. So it found the places that kept the output from becoming "Boring Movie" and changed everything that influenced them. This is the information processing that happens in backpropagation.
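The adjustment described above can be sketched as repeated gradient steps on a small network. This is a minimal sketch under assumptions, not the lecture's implementation: the 2-2-1 layout without biases, the starting weights, the learning rate of 0.5, the cross-entropy loss, and the names `forward` and `train_step` are all chosen for illustration. The target of 0.4 for "fun" stands in for the researcher's "Boring Movie probability 60%".

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Return the hidden outputs and the final P(fun)."""
    hidden = [sigmoid(sum(xi * wi for xi, wi in zip(x, row))) for row in w_hidden]
    p_fun = sigmoid(sum(h * w for h, w in zip(hidden, w_out)))
    return hidden, p_fun


def train_step(x, target, w_hidden, w_out, lr=0.5):
    """One backpropagation step: push the error from the output back to every weight."""
    hidden, p_fun = forward(x, w_hidden, w_out)
    # With a sigmoid output and cross-entropy loss, the output error is p - target.
    delta_out = p_fun - target
    # Send the error backwards through the (old) output weights to each hidden perceptron.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    new_w_out = [w - lr * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [
        [w - lr * d * xi for w, xi in zip(row, x)]
        for row, d in zip(w_hidden, delta_hidden)
    ]
    return new_w_hidden, new_w_out


x = [1.0, -1.0]   # assumed encoding: +1.0 "fun" opinion, -1.0 "boring" opinion
target = 0.4      # researcher's wish: 60% boring, so 40% fun
w_hidden = [[2.0, 0.5], [1.5, 0.2]]
w_out = [1.0, 1.0]

_, before = forward(x, w_hidden, w_out)
for _ in range(200):
    w_hidden, w_out = train_step(x, target, w_hidden, w_out)
_, after = forward(x, w_hidden, w_out)

print(f"P(fun) before: {before:.3f}")
print(f"P(fun) after:  {after:.3f}")
```

The network starts out saying "fun" with a probability above 80%. Nothing tells it which weight to change; the error at the output is passed backwards, and every weight that contributed to the unwanted answer is nudged until the output agrees with the researcher.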

It is called backpropagation because it goes backwards from the output in this way. So what changes after this? We get a perceptron program that mimics the researcher's way of thinking.

What this means is that the perceptron program did not know exactly what was happening in the researcher's brain. I showed you Plato's cave before. Here, part of the information processing was hidden. However, when the perceptron performs backpropagation, it restores a similar version of this obscured part.

The perceptron is based on distortion, as I have said so far. Distortion is the basis of its judgment. Backpropagation, however, means that the deep learning program distorts its information processing in the way the researcher wants. By pointing to the output the researcher wants and modifying the pathway perspective weights with backpropagation, the perceptron mimics the researcher's way of thinking, the information processing that we could not see. This is an approach that did not exist before deep learning. And this is the first time I have described it as a logical process.

# Chapter 6

This time, I'll explain XOR processing, which is often used to demonstrate the perceptron's ability to solve problems.

It's the first stage of the XOR formula. Input 1 is boring, and Input 2 is also boring.

"Perceptron 1" in the "hidden layer" has pathway perspectives that slightly negate Input 1 and Input 2. So it expresses "I don't know."

Perceptron 2 of the hidden layer has pathway perspectives that slightly affirm Input 1 and Input 2. So it expresses "It's boring."

The perceptron in the "Output Layer" has a strongly reversing pathway perspective for the input from Perceptron 1 and a slightly positive pathway perspective for the input from Perceptron 2. Its reasoning goes: "Pathway 1 said it doesn't know, and pathway 2 said it is boring. So I finally express that it is boring." The logic diagram shows that mathematically the same information processing took place.

It's the second stage of the XOR formula. Input 1 is boring, Input 2 is fun. Since the neural network has already been trained, the weights of each perceptron remain the same.

Perceptron 1 of the hidden layer expresses a little bit of boring. Perceptron 2 outputs a bit of fun. The output layer says, "Pathway 1 says it is boring, but I do not believe it. I will take it as a little bit fun instead. And pathway 2 said it is a little bit fun." So it finally expresses that it is fun. Check out the logic diagram.

It's the third stage of the XOR formula. Input 1 is fun, Input 2 is boring.

Perceptron 1 of the hidden layer expressed a little bit of boring. Perceptron 2 expressed a bit of fun. The perceptron in the output layer expressed fun. Look at the logic diagram.

It's the fourth stage of the XOR formula. Input 1 says it is fun, and Input 2 also says it is fun.

Perceptron 1 of the hidden layer expressed a bit of fun. Perceptron 2 also expressed fun.

The output layer says, "I cannot believe pathway perspective 1, which said it is fun, so I will reverse it strongly. And even though pathway perspective 2 said it is fun, it cannot beat the reversed first opinion." So it finally expresses that it is boring. Let's also check the logic diagram.

This XOR logic decision follows the same pattern as the mathematics. I have expressed it in language that people can understand.
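The four stages can be checked with a tiny hand-wired network. The weights below are not the ones from the lecture's figure; they are one common hand-built XOR solution, assumed here for illustration, in which hidden perceptron 1 fires only when both inputs say "fun" and hidden perceptron 2 fires when at least one does. The encoding (0 for "boring", 1 for "fun") and the hard threshold activation are also assumptions.

```python
def step(z):
    """Hard threshold: 1 ("fun") if the weighted sum is positive, else 0 ("boring")."""
    return 1 if z > 0 else 0


def xor_network(x1, x2):
    # Hidden perceptron 1: only convinced when both inputs say "fun".
    h1 = step(1.0 * x1 + 1.0 * x2 - 1.5)
    # Hidden perceptron 2: convinced when at least one input says "fun".
    h2 = step(1.0 * x1 + 1.0 * x2 - 0.5)
    # Output perceptron: strongly reverses pathway 1, mildly affirms pathway 2.
    return step(-2.0 * h1 + 1.0 * h2 - 0.5)


labels = {0: "boring", 1: "fun"}
cases = [(0, 0), (0, 1), (1, 0), (1, 1)]
results = [xor_network(a, b) for a, b in cases]

for (a, b), out in zip(cases, results):
    print(f"input 1 = {labels[a]}, input 2 = {labels[b]} -> {labels[out]}")
```

The strong negative weight on pathway 1 is what lets the output flip back to "boring" in the fourth stage, even though pathway 2 still says "fun".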

# Chapter 7

When we explain the information processing in the perceptron logically, we find a process that looks different from what we see when we think about it only in mathematical notation. This is how my Deduction Theory works. The figure above demonstrates how my theory deduces information structure and relativity information.

Take a sentence and try emptying out the parts that correspond to result information. Once you have emptied out the result information, you can infer the relativity information and the information structure. You can then use this information structure to create other information.

Do you know the "Five Ws and One H"? It is a way to classify information into who, when, where, what, how, and why. The result information is simply the who, when, where, and what. The relativity information is the how and why.

Next I checked whether this inference method applies equally to the deep learning perceptron. I removed the inputs and outputs, which contain the result information, from the perceptrons. Then I summarized how the processing that corresponds to the relativity information takes place. I confirmed that this method can actually explain the information processing of the perceptron. That is why I could give this lecture.

Now I will wrap up. My Deduction Theory sees the deep learning perceptron as an "information structure that can produce information". The way to deduce an information structure in Deduction Theory is to let go of the result information and find the relativity information that produces other information. I was able to solve the black box model of deep learning. With my research, I have proved the black box theory of deep learning in terms of logic.

This is the end of the video. For more information, please visit my blog and read it. Thank you.
