# Pastebin bEq3kpwu

10:37 PM I am pretty sure the implementation of the transposed convolution layer has multiple interrelated mistakes
10:37 PM Also, I think one thing can be changed in the convolutional layer as well
10:37 PM agreed, I am disappointed by the spam...
10:38 PM rcurtin: Is it happening to all freenode channels?
10:38 PM most that I am in, yes
10:38 PM I wonder if it is happening in #gsoc, I'll join there
10:38 PM Atharva: If you can provide a simple test case with the expected output, I'll take a look, or perhaps you already fixed it?
10:39 PM zoq: I can fix it, I know how. I just thought I would discuss it with you first.
10:39 PM Atharva: What's your idea with the conv layer?
10:39 PM Atharva: Oh, nice, yeah sure.
10:39 PM I think I will start with the transposed convolution, because with conv it's just a performance improvement
10:40 PM So, the output size of the transposed conv is being calculated incorrectly
10:40 PM according to this paper https://arxiv.org/pdf/1603.07285.pdf
10:40 PM Atharva: Okay, I think any improvement would have a huge effect on the overall model performance.
10:40 PM Yes
10:41 PM Sorry for being slow, I am a little confused about what to start with, there are a lot of things
10:41 PM I will try my best to explain it all
10:42 PM I think Shikhar opened an issue regarding the output formula, let's see if I can find it.
10:43 PM zoq: That's great, but the problem is not just with the formula
10:43 PM https://github.com/vdumoulin/conv_arithmetic/issues/18, no response so far
10:44 PM The forward function of the transposed conv is flipping the filter; that's not needed in this case
10:44 PM It performs a full convolution irrespective of stride and padding, which is incorrect
10:46 PM This brings us to the stride: it's true that the effective final stride in a transposed conv layer is always 1, but the operation does depend on the stride of the equivalent conv layer
10:46 PM see, for example, sections 4.5 and 4.6 of the above paper
10:49 PM I see the issue with the full convolution and the stride parameter, but I'm not sure about not flipping the kernel.
10:49 PM I think for the transposed conv layer, the stride parameter should take s instead of s'. s' is always 1, but we need s (the stride of the corresponding conv operation of the given transposed conv layer)
10:49 PM We can flip the kernel in the backward function instead of the forward
10:50 PM it seems more apt
10:50 PM It won't matter mathematically, I think
10:50 PM yeah, that's the same
10:52 PM Also, we are never inserting zeros between input units when the stride is not 1, sections 4.5 and 4.6 of that paper
10:52 PM What we do instead is always perform a full convolution, which is incorrect and extremely inefficient.
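For reference, a minimal sketch of the output-size arithmetic under discussion, based on the relations in the paper linked above (the function names and the `a` parameter are illustrative, not mlpack's API): a conv layer maps i to o = floor((i + 2p - k) / s) + 1, and the transposed conv that inverts it maps i' back to o' = s(i' - 1) + a + k - 2p, where a = (i + 2p - k) mod s covers the case where the forward stride does not divide evenly.

```cpp
#include <iostream>

// Illustrative helpers, not mlpack's API.

// Standard convolution output size: o = floor((i + 2p - k) / s) + 1.
int ConvOutSize(const int i, const int k, const int s, const int p)
{
  return (i + 2 * p - k) / s + 1;
}

// Transposed convolution output size: o' = s * (i' - 1) + a + k - 2p,
// where a = (i + 2p - k) mod s of the forward conv it inverts.
int TransposedConvOutSize(const int i, const int k, const int s,
                          const int p, const int a)
{
  return s * (i - 1) + a + k - 2 * p;
}

int main()
{
  // The 64x64 <-> 32x32 case with k = 5, s = 2, p = 2 that comes up below.
  std::cout << ConvOutSize(64, 5, 2, 2) << '\n';              // 32

  // The decoder should invert it with the same k, s, p
  // (here a = (64 + 2 * 2 - 5) % 2 = 1), rather than with a
  // 33x33 kernel and a full convolution.
  std::cout << TransposedConvOutSize(32, 5, 2, 2, 1) << '\n'; // 64
}
```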
10:53 PM For example, in my encoder network, I have a conv layer that goes from 64x64 to 32x32 with s = 2, p = 2, k = 5
10:54 PM I think inserting zeros is a result of the incorrect dimension, which I think isn't affecting the output
10:54 PM But for the transposed equivalent in the decoder, I am forced to use a kernel size of 33 to go from 32x32 to 64x64 (a full convolution)
10:54 PM I see
10:55 PM As mentioned in the paper, we are always doing 4.3, which they say is extremely inefficient
10:56 PM The transposed conv layer took 30 seconds on my laptop while the conv layer took about 0.5
10:56 PM We need to correct the output size formula in the transposed conv
10:56 PM yeah, especially if you use such a huge kernel size
10:56 PM agreed
10:57 PM Yes, sadly that's the only option right now
10:57 PM we also need to take the stride of the equivalent conv layer instead of the actual stride of the transposed conv (which is always 1)
10:57 PM right
10:58 PM We need to insert zeros between the input units when the stride is > 1
10:58 PM Similarly, the backward function is also wrong
10:58 PM It always performs a valid convolution with no padding on the error matrix, even when padding is needed
10:58 PM yeah, same issues
10:59 PM I think it will be better to use a valid convolution for both forward and backward in the transposed conv and take care of the padding and the zeros in between manually
11:00 PM that would also speed things up
11:00 PM Yes
11:01 PM Also, a minor thing, we can use the correct typenames `ForwardConvolutionRule` and `BackwardConvolutionRule` for the corresponding functions :)
11:02 PM :)
11:02 PM about the conv layer now
11:02 PM any corrections are much appreciated
11:03 PM The backward function of the conv layer performs a full convolution irrespective of the output (input) size it needs to produce, which leads to many unnecessary operations.
11:04 PM for example, let's say a conv layer gets a 5x5 input with a 4x4 kernel, padding 2, s = 1
11:05 PM it goes to 6x6
11:06 PM in the backward function, what happens is it pads the 6x6 input error to 12x12 first, then takes it to 9x9 with the (inverted) 4x4 kernel, and then only uses the centre 5x5 of it
11:06 PM we should instead just pad the 6x6 to 8x8 and directly take it to 5x5
11:10 PM interested to see the performance improvement, I think this would mostly affect bigger kernels
11:10 PM Yes, should I change the conv part or just the transposed conv?
11:11 PM I am not sure how much better the performance will be in the conv layer after that change, it was just something I noticed
11:12 PM it should be faster, so if you'd like to take a look into it, I'm happy to merge this in
11:13 PM zoq: Great! I should open a PR tomorrow if I don't run into any issues
11:13 PM wow, that's fast
11:13 PM thanks!
11:14 PM zoq: Happy to help, I had to clear up a lot of concepts to solve this, it was fun!
11:14 PM I think we should discuss some implementation details for the transposed conv
11:14 PM 1) we take the stride of the equivalent conv operation
11:14 PM what I really like is that this will affect all sorts of models, including the RL code
11:14 PM 2) change the output formula
11:15 PM zoq: That's great! Even the transposed conv?
11:16 PM Right now, only the GAN and VAE code will use the transposed conv operation, but who knows.
11:16 PM Yeah, right
11:16 PM so I will continue
11:17 PM 3) we need to add zeros between the input units if s > 1; we will do this in the forward function and not in the naive conv class, is that okay?
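A minimal sketch of the zero insertion described in 3), assuming Armadillo matrices for the input slices (illustrative only, not the mlpack code): for stride s, the input is dilated so that the original values sit s apart with zeros in between, following sections 4.5/4.6 of the paper.

```cpp
#include <armadillo>

// Illustrative only: dilate the input by inserting (s - 1) zeros
// between neighbouring units. For stride s, an r x c input becomes
// (s*(r-1)+1) x (s*(c-1)+1), with the original values spread s apart.
arma::mat InsertZeros(const arma::mat& input, const size_t s)
{
  arma::mat dilated(s * (input.n_rows - 1) + 1,
                    s * (input.n_cols - 1) + 1,
                    arma::fill::zeros);

  for (size_t j = 0; j < input.n_cols; ++j)
    for (size_t i = 0; i < input.n_rows; ++i)
      dilated(i * s, j * s) = input(i, j);

  return dilated;
}
```

The forward pass could then pad this dilated input (with k - p - 1 zeros on each side, plus the extra rows/columns when the sizes do not divide evenly, as described in the paper) and run a plain unit-stride valid convolution, avoiding the huge-kernel full convolution entirely.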
11:17 PM so, we will just perform a valid conv operation after taking care of the zeros
11:18 PM about 3) fine with me, that way we don't have to touch the naive conv code
11:18 PM Yes
11:19 PM 4) same thing with the backward, we don't touch the naive conv code
11:19 PM right
11:19 PM We will need to take care of the case when s > 1, because we only have to take alternate points from the output
11:21 PM hm, in this case it would be nice to modify the conv rules, don't you think?
11:21 PM Oh, so that we can take care of it with pointers in the valid conv function?
11:21 PM Yeah, I think it will be more efficient
11:23 PM in this case we would have to modify all conv rules, but we could start with the naive rule
11:23 PM oh, okay. I don't think I know how the other conv rules work, can you suggest something for me to read on it?
11:23 PM but I agree it should be faster if we do it inside the rule class
11:24 PM the other one is based on FFT
11:25 PM don't think there is an easy way to skip the input, as we could do for the naive rule
11:26 PM let's focus on the naive rule, I think it will outperform the FFT rule afterwards (for small kernels)
11:26 PM Oh, is it okay if we do it after GSoC?
11:27 PM so, we could remove the code
11:27 PM Yeah, I think
11:27 PM of course
11:28 PM 5) for the conv layer backward function, for that change, we would need to manually add padding and use a valid conv instead of a full conv
11:29 PM sounds reasonable
11:30 PM Btw. everyone should think about the final report and put some work into it.
11:31 PM Yes, will surely do
11:32 PM I was thinking I will create a repo on my account and explain what I did over the summer, with links to the PRs and some results to show
11:35 PM Yeah, if you like you can also write the report in the form of a blog post, something like:
11:35 PM - http://www.mlpack.org/gsocblog/implementation-of-tree-types-summary.html
11:35 PM - http://www.mlpack.org/gsocblog/deep-reinforcement-learning-methods-summary.html
11:35 PM - http://www.mlpack.org/gsocblog/summary-of-lsh-changes-for-gsoc-2016.html
11:36 PM but that's up to you. For me, the final report is somewhat of a living document, which can be updated even after GSoC has ended; if we change/merge something afterwards, I think the report should reflect that.
11:41 PM Also, I think it's important that the report is visible, so if anyone is interested in what you did over the summer, there is an easy way to find out (the GSoC page will link to the final report).
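Going back to 5), the conv layer backward change discussed above: a minimal sketch of the padding arithmetic, using the 5x5 example from earlier (just the numbers, not mlpack code). For the gradient to come out at the input size i directly from a unit-stride valid convolution of the o x o error with the k x k rotated kernel, the padding p' has to satisfy (o + 2p' - k) + 1 = i, i.e. p' = (i - o + k - 1) / 2, assuming symmetric padding.

```cpp
#include <iostream>

// Padding needed so that a unit-stride valid convolution of the
// o x o output error with the k x k (rotated) kernel directly yields
// the i x i input gradient: (o + 2*pad - k) + 1 = i.
int BackwardPad(const int i, const int o, const int k)
{
  return (i - o + k - 1) / 2;
}

int main()
{
  // The example from the discussion: i = 5, o = 6, k = 4.
  const int pad = BackwardPad(5, 6, 4);        // 1, i.e. pad 6x6 -> 8x8
  std::cout << "pad: " << pad << '\n';
  std::cout << (6 + 2 * pad - 4) + 1 << '\n';  // 5, i.e. 5x5 directly,
                                               // no 12x12 -> 9x9 -> crop
}
```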