# Pastebin bEq3kpwu

10:37 PM I am pretty sure the implementation of the transposed convolution layer has multiple interrelated mistakes
10:37 PM Also, I think one thing can be changed in the convolutional layer as well
10:37 PM agreed, I am disappointed by the spam...
10:38 PM rcurtin: Is it happening to all freenode channels?
10:38 PM most that I am in, yes
10:38 PM I wonder if it is happening in #gsoc, I'll join there
10:38 PM Atharva: If you can provide a simple test case with the expected output, I'll take a look, or perhaps you already fixed it?
10:39 PM zoq: I can fix it, I know how. I just thought I would discuss it with you first.
10:39 PM Atharva: What's your idea with the conv layer?
10:39 PM Atharva: Oh, nice, yeah sure.
10:39 PM I think I will start with the transposed convolution, because with conv it's just a performance improvement
10:40 PM So, the output size of the transposed conv is being calculated incorrectly
10:40 PM according to this paper https://arxiv.org/pdf/1603.07285.pdf
10:40 PM Atharva: Okay, I think any improvement would have a huge effect on the overall model performance.
10:40 PM Yes
10:41 PM Sorry for being slow, I am a little confused about what to start with, there are a lot of things
10:41 PM I will try my best to explain it all
10:42 PM I think Shikhar opened an issue regarding the output formula, let's see if I can find it.
10:43 PM zoq: That's great, but the problem is not just with the formula
10:43 PM https://github.com/vdumoulin/conv_arithmetic/issues/18, no response so far
10:44 PM The forward function of the transposed conv is flipping the filter; that's not needed in this case
10:44 PM It performs a full convolution irrespective of stride and padding, which is incorrect
10:46 PM This brings us to the stride: it's true that the effective final stride in a transposed conv layer is always 1, but the operation does depend on the stride of the equivalent conv layer
10:46 PM see, for example, sections 4.5 and 4.6 of the above paper
10:49 PM I see the issue with the full convolution and the stride parameter, but I'm not sure about not flipping the kernel.
10:49 PM I think for the transposed conv layer, the stride parameter should take s instead of s'. s' is always 1, but we need s (the stride of the corresponding conv operation of the given transposed conv layer)
10:49 PM We can flip the kernel in the backward function instead of the forward
10:50 PM it seems more apt
10:50 PM It won't matter mathematically, I think
10:50 PM yeah, that's the same
10:52 PM Also, we are never inserting zeros between input units when the stride is not 1, sections 4.5 and 4.6 of that paper
10:52 PM What we do instead is always perform a full convolution, which is incorrect and extremely inefficient.
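For reference, a minimal sketch of the output-size arithmetic under discussion, based on the relations in the paper linked above (the function names and the `a` parameter are illustrative, not mlpack's API): a conv layer maps i to o = floor((i + 2p - k) / s) + 1, and the transposed conv that inverts it maps i' back to o' = s(i' - 1) + a + k - 2p, where a = (i + 2p - k) mod s covers the case where the forward stride does not divide evenly.

```cpp
#include <iostream>

// Illustrative helpers, not mlpack's API.

// Standard convolution output size: o = floor((i + 2p - k) / s) + 1.
int ConvOutSize(const int i, const int k, const int s, const int p)
{
  return (i + 2 * p - k) / s + 1;
}

// Transposed convolution output size: o' = s * (i' - 1) + a + k - 2p,
// where a = (i + 2p - k) mod s of the forward conv it inverts.
int TransposedConvOutSize(const int i, const int k, const int s,
                          const int p, const int a)
{
  return s * (i - 1) + a + k - 2 * p;
}

int main()
{
  // The 64x64 <-> 32x32 case with k = 5, s = 2, p = 2 that comes up below.
  std::cout << ConvOutSize(64, 5, 2, 2) << '\n';              // 32

  // The decoder should invert it with the same k, s, p
  // (here a = (64 + 2 * 2 - 5) % 2 = 1), rather than with a
  // 33x33 kernel and a full convolution.
  std::cout << TransposedConvOutSize(32, 5, 2, 2, 1) << '\n'; // 64
}
```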
10:53 PM For example, in my encoder network, I have a conv layer that goes from 64x64 to 32x32 with s = 2, p = 2, k = 5
10:54 PM I think inserting zeros is a result of the incorrect dimension, which I think isn't affecting the output
10:54 PM But for the transposed equivalent in the decoder, I am forced to use a kernel size of 33 to go from 32x32 to 64x64 (a full convolution)
10:54 PM I see
10:55 PM As mentioned in the paper, we are always doing 4.3, which they say is extremely inefficient
10:56 PM The transposed conv layer took 30 seconds on my laptop while the conv layer took about 0.5
10:56 PM We need to correct the output size formula in the transposed conv
10:56 PM yeah, especially if you use such a huge kernel size
10:56 PM agreed
10:57 PM Yes, sadly that's the only option right now
10:57 PM we also need to take the stride of the equivalent conv layer instead of the actual stride of the transposed conv (which is always 1)
10:57 PM right
10:58 PM We need to insert zeros between the input units when the stride is > 1
10:58 PM Similarly, the backward function is also wrong
10:58 PM It always performs a valid convolution with no padding on the error matrix, even when padding is needed
10:58 PM yeah, same issues
10:59 PM I think it will be better to use a valid convolution for both forward and backward in the transposed conv and take care of the padding and the zeros in between manually
11:00 PM that would also speed things up
11:00 PM Yes
11:01 PM Also, a minor thing, we can use the correct typenames `ForwardConvolutionRule` and `BackwardConvolutionRule` for the corresponding functions :)
11:02 PM :)
11:02 PM about the conv layer now
11:02 PM any corrections are much appreciated
11:03 PM The backward function of the conv layer performs a full convolution irrespective of the output (input) size it needs to produce, which leads to many unnecessary operations.
11:04 PM for example, let's say a conv layer gets a 5x5 input with a 4x4 kernel, padding 2, s = 1
11:05 PM it goes to 6x6
11:06 PM in the backward function, what happens is it pads the 6x6 input error to 12x12 first, then takes it to 9x9 with the (inverted) 4x4 kernel, and then only uses the centre 5x5 of it
11:06 PM we should instead just pad the 6x6 to 8x8 and directly take it to 5x5
11:10 PM interested to see the performance improvement, I think this would mostly affect bigger kernels
11:10 PM Yes, should I change the conv part or just the transposed conv?
11:11 PM I am not sure how much better the performance will be in the conv layer after that change, it was just something I noticed
11:12 PM it should be faster, so if you'd like to take a look into it, I'm happy to merge this in
11:13 PM zoq: Great! I should open a PR tomorrow if I don't run into any issues
11:13 PM wow, that's fast
11:13 PM thanks!
11:14 PM zoq: Happy to help, I had to clear up a lot of concepts to solve this, it was fun!
11:14 PM I think we should discuss some implementation details for the transposed conv
11:14 PM 1) we take the stride of the equivalent conv operation
11:14 PM what I really like is that this will affect all sorts of models, including the RL code
11:14 PM 2) change the output formula
11:15 PM zoq: That's great! Even the transposed conv?
11:16 PM Right now, only the GAN and VAE code will use the transposed conv operation, but who knows.
11:16 PM Yeah, right
11:16 PM so I will continue
11:17 PM 3) we need to add zeros between the input units if s > 1; we will do this in the forward function and not in the naive conv class, is that okay?
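A minimal sketch of the zero insertion described in 3), assuming Armadillo matrices for the input slices (illustrative only, not the mlpack code): for stride s, the input is dilated so that the original values sit s apart with zeros in between, following sections 4.5/4.6 of the paper.

```cpp
#include <armadillo>

// Illustrative only: dilate the input by inserting (s - 1) zeros
// between neighbouring units. For stride s, an r x c input becomes
// (s*(r-1)+1) x (s*(c-1)+1), with the original values spread s apart.
arma::mat InsertZeros(const arma::mat& input, const size_t s)
{
  arma::mat dilated(s * (input.n_rows - 1) + 1,
                    s * (input.n_cols - 1) + 1,
                    arma::fill::zeros);

  for (size_t j = 0; j < input.n_cols; ++j)
    for (size_t i = 0; i < input.n_rows; ++i)
      dilated(i * s, j * s) = input(i, j);

  return dilated;
}
```

The forward pass could then pad this dilated input (with k - p - 1 zeros on each side, plus the extra rows/columns when the sizes do not divide evenly, as described in the paper) and run a plain unit-stride valid convolution, avoiding the huge-kernel full convolution entirely.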
11:17 PM so, we will just perform a valid conv operation after taking care of the zeros
11:18 PM about 3) fine with me, that way we don't have to touch the naive conv code
11:18 PM Yes
11:19 PM 4) same thing with the backward, we don't touch the naive conv code
11:19 PM right
11:19 PM We will need to take care of the case when s > 1, because we only have to take alternate points from the output
11:21 PM hm, in this case it would be nice to modify the conv rules, don't you think?
11:21 PM Oh, so that we can take care of it with pointers in the valid conv function?
11:21 PM Yeah, I think it will be more efficient
11:23 PM in this case we would have to modify all conv rules, but we could start with the naive rule
11:23 PM oh, okay. I don't think I know how the other conv rules work, can you suggest something for me to read on it?
11:23 PM but I agree it should be faster if we do it inside the rule class
11:24 PM the other one is based on FFT
11:25 PM don't think there is an easy way to skip the input, as we could do for the naive rule
11:26 PM let's focus on the naive rule, I think it will outperform the FFT rule afterwards (for small kernels)
11:26 PM Oh, is it okay if we do it after GSoC?
11:27 PM so, we could remove the code
11:27 PM Yeah, I think
11:27 PM of course
11:28 PM 5) for the conv layer backward function, for that change, we would need to manually add padding and use a valid conv instead of a full conv
11:29 PM sounds reasonable
11:30 PM Btw. everyone should think about the final report and put some work into it.
11:31 PM Yes, will surely do
11:32 PM I was thinking I will create a repo on my account and explain what I did over the summer, with links to the PRs and some results to show
11:35 PM Yeah, if you like you can also write the report in the form of a blog post, something like:
11:35 PM - http://www.mlpack.org/gsocblog/implementation-of-tree-types-summary.html
11:35 PM - http://www.mlpack.org/gsocblog/deep-reinforcement-learning-methods-summary.html
11:35 PM - http://www.mlpack.org/gsocblog/summary-of-lsh-changes-for-gsoc-2016.html
11:36 PM but that's up to you. For me, the final report is somewhat of a living document, which can be updated even after GSoC has ended; if we change/merge something afterwards, I think the report should reflect that.
11:41 PM Also, I think it's important that the report is visible, so if anyone is interested in what you did over the summer, there is an easy way to find out (the GSoC page will link to the final report).
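Going back to 5), the conv layer backward change discussed above: a minimal sketch of the padding arithmetic, using the 5x5 example from earlier (just the numbers, not mlpack code). For the gradient to come out at the input size i directly from a unit-stride valid convolution of the o x o error with the k x k rotated kernel, the padding p' has to satisfy (o + 2p' - k) + 1 = i, i.e. p' = (i - o + k - 1) / 2, assuming symmetric padding.

```cpp
#include <iostream>

// Padding needed so that a unit-stride valid convolution of the
// o x o output error with the k x k (rotated) kernel directly yields
// the i x i input gradient: (o + 2*pad - k) + 1 = i.
int BackwardPad(const int i, const int o, const int k)
{
  return (i - o + k - 1) / 2;
}

int main()
{
  // The example from the discussion: i = 5, o = 6, k = 4.
  const int pad = BackwardPad(5, 6, 4);        // 1, i.e. pad 6x6 -> 8x8
  std::cout << "pad: " << pad << '\n';
  std::cout << (6 + 2 * pad - 4) + 1 << '\n';  // 5, i.e. 5x5 directly,
                                               // no 12x12 -> 9x9 -> crop
}
```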