0:02 There are three AI tools that I've been
0:04 really impressed with for literature
0:06 reviews recently, but here's the thing.
0:09 You need to know which ones are
0:11 hallucinating and how often they just
0:14 make up references. You see, when you're
0:15 doing a literature review, you want
0:18 exactly that, literature. So, here are
0:20 the best three AI tools that I've tested
0:23 and how much they actually lie to you.
0:26 So I took three tools, Manus, Genspark,
0:29 and Gemini from Google. And I wanted
0:31 to know, are they just lying to you? So
0:33 the prompt that I put in for every
0:35 single research tool was this. If we
0:37 head over to Manus to look at it, it
0:39 says here, generate a structured
0:41 literature review on the topic, and then
0:43 the topic that I had. Please include,
0:45 and this is the important bit, the key
0:47 research themes, how they've evolved,
0:49 important studies, conflicting
0:50 viewpoints, gaps in the current
0:52 literature, suggestions for future
0:54 research. And I said, please use
0:56 peer-reviewed sources. Present the
0:57 output in a format suitable for
0:59 inclusion into the introduction of a
1:04 thesis. Add in-text citations in IEEE
1:07 format even if they are placeholders,
1:09 and make the review as long as it needs
1:10 to be to cover the literature
1:14 effectively. So I did this for all three
1:16 tools and then, if you can blur this,
1:19 editor, these are the results. Let's
1:21 get into it. So the first thing:
1:24 Manus was the fastest to kick it out.
1:28 It did it in about 3 minutes. So that's
1:30 really good. But did it actually produce
1:32 enough references? And were those
1:34 references completely made up? So let's
1:36 have a look here. These were the things
1:40 that it kicked out. I love Manus because
1:40 it gives you all of the kind of files
1:42 that it used to generate the literature
1:43 review. But this is what we're really
1:46 interested in. Literature review PDF. We
1:47 can open it up here. Let's go full
1:50 screen on that. And you can see that it
1:52 is a 14-page document. And we'll go all
1:54 the way down to the bottom. And it
1:57 produced 38 references, and I wanted to
1:59 know: first of all, do those references
2:01 actually say what they're
2:03 claimed to say, and second, are they just made
2:05 up? So overall this literature review is
2:07 good. It contains all of the important
2:09 things that I want to know about
2:11 critical analysis. And then it's got the
2:13 different themes that
2:15 pop up in this research field.
2:17 So it's a really good first start. But
2:20 let's head over to my magical Excel
2:22 document so we can see what has actually
2:25 happened. So I went through every single
2:26 reference and actually looked on
2:29 Google Scholar to see if it exists. And
2:32 here are the results for Manus AI. So
2:36 here it was: 38 references down here. And
2:38 you can see that it did a good enough
2:42 job, but it was actually making up
2:44 references and hallucinating 16% of
2:47 the time. So, I went through and you can
2:49 see here where I put a cross against this
2:52 reference. It was just the
2:53 wrong journal and the wrong year, but
2:55 the title was right. And then I went
2:58 down here, it actually repeated two
3:00 references that already appeared up here
3:03 at four and five. And then these ones, it
3:05 just made up completely. Here it doesn't
3:07 exist. Doesn't exist. And here the
3:10 journal, year, and title were wrong, but it
3:13 was kind of a plausible one. So,
3:15 let's have a look at 18. And what I did
3:18 for each one is, if I couldn't find it
3:20 just by using Google Scholar, I
3:23 went to the actual journal where it
3:26 was published. So, where are we? This
3:28 one here. Look, morphology control blah
3:30 blah. I was like, hm, I wonder if that
3:32 exists. So, I went to the Journal of
3:34 Physical Chemistry and tried to find it.
3:36 Well, the Journal of Materials Chemistry, actually.
3:38 I put in the page number and the year,
3:41 and you can see the record's not found.
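That manual check, Google Scholar first, then the journal's own search by page number and year, can also be scripted as a first pass. As a minimal sketch (not something used in the video), one option is Crossref's public REST API, which supports bibliographic search; the helper name and the example title below are hypothetical placeholders:

```python
from urllib.parse import urlencode

# Hypothetical helper: build a query URL for Crossref's public REST API.
# "query.bibliographic" matches against titles, authors, journals, etc.,
# which is one way to automate a "does this reference exist?" check.
def build_crossref_query(title: str, rows: int = 1) -> str:
    base = "https://api.crossref.org/works"
    return f"{base}?{urlencode({'query.bibliographic': title, 'rows': rows})}"

# Placeholder title standing in for the suspect reference.
url = build_crossref_query("Morphology control of nanostructures")
print(url)
```

Fetching that URL returns JSON whose top hit can then be compared against the citation's title, journal, and year; an empty result list is the scripted equivalent of "record's not found".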
3:43 So, ultimately this is what I did: after
3:45 using Google Scholar, if I
3:46 couldn't find it there, I went to the
3:49 actual journal, and yeah, it just
3:51 didn't exist. So overall, Manus did a
3:54 pretty good job. And then you can see
3:55 after that, I was expecting it to get
3:57 worse as it went along, as
3:59 it was using more and more of its memory
4:01 to hold on to different references. But
4:03 in fact, you know, early on we got this
4:06 one, which was wrong in a certain way.
4:08 These ones repeated, then these three
4:10 were wrong, and I was like, uh-oh, this
4:12 doesn't bode well. But then all the rest
4:15 of them were absolutely fine. So overall
4:18 this gives Manus a 16% hallucination
4:21 rate, or just failure rate in some sense.
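The tally behind that figure is just failures divided by total. As a sketch, the per-category counts below are illustrative assumptions (they sum to the review's 38 references and reproduce the roughly 16% rate, but the exact split is not the video's spreadsheet):

```python
from collections import Counter

# Classify each checked reference; the counts are illustrative
# placeholders that sum to 38, not the exact spreadsheet tallies.
checks = Counter({
    "exists": 32,         # found on Google Scholar as cited
    "wrong_metadata": 2,  # real title, but wrong journal/year
    "repeated": 2,        # duplicate of an earlier entry
    "fabricated": 2,      # no matching record anywhere
})

total = sum(checks.values())         # 38 references in the review
failures = total - checks["exists"]  # anything that isn't a clean hit
rate = failures / total
print(f"{rate:.0%} failure rate")    # 6/38 rounds to 16%
```

Counting wrong-metadata and repeated entries as failures alongside outright fabrications is what makes "failure rate" the more honest label than "hallucination rate" here.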
4:23 Now let's have a look to see how
4:26 Genspark did. So Genspark is another tool
4:28 that I've been absolutely amazed by for
4:30 all sorts of different academic tasks.
4:32 Let's see if it's lying to us. So I put
4:34 in the exact same prompt as before and
4:37 then I just waited. This one was the
4:40 second to finish, and it did it in about
4:43 5 to 7 minutes. So overall
4:45 it was still relatively quick and you
4:47 can see it understood the fact I wanted
4:49 IEEE referencing, and you can see
4:53 that, you know, it did a relatively good
4:55 job at getting all of this. One thing I
4:57 don't like about Genspark is that it's
5:00 hard to extract it into a file.
5:03 At least Manus gave us a PDF. Genspark,
5:05 yeah, it's not really in a useful
5:07 format. You can ask it to put it out in
5:10 maybe LaTeX, however
5:11 you want to pronounce it; it's
5:14 completely up to you. But yeah,
5:16 it is a little bit annoying. But if I
5:19 scroll down, you can see it gave us 19
5:22 references, which isn't quite as good as
5:25 Manus, which did it quicker. And Genspark
5:28 did lie to us a little bit
5:30 more. Let's check it out. So, here, this
5:32 is what we ended up with. Genspark
5:34 actually ended up lying to us
5:38 26% of the time. And if I go through
5:40 each of the references, you can see yes,
5:41 yes, it was fine. It was fine. It
5:44 existed. Then it just didn't exist. And
5:46 then it just didn't exist. And then it
5:49 just didn't exist. So this one lies more,
5:51 because over on Manus, we can see that
5:53 we only had two that didn't
5:54 exist. Then we had ones that were
5:57 repeated. And then here and here,
5:59 you know, it just
6:01 got it a little bit wrong. There was a
6:05 very similar article, but not exactly
6:08 the same. So, yeah, I think we just need
6:10 to be aware of how different tools
6:12 are actually hallucinating. And
6:14 unfortunately, Genspark was the worst,
6:16 because its references literally just didn't exist.
6:18 There wasn't anything
6:20 similar, apart from the last title, which
6:22 was close to
6:24 another peer-reviewed article, but
6:26 really it just didn't
6:30 exist. So overall, this gave us a 26%
6:32 hallucination or failure
6:35 rate for Genspark. So at the moment, for
6:37 me, Manus is winning. But there's one
6:39 more tool that I've tested in the past
6:42 that is just really great. Check out
6:45 this one. So up till now, we've needed to
6:46 double-check every single reference
6:48 that these tools give us.
6:51 Now, I have produced a
6:53 literature review, a really deep,
6:56 detailed literature review with Gemini.
6:57 Here's Gemini. I've paid for the
6:59 Advanced plan and I used Deep Research. I
7:02 used exactly the same prompt in this one
7:04 and this is what it gave me. Now, this
7:05 took the longest. This probably took
7:09 about 20 minutes and it is detailed. It
7:13 is deep. It has lots of tables
7:15 presenting the information in a
7:17 really useful way. This is
7:19 something that the other AI tools I've
7:21 tested in this video didn't do. They
7:23 just gave us text. Here we've got
7:25 tables, and not just one table,
7:28 two tables. Then we got it split up into
7:30 really useful
7:33 sections. It really is detailed. And if
7:34 we scroll all the way down here, we can
7:37 see that the references are,
7:40 there we go, actually links. So, do
7:43 these links go somewhere and do they
7:44 actually sort of like lead to real
7:46 research? That's what I wanted to know.
7:48 So, I actually exported this to Docs
7:50 because I did want to see how long it
7:53 was. This is a massive document: it's 61
7:58 pages and 105 references. That was a
8:00 long time going through each one, but I
8:02 did it for you because I love you.
8:05 And so you can see that it's a
8:07 really detailed document. You've got all
8:09 of the different things referenced
8:14 here. It's not IEEE though; it is just
8:15 numbers. So, it didn't really
8:17 understand that brief. And also, you'll
8:19 see in a minute that it didn't stick to
8:21 peer-reviewed in a very
8:24 strict sense. It did include other
8:26 references, but let's talk about that.
8:28 So, over here, you can see
8:31 that it's just a really detailed thing.
8:33 And look, let's just cut to the
8:35 chase. If you were wanting to do a
8:36 literature review at the moment and you
8:39 are only interested in
8:40 finding the literature, getting
8:43 themes, and coming up with a generated
8:45 document, there's no doubt at the moment
8:49 that Gemini is just doing so, so well
8:51 in that space. But that's not what we're
8:54 here for today. Is it lying to us? I
8:56 went through all of them, so let's go to
9:04 its column. Boom. Here we are. Gemini had a
9:07 1% hallucination rate, but it wasn't really
9:08 hallucination, and you'll see why in a
9:11 minute. Stay around, because
9:13 I went through each one and you can
9:15 see that it didn't strictly
9:18 stick to peer-reviewed papers that were
9:21 published in journals. It did include
9:24 theses. It did include a website. But I
9:26 did like that it was the
9:28 only one where I was like, "Oh, wow.
9:31 They've used 2025 research from this year."
9:33 The other ones got close, but I
9:35 don't think I saw any really
9:37 up-to-date references. I really like that
9:40 about Gemini. And then the one it failed
9:42 down here was just a book that
9:45 wasn't accessible. The
9:47 issue was that it did take me to
9:50 a live web page, but I couldn't
9:53 access what it was actually citing. But
9:55 I would take that over a completely
9:58 made-up reference any day. So here you
10:00 can see we got a thesis, a thesis, and then
10:02 down here we have another thesis. But
10:06 ultimately it gave us 105 references, and
10:10 all of them existed in some way,
10:12 except this one, which did not exist. Now, there are
10:15 some drawbacks. Like I said, you
10:17 can't extract this into something you
10:20 can work with in terms of Mendeley or
10:22 EndNote. You'd have to go
10:24 and find each source and put it into
10:27 your reference manager manually. But
10:30 look, ultimately, it is a really great
10:33 introduction and a great literature
10:35 review. Look, I love this. I love the
10:37 tables. I love that you can
10:39 see here it's actually referenced. It
10:41 says IEEE. You're just lying to me a
10:43 little bit there, matey. That's a bit
10:45 cheeky, isn't it? It says IEEE there in the
10:48 references, but it's not even in IEEE format. Oh
10:50 well, it's okay. It's okay. I'll let you
10:51 get away with that, because you're doing
10:54 well in other ways. But ultimately,
10:56 yeah, this is just such a great
10:57 literature review. It's got all of the
11:00 things. It's detailed. It's deep. It is
11:03 the definition, and I think the benchmark
11:06 at the moment, for academic
11:09 searching of the literature and producing a
11:12 literature review. So really, really
11:14 impressive. Give them a go and let me
11:16 know what you think. So overall this is
11:18 where we're at. Gemini: there is no
11:20 hallucination. It just got one
11:23 a little bit wrong. Manus I really
11:25 like. If you're going to use Manus, just
11:26 go through and double-check. It's got
11:29 about a 16% hallucination or
11:31 failure rate. I guess I can't strictly
11:33 call it hallucination, because it was
11:34 only two titles that really didn't
11:37 exist. The others it kind of got a
11:38 little bit wrong, but there was kind of
11:41 a reference like them. And then it did
11:42 repeat some. So overall, this is probably
11:45 less of a hallucination rate.
11:47 And then we've got Genspark, which,
11:49 yeah, really did just make up stuff
11:51 and didn't give us many references.
11:56 So in order, Gemini wins the crown.
11:59 Put the crown on. Manus in second
12:01 place. Oh, thanks very much. And then
12:03 Genspark in third place. Oh, I'm sorry
12:06 I did so badly, but I I tried my best.
12:07 All right, then. That's where we're at
12:09 with those. Give them a go for
12:11 yourself and let me know what you find.
12:13 If you like this video, go check out
12:14 this one where I talk about writing a